GH-49620: [Ruby] Add support for custom metadata in Message#49621
GH-49620: [Ruby] Add support for custom metadata in Message#49621kou merged 1 commit intoapache:mainfrom
Conversation
|
|
There was a problem hiding this comment.
Pull request overview
Adds Arrow IPC Message custom-metadata support to the Ruby red-arrow-format implementation, including propagation onto Schema, RecordBatch, and dictionary batches.
Changes:
- Introduces
ArrowFormat::Dictionaryto carry dictionary batch data plus per-message metadata. - Adds read/write support for Flatbuffers
Message.custom_metadataand exposes it asmessage_metadataonSchema/RecordBatch/Dictionary. - Refactors Flatbuffers custom-metadata serialization into a shared helper (
FB.build_custom_metadata) and updates readers/writers + tests accordingly.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| ruby/red-arrow-format/test/test-writer.rb | Updates dictionary handling for new Dictionary wrapper and adds message-metadata tests for schema/record batch/dictionaries. |
| ruby/red-arrow-format/lib/arrow-format/streaming-writer.rb | Writes Message.custom_metadata for schema/record batch/dictionary messages; updates dictionary writing to use Dictionary#array. |
| ruby/red-arrow-format/lib/arrow-format/streaming-pull-reader.rb | Reads Message.custom_metadata and wraps dictionary batches as ArrowFormat::Dictionary with message_metadata. |
| ruby/red-arrow-format/lib/arrow-format/schema.rb | Adds message_metadata to Schema; uses shared custom-metadata builder for schema metadata serialization. |
| ruby/red-arrow-format/lib/arrow-format/record-batch.rb | Adds message_metadata to RecordBatch. |
| ruby/red-arrow-format/lib/arrow-format/readable.rb | Threads message custom-metadata through schema/record batch parsing. |
| ruby/red-arrow-format/lib/arrow-format/flatbuffers.rb | Adds FB.build_custom_metadata helper for KeyValue vector construction. |
| ruby/red-arrow-format/lib/arrow-format/file-writer.rb | Switches footer custom-metadata construction to shared helper. |
| ruby/red-arrow-format/lib/arrow-format/file-reader.rb | Passes Message.custom_metadata into record batch reads; wraps dictionary batches as Dictionary with message_metadata. |
| ruby/red-arrow-format/lib/arrow-format/field.rb | Switches field custom-metadata construction to shared helper. |
| ruby/red-arrow-format/lib/arrow-format/dictionary.rb | New Dictionary wrapper class for dictionary batch data + message metadata. |
| ruby/red-arrow-format/lib/arrow-format/array.rb | Updates DictionaryArray#to_a to work with Dictionary objects (dictionary.array). |
Comments suppressed due to low confidence (1)
ruby/red-arrow-format/test/test-writer.rb:201
ArrowFormat::Dictionaryis created with a hard-coded id of0, but the correspondingArrowFormat::DictionaryTypehas its ownid(type.id). Using a mismatched id can make dictionary batches ambiguous and will become a correctness issue if/when the writer/reader starts validating dictionary ids. Consider usingtype.idhere instead of0.
when ArrowFormat::DictionaryType
validity_buffer = convert_buffer(red_arrow_array.null_bitmap)
indices_buffer = convert_buffer(red_arrow_array.indices.data_buffer)
dictionary_array = convert_array(red_arrow_array.dictionary)
dictionary = ArrowFormat::Dictionary.new(0, dictionary_array)
type.build_array(red_arrow_array.size,
validity_buffer,
indices_buffer,
[dictionary])
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def read_schema(fb_schema, fb_message_custom_metadata=nil) | ||
| fields = fb_schema.fields.collect do |fb_field| | ||
| read_field(fb_field) | ||
| end | ||
| message_metadta = read_custom_metadata(fb_message_custom_metadata) | ||
| Schema.new(fields, | ||
| metadata: read_custom_metadata(fb_schema.custom_metadata)) | ||
| metadata: read_custom_metadata(fb_schema.custom_metadata), | ||
| message_metadata: message_metadta) |
There was a problem hiding this comment.
In read_schema, the local variable is misspelled as message_metadta. Even though it currently works, the typo makes the code harder to read and easy to propagate; rename it to message_metadata for consistency with the keyword argument and other usages.
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance |
There was a problem hiding this comment.
The Apache license header in this new file appears to be truncated/missing the standard opening line (e.g., "Licensed to the Apache Software Foundation (ASF) under one..."). Please update it to match the canonical header used in the other files in this directory to avoid license/compliance issues.
| red_arrow_value_type.build_array(raw_dictionary_more) | ||
| dictionary_more = convert_array(red_arrow_dictionary_more) | ||
| dictionary_array_more = convert_array(red_arrow_dictionary_more) | ||
| dictionary_more = ArrowFormat::Dictionary.new(dictionary_id + 1, |
There was a problem hiding this comment.
In the non-chunked_dictionaries? branch, dictionary_more is constructed with dictionary_id + 1 even though the field's DictionaryType was created with dictionary_id (1). The dictionary batch id should match the schema's dictionary encoding id; consider using dictionary_id here to keep the test data consistent with the Arrow IPC model.
| dictionary_more = ArrowFormat::Dictionary.new(dictionary_id + 1, | |
| dictionary_more = ArrowFormat::Dictionary.new(dictionary_id, |
ac752e8 to
41ed69c
Compare
|
+1 |
Rationale for this change
Messagecan attach custom metadata:arrow/format/Message.fbs
Lines 152 to 157 in 61ef672
What changes are included in this PR?
ArrowFormat::Dictionaryfor dictionaryMessageand attaching them toArrowFormat::Schema,ArrowFormat::RecordBatchandArrowFormat::DictionaryArrowFormat::Schema,ArrowFormat::RecordBatchandArrowFormat::DictionaryasMessage's custom metadataAre these changes tested?
Yes.
Are there any user-facing changes?
Yes.