Skip to content

GH-49620: [Ruby] Add support for custom metadata in Message#49621

Merged
kou merged 1 commit intoapache:mainfrom
kou:ruby-custom-metadata-message
Apr 1, 2026
Merged

GH-49620: [Ruby] Add support for custom metadata in Message#49621
kou merged 1 commit intoapache:mainfrom
kou:ruby-custom-metadata-message

Conversation

@kou
Copy link
Copy Markdown
Member

@kou kou commented Mar 31, 2026

Rationale for this change

Message can attach custom metadata:

arrow/format/Message.fbs

Lines 152 to 157 in 61ef672

table Message {
version: org.apache.arrow.flatbuf.MetadataVersion;
header: MessageHeader;
bodyLength: long;
custom_metadata: [ KeyValue ];
}

What changes are included in this PR?

  • Add ArrowFormat::Dictionary for dictionary
  • Add support for reading custom metadata in Message and attaching them to ArrowFormat::Schema, ArrowFormat::RecordBatch and ArrowFormat::Dictionary
  • Add support for writing custom metadata in ArrowFormat::Schema, ArrowFormat::RecordBatch and ArrowFormat::Dictionary as Message's custom metadata

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

Copilot AI review requested due to automatic review settings March 31, 2026 05:16
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #49620 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Arrow IPC Message custom-metadata support to the Ruby red-arrow-format implementation, including propagation onto Schema, RecordBatch, and dictionary batches.

Changes:

  • Introduces ArrowFormat::Dictionary to carry dictionary batch data plus per-message metadata.
  • Adds read/write support for Flatbuffers Message.custom_metadata and exposes it as message_metadata on Schema/RecordBatch/Dictionary.
  • Refactors Flatbuffers custom-metadata serialization into a shared helper (FB.build_custom_metadata) and updates readers/writers + tests accordingly.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ruby/red-arrow-format/test/test-writer.rb Updates dictionary handling for new Dictionary wrapper and adds message-metadata tests for schema/record batch/dictionaries.
ruby/red-arrow-format/lib/arrow-format/streaming-writer.rb Writes Message.custom_metadata for schema/record batch/dictionary messages; updates dictionary writing to use Dictionary#array.
ruby/red-arrow-format/lib/arrow-format/streaming-pull-reader.rb Reads Message.custom_metadata and wraps dictionary batches as ArrowFormat::Dictionary with message_metadata.
ruby/red-arrow-format/lib/arrow-format/schema.rb Adds message_metadata to Schema; uses shared custom-metadata builder for schema metadata serialization.
ruby/red-arrow-format/lib/arrow-format/record-batch.rb Adds message_metadata to RecordBatch.
ruby/red-arrow-format/lib/arrow-format/readable.rb Threads message custom-metadata through schema/record batch parsing.
ruby/red-arrow-format/lib/arrow-format/flatbuffers.rb Adds FB.build_custom_metadata helper for KeyValue vector construction.
ruby/red-arrow-format/lib/arrow-format/file-writer.rb Switches footer custom-metadata construction to shared helper.
ruby/red-arrow-format/lib/arrow-format/file-reader.rb Passes Message.custom_metadata into record batch reads; wraps dictionary batches as Dictionary with message_metadata.
ruby/red-arrow-format/lib/arrow-format/field.rb Switches field custom-metadata construction to shared helper.
ruby/red-arrow-format/lib/arrow-format/dictionary.rb New Dictionary wrapper class for dictionary batch data + message metadata.
ruby/red-arrow-format/lib/arrow-format/array.rb Updates DictionaryArray#to_a to work with Dictionary objects (dictionary.array).
Comments suppressed due to low confidence (1)

ruby/red-arrow-format/test/test-writer.rb:201

  • ArrowFormat::Dictionary is created with a hard-coded id of 0, but the corresponding ArrowFormat::DictionaryType has its own id (type.id). Using a mismatched id can make dictionary batches ambiguous and will become a correctness issue if/when the writer/reader starts validating dictionary ids. Consider using type.id here instead of 0.
    when ArrowFormat::DictionaryType
      validity_buffer = convert_buffer(red_arrow_array.null_bitmap)
      indices_buffer = convert_buffer(red_arrow_array.indices.data_buffer)
      dictionary_array = convert_array(red_arrow_array.dictionary)
      dictionary = ArrowFormat::Dictionary.new(0, dictionary_array)
      type.build_array(red_arrow_array.size,
                       validity_buffer,
                       indices_buffer,
                       [dictionary])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +37 to +44
def read_schema(fb_schema, fb_message_custom_metadata=nil)
fields = fb_schema.fields.collect do |fb_field|
read_field(fb_field)
end
message_metadta = read_custom_metadata(fb_message_custom_metadata)
Schema.new(fields,
metadata: read_custom_metadata(fb_schema.custom_metadata))
metadata: read_custom_metadata(fb_schema.custom_metadata),
message_metadata: message_metadta)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In read_schema, the local variable is misspelled as message_metadta. Even though it currently works, the typo makes the code harder to read and easy to propagate; rename it to message_metadata for consistency with the keyword argument and other usages.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Apache license header in this new file appears to be truncated/missing the standard opening line (e.g., "Licensed to the Apache Software Foundation (ASF) under one..."). Please update it to match the canonical header used in the other files in this directory to avoid license/compliance issues.

Copilot uses AI. Check for mistakes.
red_arrow_value_type.build_array(raw_dictionary_more)
dictionary_more = convert_array(red_arrow_dictionary_more)
dictionary_array_more = convert_array(red_arrow_dictionary_more)
dictionary_more = ArrowFormat::Dictionary.new(dictionary_id + 1,
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-chunked_dictionaries? branch, dictionary_more is constructed with dictionary_id + 1 even though the field's DictionaryType was created with dictionary_id (1). The dictionary batch id should match the schema's dictionary encoding id; consider using dictionary_id here to keep the test data consistent with the Arrow IPC model.

Suggested change
dictionary_more = ArrowFormat::Dictionary.new(dictionary_id + 1,
dictionary_more = ArrowFormat::Dictionary.new(dictionary_id,

Copilot uses AI. Check for mistakes.
@kou kou force-pushed the ruby-custom-metadata-message branch from ac752e8 to 41ed69c Compare March 31, 2026 05:55
@kou
Copy link
Copy Markdown
Member Author

kou commented Apr 1, 2026

+1

@kou kou merged commit 9273467 into apache:main Apr 1, 2026
12 of 13 checks passed
@kou kou removed the awaiting committer review Awaiting committer review label Apr 1, 2026
@kou kou deleted the ruby-custom-metadata-message branch April 1, 2026 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants