Mark json_union_to_text output as the canonical Arrow JSON extension type #115
Merged
adriangb merged 2 commits on Jun 2, 2026
…type

datafusion-contrib#114 added the UDF but only overrode `return_type`, which returns a bare `DataType` with no field metadata, so the `Utf8View` output carried no extension type. Implement `return_field_from_args` to return a field tagged via `json_field_metadata()` (`ARROW:extension:name = arrow.json`), matching `json_get_json` / `json_get_array`, so the JSON text output is recognized as JSON downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds integration coverage running `json_union_to_text(json_get(...))` through a `RecordBatch`: the str / array (list member) / object / null union arms, nested array/object passthrough, string escaping, and numeric/bool literals. Also asserts that the output column is tagged with the canonical Arrow JSON extension (`ARROW:extension:name = arrow.json`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added end-to-end batch tests:
Follow-up to #114. That PR added `json_union_to_text` but only overrode `return_type`, which returns a bare `DataType`, so the `Utf8View` output carried no field metadata and wasn't recognized as JSON.

This implements `return_field_from_args` to return a field tagged via `json_field_metadata()` (`ARROW:extension:name = arrow.json`, `ARROW:extension:metadata = {}`, legacy `is_json`), exactly as `json_get_json` / `json_get_array` do. The function's output (canonical JSON text) is now a proper canonical Arrow JSON extension field.

Added a test asserting the output field's `ARROW:extension:name == "arrow.json"`.

Tested: `cargo test --lib json_union_to_text`, `cargo clippy --all-targets` (pedantic), `cargo fmt --check`.
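The distinction between a bare data type and a metadata-tagged field can be sketched with a std-only model. This is not the crate's code: `DataType`, `Field`, `return_type`, `return_field`, and `is_arrow_json` below are simplified, hypothetical stand-ins for the Arrow/DataFusion types and trait methods, and the `"true"` value for the legacy `is_json` key is an assumption. Only the metadata key names and the `arrow.json` / `{}` values come from the description above.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow's `DataType`; only the variant used here.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8View,
}

// Hypothetical stand-in for Arrow's `Field`: a data type plus string metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

// Models the metadata described in the PR. The key names are from the
// description; the value of the legacy `is_json` key is an assumption.
fn json_field_metadata() -> HashMap<String, String> {
    HashMap::from([
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
        ("is_json".to_string(), "true".to_string()),
    ])
}

// A type-only hook can express just this: no place to attach metadata.
fn return_type() -> DataType {
    DataType::Utf8View
}

// A field-returning hook can carry the extension metadata with the type.
fn return_field(name: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: return_type(),
        nullable: true,
        metadata: json_field_metadata(),
    }
}

// How a downstream consumer might recognize the canonical JSON extension.
fn is_arrow_json(field: &Field) -> bool {
    field.metadata.get("ARROW:extension:name").map(String::as_str) == Some("arrow.json")
}
```

In this model, a field built from `return_type()` alone with empty metadata fails `is_arrow_json`, while the one from `return_field` passes, which mirrors why the type-only override left the output unrecognized as JSON.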