Mark json_union_to_text output as the canonical Arrow JSON extension type #115

Merged
adriangb merged 2 commits into datafusion-contrib:main from pydantic:mark-json-union-output
Jun 2, 2026
adriangb commented Jun 2, 2026

Collaborator

Follow-up to #114. That PR added json_union_to_text but only overrode return_type, which returns a bare DataType — so the Utf8View output carried no field metadata and wasn't recognized as JSON.

This implements return_field_from_args to return a field tagged via json_field_metadata() (ARROW:extension:name = arrow.json, ARROW:extension:metadata = {}, legacy is_json), exactly as json_get_json / json_get_array do. The function's output (canonical JSON text) is now a proper canonical Arrow JSON extension field.

Added a test asserting the output field's ARROW:extension:name == "arrow.json".

Tested: cargo test --lib json_union_to_text, cargo clippy --all-targets (pedantic), cargo fmt --check.
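
The difference between overriding only `return_type` and also implementing `return_field_from_args` can be sketched with the standard library alone. This is a simplified model, not the crate's code: `SketchField`, `bare_return_field`, `tagged_return_field`, and `is_canonical_json` are hypothetical names, the body of `json_field_metadata` here is a stand-in, and the `"true"` value for the legacy `is_json` key is an assumption. Only the `ARROW:extension:*` keys and values are taken from the description above.

```rust
use std::collections::HashMap;

/// Simplified, std-only stand-in for an Arrow field (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchField {
    pub name: String,
    pub data_type: &'static str,
    pub metadata: HashMap<String, String>,
}

/// Stand-in for the metadata described in the PR: the canonical Arrow JSON
/// extension keys plus the legacy `is_json` marker.
pub fn json_field_metadata() -> HashMap<String, String> {
    HashMap::from([
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
        // Assumption: the PR names the legacy key but not its value.
        ("is_json".to_string(), "true".to_string()),
    ])
}

/// Models overriding only `return_type`: the data type is known,
/// but the field carries no metadata.
pub fn bare_return_field(name: &str) -> SketchField {
    SketchField {
        name: name.to_string(),
        data_type: "Utf8View",
        metadata: HashMap::new(),
    }
}

/// Models `return_field_from_args`: same data type, now tagged as JSON.
pub fn tagged_return_field(name: &str) -> SketchField {
    SketchField {
        metadata: json_field_metadata(),
        ..bare_return_field(name)
    }
}

/// A downstream consumer's check for the canonical JSON extension name.
pub fn is_canonical_json(field: &SketchField) -> bool {
    field.metadata.get("ARROW:extension:name").map(String::as_str) == Some("arrow.json")
}
```

In this model, `is_canonical_json(&bare_return_field("out"))` is false while `is_canonical_json(&tagged_return_field("out"))` is true, which mirrors why the `Utf8View` output was not recognized as JSON before this change.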

adriangb and others added 2 commits June 2, 2026 07:04
…type

datafusion-contrib#114 added the UDF but only overrode `return_type`, which returns a bare
`DataType` with no field metadata, so the `Utf8View` output carried no extension
type. Implement `return_field_from_args` to return a field tagged via
`json_field_metadata()` (`ARROW:extension:name = arrow.json`), matching
`json_get_json` / `json_get_array`, so the JSON text output is recognized as
JSON downstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds integration coverage running json_union_to_text(json_get(...)) through a
RecordBatch: the str / array (list member) / object / null union arms, nested
array/object passthrough, string escaping, and numeric/bool literals — and
asserts the output column is tagged with the canonical Arrow JSON extension
(ARROW:extension:name = arrow.json).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adriangb commented Jun 2, 2026

Collaborator Author

Added end-to-end batch tests: json_union_to_text(json_get(...)) run through a RecordBatch covering the str / array (list member) / object / null union arms, nested array+object passthrough, string escaping, and number/bool literals — plus an assertion that the output column's schema carries ARROW:extension:name = arrow.json (i.e. the marking survives through the real query/projection path, not just return_field_from_args in isolation).
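
The cases these tests cover can be illustrated with a small std-only sketch of turning each union arm into JSON text. It is not the UDF's implementation: `JsonUnionSketch`, `escape_json_string`, and `union_to_text` are hypothetical names, storing the array and object arms as already-serialized text is an assumption, and rendering the null arm as the literal `null` is also an assumption (the real function may produce a SQL NULL instead).

```rust
/// Hypothetical, simplified model of a JSON union value.
#[derive(Debug, Clone, PartialEq)]
pub enum JsonUnionSketch {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    /// Assumption: nested values are held as already-serialized JSON text.
    Array(String),
    Object(String),
}

/// Quote a string and escape the characters JSON requires to be escaped.
pub fn escape_json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render one union arm as JSON text.
pub fn union_to_text(value: &JsonUnionSketch) -> String {
    match value {
        // Assumption: null becomes the JSON literal rather than a SQL NULL.
        JsonUnionSketch::Null => "null".to_string(),
        JsonUnionSketch::Bool(b) => b.to_string(),
        JsonUnionSketch::Int(i) => i.to_string(),
        // Non-finite floats are not valid JSON; this sketch does not handle them.
        JsonUnionSketch::Float(f) => f.to_string(),
        JsonUnionSketch::Str(s) => escape_json_string(s),
        // Nested arrays and objects pass through unchanged.
        JsonUnionSketch::Array(raw) | JsonUnionSketch::Object(raw) => raw.clone(),
    }
}
```

Only the string arm needs quoting and escaping; numbers and booleans are emitted as bare literals, and nested arrays or objects are passed through as-is.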

adriangb merged commit 62208b0 into datafusion-contrib:main on Jun 2, 2026
7 checks passed