Background
DataFusion 54 introduced first-class higher-order functions (HOFs) that take lambdas plus collection arguments and rewrite to a scalar Expr at planning time. PR #1561 exposed Python lambda syntax and the built-in HOFs (array_transform, array_filter, array_any_match, etc.) but did not expose the registration API for custom HOFs.
Upstream signature
pub fn register_higher_order_function(&self, func: Arc<dyn HigherOrderFunctionImpl>)
pub fn deregister_higher_order_function(&self, name: &str) -> Option<Arc<dyn HigherOrderFunctionImpl>>
The HigherOrderFunctionImpl trait requires name(), args_count() (returning a (min, max) pair), and invoke(&self, args: &[Expr]) -> Result<Expr>.
User value
This lets library authors add lambda-aware operators that are not in DataFusion's built-in set. Examples: array_window(arr, size, x -> ...), array_partition_by(arr, x -> key(x)), array_zip_with(a, b, (x, y) -> x * y), or domain-specific operators on JSON / graph / geo arrays. The HOF runs at logical-plan time, rewriting the lambda and its arguments into a standard Expr tree that the planner then optimizes, so it is not equivalent to a ScalarUDF.
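To make the plan-time rewrite concrete, here is a minimal, self-contained sketch of the idea in pure Python. It does not use the datafusion package: the Expr class, Registry, ZipWith, and the zip_map node are all hypothetical stand-ins chosen for illustration. Only the method names (name, args_count, invoke, register_higher_order_function, deregister_higher_order_function) mirror the upstream signatures quoted above. The eventual Python API may look quite different.

```python
from dataclasses import dataclass


# Toy expression tree. A hypothetical stand-in, NOT datafusion.Expr.
@dataclass(frozen=True)
class Expr:
    op: str
    args: tuple = ()

    def __mul__(self, other: "Expr") -> "Expr":
        # Operators build tree nodes instead of computing values.
        return Expr("mul", (self, other))

    def __str__(self) -> str:
        if not self.args:
            return self.op
        return f"{self.op}({', '.join(str(a) for a in self.args)})"


def col(name: str) -> Expr:
    """Leaf node naming a column or a lambda parameter."""
    return Expr(name)


class ZipWith:
    """Hypothetical custom HOF with the same three-method shape as the trait."""

    def name(self) -> str:
        return "array_zip_with"

    def args_count(self) -> tuple:
        return (3, 3)  # (min, max): two arrays plus one lambda

    def invoke(self, args) -> Expr:
        a, b, fn = args
        # Call the lambda on placeholder parameters to capture its body
        # as an expression tree; nothing is evaluated on real data here.
        body = fn(col("x"), col("y"))
        # "zip_map" is an invented node name for this sketch.
        return Expr("zip_map", (a, b, body))


class Registry:
    """Toy stand-in for the session-level HOF registry."""

    def __init__(self) -> None:
        self._funcs = {}

    def register_higher_order_function(self, func) -> None:
        self._funcs[func.name()] = func

    def deregister_higher_order_function(self, name: str):
        # Returns the removed function, or None if it was not registered.
        return self._funcs.pop(name, None)

    def plan(self, name: str, *args) -> Expr:
        func = self._funcs[name]
        lo, hi = func.args_count()
        if not lo <= len(args) <= hi:
            raise ValueError(f"{name} expects {lo}..{hi} arguments")
        return func.invoke(args)


reg = Registry()
reg.register_higher_order_function(ZipWith())
expr = reg.plan("array_zip_with", col("a"), col("b"), lambda x, y: x * y)
print(expr)  # zip_map(a, b, mul(x, y))
```

The point of the sketch is that invoke returns a tree rather than a value, which is why an author of a custom HOF needs to know how to build Expr nodes by hand.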
Why deferred
The effort estimate is medium-large (~350-550 LOC), and the authoring cost is high for end users: the invoke callback must return a DataFusion Expr tree from Python, which requires familiarity with the Expr grammar. Most array-lambda needs are already covered by the built-in HOFs that PR #1561 ships. There were no open user requests at the time of audit. Filed for tracking, so that the v54 HOF surface can be completed symmetrically once a concrete use case or extension-library ecosystem emerges.