Conversation
|
|
||
| - The Scan API is not itself a full relational query engine. | ||
| - `LayoutReader` should not grow unknown-cardinality operator semantics. | ||
| - Vortex should not require a specific Rust async runtime such as Tokio. |
There was a problem hiding this comment.
It's weird that we would go through all of this and still assume Tokio but I haven't read all of it yet
There was a problem hiding this comment.
We already don't assume tokio, it just continues to be an explicit goal
There was a problem hiding this comment.
The double negation here implies the opposite? You want the goal to be that the runtime doesn't assume tokio? maybe I am reading too much into random ai generated strings
|
|
||
| - the host may provide a CPU scheduler | ||
| - Vortex may use it for bounded split-local CPU work | ||
| - Vortex must not assume ownership of the whole query runtime |
There was a problem hiding this comment.
What does this statement mean in practice? I think there's intent behind it but I fail to understand what this means in practice?
There was a problem hiding this comment.
I'd guess in particular in terms of use of resources, e.g. spawning threads but also unix process ownership e.g. Vortex should never crash the host. @gatesn correct me if you had sth else in mind.
There was a problem hiding this comment.
Vortex should never crash the host.
Error handling might deserve a small section in this PR. I briefly talked about this with @myrrc but I think we'll need a panic handler (the host maybe can configure) to prevent that we never crash a host.
| - split lookahead policy | ||
| - efficient materialization of output batches | ||
|
|
||
| ### What `Partitioning` Means |
There was a problem hiding this comment.
Words are hard, partitioning usually means some arrangement of data which this is not about. But maybe this is Partitioning and the other thing is Arrangament
|
|
||
| Correctness is more important than maximal pushdown. | ||
|
|
||
| ## Ordering, Limits, and Future Dynamic Filters |
There was a problem hiding this comment.
You should mention Partitioning here (or a I redefined it Arrangement). It's a super set of ordering
0ax1
left a comment
There was a problem hiding this comment.
Just a thought, maybe worthwhile clauding some ascii diagrams to illustrate some of the aspects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nicholas Gates <gatesn@users.noreply.github.com>
|
|
||
| This layer belongs in `vortex-scan`. | ||
|
|
||
| ### 3. Operator Layer |
There was a problem hiding this comment.
This seems like the only place where this part is called an "operator", maybe "Host Layer" is more consistent with the rest of the doc?
|
|
||
| The same Scan API should support all of these, but not all with the same level of host control. | ||
|
|
||
| ### DataFusion |
There was a problem hiding this comment.
We currently have 2 DataFusion codepath's, I think its worth expanding on the difference here
| - moderate | ||
| - simpler than Velox | ||
|
|
||
| ## Why `Partition` and `Split` Both Exist |
There was a problem hiding this comment.
just editing wise, if its this important should it be earlier on?
|
|
||
| That keeps the current implementation direction intact. | ||
|
|
||
| ## `VortexTableScan` |
There was a problem hiding this comment.
What is this? Aside from "a scan of a vortex table"?
| ### `Partition` | ||
|
|
||
| A `Partition` is the unit of work exposed by the Scan API to a host engine. |
There was a problem hiding this comment.
Is a partition completely opaque? What does it return?
|
|
||
| This lines up with: | ||
|
|
||
| - DataFusion task-per-partition execution |
There was a problem hiding this comment.
The DataFusion FileSource API is moving towards a lower level morsel-like API, where partitions return a collection of smaller splits that can end up being stolen by other paritions.
| pub struct ScanBudget { | ||
| pub max_active_splits: usize, | ||
| pub max_prefetch_splits: usize, | ||
| pub max_prefetch_bytes: u64, | ||
| pub max_inflight_reads: usize, | ||
| pub max_buffered_batches: usize, | ||
| } |
There was a problem hiding this comment.
If we expose lower level pieces, why do we need a budget? This is 5 extra knobs that I think could be more clearly be part of the API, allowing each host to navigate it in whichever way makes sense for that system/language/paradigm.
| pub offset: u64, | ||
| pub len: u64, | ||
| pub alignment: usize, | ||
| pub priority: IoPriority, |
There was a problem hiding this comment.
What's priority here? How is that different than intent ?IDK how IO looks in all the systems we're looking at, I think ideally the API should only include things we surely know how to use.
|
|
||
| This keeps the public API stable while allowing Vortex to remain storage-aware internally. | ||
|
|
||
| ## Pushdown and Host Functions |
There was a problem hiding this comment.
This seems like a key issue here, might be worth pulling up before the section about specific engines, so its more natural to expand on the differences in each one
| ### Predicate Model | ||
|
|
||
| The scan request should not treat all filters as equally pushdown-safe. | ||
|
|
||
| Conceptually: | ||
|
|
||
| ```rust | ||
| pub struct Predicate { | ||
| pub exact: Option<Expression>, | ||
| pub residual: Option<Expression>, | ||
| } | ||
| ``` | ||
|
|
||
| Meaning: | ||
|
|
||
| - `exact`: safe for Vortex to apply fully | ||
| - `residual`: must still be evaluated by the host after scan | ||
|
|
||
| In practice an engine adapter may derive these from its own expression IR. |
There was a problem hiding this comment.
Is this part of the interface or just a general comment about predicate pushdown?
| Host-specific functions should be wrapped as registered Vortex functions with extra semantic | ||
| metadata. | ||
|
|
||
| ```rust |
There was a problem hiding this comment.
Is this per-function or per-host? Do we expect host implementation to wrap every function we want to push down?
| ### Probabilistic Prefetch | ||
|
|
||
| The source should be free to rank candidate splits with a probabilistic score, for example: | ||
|
|
||
| `priority ~= P(split still needed) * stall_saved / resource_cost` | ||
|
|
||
| Where: | ||
|
|
||
| - `P(split still needed)` depends on selectivity and limits, and may later incorporate | ||
| late-arriving filter information if the Scan API grows that extension | ||
| - `stall_saved` estimates the latency hidden by early I/O | ||
| - `resource_cost` estimates bytes, memory pressure, and decode work | ||
|
|
||
| This is not a host-engine concern. It is a scan-source concern, constrained by host budgets. |
There was a problem hiding this comment.
Can a host reject a prefetch request?
This RFC looks at how we can expose deeper integration with query engine internals like scheduling, threading models, buffer pools, and so on