Design issues in TensorGeneratorFactory.getFunction() and dispatchByPropertyName exposed by #437 #448

@khatchad

Summary

While investigating #437, three design issues in TensorGeneratorFactory.getFunction() and the surrounding dispatch chain surfaced. They are related: together they make resolution failures hard to recognize and hard to fix in a targeted way, and they were the proximate cause of #437 slipping through Ariadne's CI.

(1) getFunction() Silently Returns The Declared Class On Resolution Failure

getFunction() handles the LCodeBody/LRoot generic-dispatch case by inspecting the points-to set on the call's use(0):

if (declaredClass.getName().toString().equals("LCodeBody")
    || declaredClass.getName().toString().equals("LRoot")) {
  int funcVn = call.getUse(0);
  PointerKey funcKey = ...
  for (InstanceKey ik : pointerAnalysis.getPointsToSet(funcKey)) {
    if (ik instanceof ConcreteTypeKey)    return ((ConcreteTypeKey) ik).getType().getReference();
    if (ik instanceof AllocationSiteInNode) return ((AllocationSiteInNode) ik).getConcreteType().getReference();
  }
}
return declaredClass;  // ← silent fallthrough

When the loop yields zero matching keys (which happens when the function object is a PythonPropertyRead with an empty pointer-analysis (PA) points-to set), the method returns the declared LCodeBody/LRoot class with no indication that resolution failed. The dispatch table at line 888+ then sees this generic class and throws IllegalArgumentException("Unknown call: LCodeBody"), which processInstruction catches without emitting any diagnostic.

Net effect: resolution failures look identical to "genuinely a generic call" to the caller, and the source is silently dropped from the analysis.

Suggested fix: change the return type to Optional<TypeReference> (or throw a typed UnresolvedCallTargetException), and force the caller to handle the unresolved case explicitly. That gives us a place to plug in fallbacks (per #437 / per (2) below).
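A minimal sketch of the Optional-returning shape, assuming simplified stand-in types rather than the real WALA/Ariadne classes. ResolutionSketch, Key, and the string type names are hypothetical and only illustrate how an empty result differs from a resolved one:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins; the real code works on TypeReference and InstanceKey.
final class ResolutionSketch {

  /** Models an instance key that may or may not expose a concrete type. */
  static final class Key {
    final String concreteType; // null models a key kind the resolver does not handle

    Key(String concreteType) {
      this.concreteType = concreteType;
    }
  }

  static boolean isGeneric(String declaredClass) {
    return declaredClass.equals("LCodeBody") || declaredClass.equals("LRoot");
  }

  /** Returns the resolved type, or empty when a generic declared class cannot be refined. */
  static Optional<String> resolve(String declaredClass, List<Key> pointsTo) {
    if (!isGeneric(declaredClass)) {
      return Optional.of(declaredClass); // already specific, nothing to resolve
    }
    for (Key key : pointsTo) {
      if (key.concreteType != null) {
        return Optional.of(key.concreteType);
      }
    }
    return Optional.empty(); // explicit failure instead of returning the declared class
  }
}
```

With this shape the caller can no longer mistake an unresolved LCodeBody for a genuinely generic call; it has to decide what to do with the empty case.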

(2) Two Parallel Dispatch Mechanisms That Aren't Bridged

dispatchByPropertyName() implements property-name dispatch via the PROPERTY_NAME_GENERATORS registry, which currently has one entry: astype.

Meanwhile, the class-hierarchy dispatch (line 888+) has 50+ entries keyed off MethodReference.getDeclaringClass().

For property-read invocations like tf.rank(...), which are exactly the case dispatchByPropertyName is designed for, the property name is correctly resolved to a ConstantKey:rank, but PROPERTY_NAME_GENERATORS has no entry for it, so dispatch falls through. There's no bridge between the two registries: an XML class registered for the class-hierarchy dispatch is not automatically reachable via property-name dispatch.

Suggested fix: unify the two dispatch paths. Either (a) the property-name registry is auto-populated from the XML class registry (so any class in the dispatch table is reachable both ways), or (b) dispatchByPropertyName falls back to the class-hierarchy dispatch when the property name corresponds to a registered class.

Note that XML-registered helper methods (like read_data) appear to live in a separate registry from getDeclaredMethods() — so a naive class-hierarchy scan for read_data doesn't work. They become trampoline classes nested under the parent (per the /class/-namespace pattern). Whatever the bridge looks like, it has to query the right registry.
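Option (a) could look roughly like the sketch below. It assumes both registries can be viewed as maps from a name to a generator label, and that the property name is the last '/'-separated segment of the registered class name; neither assumption is confirmed by the actual code, and RegistryBridgeSketch and the example class names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: generators are represented by plain string labels.
final class RegistryBridgeSketch {

  /**
   * Returns a property-name registry extended with entries derived from the class registry.
   * Explicit property-name entries are kept; derived entries only fill gaps.
   */
  static Map<String, String> bridge(
      Map<String, String> propertyGenerators, Map<String, String> classGenerators) {
    Map<String, String> merged = new LinkedHashMap<>(propertyGenerators);
    for (Map.Entry<String, String> entry : classGenerators.entrySet()) {
      String className = entry.getKey();
      // Assumed naming convention: ".../<propertyName>".
      String propertyName = className.substring(className.lastIndexOf('/') + 1);
      merged.putIfAbsent(propertyName, entry.getValue());
    }
    return merged;
  }
}
```

Using putIfAbsent keeps an explicit entry such as astype from being overridden by a derived one. A real bridge would also have to query the registry that holds the XML-registered trampoline classes, which this sketch does not model.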

(3) No Diagnostic Signal On Resolution Failure

Related to (1): when the PA loop in getFunction's LCodeBody/LRoot branch yields zero matching InstanceKey types, the loop just exits. No log, no diagnostic, no signal that resolution failed. The caller sees IllegalArgumentException("Unknown call: LCodeBody") thrown later (sometimes much later, through several recursive walk paths) and has to reverse-engineer what went wrong.

Suggested fix: log at INFO (or at least FINE) when:

  • the LCodeBody/LRoot resolver loop exits empty
  • the points-to set on the function-object reference is empty (independent of loop result)
  • the InstanceKey types in the points-to set are all unknown to the existing if-cases

The information is cheap to log and would have shaved hours off the #437 investigation.
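A rough illustration of the logging conditions listed above, using java.util.logging. The class name, the method name, and the representation of InstanceKey kinds as strings are all hypothetical simplifications:

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: key kinds are modeled as simple class-name strings.
final class ResolutionLoggingSketch {
  private static final Logger LOGGER =
      Logger.getLogger(ResolutionLoggingSketch.class.getName());

  /**
   * Logs and returns a diagnostic when the resolver loop would exit without a match,
   * or returns null when at least one key kind is handled.
   */
  static String describeFailure(
      String declaredClass, List<String> keyKinds, Set<String> handledKinds) {
    String message;
    if (keyKinds.isEmpty()) {
      message = "Empty points-to set for function object of " + declaredClass;
    } else if (Collections.disjoint(keyKinds, handledKinds)) {
      message = "No handled InstanceKey kind for " + declaredClass + "; saw " + keyKinds;
    } else {
      return null; // some key would have resolved, nothing to report
    }
    LOGGER.log(Level.FINE, message);
    return message;
  }
}
```

Returning the message as well as logging it is only a convenience for testing the sketch; in the real resolver the log call alone would be enough.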

Why This Matters Now

#437's root cause is exactly the (1)+(2) combination: tf.rank(...) is a property read whose function object has an empty PA points-to set, so getFunction silently returns LCodeBody, dispatch fails, and the source is dropped. Fixing those design issues unblocks #437 (and similar future cases) at the right layer instead of papering over them per-op.

Investigation logs and a reproducer test are on branch investigate/437-rank-classification (commit 99161500).

Workaround Attempts For #437

A dispatchByPropertyName fallback was attempted (the design (2) angle): scan the class hierarchy for trampoline classes ending in /<propertyName>/read_data and dispatch them as Constant. This recovers 10 of #437's 33 tests when applied to Hybridize, but it is too aggressive: it shadows specific generators (Eye, Stack, etc.) that already have dispatch-table entries, causing 100+ regressions on Ariadne's own test suite.

The right fix needs (1)'s explicit-resolution-failure signaling so the fallback only fires when the dispatch table genuinely has nothing — i.e., (1) is a prerequisite for (2)'s fix. That's why this issue exists.
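The ordering this implies might be sketched as follows, assuming the resolver already reports failure as an empty Optional and that both tables map names to generator labels. GatedFallbackSketch, its parameters, and the example names are hypothetical:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: generators are represented by plain string labels.
final class GatedFallbackSketch {

  /**
   * Uses the dispatch table whenever the call target was resolved, and consults the
   * property-name fallback only when resolution explicitly failed.
   */
  static String dispatch(
      Optional<String> resolvedClass,
      String propertyName,
      Map<String, String> dispatchTable,
      Map<String, String> fallbackByProperty) {
    if (resolvedClass.isPresent()) {
      String generator = dispatchTable.get(resolvedClass.get());
      if (generator == null) {
        throw new IllegalArgumentException("Unknown call: " + resolvedClass.get());
      }
      return generator; // specific generators are never shadowed by the fallback
    }
    String fallback = fallbackByProperty.get(propertyName);
    if (fallback == null) {
      throw new IllegalArgumentException("Unresolved call target for property: " + propertyName);
    }
    return fallback;
  }
}
```

In this arrangement a resolved call to something like Eye still reaches its own generator even if a Constant fallback exists for the same property name, which is the shadowing the workaround ran into.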
