EstimatorSpec constructor: no allocation, no field sets, returns input directly #429

@khatchad

Problem

The <class name="EstimatorSpec"> block in com.ibm.wala.cast.python.ml/data/tensorflow.xml (line 952 in master) models the constructor as:

<class name="EstimatorSpec" allocatable="true">
  <method name="do" descriptor="()LRoot;" numArgs="7" paramNames="self mode predictions loss train_op eval_metric_ops export_outputs">
    <return value="eval_metric_ops" />
  </method>
</class>

Three semantic gaps:

  1. No allocation. There's no <new def="res" class="L.../EstimatorSpec"/>. The constructor doesn't model that calling tf.estimator.EstimatorSpec(...) produces a fresh EstimatorSpec object — so points-to analysis won't see a distinct allocation site for the spec.

  2. No field sets. None of the 6 constructor parameters (mode, predictions, loss, train_op, eval_metric_ops, export_outputs) are stored as fields on the result. So downstream field-lookup analyses for spec.loss, spec.predictions, spec.train_op, etc. won't resolve.

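  3. Result aliases an input. The body returns eval_metric_ops directly, so the analysis treats the result of tf.estimator.EstimatorSpec(...) as the same object as the eval_metric_ops argument. Anything that flows into one is seen as flowing into the other, and field reads on the spec are resolved against the eval_metric_ops value instead of a spec object.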
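
To make the three gaps concrete, here is a toy Python sketch of what each summary body tells a points-to analysis. It is not WALA code: `AbstractObject`, `current_summary`, and `proposed_summary` are hypothetical names invented for this illustration, and the real analysis operates on WALA's IR rather than Python objects.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an abstract heap object (not a WALA type).
# eq=False keeps comparison identity-based, like distinct allocation sites.
@dataclass(eq=False)
class AbstractObject:
    site: str                                   # label of the allocation site
    fields: dict = field(default_factory=dict)  # field name -> abstract object

# The six non-self constructor parameters listed in the summary.
PARAMS = ("mode", "predictions", "loss", "train_op",
          "eval_metric_ops", "export_outputs")

def current_summary(**args):
    """Mimics the existing body: <return value="eval_metric_ops" />."""
    return args["eval_metric_ops"]

def proposed_summary(**args):
    """Mimics the proposed body: one <new>, six <putfield>s, return res."""
    res = AbstractObject(site="EstimatorSpec")
    for name in PARAMS:
        res.fields[name] = args[name]
    return res

# One abstract object per argument, each with its own allocation site.
args = {name: AbstractObject(site=f"arg:{name}") for name in PARAMS}

old = current_summary(**args)
new = proposed_summary(**args)

print(old is args["eval_metric_ops"])      # True: result aliases an input
print("loss" in old.fields)                # False: spec.loss cannot resolve
print(new is args["eval_metric_ops"])      # False: distinct allocation
print(new.fields["loss"] is args["loss"])  # True: spec.loss resolves
```

Under these assumptions, the current body fails all three checks that the proposed body passes: a fresh object, populated fields, and no aliasing with eval_metric_ops.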

Why it matters

EstimatorSpec is the central return type of a TensorFlow model_fn (https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec). Any analysis that follows estimator workflows — tracking what the model produces as predictions, what loss tensor is computed, which metrics are evaluated — depends on the field sets being modeled. The current "return one of the inputs" body silently degrades all such analysis.

This was discovered during downstream tensorflow.xml reconciliation work — Hybridize's copy has the byte-identical body, so the convergence faithfully preserves the gap. Same flavor as #427 (tf.equal returning input dtype instead of bool): a pre-existing semantic shortcut in the modeling that should be fixed independently of the broader port (#428).

Proposed fix

<class name="EstimatorSpec" allocatable="true">
  <method name="do" descriptor="()LRoot;" numArgs="7" paramNames="self mode predictions loss train_op eval_metric_ops export_outputs">
    <new def="res" class="Ltensorflow/estimator/EstimatorSpec" />
    <putfield class="LRoot" field="mode" fieldType="LRoot" ref="res" value="mode" />
    <putfield class="LRoot" field="predictions" fieldType="LRoot" ref="res" value="predictions" />
    <putfield class="LRoot" field="loss" fieldType="LRoot" ref="res" value="loss" />
    <putfield class="LRoot" field="train_op" fieldType="LRoot" ref="res" value="train_op" />
    <putfield class="LRoot" field="eval_metric_ops" fieldType="LRoot" ref="res" value="eval_metric_ops" />
    <putfield class="LRoot" field="export_outputs" fieldType="LRoot" ref="res" value="export_outputs" />
    <return value="res" />
  </method>
</class>

(Path Ltensorflow/estimator/EstimatorSpec should match whatever package the class block lives in — verify against the top-level <new def="EstimatorSpec" class="..."/> binding before committing.)
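
One way to do that check mechanically is sketched below in Python. The `SAMPLE` document is a made-up excerpt: its element nesting, the `import` method, and the class path are assumptions for illustration and may not match the real tensorflow.xml. `check_binding` is a hypothetical helper, not part of WALA. To run it against the actual file, replace `SAMPLE` with the file's contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; layout and class path are assumed, not copied
# from the real tensorflow.xml.
SAMPLE = """
<summary-spec>
  <class name="tensorflow" allocatable="true">
    <method name="import" descriptor="()Ltensorflow;">
      <new def="EstimatorSpec" class="Ltensorflow/estimator/EstimatorSpec" />
    </method>
  </class>
  <class name="EstimatorSpec" allocatable="true">
    <method name="do" descriptor="()LRoot;" numArgs="7">
      <new def="res" class="Ltensorflow/estimator/EstimatorSpec" />
      <return value="res" />
    </method>
  </class>
</summary-spec>
"""

def check_binding(xml_text, class_name, result_var="res"):
    """Return (binding_paths, constructor_paths) for class_name.

    binding_paths: class attributes of every <new def="class_name" .../>.
    constructor_paths: class attributes of <new def="res" .../> found
    inside <class name="class_name"> blocks.
    """
    root = ET.fromstring(xml_text)
    binding = {n.get("class") for n in root.iter("new")
               if n.get("def") == class_name}
    constructor = set()
    for cls in root.iter("class"):
        if cls.get("name") == class_name:
            constructor |= {n.get("class") for n in cls.iter("new")
                            if n.get("def") == result_var}
    return binding, constructor

binding, constructor = check_binding(SAMPLE, "EstimatorSpec")
# The paths agree when every constructor path also appears as a binding.
consistent = bool(binding) and bool(constructor) and constructor <= binding
print(consistent)  # True for the sample above
```

A mismatch (or an empty set on either side) would suggest that the `class` attribute in the proposed `<new>` needs adjusting before the change is committed.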
