Problem
The <class name="EstimatorSpec"> block in com.ibm.wala.cast.python.ml/data/tensorflow.xml (line 952 in master) models the constructor as:
<class name="EstimatorSpec" allocatable="true">
  <method name="do" descriptor="()LRoot;" numArgs="7" paramNames="self mode predictions loss train_op eval_metric_ops export_outputs">
    <return value="eval_metric_ops" />
  </method>
</class>
Three semantic gaps:
- No allocation. There's no <new def="res" class="L.../EstimatorSpec"/>. The constructor doesn't model that calling tf.estimator.EstimatorSpec(...) produces a fresh EstimatorSpec object, so points-to analysis won't see a distinct allocation site for the spec.
- No field sets. None of the six constructor parameters (mode, predictions, loss, train_op, eval_metric_ops, export_outputs) is stored as a field on the result, so downstream field-lookup analyses for spec.loss, spec.predictions, spec.train_op, etc. won't resolve.
- Result aliases an input. The body returns eval_metric_ops directly, so the analysis treats the result of tf.estimator.EstimatorSpec(...) as the same object as the eval_metric_ops argument. Anything it infers about one is silently attributed to the other.
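The three gaps can be illustrated outside WALA with a small Python sketch. This is only an analogy, not Ariadne code: current_summary and proposed_summary are hypothetical names that mimic what the existing body and an allocating body tell the analysis, and SimpleNamespace stands in for an allocated EstimatorSpec object.

```python
from types import SimpleNamespace

# The six constructor parameters named in the summary (self excluded).
PARAMS = ("mode", "predictions", "loss", "train_op",
          "eval_metric_ops", "export_outputs")


def current_summary(mode, predictions, loss, train_op,
                    eval_metric_ops, export_outputs):
    """Hypothetical analogue of <return value="eval_metric_ops" />."""
    return eval_metric_ops


def proposed_summary(mode, predictions, loss, train_op,
                     eval_metric_ops, export_outputs):
    """Hypothetical analogue of a body that allocates and sets fields."""
    res = SimpleNamespace()  # plays the role of <new def="res" ... />
    values = (mode, predictions, loss, train_op,
              eval_metric_ops, export_outputs)
    for name, value in zip(PARAMS, values):
        setattr(res, name, value)  # plays the role of one <putfield ... />
    return res


# Placeholder argument values; real code would pass tensors and ops.
metrics = {"accuracy": "acc_op"}
args = ("TRAIN", "preds", "loss_tensor", "train_op", metrics, None)

aliased = current_summary(*args)
fresh = proposed_summary(*args)

print(aliased is metrics)        # True: the "spec" is the metrics dict itself
print(hasattr(aliased, "loss"))  # False: no field to look up
print(fresh is metrics)          # False: distinct object
print(fresh.loss)                # loss_tensor
```

With the first function the "spec" and the metrics argument are one object and spec.loss has nothing to resolve to; with the second, each parameter is reachable through its own field on a separate object.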
Why it matters
EstimatorSpec is the central return type of a TensorFlow model_fn (https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec). Any analysis that follows estimator workflows — tracking what the model produces as predictions, what loss tensor is computed, which metrics are evaluated — depends on the field sets being modeled. The current "return one of the inputs" body silently degrades all such analysis.
This was discovered during downstream tensorflow.xml reconciliation work — Hybridize's copy has the byte-identical body, so the convergence faithfully preserves the gap. Same flavor as #427 (tf.equal returning input dtype instead of bool): a pre-existing semantic shortcut in the modeling that should be fixed independently of the broader port (#428).
Proposed fix
<class name="EstimatorSpec" allocatable="true">
  <method name="do" descriptor="()LRoot;" numArgs="7" paramNames="self mode predictions loss train_op eval_metric_ops export_outputs">
    <new def="res" class="Ltensorflow/estimator/EstimatorSpec" />
    <putfield class="LRoot" field="mode" fieldType="LRoot" ref="res" value="mode" />
    <putfield class="LRoot" field="predictions" fieldType="LRoot" ref="res" value="predictions" />
    <putfield class="LRoot" field="loss" fieldType="LRoot" ref="res" value="loss" />
    <putfield class="LRoot" field="train_op" fieldType="LRoot" ref="res" value="train_op" />
    <putfield class="LRoot" field="eval_metric_ops" fieldType="LRoot" ref="res" value="eval_metric_ops" />
    <putfield class="LRoot" field="export_outputs" fieldType="LRoot" ref="res" value="export_outputs" />
    <return value="res" />
  </method>
</class>
(Path Ltensorflow/estimator/EstimatorSpec should match whatever package the class block lives in — verify against the top-level <new def="EstimatorSpec" class="..."/> binding before committing.)
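One way to run that check is a short script that reads the class path bound by the top-level <new def="EstimatorSpec" .../> and compares it with the path allocated inside the constructor. The sketch below uses only Python's xml.etree.ElementTree. The inline SAMPLE document, its nesting, its class path, and the helper names are made up for illustration and do not reproduce the real tensorflow.xml; in practice the file under com.ibm.wala.cast.python.ml/data/ would be parsed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for tensorflow.xml.
SAMPLE = """
<summary-spec>
  <classloader name="PythonLoader">
    <class name="tensorflow" allocatable="true">
      <method name="import" static="true" descriptor="()Ltensorflow;">
        <new def="EstimatorSpec" class="Ltensorflow/estimator/EstimatorSpec" />
      </method>
    </class>
    <package name="tensorflow/estimator">
      <class name="EstimatorSpec" allocatable="true">
        <method name="do" descriptor="()LRoot;">
          <new def="res" class="Ltensorflow/estimator/EstimatorSpec" />
          <return value="res" />
        </method>
      </class>
    </package>
  </classloader>
</summary-spec>
"""


def top_level_binding(root, def_name):
    """Return the class attribute of the first <new def=def_name .../>."""
    for new in root.iter("new"):
        if new.get("def") == def_name:
            return new.get("class")
    return None


def constructor_allocation(root, class_name):
    """Return the class allocated by the first <new> in the class's "do" method."""
    for cls in root.iter("class"):
        if cls.get("name") != class_name:
            continue
        for method in cls.findall("method"):
            if method.get("name") == "do":
                new = method.find("new")
                return None if new is None else new.get("class")
    return None


root = ET.fromstring(SAMPLE)
bound = top_level_binding(root, "EstimatorSpec")
allocated = constructor_allocation(root, "EstimatorSpec")
print(bound, allocated, bound == allocated)
```

If the two paths differ, or the constructor has no <new> at all, the comparison fails, which is the condition to resolve before committing.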
Related
- ... (tf.equal, tf.not_equal, tf.less, ...) inherit wrong dtype from ElementWiseOperation #427: same shape (XML semantically wrong; body returns one of the inputs / wrong dtype).
- tensorflow.xml extensions into Ariadne master #428: parent port issue. This isn't gated by the port, but adopting the fix here lands the corrected modeling on master before the consumer-side reconciliation re-syncs.
- read_data/read_dataset marker allocations in tensorflow.xml #380: separate read_data migration; doesn't apply here since EstimatorSpec's body has no read_data to begin with.