Skip to content

added ai2thor enviroment with a method that proposes steps and uses C…#280

Open
Vman11 wants to merge 1 commit into
mainfrom
ai2thor
Open

added ai2thor enviroment with a method that proposes steps and uses C…#280
Vman11 wants to merge 1 commit into
mainfrom
ai2thor

Conversation

@Vman11

@Vman11 Vman11 commented May 28, 2026

Copy link
Copy Markdown

…R to choose steps

@dmjoy dmjoy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is in the right direction, but there are a few things I would like to see changed (see comments) or at least have a good handle on why the current modifications are needed for those cases.

Comment on lines +45 to +48
Unlike the Outlines engine, output is not grammar-constrained — the
schema is appended to the prompt as an instruction and the response
is parsed as JSON with a repair fallback.
"""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like Ollama supports structured output with a JSON schema: https://ollama.com/blog/structured-outputs. Seems like we should use that if possible

return outputs

def run_inference(self, prompts, schema):
def _parse_json(self, text: str) -> dict:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really excited about the idea of including this here. Have you been running into JSON validation etc. errors with the outlines inference engine here or? Just curious why this is even needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to have the history tracking as you have it in here for now, but I'm more inclined to merge Yoni's approach on this: https://github.com/ITM-Kitware/align-system/pull/277/changes#diff-ea512e45fac46d4935ce85a4837bdbcd27b5891a09c1ac5f6aad038076f4d497

As it maintains the full working_output history.

Comment on lines +98 to +130
tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in tools)
history_lines = (
"\n".join(f"- {a.tool_name}({a.args})" for a in self._history)
if self._history else "None"
)
predict_proposer_prompt = (
f"Task: {scenario_state.unstructured}\n\n"
f"Available tools:\n{tool_lines}\n\n"
f"Action history:\n{history_lines}\n\n"
f"Generate {self.num_candidates} diverse candidate plans."
)

score_schema = (
'{"candidates":[{"actions":[{"tool_name":"MoveAhead","args":{"moveMagnitude":0.25}}],'
'"rationale":"..."}]}'
)

prompt_system = ("You are an embodied planning model.\n"
"Return ONLY valid JSON. No extra text.\n"
f"Generate {self.num_candidates} semi-diverse candidate plans.\n")
prompt = (
f"You are an embodied planning model.\n"
"Return ONLY valid JSON. No extra text.\n"
f"Generate {self.num_candidates} diverse candidate plans.\n"
f"- Each plan is 1 to {self.rollout_horizon} actions.\n"
f"- Use ONLY the tool names provided.\n"
f"- Args MUST satisfy each tool schema.\n"
f"- IMPORTANT objectId rule: For tools requiring objectId (TeleportNearObject, PickupObject, "
f"OpenObject, CloseObject, ToggleObjectOn/Off), you MUST copy the exact full objectId string "
"from the observation's visible lines (the value after 'id='). "
"Never use object type names like 'Apple' as objectId. Full objectIds contain '|' characters.\n"
"- Avoid repeating the same last action unless clearly helpful.\n"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would for these to be done similar to how we do prompts in other ADMs (outlines templates or callables, and parameterize them in the init call so that we can swap them around at Hydra configuration time)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These experiment configs should be probably be in a subdirectory inside of experiment as that's typically how we've been doing it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these should be in a file specific to AI2Thor (if they are indeed specific data types for that domain) just to help with namespacing. I.e. if I do from align_system.data_models.ai2thor import ToolSpec that seems more informative

Image.fromarray(frame.astype(np.uint8)).save(fpath)
print(f"[AI2ThorEnv] saved frame: {fpath}")

def reset(self, task: str) -> Observation:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the setup info etc. in here seems like it should be living in a data or config file somewhere rather than code?

from align_system.data_models.types import Action as PlannerAction


TASKS = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar story here, this should probably be in a data or configuration file somewhere and we would probably want to be able to modify it for a given experiment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants