Job_chat: Implement tool-use for code edits #501

Open
hanna-paasivirta wants to merge 12 commits into main from empty-answers-bug-code-edit-tool

@hanna-paasivirta (Contributor) commented Jun 3, 2026

Short Description

This PR adds

  • A job code editing tool for job_chat. This replaces the existing structured output format, in which the model was forced to answer entirely in JSON, with code edits first and a text explanation second. The model can now answer in natural language and call a tool to request code changes. This results in far fewer answer refusals (where the model ends the turn early) and far fewer empty responses.
  • Corresponding prompt changes.

Fixes #497

We also need a tweak in Lightning to add a stream status between the text answer and the code edits. See: OpenFn/lightning#4833

Implementation details

The job code assistant can only call the code edit tool once, covering all changes. The conversation turn ends there. We chose this because it makes the turn a little faster and simpler than a fully agentic tool-use loop, since it constrains the assistant to fewer API calls.
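
As a rough illustration of the single-call constraint, here is a minimal sketch. It is not the code in this PR: the tool name `edit_job_code`, the `old_code`/`new_code` edit shape, and the `handle_single_turn` helper are all hypothetical. It only assumes that the model's response arrives as a list of `text` and `tool_use` content blocks.

```python
from typing import Any

# Hypothetical tool definition: the name and the edit fields are illustrative
# assumptions, not the schema used in this PR.
CODE_EDIT_TOOL: dict[str, Any] = {
    "name": "edit_job_code",
    "description": "Request all changes to the job code in a single call.",
    "input_schema": {
        "type": "object",
        "properties": {
            "edits": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "old_code": {"type": "string"},
                        "new_code": {"type": "string"},
                    },
                    "required": ["old_code", "new_code"],
                },
            }
        },
        "required": ["edits"],
    },
}


def handle_single_turn(content_blocks: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the text answer and at most one code edit tool call.

    There is no tool-result round trip: the turn ends once the blocks
    have been read, which is what keeps this to a single API call.
    """
    text_parts: list[str] = []
    edits: list[dict[str, str]] = []
    tool_called = False
    for block in content_blocks:
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif (
            block.get("type") == "tool_use"
            and block.get("name") == CODE_EDIT_TOOL["name"]
        ):
            if tool_called:
                # Only the first code edit call is honoured in this sketch.
                continue
            tool_called = True
            edits = list(block.get("input", {}).get("edits", []))
    return {"text": "".join(text_parts).strip(), "edits": edits}
```

A fully agentic loop would instead send a tool result back to the model and keep calling the API until the model stops requesting tools.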

As our use cases become more complex and our global assistant architecture evolves, we might decide to allow any number of tool calls. The Lightning-side change should be ready for different orders of streaming statuses.
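
To make "ready for different orders of streaming statuses" concrete, here is a small sketch of an order-agnostic consumer. The event names (`text_delta`, `status`, `code_edits`) and the `TurnState` type are assumptions made for illustration. They are not the events Apollo or Lightning actually use, and the real Lightning change is not written in Python.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class TurnState:
    """Accumulated result of one assistant turn (hypothetical shape)."""
    text: str = ""
    edits: list[dict[str, Any]] = field(default_factory=list)
    statuses: list[str] = field(default_factory=list)


def apply_event(state: TurnState, event: dict[str, Any]) -> TurnState:
    """Fold one stream event into the state, whatever order events arrive in."""
    kind = event.get("type")
    if kind == "text_delta":
        state.text += event.get("text", "")
    elif kind == "status":
        state.statuses.append(event.get("status", ""))
    elif kind == "code_edits":
        state.edits.extend(event.get("edits", []))
    # Unknown event types are ignored so new ones do not break the consumer.
    return state


def consume(events: Iterable[dict[str, Any]]) -> TurnState:
    """Reduce a whole stream of events into a single TurnState."""
    state = TurnState()
    for event in events:
        apply_event(state, event)
    return state
```

Because each event is handled by type rather than by position, the same consumer copes with text before code, code before text, or several tool calls in one turn.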

Experiments

I tried the following to fix the answer refusals. A run counted as a failure if it produced an answer refusal, an answer containing only '...', garbled or repetitive tokens, or malformed JSON:

  • Sonnet & Opus -> no difference, 80% pass
  • Prompt improvement & without structured outputs -> 90% pass
  • Prompt improvement & with structured outputs -> 70% pass (saw structured outputs cause the model to mix up tokens, or start repeating tokens until hitting max token limits)
  • Answer tool -> 0% pass (it forgets about the answer tool)
  • Answer tool with per message reminder -> 10% pass
  • Code edit tool without structured outputs -> 100% pass
  • Code edit tool with structured outputs -> 100% pass

I added the structured outputs to the code edit tool, where it doesn’t seem to cause strange behaviour and so should provide the occasional guardrail it’s intended to be (instead of manhandling the output into a low-probability sequence).
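
For anyone wanting to re-run a comparison like this, the failure modes listed under Experiments could be checked with something like the sketch below. This is not the harness used for the numbers above: the function names, the labels, and the repetition threshold of 20 identical consecutive tokens are all assumptions.

```python
import json


def classify_response(raw: str, expect_json: bool = False, max_repeats: int = 20) -> str:
    """Label a model response as "ok" or as one of the failure modes.

    The threshold and labels are illustrative assumptions.
    """
    stripped = raw.strip()
    if not stripped:
        return "empty"
    if all(ch in ".… " for ch in stripped):
        return "ellipsis_only"

    # Flag a run of the same whitespace-separated token repeated many times.
    tokens = stripped.split()
    run = 1
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return "repetitive"

    if expect_json:
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            return "malformed_json"
    return "ok"


def pass_rate(responses: list[str], expect_json: bool = False) -> float:
    """Fraction of responses classified as "ok" (0.0 for an empty list)."""
    if not responses:
        return 0.0
    passed = sum(classify_response(r, expect_json) == "ok" for r in responses)
    return passed / len(responses)
```

A check like this only catches the mechanical failures. Whether an answer is actually a refusal or is garbled in subtler ways still needs a human or model judgement.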

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark (Collaborator)

This PR just does a single tool call turn - which means:

  1. Stream order is reversed: text now comes before code (but that might be OK because of how the text is written)
  2. After code is generated there's no extra explanation sent by the model. This might be OK, might not.

Suggest we add full loop later. Single step is the quickest win.

Defer the opus update to later - I want a clean fix first, then we'll consider the opus update in isolation.

@hanna-paasivirta marked this pull request as ready for review June 4, 2026 13:20
@hanna-paasivirta changed the title from "Job_chat: Implement agentic tool-use for code edits" to "Job_chat: Implement tool-use for code edits" Jun 4, 2026

@josephjclark (Collaborator) left a comment

I'll give this a final test and hopefully release tomorrow. Thanks @hanna-paasivirta

Development

Successfully merging this pull request may close these issues.

Job_chat: Model outputs an empty or partial response

2 participants