Job_chat: Implement tool-use for code edits by hanna-paasivirta · Pull Request #501 · OpenFn/apollo

hanna-paasivirta · 2026-06-03T08:30:06Z

Short Description

This PR adds

A job code editing tool to job_chat. This replaces the existing structured output format, where the model was forced to answer entirely in JSON, with code edits first and then a text explanation. The model can now answer in natural language and call a tool to request code changes. This results in far fewer refusals to answer (where the model ends the turn early) and far fewer empty responses.
Corresponding prompt changes.
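
As a rough illustration of the change, a code edit tool in the Anthropic Messages API tool format could look like the sketch below. The tool name `edit_job_code` and the `edits`, `old_code` and `new_code` fields are hypothetical and not taken from this PR; only the overall shape (`name`, `description`, `input_schema`, and `text` / `tool_use` content blocks) follows the API format.

```python
# Hypothetical tool definition. The name and input fields are assumptions
# made for illustration, not the ones used in this PR.
CODE_EDIT_TOOL = {
    "name": "edit_job_code",
    "description": "Request one or more edits to the user's job code.",
    "input_schema": {
        "type": "object",
        "properties": {
            "edits": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "old_code": {"type": "string"},
                        "new_code": {"type": "string"},
                    },
                    "required": ["old_code", "new_code"],
                },
            }
        },
        "required": ["edits"],
    },
}


def split_response(content_blocks):
    """Separate the natural-language answer from any requested code edits."""
    text_parts = []
    edits = []
    for block in content_blocks:
        if block.get("type") == "text":
            text_parts.append(block["text"])
        elif (
            block.get("type") == "tool_use"
            and block.get("name") == CODE_EDIT_TOOL["name"]
        ):
            edits.extend(block["input"].get("edits", []))
    return "".join(text_parts), edits
```

With this shape the text answer no longer has to be packed into a JSON field: a response with only text blocks is still a valid answer, and the edits arrive separately when the model asks for them.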

Fixes #497

We also need a tweak in Lightning to add a stream status between the text answer and the code edits. See: OpenFn/lightning#4833
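
To make the ordering concrete, here is a minimal sketch of the event sequence a client might see, assuming hypothetical event names (`text`, `status`, `changes`, `done`) and a hypothetical status message. The real event names and payloads are defined by Apollo and Lightning, not by this example.

```python
def stream_events(answer_chunks, code_edits):
    """Yield (event, payload) pairs: text first, then a status, then edits."""
    for chunk in answer_chunks:
        yield ("text", chunk)
    if code_edits:
        # The status sits between the text answer and the code edits so the
        # client can show progress while the edits are being produced.
        yield ("status", "Applying code edits...")
        yield ("changes", code_edits)
    yield ("done", None)
```

A client that handles each event by type, rather than by position in the stream, would also cope with other orderings.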

Implementation details

The job code assistant can call the code edit tool only once, covering all changes, and the conversation turn ends there. We picked this because it constrains the assistant to fewer API calls, which makes the turn a little faster and simpler than a fully agentic tool-use loop.
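
The single-step behaviour can be sketched roughly as follows. `call_model` is a hypothetical callable standing in for the real API call, and the returned dictionary keys are assumptions for illustration.

```python
def run_single_step_turn(call_model, messages):
    """Make exactly one model call; a tool call ends the turn.

    `call_model` is a hypothetical callable that takes the message history
    and returns a list of content blocks (dicts with a "type" key).
    """
    blocks = call_model(messages)
    answer = "".join(b["text"] for b in blocks if b.get("type") == "text")
    tool_calls = [b for b in blocks if b.get("type") == "tool_use"]
    # Only the first tool call is honoured, and no tool result is sent back
    # to the model, so there is no second API call in this turn.
    code_edits = tool_calls[0]["input"] if tool_calls else None
    return {"answer": answer, "code_edits": code_edits}
```

A fully agentic loop would instead append a tool result to the history and call the model again until it stops requesting tools, which costs at least one extra API call per turn.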

As our use cases become more complex and our global assistant architecture evolves, we might decide to allow any number of tool calls. The Lightning-side change should be ready for different orders of streaming statuses.

Experiments

I tried the following to fix the answer refusals. The approaches that failed produced more answer refusals, answers containing only '...', garbled or repetitive tokens, or malformed JSON:

Sonnet & Opus -> no difference, 80% pass
Prompt improvement & without structured outputs -> 90% pass
Prompt improvement & with structured outputs -> 70% pass (structured outputs caused the model to mix up tokens, or to repeat tokens until it hit the max token limit)
Answer tool -> 0% pass (it forgets about the answer tool)
Answer tool with per message reminder -> 10% pass
Code edit tool without structured outputs -> 100% pass
Code edit tool with structured outputs -> 100% pass
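
For readers who want to reproduce this kind of comparison, the failure modes listed above could be checked with heuristics along these lines. This is a sketch under stated assumptions, not the method used for the numbers in this PR: the function names, the repetition threshold `max_run`, and the `expect_json` flag are all hypothetical.

```python
import json


def longest_token_run(text):
    """Length of the longest run of one whitespace-separated token repeated back to back."""
    best = run = 0
    prev = None
    for tok in text.split():
        run = run + 1 if tok == prev else 1
        best = max(best, run)
        prev = tok
    return best


def is_failed_answer(answer, expect_json=False, max_run=8):
    """Heuristic check for the failure modes named above (thresholds are assumptions)."""
    stripped = answer.strip()
    if not stripped:
        return True  # empty response, e.g. the model ended the turn early
    if set(stripped) <= {".", "…"}:
        return True  # answer containing only '...'
    if longest_token_run(stripped) > max_run:
        return True  # the same token repeated many times in a row
    if expect_json:
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            return True  # malformed JSON (only relevant to the JSON answer format)
    return False


def pass_rate(answers, **kwargs):
    """Fraction of answers that are not flagged as failures."""
    if not answers:
        return 0.0
    passed = sum(not is_failed_answer(a, **kwargs) for a in answers)
    return passed / len(answers)
```

Heuristics like these only catch the obvious cases; garbled output that looks superficially fluent would still need manual review.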

I added structured outputs to the code edit tool, where they don’t seem to cause strange behaviour and so should provide the occasional guardrail they’re intended to be (instead of manhandling the output into a low-probability sequence).
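
As a rough picture of what that guardrail amounts to, the check below validates a tool input locally. The `edits`, `old_code` and `new_code` field names are hypothetical and not taken from this PR, and the real constraint is enforced by the model provider at generation time rather than by code like this.

```python
def validate_edit_input(tool_input):
    """Return a list of problems with a hypothetical code edit tool input."""
    if not isinstance(tool_input, dict):
        return ["input is not an object"]
    edits = tool_input.get("edits")
    if not isinstance(edits, list):
        return ["'edits' must be a list"]
    problems = []
    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            problems.append(f"edit {i} is not an object")
            continue
        for key in ("old_code", "new_code"):
            if not isinstance(edit.get(key), str):
                problems.append(f"edit {i}: '{key}' must be a string")
    return problems
```

An empty list means the input has the expected shape. Because the schema applies only to the tool input, the free-text answer stays unconstrained.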

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

You can read more details in our Responsible AI Policy

josephjclark · 2026-06-03T08:56:40Z

This PR just does a single tool call turn - which means:

The stream order is reversed: text now comes before code (but that might be OK because of how the text is written)
After code is generated there's no extra explanation sent by the model. This might be OK, might not.

Suggest we add full loop later. Single step is the quickest win.

Defer the opus update to later - I want a clean fix first, then we'll consider the opus update in isolation.

josephjclark

I'll give this a final test and hopefully release tomorrow. Thanks @hanna-paasivirta

hanna-paasivirta added 5 commits June 2, 2026 17:16

remove strucutred outputs and try opus

1d31816

adjust prompt for json format clarification

23a8a2d

add answer tool

3e26570

code edits tool

d20e0dc

strict tool use

d6dedc3

hanna-paasivirta added 5 commits June 3, 2026 19:59

tidy

24826a1

temp mark

217ead3

typo

3cef8c8

tense

8defff6

tweak streaming

75a5645

hanna-paasivirta mentioned this pull request Jun 4, 2026

Job code assistant: Show statuses streamed after the text answer OpenFn/lightning#4832

Open

add changeset

c3ebe8e

hanna-paasivirta marked this pull request as ready for review June 4, 2026 13:20

hanna-paasivirta changed the title ~~Job_chat: Implement agentic tool-use for code edits~~ Job_chat: Implement tool-use for code edits Jun 4, 2026

hanna-paasivirta requested a review from josephjclark June 4, 2026 13:31

version

58bbae3

josephjclark approved these changes Jun 4, 2026

hanna-paasivirta wants to merge 12 commits into main from empty-answers-bug-code-edit-tool

2 participants
