Job_chat: Model outputs an empty or partial response

This test scenario, which asks the model to explain code, fails in at least 40% of runs via both `job_chat` and `global_chat` on `main`. The model either emits an empty answer, emits three dots ('...'), or begins to explain the code but ends the answer on a colon, with no code inline or in the `suggested_code` key.
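To count these failures consistently across runs, the three shapes above can be checked with a small helper. This is only a sketch: `classify_response` is a hypothetical name, it assumes the payload has the `response` and `suggested_code` keys seen in the log, and treating a triple backtick as "code inline" is my own heuristic.

```python
from typing import Optional


def classify_response(result: dict) -> Optional[str]:
    """Return a label for a degenerate answer, or None if it looks fine.

    Assumes the payload shape from the log output: a 'response' string and
    a 'suggested_code' value that may be None or an empty string.
    """
    text = (result.get("response") or "").strip()
    code = (result.get("suggested_code") or "").strip()

    if not text:
        return "empty"
    # Only dots (ASCII or the single ellipsis character) and spaces.
    if text.strip(".… ") == "":
        return "dots_only"
    # Answer promises an example but delivers no code anywhere.
    # Heuristic: a fenced block in the text counts as inline code.
    if text.endswith(":") and not code and "```" not in text:
        return "cut_off_on_colon"
    return None
```

Running this over the output of repeated test runs would give a failure count per category rather than a single pass/fail.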

I dug into this extensively but don't understand it. There are some known issues with combining adaptive thinking and structured outputs, but we are probably triggering something inadvertently. It seems like there are too many constraints on the output, so the model decides to stop the turn.

I suspected that putting code before text wasn't the most natural ordering. I tried reversing it, and the problem still appears (at least the variant that cuts off on a colon: "...Here's an example that adds multiple members to your list in a single batch:", 'suggested_code': ""), but it might happen less frequently (1 in 10 tests).
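Since 1 in 10 is a small sample, it's hard to say from that alone whether reversing the order really helps. A rough way to put bounds on the failure rate is a Wilson score interval. This is a sketch, not something that exists in the repo; `wilson_interval` is a hypothetical helper and the 95% level (`z = 1.96`) is my assumption.

```python
import math


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate observed over repeated runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(failures=1, runs=10)
print(f"failure rate between {low:.1%} and {high:.1%}")
```

For 1 failure in 10 runs the interval spans roughly 2% to 40%, which still touches the rate seen on `main`, so more runs would be needed before concluding that the reversed ordering is better.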

I added the test in #495 (`global-chat-job-code`), at `services/global_chat/tests/acceptance/bugs/test_repro_dots_response.md` and `services/job_chat/tests/acceptance/bugs/test_repro_dots_response.md`.


```
hanna@Hannas-MacBook-Pro-2 apollo % poetry run pytest services/job_chat/tests/acceptance/tmp/test_repro_dots_response.md -s
=============================================================================================================== test session starts ================================================================================================================
platform darwin -- Python 3.11.15, pytest-8.4.1, pluggy-1.6.0
codspeed: 3.2.0 (disabled, mode: walltime, timer_resolution: 41.7ns)
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/hanna/openfn/apollo
configfile: pyproject.toml
plugins: recording-0.13.4, anyio-4.9.0, syrupy-4.9.1, socket-0.7.0, langsmith-0.4.1, codspeed-3.2.0, benchmark-5.1.0, asyncio-0.26.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item

services/job_chat/tests/acceptance/tmp/test_repro_dots_response.md
→ job-chat.tmp.repro-dots-response
  service: job_chat
  judges:  general, openfn_code_quality
  calling job_chat...
Calling services/job_chat ...

INFO:search_adaptor_docs:Checking/loading adaptor docs for @openfn/language-mailchimp@latest
INFO:load_adaptor_docs:Checking if docs already exist in database
INFO:load_adaptor_docs:✓ Docs already exist for @openfn/language-mailchimp@latest (27 functions)
INFO:search_adaptor_docs:Adaptor docs for @openfn/language-mailchimp@latest already exist (checked in 0.008s)
INFO:job_chat:Making non-streaming API call
INFO:job_chat:model output: stop_reason='end_turn' content_block_types=['thinking', 'text'] content_block_lengths=[479, 40]
INFO:job_chat:Cache write: 533 tokens
INFO:job_chat:Cache read: 2930 tokens
INFO:job_chat:parse_and_apply_edits: text_answer_len=3 code_edits_count=0 actions=[] has_original=True

Done!
{'response': '...', 'suggested_code': None, 'history': [{'role': 'assistant', 'content': "\nHi! I'm here to help you create an OpenFn workflow. Could you describe the process you'd like to automate? For example, tell me what systems are involved, what data needs to move between them, and when or how the workflow should be triggered."}, {'role': 'user', 'content': '\n[pg:workflows/daily-google-sheets-email-sender] i want this to send an email to my group 1234  whre the title and body are just green emojis'}, {'role': 'assistant', 'content': '\nThis job creates a Mailchimp campaign for list 1234 with a green emoji subject and body, then sends it. Make sure to update from_name and reply_to with your actual sender details — these must match a verified sender in your Mailchimp account.'}, {'role': 'user', 'content': '\n[pg:workflows/daily-google-sheets-email-sender/update-mailchimp] what does the ... do'}, {'role': 'assistant', 'content': "\nThe ... is JavaScript's spread operator. In { ...state, campaignId: state.data.id }, it copies all existing properties from state into the new object, then adds/overwrites campaignId. Without it, you'd lose everything else in state (like data, configuration, etc.) 
and only have campaignId."}, {'role': 'user', 'content': '[pg:job_code/update-mailchimp/mailchimp@latest] \nokay can you show me how to use the hardest function for htis adaptor to teach me'}, {'role': 'assistant', 'content': '...'}], 'usage': {'cache_creation_input_tokens': 533, 'cache_read_input_tokens': 2930, 'input_tokens': 733, 'output_tokens': 167}, 'meta': {'rag': {'search_results': [], 'search_results_sections': [], 'search_queries': [], 'config_version': 1.0, 'prompts_version': 1.0, 'usage': {'needs_docs': {'cache_creation': {'ephemeral_1h_input_tokens': 0, 'ephemeral_5m_input_tokens': 0}, 'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'inference_geo': 'global', 'input_tokens': 730, 'output_tokens': 4, 'server_tool_use': None, 'service_tier': 'standard'}, 'generate_queries': {}}}}}
```

Job_chat: Model outputs an empty or partial response #497
