Question
I built a chatbot server using vLLM's AsyncLLMEngine and an MCP Streamable HTTP server via its Python SDK.
To stream token generation from the AsyncLLMEngine generator to the MCP client, I call context.report_progress as shown below.
...
foundation_model = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        ...
    )
)
...
@server.tool()
async def chat(q: str, ctx: Context):
    messages = get_messages(q)
    generator = foundation_model.generate(messages, ...)
    answer = ""  # cumulative text already reported
    i = 0        # progress counter
    async for request_output in generator:
        current_text = request_output.outputs[0].text
        # vLLM yields the full text so far; send only the new suffix
        next_text = current_text[len(answer):]
        if ctx is not None and isinstance(ctx, Context):
            await ctx.report_progress(i, None, next_text)
        answer = current_text
        i += 1
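For clarity, the delta computation inside the loop can be shown in isolation: each step vLLM yields the full cumulative text so far, and the tool forwards only the newly generated suffix. A minimal, self-contained sketch of that slicing logic (the helper name `deltas` is illustrative, not part of either SDK):

```python
def deltas(snapshots):
    """Given cumulative text snapshots (as RequestOutput.outputs[0].text
    grows across iterations), yield only the newly generated suffix."""
    sent = ""
    for text in snapshots:
        yield text[len(sent):]  # the part not yet reported
        sent = text

# Example: cumulative snapshots -> per-step deltas
print(list(deltas(["He", "Hello", "Hello world"])))
# → ['He', 'llo', ' world']
```

Each delta is what gets passed as the message argument of ctx.report_progress, so the client can reassemble the stream by simple concatenation.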
Is this usage appropriate for streaming the token generation process?
Is there another way?
Additional Context
No response