Difference during inference with or without KV cache #2165

### Bug description

I modified generate/base.py so that one inference run uses the KV cache and the other does not.

In both runs I set temperature=0, top_k=None, and top_p=0,
I use the same seed,
and I use the same model (qwen2.5-0.5b-instruct).

The only change I made is in the function **generate_fn**:

    if prefill_token:
        # Feed only the current token(s), reshaped to (1, T).
        tmp_x = token.view(1, -1)
    else:
        # Re-feed every token so far as a single (1, T) sequence.
        tmp_x = torch.cat(all_tokens, dim=0).view(1, -1)
    token = next_token(
        model,
        input_pos=None,
        x=tmp_x,
        input_pos_maxp1=None,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        prefill_token=prefill_token,
        count=count,
    )

However, I found that the logits differ between the two runs.

I'm not sure whether this difference is expected.
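
To make "differ" more concrete, here is a minimal sketch of one way the two sets of logits could be compared at a single position. The helper name `compare_logits` and the tolerance `atol` are hypothetical and not part of LitGPT, and the numbers are made up for illustration rather than measured from the model. It assumes the logits for one position have been converted to plain Python lists, for example with `.tolist()`.

```python
def compare_logits(logits_a, logits_b, atol=1e-3):
    """Compare two 1-D sequences of logits for the same position.

    Returns (max_abs_diff, same_argmax, within_tol).
    `atol` is an arbitrary illustrative tolerance, not a recommended value.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("logit vectors must have the same length")
    # Largest element-wise absolute difference between the two vectors.
    max_abs_diff = max(abs(a - b) for a, b in zip(logits_a, logits_b))
    # Index of the largest logit, i.e. the token greedy decoding would pick.
    argmax_a = max(range(len(logits_a)), key=logits_a.__getitem__)
    argmax_b = max(range(len(logits_b)), key=logits_b.__getitem__)
    return max_abs_diff, argmax_a == argmax_b, max_abs_diff <= atol


# Made-up values for illustration only, not measured from the model.
with_cache = [1.2500, -0.3000, 4.1000]
without_cache = [1.2504, -0.3001, 4.0998]

diff, same_token, close = compare_logits(with_cache, without_cache)
print(f"max abs diff={diff:.6f}, same greedy token={same_token}, within atol={close}")
```

Reporting both the maximum absolute difference and whether the greedy token matches separates small numerical differences from differences that change the generated text.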


### Reproduced in studio

_No response_

### What operating system are you using?

Unknown

### LitGPT Version

_No response_
