How to achieve frame-level interaction

Hi, thanks for your impressive work!
After reading your paper, I have a question about frame-level interaction control. As I understand it, the actions are injected as a sequence of length (1+n) to generate (1+n) images together, and this is then extended autoregressively into a long video.

So during inference, is it possible to provide one action at a time to generate the next frame? If not, how do you define frame-level control? Thanks a lot in advance.
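To make the question concrete, here is a toy sketch of the two inference patterns I have in mind. It is not taken from the paper or the repository: `toy_generate_chunk` is a hypothetical stand-in for the model, frames are plain integers, and the assumption that each call consumes one chunk of actions and returns one frame per action is only my reading of the (1+n) description.

```python
from typing import Callable, List, Tuple

Frame = int   # placeholder for an image; a real model would return tensors
Action = str  # placeholder for a control signal such as a key press


def toy_generate_chunk(context: List[Frame], actions: List[Action]) -> List[Frame]:
    """Hypothetical stand-in for the model: one new frame per action,
    conditioned on the frames generated so far."""
    last = context[-1] if context else -1
    return [last + i + 1 for i in range(len(actions))]


def rollout(
    actions: List[Action],
    chunk_size: int,
    generate: Callable[[List[Frame], List[Action]], List[Frame]] = toy_generate_chunk,
) -> Tuple[List[Frame], int]:
    """Autoregressive rollout that feeds `chunk_size` actions per model call.

    chunk_size = 1 + n corresponds to generating (1+n) frames together;
    chunk_size = 1 corresponds to one action in, one frame out.
    Returns the generated frames and the number of model calls.
    """
    frames: List[Frame] = []
    calls = 0
    for start in range(0, len(actions), chunk_size):
        chunk = actions[start:start + chunk_size]
        frames.extend(generate(frames, chunk))  # condition on all earlier frames
        calls += 1
    return frames, calls


actions = ["left", "left", "up", "up", "right", "right", "down", "down"]
n = 3

chunk_frames, chunk_calls = rollout(actions, chunk_size=1 + n)  # chunk-level
step_frames, step_calls = rollout(actions, chunk_size=1)        # frame-level
```

In the chunked case the model needs all (1+n) actions before it can produce any frame of that chunk, so a user action would only take effect at chunk boundaries. In the second case each action is consumed as soon as it arrives. My question is which of these two patterns matches what the paper calls frame-level control.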
