How to achieve frame-level interaction #6

Description

@LeeKeyu

Hi, thanks for your impressive work!
After reading your paper, I have a question about frame-level interaction control. As I understand it, the actions are injected as a sequence of length (1+n) so that (1+n) images are generated together, and the result is then extended autoregressively into a long video.

So during inference, is it possible to provide one action at a time to generate the next content? If not, how do you define frame-level control? Thanks a lot in advance.
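
To make the question concrete, here is a toy sketch of the two inference modes I am contrasting. Everything in it is hypothetical: `toy_generate_chunk` is a stand-in for the model and not the repository's API, frames and actions are plain integers, and the handling of the first frame's action is simplified. It only illustrates the control flow, assuming each chunk is conditioned on the last generated frame.

```python
from typing import Callable, List

Frame = int   # toy stand-in for an image or latent
Action = int  # toy stand-in for a per-frame control signal

def toy_generate_chunk(context: Frame, actions: List[Action]) -> List[Frame]:
    """Hypothetical model stand-in: returns one new frame per action."""
    frames = []
    current = context
    for action in actions:
        current = current + action  # placeholder for real generation
        frames.append(current)
    return frames

def rollout_chunked(
    first: Frame,
    actions: List[Action],
    n: int,
    generate_chunk: Callable[[Frame, List[Action]], List[Frame]] = toy_generate_chunk,
) -> List[Frame]:
    """Chunk-level control: n actions must be known before each chunk is generated."""
    video = [first]
    for start in range(0, len(actions), n):
        chunk_actions = actions[start:start + n]
        # Each chunk is conditioned on the last frame generated so far.
        video.extend(generate_chunk(video[-1], chunk_actions))
    return video

def rollout_stepwise(
    first: Frame,
    actions: List[Action],
    generate_chunk: Callable[[Frame, List[Action]], List[Frame]] = toy_generate_chunk,
) -> List[Frame]:
    """Frame-level control: chunk size 1, so each action is consumed as it arrives."""
    return rollout_chunked(first, actions, 1, generate_chunk)

if __name__ == "__main__":
    print(rollout_chunked(0, [1, 2, 3, 4], n=2))
    print(rollout_stepwise(0, [1, 2, 3, 4]))
```

In the first mode the user has to commit to n actions before seeing any of the resulting frames; in the second, every action can react to the frame just produced. My question is which of these matches the released inference code.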
