Skip to content

epl单机单卡和单机多卡训练step如何理解 #30

@SueeH

Description

@SueeH

单机单卡:
启动命令:TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0 bash ./scripts/train_dp.sh
image

单机双卡:
启动命令:TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/train_dp.sh
1693045873752

代码修改了一下:去掉了last_step限制,数据集repeat=10,将txt改为py,可执行。
resnet_dp.txt

想请教下,这个如何理解呢?每个卡分别跑了10step?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions