
Add DeepSeek-V4-Pro disaggregated P/D inference recipe (scripts/DeepSeekV4)#173

Open
raviguptaamd wants to merge 1 commit into
ROCm:develop from
raviguptaamd:add-deepseekv4-recipe

@raviguptaamd

Contributor

Summary

Adds a new blueprint/recipe for DeepSeek-V4-Pro disaggregated prefill/decode (P/D) inference on AMD Instinct MI355X, under scripts/DeepSeekV4/, and links it from the top-level README Blueprints table.

The recipe covers two backend stacks, each pairing a serving engine with a KV-cache transport:

  • ATOM + Mooncake (primary; KV-cache transfer over RDMA)
  • SGLang + MoRI-IO (experimental)

What's included

  • scripts/DeepSeekV4/README.md — usage and overview
  • cluster.yaml / model.yaml — cluster + model configuration
  • lib/ — launcher and helpers (run_disagg.sh, cfg.py, topo.py, check_accuracy.py, clean_node.sh, prompts.json, lib_inferencex.sh)
  • atom_disagg/ — ATOM+Mooncake env/server/launch/bench scripts
  • utils/bench_serving/ — serving benchmark client (throughput/latency)
  • docs/proxy_and_disagg.md — proxy + disaggregation notes
  • run_atom_disagg.sh / run_sglang_disagg.sh — entrypoints

Test plan

  • Reviewer: confirm README Blueprints row renders/links correctly
  • Smoke-test 1P/1D and 2P/1D launches on MI355X nodes
  • Verify RDMA KV transfer over Pollara NICs
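
For reviewers unfamiliar with the xP/yD shorthand used in the test plan, the sketch below shows one way a node list could be split into prefill and decode roles. It is illustrative only and is not the contents of topo.py: the function name `assign_roles`, the spec string format, and the assumption that each P or D unit maps to one whole node are all hypothetical.

```python
import re
from typing import Dict, List


def assign_roles(nodes: List[str], spec: str) -> Dict[str, List[str]]:
    """Split `nodes` into prefill and decode groups for a spec such as '2P/1D'.

    Hypothetical helper: assumes one node per prefill or decode unit and
    assigns prefill roles to the first nodes in the list.
    """
    match = re.fullmatch(r"(\d+)P/(\d+)D", spec.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized topology spec: {spec!r}")

    n_prefill, n_decode = int(match.group(1)), int(match.group(2))
    if n_prefill < 1 or n_decode < 1:
        raise ValueError("need at least one prefill and one decode node")

    needed = n_prefill + n_decode
    if len(nodes) < needed:
        raise ValueError(f"{spec} needs {needed} nodes, got {len(nodes)}")

    # Any nodes beyond `needed` are left unassigned in this sketch.
    return {
        "prefill": nodes[:n_prefill],
        "decode": nodes[n_prefill:needed],
    }


if __name__ == "__main__":
    # Placeholder hostnames, not real cluster nodes.
    print(assign_roles(["node0", "node1", "node2"], "2P/1D"))
```

Under these assumptions, the 1P/1D smoke test would need two nodes and the 2P/1D smoke test three.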

Made with Cursor

…eekV4)

Multi-node prefill/decode disaggregated serving/benchmark harness for
DeepSeek-V4-Pro (ATOM+mooncake primary; SGLang+MoRI experimental), ported
from InferenceX. Adds a Blueprints row to the top-level README.