Both `evaluation/README.md` and `REPRODUCE_BAGEL_RESULTS.md` document an AC (Appearance Consistency) scoring pipeline that depends on a `mllm_eval/` directory:

> - Run MLLM-based AC scoring (optional but recommended)
>
>   Launch the evaluation helper in `mllm_eval/` to produce AC results with a serving LLM:
>
>       cd mllm_eval
>       bash eval_infer.sh <model_path> default <port>

However, the `mllm_eval/` directory (and its scripts `build_lmdb.py`, `mllm_eval.py`, `eval_infer.sh`) is not present in the released repository.
The only AC-related code available is `eval/aggregate.py`, which can read a pre-existing AC predictions JSON, but cannot produce one.
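
To confirm this against a local checkout, a minimal sketch like the following lists which of the documented paths are absent. The helper name `missing_paths` and the `DOCUMENTED_PATHS` list are hypothetical and not part of the repository. Placing the three scripts directly under `mllm_eval/` is an assumption based on the description above.

```python
from pathlib import Path

# Paths named in this report. That the three scripts sit directly under
# mllm_eval/ is an assumption based on the description above.
DOCUMENTED_PATHS = [
    "mllm_eval/build_lmdb.py",
    "mllm_eval/mllm_eval.py",
    "mllm_eval/eval_infer.sh",
    "eval/aggregate.py",
]


def missing_paths(repo_root, expected=DOCUMENTED_PATHS):
    """Return the entries of `expected` that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    import sys

    # Use the first CLI argument as the checkout root, else the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in missing_paths(root):
        print(f"missing: {path}")
```

Run from the repository root, this should print one `missing:` line per absent file, which makes the gap easy to reproduce.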