This repository contains tools and scripts for managing and publishing proceedings for the Proceedings of Machine Learning Research (PMLR).
I've archived an old version of this code at https://github.com/mlresearch/old_papersite. On 2025-05-26 that repo was cloned to start this one and restructure it with the aim of creating a better automated pipeline.
The repository is structured as follows:
- lib/: Contains Ruby scripts for managing the Jekyll site and processing BibTeX files.
- bin/: Contains shell scripts for various tasks related to the repository.
- backlog/: Task management system for tracking improvements and features.
- cip/: Code Improvement Plans for documenting architectural changes.
The standard workflow for publishing a PMLR volume:
- Pre-publication check: Use `check_volume.sh` to validate the volume directory before touching anything
- BibTeX cleaning: Use `tidy_bibtex.rb` to fix common formatting issues
- Volume creation: Use `create_volume.rb` to generate Jekyll posts and organise assets
- Deployment: Use `deploy_volume.sh` for the two-branch separation strategy
Run this first to catch common submission errors before processing:

    cd ~/mlresearch/v304
    ../papersite/bin/check_volume.sh 304

The checker validates:
| Check | What it catches |
|---|---|
| `@Proceedings` entry | Missing required fields (`published`, `name`, `volume`, …); `volume` not in braces; date not in `YYYY-MM-DD` format; `published = {}` (empty date, a common editor omission) |
| PDF locations | PDFs in subdirectories (e.g. `pdfs/`) instead of the repository root; permission/consent PDF directories (e.g. `v304permissions/`) are ignored |
| Supplementary locations | Supp files in subdirectories (e.g. `supplementary_material/`) instead of root |
| BibTeX key / PDF match | Keys without a matching PDF; orphaned PDFs with no BibTeX entry; hyphenated keys (e.g. `hernandez-garcia25`) are fully supported |
| Author name formatting | Missing comma separator (`YanjunXu` → `Xu, Yanjun`); lowercase surname; reversed `Given, Surname` order |
| Double backslashes | `\\textit`, `\\Delta`, etc. that should use a single backslash |
| Escaped characters | `\$`, `\{`, `\}`, `\_` in abstracts/titles that should be unescaped |
| Non-ASCII BibTeX keys | Keys like `miñoza26` that will fail during processing |
The script exits 0 if all checks pass, 1 if any errors are found.
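As an illustration of the "BibTeX key / PDF match" row, the comparison can be sketched in a few lines of Ruby. This is not the checker's actual code: `paper_keys` and `compare_keys_and_pdfs` are hypothetical helper names, the key extraction is a simple regular expression rather than a real BibTeX parser, and supplementary files are ignored.

```ruby
# Hypothetical helpers for illustration; the real checker may work differently.

# Pull entry keys out of raw BibTeX text, skipping the @Proceedings entry.
def paper_keys(bibtex)
  bibtex.scan(/^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,/)
        .reject { |type, _key| type.casecmp?("proceedings") }
        .map { |_type, key| key }
end

# Compare BibTeX keys with the basenames of PDFs found in the volume root.
def compare_keys_and_pdfs(keys, pdf_files)
  stems = pdf_files.map { |f| File.basename(f, ".pdf") }
  { missing_pdfs: keys - stems, orphaned_pdfs: stems - keys }
end

bib = <<~BIB
  @Proceedings{demo25,
    volume = {304}
  }
  @InProceedings{hernandez-garcia25,
    title = {A}
  }
  @InProceedings{smith25,
    title = {B}
  }
BIB

result = compare_keys_and_pdfs(paper_keys(bib), ["hernandez-garcia25.pdf", "stray25.pdf"])
puts result.inspect

# Mirror the 0/1 exit convention: any finding counts as an error.
exit_status = result.values.flatten.empty? ? 0 : 1
puts "exit status would be #{exit_status}"
```

Here `smith25` has no PDF and `stray25.pdf` has no entry, so both are reported, while the hyphenated key passes untouched.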
Once the check passes, clean the BibTeX file, create the volume, then deploy:

    # BibTeX cleaning
    ruby lib/tidy_bibtex.rb proceedings.bib proceedings.bib --fix-percent

    # Volume creation
    ruby lib/create_volume.rb -v 304 -b proceedings.bib

    # If PDFs are in a separate branch, skip PDF existence checks
    ruby lib/create_volume.rb -v 304 -b proceedings.bib --skip-pdf-check

    # Deployment, non-interactive (for scripted/automated use):
    cd ~/mlresearch/v304
    SKIP_CONFIRM=1 bash ../papersite/bin/deploy_volume.sh 304

    # Deployment, interactive (prompts "Continue? (yes/no)"; the default when run manually):
    cd ~/mlresearch/v304
    bash ../papersite/bin/deploy_volume.sh 304

deploy_volume.sh must be run from the volume directory, not from papersite/. It requires _posts/, _config.yml, and assets/ to exist (generated by create_volume.rb) and an origin remote pointing to github.com:mlresearch/vNNN.
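Before deploying, it can help to confirm that the three generated paths exist. The following is a minimal sketch of such a pre-flight check; `required_deploy_paths` and `missing_deploy_paths` are hypothetical names, not part of the repository, and the check for the origin remote is left out.

```ruby
require "tmpdir"
require "fileutils"

# Paths the deployment step expects in the volume directory, per the text above.
def required_deploy_paths
  ["_posts", "_config.yml", "assets"]
end

# Returns the required paths that are absent from volume_dir.
def missing_deploy_paths(volume_dir)
  required_deploy_paths.reject { |p| File.exist?(File.join(volume_dir, p)) }
end

# Demonstration in a throwaway directory standing in for ~/mlresearch/v304.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "_posts"))
  FileUtils.touch(File.join(dir, "_config.yml"))
  missing = missing_deploy_paths(dir)
  puts "Missing: #{missing.join(', ')}" unless missing.empty?
end
```

In the demonstration only `assets` is absent, so that is the one path reported.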
This creates:
- main branch: Assets (PDFs, supplementary files) and `README.md`
- gh-pages branch: Jekyll site files served by GitHub Pages
Both branches are pushed to GitHub. The published site will be available at https://mlresearch.github.io/vNNN/ once GitHub Pages rebuilds (usually within a minute).
Issues that have occurred in real submissions and are worth fixing before running the pipeline:
Empty or missing publication date
`published = {}` (empty braces) is a common omission. The field must be `published = {YYYY-MM-DD}`.
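A minimal sketch of this check, assuming the field is written on one line as `published = {...}`. The helper names `published_value` and `valid_published?` are illustrative only; the repository's checker may implement this differently.

```ruby
require "date"

# True only for a real calendar date written exactly as YYYY-MM-DD.
def valid_published?(value)
  m = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(value.to_s.strip)
  return false unless m

  Date.valid_date?(m[1].to_i, m[2].to_i, m[3].to_i)
end

# Extract the contents of the published field from raw entry text (nil if absent).
def published_value(entry_text)
  entry_text[/^\s*published\s*=\s*\{([^}]*)\}/i, 1]
end

entry = "@Proceedings{demo25,\n  published = {},\n  volume = {304}\n}"
value = published_value(entry)
puts "published is empty or malformed" unless valid_published?(value)
```

Both the empty-braces case and a missing field fail the check, because a `nil` value is treated as an empty string.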
LaTeX formatting commands in abstracts
Commands such as {\color{blue}{...}} or \textcolor in abstracts cause BibTeX parse failures due to unbalanced braces. Strip all LaTeX colour/formatting commands from abstracts before submission — they are not rendered in the proceedings HTML anyway.
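To find affected abstracts before submission, a rough scan along the following lines may help. It is only a sketch: `colour_commands` and `brace_depth` are hypothetical helpers, only `\color` and `\textcolor` are detected, and the brace count ignores escaped `\{` and `\}` without handling every LaTeX edge case.

```ruby
# Names of colour commands found in the text, e.g. ["color", "textcolor"].
def colour_commands(text)
  text.scan(/\\(textcolor|color)\b/).flatten.uniq
end

# Net brace depth after ignoring escaped \{ and \}.
# Returns nil if a closing brace appears before its opener.
def brace_depth(text)
  depth = 0
  text.gsub(/\\[{}]/, "").each_char do |ch|
    depth += 1 if ch == "{"
    depth -= 1 if ch == "}"
    return nil if depth.negative?
  end
  depth
end

abstract = 'We propose {\color{blue}{a new method} for sparse recovery.'
found = colour_commands(abstract)
puts "colour commands: #{found.join(', ')}" unless found.empty?
puts "unbalanced braces" unless brace_depth(abstract) == 0
```

A depth of 0 means the braces balance; any other result marks an abstract worth inspecting by hand.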
Unicode characters in abstracts and titles
Characters such as É, –, ‘ (left single quote, U+2018) in abstracts or titles are replaced by their LaTeX equivalents during processing via unicode_replacements.yml. If create_volume.rb errors with "No substitution found for Unicode character X", add an entry to lib/unicode_replacements.yml:

    X:
      replacement: "\\'{E}" # LaTeX equivalent
      name: LATIN CAPITAL LETTER E WITH ACUTE

Hyphenated BibTeX keys
Keys such as hernandez-garcia25 are valid and fully supported by the checker and pipeline. They do not need to be renamed.
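The distinction between acceptable and problematic keys can be sketched as follows. `key_problem` is a hypothetical helper, and the exact rules applied by the checker are not documented here, so treat the conditions as assumptions.

```ruby
# Returns a short description of the problem with a BibTeX key, or nil if it looks fine.
def key_problem(key)
  return "empty key" if key.empty?
  return "non-ASCII characters" unless key.ascii_only?
  return "whitespace or comma" if key.match?(/[\s,]/)

  nil # hyphens and other plain ASCII characters are accepted
end

%w[hernandez-garcia25 miñoza26].each do |key|
  puts "#{key}: #{key_problem(key) || 'ok'}"
end
```

The hyphenated key passes, while the key containing `ñ` is flagged and would need renaming before processing.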
Permission/consent PDF directories
Directories named `*permissions*` or `*permission*` (e.g. v330permissions/) are excluded from the PDF-location check. They are also removed during deployment and do not appear in either published branch.
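The exclusion can be illustrated with Ruby's `File.fnmatch`. This is a sketch under assumptions: `in_permissions_dir?` and `misplaced_pdfs` are hypothetical names, paths are taken relative to the volume root, and the single pattern `*permission*` is used because it also matches `*permissions*`.

```ruby
# True if any directory component of path looks like a permission/consent directory.
def in_permissions_dir?(path)
  File.dirname(path).split("/").any? do |part|
    File.fnmatch("*permission*", part)
  end
end

# PDFs that sit in a subdirectory rather than the volume root,
# ignoring anything inside a permission directory.
def misplaced_pdfs(paths)
  paths.select do |p|
    File.extname(p).casecmp?(".pdf") &&
      File.dirname(p) != "." &&
      !in_permissions_dir?(p)
  end
end

paths = ["smith25.pdf", "pdfs/jones25.pdf", "v330permissions/consent.pdf"]
puts misplaced_pdfs(paths).inspect
```

Only `pdfs/jones25.pdf` is reported: the root-level PDF is where it should be, and the consent form is skipped.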
The repository has two test suites covering different components.
Ruby Test::Unit tests for tidy_bibtex.rb, covering auto-detection, issue detection and fixing, command-line options, and edge cases:

    ruby test/run_tests.rb            # run all BibTeX cleaner tests
    ruby test/test_bibtex_cleaner.rb  # run a single file

See test/README.md for full details and conventions.
Bash regression tests for check_volume.rb, using real bib files extracted from git history as fixtures for known-bad submissions, plus synthetic fixtures for file-location checks:

    cd ~/mlresearch/papersite
    bash tests/test_check_volume.sh            # run all regression tests
    bash tests/test_check_volume.sh --verbose  # show every individual assertion

Fixtures (tests/fixtures/):
| Fixture | Source | Tests |
|---|---|---|
| `v304_original/` | `git show 59fd75f:proceedings.bib` | Author errors, double backslashes, escaped chars, volume not in braces |
| `v328_original/` | `git show 0545422:CPAL26.bib` | Non-ASCII BibTeX keys |
| `pdfs_in_subdir/` | Synthetic | PDFs in `pdfs/` subdirectory |
| `supps_in_subdir/` | Synthetic | Supplementary files in `supplementary_material/` |
| `clean_volume/` | Synthetic | All checks pass; zero-exit regression |
See tests/README.md for fixture details and guidance on adding new tests.
The Ruby code is used for creating Jekyll sites for hosting PMLR on GitHub Pages. The main code is found in lib/mlresearch.rb.
The Ruby scripts depend on the following packages:
- ActiveRecord
- bibtex-ruby
- facets
- pandoc-ruby
You can install these packages using:

    gem install bibtex-ruby facets pandoc-ruby activerecord

Alternatively, you can use the provided Gemfile with:

    bundle install

For detailed usage instructions, refer to the lib/README.md file.
To suggest fixes or improvements, please open a pull request containing the proposed changes and a justification for them.
For details on how to publish in PMLR, please check the PMLR FAQ.
For details on what is required to submit a proceedings, please check the PMLR Specification.