ETT-316 issues with glacier expiration job #186

Open
moseshll wants to merge 3 commits into main from ETT-316_glacier_parallelization


moseshll (Contributor) commented Jun 16, 2026

  • Add expire_versions.pl, which is invoked as a worker process
  • Add BackupExpirationBatch.pm as the core logic inside expire_versions.pl
    • No dedicated tests for this class; the existing end-to-end tests cover it indirectly
  • BackupExpiration
    • fork/exec worker processes to handle deletion in parallel
    • Set default number of BackupExpiration workers to 8
    • Log some details on worker spawn and despawn
  • Tests
    • Existing test suite left as intact as possible
    • Add test for unknown storage throwing exception
    • Add test with job size 1 so multiple workers are involved
    • Fix brittle tests with potential collisions from old_random_timestamp and new_random_timestamp
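
For reviewers less familiar with the pattern, here is a minimal sketch of dispatching batches to a capped pool of fork/exec'd workers. It is not the code in BackupExpiration.pm: the `run_pool` helper, its arguments, and the use of `$^X -e` as a stand-in worker command are all assumptions made for illustration. Only the default cap of 8 comes from this PR.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper: start one child per batch, never more than
# $max_workers at a time. Returns an array ref of [batch, exit_code] pairs.
sub run_pool {
    my ($max_workers, $make_cmd, @batches) = @_;
    my (%running, @results);

    my $reap_one = sub {
        my $pid = wait();                        # block until a child exits
        if ($pid == -1) { %running = (); return; }   # no children left
        return unless exists $running{$pid};     # not one of our workers
        push @results, [ delete $running{$pid}, $? >> 8 ];
    };

    for my $batch (@batches) {
        # Pool is full: wait for a slot before spawning the next worker.
        $reap_one->() while keys(%running) >= $max_workers;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the worker command.
            my @cmd = $make_cmd->($batch);
            exec { $cmd[0] } @cmd or warn "exec failed: $!\n";
            _exit(127);                          # only reached if exec failed
        }
        $running{$pid} = $batch;
    }

    $reap_one->() while %running;                # drain the remaining workers
    return \@results;
}

# Stand-in worker: a child perl that exits with the batch number as its code.
my $results = run_pool(8, sub { ($^X, '-e', "exit($_[0])") }, 0, 0, 3);
printf "batch %s exited with %d\n", @$_ for @$results;
```

Because each worker is exec'd rather than just forked, it starts with a fresh interpreter and sees none of the parent's in-memory state, which is why configuration has to be handed over explicitly.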

Comment thread: t/backup_expiration.t
- my $exp = HTFeed::BackupExpiration->new(storage_name => $vars{storage_name}, dry_run => 0);
+ my $exp = HTFeed::BackupExpiration->new(
+   storage_name => $vars{storage_name},
+   storage_config => $vars{storage_config},

Contributor Author

The only change here is to get the storage config into the class under test, so it can pass it along (via YAML) to the worker processes that actually need it. We can't rely on patching the global config as we used to, back when the work was done in (or close to) the current process.

Contributor Author

The mere presence of storage_config passed to new is what prompts BackupExpiration to write the custom config to a file. I think it would be possible to just pass a flag that says in effect "this is custom config that won't be discoverable by a spawned process -- please write your config to a YAML file." But this is the mechanism that emerged, and it seems to work.
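
To make the hand-off concrete, a rough sketch of the idea follows. It is not the PR's implementation: `write_worker_config`, `read_worker_config`, the config keys, and the choice of YAML::XS and a temp file are all assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp ();
use YAML::XS qw(DumpFile LoadFile);   # assumption: any YAML module would do

# Hypothetical parent-side helper: persist an in-memory config so that a
# separately exec'd worker can read it. Returns the File::Temp object;
# the file is removed when that object goes out of scope.
sub write_worker_config {
    my ($storage_config) = @_;
    my $tmp = File::Temp->new(SUFFIX => '.yml');
    DumpFile($tmp->filename, $storage_config);
    return $tmp;
}

# Hypothetical worker-side helper: load the config from the path it was given.
sub read_worker_config {
    my ($path) = @_;
    return LoadFile($path);
}

# The keys below are made up; the real storage config will look different.
my $tmp    = write_worker_config({ bucket => 'example-bucket', dry_run => 0 });
my $config = read_worker_config($tmp->filename);
print "worker sees bucket: $config->{bucket}\n";
```

In this sketch the parent would pass `$tmp->filename` to the worker on its command line and keep `$tmp` alive until every worker has exited.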

moseshll added 2 commits June 18, 2026 09:20
… be already there, maybe as a subdependency..
- `BackupExpiration`
  - Set default number of `BackupExpiration` workers to 8
  - Allow spawning worker while iterating versions (inner loop) instead of waiting until the end
    - Allows testing with many versions of a single object
  - Log some details on worker spawn and despawn
- Tests
  - Add test for unknown storage throwing exception
  - Add test with job size 1 so multiple workers are involved
  - Fix brittle tests with potential collisions from `old_random_timestamp` and `new_random_timestamp`
    - Now choose one and increment or decrement when looping
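
A small sketch of the timestamp approach described above, assuming second-resolution timestamps: pick one random base and step away from it on each loop iteration, so no two values can collide. The `distinct_timestamps` helper and the `%Y%m%d%H%M%S` format are hypothetical and are not taken from the test suite.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: return $count distinct timestamps, starting at
# $base_epoch and moving $step seconds per item (negative to go backwards).
sub distinct_timestamps {
    my ($base_epoch, $count, $step) = @_;
    return map {
        strftime('%Y%m%d%H%M%S', gmtime($base_epoch + $_ * $step))
    } 0 .. $count - 1;
}

# One random "old" base, 200 to 299 days in the past, then decrement per version.
my $base = time() - 86400 * (200 + int(rand(100)));
my @old  = distinct_timestamps($base, 5, -1);
print "$_\n" for @old;
```

Drawing each timestamp independently at random can occasionally produce the same value twice; stepping from a single base cannot.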
@moseshll moseshll marked this pull request as ready for review June 18, 2026 13:42
@moseshll moseshll requested a review from aelkiss June 18, 2026 13:42