fix: Optimize cleanup jobs, eliminate raw ES client usage, add test coverage #2267
Open
niemyjski wants to merge 6 commits into
niemyjski
commented
May 30, 2026
154223e to 7788c75
Pull request overview
This PR refactors cleanup jobs to rely on repository abstractions, improves duplicate stack cleanup behavior, and adds integration coverage for cleanup/repository operations.
Changes:
- Reworked orphaned-data and cleanup jobs with lock renewal, cancellation checks, and repository-based deletes/updates.
- Added event repository helpers for bulk deletion, stack reassignment, and distinct-id aggregation.
- Added integration tests for cleanup pagination, retention, duplicate signatures, and event repository operations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Exceptionless.Core/Jobs/CleanupDataJob.cs | Extends lock duration and renews locks during paged cleanup/retention loops. |
| src/Exceptionless.Core/Jobs/CleanupOrphanedDataJob.cs | Replaces direct Elasticsearch calls with repository methods for orphan cleanup and duplicate-stack fixing. |
| src/Exceptionless.Core/Repositories/EventRepository.cs | Adds bulk delete, stack reassignment, and composite aggregation helpers. |
| src/Exceptionless.Core/Repositories/Interfaces/IEventRepository.cs | Exposes new event repository cleanup/query APIs and composite cursor type. |
| src/Exceptionless.Core/Repositories/StackRepository.cs | Adds duplicate signature aggregation lookup. |
| src/Exceptionless.Core/Repositories/Interfaces/IStackRepository.cs | Exposes duplicate signature lookup API. |
| tests/Exceptionless.Tests/Jobs/CleanupDataJobTests.cs | Adds cleanup pagination and retention integration coverage. |
| tests/Exceptionless.Tests/Jobs/CleanupOrphanedDataJobTests.cs | Adds integration coverage for orphan cleanup and duplicate stack merging. |
| tests/Exceptionless.Tests/Repositories/EventRepositoryTests.cs | Adds coverage for distinct ids, stack reassignment, and bulk delete helpers. |
| tests/Exceptionless.Tests/Repositories/StackRepositoryTests.cs | Adds duplicate signature repository coverage. |
…overage

- Refactor CleanupOrphanedDataJob to use repository methods exclusively (eliminates all direct IElasticClient usage)
- Fix critical bug: @min_count:2 → @min:2 in GetDuplicateSignaturesAsync (invalid syntax was silently ignored, returning ALL signatures as duplicates)
- Add lock renewal at page boundaries in CleanupDataJob
- Add OperationCanceledException filter before generic catch
- Convert CompositeKeyResult class to record
- Use pattern matching (is null/is not null, is []) throughout
- Remove redundant is_deleted:false filter (repository handles soft deletes)
- Revert page size to 5 (2.5s sleep/item makes large pages impractical)
- Consolidate duplicate SaveAsync calls using spread syntax
- Add XML documentation for composite aggregation and script safety
- Add 8 new integration tests for EventRepository and StackRepository
- Rename all tests to Method_Given_Expected convention

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
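The @min_count:2 → @min:2 fix in the commit above comes down to whether a minimum-count threshold is actually applied to the signature aggregation. A minimal sketch in plain Python (not the project's C#, and not Foundatio's parser) of what happens when the threshold is silently dropped; `duplicate_signatures` and its `min_count` parameter are hypothetical names used only for illustration:

```python
from collections import Counter
from typing import Iterable, List, Optional


def duplicate_signatures(hashes: Iterable[str], min_count: Optional[int] = 2) -> List[str]:
    """Return signature hashes that occur at least `min_count` times.

    Passing min_count=None models an aggregation whose minimum-count
    option was ignored: every hash comes back, duplicate or not.
    """
    counts = Counter(hashes)
    threshold = 1 if min_count is None else min_count
    return sorted(h for h, n in counts.items() if n >= threshold)


stacks = ["a1", "b2", "a1", "c3", "b2", "d4"]
print(duplicate_signatures(stacks))        # only hashes shared by two or more stacks
print(duplicate_signatures(stacks, None))  # threshold ignored: every hash is returned
```

With the threshold ignored, a dedup job driven by this result would treat unique stacks as duplicates, which is the failure mode the commit message describes.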
7788c75 to 3c5880b
…cleanup

GetByIdsAsync with Include(s => s.Id) only fetched the id field, causing is_deleted to default to false for all entities. ShouldReturnDocument in Foundatio checked this field to filter soft-deletes, but always saw false and treated all entities (including soft-deleted ones) as existing. As a result, events pointing to soft-deleted stacks/projects/organizations were never cleaned up by CleanupOrphanedDataJob.

Fix by including IsDeleted in the projection for all three GetByIdsAsync calls so the soft-delete filter correctly excludes deleted entities and their orphaned events get cleaned up.

Add integration tests proving that events pointing to soft-deleted stacks, projects, and organizations are properly cleaned up.

Add a comment to RenewLockAsync in CleanupDataJob explaining that it is called at each page boundary to prevent the distributed lock from expiring during long-running bulk cleanup operations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
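The projection bug described in the commit above can be reproduced in miniature. The following is a sketch in plain Python, not the Foundatio or Exceptionless code: `INDEX`, `Stack`, and `get_existing_ids` are hypothetical stand-ins, and the only assumption carried over from the commit message is that an unfetched is_deleted field falls back to false.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical in-memory index: stack id -> stored document.
INDEX: Dict[str, dict] = {
    "s1": {"id": "s1", "is_deleted": False},
    "s2": {"id": "s2", "is_deleted": True},  # soft-deleted
}


@dataclass
class Stack:
    id: str
    is_deleted: bool = False  # default used when the field is not fetched


def get_existing_ids(ids: Iterable[str], include: Iterable[str]) -> List[str]:
    """Fetch only the `include` fields, then drop soft-deleted entities."""
    fields = set(include)
    found = []
    for doc_id in ids:
        doc = INDEX.get(doc_id)
        if doc is None:
            continue
        projected = {k: v for k, v in doc.items() if k in fields}
        stack = Stack(**projected)
        if not stack.is_deleted:  # the soft-delete filter
            found.append(stack.id)
    return found


# Projection without is_deleted: the soft-deleted stack looks alive.
print(get_existing_ids(["s1", "s2", "s3"], include=["id"]))
# Projection with is_deleted: the filter can do its job.
print(get_existing_ids(["s1", "s2", "s3"], include=["id", "is_deleted"]))
```

In the first call the soft-deleted stack is reported as existing, so events that reference it would never be classified as orphaned.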
…erage

Four tests were passing for the wrong reason: orphaned events in the project and org cleanup tests had random stack IDs, so DeleteOrphanedEventsByStackAsync ran first and deleted them; project/org cleanup was never exercised. Fixed by using the valid stack ID, ensuring only the intended cleanup phase handles those events.

Also added assertions to FixDuplicateStacks merge test verifying that all 110 events land on the target stack (not just total count = 110).

New tests added:
- RemoveAllByStackIds_WithMatchingEvents_RemovesAll
- ReassignStack_WithEmptySourceIds_ReturnsZeroWithoutModification (guards against the catastrophic empty-.Stack()-filter-patches-all-events bug)
- GetDistinctProjectIds_WithMultipleProjects_ReturnsAllUniqueIds
- GetDistinctOrganizationIds_WithMultipleOrganizations_ReturnsAllUniqueIds

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
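The empty-filter hazard that ReassignStack_WithEmptySourceIds_ReturnsZeroWithoutModification guards against can be sketched as follows. This is illustrative Python, not the repository's implementation: `patch_all` is a hypothetical stand-in for a bulk patch whose empty filter matches everything, and `reassign_stack` shows the materialize-then-return-early guard.

```python
from typing import Dict, Iterable, List


def patch_all(events: List[Dict[str, str]], stack_filter: List[str], target: str) -> int:
    """Stand-in for a bulk patch: an empty filter matches every event."""
    modified = 0
    for event in events:
        if not stack_filter or event["stack_id"] in stack_filter:
            event["stack_id"] = target
            modified += 1
    return modified


def reassign_stack(events: List[Dict[str, str]], source_ids: Iterable[str], target: str) -> int:
    # Materialize first: a lazy iterable can be truthy yet yield nothing.
    sources = list(source_ids)
    if not sources:
        return 0  # nothing to move; never call patch_all with an empty filter
    return patch_all(events, sources, target)


events = [{"stack_id": "s1"}, {"stack_id": "s2"}]
print(reassign_stack(events, (s for s in []), "target"))  # empty generator: no events touched
print(reassign_stack(events, ["s2"], "target"))           # only the s2 event is moved
print(events)
```

Without the guard, the empty generator would reach `patch_all` as an empty filter and every event would be moved to the target stack.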
The cache invalidation for Stack prefix was called in three places:
1. Inside the progress-logging timer block (every 30s during batch)
2. After each batch completes (end-of-batch)
3. After the entire while loop finishes

Places 1 and 3 were redundant with place 2. The end-of-batch call already ensures the cache is invalidated after each page of duplicates is processed. Calling it on a timer and again after the loop added no correctness benefit while adding unnecessary cache churn.

Resolves PR feedback: 'Why do we call this twice?'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add // Arrange / // Act / // Assert comments to all tests in CleanupOrphanedDataJobTests and new tests in EventRepositoryTests (PR feedback: 'tests need 3 name part with act, assert, arrange')
- Add explanatory comment to broad catch (Exception) in FixDuplicateStacks clarifying it is intentional: log-and-continue per-group so a single corrupt signature does not abort the entire dedup job
- Inline pre-condition assertions and simplify intermediate variables in tests for clarity

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
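The log-and-continue pattern mentioned above, combined with the cancellation filter placed before the generic catch, might look like this in outline. The sketch is plain Python rather than the job's C#; `OperationCanceled`, `process_groups`, and the `merge` callback are hypothetical names, and the real job's logging and merge logic are not shown.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("dedup")


class OperationCanceled(Exception):
    """Hypothetical stand-in for a cancellation signal."""


def process_groups(signatures: Iterable[str], merge: Callable[[str], None]) -> int:
    """Merge each duplicate group; return how many groups succeeded."""
    merged = 0
    for signature in signatures:
        try:
            merge(signature)
            merged += 1
        except OperationCanceled:
            raise  # cancellation must stop the whole job, not be logged and skipped
        except Exception as ex:  # intentional: one bad group should not abort the rest
            logger.error("Failed to merge signature %s: %s", signature, ex)
    return merged


def merge(signature: str) -> None:
    if signature == "corrupt":
        raise ValueError("unreadable signature")


print(process_groups(["a", "corrupt", "b"], merge))  # the corrupt group is logged and skipped
```

The order of the two handlers matters: if the broad handler came first, a cancellation would be logged as an ordinary failure and the loop would keep running.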
Member
Author
Merged origin/main and addressed the outstanding review feedback in 92fa434. Summary:
Verification:
All review threads are resolved.
Summary
Comprehensive optimization of cleanup jobs: eliminate all direct Elasticsearch client usage, fix critical bugs, restore correct loop semantics, and add full integration test coverage.
Critical Bug Fixes

@min_count:2 → @min:2 in GetDuplicateSignaturesAsync
The @min_count syntax is silently ignored by Foundatio Parsers, causing the aggregation to return ALL signature hashes (not just duplicates). This would have caused FixDuplicateStacks to treat every stack as a duplicate on the next run.

FixDuplicateStacks only ran one batch (regression from refactor)
The original code looped until GetDuplicateSignaturesAsync returned empty. The refactor broke this. Restored the while loop with:
- ImmediateConsistency() on CountAsync inside GetDuplicateSignaturesAsync (one refresh per batch, matching the original Indices.RefreshAsync call pattern, NOT per item)

ReassignStackAsync data-loss hazard on empty sequence
If sourceStackIds was empty, PatchAllAsync would have no stack filter and would reassign ALL events to the target stack. Added materialization + early return guard.

FixDuplicateStacks event-first ordering
Moved ReassignStackAsync before SaveAsync (soft-delete). If event reassignment fails, duplicate stacks remain visible and no data is lost. Previously: soft-delete first → event reassignment failure → orphan cleanup deletes the events.

GetDistinctFieldValuesAsync afterKey cursor leak
afterKey was only populated when composite.AfterKey != null, never cleared. Added: always clear cursor first, then repopulate. Callers checking afterKey.AfterKey.Count > 0 now correctly detect end-of-pagination.

Architecture (eliminate raw ES client from jobs)
- Refactored CleanupOrphanedDataJob to use repository methods exclusively
- Added GetDistinctFieldValuesAsync using composite aggregation (encapsulated in the repository; composite aggregation is not in Foundatio's DSL, so raw client use is justified and documented)
- Added RemoveAllByProjectIds/RemoveAllByOrganizationIds, RemoveAllByStackIdsAsync to IEventRepository
- Added ReassignStackAsync using parameterized Painless script
- Added GetDuplicateSignaturesAsync to IStackRepository

Other Fixes
- Added OperationCanceledException filter before generic catch
- Added lock renewal at page boundaries in CleanupDataJob
- Consolidated duplicate SaveAsync calls into one using spread syntax
- Removed redundant is_deleted:false filter (repository applies soft-delete filter by default)
- CompositeKeyResult converted from class to record
- Used pattern matching (is null/is not null, is [])
- Added :{Message} to error log format strings

Rebased onto main
Resolved merge conflict with #2268 (Deleted counter tests); preserved all tests from both branches.

Test Coverage (39 job tests + 278 repo tests, all passing)

New job tests (named Method_Given_Expected):
- RunAsync_SuspendedOrganization_SuspendsRelatedTokens
- RunAsync_SoftDeletedOrganization_RemovesAllRelatedData
- RunAsync_SoftDeletedProject_RemovesProjectAndEvents
- RunAsync_SoftDeletedStack_RemovesStackAndEvents
- RunAsync_EventsOutsideRetentionPeriod_RemovesExpiredEvents
- DeleteOrphanedEventsByStack_WithLargeDataset_DeletesAllOrphanedEvents
- CleanupSoftDeletedOrganizations_WithMultiplePages_RemovesAllData
- CleanupSoftDeletedStacks_WithMultiplePages_RemovesAllStacks
- EnforceRetention_WithMultipleOrganizations_RespectsPerOrgRetention
- EnforceRetention_WithEventsOutsideRetention_DeletesOnlyExpiredEvents
- #2268 (merged)

New repository tests:
- GetDistinctStackIds_WithMultipleStacks_ReturnsAllUniqueIds
- GetDistinctStackIds_WithPagination_ReturnsAllIds
- ReassignStack_WithSourceEvents_MovesAllEventsToTarget
- RemoveAllByProjectIds_WithMatchingEvents_RemovesAll
- RemoveAllByOrganizationIds_WithMatchingEvents_RemovesAll
- GetDuplicateSignatures_WithDuplicates_ReturnsSignatures
- GetDuplicateSignatures_WithNoDuplicates_ReturnsEmpty
- GetDuplicateSignatures_WithSoftDeletedStacks_ExcludesThem
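For the afterKey cursor leak listed under Critical Bug Fixes, the clear-then-repopulate rule can be sketched like this. It is an illustrative Python model, not the repository's C# or the Elasticsearch client: `composite_page` is a hypothetical stand-in for one composite aggregation request that returns no after-key once no buckets remain, and `get_distinct_values` mutates a caller-owned cursor the way the description implies.

```python
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 2
ALL_IDS = ["s1", "s2", "s3", "s4", "s5"]  # sorted distinct values in the pretend index


def composite_page(after: Optional[str]) -> Tuple[List[str], Optional[str]]:
    """Stand-in for one composite aggregation request.

    Returns the next page of values plus an after-key, or None as the
    after-key once no buckets remain.
    """
    start = 0 if after is None else ALL_IDS.index(after) + 1
    page = ALL_IDS[start:start + PAGE_SIZE]
    return page, (page[-1] if page else None)


def get_distinct_values(cursor: Dict[str, str]) -> List[str]:
    """Fetch one page, updating the caller's cursor in place."""
    page, next_after = composite_page(cursor.get("stack_id"))
    cursor.clear()              # always reset first...
    if next_after is not None:  # ...then repopulate only if more pages may follow
        cursor["stack_id"] = next_after
    return page


cursor: Dict[str, str] = {}
seen: List[str] = []
while True:
    seen.extend(get_distinct_values(cursor))
    if not cursor:  # an empty cursor signals end of pagination
        break
print(seen)
```

If `cursor.clear()` were removed, the last after-key would survive the final empty page and the caller's end-of-pagination check would never fire.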