Shared secondary bank performance optimizations #3995
Open
paulromano wants to merge 8 commits into
Description
While profiling a coupled neutron-photon transport simulation with weight windows enabled (which turns on the new shared secondary bank by default), I noticed a lot of time being spent in areas I wasn't expecting. This PR is a first round of performance optimizations to address these issues:
- [...] (`neutron_xs_` and `photon_xs_` in the `ParticleData` class). At first, I made some attempts to only allocate the cache if it was needed for a given particle. Later on though, I realized the core issue was that in the main loop over the shared secondary "read" bank, we allocate a `Particle` on the stack. Thus, the cost of the memory allocations can be completely eliminated by simply allocating the `Particle` once per thread and re-using it in the loop (this just requires clearing the local secondary bank at the end of each iteration of the inner loop).
- [...] `event_death`. There are some atomics in there for global tallies that we can avoid in a lot of situations, so I've simply added conditionals to avoid triggering the atomics unless they are really needed.
- [...] `future_seed_coefficients` function so that when we call `init_particle_seeds`, we only pay the cost once rather than 4 times.

I have another round of optimizations having to do with the shared secondary bank data structures directly that builds on top of these, so once this PR is completed, I'll submit a follow-on one.
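For readers who want to see the first two ideas in isolation, here is a minimal single-threaded sketch. It is not the code in this PR: `ToyParticle`, `GlobalTally`, `process_bank`, and the "leak only if weight > 1" rule are all hypothetical stand-ins, assumed purely for illustration. It shows (a) constructing the particle once outside the loop and clearing its local secondary bank each iteration, and (b) skipping the atomic update when there is nothing to add.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts constructions so the effect of re-use is observable (illustration only).
static int g_constructions = 0;

// Hypothetical stand-in for a particle whose caches live on the heap.
struct ToyParticle {
  std::vector<double> xs_cache;          // stands in for neutron_xs_ / photon_xs_
  std::vector<double> local_secondaries; // stands in for the local secondary bank
  double weight = 0.0;
  double leakage = 0.0;

  explicit ToyParticle(std::size_t cache_size) : xs_cache(cache_size, 0.0)
  {
    ++g_constructions;
  }
};

// Hypothetical global tally that only touches the atomic when needed.
struct GlobalTally {
  std::atomic<double> value {0.0};
  std::atomic<int> n_atomic_updates {0};

  void add(double x)
  {
    if (x == 0.0)
      return; // nothing to contribute, so skip the atomic entirely
    // Portable atomic add for double via a compare-exchange loop.
    double expected = value.load(std::memory_order_relaxed);
    while (!value.compare_exchange_weak(
      expected, expected + x, std::memory_order_relaxed)) {
    }
    n_atomic_updates.fetch_add(1, std::memory_order_relaxed);
  }
};

// Processes a bank of weights with a single re-used particle. In a threaded
// code this function body would correspond to one thread's share of the loop.
inline void process_bank(
  const std::vector<double>& bank, GlobalTally& tally, std::size_t cache_size)
{
  ToyParticle p(cache_size); // allocated once, outside the loop
  for (double w : bank) {
    p.weight = w;
    p.leakage = (w > 1.0) ? w : 0.0;        // toy rule, assumed for the example
    p.local_secondaries.push_back(0.5 * w); // pretend a secondary was produced
    tally.add(p.leakage);
    p.local_secondaries.clear(); // reset per-history state before re-use
  }
}
```

Calling `process_bank({0.5, 2.0, 0.25, 3.0}, tally, 8)` constructs one `ToyParticle` for the whole bank and performs atomic updates only for the two entries with nonzero leakage. Note that `clear()` keeps the vector's capacity, so the local bank does not reallocate on later iterations either.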
Checklist
- I have followed the style guidelines for Python source files (if applicable)
- I have added tests that prove my fix is effective or that my feature works (if applicable)