This example demonstrates a very simple application in which the energy deposit and the number of steps are accumulated in thread-local (i.e. one instance per thread) hits maps whose underlying types are plain-old data (POD), and in global (i.e. one instance) hits maps whose underlying types are atomics. The example uses a coarse mesh, extensive physics, and step limiters to increase the degree of conflict between threads when updating the scorers, both to test the robustness of the atomics classes and to maximize the compounding of thread-local round-off error.

At the end of the simulation, the scorers are printed to "mfd_<DATA_TYPE>_<SCORER_TYPE>.out", where DATA_TYPE is either "tl" (thread-local) or "tg" (thread-global) and SCORER_TYPE is "EnergyDeposit" or "NumberOfSteps". These values are then compared to a thread-global sum of the same scorers that was updated via mutex locking. If round-off errors are present in the thread-local EnergyDeposit, they can be viewed in "mfd_diff.out" at the end of the simulation.
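
The round-off effect described above arises because floating-point addition is not associative: merging per-thread partial sums groups the additions differently from a single running sum. The following is a minimal standalone sketch in standard C++, not code taken from this example. The names partial_sum and chunked_sum are hypothetical, and the chunks merely stand in for thread-local scorers that are merged at the end of a run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum v[begin, end) in index order, as one thread-local scorer would.
double partial_sum(const std::vector<double>& v, std::size_t begin, std::size_t end)
{
    double s = 0.0;
    for (std::size_t i = begin; i < end; ++i) s += v[i];
    return s;
}

// Split v into nchunks contiguous chunks (stand-ins for threads), sum each
// chunk separately, then merge the partial sums in chunk order.
double chunked_sum(const std::vector<double>& v, std::size_t nchunks)
{
    assert(nchunks > 0);
    double total = 0.0;
    const std::size_t n = v.size();
    for (std::size_t c = 0; c < nchunks; ++c)
        total += partial_sum(v, n * c / nchunks, n * (c + 1) / nchunks);
    return total;
}
```

With IEEE 754 double arithmetic and the input {1e16, 1.0, -1e16, 1.0}, chunked_sum(v, 1) and chunked_sum(v, 2) return different values although the data are identical. Differences of this kind are what "mfd_diff.out" is meant to expose.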

This example also provides a demonstration of the timemory (a performance instrumentation toolkit) package provided in Geant4 – for documentation of timemory see https://github.com/NERSC/timemory and https://timemory.readthedocs.io.

ATOMICS and the ATOMIC SCORERS

Atomics can ONLY handle plain-old data (POD) types, e.g. int, double, etc. The implementation of atomics is compiler-dependent. At the very worst, the performance of an atomic is the same as mutex locking. Atomics, in general, are not copy-constructible. This has to do with thread safety (e.g. making a copy while another thread tries to update), and it is why atomics cannot be used in STL containers. The implementation in atomic.hh has limited copy-construction and still cannot be used in STL containers. Use these copy-constructors with extreme caution. See the opening comments of G4atomic.hh for more details.
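
As an illustration of these points, the sketch below uses std::atomic from the C++ standard library rather than G4atomic, whose interface is not assumed here. It checks at compile time that the atomic type is not copy-constructible, and shows one common way to add to an atomic double with a compare-exchange loop. The helper name atomic_add is hypothetical.

```cpp
#include <atomic>
#include <type_traits>

// std::atomic deletes its copy constructor, which is why it cannot be
// stored in containers that copy or move their elements.
static_assert(!std::is_copy_constructible<std::atomic<double>>::value,
              "std::atomic<double> is not copy-constructible");

// Atomically add `value` to `target` with a compare-exchange loop.
// (Before C++20, std::atomic<double> has no fetch_add member.)
inline void atomic_add(std::atomic<double>& target, double value)
{
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `expected` with the current
    // value, so the loop retries with an up-to-date sum.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}
```

Whether such an update is lock-free depends on the compiler and platform; std::atomic<double>::is_lock_free() reports this at run time.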

The newly provided classes in this example (G4atomic, G4TAtomicHitsMap, and G4TAtomicHitsCollection) are intended for applications where memory is a greater concern than performance. While atomics generally perform better than mutex locking, the synchronization is not without a cost. However, the memory consumed by thread-local hits maps scales roughly linearly with the number of threads, so simulations with a large number of scoring volumes can be limited in thread count by memory. With a single global atomic hits map, such simulations can reduce run time by using more threads than the memory consumption previously allowed.
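
The linear scaling can be made concrete with a back-of-the-envelope estimate. The sketch below is only an illustration: the function scorer_bytes and all numbers are hypothetical, and it counts the stored values alone, ignoring the per-entry overhead of a real hits map.

```cpp
#include <cstdint>

// Rough lower bound on scorer storage: values only, no container overhead.
constexpr std::uint64_t scorer_bytes(std::uint64_t nvolumes,
                                     std::uint64_t bytes_per_value,
                                     std::uint64_t ncopies)
{
    return nvolumes * bytes_per_value * ncopies;
}

// Hypothetical setup: 10 million scoring volumes, one 8-byte value each.
constexpr std::uint64_t kVolumes = 10000000;
constexpr std::uint64_t kValueBytes = 8;
constexpr std::uint64_t kThreads = 64;

// One copy per thread versus a single shared (atomic) copy.
constexpr std::uint64_t kThreadLocalBytes = scorer_bytes(kVolumes, kValueBytes, kThreads);
constexpr std::uint64_t kGlobalBytes = scorer_bytes(kVolumes, kValueBytes, 1);

static_assert(kThreadLocalBytes == kThreads * kGlobalBytes,
              "thread-local storage grows linearly with the thread count");
```

Under these assumed numbers, 64 thread-local copies hold about 5.12 GB of values, against about 80 MB for one shared copy.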

These classes are intended to be included in the Geant4 source code release next year.

G4TAtomicHitsMap and G4TAtomicHitsCollection work exactly the same way as the standard G4THitsMap and G4THitsCollection, respectively, with two exceptions: you should create only one instance and provide a pointer/reference to that instance to the threads, instead of having the threads create their own, and there is no need to include them in G4Run::Merge().
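
The difference between the two usage patterns can be sketched without Geant4. The code below is a standalone stand-in written with the C++ standard library only: it does not use G4TAtomicHitsMap or G4THitsMap, and the names StepCounts and count_steps are hypothetical. It counts steps per cell twice, once in a single shared array of atomics and once in per-thread copies that must be merged afterwards.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

struct StepCounts {
    std::vector<long> shared;  // read from the single atomic instance
    std::vector<long> merged;  // sum of the per-thread copies
};

StepCounts count_steps(unsigned nthreads, std::size_t ncells, std::size_t steps_per_thread)
{
    if (ncells == 0) return StepCounts{};

    // One global instance, created once and handed to every worker.
    std::unique_ptr<std::atomic<long>[]> global(new std::atomic<long>[ncells]);
    for (std::size_t i = 0; i < ncells; ++i) global[i].store(0);

    // One private copy per worker; these must be merged afterwards.
    std::vector<std::vector<long>> local(nthreads, std::vector<long>(ncells, 0));

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&global, &local, t, ncells, steps_per_thread] {
            for (std::size_t s = 0; s < steps_per_thread; ++s) {
                const std::size_t cell = s % ncells;
                global[cell].fetch_add(1, std::memory_order_relaxed);  // shared update
                ++local[t][cell];                                      // private update
            }
        });
    }
    for (auto& w : workers) w.join();

    StepCounts out;
    out.shared.resize(ncells);
    out.merged.assign(ncells, 0);
    for (std::size_t i = 0; i < ncells; ++i) {
        out.shared[i] = global[i].load();  // no merge step needed
        for (unsigned t = 0; t < nthreads; ++t) out.merged[i] += local[t][i];  // explicit merge
    }
    return out;
}
```

For integer counts the two results agree exactly. The shared array needs one allocation regardless of the thread count, while the per-thread copies grow with it.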

GEOMETRY DEFINITION

The geometry is constructed in the TSDetectorConstruction class. The setup consists of a box filling the world. The volume is divided into subregions, where the outermost boxes are a different material. The materials by default are water and boron as these have large scattering cross-sections for neutrons (the default particle).

PHYSICS LIST

The particle types and the physics processes available in this example are built from a variety of physics constructors. The chosen physics lists are extensive. The constructors are:

ACTION INITIALIZATION

TSActionInitialization instantiates all user action classes and registers them with the Geant4 kernel.

In sequential mode the action classes are instantiated just once, by invoking the method TSActionInitialization::Build(). In multi-threading mode the same method is invoked for each worker thread, so all user action classes are thread-local.

A run action class is instantiated both thread-locally and globally, which is why an instance is also created in the method TSActionInitialization::BuildForMaster(), which is invoked only in multi-threading mode.

PRIMARY GENERATOR

The primary generator is defined in the TSPrimaryGeneratorAction class. The default kinematics is a 1 MeV neutron, randomly distributed in front of the target across 100% of the transverse (X,Y) target size. This default setting can be changed via the Geant4 built-in commands of the G4ParticleGun class.

DETECTOR RESPONSE

This example demonstrates scoring implemented in the user action classes and the TSRun object.

The energy deposited is collected per event in the PrimitiveScorer G4PSEnergyDeposit (as part of a MultiFunctionalDetector), and the thread-local versions are merged at the end of the run.

The number of steps is collected per event in the PrimitiveScorer G4PSNoOfSteps, and the thread-local versions are merged at the end of the run.

When the MFD records an event, i.e. in TSRun::RecordEvent(const G4Event*), the global atomic hits map adds the same hits collections.

In multi-threading mode the energy accumulated in the TSRun MFD object of each worker is merged to the master in TSRun::Merge().

Scoring is accumulated with thread-local doubles, thread-global atomics, mutex locking, G4StatAnalysis, and G4ConvergenceTester.

TSRun contains five types of hits collections: 1) a thread-local hits map, 2) a global atomic hits map, 3) a global "mutex" hits map, 4) a global G4StatAnalysis hits deque, and 5) a global G4ConvergenceTester hits deque.

The thread-local hits map is the same as you will find in many other examples.

The atomic hits map is the purpose of this example. Code-wise, the implementation looks extremely similar to the thread-local version, with three primary exceptions: (1) construction: there should only be one instance, so it should be a static member variable or a pointer/reference to a single instance; (2) merging: it does not need to be, nor should it be, summed in G4Run::Merge(); (3) destruction: it should only be cleared by the master thread, since there is only one instance.

The "mutex" hits map is also included as a reference for checking the results accumulated by the thread-local hits maps and atomic hits maps. The differences w.r.t. this hits map are computed in TSRunAction::EndOfRunAction.

The "G4StatAnalysis" and "G4ConvergenceTester" hits deques are memory-efficient versions of the standard G4THitsMap. Maps are ideal for scoring at the G4Event level, where sparsity w.r.t. indices is common; at the G4Run level, the deques require much less memory overhead. Due to a lack of G4ConvergenceTester::operator+=(G4ConvergenceTester), the static version of G4ConvergenceTester is the only valid way to use G4ConvergenceTester in a scoring container. This is not the case for G4StatAnalysis, which can be used in lieu of G4double.
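
To show why an operator+= that accepts another accumulator matters for merging, here is a minimal stand-in written in standard C++. RunningStat is a hypothetical type for illustration only; it does not reproduce the interface or internals of G4StatAnalysis or G4ConvergenceTester.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical statistics accumulator: keeps the sum, the sum of squares,
// and the number of entries.
struct RunningStat {
    double sum = 0.0;
    double sum2 = 0.0;
    std::size_t n = 0;

    // Add a single value, as a scorer would for one hit.
    RunningStat& operator+=(double x)
    {
        sum += x;
        sum2 += x * x;
        ++n;
        return *this;
    }

    // Merge another accumulator. Without this overload, per-thread copies
    // could not be combined and only one shared instance would be usable.
    RunningStat& operator+=(const RunningStat& other)
    {
        sum += other.sum;
        sum2 += other.sum2;
        n += other.n;
        return *this;
    }

    double mean() const { return n ? sum / static_cast<double>(n) : 0.0; }

    // Unbiased sample variance; zero when fewer than two entries exist.
    double variance() const
    {
        if (n < 2) return 0.0;
        const double m = mean();
        return (sum2 - static_cast<double>(n) * m * m) / static_cast<double>(n - 1);
    }
};
```

Because the type supports += with a plain double, it can take the place of a double in a scoring container, and the second overload lets per-thread instances be merged at the end of a run.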

HOW TO RUN

Execute ts_scorers in the 'interactive mode' with visualization:

% ./ts_scorers

and type in the commands from run.mac line by line:

Idle> /control/verbose 2
Idle> /tracking/verbose 1
Idle> /run/beamOn 10
Idle> ...
Idle> exit

or

Idle> /control/execute run.mac
....
Idle> exit

Execute ts_scorers in the 'batch' mode from macro files (without visualization):
```
% ./ts_scorers run.mac
% ./ts_scorers run.mac > run.out
```

TIMEMORY USAGE

This example demonstrates profiling analysis with timemory (https://github.com/NERSC/timemory).

Compile Geant4 with timemory (-DGEANT4_USE_TIMEMORY=ON).
timemory provides timing within the Geant4 source code and within the example (TSRun::RecordEvent).
The analysis is echoed to stdout and generates several output files in a folder named after the executable. In general that folder will be "timemory-{name of executable}-output".