# Geant4 Tasking

This directory contains a Geant4 run manager which uses a tasking system for the G4Event loop.
This tasking system is fully compatible with TBB if `GEANT4_USE_TBB=ON` is specified when
configuring CMake. The default behavior, however, is to submit the tasks to an internal
thread-pool and task-queue.

## G4TaskRunManager

`G4TaskRunManager` multiply inherits from `G4MTRunManager` and `PTL::TaskRunManager`.
`PTL::TaskRunManager` holds the thread-pool instance, the size of the thread-pool,
and the default task-queue. The constructor of `G4TaskRunManager` takes a `G4VUserTaskQueue`
pointer (can be `nullptr`), a boolean for whether to use TBB if available, and a grainsize.

### Concepts

#### Grainsize

> Environment Variable: `G4FORCE_GRAINSIZE=N`

The grainsize is essentially the number of tasks. If set to 0, the default grainsize
will be `poolSize` and each thread will get `numEvents / poolSize` events.
If the grainsize is set to 1, then _all the events_ will be submitted as one task (i.e. be
processed serially by one thread in the pool). If the grainsize is set to 50 and there are 500 events,
then 50 tasks of 10 events will be submitted.

#### Events Per Task

> Environment Variable: `G4FORCE_EVENTS_PER_TASK=N`

Sometimes it is easier to specify the number of events in a task instead of the grainsize.
If the events-per-task is set to 10 and there are 500 events,
then 50 tasks of 10 events will be submitted.
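
The arithmetic behind the grainsize and the events-per-task settings can be sketched as follows.
This is a minimal illustration, not the code used by `G4TaskRunManager`: the `TaskSplit` struct and
both helper functions are hypothetical names, and rounding up when the events do not divide evenly
is an assumption made here so that no event is dropped.

```cpp
#include <cassert>

// Hypothetical helper type (not part of Geant4): how a run is cut into tasks.
struct TaskSplit
{
    int numTasks;       // number of tasks submitted to the queue
    int eventsPerTask;  // events in each task (the last task may hold fewer)
};

// Split by grainsize, i.e. by the requested number of tasks.
// A grainsize of 0 falls back to the pool size, as described above.
inline TaskSplit SplitByGrainsize(int numEvents, int poolSize, int grainsize)
{
    assert(numEvents > 0 && poolSize > 0 && grainsize >= 0);
    if (grainsize == 0) grainsize = poolSize;
    if (grainsize > numEvents) grainsize = numEvents;  // assumption: no empty tasks
    // Assumption: round up so that every event lands in some task.
    const int eventsPerTask = (numEvents + grainsize - 1) / grainsize;
    const int numTasks      = (numEvents + eventsPerTask - 1) / eventsPerTask;
    return { numTasks, eventsPerTask };
}

// Split by a fixed number of events per task instead.
inline TaskSplit SplitByEventsPerTask(int numEvents, int eventsPerTask)
{
    assert(numEvents > 0 && eventsPerTask > 0);
    const int numTasks = (numEvents + eventsPerTask - 1) / eventsPerTask;
    return { numTasks, eventsPerTask };
}
```

With 500 events, `SplitByGrainsize(500, poolSize, 50)` and `SplitByEventsPerTask(500, 10)` both
describe 50 tasks of 10 events, which matches the two examples in the text.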

### Default Constructor

```cpp
    G4TaskRunManager(G4VUserTaskQueue* = nullptr, bool useTBB = false, G4int grainsize = 0);
```

## G4RunManagerFactory

An enumeration `G4RunManagerType` and a function `G4RunManagerFactory::CreateRunManager(...)`
were added to `"G4RunManagerFactory.hh"` to simplify the selection of the various run managers.
The first parameter is either one of the enumerated `G4RunManagerType` values or a string identifier:

| Enumeration                     | String ID   | Class               |
| ------------------------------- | ----------- | ------------------- |
| `G4RunManagerType::Serial`      | `"Serial"`  | `G4RunManager`      |
| `G4RunManagerType::MT`          | `"MT"`      | `G4MTRunManager`    |
| `G4RunManagerType::Tasking`     | `"Tasking"` | `G4TaskRunManager`  |
| `G4RunManagerType::TBB`         | `"TBB"`     | `G4TaskRunManager`  |
| `G4RunManagerType::Default`     | `"Default"` | Environment setting |
| `G4RunManagerType::SerialOnly`  | `"Serial"`  | `G4RunManager`      |
| `G4RunManagerType::MTOnly`      | `"MT"`      | `G4MTRunManager`    |
| `G4RunManagerType::TaskingOnly` | `"Tasking"` | `G4TaskRunManager`  |
| `G4RunManagerType::TBBOnly`     | `"TBB"`     | `G4TaskRunManager`  |

The `Default` enumeration value defers to the `G4RUN_MANAGER_TYPE` environment variable if it is set;
otherwise it selects `"MT"` if MT is supported and `"Serial"` if MT is not supported.
If the `G4FORCE_RUN_MANAGER_TYPE` environment variable is set, it overrides the
value passed to the `CreateRunManager` function unless that value is one of the `<TYPE>Only`
enumerators. In that case, the environment variable is ignored and the run manager will be `<TYPE>`.

| Environment Variable       | Options                                  | Description                                                                                                      |
| -------------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `G4RUN_MANAGER_TYPE`       | `"Serial"`, `"MT"`, `"Tasking"`, `"TBB"` | Only applicable when `G4RunManagerType::Default` is used                                                         |
| `G4FORCE_RUN_MANAGER_TYPE` | `"Serial"`, `"MT"`, `"Tasking"`, `"TBB"` | Overrides an explicitly specified `G4RunManagerType` if the application allows it; fails if the type is not available |
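
The precedence rules above can be summarized in a few lines of code. The sketch below is only a
model of the description in this section, not the factory's source: `RunType`, `IsConcrete`, and
`Resolve` are hypothetical names, the two environment variables are passed in as already-parsed
optional values, and the availability check (`fail_if_unavail`) is left out.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for G4RunManagerType so the sketch builds without Geant4.
enum class RunType { Serial, MT, Tasking, TBB, Default,
                     SerialOnly, MTOnly, TaskingOnly, TBBOnly };

// True for the four types an environment variable may name.
inline bool IsConcrete(RunType t)
{
    return t == RunType::Serial || t == RunType::MT ||
           t == RunType::Tasking || t == RunType::TBB;
}

// Resolve the concrete run manager type.
//   envDefault models G4RUN_MANAGER_TYPE, envForce models G4FORCE_RUN_MANAGER_TYPE;
//   an empty optional means the variable is not set.
inline RunType Resolve(RunType requested,
                       std::optional<RunType> envDefault,
                       std::optional<RunType> envForce,
                       bool mtSupported)
{
    assert(!envDefault || IsConcrete(*envDefault));
    assert(!envForce || IsConcrete(*envForce));

    // 1. <TYPE>Only always wins; the force variable is ignored.
    switch (requested)
    {
        case RunType::SerialOnly:  return RunType::Serial;
        case RunType::MTOnly:      return RunType::MT;
        case RunType::TaskingOnly: return RunType::Tasking;
        case RunType::TBBOnly:     return RunType::TBB;
        default: break;
    }
    // 2. Otherwise the force variable overrides whatever was requested.
    if (envForce) return *envForce;
    // 3. An explicit request is used as-is.
    if (requested != RunType::Default) return requested;
    // 4. Default defers to G4RUN_MANAGER_TYPE, then to MT or Serial.
    if (envDefault) return *envDefault;
    return mtSupported ? RunType::MT : RunType::Serial;
}
```

For instance, `Resolve(RunType::MT, std::nullopt, RunType::TBB, true)` yields `RunType::TBB`, while
replacing `RunType::MT` with `RunType::MTOnly` yields `RunType::MT`.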

## Creating the G4RunManager

- The `G4RunManagerFactory::CreateRunManager(...)` function takes either a `G4RunManagerType` enumerator or a string to specify the desired G4RunManager
  - If a string is used, it is matched with a case-insensitive regex
  - Returns a `G4RunManager*`
  - Various overloads exist which just reorder the following arguments:
    - `int numberOfThreads` - executes `G4MTRunManager::SetNumberOfThreads(numberOfThreads)` before returning if > 0
      - default: `0`
    - `bool fail_if_unavail` - causes a runtime failure if the requested type is not available in the Geant4 build
      - default: `true`
    - `G4VUserTaskQueue*` - a task-queue manager
      - default: `nullptr`

```cpp
#include "G4RunManagerFactory.hh"

int main()
{
    // NOTE: the calls below are alternatives; keep exactly one of them active

    // specify {Serial, MT, Tasking, TBB} as the default, can be overridden
    // with "G4FORCE_RUN_MANAGER_TYPE" env variable
    auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Serial);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::MT);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Tasking);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TBB);

    // specify {Serial, MT, Tasking, TBB} as the required type, cannot be overridden
    // with "G4FORCE_RUN_MANAGER_TYPE" env variable
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::SerialOnly);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::MTOnly);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TaskingOnly);
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TBBOnly);

    // defer to "G4RUN_MANAGER_TYPE" env variable and default to MT if
    // env variable is not set
    // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Default);

    // same as above
    // auto* runmanager = G4RunManagerFactory::CreateRunManager();
}
```
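
The case-insensitive string matching mentioned in the list above can be pictured with a small
standalone sketch. It is an illustration only: `RunManagerKind` and `ParseRunManagerKind` are
hypothetical names, the patterns used by the real factory may differ (for example, they may accept
prefixes), and falling back to `Default` for unknown strings is an assumption.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical stand-in for the selectable run manager types.
enum class RunManagerKind { Serial, MT, Tasking, TBB, Default };

// Map a string ID to a kind using case-insensitive regex matching.
inline RunManagerKind ParseRunManagerKind(const std::string& id)
{
    assert(!id.empty());
    static const std::pair<const char*, RunManagerKind> table[] = {
        { "serial", RunManagerKind::Serial },
        { "mt", RunManagerKind::MT },
        { "tasking", RunManagerKind::Tasking },
        { "tbb", RunManagerKind::TBB },
        { "default", RunManagerKind::Default },
    };
    for (const auto& entry : table)
    {
        // std::regex::icase makes "Tasking", "TASKING" and "tasking" equivalent
        const std::regex pattern(entry.first, std::regex::icase);
        if (std::regex_match(id, pattern)) return entry.second;
    }
    // Assumption: unrecognized strings fall back to Default.
    return RunManagerKind::Default;
}
```

`std::regex_match` requires the whole string to match, so `"TBB"` and `"tbb"` both select
`RunManagerKind::TBB`, while a string such as `"tbb2"` does not.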

## Using the Tasking System

With G4TaskRunManager, Geant4 events will be launched asynchronously as tasks. These tasks are
placed into a queue until one of the threads in the pool is available to execute the task. Users can
take advantage of this system to load-balance expensive sub-event calculations which might have
previously resulted in serial bottlenecks. For example, if an application needs to do extensive event
analysis on electrons and thread #1 ends up with 10x as many of these events, the other threads
might finish their G4Run significantly earlier and be idle while thread #1 has a lot of work.
Tasking allows these analysis calculations to be offloaded back into the queue so that other
threads can contribute to their completion.

### Option 1 - Submit Directly to Thread-Pool

- To execute the function `foo(int, double)` asynchronously:

```cpp
// get the task manager
auto* task_manager = G4TaskRunManager::GetTaskManager();

// submit tasks to the thread-pool and receive a future for when the result is needed
// ('foo' is the void(int, double) function above; 'bar' is assumed to be an int(double) function)
std::future<void> _fvoid = task_manager->async<void>(foo, 1, 1.0);
std::future<int>  _fint  = task_manager->async<int>(bar, 1.0);

// wait for the tasks to execute
_fvoid.wait();
_fint.wait();

// get the result (if non-void)
auto result = _fint.get();
```
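
To try the same submit, wait, and get pattern without a Geant4 build, the sketch below substitutes
`std::async` from the C++ standard library for the task manager. It only mirrors the future-based
control flow: `std::async` starts a thread per call rather than using a shared pool, and the bodies
of `foo` and `bar`, the counter `g_fooCalls`, and the wrapper `RunFooAndBar` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <future>

// Counter that makes the effect of the void task observable.
std::atomic<int> g_fooCalls{ 0 };

// Hypothetical task bodies with the signatures used in the example above.
void foo(int n, double /*weight*/) { g_fooCalls += n; }
int  bar(double x) { return static_cast<int>(x) + 1; }

// Submit both tasks, wait for them, and return the non-void result.
int RunFooAndBar()
{
    // std::launch::async forces execution on another thread; this stands in
    // for handing the task to the thread-pool
    std::future<void> fvoid = std::async(std::launch::async, foo, 1, 1.0);
    std::future<int>  fint  = std::async(std::launch::async, bar, 1.0);
    assert(fvoid.valid() && fint.valid());

    // wait for the tasks to execute
    fvoid.wait();
    fint.wait();

    // get the result (get() would also wait if the task were still running)
    return fint.get();
}
```

The explicit `wait()` calls are kept to match the example above; calling `get()` alone is enough
when only the result is needed.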

### Option 2 - Submit to task-group

- Obtain a pointer to the thread-pool instance
- Create a `task_group<T>` object where `T` is the return type of all the functions in the group
  - If `T` is non-void, you must provide a join functor whose return type and first argument are both references
  to the joined type and whose second argument is of type `T`, e.g. `task_group<int>` can use a join functor
  with `vector<int>&` as the return type and first argument and `int` as the second argument, or one with
  `int&` as the return type and first argument and `int` as the second argument
  - If `T` is void, the join functor is optional and can be treated as a final synchronization operation after
  all the tasks have been completed.

> NOTE: The join functor of a task-group is called sequentially on the thread that is
> waiting on the `task_group<T>::join()` member function.

#### Global Definitions for Examples

```cpp
// obtain thread-pool instance from task manager
static auto* thread_pool = G4TaskRunManager::GetThreadPool();

// trivial int function which just returns value passed
int foo(int v) { return v; }

// function which launches CUDA kernel
void bar(int v)
{
    cuda_bar<<<512, 1>>>(v);
}
```

#### Example with non-void return types from tasks

```cpp
// put all return values from tasks into an array
// (explicit trailing return type so the functor returns a reference, not a copy)
auto join_vec = [](std::vector<int>& lhs, int rhs) -> std::vector<int>& { lhs.push_back(rhs); return lhs; };

// sum the values returned by tasks
auto sum_int = [](int& lhs, int rhs) -> int& { return lhs += rhs; };

// task group which applies 'join_vec' to all task return values
task_group<int>  vec_tg(join_vec, thread_pool);
// task group which applies 'sum_int' to all task return values
task_group<int>  sum_tg(sum_int, thread_pool);

// submit work to task-groups
vec_tg.exec(foo, 1);
vec_tg.exec(foo, 2);
sum_tg.exec(foo, 1);
sum_tg.exec(foo, 2);

// produces std::vector{ 1, 2 };
auto vec_result = vec_tg.join();

// produces 1 + 2 = 3
auto sum_result = sum_tg.join();
```
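
The join semantics described above (tasks run concurrently, results are folded one at a time on the
thread that calls `join()`) can be modeled in plain C++. The class below is a hypothetical
miniature written for illustration and is not the PTL `task_group`: `MiniTaskGroup` is an invented
name, it takes the joined type as a second template parameter, and it uses `std::async` in place of
the thread-pool.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical miniature of a task group (NOT the PTL implementation).
// T is the task return type, JoinT is the accumulated ("joined") type.
template <typename T, typename JoinT>
class MiniTaskGroup
{
public:
    // The join functor returns and takes a reference to the joined value.
    using JoinFunc = std::function<JoinT&(JoinT&, T)>;

    explicit MiniTaskGroup(JoinFunc join)
    : m_join(std::move(join))
    {
        assert(static_cast<bool>(m_join));
    }

    // Launch func(args...) on another thread and remember its future.
    template <typename Func, typename... Args>
    void exec(Func func, Args... args)
    {
        m_futures.push_back(std::async(std::launch::async, func, args...));
    }

    // Wait for every task in submission order and fold each result into the
    // joined value, sequentially, on the thread that called join().
    JoinT join()
    {
        JoinT result{};
        for (auto& fut : m_futures)
            m_join(result, fut.get());
        m_futures.clear();
        return result;
    }

private:
    JoinFunc                    m_join;
    std::vector<std::future<T>> m_futures;
};

// trivial int function which just returns the value passed
inline int foo(int v) { return v; }
```

Constructing a `MiniTaskGroup<int, std::vector<int>>` with the vector-appending functor and calling
`exec(foo, 1)` then `exec(foo, 2)` makes `join()` return a vector holding 1 and 2; a
`MiniTaskGroup<int, int>` with the summing functor returns 3 for the same two tasks. Because the
futures are consumed in submission order, the fold order does not depend on which task finishes
first in this sketch.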

#### Example with void return type from tasks

```cpp
// wait for the GPU to finish
auto sync = []() { cudaDeviceSynchronize(); };

// task group which applies 'sync' after all tasks have been executed
task_group<void> gpu_tg(sync, thread_pool);
// generic task group w/o a join functor
task_group<void> general_tg(thread_pool);

// submit work to task-groups
gpu_tg.exec(bar, 1);
gpu_tg.exec(bar, 2);
general_tg.exec(bar, 1);
general_tg.exec(bar, 2);

// 'sync()' will get called after all tasks in group have executed
// (i.e. return from 'bar' function). 'sync' will then block until
// all GPU work has been completed
gpu_tg.join();

// will block only until all tasks in group have been executed
// (i.e. returned from 'bar' function)
general_tg.join();
```