# Geant4 Tasking

This directory contains a Geant4 run manager which uses a tasking system for the G4Event loop.
This tasking system is fully compatible with TBB if `GEANT4_USE_TBB=ON` is specified when
configuring CMake. The default behavior, however, is to submit the tasks to an internal
thread-pool and task-queue.

## G4TaskRunManager

`G4TaskRunManager` multiply inherits from `G4MTRunManager` and `PTL::TaskRunManager`.
`PTL::TaskRunManager` holds the thread-pool instance, the size of the thread-pool,
and the default task-queue. The constructor of `G4TaskRunManager` takes a `G4VUserTaskQueue`
pointer (can be nullptr), a boolean for whether to use TBB if available, and a grainsize.

### Concepts

#### Grainsize

> Environment Variable: `G4FORCE_GRAINSIZE=N`

The grainsize is essentially the number of tasks. If set to 0, the grainsize defaults to
`poolSize` and each thread will get `numEvents / poolSize` events.
If the grainsize is set to 1, then _all the events_ will be submitted as one task (i.e. be
processed serially by one thread in the pool). If the grainsize is set to 50 and there are 500 events,
then 50 tasks of 10 events will be submitted.

#### Events Per Task

> Environment Variable: `G4FORCE_EVENTS_PER_TASK=N`

Sometimes it is easier to specify the number of events in a task instead of the grainsize.
If the events-per-task is set to 10 and there are 500 events,
then 50 tasks of 10 events will be submitted.

### Default Constructor

```cpp
G4TaskRunManager(G4VUserTaskQueue* = nullptr, bool useTBB = false, G4int grainsize = 0);
```

## G4RunManagerFactory

An enumeration `G4RunManagerType` and a function `G4RunManagerFactory::CreateRunManager(...)`
were added to `"G4RunManagerFactory.hh"` to simplify the selection of the various run managers.
The first parameter is either one of the enumerated `G4RunManagerType` values or a string identifier:

| Enumeration                     | String ID   | Class               |
| ------------------------------- | ----------- | ------------------- |
| `G4RunManagerType::Serial`      | `"Serial"`  | `G4RunManager`      |
| `G4RunManagerType::MT`          | `"MT"`      | `G4MTRunManager`    |
| `G4RunManagerType::Tasking`     | `"Tasking"` | `G4TaskRunManager`  |
| `G4RunManagerType::TBB`         | `"TBB"`     | `G4TaskRunManager`  |
| `G4RunManagerType::Default`     | `"Default"` | Environment setting |
| `G4RunManagerType::SerialOnly`  | `"Serial"`  | `G4RunManager`      |
| `G4RunManagerType::MTOnly`      | `"MT"`      | `G4MTRunManager`    |
| `G4RunManagerType::TaskingOnly` | `"Tasking"` | `G4TaskRunManager`  |
| `G4RunManagerType::TBBOnly`     | `"TBB"`     | `G4TaskRunManager`  |

The `Default` enumeration value defers to the `G4RUN_MANAGER_TYPE` environment variable
if it is specified; otherwise it defaults to `"MT"` if MT is supported and to serial if MT is not supported.
If the `G4FORCE_RUN_MANAGER_TYPE` environment variable is set, it overrides the
value passed to the `CreateRunManager` function unless the `G4RunManagerType` matches one of the `<TYPE>Only`
values. In that case, the environment variable is ignored and the run manager will be `<TYPE>`.
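The precedence rules above can be sketched in plain C++. This is only an illustration of the order in which the request and the two environment variables are consulted, not the Geant4 implementation: the function name `ResolveRunManagerType` and the `mtSupported` flag are hypothetical, the type is passed as a string for brevity, and the sketch skips the availability checks and case-insensitive matching that the real factory performs.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the factory's selection order; NOT Geant4 code.
// 'requested' is one of the identifiers from the table above, optionally
// suffixed with "Only" (e.g. "MTOnly"), or "Default".
std::string ResolveRunManagerType(const std::string& requested, bool mtSupported)
{
  // 1. "<TYPE>Only" requests ignore the environment entirely.
  const std::string suffix = "Only";
  if(requested.size() > suffix.size() &&
     requested.compare(requested.size() - suffix.size(), suffix.size(), suffix) == 0)
  {
    return requested.substr(0, requested.size() - suffix.size());
  }

  // 2. G4FORCE_RUN_MANAGER_TYPE overrides whatever the application asked for.
  if(const char* forced = std::getenv("G4FORCE_RUN_MANAGER_TYPE"))
    return forced;

  // 3. "Default" defers to G4RUN_MANAGER_TYPE, then to MT or Serial.
  if(requested == "Default")
  {
    if(const char* env = std::getenv("G4RUN_MANAGER_TYPE"))
      return env;
    return mtSupported ? "MT" : "Serial";
  }

  // 4. Otherwise the explicit request is used as-is.
  return requested;
}
```

With both variables unset, `ResolveRunManagerType("Default", true)` yields `"MT"`. Setting `G4FORCE_RUN_MANAGER_TYPE=TBB` changes the result for `"MT"` but not for `"MTOnly"`.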

| Environment Variable       | Options                                  | Description                                                                                                       |
| -------------------------- | ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `G4RUN_MANAGER_TYPE`       | `"Serial"`, `"MT"`, `"Tasking"`, `"TBB"` | Only applicable when `G4RunManagerType::Default` is used                                                          |
| `G4FORCE_RUN_MANAGER_TYPE` | `"Serial"`, `"MT"`, `"Tasking"`, `"TBB"` | Overrides an explicitly specified `G4RunManagerType` if the application allows it; fails if the type is not available |

## Creating the G4RunManager

- The `G4RunManagerFactory::CreateRunManager(...)` function takes either a `G4RunManagerType` enumerated value or a string to specify the desired G4RunManager
  - If a string is used, it is matched with a case-insensitive regex
- Returns a `G4RunManager*`
- Various overloads exist which just reorder the following arguments:
  - `int numberOfThreads` - executes `G4MTRunManager::SetNumberOfThreads(numberOfThreads)` before returning if > 0
    - default: `0`
  - `bool fail_if_unavail` - causes a runtime failure if the requested type is not available in the Geant4 build
    - default: `true`
  - `G4VUserTaskQueue*` - a task-queue manager
    - default: `nullptr`

```cpp
#include "G4RunManagerFactory.hh"

int main()
{
  // The calls below are alternatives: an application creates exactly one
  // run manager, so all but one are shown commented out.

  // specify {Serial, MT, Tasking, TBB} as the default, can be overridden
  // with "G4FORCE_RUN_MANAGER_TYPE" env variable
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Serial);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::MT);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Tasking);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TBB);

  // specify {Serial, MT, Tasking, TBB} as the required type, cannot be overridden
  // with "G4FORCE_RUN_MANAGER_TYPE" env variable
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::SerialOnly);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::MTOnly);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TaskingOnly);
  // auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::TBBOnly);

  // defer to "G4RUN_MANAGER_TYPE" env variable and default to MT if
  // env variable is not set
  auto* runmanager = G4RunManagerFactory::CreateRunManager(G4RunManagerType::Default);

  // same as above
  // auto* runmanager = G4RunManagerFactory::CreateRunManager();

  delete runmanager;
}
```

## Using the Tasking System

With G4TaskRunManager, Geant4 events are launched asynchronously as tasks. These tasks are
placed into a queue until one of the threads in the pool is available to execute the task. Users can
take advantage of this system to load-balance expensive sub-event calculations which might have
previously resulted in serial bottlenecks. For example, if an application needs to do extensive event
analysis on electrons and thread #1 ends up with 10x as many of these events, the other threads
might finish their G4Run significantly earlier and sit idle while thread #1 still has a lot of work.
Tasking allows these analysis calculations to be offloaded back into the queue so that other
threads can contribute to their completion.

### Option 1 - Submit Directly to Thread-Pool

- To execute the function `foo(int, double)` asynchronously:

```cpp
// get the task manager
auto* task_manager = G4TaskRunManager::GetTaskManager();

// submit task to thread-pool and receive a future for when the result is needed
std::future<void> _fvoid = task_manager->async<void>(foo, 1, 1.0);
std::future<int> _fint = task_manager->async<int>(bar, 1.0);

// wait for task to execute
_fvoid.wait();
_fint.wait();

// get the result (if non-void)
auto result = _fint.get();
```

### Option 2 - Submit to task-group

- Obtain a pointer to the thread-pool instance
- Create a `task_group<T>` object where `T` is the return type of all the functions in the group
  - If `T` is non-void, you must provide a join functor whose return type and first argument are both references
    to the joined type and whose second argument is of type `T`. For example, a `task_group<int>` can use a join functor
    with `vector<int>&` as the return type and first argument and `T` as the second argument, or one with `int&` as the
    return type and first argument and `int` as the second argument
  - If `T` is void, the join functor is optional and can be treated as a final synchronization operation after
    all the tasks have been completed.

> NOTE: The join functor of a task-group is called sequentially on the thread that is
> waiting on the `task_group<T>::join()` member function.
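
To make the join semantics in the note concrete, here is a minimal sketch in standard C++ only. It is not the PTL implementation: the class `MiniTaskGroup` and the function `square` are hypothetical names introduced for illustration, and `std::async` stands in for the thread-pool. The sketch only shows that tasks run concurrently while the join functor is applied one result at a time on the thread that calls `join()`.

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical miniature of a task group; NOT the PTL implementation.
// 'Joined' is the accumulated type, 'T' is the return type of each task.
template <typename Joined, typename T>
class MiniTaskGroup
{
public:
  // join functor: reference to the joined type in and out, task result second
  using JoinFunc = std::function<Joined&(Joined&, T)>;

  explicit MiniTaskGroup(JoinFunc join) : m_join(std::move(join)) {}

  // launch a task; std::async is an assumption standing in for the thread-pool
  template <typename Func, typename... Args>
  void exec(Func&& func, Args&&... args)
  {
    m_futures.push_back(std::async(std::launch::async, std::forward<Func>(func),
                                   std::forward<Args>(args)...));
  }

  // wait for each task in submission order and fold its result into the
  // accumulator on the calling thread, so the join functor never runs concurrently
  Joined join(Joined init = Joined{})
  {
    for(auto& fut : m_futures)
      m_join(init, fut.get());
    m_futures.clear();
    return init;
  }

private:
  JoinFunc m_join;
  std::vector<std::future<T>> m_futures;
};

// trivial task used for illustration
int square(int v) { return v * v; }
```

Because the results are folded on the waiting thread, the join functor in this sketch needs no locking even when it mutates a shared container such as a `std::vector<int>`.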

#### Global Definitions for Examples

```cpp
// obtain thread-pool instance from task manager
static auto* thread_pool = G4TaskRunManager::GetThreadPool();

// trivial int function which just returns value passed
int foo(int v) { return v; }

// function which launches CUDA kernel
// ('cuda_bar' is a __global__ kernel assumed to be defined elsewhere)
void bar(int v)
{
  cuda_bar<<<512, 1>>>(v);
}
```

#### Example with non-void return types from tasks

```cpp
// put all return values from tasks into an array
// (explicit return type so that a reference, not a copy, is returned)
auto join_vec = [](std::vector<int>& lhs, int rhs) -> std::vector<int>& {
  lhs.push_back(rhs);
  return lhs;
};

// sum the values returned by tasks
auto sum_int = [](int& lhs, int rhs) -> int& { return lhs += rhs; };

// task group which applies 'join_vec' to all task return values
task_group<int> vec_tg(join_vec, thread_pool);
// task group which applies 'sum_int' to all task return values
task_group<int> sum_tg(sum_int, thread_pool);

// submit work to task-groups
vec_tg.exec(foo, 1);
vec_tg.exec(foo, 2);
sum_tg.exec(foo, 1);
sum_tg.exec(foo, 2);

// produces std::vector{ 1, 2 };
auto vec_result = vec_tg.join();

// produces 1 + 2 = 3
auto sum_result = sum_tg.join();
```

#### Example with void return type from tasks

```cpp
// wait for the GPU to finish
auto sync = []() { cudaDeviceSynchronize(); };

// task group which applies 'sync' after all tasks have been executed
task_group<void> gpu_tg(sync, thread_pool);
// generic task group w/o a join functor
task_group<void> general_tg(thread_pool);

// submit work to task-groups
gpu_tg.exec(bar, 1);
gpu_tg.exec(bar, 2);
general_tg.exec(bar, 1);
general_tg.exec(bar, 2);

// 'sync()' will get called after all tasks in group have executed
// (i.e. returned from 'bar' function). 'sync' will then block until
// all GPU work has been completed
gpu_tg.join();

// will block only until all tasks in group have been executed
// (i.e. returned from 'bar' function)
general_tg.join();
```