MaCh3  2.5.0
Reference Guide
Public Member Functions | Private Attributes | List of all members
SplineMonolithGPU Class Reference

Class responsible for calculating spline weights on the GPU. More...

Public Member Functions

 SplineMonolithGPU ()
 constructor More...
 
virtual ~SplineMonolithGPU ()
 This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects. More...
 
__host__ void InitGPU_SplineMonolith (M3::float_t **cpu_total_weights, int n_events, unsigned int total_nknots, unsigned int n_splines, unsigned int n_tf1, int Eve_size)
 Allocate memory on gpu for spline monolith. More...
 
__host__ void CopyToGPU_SplineMonolith (const SplineMonoStruct *cpu_spline_handler, const std::vector< float > &cpu_many_array_TF1, const std::vector< short int > &cpu_paramNo_arr_TF1, const int n_events, const std::vector< unsigned int > &cpu_nParamPerEvent, const std::vector< unsigned int > &cpu_nParamPerEvent_TF1, const int n_params, const unsigned int n_splines, const short int spline_size, const unsigned int total_nknots, const unsigned int n_tf1)
 Copies data from CPU to GPU for the spline monolith. More...
 
__host__ void InitGPU_Segments (short int **segment)
 Allocate memory for spline segments. More...
 
__host__ void InitGPU_Vals (float **vals)
 Allocate memory for parameter values. More...
 
__host__ void RunGPU_SplineMonolith (M3::float_t *cpu_total_weights, float *vals, short int *segment)
 Run the GPU code for the separate many arrays, i.e. separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (binary search already performed in SplineBase::FindSplineSegment()). More...
 
__host__ void CleanupPinnedMemory (M3::float_t *cpu_total_weights, short int *segment, float *vals)
 Clean up pinned variables at CPU. More...
 

Private Attributes

unsigned int * gpu_nParamPerEvent
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where splines start for a given event}. More...
 
unsigned int * gpu_nParamPerEvent_TF1
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where TF1 start for a given event}. More...
 
float * gpu_coeff_x
 KS: GPU arrays to hold X coefficient. More...
 
float * gpu_coeff_many
 GPU arrays to hold other coefficients. More...
 
unsigned int * gpu_nKnots_arr
 KS: GPU Number of knots per spline. More...
 
short int * gpu_paramNo_arr
 CW: GPU array with the parameter number for each spline (one entry per spline, not per spline point!) More...
 
float * gpu_coeff_TF1_many
 GPU arrays to hold TF1 coefficients. More...
 
short int * gpu_nPoints_arr
 GPU arrays to hold number of points. More...
 
short int * gpu_paramNo_TF1_arr
 CW: GPU array with the parameter number for each TF1 object. More...
 
float * gpu_par_val
 CW: parameter value on GPU. More...
 
short int * gpu_spline_segment
 CW: Spline segment on GPU. More...
 
M3::float_t * gpu_total_weights
 GPU arrays to hold weight for event. More...
 
float * gpu_weights
 GPU arrays to hold weight for each spline. More...
 
float * gpu_weights_tf1
 GPU arrays to hold weight for each TF1. More...
 
short int cpu_spline_size
 Size of splines living on CPU. More...
 
unsigned int cpu_n_splines
 Number of splines living on CPU. More...
 
unsigned int cpu_n_TF1
 Number of TF1 objects living on CPU. More...
 
int cpu_n_params
 Number of params living on CPU. More...
 
int cpu_n_events
 Number of events living on CPU. More...
 
cudaTextureObject_t text_coeff_x = 0
 KS: Textures are read through a cache that is well optimised for fetching. Make a texture only for variables you often access but rarely overwrite. There are limits on texture memory, so don't use huge arrays. More...
 
cudaTextureObject_t text_nParamPerEvent = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where splines start for a given event}. More...
 
cudaTextureObject_t text_nParamPerEvent_TF1 = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where TF1 start for a given event}. More...
 

Detailed Description

Class responsible for calculating spline weights on the GPU.

Author
Richard Calland
Asher Kaboth
Clarence Wret
Kamil Skwarczynski

Definition at line 74 of file gpuSplineUtils.cuh.

Constructor & Destructor Documentation

◆ SplineMonolithGPU()

SplineMonolithGPU::SplineMonolithGPU ( )

constructor


Definition at line 58 of file gpuSplineUtils.cu.

58  {
60  cpu_n_params = -1;
61  cpu_n_events = -1;
62  cpu_n_TF1 = 0;
63  cpu_n_splines = 0;
64  cpu_spline_size = 0;
65 
66  gpu_weights = nullptr;
67  gpu_total_weights = nullptr;
68  gpu_nParamPerEvent = nullptr;
69  gpu_nPoints_arr = nullptr;
70  gpu_paramNo_arr = nullptr;
71  gpu_nKnots_arr = nullptr;
72  gpu_coeff_x = nullptr;
73  gpu_coeff_many = nullptr;
74  gpu_coeff_TF1_many = nullptr;
75  gpu_paramNo_TF1_arr = nullptr;
76  gpu_nParamPerEvent_TF1 = nullptr;
77  gpu_weights_tf1 = nullptr;
78 }

◆ ~SplineMonolithGPU()

SplineMonolithGPU::~SplineMonolithGPU ( )
virtual

This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects.

Definition at line 81 of file gpuSplineUtils.cu.

81  {
82 // *******************************************
83  cudaFree(gpu_paramNo_arr);
84  cudaFree(gpu_nKnots_arr);
85 
86  // free the coefficient arrays
87  cudaDestroyTextureObject(text_coeff_x);
88  cudaFree(gpu_coeff_x);
89  cudaFree(gpu_coeff_many);
90 
91  cudaFree(gpu_par_val);
92  cudaFree(gpu_spline_segment);
93 
94  cudaFree(gpu_coeff_TF1_many);
95  cudaFree(gpu_paramNo_TF1_arr);
96  // free weights on the gpu
97  cudaFree(gpu_weights);
98  cudaFree(gpu_weights_tf1);
99  cudaFree(gpu_total_weights);
100  //KS: Before removing variable let's destroy texture
101  cudaDestroyTextureObject(text_nParamPerEvent);
102  cudaDestroyTextureObject(text_nParamPerEvent_TF1);
103 
104  cudaFree(gpu_nParamPerEvent);
105  cudaFree(gpu_nParamPerEvent_TF1);
106 }

Member Function Documentation

◆ CleanupPinnedMemory()

__host__ void SplineMonolithGPU::CleanupPinnedMemory ( M3::float_t *  cpu_total_weights,
short int *  segment,
float *  vals 
)

Clean up pinned variables at CPU.

Parameters
cpu_total_weights  Pointer to the total weights array on the CPU (used if Weight_On_SplineBySpline_Basis is not defined).
segment  Found spline segment for each parameter
vals  Value to which we want to reweight for each parameter

Definition at line 518 of file gpuSplineUtils.cu.

519  {
520 // *******************************************
521  cudaFreeHost(cpu_total_weights);
522  cudaFreeHost(segment);
523  cudaFreeHost(vals);
524 
525  cpu_total_weights = nullptr;
526  segment = nullptr;
527  vals = nullptr;
528 }

◆ CopyToGPU_SplineMonolith()

__host__ void SplineMonolithGPU::CopyToGPU_SplineMonolith ( const SplineMonoStruct *  cpu_spline_handler,
const std::vector< float > &  cpu_many_array_TF1,
const std::vector< short int > &  cpu_paramNo_arr_TF1,
const int  n_events,
const std::vector< unsigned int > &  cpu_nParamPerEvent,
const std::vector< unsigned int > &  cpu_nParamPerEvent_TF1,
const int  n_params,
const unsigned int  n_splines,
const short int  spline_size,
const unsigned int  total_nknots,
const unsigned int  n_tf1 
)

Copies data from CPU to GPU for the spline monolith.

This function transfers the necessary spline data and parameters from the CPU to the GPU, including TF1-related arrays and parameters. This setup is crucial for spline evaluations on the GPU.

Parameters
cpu_spline_handler  Pointer to the structure managing spline data on the CPU.
cpu_many_array_TF1  Array of TF1 parameters on the CPU.
cpu_paramNo_arr_TF1  Array containing parameter numbers for TF1 objects.
n_events  Number of events, necessary for correct data handling.
cpu_nParamPerEvent  Array indicating the number of parameters per event.
cpu_nParamPerEvent_TF1  Array indicating the number of parameters per TF1 object.
n_params  Total number of parameters across all splines.
n_splines  Total number of spline objects.
spline_size  Size of each spline object.
total_nknots  Total number of knots across all splines.
n_tf1  Total number of TF1 objects.


Definition at line 200 of file gpuSplineUtils.cu.

215  {
216 // ******************************************************
217  // Write to the global statics (h_* denotes host stored variable)
218  cpu_n_params = n_params;
219  // Number of events
220  cpu_n_events = n_events;
221  // Total number of valid TF1 for all loaded events
222  cpu_n_TF1 = n_tf1;
223  // Total number of valid splines for all loaded events
224  cpu_n_splines = n_splines;
226  cpu_spline_size = spline_size;
227 
228  //CW: Allocate memory for the frequently copied objects
229  cudaMalloc(&gpu_par_val, n_params * sizeof(float));
230  cudaMalloc(&gpu_spline_segment, n_params * sizeof(short int));
231 
232  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
233  cudaMemcpy(gpu_coeff_many, cpu_spline_handler->coeff_many.data(), sizeof(float)*total_nknots*_nCoeff_, cudaMemcpyHostToDevice);
234  CudaCheckError();
235 
236  cudaMemcpy(gpu_coeff_x, cpu_spline_handler->coeff_x.data(), sizeof(float)*spline_size*n_params, cudaMemcpyHostToDevice);
237  CudaCheckError();
238 
239  //KS: Bind our texture with the GPU variable
240  //KS: Tried also moving gpu_many_array to texture memory; it only worked with a restricted number of MC runs, most likely hit the texture memory limit :(
241  struct cudaResourceDesc resDesc_coeff_x;
242  memset(&resDesc_coeff_x, 0, sizeof(resDesc_coeff_x));
243  resDesc_coeff_x.resType = cudaResourceTypeLinear;
244  resDesc_coeff_x.res.linear.devPtr = gpu_coeff_x;
245  resDesc_coeff_x.res.linear.desc = cudaCreateChannelDesc<float>();
246  resDesc_coeff_x.res.linear.sizeInBytes = sizeof(float)*spline_size*n_params;
247 
248  // Specify texture object parameters
249  struct cudaTextureDesc texDesc_coeff_x;
250  memset(&texDesc_coeff_x, 0, sizeof(texDesc_coeff_x));
251  texDesc_coeff_x.readMode = cudaReadModeElementType;
252 
253  // Create texture object
254  cudaCreateTextureObject(&text_coeff_x, &resDesc_coeff_x, &texDesc_coeff_x, nullptr);
255  CudaCheckError();
256 
257  // Also copy the parameter number for each spline onto the GPU; i.e. what spline parameter are we calculating right now
258  cudaMemcpy(gpu_paramNo_arr, cpu_spline_handler->paramNo_arr.data(), n_splines*sizeof(short int), cudaMemcpyHostToDevice);
259  CudaCheckError();
260 
261  // Also copy the knot map for each spline onto the GPU;
262  cudaMemcpy(gpu_nKnots_arr, cpu_spline_handler->nKnots_arr.data(), n_splines*sizeof(unsigned int), cudaMemcpyHostToDevice);
263  CudaCheckError();
264 
265  //Now TF1
266  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
267  cudaMemcpy(gpu_coeff_TF1_many, cpu_many_array_TF1.data(), sizeof(float)*n_tf1*_nTF1Coeff_, cudaMemcpyHostToDevice);
268  CudaCheckError();
269 
270  // Also copy the parameter number for each TF1 onto the GPU; i.e. what TF1 parameter are we calculating right now
271  cudaMemcpy(gpu_paramNo_TF1_arr, cpu_paramNo_arr_TF1.data(), n_tf1*sizeof(short int), cudaMemcpyHostToDevice);
272  CudaCheckError();
273 
274  //KS: Keep track of how many splines each event has
275  cudaMemcpy(gpu_nParamPerEvent, cpu_nParamPerEvent.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
276  CudaCheckError();
277 
278  //KS: Bind our texture with the GPU variable
279  // create a resource descriptor based on device pointers
280  struct cudaResourceDesc resDesc_nParamPerEvent;
281  memset(&resDesc_nParamPerEvent, 0, sizeof(resDesc_nParamPerEvent));
282  resDesc_nParamPerEvent.resType = cudaResourceTypeLinear;
283  resDesc_nParamPerEvent.res.linear.devPtr = gpu_nParamPerEvent;
284  resDesc_nParamPerEvent.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
285  resDesc_nParamPerEvent.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
286 
287  // Specify texture object parameters
288  struct cudaTextureDesc texDesc_nParamPerEvent;
289  memset(&texDesc_nParamPerEvent, 0, sizeof(texDesc_nParamPerEvent));
290  texDesc_nParamPerEvent.readMode = cudaReadModeElementType;
291 
292  //Finally create texture object
293  cudaCreateTextureObject(&text_nParamPerEvent, &resDesc_nParamPerEvent, &texDesc_nParamPerEvent, nullptr);
294  CudaCheckError();
295 
296  // Now TF1
297  cudaMemcpy(gpu_nParamPerEvent_TF1, cpu_nParamPerEvent_TF1.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
298  CudaCheckError();
299 
300  //KS: Bind our texture with the GPU variable
301  // create a resource descriptor based on device pointers
302  struct cudaResourceDesc resDesc_nParamPerEvent_tf1;
303  memset(&resDesc_nParamPerEvent_tf1, 0, sizeof(resDesc_nParamPerEvent_tf1));
304  resDesc_nParamPerEvent_tf1.resType = cudaResourceTypeLinear;
305  resDesc_nParamPerEvent_tf1.res.linear.devPtr = gpu_nParamPerEvent_TF1;
306  resDesc_nParamPerEvent_tf1.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
307  resDesc_nParamPerEvent_tf1.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
308 
309  // Specify texture object parameters
310  struct cudaTextureDesc texDesc_nParamPerEvent_tf1;
311  memset(&texDesc_nParamPerEvent_tf1, 0, sizeof(texDesc_nParamPerEvent_tf1));
312  texDesc_nParamPerEvent_tf1.readMode = cudaReadModeElementType;
313 
314  //Finally create texture object
315  cudaCreateTextureObject(&text_nParamPerEvent_TF1, &resDesc_nParamPerEvent_tf1, &texDesc_nParamPerEvent_tf1, nullptr);
316  CudaCheckError();
317 }
constexpr int _nCoeff_
KS: We store coefficients {y,b,c,d} in one array one by one, this is only to define it once rather th...
Definition: SplineCommon.h:18
constexpr int _nTF1Coeff_
KS: For TF1 we store at most 5 coefficients, we could make it more flexible but for now define it her...
Definition: SplineCommon.h:20
#define CudaCheckError()
Definition: gpuUtils.cuh:21
std::vector< unsigned int > nKnots_arr
KS: CPU Number of knots per spline.
Definition: SplineCommon.h:47
std::vector< float > coeff_x
KS: CPU arrays to hold X coefficient.
Definition: SplineCommon.h:41
std::vector< float > coeff_many
CPU arrays to hold other coefficients.
Definition: SplineCommon.h:44
std::vector< short int > paramNo_arr
CW: CPU array with the number of points per spline (not per spline point!)
Definition: SplineCommon.h:50

◆ InitGPU_Segments()

__host__ void SplineMonolithGPU::InitGPU_Segments ( short int **  segment)

Allocate memory for spline segments.

Parameters
segment  Found spline segment for each parameter

Definition at line 178 of file gpuSplineUtils.cu.

178  {
179 // *******************************************
180  //KS: Rather than allocating memory the standard way, this CUDA call pins host memory, which makes memory transfers faster
181  cudaMallocHost((void **) segment, cpu_n_params*sizeof(short int));
182  CudaCheckError();
183 }

◆ InitGPU_SplineMonolith()

__host__ void SplineMonolithGPU::InitGPU_SplineMonolith ( M3::float_t **  cpu_total_weights,
int  n_events,
unsigned int  total_nknots,
unsigned int  n_splines,
unsigned int  n_tf1,
int  Eve_size 
)

Allocate memory on gpu for spline monolith.

Parameters
cpu_total_weights  Pointer to receive pinned host memory for the per-event total weights (cudaMallocHost pins the memory so later transfers are faster)
n_events  Number of events, this is necessary to allocate memory correctly
total_nknots  Total number of knots in all splines summed
n_splines  Total number of spline objects, not knots
n_tf1  Total number of TF1 objects, not coefficients
Eve_size  Number of entries in the x-coefficient array

Definition at line 110 of file gpuSplineUtils.cu.

116  {
117 // *******************************************
118  // Allocate chunks of memory to GPU
119  cudaMalloc((void **) &gpu_paramNo_arr, n_splines*sizeof(short int));
120  CudaCheckError();
121 
122  cudaMalloc((void **) &gpu_nKnots_arr, n_splines*sizeof(unsigned int));
123  CudaCheckError();
124 
125  cudaMalloc((void **) &gpu_coeff_x, Eve_size*sizeof(float));
126  CudaCheckError();
127 
128  cudaMalloc((void **) &gpu_coeff_many, _nCoeff_*total_nknots*sizeof(float));
129  CudaCheckError();
130 
131  // Allocate memory for the array of weights to be returned to CPU
132  cudaMalloc((void **) &gpu_weights, n_splines*sizeof(float));
133  CudaCheckError();
134 
135  // Now TF1 specific
136  cudaMalloc((void **) &gpu_coeff_TF1_many, _nTF1Coeff_*n_tf1*sizeof(float));
137  CudaCheckError();
138 
139  cudaMalloc((void **) &gpu_weights_tf1, n_tf1*sizeof(float));
140  CudaCheckError();
141 
142  cudaMalloc((void **) &gpu_paramNo_TF1_arr, n_tf1*sizeof(short int));
143  CudaCheckError();
144 
145  //KS: Rather than allocating memory the standard way, this CUDA call pins host memory, which makes memory transfers faster
146  cudaMallocHost((void **) cpu_total_weights, n_events*sizeof(M3::float_t));
147  CudaCheckError();
148 
149  //KS: Allocate memory for the array of total weights to be returned to CPU
150  cudaMalloc((void **) &gpu_total_weights, n_events*sizeof(M3::float_t));
151  CudaCheckError();
152 
153  //KS: Allocate memory for the map keeping track how many splines each parameter has
154  cudaMalloc((void **) &gpu_nParamPerEvent, 2*n_events*sizeof(unsigned int));
155  CudaCheckError();
156 
157  //KS: Allocate memory for the map keeping track how many TF1 each parameter has
158  cudaMalloc((void **) &gpu_nParamPerEvent_TF1, 2*n_events*sizeof(unsigned int));
159  CudaCheckError();
160 
161  // Print allocation info to user
162  printf("Allocated %i entries for paramNo and nKnots arrays, size = %f MB\n",
163  n_splines, static_cast<double>(sizeof(short int) * n_splines + sizeof(unsigned int) * n_splines) / 1.0e6);
164  printf("Allocated %i entries for x coeff arrays, size = %f MB\n",
165  Eve_size, static_cast<double>(sizeof(float) * Eve_size) / 1.0e6);
166  printf("Allocated %i entries for {ybcd} coeff arrays, size = %f MB\n",
167  _nCoeff_ * total_nknots, static_cast<double>(sizeof(float) * _nCoeff_ * total_nknots) / 1.0e6);
168  printf("Allocated %i entries for TF1 coefficient arrays, size = %f MB\n",
169  _nTF1Coeff_ * n_tf1, static_cast<double>(sizeof(float) * _nTF1Coeff_ * n_tf1) / 1.0e6);
170 
171  //KS: Ask CUDA about memory usage
172  checkGpuMem();
173  PrintNdevices();
174 }
void checkGpuMem()
KS: Get some fancy info about VRAM usage.
Definition: gpuUtils.cu:43
void PrintNdevices()
KS: Get some fancy info about GPU.
Definition: gpuUtils.cu:58
double float_t
Definition: Core.h:37

◆ InitGPU_Vals()

__host__ void SplineMonolithGPU::InitGPU_Vals ( float **  vals)

Allocate memory for parameter values.

Parameters
vals  Value to which we want to reweight for each parameter

Definition at line 187 of file gpuSplineUtils.cu.

187  {
188 // *******************************************
190  //KS: Rather than allocating memory the standard way, this CUDA call pins host memory, which makes memory transfers faster
190  cudaMallocHost((void **) vals, cpu_n_params*sizeof(float));
191  CudaCheckError();
192 }

◆ RunGPU_SplineMonolith()

__host__ void SplineMonolithGPU::RunGPU_SplineMonolith ( M3::float_t cpu_total_weights,
float *  vals,
short int *  segment 
)

Run the GPU code for the separate many arrays, i.e. separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (binary search already performed in SplineBase::FindSplineSegment()).

Executes the GPU code for calculating spline weights.

This function runs the GPU computation for the spline monolith. It assumes that the appropriate segment has already been identified through binary search in the SplineBase::FindSplineSegment() function.

Parameters
cpu_total_weights  Pointer to the total weights array (used if Weight_On_SplineBySpline_Basis is not defined).
vals  Pointer to an array holding the parameter values to be processed.
segment  Pointer to an array containing segment indices for parameters.

Definition at line 443 of file gpuSplineUtils.cu.

448  {
449 // *****************************************
450  dim3 block_size;
451  dim3 grid_size;
452 
453  block_size.x = _BlockSize_;
454  grid_size.x = (cpu_n_splines / block_size.x) + 1;
455 
456  // Copy the segment values to the GPU (segment_gpu), which is cpu_n_params long
457  cudaMemcpy(gpu_spline_segment, segment, cpu_n_params * sizeof(short int), cudaMemcpyHostToDevice);
458  CudaCheckError();
459 
460  // Copy the parameter values to the GPU (vals_gpu), which is cpu_n_params long
461  cudaMemcpy(gpu_par_val, vals, cpu_n_params * sizeof(float), cudaMemcpyHostToDevice);
462  CudaCheckError();
463 
464  // KS: Consider an asynchronous kernel call; this might help since EvalOnGPU_Splines and EvalOnGPU_TF1 are independent
465  // Set the cache config to prefer L1 for the kernel
466  //cudaFuncSetCacheConfig(EvalOnGPU_Splines, cudaFuncCachePreferL1);
467  EvalOnGPU_Splines<<<grid_size, block_size>>>(
473  gpu_par_val,
476  gpu_weights,
478  /* further arguments elided in this extraction */ );
479  CudaCheckError();
480 
481  grid_size.x = (cpu_n_TF1 / block_size.x) + 1;
482  EvalOnGPU_TF1<<<grid_size, block_size>>>(
483  cpu_n_TF1,
486  gpu_par_val,
488  /* further arguments elided in this extraction */ );
489  CudaCheckError();
490 
491  grid_size.x = (cpu_n_events / block_size.x) + 1;
492 
493  EvalOnGPU_TotWeight<<<grid_size, block_size>>>(
494  cpu_n_events,
495  gpu_weights,
502  /* further arguments elided in this extraction */ );
503  CudaCheckError();
504 
505  //KS: Here we have to make a somewhat large GPU->CPU transfer because it is proportional to the number of events
506  //KS: Normally the code waits for a memory transfer to finish before moving on; cudaMemcpyAsync lets us keep executing code while the copy proceeds in the background.
507  cudaMemcpyAsync(cpu_total_weights, gpu_total_weights, cpu_n_events * sizeof(M3::float_t), cudaMemcpyDeviceToHost, 0);
508  CudaCheckError();
509 
510  #ifdef MACH3_DEBUG
511  printf("Copied GPU total weights to CPU with SUCCESS (drink more tea)\n");
512  printf("Released calculated response from GPU with SUCCESS (drink most tea)\n");
513  #endif
514 }
#define _BlockSize_
KS: Need it for shared memory; there is a way to use dynamic shared memory but I am lazy right now.
Definition: gpuUtils.cuh:24

Member Data Documentation

◆ cpu_n_events

int SplineMonolithGPU::cpu_n_events
private

Number of events living on CPU.

Definition at line 219 of file gpuSplineUtils.cuh.

◆ cpu_n_params

int SplineMonolithGPU::cpu_n_params
private

Number of params living on CPU.

Definition at line 217 of file gpuSplineUtils.cuh.

◆ cpu_n_splines

unsigned int SplineMonolithGPU::cpu_n_splines
private

Number of splines living on CPU.

Definition at line 213 of file gpuSplineUtils.cuh.

◆ cpu_n_TF1

unsigned int SplineMonolithGPU::cpu_n_TF1
private

Number of TF1 objects living on CPU.

Definition at line 215 of file gpuSplineUtils.cuh.

◆ cpu_spline_size

short int SplineMonolithGPU::cpu_spline_size
private

Size of splines living on CPU.

Definition at line 211 of file gpuSplineUtils.cuh.

◆ gpu_coeff_many

float* SplineMonolithGPU::gpu_coeff_many
private

GPU arrays to hold other coefficients.

Definition at line 183 of file gpuSplineUtils.cuh.

◆ gpu_coeff_TF1_many

float* SplineMonolithGPU::gpu_coeff_TF1_many
private

GPU arrays to hold TF1 coefficients.

Definition at line 192 of file gpuSplineUtils.cuh.

◆ gpu_coeff_x

float* SplineMonolithGPU::gpu_coeff_x
private

KS: GPU arrays to hold X coefficient.

Definition at line 180 of file gpuSplineUtils.cuh.

◆ gpu_nKnots_arr

unsigned int* SplineMonolithGPU::gpu_nKnots_arr
private

KS: GPU Number of knots per spline.

Definition at line 186 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent

unsigned int* SplineMonolithGPU::gpu_nParamPerEvent
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where splines start for a given event}.

Definition at line 175 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent_TF1

unsigned int* SplineMonolithGPU::gpu_nParamPerEvent_TF1
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where TF1 start for a given event}.

Definition at line 177 of file gpuSplineUtils.cuh.

◆ gpu_nPoints_arr

short int* SplineMonolithGPU::gpu_nPoints_arr
private

GPU arrays to hold number of points.

Definition at line 194 of file gpuSplineUtils.cuh.

◆ gpu_par_val

float* SplineMonolithGPU::gpu_par_val
private

CW: parameter value on GPU.

Definition at line 199 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_arr

short int* SplineMonolithGPU::gpu_paramNo_arr
private

CW: GPU array with the parameter number for each spline (one entry per spline, not per spline point!)

Definition at line 189 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_TF1_arr

short int* SplineMonolithGPU::gpu_paramNo_TF1_arr
private

CW: GPU array with the parameter number for each TF1 object.

Definition at line 196 of file gpuSplineUtils.cuh.

◆ gpu_spline_segment

short int* SplineMonolithGPU::gpu_spline_segment
private

CW: Spline segment on GPU.

Definition at line 201 of file gpuSplineUtils.cuh.

◆ gpu_total_weights

M3::float_t* SplineMonolithGPU::gpu_total_weights
private

GPU arrays to hold weight for event.

Definition at line 204 of file gpuSplineUtils.cuh.

◆ gpu_weights

float* SplineMonolithGPU::gpu_weights
private

GPU arrays to hold weight for each spline.

Definition at line 206 of file gpuSplineUtils.cuh.

◆ gpu_weights_tf1

float* SplineMonolithGPU::gpu_weights_tf1
private

GPU arrays to hold weight for each TF1.

Definition at line 208 of file gpuSplineUtils.cuh.

◆ text_coeff_x

cudaTextureObject_t SplineMonolithGPU::text_coeff_x = 0
private

KS: Textures are read through a cache that is well optimised for fetching. Make a texture only for variables you often access but rarely overwrite. There are limits on texture memory, so don't use huge arrays.

Definition at line 225 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent

cudaTextureObject_t SplineMonolithGPU::text_nParamPerEvent = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where splines start for a given event}.

Definition at line 227 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent_TF1

cudaTextureObject_t SplineMonolithGPU::text_nParamPerEvent_TF1 = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where TF1 start for a given event}.

Definition at line 229 of file gpuSplineUtils.cuh.


The documentation for this class was generated from the following files: