MaCh3 2.2.1
Reference Guide
SMonolithGPU Class Reference

Class responsible for calculating spline weights on the GPU.

Public Member Functions

 SMonolithGPU ()
 constructor
 
virtual ~SMonolithGPU ()
 destructor
 
__host__ void InitGPU_SplineMonolith (float **cpu_total_weights, int n_events, unsigned int total_nknots, unsigned int n_splines, unsigned int n_tf1, int Eve_size)
 Allocate memory on the GPU for the spline monolith.
 
__host__ void CopyToGPU_SplineMonolith (SplineMonoStruct *cpu_spline_handler, std::vector< float > cpu_many_array_TF1, std::vector< short int > cpu_paramNo_arr_TF1, int n_events, std::vector< unsigned int > cpu_nParamPerEvent, std::vector< unsigned int > cpu_nParamPerEvent_TF1, int n_params, unsigned int n_splines, short int spline_size, unsigned int total_nknots, unsigned int n_tf1)
 Copies data from CPU to GPU for the spline monolith.
 
__host__ void InitGPU_Segments (short int **segment)
 Allocate memory for spline segments.
 
__host__ void InitGPU_Vals (float **vals)
 Allocate memory for parameter values.
 
__host__ void RunGPU_SplineMonolith (float *cpu_total_weights, float *vals, short int *segment, const unsigned int h_n_splines, const unsigned int h_n_tf1)
 Run the GPU code for the separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()).
 
__host__ void CleanupGPU_SplineMonolith (float *cpu_total_weights)
 This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects.
 
__host__ void CleanupGPU_Segments (short int *segment, float *vals)
 Clean up pinned variables on the CPU.
 

Private Attributes

unsigned int * gpu_nParamPerEvent
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines for a given event start}.
 
unsigned int * gpu_nParamPerEvent_TF1
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where the TF1 for a given event start}.
 
float * gpu_coeff_x
 KS: GPU arrays to hold X coefficient.
 
float * gpu_coeff_many
 GPU arrays to hold other coefficients.
 
unsigned int * gpu_nKnots_arr
 KS: GPU Number of knots per spline.
 
short int * gpu_paramNo_arr
 CW: GPU array with the number of points per spline (not per spline point!)
 
float * gpu_coeff_TF1_many
 GPU arrays to hold TF1 coefficients.
 
short int * gpu_nPoints_arr
 GPU arrays to hold number of points.
 
short int * gpu_paramNo_TF1_arr
 CW: GPU array with the number of points per TF1 object.
 
float * gpu_total_weights
 GPU arrays to hold weight for event.
 
float * gpu_weights
 GPU arrays to hold weight for each spline.
 
float * gpu_weights_tf1
 GPU arrays to hold weight for each TF1.
 
int h_n_params
 Number of params living on CPU.
 
int h_n_events
 Number of events living on CPU.
 
cudaTextureObject_t text_coeff_x = 0
 KS: Textures are read through the texture (L1) cache, which is well optimised for fetching. Make a texture only for variables you access often but rarely overwrite. There are limits on texture memory, so don't use huge arrays.
 
cudaTextureObject_t text_nParamPerEvent = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines for a given event start}.
 
cudaTextureObject_t text_nParamPerEvent_TF1 = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where the TF1 for a given event start}.
 

Detailed Description

Class responsible for calculating spline weights on the GPU.

Definition at line 61 of file gpuSplineUtils.cuh.
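
Pieced together from the member functions documented below, the intended call sequence is roughly the following. This is a minimal host-side sketch rather than code from MaCh3 itself: every argument is assumed to have been prepared beforehand by the CPU-side SplineMonolith, and the driver function name is hypothetical.

#include <vector>
#include "gpuSplineUtils.cuh"

// Sketch only: all counts and CPU-side containers are assumed to have been
// filled by the CPU spline monolith before this function is called.
void EvaluateSplinesOnGPU(SplineMonoStruct* cpu_spline_handler,
                          std::vector<float>& cpu_many_array_TF1,
                          std::vector<short int>& cpu_paramNo_arr_TF1,
                          std::vector<unsigned int>& cpu_nParamPerEvent,
                          std::vector<unsigned int>& cpu_nParamPerEvent_TF1,
                          int n_events, int n_params,
                          unsigned int n_splines, short int spline_size,
                          unsigned int total_nknots, unsigned int n_tf1, int Eve_size)
{
  SMonolithGPU gpu;
  float* cpu_total_weights = nullptr;
  short int* segment = nullptr;
  float* vals = nullptr;

  // One-off setup: device and pinned host allocations, then the large host-to-device copy.
  gpu.InitGPU_SplineMonolith(&cpu_total_weights, n_events, total_nknots, n_splines, n_tf1, Eve_size);
  gpu.CopyToGPU_SplineMonolith(cpu_spline_handler, cpu_many_array_TF1, cpu_paramNo_arr_TF1,
                               n_events, cpu_nParamPerEvent, cpu_nParamPerEvent_TF1,
                               n_params, n_splines, spline_size, total_nknots, n_tf1);
  gpu.InitGPU_Segments(&segment);
  gpu.InitGPU_Vals(&vals);

  // Per fit step: vals and segment are filled on the CPU
  // (SplineMonolith::FindSplineSegment()), then everything is evaluated on the GPU.
  gpu.RunGPU_SplineMonolith(cpu_total_weights, vals, segment, n_splines, n_tf1);

  // Teardown once the fit is finished.
  gpu.CleanupGPU_SplineMonolith(cpu_total_weights);
  gpu.CleanupGPU_Segments(segment, vals);
}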

Constructor & Destructor Documentation

◆ SMonolithGPU()

SMonolithGPU::SMonolithGPU ( )

constructor


Definition at line 81 of file gpuSplineUtils.cu.

81 {
82 h_n_params = -1;
84 h_n_events = -1;
85
86 gpu_weights = nullptr;
87 gpu_total_weights = nullptr;
88 gpu_nParamPerEvent = nullptr;
89 gpu_nPoints_arr = nullptr;
90 gpu_paramNo_arr = nullptr;
91 gpu_nKnots_arr = nullptr;
92 gpu_coeff_x = nullptr;
93 gpu_coeff_many = nullptr;
94 gpu_coeff_TF1_many = nullptr;
95 gpu_paramNo_TF1_arr = nullptr;
96 gpu_nParamPerEvent_TF1 = nullptr;
97 gpu_weights_tf1 = nullptr;
98}

◆ ~SMonolithGPU()

SMonolithGPU::~SMonolithGPU ( )
virtual

destructor

Definition at line 100 of file gpuSplineUtils.cu.

100 {
101
102}

Member Function Documentation

◆ CleanupGPU_Segments()

__host__ void SMonolithGPU::CleanupGPU_Segments ( short int *  segment,
float *  vals 
)

Clean up pinned variables on the CPU.

Parameters
  segment  Found spline segment for each parameter
  vals     Value to which we want to reweight each parameter

Definition at line 587 of file gpuSplineUtils.cu.

587 {
588// *******************************************
589 cudaFreeHost(segment);
590 cudaFreeHost(vals);
591
592 segment = nullptr;
593 vals = nullptr;
594}

◆ CleanupGPU_SplineMonolith()

__host__ void SMonolithGPU::CleanupGPU_SplineMonolith ( float *  cpu_total_weights)

This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects.

Parameters
  cpu_total_weights  Pointer to the total weights array on the CPU (used if Weight_On_SplineBySpline_Basis is not defined).

Definition at line 552 of file gpuSplineUtils.cu.

556 {
557// *********************************
558 cudaFree(gpu_paramNo_arr);
559 cudaFree(gpu_nKnots_arr);
560
561 // free the coefficient arrays
562 cudaDestroyTextureObject(text_coeff_x);
563
564 cudaFree(gpu_coeff_x);
565 cudaFree(gpu_coeff_many);
566
567 cudaFree(gpu_coeff_TF1_many);
568 cudaFree(gpu_paramNo_TF1_arr);
569 // free weights on the gpu
570 cudaFree(gpu_weights);
571 cudaFree(gpu_weights_tf1);
572#ifndef Weight_On_SplineBySpline_Basis
573 cudaFree(gpu_total_weights);
574 //KS: Before removing variable let's destroy texture
575 cudaDestroyTextureObject(text_nParamPerEvent);
576 cudaDestroyTextureObject(text_nParamPerEvent_TF1);
577
578 cudaFree(gpu_nParamPerEvent);
579 cudaFree(gpu_nParamPerEvent_TF1);
580 cudaFreeHost(cpu_total_weights);
581 cpu_total_weights = nullptr;
582#endif
583}

◆ CopyToGPU_SplineMonolith()

__host__ void SMonolithGPU::CopyToGPU_SplineMonolith ( SplineMonoStruct *  cpu_spline_handler,
std::vector< float >  cpu_many_array_TF1,
std::vector< short int >  cpu_paramNo_arr_TF1,
int  n_events,
std::vector< unsigned int >  cpu_nParamPerEvent,
std::vector< unsigned int >  cpu_nParamPerEvent_TF1,
int  n_params,
unsigned int  n_splines,
short int  spline_size,
unsigned int  total_nknots,
unsigned int  n_tf1 
)

Copies data from CPU to GPU for the spline monolith.

This function transfers the necessary spline data and parameters from the CPU to the GPU, including TF1-related arrays and parameters. This setup is crucial for spline evaluations on the GPU.

Parameters
  cpu_spline_handler      Pointer to the structure managing spline data on the CPU.
  cpu_many_array_TF1      Array of TF1 parameters on the CPU.
  cpu_paramNo_arr_TF1     Array containing parameter numbers for TF1 objects.
  n_events                Number of events, necessary for correct data handling.
  cpu_nParamPerEvent      Array indicating the number of parameters per event.
  cpu_nParamPerEvent_TF1  Array indicating the number of parameters per TF1 object.
  n_params                Total number of parameters across all splines.
  n_splines               Total number of spline objects.
  spline_size             Size of each spline object.
  total_nknots            Total number of knots across all splines.
  n_tf1                   Total number of TF1 objects.

Definition at line 201 of file gpuSplineUtils.cu.

217 {
218// ******************************************************
219 if (n_params != _N_SPLINES_) {
220 printf("Number of splines not equal to %i, GPU code for event-by-event splines will fail\n", _N_SPLINES_);
221 printf("n_params = %i\n", n_params);
222 printf("%s : %i\n", __FILE__, __LINE__);
223 throw;
224 }
225
226 // Write to the global statics (h_* denotes host stored variable)
227 h_n_params = n_params;
228#ifndef Weight_On_SplineBySpline_Basis
229 h_n_events = n_events;
230#endif
231 // Copy the constants
232 // Total number of valid splines for all loaded events
233 cudaMemcpyToSymbol(d_n_splines, &n_splines, sizeof(n_splines));
235
236 // Total number of valid TF1 for all loaded events
237 cudaMemcpyToSymbol(d_n_TF1, &n_tf1, sizeof(n_tf1));
239
240 // Total spline size per spline; i.e. just the number of points or knots in the spline
241 cudaMemcpyToSymbol(d_spline_size, &spline_size, sizeof(spline_size));
243#ifndef Weight_On_SplineBySpline_Basis
244 // Number of events
245 cudaMemcpyToSymbol(d_n_events, &h_n_events, sizeof(h_n_events));
247#endif
248 // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
249 cudaMemcpy(gpu_coeff_many, cpu_spline_handler->coeff_many.data(), sizeof(float)*total_nknots*_nCoeff_, cudaMemcpyHostToDevice);
251
252 cudaMemcpy(gpu_coeff_x, cpu_spline_handler->coeff_x.data(), sizeof(float)*spline_size*n_params, cudaMemcpyHostToDevice);
254
255 //KS: Bind our texture with the GPU variable
256 //KS: Tried also moving gpu_many_array to texture memory it only worked with restricted number of MC runs, most likely hit texture memory limit :(
257 struct cudaResourceDesc resDesc_coeff_x;
258 memset(&resDesc_coeff_x, 0, sizeof(resDesc_coeff_x));
259 resDesc_coeff_x.resType = cudaResourceTypeLinear;
260 resDesc_coeff_x.res.linear.devPtr = gpu_coeff_x;
261 resDesc_coeff_x.res.linear.desc = cudaCreateChannelDesc<float>();
262 resDesc_coeff_x.res.linear.sizeInBytes = sizeof(float)*spline_size*n_params;
263
264 // Specify texture object parameters
265 struct cudaTextureDesc texDesc_coeff_x;
266 memset(&texDesc_coeff_x, 0, sizeof(texDesc_coeff_x));
267 texDesc_coeff_x.readMode = cudaReadModeElementType;
268
269 // Create texture object
270 cudaCreateTextureObject(&text_coeff_x, &resDesc_coeff_x, &texDesc_coeff_x, nullptr);
272
273 // Also copy the parameter number for each spline onto the GPU; i.e. what spline parameter are we calculating right now
274 cudaMemcpy(gpu_paramNo_arr, cpu_spline_handler->paramNo_arr.data(), n_splines*sizeof(short int), cudaMemcpyHostToDevice);
276
277 // Also copy the knot map for each spline onto the GPU;
278 cudaMemcpy(gpu_nKnots_arr, cpu_spline_handler->nKnots_arr.data(), n_splines*sizeof(unsigned int), cudaMemcpyHostToDevice);
280
281 //Now TF1
282 // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
283 cudaMemcpy(gpu_coeff_TF1_many, cpu_many_array_TF1.data(), sizeof(float)*n_tf1*_nTF1Coeff_, cudaMemcpyHostToDevice);
285
286 // Also copy the parameter number for each TF1 onto the GPU; i.e. what TF1 parameter are we calculating right now
287 cudaMemcpy(gpu_paramNo_TF1_arr, cpu_paramNo_arr_TF1.data(), n_tf1*sizeof(short int), cudaMemcpyHostToDevice);
289
290 #ifndef Weight_On_SplineBySpline_Basis
291 //KS: Keep track how much splines each event has
292 cudaMemcpy(gpu_nParamPerEvent, cpu_nParamPerEvent.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
294
295 //KS: Bind our texture with the GPU variable
296 // create a resource descriptor based on device pointers
297 struct cudaResourceDesc resDesc_nParamPerEvent;
298 memset(&resDesc_nParamPerEvent, 0, sizeof(resDesc_nParamPerEvent));
299 resDesc_nParamPerEvent.resType = cudaResourceTypeLinear;
300 resDesc_nParamPerEvent.res.linear.devPtr = gpu_nParamPerEvent;
301 resDesc_nParamPerEvent.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
302 resDesc_nParamPerEvent.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
303
304 // Specify texture object parameters
305 struct cudaTextureDesc texDesc_nParamPerEvent;
306 memset(&texDesc_nParamPerEvent, 0, sizeof(texDesc_nParamPerEvent));
307 texDesc_nParamPerEvent.readMode = cudaReadModeElementType;
308
309 //Finally create texture object
310 cudaCreateTextureObject(&text_nParamPerEvent, &resDesc_nParamPerEvent, &texDesc_nParamPerEvent, nullptr);
312
313 // Now TF1
314 cudaMemcpy(gpu_nParamPerEvent_TF1, cpu_nParamPerEvent_TF1.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
316
317 //KS: Bind our texture with the GPU variable
318 // create a resource descriptor based on device pointers
319 struct cudaResourceDesc resDesc_nParamPerEvent_tf1;
320 memset(&resDesc_nParamPerEvent_tf1, 0, sizeof(resDesc_nParamPerEvent_tf1));
321 resDesc_nParamPerEvent_tf1.resType = cudaResourceTypeLinear;
322 resDesc_nParamPerEvent_tf1.res.linear.devPtr = gpu_nParamPerEvent_TF1;
323 resDesc_nParamPerEvent_tf1.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
324 resDesc_nParamPerEvent_tf1.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
325
326 // Specify texture object parameters
327 struct cudaTextureDesc texDesc_nParamPerEvent_tf1;
328 memset(&texDesc_nParamPerEvent_tf1, 0, sizeof(texDesc_nParamPerEvent_tf1));
329 texDesc_nParamPerEvent_tf1.readMode = cudaReadModeElementType;
330
331 //Finally create texture object
332 cudaCreateTextureObject(&text_nParamPerEvent_TF1, &resDesc_nParamPerEvent_tf1, &texDesc_nParamPerEvent_tf1, nullptr);
334 #endif
335}
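
For reference, once text_coeff_x has been bound to gpu_coeff_x as above, a kernel reads it through the texture path with tex1Dfetch. The kernel below is only an illustrative sketch (the real consumer is EvalOnGPU_Splines); its name and indexing are hypothetical.

// Minimal sketch of reading a linear-memory texture object created with
// cudaResourceTypeLinear, as done for text_coeff_x above. Not the actual
// MaCh3 kernel; the indexing is purely illustrative.
__global__ void ReadCoeffX(cudaTextureObject_t coeff_x_tex, float* out, unsigned int n)
{
  const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // tex1Dfetch goes through the read-only texture cache, which suits data
    // that is fetched often but rarely overwritten, like the x coefficients.
    out[i] = tex1Dfetch<float>(coeff_x_tex, static_cast<int>(i));
  }
}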

◆ InitGPU_Segments()

__host__ void SMonolithGPU::InitGPU_Segments ( short int **  segment)

Allocate memory for spline segments.

Parameters
  segment  Found spline segment for each parameter

Definition at line 178 of file gpuSplineUtils.cu.

178 {
179// *******************************************
180 //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
181 cudaMallocHost((void **) segment, _N_SPLINES_*sizeof(short int));
183}

◆ InitGPU_SplineMonolith()

__host__ void SMonolithGPU::InitGPU_SplineMonolith ( float **  cpu_total_weights,
int  n_events,
unsigned int  total_nknots,
unsigned int  n_splines,
unsigned int  n_tf1,
int  Eve_size 
)

Allocate memory on the GPU for the spline monolith.

Parameters
  cpu_total_weights  KS: Rather than allocating memory in the standard way, this fancy CUDA tool pins host memory, which makes memory transfers faster
  n_events           Number of events; this is necessary to allocate memory correctly
  total_nknots       Total number of knots in all splines summed
  n_splines          Total number of spline objects, not knots
  n_tf1              Total number of TF1 objects, not coefficients

Definition at line 106 of file gpuSplineUtils.cu.

114 {
115// *******************************************
116 // Allocate chunks of memory to GPU
117 cudaMalloc((void **) &gpu_paramNo_arr, n_splines*sizeof(short int));
119
120 cudaMalloc((void **) &gpu_nKnots_arr, n_splines*sizeof(unsigned int));
122
123 cudaMalloc((void **) &gpu_coeff_x, Eve_size*sizeof(float));
125
126 cudaMalloc((void **) &gpu_coeff_many, _nCoeff_*total_nknots*sizeof(float));
128
129 // Allocate memory for the array of weights to be returned to CPU
130 cudaMalloc((void **) &gpu_weights, n_splines*sizeof(float));
132
133 // Now TF1 specific
134 cudaMalloc((void **) &gpu_coeff_TF1_many, _nTF1Coeff_*n_tf1*sizeof(float));
136
137 cudaMalloc((void **) &gpu_weights_tf1, n_tf1*sizeof(float));
139
140 cudaMalloc((void **) &gpu_paramNo_TF1_arr, n_tf1*sizeof(short int));
142
143#ifndef Weight_On_SplineBySpline_Basis
144 //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
145 cudaMallocHost((void **) cpu_total_weights, n_events*sizeof(float));
147
148 //KS: Allocate memory for the array of total weights to be returned to CPU
149 cudaMalloc((void **) &gpu_total_weights, n_events*sizeof(float));
151
152 //KS: Allocate memory for the map keeping track how many splines each parameter has
153 cudaMalloc((void **) &gpu_nParamPerEvent, 2*n_events*sizeof(unsigned int));
155
156 //KS: Allocate memory for the map keeping track how many TF1 each parameter has
157 cudaMalloc((void **) &gpu_nParamPerEvent_TF1, 2*n_events*sizeof(unsigned int));
159#endif
160
161 // Print allocation info to user
162 printf("Allocated %i entries for paramNo and nKnots arrays, size = %f MB\n",
163 n_splines, static_cast<double>(sizeof(short int) * n_splines + sizeof(unsigned int) * n_splines) / 1.0e6);
164 printf("Allocated %i entries for x coeff arrays, size = %f MB\n",
165 Eve_size, static_cast<double>(sizeof(float) * Eve_size) / 1.0e6);
166 printf("Allocated %i entries for {ybcd} coeff arrays, size = %f MB\n",
167 _nCoeff_ * total_nknots, static_cast<double>(sizeof(float) * _nCoeff_ * total_nknots) / 1.0e6);
168 printf("Allocated %i entries for TF1 coefficient arrays, size = %f MB\n",
169 _nTF1Coeff_ * n_tf1, static_cast<double>(sizeof(float) * _nTF1Coeff_ * n_tf1) / 1.0e6);
170
171 //KS: Ask CUDA about memory usage
172 checkGpuMem();
174}
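
The printouts above simply translate entry counts into megabytes. As a worked example with hypothetical counts, and assuming _nCoeff_ = 4 ({y,b,c,d}) and _nTF1Coeff_ = 5 as described in SplineCommon.h:

// Hypothetical counts, only to illustrate the size arithmetic printed above.
const unsigned int n_splines    = 20000000;   // spline objects
const unsigned int total_nknots = 140000000;  // knots summed over all splines
const unsigned int n_tf1        = 5000000;    // TF1 objects

// paramNo (short int) + nKnots (unsigned int), one entry each per spline:
// (2 + 4) bytes * 2e7 entries ~ 120 MB
const double paramNo_nKnots_MB =
    (sizeof(short int) + sizeof(unsigned int)) * static_cast<double>(n_splines) / 1.0e6;

// {y,b,c,d} coefficients, 4 floats per knot: 16 bytes * 1.4e8 knots ~ 2240 MB
const double ybcd_MB = 4.0 * sizeof(float) * static_cast<double>(total_nknots) / 1.0e6;

// TF1 coefficients, up to 5 floats per TF1: 20 bytes * 5e6 objects ~ 100 MB
const double tf1_MB = 5.0 * sizeof(float) * static_cast<double>(n_tf1) / 1.0e6;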

◆ InitGPU_Vals()

__host__ void SMonolithGPU::InitGPU_Vals ( float **  vals)

Allocate memory for parameter values.

Parameters
  vals  Value to which we want to reweight each parameter

Definition at line 187 of file gpuSplineUtils.cu.

187 {
188// *******************************************
189 //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
190 cudaMallocHost((void **) vals, _N_SPLINES_*sizeof(float));
192}
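
Each fit step these pinned arrays are refilled on the CPU before RunGPU_SplineMonolith() is called. A minimal sketch, assuming _N_SPLINES_ parameters and hypothetical CPU-side sources current_values and found_segments (in MaCh3 the values and segments come from SplineMonolith::FindSplineSegment()):

// Hypothetical per-step refill of the pinned arrays allocated by
// InitGPU_Segments() and InitGPU_Vals().
void FillPinnedInputs(float* vals, short int* segment,
                      const float* current_values,      // hypothetical: proposed parameter values
                      const short int* found_segments)  // hypothetical: segments from the binary search
{
  for (int param = 0; param < _N_SPLINES_; ++param) {
    vals[param]    = current_values[param];   // value to reweight to
    segment[param] = found_segments[param];   // spline segment for this parameter
  }
}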

◆ RunGPU_SplineMonolith()

__host__ void SMonolithGPU::RunGPU_SplineMonolith ( float *  cpu_total_weights,
float *  vals,
short int *  segment,
const unsigned int  h_n_splines,
const unsigned int  h_n_tf1 
)

Run the GPU code for the separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()).

Executes the GPU code for calculating spline weights.

This function runs the GPU computation for the spline monolith. It assumes that the appropriate segment has already been identified through binary search in the SplineMonolith::FindSplineSegment() function.

Parameters
  cpu_weights        Pointer to the array of weights on the CPU (used if Weight_On_SplineBySpline_Basis is defined).
  cpu_weights_tf1    Pointer to the array of TF1 weights (used if Weight_On_SplineBySpline_Basis is defined).
  cpu_total_weights  Pointer to the total weights array (used if Weight_On_SplineBySpline_Basis is not defined).
  vals               Pointer to an array holding the parameter values to be processed.
  segment            Pointer to an array containing segment indices for parameters.
  h_n_splines        Total number of spline objects in the GPU context.
  h_n_tf1            Total number of TF1 objects in the GPU context.

Definition at line 459 of file gpuSplineUtils.cu.

471 {
472// *****************************************
473 dim3 block_size;
474 dim3 grid_size;
475
476 block_size.x = _BlockSize_;
477 grid_size.x = (h_n_splines / block_size.x) + 1;
478
479 // Copy the segment values to the GPU (segment_gpu), which is h_n_params long
480 cudaMemcpyToSymbol(segment_gpu, segment, h_n_params*sizeof(short int));
482
483 // Copy the parameter values values to the GPU (vals_gpu), which is h_n_params long
484 cudaMemcpyToSymbol(val_gpu, vals, h_n_params*sizeof(float));
486
487 // KS: Consider asynchronous kernel call, this might help EvalOnGPU_Splines and EvalOnGPU_TF1 are independent
488 // Set the cache config to prefer L1 for the kernel
489 //cudaFuncSetCacheConfig(EvalOnGPU_Splines, cudaFuncCachePreferL1);
490 EvalOnGPU_Splines<<<grid_size, block_size>>>(/* arguments not reproduced in this listing */);
500
501 grid_size.x = (h_n_tf1 / block_size.x) + 1;
502 EvalOnGPU_TF1<<<grid_size, block_size>>>(/* arguments not reproduced in this listing */);
509
510//KS: We can either copy gpu_weight and calculate total weight in reweighting loop, or not copy and calculate total weight stall at GPU, which means less memory transfer
511#ifdef Weight_On_SplineBySpline_Basis
512 // Here we have to make a somewhat large GPU->CPU transfer because it's all the splines' response
513 cudaMemcpy(cpu_weights, gpu_weights, h_n_splines*sizeof(float), cudaMemcpyDeviceToHost);
515
516 cudaMemcpy(cpu_weights_tf1, gpu_weights_tf1, h_n_tf1*sizeof(float), cudaMemcpyDeviceToHost);
518
519//KS: Else calculate Total Weight
520#else
521 grid_size.x = (h_n_events / block_size.x) + 1;
522
523 EvalOnGPU_TotWeight<<<grid_size, block_size>>>(/* arguments not reproduced in this listing */);
533
534 //KS: Here we have to make a somewhat large GPU->CPU transfer because it is proportional to number of events
535 //KS: Normally code wait for memory transfer to finish before moving further cudaMemcpyAsync means we will continue to execute code and in a meantime keep copying stuff.
536 cudaMemcpyAsync(cpu_total_weights, gpu_total_weights, h_n_events * sizeof(float), cudaMemcpyDeviceToHost, 0);
538#endif
539
540 #ifdef DEBUG
541 printf("Copied GPU total weights to CPU with SUCCESS (drink more tea)\n");
542 printf("Released calculated response from GPU with SUCCESS (drink most tea)\n");
543 #endif
544}
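
The brief above notes that the spline segment for each parameter must already have been found on the CPU in SplineMonolith::FindSplineSegment(), which is not part of this class. The snippet below is only a hedged sketch of such a binary search over the x knots, with hypothetical names (knots_x, n_points); it is not the MaCh3 implementation.

// Illustrative binary search for the segment [knots_x[seg], knots_x[seg+1])
// that contains val; values outside the knot range are clamped to the
// first or last segment.
short int FindSegmentSketch(const float* knots_x, short int n_points, float val)
{
  short int lo = 0;
  short int hi = static_cast<short int>(n_points - 1);
  if (val <= knots_x[0])  return 0;
  if (val >= knots_x[hi]) return static_cast<short int>(hi - 1);
  while (hi - lo > 1) {
    const short int mid = static_cast<short int>((lo + hi) / 2);
    if (val < knots_x[mid]) hi = mid;
    else                    lo = mid;
  }
  return lo;
}

Note also that the final device-to-host copy of gpu_total_weights uses cudaMemcpyAsync on the default stream, so the caller should synchronise (for example with cudaDeviceSynchronize()) before reading cpu_total_weights.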

Member Data Documentation

◆ gpu_coeff_many

float* SMonolithGPU::gpu_coeff_many
private

GPU arrays to hold other coefficients.

Definition at line 192 of file gpuSplineUtils.cuh.

◆ gpu_coeff_TF1_many

float* SMonolithGPU::gpu_coeff_TF1_many
private

GPU arrays to hold TF1 coefficients.

Definition at line 201 of file gpuSplineUtils.cuh.

◆ gpu_coeff_x

float* SMonolithGPU::gpu_coeff_x
private

KS: GPU arrays to hold X coefficient.

Definition at line 189 of file gpuSplineUtils.cuh.

◆ gpu_nKnots_arr

unsigned int* SMonolithGPU::gpu_nKnots_arr
private

KS: GPU Number of knots per spline.

Definition at line 195 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent

unsigned int* SMonolithGPU::gpu_nParamPerEvent
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines for a given event start}.

Definition at line 184 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent_TF1

unsigned int* SMonolithGPU::gpu_nParamPerEvent_TF1
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where the TF1 for a given event start}.

Definition at line 186 of file gpuSplineUtils.cuh.

◆ gpu_nPoints_arr

short int* SMonolithGPU::gpu_nPoints_arr
private

GPU arrays to hold number of points.

Definition at line 203 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_arr

short int* SMonolithGPU::gpu_paramNo_arr
private

CW: GPU array with the number of points per spline (not per spline point!)

Definition at line 198 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_TF1_arr

short int* SMonolithGPU::gpu_paramNo_TF1_arr
private

CW: GPU array with the number of points per TF1 object.

Definition at line 205 of file gpuSplineUtils.cuh.

◆ gpu_total_weights

float* SMonolithGPU::gpu_total_weights
private

GPU arrays to hold weight for event.

Definition at line 208 of file gpuSplineUtils.cuh.

◆ gpu_weights

float* SMonolithGPU::gpu_weights
private

GPU arrays to hold weight for each spline.

Definition at line 210 of file gpuSplineUtils.cuh.

◆ gpu_weights_tf1

float* SMonolithGPU::gpu_weights_tf1
private

GPU arrays to hold weight for each TF1.

Definition at line 212 of file gpuSplineUtils.cuh.

◆ h_n_events

int SMonolithGPU::h_n_events
private

Number of events living on CPU.

Definition at line 218 of file gpuSplineUtils.cuh.

◆ h_n_params

int SMonolithGPU::h_n_params
private

Number of params living on CPU.

Definition at line 216 of file gpuSplineUtils.cuh.

◆ text_coeff_x

cudaTextureObject_t SMonolithGPU::text_coeff_x = 0
private

KS: Textures are read through the texture (L1) cache, which is well optimised for fetching. Make a texture only for variables you access often but rarely overwrite. There are limits on texture memory, so don't use huge arrays.

Definition at line 224 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines for a given event start}.

Definition at line 227 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent_TF1

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent_TF1 = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 per event, index where the TF1 for a given event start}.

Definition at line 229 of file gpuSplineUtils.cuh.


The documentation for this class was generated from the following files:
gpuSplineUtils.cuh
gpuSplineUtils.cu