MaCh3  2.4.2
Reference Guide
SMonolithGPU Class Reference

Class responsible for calculating spline weights on the GPU. More...

Public Member Functions

 SMonolithGPU ()
 constructor More...
 
virtual ~SMonolithGPU ()
 destructor More...
 
__host__ void InitGPU_SplineMonolith (float **cpu_total_weights, int n_events, unsigned int total_nknots, unsigned int n_splines, unsigned int n_tf1, int Eve_size)
 Allocate memory on the GPU for the spline monolith. More...
 
__host__ void CopyToGPU_SplineMonolith (SplineMonoStruct *cpu_spline_handler, std::vector< float > cpu_many_array_TF1, std::vector< short int > cpu_paramNo_arr_TF1, int n_events, std::vector< unsigned int > cpu_nParamPerEvent, std::vector< unsigned int > cpu_nParamPerEvent_TF1, int n_params, unsigned int n_splines, short int spline_size, unsigned int total_nknots, unsigned int n_tf1)
 Copies data from CPU to GPU for the spline monolith. More...
 
__host__ void InitGPU_Segments (short int **segment)
 Allocate memory for spline segments. More...
 
__host__ void InitGPU_Vals (float **vals)
 Allocate memory for parameter values. More...
 
__host__ void RunGPU_SplineMonolith (float *cpu_total_weights, float *vals, short int *segment, const unsigned int h_n_splines, const unsigned int h_n_tf1)
 Run the GPU code for the separate "many" arrays, i.e. separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()). More...
 
__host__ void CleanupGPU_SplineMonolith (float *cpu_total_weights)
 This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects. More...
 
__host__ void CleanupGPU_Segments (short int *segment, float *vals)
 Clean up pinned variables on the CPU. More...
 

Private Attributes

unsigned int * gpu_nParamPerEvent
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines start for a given event}. More...
 
unsigned int * gpu_nParamPerEvent_TF1
 KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 objects per event, index where the TF1 objects start for a given event}. More...
 
float * gpu_coeff_x
 KS: GPU arrays to hold X coefficient. More...
 
float * gpu_coeff_many
 GPU arrays to hold other coefficients. More...
 
unsigned int * gpu_nKnots_arr
 KS: GPU Number of knots per spline. More...
 
short int * gpu_paramNo_arr
 CW: GPU array with the number of points per spline (not per spline point!) More...
 
float * gpu_coeff_TF1_many
 GPU arrays to hold TF1 coefficients. More...
 
short int * gpu_nPoints_arr
 GPU arrays to hold number of points. More...
 
short int * gpu_paramNo_TF1_arr
 CW: GPU array with the number of points per TF1 object. More...
 
float * gpu_total_weights
 GPU arrays to hold weight for event. More...
 
float * gpu_weights
 GPU arrays to hold weight for each spline. More...
 
float * gpu_weights_tf1
 GPU arrays to hold weight for each TF1. More...
 
int h_n_params
 Number of params living on CPU. More...
 
int h_n_events
 Number of events living on CPU. More...
 
cudaTextureObject_t text_coeff_x = 0
 KS: Texture memory is cached and well optimised for read-only fetches. Make a texture only for variables that are accessed often but rarely overwritten. There are limits on texture memory, so don't use huge arrays. More...
 
cudaTextureObject_t text_nParamPerEvent = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines start for a given event}. More...
 
cudaTextureObject_t text_nParamPerEvent_TF1 = 0
 KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 objects per event, index where the TF1 objects start for a given event}. More...
 

Detailed Description

Class responsible for calculating spline weights on the GPU.

Definition at line 61 of file gpuSplineUtils.cuh.
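
A minimal usage sketch of the public interface documented below (illustrative only, not taken from the MaCh3 sources; the event counts, vectors and the SplineMonoStruct instance are placeholders the caller is expected to provide):

  // Illustrative only: typical lifetime of an SMonolithGPU object.
  SMonolithGPU gpu;
  float *cpu_total_weights = nullptr;   // pinned by InitGPU_SplineMonolith
  short int *segment = nullptr;         // pinned by InitGPU_Segments
  float *vals = nullptr;                // pinned by InitGPU_Vals

  // One-off setup: allocate device (and pinned host) memory, then copy the monolith over.
  gpu.InitGPU_SplineMonolith(&cpu_total_weights, n_events, total_nknots, n_splines, n_tf1, Eve_size);
  gpu.InitGPU_Segments(&segment);
  gpu.InitGPU_Vals(&vals);
  gpu.CopyToGPU_SplineMonolith(cpu_spline_handler, cpu_many_array_TF1, cpu_paramNo_arr_TF1,
                               n_events, cpu_nParamPerEvent, cpu_nParamPerEvent_TF1,
                               n_params, n_splines, spline_size, total_nknots, n_tf1);

  // Every step: fill vals and segment on the CPU (see SplineMonolith::FindSplineSegment()),
  // then reweight on the GPU.
  gpu.RunGPU_SplineMonolith(cpu_total_weights, vals, segment, n_splines, n_tf1);

  // Teardown.
  gpu.CleanupGPU_SplineMonolith(cpu_total_weights);
  gpu.CleanupGPU_Segments(segment, vals);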

Constructor & Destructor Documentation

◆ SMonolithGPU()

SMonolithGPU::SMonolithGPU ( )

constructor

Number of events living on CPU

Definition at line 81 of file gpuSplineUtils.cu.

81  {
82  h_n_params = -1;
84  h_n_events = -1;
85 
86  gpu_weights = nullptr;
87  gpu_total_weights = nullptr;
88  gpu_nParamPerEvent = nullptr;
89  gpu_nPoints_arr = nullptr;
90  gpu_paramNo_arr = nullptr;
91  gpu_nKnots_arr = nullptr;
92  gpu_coeff_x = nullptr;
93  gpu_coeff_many = nullptr;
94  gpu_coeff_TF1_many = nullptr;
95  gpu_paramNo_TF1_arr = nullptr;
96  gpu_nParamPerEvent_TF1 = nullptr;
97  gpu_weights_tf1 = nullptr;
98 }

◆ ~SMonolithGPU()

SMonolithGPU::~SMonolithGPU ( )
virtual

destructor

Definition at line 100 of file gpuSplineUtils.cu.

100  {
101 }

Member Function Documentation

◆ CleanupGPU_Segments()

__host__ void SMonolithGPU::CleanupGPU_Segments ( short int *  segment,
float *  vals 
)

Clean up pinned variables on the CPU.

Parameters
    segment   Found spline segment for each parameter
    vals      Value to which we want to reweight, for each parameter

Definition at line 584 of file gpuSplineUtils.cu.

584  {
585 // *******************************************
586  cudaFreeHost(segment);
587  cudaFreeHost(vals);
588 
589  segment = nullptr;
590  vals = nullptr;
591 }

◆ CleanupGPU_SplineMonolith()

__host__ void SMonolithGPU::CleanupGPU_SplineMonolith ( float *  cpu_total_weights)

This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects.

Parameters
    cpu_total_weights   Pointer to the total weights array on the CPU (used if Weight_On_SplineBySpline_Basis is not defined).

Definition at line 549 of file gpuSplineUtils.cu.

553  {
554 // *********************************
555  cudaFree(gpu_paramNo_arr);
556  cudaFree(gpu_nKnots_arr);
557 
558  // free the coefficient arrays
559  cudaDestroyTextureObject(text_coeff_x);
560 
561  cudaFree(gpu_coeff_x);
562  cudaFree(gpu_coeff_many);
563 
564  cudaFree(gpu_coeff_TF1_many);
565  cudaFree(gpu_paramNo_TF1_arr);
566  // free weights on the gpu
567  cudaFree(gpu_weights);
568  cudaFree(gpu_weights_tf1);
569 #ifndef Weight_On_SplineBySpline_Basis
570  cudaFree(gpu_total_weights);
571  //KS: Before removing variable let's destroy texture
572  cudaDestroyTextureObject(text_nParamPerEvent);
573  cudaDestroyTextureObject(text_nParamPerEvent_TF1);
574 
575  cudaFree(gpu_nParamPerEvent);
576  cudaFree(gpu_nParamPerEvent_TF1);
577  cudaFreeHost(cpu_total_weights);
578  cpu_total_weights = nullptr;
579 #endif
580 }

◆ CopyToGPU_SplineMonolith()

__host__ void SMonolithGPU::CopyToGPU_SplineMonolith ( SplineMonoStruct cpu_spline_handler,
std::vector< float >  cpu_many_array_TF1,
std::vector< short int >  cpu_paramNo_arr_TF1,
int  n_events,
std::vector< unsigned int >  cpu_nParamPerEvent,
std::vector< unsigned int >  cpu_nParamPerEvent_TF1,
int  n_params,
unsigned int  n_splines,
short int  spline_size,
unsigned int  total_nknots,
unsigned int  n_tf1 
)

Copies data from CPU to GPU for the spline monolith.

This function transfers the necessary spline data and parameters from the CPU to the GPU, including TF1-related arrays and parameters. This setup is crucial for spline evaluations on the GPU.

Parameters
    cpu_spline_handler       Pointer to the structure managing spline data on the CPU.
    cpu_many_array_TF1       Array of TF1 parameters on the CPU.
    cpu_paramNo_arr_TF1      Array containing parameter numbers for TF1 objects.
    n_events                 Number of events, necessary for correct data handling.
    cpu_nParamPerEvent       Array indicating the number of parameters per event.
    cpu_nParamPerEvent_TF1   Array indicating the number of parameters per TF1 object.
    n_params                 Total number of parameters across all splines.
    n_splines                Total number of spline objects.
    spline_size              Size of each spline object.
    total_nknots             Total number of knots across all splines.
    n_tf1                    Total number of TF1 objects.

Definition at line 200 of file gpuSplineUtils.cu.

216  {
217 // ******************************************************
218  if (n_params != _N_SPLINES_) {
219  printf("Number of splines not equal to %i, GPU code for event-by-event splines will fail\n", _N_SPLINES_);
220  printf("n_params = %i\n", n_params);
221  printf("%s : %i\n", __FILE__, __LINE__);
222  throw;
223  }
224 
225  // Write to the global statics (h_* denotes host stored variable)
226  h_n_params = n_params;
227 #ifndef Weight_On_SplineBySpline_Basis
228  h_n_events = n_events;
229 #endif
230  // Copy the constants
231  // Total number of valid splines for all loaded events
232  cudaMemcpyToSymbol(d_n_splines, &n_splines, sizeof(n_splines));
233  CudaCheckError();
234 
235  // Total number of valid TF1 for all loaded events
236  cudaMemcpyToSymbol(d_n_TF1, &n_tf1, sizeof(n_tf1));
237  CudaCheckError();
238 
239  // Total spline size per spline; i.e. just the number of points or knots in the spline
240  cudaMemcpyToSymbol(d_spline_size, &spline_size, sizeof(spline_size));
241  CudaCheckError();
242 #ifndef Weight_On_SplineBySpline_Basis
243  // Number of events
244  cudaMemcpyToSymbol(d_n_events, &h_n_events, sizeof(h_n_events));
245  CudaCheckError();
246 #endif
247  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
248  cudaMemcpy(gpu_coeff_many, cpu_spline_handler->coeff_many.data(), sizeof(float)*total_nknots*_nCoeff_, cudaMemcpyHostToDevice);
249  CudaCheckError();
250 
251  cudaMemcpy(gpu_coeff_x, cpu_spline_handler->coeff_x.data(), sizeof(float)*spline_size*n_params, cudaMemcpyHostToDevice);
252  CudaCheckError();
253 
254  //KS: Bind our texture with the GPU variable
255  //KS: Tried also moving gpu_many_array to texture memory it only worked with restricted number of MC runs, most likely hit texture memory limit :(
256  struct cudaResourceDesc resDesc_coeff_x;
257  memset(&resDesc_coeff_x, 0, sizeof(resDesc_coeff_x));
258  resDesc_coeff_x.resType = cudaResourceTypeLinear;
259  resDesc_coeff_x.res.linear.devPtr = gpu_coeff_x;
260  resDesc_coeff_x.res.linear.desc = cudaCreateChannelDesc<float>();
261  resDesc_coeff_x.res.linear.sizeInBytes = sizeof(float)*spline_size*n_params;
262 
263  // Specify texture object parameters
264  struct cudaTextureDesc texDesc_coeff_x;
265  memset(&texDesc_coeff_x, 0, sizeof(texDesc_coeff_x));
266  texDesc_coeff_x.readMode = cudaReadModeElementType;
267 
268  // Create texture object
269  cudaCreateTextureObject(&text_coeff_x, &resDesc_coeff_x, &texDesc_coeff_x, nullptr);
270  CudaCheckError();
271 
272  // Also copy the parameter number for each spline onto the GPU; i.e. what spline parameter are we calculating right now
273  cudaMemcpy(gpu_paramNo_arr, cpu_spline_handler->paramNo_arr.data(), n_splines*sizeof(short int), cudaMemcpyHostToDevice);
274  CudaCheckError();
275 
276  // Also copy the knot map for each spline onto the GPU;
277  cudaMemcpy(gpu_nKnots_arr, cpu_spline_handler->nKnots_arr.data(), n_splines*sizeof(unsigned int), cudaMemcpyHostToDevice);
278  CudaCheckError();
279 
280  //Now TF1
281  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
282  cudaMemcpy(gpu_coeff_TF1_many, cpu_many_array_TF1.data(), sizeof(float)*n_tf1*_nTF1Coeff_, cudaMemcpyHostToDevice);
283  CudaCheckError();
284 
285  // Also copy the parameter number for each TF1 onto the GPU; i.e. what TF1 parameter are we calculating right now
286  cudaMemcpy(gpu_paramNo_TF1_arr, cpu_paramNo_arr_TF1.data(), n_tf1*sizeof(short int), cudaMemcpyHostToDevice);
287  CudaCheckError();
288 
289  #ifndef Weight_On_SplineBySpline_Basis
290  //KS: Keep track how much splines each event has
291  cudaMemcpy(gpu_nParamPerEvent, cpu_nParamPerEvent.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
292  CudaCheckError();
293 
294  //KS: Bind our texture with the GPU variable
295  // create a resource descriptor based on device pointers
296  struct cudaResourceDesc resDesc_nParamPerEvent;
297  memset(&resDesc_nParamPerEvent, 0, sizeof(resDesc_nParamPerEvent));
298  resDesc_nParamPerEvent.resType = cudaResourceTypeLinear;
299  resDesc_nParamPerEvent.res.linear.devPtr = gpu_nParamPerEvent;
300  resDesc_nParamPerEvent.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
301  resDesc_nParamPerEvent.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
302 
303  // Specify texture object parameters
304  struct cudaTextureDesc texDesc_nParamPerEvent;
305  memset(&texDesc_nParamPerEvent, 0, sizeof(texDesc_nParamPerEvent));
306  texDesc_nParamPerEvent.readMode = cudaReadModeElementType;
307 
308  //Finally create texture object
309  cudaCreateTextureObject(&text_nParamPerEvent, &resDesc_nParamPerEvent, &texDesc_nParamPerEvent, nullptr);
310  CudaCheckError();
311 
312  // Now TF1
313  cudaMemcpy(gpu_nParamPerEvent_TF1, cpu_nParamPerEvent_TF1.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
314  CudaCheckError();
315 
316  //KS: Bind our texture with the GPU variable
317  // create a resource descriptor based on device pointers
318  struct cudaResourceDesc resDesc_nParamPerEvent_tf1;
319  memset(&resDesc_nParamPerEvent_tf1, 0, sizeof(resDesc_nParamPerEvent_tf1));
320  resDesc_nParamPerEvent_tf1.resType = cudaResourceTypeLinear;
321  resDesc_nParamPerEvent_tf1.res.linear.devPtr = gpu_nParamPerEvent_TF1;
322  resDesc_nParamPerEvent_tf1.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
323  resDesc_nParamPerEvent_tf1.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
324 
325  // Specify texture object parameters
326  struct cudaTextureDesc texDesc_nParamPerEvent_tf1;
327  memset(&texDesc_nParamPerEvent_tf1, 0, sizeof(texDesc_nParamPerEvent_tf1));
328  texDesc_nParamPerEvent_tf1.readMode = cudaReadModeElementType;
329 
330  //Finally create texture object
331  cudaCreateTextureObject(&text_nParamPerEvent_TF1, &resDesc_nParamPerEvent_tf1, &texDesc_nParamPerEvent_tf1, nullptr);
332  CudaCheckError();
333  #endif
334 }

◆ InitGPU_Segments()

__host__ void SMonolithGPU::InitGPU_Segments ( short int **  segment)

Allocate memory for spline segments.

Parameters
    segment   Found spline segment for each parameter

Definition at line 177 of file gpuSplineUtils.cu.

177  {
178 // *******************************************
179  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
180  cudaMallocHost((void **) segment, _N_SPLINES_*sizeof(short int));
181  CudaCheckError();
182 }

◆ InitGPU_SplineMonolith()

__host__ void SMonolithGPU::InitGPU_SplineMonolith ( float **  cpu_total_weights,
int  n_events,
unsigned int  total_nknots,
unsigned int  n_splines,
unsigned int  n_tf1,
int  Eve_size 
)

Allocate memory on the GPU for the spline monolith.

Parameters
    cpu_total_weights   KS: Pointer to the CPU total-weights array; rather than allocating it in the standard way, pinned host memory is allocated for it, which makes memory transfers faster
    n_events            Number of events; this is necessary to allocate the memory correctly
    total_nknots        Total number of knots in all splines summed
    n_splines           Total number of spline objects, not knots
    n_tf1               Total number of TF1 objects, not coefficients

Definition at line 105 of file gpuSplineUtils.cu.

113  {
114 // *******************************************
115  // Allocate chunks of memory to GPU
116  cudaMalloc((void **) &gpu_paramNo_arr, n_splines*sizeof(short int));
117  CudaCheckError();
118 
119  cudaMalloc((void **) &gpu_nKnots_arr, n_splines*sizeof(unsigned int));
120  CudaCheckError();
121 
122  cudaMalloc((void **) &gpu_coeff_x, Eve_size*sizeof(float));
123  CudaCheckError();
124 
125  cudaMalloc((void **) &gpu_coeff_many, _nCoeff_*total_nknots*sizeof(float));
126  CudaCheckError();
127 
128  // Allocate memory for the array of weights to be returned to CPU
129  cudaMalloc((void **) &gpu_weights, n_splines*sizeof(float));
130  CudaCheckError();
131 
132  // Now TF1 specific
133  cudaMalloc((void **) &gpu_coeff_TF1_many, _nTF1Coeff_*n_tf1*sizeof(float));
134  CudaCheckError();
135 
136  cudaMalloc((void **) &gpu_weights_tf1, n_tf1*sizeof(float));
137  CudaCheckError();
138 
139  cudaMalloc((void **) &gpu_paramNo_TF1_arr, n_tf1*sizeof(short int));
140  CudaCheckError();
141 
142 #ifndef Weight_On_SplineBySpline_Basis
143  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
144  cudaMallocHost((void **) cpu_total_weights, n_events*sizeof(float));
145  CudaCheckError();
146 
147  //KS: Allocate memory for the array of total weights to be returned to CPU
148  cudaMalloc((void **) &gpu_total_weights, n_events*sizeof(float));
149  CudaCheckError();
150 
151  //KS: Allocate memory for the map keeping track how many splines each parameter has
152  cudaMalloc((void **) &gpu_nParamPerEvent, 2*n_events*sizeof(unsigned int));
153  CudaCheckError();
154 
155  //KS: Allocate memory for the map keeping track how many TF1 each parameter has
156  cudaMalloc((void **) &gpu_nParamPerEvent_TF1, 2*n_events*sizeof(unsigned int));
157  CudaCheckError();
158 #endif
159 
160  // Print allocation info to user
161  printf("Allocated %i entries for paramNo and nKnots arrays, size = %f MB\n",
162  n_splines, static_cast<double>(sizeof(short int) * n_splines + sizeof(unsigned int) * n_splines) / 1.0e6);
163  printf("Allocated %i entries for x coeff arrays, size = %f MB\n",
164  Eve_size, static_cast<double>(sizeof(float) * Eve_size) / 1.0e6);
165  printf("Allocated %i entries for {ybcd} coeff arrays, size = %f MB\n",
166  _nCoeff_ * total_nknots, static_cast<double>(sizeof(float) * _nCoeff_ * total_nknots) / 1.0e6);
167  printf("Allocated %i entries for TF1 coefficient arrays, size = %f MB\n",
168  _nTF1Coeff_ * n_tf1, static_cast<double>(sizeof(float) * _nTF1Coeff_ * n_tf1) / 1.0e6);
169 
170  //KS: Ask CUDA about memory usage
171  checkGpuMem();
172  PrintNdevices();
173 }
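
The cudaMallocHost call above allocates pinned (page-locked) host memory. As a generic CUDA note rather than MaCh3 code, the pattern looks like this (host_buf is a placeholder); such buffers must later be released with cudaFreeHost, as CleanupGPU_Segments and CleanupGPU_SplineMonolith do:

  // Generic sketch of pinned host allocation; host_buf and n_events are placeholders.
  float* host_buf = nullptr;
  cudaMallocHost(reinterpret_cast<void**>(&host_buf), n_events * sizeof(float)); // page-locked
  // host_buf can now be the target of fast (and genuinely asynchronous) device-to-host copies.
  cudaFreeHost(host_buf);  // pinned memory must be freed with cudaFreeHost, not free()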

◆ InitGPU_Vals()

__host__ void SMonolithGPU::InitGPU_Vals ( float **  vals)

Allocate memory for parameter values.

Parameters
    vals   Value to which we want to reweight, for each parameter

Definition at line 186 of file gpuSplineUtils.cu.

186  {
187 // *******************************************
188  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
189  cudaMallocHost((void **) vals, _N_SPLINES_*sizeof(float));
190  CudaCheckError();
191 }

◆ RunGPU_SplineMonolith()

__host__ void SMonolithGPU::RunGPU_SplineMonolith ( float *  cpu_total_weights,
float *  vals,
short int *  segment,
const unsigned int  h_n_splines,
const unsigned int  h_n_tf1 
)

Run the GPU code for the separate "many" arrays, i.e. separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()).

Executes the GPU code for calculating spline weights.

This function runs the GPU computation for the spline monolith. It assumes that the appropriate segment has already been identified through binary search in the SplineMonolith::FindSplineSegment() function.
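
The segment search itself lives outside this class; purely as an illustration (this is not the MaCh3 implementation), locating the segment of a parameter value within a sorted knot array amounts to a binary search:

  // Illustrative sketch: index of the knot interval containing val in a sorted array knots_x.
  // knots_x, n_knots and val are hypothetical inputs, not members of SMonolithGPU.
  short int FindSegment(const float* knots_x, const int n_knots, const float val) {
    int lo = 0;
    int hi = n_knots - 2;                  // last valid segment starts at n_knots - 2
    while (lo < hi) {
      const int mid = (lo + hi + 1) / 2;
      if (knots_x[mid] <= val) lo = mid;   // segment start is at or below val
      else hi = mid - 1;
    }
    return static_cast<short int>(lo);
  }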

Parameters
    cpu_weights         Pointer to the array of weights on the CPU (used if Weight_On_SplineBySpline_Basis is defined).
    cpu_weights_tf1     Pointer to the array of TF1 weights (used if Weight_On_SplineBySpline_Basis is defined).
    cpu_total_weights   Pointer to the total weights array (used if Weight_On_SplineBySpline_Basis is not defined).
    vals                Pointer to an array holding the parameter values to be processed.
    segment             Pointer to an array containing segment indices for parameters.
    h_n_splines         Total number of spline objects in the GPU context.
    h_n_tf1             Total number of TF1 objects in the GPU context.

Definition at line 456 of file gpuSplineUtils.cu.

468  {
469 // *****************************************
470  dim3 block_size;
471  dim3 grid_size;
472 
473  block_size.x = _BlockSize_;
474  grid_size.x = (h_n_splines / block_size.x) + 1;
475 
476  // Copy the segment values to the GPU (segment_gpu), which is h_n_params long
477  cudaMemcpyToSymbol(segment_gpu, segment, h_n_params*sizeof(short int));
478  CudaCheckError();
479 
480  // Copy the parameter values values to the GPU (vals_gpu), which is h_n_params long
481  cudaMemcpyToSymbol(val_gpu, vals, h_n_params*sizeof(float));
482  CudaCheckError();
483 
484  // KS: Consider asynchronous kernel call, this might help EvalOnGPU_Splines and EvalOnGPU_TF1 are independent
485  // Set the cache config to prefer L1 for the kernel
486  //cudaFuncSetCacheConfig(EvalOnGPU_Splines, cudaFuncCachePreferL1);
487  EvalOnGPU_Splines<<<grid_size, block_size>>>(
       /* ... arguments not captured in this listing ... */
493  gpu_weights,
       /* ... */
495  );
496  CudaCheckError();
497 
498  grid_size.x = (h_n_tf1 / block_size.x) + 1;
499  EvalOnGPU_TF1<<<grid_size, block_size>>>(
       /* ... arguments not captured in this listing ... */
504  );
505  CudaCheckError();
506 
507 //KS: We can either copy gpu_weight and calculate total weight in reweighting loop, or not copy and calculate total weight stall at GPU, which means less memory transfer
508 #ifdef Weight_On_SplineBySpline_Basis
509  // Here we have to make a somewhat large GPU->CPU transfer because it's all the splines' response
510  cudaMemcpy(cpu_weights, gpu_weights, h_n_splines*sizeof(float), cudaMemcpyDeviceToHost);
511  CudaCheckError();
512 
513  cudaMemcpy(cpu_weights_tf1, gpu_weights_tf1, h_n_tf1*sizeof(float), cudaMemcpyDeviceToHost);
514  CudaCheckError();
515 
516 //KS: Else calculate Total Weight
517 #else
518  grid_size.x = (h_n_events / block_size.x) + 1;
519 
520  EvalOnGPU_TotWeight<<<grid_size, block_size>>>(
521  gpu_weights,
       /* ... remaining arguments not captured in this listing ... */
528  );
529  CudaCheckError();
530 
531  //KS: Here we have to make a somewhat large GPU->CPU transfer because it is proportional to number of events
532  //KS: Normally code wait for memory transfer to finish before moving further cudaMemcpyAsync means we will continue to execute code and in a meantime keep copying stuff.
533  cudaMemcpyAsync(cpu_total_weights, gpu_total_weights, h_n_events * sizeof(float), cudaMemcpyDeviceToHost, 0);
534  CudaCheckError();
535 #endif
536 
537  #ifdef DEBUG
538  printf("Copied GPU total weights to CPU with SUCCESS (drink more tea)\n");
539  printf("Released calculated response from GPU with SUCCESS (drink most tea)\n");
540  #endif
541 }
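
One practical consequence of the cudaMemcpyAsync at the end of this function, sketched here as a generic usage note rather than MaCh3 code: the copy is issued on the default stream, so the host must synchronize before reading cpu_total_weights:

  // Sketch only: consuming the total weights after RunGPU_SplineMonolith.
  gpu.RunGPU_SplineMonolith(cpu_total_weights, vals, segment, n_splines, n_tf1);
  cudaDeviceSynchronize();   // make sure the asynchronous device-to-host copy has completed
  for (int iEvent = 0; iEvent < n_events; ++iEvent) {
    // cpu_total_weights[iEvent] is now safe to use (e.g. in the likelihood calculation)
  }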

Member Data Documentation

◆ gpu_coeff_many

float* SMonolithGPU::gpu_coeff_many
private

GPU arrays to hold other coefficients.

Definition at line 192 of file gpuSplineUtils.cuh.

◆ gpu_coeff_TF1_many

float* SMonolithGPU::gpu_coeff_TF1_many
private

GPU arrays to hold TF1 coefficients.

Definition at line 201 of file gpuSplineUtils.cuh.

◆ gpu_coeff_x

float* SMonolithGPU::gpu_coeff_x
private

KS: GPU arrays to hold X coefficient.

Definition at line 189 of file gpuSplineUtils.cuh.

◆ gpu_nKnots_arr

unsigned int* SMonolithGPU::gpu_nKnots_arr
private

KS: GPU Number of knots per spline.

Definition at line 195 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent

unsigned int* SMonolithGPU::gpu_nParamPerEvent
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines start for a given event}.

Definition at line 184 of file gpuSplineUtils.cuh.
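
The description above implies two consecutive entries per event. As an illustration only (not the actual MaCh3 kernel code, and assuming the pair is stored in the order given in the description), such a packed map is read like this:

  // Illustrative only: unpack the {count, first index} pair for one event.
  __device__ void GetEventSplineRange(const unsigned int* __restrict__ nParamPerEvent,
                                      const unsigned int event,
                                      unsigned int& nSplines, unsigned int& firstSpline) {
    nSplines    = nParamPerEvent[2 * event];      // number of splines for this event
    firstSpline = nParamPerEvent[2 * event + 1];  // index where this event's splines start
  }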

◆ gpu_nParamPerEvent_TF1

unsigned int* SMonolithGPU::gpu_nParamPerEvent_TF1
private

KS: GPU map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 objects per event, index where the TF1 objects start for a given event}.

Definition at line 186 of file gpuSplineUtils.cuh.

◆ gpu_nPoints_arr

short int* SMonolithGPU::gpu_nPoints_arr
private

GPU arrays to hold number of points.

Definition at line 203 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_arr

short int* SMonolithGPU::gpu_paramNo_arr
private

CW: GPU array with the number of points per spline (not per spline point!)

Definition at line 198 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_TF1_arr

short int* SMonolithGPU::gpu_paramNo_TF1_arr
private

CW: GPU array with the number of points per TF1 object.

Definition at line 205 of file gpuSplineUtils.cuh.

◆ gpu_total_weights

float* SMonolithGPU::gpu_total_weights
private

GPU arrays to hold weight for event.

Definition at line 208 of file gpuSplineUtils.cuh.

◆ gpu_weights

float* SMonolithGPU::gpu_weights
private

GPU arrays to hold weight for each spline.

Definition at line 210 of file gpuSplineUtils.cuh.

◆ gpu_weights_tf1

float* SMonolithGPU::gpu_weights_tf1
private

GPU arrays to hold weight for each TF1.

Definition at line 212 of file gpuSplineUtils.cuh.

◆ h_n_events

int SMonolithGPU::h_n_events
private

Number of events living on CPU.

Definition at line 218 of file gpuSplineUtils.cuh.

◆ h_n_params

int SMonolithGPU::h_n_params
private

Number of params living on CPU.

Definition at line 216 of file gpuSplineUtils.cuh.

◆ text_coeff_x

cudaTextureObject_t SMonolithGPU::text_coeff_x = 0
private

KS: Texture memory is cached and well optimised for read-only fetches. Make a texture only for variables that are accessed often but rarely overwritten. There are limits on texture memory, so don't use huge arrays.

Definition at line 224 of file gpuSplineUtils.cuh.
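
For reference, a minimal sketch (not MaCh3 kernel code) of reading through a texture object bound to linear device memory, as done for gpu_coeff_x in CopyToGPU_SplineMonolith():

  // Illustrative only: fetch one X coefficient through the texture object.
  __global__ void ReadCoeffX(const cudaTextureObject_t text_coeff_x, float* out, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
      // tex1Dfetch is valid for texture objects created with cudaResourceTypeLinear
      out[i] = tex1Dfetch<float>(text_coeff_x, i);
    }
  }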

◆ text_nParamPerEvent

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of splines per event, index where the splines start for a given event}.

Definition at line 227 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent_TF1

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent_TF1 = 0
private

KS: Map keeping track of how many parameters apply to each event; we keep two numbers here: {number of TF1 objects per event, index where the TF1 objects start for a given event}.

Definition at line 229 of file gpuSplineUtils.cuh.


The documentation for this class was generated from the following files: