MaCh3 2.2.3
Reference Guide
SMonolithGPU Class Reference

Class responsible for calculating spline weights on the GPU. More...

Public Member Functions

 SMonolithGPU ()
 constructor More...
 
virtual ~SMonolithGPU ()
 destructor More...
 
__host__ void InitGPU_SplineMonolith (float **cpu_total_weights, int n_events, unsigned int total_nknots, unsigned int n_splines, unsigned int n_tf1, int Eve_size)
 Allocate memory on gpu for spline monolith. More...
 
__host__ void CopyToGPU_SplineMonolith (SplineMonoStruct *cpu_spline_handler, std::vector< float > cpu_many_array_TF1, std::vector< short int > cpu_paramNo_arr_TF1, int n_events, std::vector< unsigned int > cpu_nParamPerEvent, std::vector< unsigned int > cpu_nParamPerEvent_TF1, int n_params, unsigned int n_splines, short int spline_size, unsigned int total_nknots, unsigned int n_tf1)
 Copies data from CPU to GPU for the spline monolith. More...
 
__host__ void InitGPU_Segments (short int **segment)
 Allocate memory for spline segments. More...
 
__host__ void InitGPU_Vals (float **vals)
 Allocate memory for parameter values. More...
 
__host__ void RunGPU_SplineMonolith (float *cpu_total_weights, float *vals, short int *segment, const unsigned int h_n_splines, const unsigned int h_n_tf1)
 Run the GPU code for the separate coefficient arrays, i.e. the separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()). More...
 
__host__ void CleanupGPU_SplineMonolith (float *cpu_total_weights)
 This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects. More...
 
__host__ void CleanupGPU_Segments (short int *segment, float *vals)
 Clean up pinned variables on the CPU. More...
 

Private Attributes

unsigned int * gpu_nParamPerEvent
 KS: GPU map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of splines for the event, index where its splines start}. More...
 
unsigned int * gpu_nParamPerEvent_TF1
 KS: GPU map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of TF1 for the event, index where its TF1 start}. More...
 
float * gpu_coeff_x
 KS: GPU arrays to hold X coefficient. More...
 
float * gpu_coeff_many
 GPU arrays to hold other coefficients. More...
 
unsigned int * gpu_nKnots_arr
 KS: GPU Number of knots per spline. More...
 
short int * gpu_paramNo_arr
 CW: GPU array with the parameter number for each spline (one entry per spline, not per spline point!) More...
 
float * gpu_coeff_TF1_many
 GPU arrays to hold TF1 coefficients. More...
 
short int * gpu_nPoints_arr
 GPU arrays to hold number of points. More...
 
short int * gpu_paramNo_TF1_arr
 CW: GPU array with the parameter number for each TF1 object. More...
 
float * gpu_total_weights
 GPU arrays to hold weight for event. More...
 
float * gpu_weights
 GPU arrays to hold weight for each spline. More...
 
float * gpu_weights_tf1
 GPU arrays to hold weight for each TF1. More...
 
int h_n_params
 Number of params living on CPU. More...
 
int h_n_events
 Number of events living on CPU. More...
 
cudaTextureObject_t text_coeff_x = 0
 KS: Textures are read-only cached memory, well optimised for fetching. Make a texture only for variables you access often but rarely overwrite; texture memory is limited, so don't use it for huge arrays. More...
 
cudaTextureObject_t text_nParamPerEvent = 0
 KS: Map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of splines for the event, index where its splines start}. More...
 
cudaTextureObject_t text_nParamPerEvent_TF1 = 0
 KS: Map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of TF1 for the event, index where its TF1 start}. More...
 

Detailed Description

Class responsible for calculating spline weights on the GPU.

Definition at line 61 of file gpuSplineUtils.cuh.
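
For orientation, here is a minimal sketch of the intended call sequence, assuming the sizes and the filled SplineMonoStruct are provided by the CPU-side SplineMonolith. The include path and all variable names in this sketch are illustrative, not part of the class.

#include <vector>
#include "gpuSplineUtils.cuh"   // assumed include path

// Hypothetical driver: only the order of the calls is the point here.
void ReweightOnGPU(SplineMonoStruct* splineHandler,
                   std::vector<float> tf1Coeffs,
                   std::vector<short int> tf1ParamNo,
                   std::vector<unsigned int> nParamPerEvent,
                   std::vector<unsigned int> nParamPerEvent_TF1,
                   int nEvents, int nParams, unsigned int nSplines,
                   short int splineSize, unsigned int totalKnots,
                   unsigned int nTF1, int eveSize)
{
  SMonolithGPU gpu;

  // One-off setup: device allocations, then the large host-to-device copy.
  float* cpu_total_weights = nullptr;
  gpu.InitGPU_SplineMonolith(&cpu_total_weights, nEvents, totalKnots, nSplines, nTF1, eveSize);
  gpu.CopyToGPU_SplineMonolith(splineHandler, tf1Coeffs, tf1ParamNo, nEvents,
                               nParamPerEvent, nParamPerEvent_TF1, nParams,
                               nSplines, splineSize, totalKnots, nTF1);

  // Pinned host buffers for the per-step inputs (segment index and parameter value).
  short int* segment = nullptr;
  float* vals = nullptr;
  gpu.InitGPU_Segments(&segment);
  gpu.InitGPU_Vals(&vals);

  // Per step: fill vals/segment on the CPU (SplineMonolith::FindSplineSegment), then evaluate.
  gpu.RunGPU_SplineMonolith(cpu_total_weights, vals, segment, nSplines, nTF1);

  // Teardown.
  gpu.CleanupGPU_SplineMonolith(cpu_total_weights);
  gpu.CleanupGPU_Segments(segment, vals);
}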

Constructor & Destructor Documentation

◆ SMonolithGPU()

SMonolithGPU::SMonolithGPU ( )

constructor


Definition at line 81 of file gpuSplineUtils.cu.

81  {
82  h_n_params = -1;
84  h_n_events = -1;
85 
86  gpu_weights = nullptr;
87  gpu_total_weights = nullptr;
88  gpu_nParamPerEvent = nullptr;
89  gpu_nPoints_arr = nullptr;
90  gpu_paramNo_arr = nullptr;
91  gpu_nKnots_arr = nullptr;
92  gpu_coeff_x = nullptr;
93  gpu_coeff_many = nullptr;
94  gpu_coeff_TF1_many = nullptr;
95  gpu_paramNo_TF1_arr = nullptr;
96  gpu_nParamPerEvent_TF1 = nullptr;
97  gpu_weights_tf1 = nullptr;
98 }

◆ ~SMonolithGPU()

SMonolithGPU::~SMonolithGPU ( )
virtual

destructor

Definition at line 100 of file gpuSplineUtils.cu.

100  {
101 
102 }

Member Function Documentation

◆ CleanupGPU_Segments()

__host__ void SMonolithGPU::CleanupGPU_Segments (short int * segment, float * vals)

Clean up pinned variables on the CPU.

Parameters
segment: Found spline segment for each parameter
vals: Value to which we want to reweight, for each parameter

Definition at line 587 of file gpuSplineUtils.cu.

587  {
588 // *******************************************
589  cudaFreeHost(segment);
590  cudaFreeHost(vals);
591 
592  segment = nullptr;
593  vals = nullptr;
594 }

◆ CleanupGPU_SplineMonolith()

__host__ void SMonolithGPU::CleanupGPU_SplineMonolith ( float *  cpu_total_weights)

This function deallocates the GPU resources allocated for the separate {x} and {y,b,c,d} arrays and for the TF1 objects.

Parameters
cpu_total_weights: Pointer to the total weights array on the CPU (used if Weight_On_SplineBySpline_Basis is not defined).

Definition at line 552 of file gpuSplineUtils.cu.

556  {
557 // *********************************
558  cudaFree(gpu_paramNo_arr);
559  cudaFree(gpu_nKnots_arr);
560 
561  // free the coefficient arrays
562  cudaDestroyTextureObject(text_coeff_x);
563 
564  cudaFree(gpu_coeff_x);
565  cudaFree(gpu_coeff_many);
566 
567  cudaFree(gpu_coeff_TF1_many);
568  cudaFree(gpu_paramNo_TF1_arr);
569  // free weights on the gpu
570  cudaFree(gpu_weights);
571  cudaFree(gpu_weights_tf1);
572 #ifndef Weight_On_SplineBySpline_Basis
573  cudaFree(gpu_total_weights);
574  //KS: Before removing variable let's destroy texture
575  cudaDestroyTextureObject(text_nParamPerEvent);
576  cudaDestroyTextureObject(text_nParamPerEvent_TF1);
577 
578  cudaFree(gpu_nParamPerEvent);
579  cudaFree(gpu_nParamPerEvent_TF1);
580  cudaFreeHost(cpu_total_weights);
581  cpu_total_weights = nullptr;
582 #endif
583 }
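
Note that cpu_total_weights is passed by value, so the nullptr assignment at the end of the listing only clears the function's local copy; the caller's pointer still holds the address of the now-freed pinned buffer. A small, purely illustrative caller-side helper (the include path and all names are placeholders):

#include "gpuSplineUtils.cuh"   // assumed include path

// The pinned host buffer is released inside CleanupGPU_SplineMonolith (cudaFreeHost),
// so the caller should drop its own copies of the pointers afterwards.
void TeardownSplineGPU(SMonolithGPU& gpu, float*& cpu_total_weights,
                       short int*& segment, float*& vals)
{
  gpu.CleanupGPU_SplineMonolith(cpu_total_weights);
  cpu_total_weights = nullptr;   // our copy now points at freed memory; clear it

  gpu.CleanupGPU_Segments(segment, vals);
  segment = nullptr;
  vals = nullptr;
}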

◆ CopyToGPU_SplineMonolith()

__host__ void SMonolithGPU::CopyToGPU_SplineMonolith (SplineMonoStruct * cpu_spline_handler,
    std::vector< float > cpu_many_array_TF1,
    std::vector< short int > cpu_paramNo_arr_TF1,
    int n_events,
    std::vector< unsigned int > cpu_nParamPerEvent,
    std::vector< unsigned int > cpu_nParamPerEvent_TF1,
    int n_params,
    unsigned int n_splines,
    short int spline_size,
    unsigned int total_nknots,
    unsigned int n_tf1)

Copies data from CPU to GPU for the spline monolith.

This function transfers the necessary spline data and parameters from the CPU to the GPU, including TF1-related arrays and parameters. This setup is crucial for spline evaluations on the GPU.

Parameters
cpu_spline_handler: Pointer to the structure managing spline data on the CPU.
cpu_many_array_TF1: Array of TF1 parameters on the CPU.
cpu_paramNo_arr_TF1: Array containing parameter numbers for TF1 objects.
n_events: Number of events, necessary for correct data handling.
cpu_nParamPerEvent: Array indicating the number of parameters per event.
cpu_nParamPerEvent_TF1: Array indicating the number of parameters per TF1 object.
n_params: Total number of parameters across all splines.
n_splines: Total number of spline objects.
spline_size: Size of each spline object.
total_nknots: Total number of knots across all splines.
n_tf1: Total number of TF1 objects.

Definition at line 201 of file gpuSplineUtils.cu.

217  {
218 // ******************************************************
219  if (n_params != _N_SPLINES_) {
220  printf("Number of splines not equal to %i, GPU code for event-by-event splines will fail\n", _N_SPLINES_);
221  printf("n_params = %i\n", n_params);
222  printf("%s : %i\n", __FILE__, __LINE__);
223  throw;
224  }
225 
226  // Write to the global statics (h_* denotes host stored variable)
227  h_n_params = n_params;
228 #ifndef Weight_On_SplineBySpline_Basis
229  h_n_events = n_events;
230 #endif
231  // Copy the constants
232  // Total number of valid splines for all loaded events
233  cudaMemcpyToSymbol(d_n_splines, &n_splines, sizeof(n_splines));
234  CudaCheckError();
235 
236  // Total number of valid TF1 for all loaded events
237  cudaMemcpyToSymbol(d_n_TF1, &n_tf1, sizeof(n_tf1));
238  CudaCheckError();
239 
240  // Total spline size per spline; i.e. just the number of points or knots in the spline
241  cudaMemcpyToSymbol(d_spline_size, &spline_size, sizeof(spline_size));
242  CudaCheckError();
243 #ifndef Weight_On_SplineBySpline_Basis
244  // Number of events
245  cudaMemcpyToSymbol(d_n_events, &h_n_events, sizeof(h_n_events));
246  CudaCheckError();
247 #endif
248  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
249  cudaMemcpy(gpu_coeff_many, cpu_spline_handler->coeff_many.data(), sizeof(float)*total_nknots*_nCoeff_, cudaMemcpyHostToDevice);
250  CudaCheckError();
251 
252  cudaMemcpy(gpu_coeff_x, cpu_spline_handler->coeff_x.data(), sizeof(float)*spline_size*n_params, cudaMemcpyHostToDevice);
253  CudaCheckError();
254 
255  //KS: Bind our texture with the GPU variable
256  //KS: Tried also moving gpu_many_array to texture memory it only worked with restricted number of MC runs, most likely hit texture memory limit :(
257  struct cudaResourceDesc resDesc_coeff_x;
258  memset(&resDesc_coeff_x, 0, sizeof(resDesc_coeff_x));
259  resDesc_coeff_x.resType = cudaResourceTypeLinear;
260  resDesc_coeff_x.res.linear.devPtr = gpu_coeff_x;
261  resDesc_coeff_x.res.linear.desc = cudaCreateChannelDesc<float>();
262  resDesc_coeff_x.res.linear.sizeInBytes = sizeof(float)*spline_size*n_params;
263 
264  // Specify texture object parameters
265  struct cudaTextureDesc texDesc_coeff_x;
266  memset(&texDesc_coeff_x, 0, sizeof(texDesc_coeff_x));
267  texDesc_coeff_x.readMode = cudaReadModeElementType;
268 
269  // Create texture object
270  cudaCreateTextureObject(&text_coeff_x, &resDesc_coeff_x, &texDesc_coeff_x, nullptr);
271  CudaCheckError();
272 
273  // Also copy the parameter number for each spline onto the GPU; i.e. what spline parameter are we calculating right now
274  cudaMemcpy(gpu_paramNo_arr, cpu_spline_handler->paramNo_arr.data(), n_splines*sizeof(short int), cudaMemcpyHostToDevice);
275  CudaCheckError();
276 
277  // Also copy the knot map for each spline onto the GPU;
278  cudaMemcpy(gpu_nKnots_arr, cpu_spline_handler->nKnots_arr.data(), n_splines*sizeof(unsigned int), cudaMemcpyHostToDevice);
279  CudaCheckError();
280 
281  //Now TF1
282  // Copy the coefficient arrays to the GPU; this only happens once per entire Markov Chain so is OK to do multiple extensive memory copies
283  cudaMemcpy(gpu_coeff_TF1_many, cpu_many_array_TF1.data(), sizeof(float)*n_tf1*_nTF1Coeff_, cudaMemcpyHostToDevice);
284  CudaCheckError();
285 
286  // Also copy the parameter number for each TF1 onto the GPU; i.e. what TF1 parameter are we calculating right now
287  cudaMemcpy(gpu_paramNo_TF1_arr, cpu_paramNo_arr_TF1.data(), n_tf1*sizeof(short int), cudaMemcpyHostToDevice);
288  CudaCheckError();
289 
290  #ifndef Weight_On_SplineBySpline_Basis
291  //KS: Keep track how much splines each event has
292  cudaMemcpy(gpu_nParamPerEvent, cpu_nParamPerEvent.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
293  CudaCheckError();
294 
295  //KS: Bind our texture with the GPU variable
296  // create a resource descriptor based on device pointers
297  struct cudaResourceDesc resDesc_nParamPerEvent;
298  memset(&resDesc_nParamPerEvent, 0, sizeof(resDesc_nParamPerEvent));
299  resDesc_nParamPerEvent.resType = cudaResourceTypeLinear;
300  resDesc_nParamPerEvent.res.linear.devPtr = gpu_nParamPerEvent;
301  resDesc_nParamPerEvent.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
302  resDesc_nParamPerEvent.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
303 
304  // Specify texture object parameters
305  struct cudaTextureDesc texDesc_nParamPerEvent;
306  memset(&texDesc_nParamPerEvent, 0, sizeof(texDesc_nParamPerEvent));
307  texDesc_nParamPerEvent.readMode = cudaReadModeElementType;
308 
309  //Finally create texture object
310  cudaCreateTextureObject(&text_nParamPerEvent, &resDesc_nParamPerEvent, &texDesc_nParamPerEvent, nullptr);
311  CudaCheckError();
312 
313  // Now TF1
314  cudaMemcpy(gpu_nParamPerEvent_TF1, cpu_nParamPerEvent_TF1.data(), 2*n_events*sizeof(unsigned int), cudaMemcpyHostToDevice);
315  CudaCheckError();
316 
317  //KS: Bind our texture with the GPU variable
318  // create a resource descriptor based on device pointers
319  struct cudaResourceDesc resDesc_nParamPerEvent_tf1;
320  memset(&resDesc_nParamPerEvent_tf1, 0, sizeof(resDesc_nParamPerEvent_tf1));
321  resDesc_nParamPerEvent_tf1.resType = cudaResourceTypeLinear;
322  resDesc_nParamPerEvent_tf1.res.linear.devPtr = gpu_nParamPerEvent_TF1;
323  resDesc_nParamPerEvent_tf1.res.linear.desc = cudaCreateChannelDesc<unsigned int>();
324  resDesc_nParamPerEvent_tf1.res.linear.sizeInBytes = 2*n_events*sizeof(unsigned int);
325 
326  // Specify texture object parameters
327  struct cudaTextureDesc texDesc_nParamPerEvent_tf1;
328  memset(&texDesc_nParamPerEvent_tf1, 0, sizeof(texDesc_nParamPerEvent_tf1));
329  texDesc_nParamPerEvent_tf1.readMode = cudaReadModeElementType;
330 
331  //Finally create texture object
332  cudaCreateTextureObject(&text_nParamPerEvent_TF1, &resDesc_nParamPerEvent_tf1, &texDesc_nParamPerEvent_tf1, nullptr);
333  CudaCheckError();
334  #endif
335 }
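
The texture setup in the listing above follows the standard CUDA pattern for wrapping an existing linear device buffer in a cudaTextureObject_t. A stripped-down version of that pattern, independent of any MaCh3 type (the function and argument names are illustrative):

#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

// Wrap an already-allocated device buffer of floats in a texture object so kernels
// can read it through the texture cache. Returns 0 on failure.
cudaTextureObject_t MakeLinearFloatTexture(float* d_buffer, std::size_t n_elements)
{
  cudaResourceDesc resDesc;
  std::memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypeLinear;           // plain 1D linear device memory
  resDesc.res.linear.devPtr = d_buffer;
  resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
  resDesc.res.linear.sizeInBytes = n_elements * sizeof(float);

  cudaTextureDesc texDesc;
  std::memset(&texDesc, 0, sizeof(texDesc));
  texDesc.readMode = cudaReadModeElementType;         // return the raw element values

  cudaTextureObject_t tex = 0;
  if (cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr) != cudaSuccess) return 0;
  return tex;   // pair with cudaDestroyTextureObject(tex) when done
}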

◆ InitGPU_Segments()

__host__ void SMonolithGPU::InitGPU_Segments ( short int **  segment)

Allocate memory for spline segments.

Parameters
segment: Found spline segment for each parameter

Definition at line 178 of file gpuSplineUtils.cu.

178  {
179 // *******************************************
180  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
181  cudaMallocHost((void **) segment, _N_SPLINES_*sizeof(short int));
182  CudaCheckError();
183 }
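
InitGPU_Segments (and InitGPU_Vals below) use cudaMallocHost rather than new/malloc so that the per-step buffers live in page-locked (pinned) host memory, which the driver can copy to the device faster and asynchronously. A self-contained sketch of the allocate/use/free pairing (the array size here is a placeholder, not the real _N_SPLINES_):

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  const int nParams = 48;          // placeholder for _N_SPLINES_
  short int* segment = nullptr;

  // Pinned (page-locked) host allocation: faster host-to-device copies than pageable memory.
  if (cudaMallocHost(reinterpret_cast<void**>(&segment), nParams * sizeof(short int)) != cudaSuccess) {
    std::fprintf(stderr, "cudaMallocHost failed\n");
    return 1;
  }

  for (int i = 0; i < nParams; ++i) segment[i] = 0;   // filled once per step in MaCh3

  cudaFreeHost(segment);           // pinned memory must be released with cudaFreeHost
  return 0;
}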

◆ InitGPU_SplineMonolith()

__host__ void SMonolithGPU::InitGPU_SplineMonolith (float ** cpu_total_weights,
    int n_events,
    unsigned int total_nknots,
    unsigned int n_splines,
    unsigned int n_tf1,
    int Eve_size)

Allocate memory on gpu for spline monolith.

Parameters
cpu_total_weights: KS: Pointer that will receive pinned host memory for the total weights (pinning makes memory transfers faster than a standard allocation)
n_events: Number of events; needed to allocate the memory correctly
total_nknots: Total number of knots summed over all splines
n_splines: Total number of spline objects, not knots
n_tf1: Total number of TF1 objects, not coefficients

Definition at line 106 of file gpuSplineUtils.cu.

114  {
115 // *******************************************
116  // Allocate chunks of memory to GPU
117  cudaMalloc((void **) &gpu_paramNo_arr, n_splines*sizeof(short int));
118  CudaCheckError();
119 
120  cudaMalloc((void **) &gpu_nKnots_arr, n_splines*sizeof(unsigned int));
121  CudaCheckError();
122 
123  cudaMalloc((void **) &gpu_coeff_x, Eve_size*sizeof(float));
124  CudaCheckError();
125 
126  cudaMalloc((void **) &gpu_coeff_many, _nCoeff_*total_nknots*sizeof(float));
127  CudaCheckError();
128 
129  // Allocate memory for the array of weights to be returned to CPU
130  cudaMalloc((void **) &gpu_weights, n_splines*sizeof(float));
131  CudaCheckError();
132 
133  // Now TF1 specific
134  cudaMalloc((void **) &gpu_coeff_TF1_many, _nTF1Coeff_*n_tf1*sizeof(float));
135  CudaCheckError();
136 
137  cudaMalloc((void **) &gpu_weights_tf1, n_tf1*sizeof(float));
138  CudaCheckError();
139 
140  cudaMalloc((void **) &gpu_paramNo_TF1_arr, n_tf1*sizeof(short int));
141  CudaCheckError();
142 
143 #ifndef Weight_On_SplineBySpline_Basis
144  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
145  cudaMallocHost((void **) cpu_total_weights, n_events*sizeof(float));
146  CudaCheckError();
147 
148  //KS: Allocate memory for the array of total weights to be returned to CPU
149  cudaMalloc((void **) &gpu_total_weights, n_events*sizeof(float));
150  CudaCheckError();
151 
152  //KS: Allocate memory for the map keeping track how many splines each parameter has
153  cudaMalloc((void **) &gpu_nParamPerEvent, 2*n_events*sizeof(unsigned int));
154  CudaCheckError();
155 
156  //KS: Allocate memory for the map keeping track how many TF1 each parameter has
157  cudaMalloc((void **) &gpu_nParamPerEvent_TF1, 2*n_events*sizeof(unsigned int));
158  CudaCheckError();
159 #endif
160 
161  // Print allocation info to user
162  printf("Allocated %i entries for paramNo and nKnots arrays, size = %f MB\n",
163  n_splines, static_cast<double>(sizeof(short int) * n_splines + sizeof(unsigned int) * n_splines) / 1.0e6);
164  printf("Allocated %i entries for x coeff arrays, size = %f MB\n",
165  Eve_size, static_cast<double>(sizeof(float) * Eve_size) / 1.0e6);
166  printf("Allocated %i entries for {ybcd} coeff arrays, size = %f MB\n",
167  _nCoeff_ * total_nknots, static_cast<double>(sizeof(float) * _nCoeff_ * total_nknots) / 1.0e6);
168  printf("Allocated %i entries for TF1 coefficient arrays, size = %f MB\n",
169  _nTF1Coeff_ * n_tf1, static_cast<double>(sizeof(float) * _nTF1Coeff_ * n_tf1) / 1.0e6);
170 
171  //KS: Ask CUDA about memory usage
172  checkGpuMem();
173  PrintNdevices();
174 }
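
The printf calls at the end of the listing report the size of each allocation; the same arithmetic can be gathered in one helper. A purely illustrative estimate of the device memory requested above, assuming _nCoeff_ = 4 (the {y,b,c,d} layout) and _nTF1Coeff_ = 5 as defined in SplineCommon.h:

#include <cstddef>

// Rough device-memory footprint of the spline-monolith allocations, in bytes.
std::size_t SplineMonolithBytes(unsigned int n_splines, unsigned int total_nknots,
                                unsigned int n_tf1, int n_events, int eve_size)
{
  std::size_t bytes = 0;
  bytes += n_splines * sizeof(short int);                          // gpu_paramNo_arr
  bytes += n_splines * sizeof(unsigned int);                       // gpu_nKnots_arr
  bytes += static_cast<std::size_t>(eve_size) * sizeof(float);     // gpu_coeff_x
  bytes += 4u * total_nknots * sizeof(float);                      // gpu_coeff_many ({y,b,c,d} per knot)
  bytes += n_splines * sizeof(float);                              // gpu_weights
  bytes += 5u * n_tf1 * sizeof(float);                             // gpu_coeff_TF1_many
  bytes += n_tf1 * sizeof(float);                                  // gpu_weights_tf1
  bytes += n_tf1 * sizeof(short int);                              // gpu_paramNo_TF1_arr
  // Without Weight_On_SplineBySpline_Basis, also the per-event bookkeeping:
  bytes += static_cast<std::size_t>(n_events) * sizeof(float);               // gpu_total_weights
  bytes += 2u * static_cast<std::size_t>(n_events) * sizeof(unsigned int);   // gpu_nParamPerEvent
  bytes += 2u * static_cast<std::size_t>(n_events) * sizeof(unsigned int);   // gpu_nParamPerEvent_TF1
  return bytes;
}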

◆ InitGPU_Vals()

__host__ void SMonolithGPU::InitGPU_Vals ( float **  vals)

Allocate memory for parameter values.

Parameters
vals: Value to which we want to reweight, for each parameter

Definition at line 187 of file gpuSplineUtils.cu.

187  {
188 // *******************************************
189  //KS: Rather than allocate memory in standard way this fancy cuda tool allows to pin host memory which make memory transfer faster
190  cudaMallocHost((void **) vals, _N_SPLINES_*sizeof(float));
191  CudaCheckError();
192 }

◆ RunGPU_SplineMonolith()

__host__ void SMonolithGPU::RunGPU_SplineMonolith (float * cpu_total_weights,
    float * vals,
    short int * segment,
    const unsigned int h_n_splines,
    const unsigned int h_n_tf1)

Run the GPU code for the separate coefficient arrays, i.e. the separate {x} and {y,b,c,d} arrays. Pass the segment and the parameter values (the binary search has already been performed in SplineMonolith::FindSplineSegment()).

Executes the GPU code for calculating spline weights.

This function runs the GPU computation for the spline monolith. It assumes that the appropriate segment has already been identified through binary search in the SplineMonolith::FindSplineSegment() function.

Parameters
cpu_weights: Pointer to the array of weights on the CPU (used if Weight_On_SplineBySpline_Basis is defined).
cpu_weights_tf1: Pointer to the array of TF1 weights (used if Weight_On_SplineBySpline_Basis is defined).
cpu_total_weights: Pointer to the total weights array (used if Weight_On_SplineBySpline_Basis is not defined).
vals: Pointer to an array holding the parameter values to be processed.
segment: Pointer to an array containing segment indices for parameters.
h_n_splines: Total number of spline objects in the GPU context.
h_n_tf1: Total number of TF1 objects in the GPU context.

Definition at line 459 of file gpuSplineUtils.cu.

471  {
472 // *****************************************
473  dim3 block_size;
474  dim3 grid_size;
475 
476  block_size.x = _BlockSize_;
477  grid_size.x = (h_n_splines / block_size.x) + 1;
478 
479  // Copy the segment values to the GPU (segment_gpu), which is h_n_params long
480  cudaMemcpyToSymbol(segment_gpu, segment, h_n_params*sizeof(short int));
481  CudaCheckError();
482 
483  // Copy the parameter values values to the GPU (vals_gpu), which is h_n_params long
484  cudaMemcpyToSymbol(val_gpu, vals, h_n_params*sizeof(float));
485  CudaCheckError();
486 
487  // KS: Consider asynchronous kernel call, this might help EvalOnGPU_Splines and EvalOnGPU_TF1 are independent
488  // Set the cache config to prefer L1 for the kernel
489  //cudaFuncSetCacheConfig(EvalOnGPU_Splines, cudaFuncCachePreferL1);
490  EvalOnGPU_Splines<<<grid_size, block_size>>>(
491     gpu_paramNo_arr,
492     gpu_nKnots_arr,
493 
494     gpu_coeff_many,
495 
496     gpu_weights,
497     text_coeff_x
498     );
499  CudaCheckError();
500 
501  grid_size.x = (h_n_tf1 / block_size.x) + 1;
502  EvalOnGPU_TF1<<<grid_size, block_size>>>(
503     gpu_coeff_TF1_many,
504     gpu_paramNo_TF1_arr,
505 
506     gpu_weights_tf1
507     );
508  CudaCheckError();
509 
510 //KS: We can either copy gpu_weight and calculate total weight in reweighting loop, or not copy and calculate total weight stall at GPU, which means less memory transfer
511 #ifdef Weight_On_SplineBySpline_Basis
512  // Here we have to make a somewhat large GPU->CPU transfer because it's all the splines' response
513  cudaMemcpy(cpu_weights, gpu_weights, h_n_splines*sizeof(float), cudaMemcpyDeviceToHost);
514  CudaCheckError();
515 
516  cudaMemcpy(cpu_weights_tf1, gpu_weights_tf1, h_n_tf1*sizeof(float), cudaMemcpyDeviceToHost);
517  CudaCheckError();
518 
519 //KS: Else calculate Total Weight
520 #else
521  grid_size.x = (h_n_events / block_size.x) + 1;
522 
523  EvalOnGPU_TotWeight<<<grid_size, block_size>>>(
524     gpu_weights,
525     gpu_weights_tf1,
526 
527     gpu_total_weights,
528 
529     text_nParamPerEvent,
530     text_nParamPerEvent_TF1
531     );
532  CudaCheckError();
533 
534  //KS: Here we have to make a somewhat large GPU->CPU transfer because it is proportional to number of events
535  //KS: Normally code wait for memory transfer to finish before moving further cudaMemcpyAsync means we will continue to execute code and in a meantime keep copying stuff.
536  cudaMemcpyAsync(cpu_total_weights, gpu_total_weights, h_n_events * sizeof(float), cudaMemcpyDeviceToHost, 0);
537  CudaCheckError();
538 #endif
539 
540  #ifdef DEBUG
541  printf("Copied GPU total weights to CPU with SUCCESS (drink more tea)\n");
542  printf("Released calculated response from GPU with SUCCESS (drink most tea)\n");
543  #endif
544 }
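
One practical point about the listing: without Weight_On_SplineBySpline_Basis the final device-to-host copy of the total weights is issued with cudaMemcpyAsync on the default stream, so the host must synchronise before it reads cpu_total_weights (in MaCh3 this is handled by the CPU-side SplineMonolith code). An illustrative per-step wrapper, continuing the placeholder naming of the earlier sketch:

#include <cuda_runtime.h>
#include "gpuSplineUtils.cuh"   // assumed include path

// Run one evaluation and copy the per-event weights out; all names are placeholders.
void EvaluateStep(SMonolithGPU& gpu, float* cpu_total_weights, float* vals,
                  short int* segment, unsigned int nSplines, unsigned int nTF1,
                  int nEvents, float* eventWeights)
{
  gpu.RunGPU_SplineMonolith(cpu_total_weights, vals, segment, nSplines, nTF1);

  // The copy of gpu_total_weights into the pinned host buffer is asynchronous;
  // wait for it before touching cpu_total_weights on the host.
  cudaDeviceSynchronize();

  for (int i = 0; i < nEvents; ++i) eventWeights[i] = cpu_total_weights[i];
}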

Member Data Documentation

◆ gpu_coeff_many

float* SMonolithGPU::gpu_coeff_many
private

GPU arrays to hold other coefficients.

Definition at line 192 of file gpuSplineUtils.cuh.

◆ gpu_coeff_TF1_many

float* SMonolithGPU::gpu_coeff_TF1_many
private

GPU arrays to hold TF1 coefficients.

Definition at line 201 of file gpuSplineUtils.cuh.

◆ gpu_coeff_x

float* SMonolithGPU::gpu_coeff_x
private

KS: GPU arrays to hold X coefficient.

Definition at line 189 of file gpuSplineUtils.cuh.

◆ gpu_nKnots_arr

unsigned int* SMonolithGPU::gpu_nKnots_arr
private

KS: GPU Number of knots per spline.

Definition at line 195 of file gpuSplineUtils.cuh.

◆ gpu_nParamPerEvent

unsigned int* SMonolithGPU::gpu_nParamPerEvent
private

KS: GPU map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of splines for the event, index where its splines start}.

Definition at line 184 of file gpuSplineUtils.cuh.
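
In other words the map is stored flat, with two entries per event. A small sketch of the indexing this description implies (the {count, first index} ordering is assumed from the brace order above; the helper itself is hypothetical):

#include <cuda_runtime.h>

// Hypothetical helper reading the flat per-event map for one event.
__host__ __device__ inline void SplineRangeForEvent(const unsigned int* nParamPerEvent,
                                                    unsigned int event,
                                                    unsigned int& nSplinesForEvent,
                                                    unsigned int& firstSplineIndex)
{
  nSplinesForEvent = nParamPerEvent[2 * event];       // how many splines touch this event
  firstSplineIndex = nParamPerEvent[2 * event + 1];   // where they start in the monolith
}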

◆ gpu_nParamPerEvent_TF1

unsigned int* SMonolithGPU::gpu_nParamPerEvent_TF1
private

KS: GPU map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of TF1 for the event, index where its TF1 start}.

Definition at line 186 of file gpuSplineUtils.cuh.

◆ gpu_nPoints_arr

short int* SMonolithGPU::gpu_nPoints_arr
private

GPU arrays to hold number of points.

Definition at line 203 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_arr

short int* SMonolithGPU::gpu_paramNo_arr
private

CW: GPU array with the parameter number for each spline (one entry per spline, not per spline point!)

Definition at line 198 of file gpuSplineUtils.cuh.

◆ gpu_paramNo_TF1_arr

short int* SMonolithGPU::gpu_paramNo_TF1_arr
private

CW: GPU array with the parameter number for each TF1 object.

Definition at line 205 of file gpuSplineUtils.cuh.

◆ gpu_total_weights

float* SMonolithGPU::gpu_total_weights
private

GPU arrays to hold weight for event.

Definition at line 208 of file gpuSplineUtils.cuh.

◆ gpu_weights

float* SMonolithGPU::gpu_weights
private

GPU arrays to hold weight for each spline.

Definition at line 210 of file gpuSplineUtils.cuh.

◆ gpu_weights_tf1

float* SMonolithGPU::gpu_weights_tf1
private

GPU arrays to hold weight for each TF1.

Definition at line 212 of file gpuSplineUtils.cuh.

◆ h_n_events

int SMonolithGPU::h_n_events
private

Number of events living on CPU.

Definition at line 218 of file gpuSplineUtils.cuh.

◆ h_n_params

int SMonolithGPU::h_n_params
private

Number of params living on CPU.

Definition at line 216 of file gpuSplineUtils.cuh.

◆ text_coeff_x

cudaTextureObject_t SMonolithGPU::text_coeff_x = 0
private

KS: Textures are read-only cached memory, well optimised for fetching. Make a texture only for variables you access often but rarely overwrite; texture memory is limited, so don't use it for huge arrays.

Definition at line 224 of file gpuSplineUtils.cuh.
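
Reading such a texture inside a kernel goes through tex1Dfetch with the element index; a minimal sketch (the kernel and its arguments are illustrative, not part of MaCh3):

#include <cuda_runtime.h>

// Illustrative kernel: fetch x-knot values through the texture object so the reads
// go through the read-only texture cache instead of plain global loads.
__global__ void ReadKnots(cudaTextureObject_t text_coeff_x, float* out, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = tex1Dfetch<float>(text_coeff_x, i);
}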

◆ text_nParamPerEvent

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent = 0
private

KS: Map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of splines for the event, index where its splines start}.

Definition at line 227 of file gpuSplineUtils.cuh.

◆ text_nParamPerEvent_TF1

cudaTextureObject_t SMonolithGPU::text_nParamPerEvent_TF1 = 0
private

KS: Map keeping track of how many parameters apply to each event; two numbers are stored per event: {number of TF1 for the event, index where its TF1 start}.

Definition at line 229 of file gpuSplineUtils.cuh.


The documentation for this class was generated from the following files:

gpuSplineUtils.cuh
gpuSplineUtils.cu