fml  0.1-0
Fused Matrix Library
fml::card Class Reference

GPU data and methods.

#include <card.hh>

Public Member Functions

 card ()
 Create a new card object. Does not initialize any GPU data.
 
 card (const int id=0)
 Create a new card object and set up internal CUDA data.
 
 card (const card &x)
 
void set (const int id)
 Sets up the existing card object.
 
void info () const
 Print some brief information about the GPU.
 
void * mem_alloc (const size_t len)
 Allocate device memory.
 
void mem_set (void *ptr, const int value, const size_t len)
 Set device memory.
 
void mem_free (void *ptr)
 Free device memory.
 
void mem_cpu2gpu (void *dst, const void *src, const size_t len)
 Copy host (CPU) data to device (GPU) memory.
 
void mem_gpu2cpu (void *dst, const void *src, const size_t len)
 Copy device (GPU) data to host (CPU) memory.
 
void mem_gpu2gpu (void *dst, const void *src, const size_t len)
 Copy device (GPU) data to other device (GPU) memory.
 
void synch ()
 Synchronize device.
 
void check ()
 Check for (and throw if found) a CUDA error.
 
void set_math_mode (gpublas_mathmode_t mode)
 Manually set the GPU BLAS math mode (as supported by hardware).
 
int get_id ()
 
int get_id () const
 
gpublas_handle_t blas_handle ()
 GPU BLAS handle.
 
gpublas_handle_t blas_handle () const
 
gpulapack_handle_t lapack_handle ()
 GPU LAPACK handle.
 
gpulapack_handle_t lapack_handle () const
 
bool valid_card () const
 Is the GPU data valid?
 

Protected Attributes

int _id
 
gpublas_handle_t _blas_handle
 
gpulapack_handle_t _lapack_handle
 

Detailed Description

GPU data and methods.

Implementation Details
Stores GPU ordinal and BLAS/LAPACK handles. Methods are wrappers around core GPU operations, allowing GPU malloc, memset, etc.

You probably should not use these methods directly unless you know what you are doing (in which case you probably do not even need them). Simply pass a card object to a GPU object constructor and move on.

Constructor & Destructor Documentation

◆ card()

fml::card::card ( const int  id = 0)
inline

Create a new card object and set up internal CUDA data.

Sets the current device to the provided GPU id and initializes GPU BLAS and LAPACK handles.

Parameters
[in]  id  Ordinal number corresponding to the desired GPU device.

Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the method will throw a 'runtime_error' exception.

Member Function Documentation

◆ check()

void fml::card::check ( )
inline

Check for (and throw if found) a CUDA error.

Implementation Details
Wrapper around GPU error lookup, e.g. cudaGetLastError().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.
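
The check-and-throw behavior described above can be illustrated without a GPU. The following is a minimal host-only sketch, not fml's implementation: 'fake_status', 'fake_status_string()', and 'check_status()' are hypothetical stand-ins for a CUDA status code, its message lookup, and the body of check().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CUDA status code (not part of fml or CUDA).
enum class fake_status { success, invalid_value };

// Hypothetical stand-in for the message lookup a real wrapper would perform.
inline std::string fake_status_string(const fake_status s)
{
  return (s == fake_status::success) ? "no error" : "invalid value";
}

// Sketch of check-and-throw: do nothing on success, otherwise raise
// std::runtime_error carrying the error description.
inline void check_status(const fake_status s)
{
  if (s != fake_status::success)
    throw std::runtime_error("GPU error: " + fake_status_string(s));
}

// Returns true if check_status() threw a runtime_error for the given status.
inline bool throws_on(const fake_status s)
{
  try
  {
    check_status(s);
  }
  catch (const std::runtime_error &)
  {
    return true;
  }
  return false;
}

// Exercises both the success path and the failure path.
inline void demo_check()
{
  assert(!throws_on(fake_status::success));
  assert(throws_on(fake_status::invalid_value));
}
```

With a real card object, the equivalent caller-side pattern would presumably be a try/catch for 'std::runtime_error' around the call to check().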

◆ get_id()

int fml::card::get_id ( )
inline

Returns the ordinal number corresponding to the GPU device.

◆ info()

void fml::card::info ( ) const
inline

Print some brief information about the GPU.

Implementation Details
Uses NVML.

◆ mem_alloc()

void * fml::card::mem_alloc ( const size_t  len)
inline

Allocate device memory.

Parameters
[in]  len  Number of bytes of memory to allocate.
Returns
Pointer to the newly allocated device memory.

Implementation Details
Wrapper around GPU malloc, e.g. cudaMalloc().

Exceptions
If the allocation fails, this throws a 'runtime_error' exception.

◆ mem_cpu2gpu()

void fml::card::mem_cpu2gpu ( void *  dst,
const void *  src,
const size_t  len 
)
inline

Copy host (CPU) data to device (GPU) memory.

Parameters
[in,out]  dst  The device memory you want to copy TO.
[in]  src  The host memory you want to copy FROM.
[in]  len  Number of bytes to copy from 'src' to 'dst'.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.

◆ mem_free()

void fml::card::mem_free ( void *  ptr)
inline

Free device memory.

Parameters
[in]  ptr  The device memory you want to deallocate.

Implementation Details
Wrapper around GPU free, e.g. cudaFree().

Exceptions
If the function fails (e.g., by being given non-device memory), this throws a 'runtime_error' exception.
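
Because mem_alloc() and mem_free() must be paired by hand, one option is a small scope guard that frees in its destructor. The following is a sketch only: 'host_card' is a hypothetical host-memory stand-in that mimics the two documented signatures so the example compiles without a GPU, and 'scoped_buffer' is an illustrative helper, not part of fml.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Hypothetical host-memory stand-in with the two documented signatures.
// It is NOT fml::card; it only lets the sketch compile without a GPU.
struct host_card
{
  int live = 0; // outstanding allocations, tracked for demonstration only

  void *mem_alloc(const std::size_t len)
  {
    void *ptr = std::malloc(len);
    if (ptr == nullptr)
      throw std::runtime_error("allocation failed");
    live++;
    return ptr;
  }

  void mem_free(void *ptr)
  {
    if (ptr != nullptr)
    {
      std::free(ptr);
      live--;
    }
  }
};

// Scope guard: allocates in the constructor and frees in the destructor.
template <class Card>
class scoped_buffer
{
  public:
    scoped_buffer(Card &c, const std::size_t len)
      : _c(c), _ptr(c.mem_alloc(len)) {}

    ~scoped_buffer()
    {
      // mem_free() is documented as throwing on failure; a destructor
      // must not let that exception escape.
      try { _c.mem_free(_ptr); } catch (...) {}
    }

    // Copying would free the same block twice, so it is disabled.
    scoped_buffer(const scoped_buffer &) = delete;
    scoped_buffer &operator=(const scoped_buffer &) = delete;

    void *get() const { return _ptr; }

  private:
    Card &_c;
    void *_ptr;
};

// Returns the number of live allocations after the guard leaves scope.
inline int demo_scoped_buffer()
{
  host_card c;
  {
    scoped_buffer<host_card> buf(c, 64);
    assert(buf.get() != nullptr);
    assert(c.live == 1);
  }
  return c.live;
}
```

The guard is templated on the card type, so the same code should in principle accept any object exposing mem_alloc() and mem_free() with these signatures.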

◆ mem_gpu2cpu()

void fml::card::mem_gpu2cpu ( void *  dst,
const void *  src,
const size_t  len 
)
inline

Copy device (GPU) data to host (CPU) memory.

Parameters
[in,out]  dst  The host memory you want to copy TO.
[in]  src  The device memory you want to copy FROM.
[in]  len  Number of bytes to copy from 'src' to 'dst'.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.
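
The allocate, copy in, copy out, free sequence implied by these methods can be sketched as follows. This is a host-only illustration under stated assumptions: 'host_card' is a hypothetical stand-in backed by malloc/memcpy that mimics the documented signatures, and 'round_trip()' is an illustrative helper, not an fml function. Note that 'len' is a byte count, so element counts are multiplied by sizeof.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-memory stand-in with the documented signatures.
// It is NOT fml::card; "device" memory here is ordinary heap memory.
struct host_card
{
  void *mem_alloc(const std::size_t len)
  {
    void *ptr = std::malloc(len);
    if (ptr == nullptr)
      throw std::runtime_error("allocation failed");
    return ptr;
  }

  void mem_free(void *ptr) { std::free(ptr); }

  void mem_cpu2gpu(void *dst, const void *src, const std::size_t len)
  {
    std::memcpy(dst, src, len);
  }

  void mem_gpu2cpu(void *dst, const void *src, const std::size_t len)
  {
    std::memcpy(dst, src, len);
  }
};

// Copies host data to a temporary "device" buffer and back again.
template <class Card>
std::vector<double> round_trip(Card &c, const std::vector<double> &host_in)
{
  assert(!host_in.empty());

  const std::size_t len = host_in.size() * sizeof(double); // bytes, not elements
  std::vector<double> host_out(host_in.size());

  void *dev = c.mem_alloc(len);
  try
  {
    c.mem_cpu2gpu(dev, host_in.data(), len);
    c.mem_gpu2cpu(host_out.data(), dev, len);
  }
  catch (...)
  {
    // Release the buffer before propagating a copy failure.
    c.mem_free(dev);
    throw;
  }
  c.mem_free(dev);

  return host_out;
}

// Returns true if the data survives the round trip unchanged.
inline bool demo_round_trip()
{
  host_card c;
  const std::vector<double> x = {1.0, 2.5, -3.0};
  return round_trip(c, x) == x;
}
```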

◆ mem_gpu2gpu()

void fml::card::mem_gpu2gpu ( void *  dst,
const void *  src,
const size_t  len 
)
inline

Copy device (GPU) data to other device (GPU) memory.

Parameters
[in,out]  dst  The device memory you want to copy TO.
[in]  src  The device memory you want to copy FROM.
[in]  len  Number of bytes to copy from 'src' to 'dst'.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.

◆ mem_set()

void fml::card::mem_set ( void *  ptr,
const int  value,
const size_t  len 
)
inline

Set device memory.

Parameters
[in,out]  ptr  On entry, the already-allocated block of device memory to set. On exit, its first 'len' bytes are set to 'value'.
[in]  value  The value to set.
[in]  len  Number of bytes of the input 'ptr' to set to 'value'.

Implementation Details
Wrapper around GPU memset, e.g. cudaMemset().

Exceptions
If the function fails (e.g., by being given non-device memory), this throws a 'runtime_error' exception.
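
Since mem_set() is described as a wrapper around a GPU memset such as cudaMemset(), which fills memory one byte at a time, 'value' presumably applies to each byte rather than to each element. The host analogue below uses std::memset to show the consequence; it assumes mem_set() forwards 'value' and 'len' unchanged, and 'fill_one_word()' is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Host analogue of a byte-wise fill: every byte of a 32-bit word is set
// to the low byte of 'value', as std::memset does.
inline std::uint32_t fill_one_word(const int value)
{
  std::uint32_t word = 0;
  std::memset(&word, value, sizeof(word));
  return word;
}

// Shows which fill values give an "expected" element value and which do not.
inline void demo_fill()
{
  assert(fill_one_word(0) == 0u);             // zeroing works as expected
  assert(fill_one_word(1) == 0x01010101u);    // each byte is 1, so the word is not 1
  assert(fill_one_word(0xFF) == 0xFFFFFFFFu); // all bits set
}
```

In practice this means a byte-wise fill is mainly useful for zeroing a buffer; filling an array of multi-byte elements with an arbitrary constant needs a different approach.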

◆ set()

void fml::card::set ( const int  id)
inline

Sets up the existing card object.

For use with the no-argument constructor. Frees any existing GPU data already allocated and stored in the object. Misuse of this could lead to some seemingly strange errors.

Parameters
[in]  id  Ordinal number corresponding to the desired GPU device.

Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the method will throw a 'runtime_error' exception.

◆ set_math_mode()

void fml::card::set_math_mode ( gpublas_mathmode_t  mode)
inline

Manually set the GPU BLAS math mode (as supported by hardware).

Not all options are supported by all hardware/driver versions. If the function is not explicitly called, the device will use the default behavior; the vendor may vary this behavior over time.

Parameters
[in]  mode  Should be one of:
  GPUBLAS_MATH_DEFAULT - the default mode of the device
  GPUBLAS_MATH_ACCELERATE - use acceleration (e.g. tensor cores) in single precision routines
  GPUBLAS_MATH_PEDANTIC - use only the prescribed precision

Implementation Details
Wrapper around the GPU BLAS math mode setter, e.g. cublasSetMathMode().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.

◆ synch()

void fml::card::synch ( )
inline

Synchronize device.

Blocks the calling host thread until the device has completed all previously launched kernels.

Implementation Details
Wrapper around GPU synchronize, e.g. cudaDeviceSynchronize().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.


The documentation for this class was generated from the following file: