fml 0.1-0
Fused Matrix Library
GPU data and methods.
#include <card.hh>
Public Member Functions
card ()
Create a new card object. Does not initialize any GPU data.
card (const int id=0)
Create a new card object and set up internal CUDA data.
card (const card &x)
void | set (const int id)
Sets up the existing card object.
void | info () const
Print some brief information about the GPU.
void * | mem_alloc (const size_t len)
Allocate device memory.
void | mem_set (void *ptr, const int value, const size_t len)
Set device memory.
void | mem_free (void *ptr)
Free device memory.
void | mem_cpu2gpu (void *dst, const void *src, const size_t len)
Copy host (CPU) data to device (GPU) memory.
void | mem_gpu2cpu (void *dst, const void *src, const size_t len)
Copy device (GPU) data to host (CPU) memory.
void | mem_gpu2gpu (void *dst, const void *src, const size_t len)
Copy device (GPU) data to other device (GPU) memory.
void | synch ()
Synchronize device.
void | check ()
Check for (and throw if found) a CUDA error.
void | set_math_mode (gpublas_mathmode_t mode)
Manually set the GPU BLAS math mode (as supported by hardware).
int | get_id ()
int | get_id () const
gpublas_handle_t | blas_handle ()
GPU BLAS handle.
gpublas_handle_t | blas_handle () const
gpulapack_handle_t | lapack_handle ()
GPU LAPACK handle.
gpulapack_handle_t | lapack_handle () const
bool | valid_card () const
Is the GPU data valid?
Protected Attributes
int | _id |
gpublas_handle_t | _blas_handle |
gpulapack_handle_t | _lapack_handle |
GPU data and methods.
Implementation Details
Stores GPU ordinal and BLAS/LAPACK handles. Methods are wrappers around core GPU operations, allowing GPU malloc, memset, etc.
You probably should not use these methods directly unless you know what you are doing (in which case you probably do not even need them). Simply pass a card object to a GPU object constructor and move on.
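As a rough illustration of the "pass a card to a GPU object" pattern, here is a host-only sketch that compiles without CUDA. The names `mock_card` and `device_buffer` are hypothetical and not part of fml. `mock_card` mirrors only the `mem_alloc()` and `mem_free()` signatures listed above and backs them with `std::malloc`/`std::free`. Sharing the card through `std::shared_ptr` is also an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical host-only stand-in for card: same method names, but backed by
// std::malloc/std::free instead of the GPU runtime.
class mock_card {
 public:
  void* mem_alloc(const std::size_t len) {
    void* ptr = std::malloc(len);
    if (ptr == nullptr) throw std::runtime_error("allocation failed");
    return ptr;
  }
  void mem_free(void* ptr) { std::free(ptr); }
};

// Hypothetical "GPU object": it receives a card in its constructor and routes
// every allocation through it, so callers never call the card methods directly.
class device_buffer {
 public:
  device_buffer(std::shared_ptr<mock_card> c, const std::size_t len)
      : _card(std::move(c)), _len(len), _data(nullptr) {
    assert(_card != nullptr);  // a buffer needs a card to allocate from
    _data = _card->mem_alloc(_len);
  }
  ~device_buffer() { _card->mem_free(_data); }

  // The buffer owns its allocation, so copying is disabled in this sketch.
  device_buffer(const device_buffer&) = delete;
  device_buffer& operator=(const device_buffer&) = delete;

  std::size_t size() const { return _len; }
  void* data() { return _data; }

 private:
  std::shared_ptr<mock_card> _card;
  std::size_t _len;
  void* _data;
};
```

A caller would create one card with `std::make_shared<mock_card>()` and hand it to each buffer it constructs. The buffer keeps the card alive for as long as it holds memory allocated through it.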
card (const int id=0) | inline |
Create a new card object and set up internal CUDA data.
Sets the current device to the provided GPU id and initializes GPU BLAS and LAPACK handles.
[in] | id | Ordinal number corresponding to the desired GPU device. |
Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the method will throw a 'runtime_error' exception.
void | check () | inline |
Check for (and throw if found) a CUDA error.
Implementation Details
Wrapper around GPU error lookup, e.g. cudaGetLastError().
Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.
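The throw-on-error idea can be sketched without a GPU as follows. `mock_status`, `mock_status_string()`, and `check_status()` are hypothetical stand-ins for a runtime status code, an error-string lookup such as cudaGetErrorString(), and the check itself. The real method may format its message differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a GPU runtime status code (CUDA uses cudaError_t).
enum class mock_status { success, invalid_value, out_of_memory };

// Hypothetical stand-in for an error-string lookup.
inline const char* mock_status_string(const mock_status s) {
  switch (s) {
    case mock_status::success: return "no error";
    case mock_status::invalid_value: return "invalid value";
    case mock_status::out_of_memory: return "out of memory";
  }
  assert(false && "unhandled mock_status");
  return "unknown error";
}

// Turn a non-success status into a runtime_error; do nothing on success.
inline void check_status(const mock_status s) {
  if (s != mock_status::success)
    throw std::runtime_error(std::string("GPU error: ") + mock_status_string(s));
}
```

Wrapping every runtime call in a check of this kind lets callers handle failures with ordinary `try`/`catch` instead of inspecting return codes.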
int | get_id () | inline |
The ordinal number corresponding to the GPU device.
void | info () const | inline |
Print some brief information about the GPU.
Implementation Details
Uses NVML.
void * | mem_alloc (const size_t len) | inline |
Allocate device memory.
[in] | len | Number of bytes of memory to allocate. |
Implementation Details
Wrapper around GPU malloc, e.g. cudaMalloc().
Exceptions
If the allocation fails, this throws a 'runtime_error' exception.
void | mem_cpu2gpu (void *dst, const void *src, const size_t len) | inline |
Copy host (CPU) data to device (GPU) memory.
[in,out] | dst | The device memory you want to copy TO. |
[in] | src | The host memory you want to copy FROM. |
[in] | len | Number of bytes of each array to use. |
Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().
Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.
void | mem_free (void *ptr) | inline |
Free device memory.
[in] | ptr | The device memory you want to un-allocate. |
Implementation Details
Wrapper around GPU free, e.g. cudaFree().
Exceptions
If the function fails (e.g., because it is given non-device memory), this throws a 'runtime_error' exception.
void | mem_gpu2cpu (void *dst, const void *src, const size_t len) | inline |
Copy device (GPU) data to host (CPU) memory.
[in,out] | dst | The host memory you want to copy TO. |
[in] | src | The device memory you want to copy FROM. |
[in] | len | Number of bytes of each array to use. |
Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().
Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.
void | mem_gpu2gpu (void *dst, const void *src, const size_t len) | inline |
Copy device (GPU) data to other device (GPU) memory.
[in,out] | dst | The device memory you want to copy TO. |
[in] | src | The device memory you want to copy FROM. |
[in] | len | Number of bytes of each array to use. |
Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().
Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.
void | mem_set (void *ptr, const int value, const size_t len) | inline |
Set device memory.
[in,out] | ptr | On entrance, the already-allocated block of memory to set. On exit, the first 'len' bytes will be set to 'value'. |
[in] | value | The value to set. |
[in] | len | Number of bytes of the input 'ptr' to set to 'value'. |
Implementation Details
Wrapper around GPU memset, e.g. cudaMemset().
Exceptions
If the function fails (e.g., because it is given non-device memory), this throws a 'runtime_error' exception.
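The memory methods documented above are typically used together: allocate, set, copy in, copy out, free. Here is a minimal sketch of that sequence, written as a template over the card type so it can run against a host-only stand-in. `host_card` and `zero_padded_roundtrip()` are hypothetical names; `host_card` copies the documented signatures but uses ordinary heap memory as "device" memory. Note that every `len` argument is a byte count, and that the set value is written to each byte (as with std::memset and cudaMemset()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-only stand-in: "device" memory is ordinary heap memory.
struct host_card {
  void* mem_alloc(const std::size_t len) {
    void* p = std::malloc(len);
    if (p == nullptr) throw std::runtime_error("allocation failed");
    return p;
  }
  void mem_set(void* ptr, const int value, const std::size_t len) {
    std::memset(ptr, value, len);
  }
  void mem_free(void* ptr) { std::free(ptr); }
  void mem_cpu2gpu(void* dst, const void* src, const std::size_t len) {
    std::memcpy(dst, src, len);
  }
  void mem_gpu2cpu(void* dst, const void* src, const std::size_t len) {
    std::memcpy(dst, src, len);
  }
};

// Allocate room for n ints, zero it, upload src into the front of the block,
// then download all n ints. The result is src followed by zero padding.
template <class Card>
std::vector<int> zero_padded_roundtrip(Card& c, const std::vector<int>& src,
                                       const std::size_t n) {
  assert(!src.empty() && src.size() <= n);
  const std::size_t len = n * sizeof(int);  // lengths are in bytes
  std::vector<int> out(n);
  void* dev = c.mem_alloc(len);
  try {
    c.mem_set(dev, 0, len);                                    // zero every byte
    c.mem_cpu2gpu(dev, src.data(), src.size() * sizeof(int));  // overwrite prefix
    c.mem_gpu2cpu(out.data(), dev, len);                       // download block
  } catch (...) {
    c.mem_free(dev);  // do not leak the block if a copy throws
    throw;
  }
  c.mem_free(dev);
  return out;
}
```

Calling `zero_padded_roundtrip(c, src, 5)` with a `host_card c` and a three-element `src` returns those three values followed by two zeros. Because the function is a template, it would presumably accept any initialized card type with the same method signatures, though that is untested here.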
void | set (const int id) | inline |
Sets up the existing card object.
For use with the no-argument constructor. Frees any existing GPU data already allocated and stored in the object. Misuse of this could lead to some seemingly strange errors.
[in] | id | Ordinal number corresponding to the desired GPU device. |
Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the method will throw a 'runtime_error' exception.
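The two-stage pattern (no-argument constructor, then `set()`) can be sketched with a host-only stand-in. Everything here is hypothetical: `mock_card` is not the fml class, a heap-allocated `int` plays the role of the BLAS/LAPACK handles, and using -1 to mark "no device selected" is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for card. The unique_ptr<int> represents the handles
// that the real object would create and destroy through the GPU libraries.
class mock_card {
 public:
  mock_card() : _id(-1) {}  // no "GPU" data yet
  explicit mock_card(const int id) : _id(-1) { set(id); }

  void set(const int id) {
    if (id < 0) throw std::runtime_error("invalid device ordinal");
    _handle.reset(new int(id));  // installs a new handle, freeing any old one
    _id = id;
  }

  int get_id() const { return _id; }
  bool valid_card() const { return _handle != nullptr; }

 private:
  int _id;
  std::unique_ptr<int> _handle;
};

// Usage: construct empty, initialize later, then re-target the same object.
inline void two_stage_demo() {
  mock_card c;
  assert(!c.valid_card());
  c.set(0);
  assert(c.valid_card() && c.get_id() == 0);
  c.set(1);  // the handle created for device 0 is released here
  assert(c.get_id() == 1);
}
```

The second `set()` call shows why misuse can be confusing: anything still relying on the handle created by the first call would now refer to freed data.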
void | set_math_mode (gpublas_mathmode_t mode) | inline |
Manually set the GPU BLAS math mode (as supported by hardware).
Not all options are supported by all hardware/driver versions. If the function is not explicitly called, the device will use the default behavior; the vendor may vary this behavior over time.
[in] | mode | Should be one of: GPUBLAS_MATH_DEFAULT (the default mode of the device); GPUBLAS_MATH_ACCELERATE (use acceleration, e.g. tensor cores, in single precision routines); GPUBLAS_MATH_PEDANTIC (use only the prescribed precision). |
Implementation Details
Wrapper around the GPU BLAS math mode setter, e.g. cublasSetMathMode().
Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.
void | synch () | inline |
Synchronize device.
Blocks the calling host thread until the device has completed all previously launched kernels.
Implementation Details
Wrapper around GPU synchronize, e.g. cudaDeviceSynchronize().
Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.