GPU data and methods.

#include <card.hh>

Public Member Functions
	card ()
	Create a new card object. Does not initialize any GPU data.

	card (const int id=0)
	Create a new card object and set up internal CUDA data.

	card (const card &x)

void	set (const int id)
	Sets up the existing card object.

void	info () const
	Print some brief information about the GPU.

void *	mem_alloc (const size_t len)
	Allocate device memory.

void	mem_set (void *ptr, const int value, const size_t len)
	Set device memory.

void	mem_free (void *ptr)
	Free device memory.

void	mem_cpu2gpu (void *dst, const void *src, const size_t len)
	Copy host (CPU) data to device (GPU) memory.

void	mem_gpu2cpu (void *dst, const void *src, const size_t len)
	Copy device (GPU) data to host (CPU) memory.

void	mem_gpu2gpu (void *dst, const void *src, const size_t len)
	Copy device (GPU) data to other device (GPU) memory.

void	synch ()
	Synchronize device.

void	check ()
	Check for (and throw if found) a CUDA error.

void	set_math_mode (gpublas_mathmode_t mode)
	Manually set the GPU BLAS math mode (as supported by hardware).

int	get_id ()

int	get_id () const

gpublas_handle_t	blas_handle ()
	GPU BLAS handle.

gpublas_handle_t	blas_handle () const

gpulapack_handle_t	lapack_handle ()
	GPU LAPACK handle.

gpulapack_handle_t	lapack_handle () const

bool	valid_card () const
	Is the GPU data valid?

Protected Attributes
int	_id

gpublas_handle_t	_blas_handle

gpulapack_handle_t	_lapack_handle

Detailed Description

GPU data and methods.

Implementation Details
Stores GPU ordinal and BLAS/LAPACK handles. Methods are wrappers around core GPU operations, allowing GPU malloc, memset, etc.

You probably should not use these methods directly unless you know what you are doing (in which case you probably do not even need them). Simply pass a card object to a GPU object constructor and move on.

Constructor & Destructor Documentation

◆ card()

fml::card::card ( const int id = 0 )

inline

Create a new card object and set up internal CUDA data.

Sets the current device to the provided GPU id and initializes GPU BLAS and LAPACK handles.

Parameters

[in] id Ordinal number corresponding to the desired GPU device.

Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the constructor throws a 'runtime_error' exception.

Member Function Documentation

◆ check()

void fml::card::check ( )

inline

Check for (and throw if found) a CUDA error.

Implementation Details
Wrapper around GPU error lookup, e.g. cudaGetLastError().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.
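
The check-and-throw pattern described here can be sketched without a GPU. The following is a minimal host-only illustration, not the library's implementation: gpu_status, status_string(), throw_on_error() and throws_for() are hypothetical stand-ins for a CUDA status code, its message lookup and the wrapper logic, introduced only so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CUDA status code (e.g. cudaError_t).
enum class gpu_status { success = 0, invalid_value = 1, out_of_memory = 2 };

// Hypothetical stand-in for an error-message lookup.
inline const char *status_string(const gpu_status s)
{
  switch (s)
  {
    case gpu_status::success:       return "no error";
    case gpu_status::invalid_value: return "invalid value";
    case gpu_status::out_of_memory: return "out of memory";
  }
  return "unknown error";
}

// Turn a non-success status into a runtime_error, as check() is documented to do.
inline void throw_on_error(const gpu_status s)
{
  if (s != gpu_status::success)
    throw std::runtime_error(std::string("GPU error: ") + status_string(s));
}

// Helper for the example: reports whether throw_on_error() throws for 's'.
inline bool throws_for(const gpu_status s)
{
  try
  {
    throw_on_error(s);
  }
  catch (const std::runtime_error &)
  {
    return true;
  }
  return false;
}

// Example use: success passes silently, any other status throws.
inline void check_example()
{
  assert(!throws_for(gpu_status::success));
  assert(throws_for(gpu_status::out_of_memory));
}
```

With the real runtime, the status would presumably come from a lookup such as cudaGetLastError() rather than being passed in by hand.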

◆ get_id()

int fml::card::get_id ( )

inline

The ordinal number corresponding to the GPU device.

◆ info()

void fml::card::info ( ) const

inline

Print some brief information about the GPU.

Implementation Details
Uses NVML.

◆ mem_alloc()

void * fml::card::mem_alloc ( const size_t len )

inline

Allocate device memory.

Parameters

[in] len Number of bytes of memory to allocate.

Returns: Pointer to the newly allocated device memory.

Implementation Details
Wrapper around GPU malloc, e.g. cudaMalloc().

Exceptions
If the allocation fails, this throws a 'runtime_error' exception.

◆ mem_cpu2gpu()

void fml::card::mem_cpu2gpu	(	void *	dst,
		const void *	src,
		const size_t	len
	)

inline

Copy host (CPU) data to device (GPU) memory.

Parameters

[in,out]	dst	The device memory you want to copy TO.
[in]	src	The host memory you want to copy FROM.
[in]	len	Number of bytes to copy.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.

◆ mem_free()

void fml::card::mem_free ( void * ptr )

inline

Free device memory.

Parameters

[in] ptr The device memory you want to un-allocate.

Implementation Details
Wrapper around GPU free, e.g. cudaFree().

Exceptions
If the function fails (e.g., by being given non-device memory), this throws a 'runtime_error' exception.

◆ mem_gpu2cpu()

void fml::card::mem_gpu2cpu	(	void *	dst,
		const void *	src,
		const size_t	len
	)

inline

Copy device (GPU) data to host (CPU) memory.

Parameters

[in,out]	dst	The host memory you want to copy TO.
[in]	src	The device memory you want to copy FROM.
[in]	len	Number of bytes to copy.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.

◆ mem_gpu2gpu()

void fml::card::mem_gpu2gpu	(	void *	dst,
		const void *	src,
		const size_t	len
	)

inline

Copy device (GPU) data to other device (GPU) memory.

Parameters

[in,out]	dst	The device memory you want to copy TO.
[in]	src	The device memory you want to copy FROM.
[in]	len	Number of bytes to copy.

Implementation Details
Wrapper around GPU memcpy, e.g. cudaMemcpy().

Exceptions
If the function fails (e.g., because device memory is used improperly), this throws a 'runtime_error' exception.
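
The allocate, copy in, copy across, copy out, free sequence implied by mem_alloc(), mem_cpu2gpu(), mem_gpu2gpu(), mem_gpu2cpu() and mem_free() can be sketched as follows. This is a host-only illustration under loud assumptions: host_card is a hypothetical stand-in that mirrors the documented signatures but uses malloc/memcpy/free on host memory, so no GPU is touched; round_trip() is likewise an illustrative helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-only stand-in mirroring the documented signatures.
// It operates on host memory, NOT the GPU.
class host_card
{
  public:
    void *mem_alloc(const std::size_t len)
    {
      void *ptr = std::malloc(len);
      if (ptr == nullptr)
        throw std::runtime_error("allocation failed");
      return ptr;
    }

    void mem_free(void *ptr) { std::free(ptr); }

    void mem_cpu2gpu(void *dst, const void *src, const std::size_t len)
    {
      std::memcpy(dst, src, len);
    }

    void mem_gpu2cpu(void *dst, const void *src, const std::size_t len)
    {
      std::memcpy(dst, src, len);
    }

    void mem_gpu2gpu(void *dst, const void *src, const std::size_t len)
    {
      std::memcpy(dst, src, len);
    }
};

// Round trip: host -> buffer a -> buffer b -> host.
inline std::vector<double> round_trip(const std::vector<double> &in)
{
  std::vector<double> out(in.size());
  if (in.empty())
    return out;

  host_card c;
  const std::size_t len = in.size() * sizeof(double); // lengths are in bytes

  void *a = c.mem_alloc(len);
  void *b = nullptr;
  try
  {
    b = c.mem_alloc(len);
    c.mem_cpu2gpu(a, in.data(), len);
    c.mem_gpu2gpu(b, a, len);
    c.mem_gpu2cpu(out.data(), b, len);
  }
  catch (...)
  {
    c.mem_free(b); // freeing a null pointer is a no-op
    c.mem_free(a);
    throw;
  }

  c.mem_free(b);
  c.mem_free(a);
  return out;
}

// Example use: the data should survive the round trip unchanged.
inline void round_trip_example()
{
  const std::vector<double> x = {1.0, 2.5, -3.0};
  assert(round_trip(x) == x);
}
```

Note that every length is a byte count, so element counts must be multiplied by sizeof of the element type before being passed in.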

◆ mem_set()

void fml::card::mem_set	(	void *	ptr,
		const int	value,
		const size_t	len
	)

inline

Set device memory.

Parameters

[in,out]	ptr	On entry, the already-allocated block of device memory to set. On exit, the first 'len' bytes will be set to 'value'.
[in]	value	The value to set.
[in]	len	Number of bytes of the input 'ptr' to set to 'value'.

Implementation Details
Wrapper around GPU memset, e.g. cudaMemset().

Exceptions
If the function fails (e.g., by being given non-device memory), this throws a 'runtime_error' exception.
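
Because 'len' counts bytes, and memset-style routines such as cudaMemset() write 'value' one byte at a time, setting a buffer of 4-byte integers to 1 does not yield integers equal to 1. The sketch below illustrates this on the host, using std::memset as a stand-in with the same byte-wise semantics; fill_bytes() is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-only stand-in: fills n 32-bit integers byte by byte with 'value',
// the way a memset-style routine would.
inline std::vector<std::uint32_t> fill_bytes(const std::size_t n, const int value)
{
  std::vector<std::uint32_t> x(n);
  if (n > 0)
    std::memset(x.data(), value, n * sizeof(std::uint32_t)); // length in bytes
  return x;
}

// Example use: zero works as expected, but 1 sets every byte to 0x01.
inline void fill_bytes_example()
{
  assert(fill_bytes(4, 0)[0] == 0u);
  assert(fill_bytes(4, 1)[0] == 0x01010101u); // not 1
}
```

In practice this means a memset-style call is mainly useful for zeroing memory; other element values generally need a dedicated kernel or a copy from the host.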

◆ set()

void fml::card::set ( const int id )

inline

Sets up the existing card object.

For use with the no-argument constructor. Frees any existing GPU data already allocated and stored in the object. Misuse of this could lead to some seemingly strange errors.

Parameters

[in] id Ordinal number corresponding to the desired GPU device.

Exceptions
If the GPU cannot be initialized, or if the allocation of one of the handles fails, the method throws a 'runtime_error' exception.

◆ set_math_mode()

void fml::card::set_math_mode ( gpublas_mathmode_t mode )

inline

Manually set the GPU BLAS math mode (as supported by hardware).

Not all options are supported by all hardware/driver versions. If the function is not explicitly called, the device will use the default behavior; the vendor may vary this behavior over time.

Parameters

[in] mode Should be one of:
GPUBLAS_MATH_DEFAULT - the default mode of the device
GPUBLAS_MATH_ACCELERATE - use acceleration (e.g. tensor cores) in single precision routines
GPUBLAS_MATH_PEDANTIC - use only the prescribed precision

Implementation Details
Wrapper around the GPU BLAS math mode setter, e.g. cublasSetMathMode().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.

◆ synch()

void fml::card::synch ( )

inline

Synchronize device.

Blocks the calling host thread until the device has completed all previously launched kernels.

Implementation Details
Wrapper around GPU synchronize, e.g. cudaDeviceSynchronize().

Exceptions
If a CUDA error is detected, this throws a 'runtime_error' exception.

The documentation for this class was generated from the following file:

fml/src/fml/gpu/card.hh
