# fml

Version 0.1-0
fml is the Fused Matrix Library, a multi-source, header-only C++ library for dense matrix computing. The emphasis is on real-valued matrix types (`float`, `double`, and `__half`) for numerical operations useful for data analysis.
The goal of fml is to be "medium-level": that is, high-level compared to working directly with, e.g., the BLAS or CUDA™, but low(er)-level compared to other C++ matrix frameworks. Some familiarity with LAPACK will make many of the design choices in fml easier to understand.
The library provides 4 main classes: `cpumat`, `gpumat`, `parmat`, and `mpimat`. These are mostly what they sound like: `cpumat` holds data on a single CPU, `gpumat` holds data on a single GPU, `mpimat` holds data distributed across MPI processes, and `parmat` targets multi-node and/or multi-GPU computing (via `parmat_cpu` and `parmat_gpu`).
There are some differences in how objects of any particular type are constructed, but the high-level APIs are largely the same between the objects. The goal is to be able to quickly create laptop-scale prototypes that can then easily be converted into large-scale GPU, multi-node, multi-GPU, or multi-node+multi-GPU codes.
The library is header-only so no installation is strictly necessary. You can just include a copy/submodule in your project. However, if you want some analogue of `make install`, then you could do something like:
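For instance, copying (or symlinking) the header tree into a system include directory; the paths below are assumptions about your layout:

```sh
# Make the headers visible system-wide so that #include <fml/cpu.hh> resolves.
# Adjust the source path to wherever the fml header directory actually lives.
sudo cp -r /path/to/fml/src/fml /usr/local/include/
```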
There are no external header dependencies, but there are some shared libraries you need to have, depending on which backends you use (more information below).

Other software we use: a copy of the unit testing framework is included under `tests/`.

You can find some examples of how to use the library in the `examples/` tree. Right now there is no real build system beyond some ad hoc makefiles; but ad hoc is better than no hoc.
Depending on which class(es) you want to use, here are some general guidelines for using the library in your own project:

* `cpumat`: an ordinary C++ compiler is all you need; link with your BLAS and LAPACK libraries.
* `gpumat`: compile the GPU code with `nvcc` (code that does not touch the GPU does not need `nvcc`; an ordinary C++ compiler will do). Link with the CUDA libraries; if you have CUDA installed and do not know what to link with, there is no harm in linking with all of these.
* `mpimat`: compile and link with `mpicxx`.
* `parmat`: compile with `mpicxx`. Link with the CPU stuff if using `parmat_cpu`; link with GPU stuff if using `parmat_gpu` (you can use both).

Check the makefiles in the `examples/` tree if none of that makes sense.
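As a rough illustration (not taken from the project's makefiles), compile lines for the three single-program backends might look something like the following; the include path and the specific library names (`-llapack`, `-lblas`, the CUDA libraries, `-lscalapack`) are assumptions about a typical system setup:

```sh
# Hypothetical compile lines; adjust include paths and library names to your system.

# cpumat: any C++ compiler, linked against BLAS/LAPACK (assumed -lblas -llapack here)
g++ -I/path/to/fml/src -fopenmp app.cpp -o app -llapack -lblas

# gpumat: nvcc, linked against the CUDA libraries (an assumed set shown here)
nvcc -I/path/to/fml/src app.cu -o app -lcudart -lcublas -lcusolver

# mpimat: mpicxx, linked against a ScaLAPACK build (assumed -lscalapack here)
mpicxx -I/path/to/fml/src app.cpp -o app -lscalapack
```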
Here's a simple example computing the SVD with some data held on a single CPU:
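The listing below is a sketch of what such a program could look like. It assumes, beyond the `cpumat` class, the `linalg::svd()` routine, and the `fml/cpu.hh` header mentioned elsewhere in this README, a `cpuvec` vector type, `fill_linspace()`/`print()` helpers, and an `fml` namespace; treat those extra names as assumptions and consult the headers or the `examples/` tree for the exact API.

```cpp
#include <fml/cpu.hh>

// assumption: the classes live in an fml namespace
using namespace fml;

int main()
{
  // a small 3x2 matrix of floats filled with the values 1, 2, ..., 6
  // (fill_linspace() and print() are assumed helper names)
  cpumat<float> x(3, 2);
  x.fill_linspace(1, 6);
  x.print();

  // compute the singular values of x, storing them in s
  cpuvec<float> s;
  linalg::svd(x, s);
  s.print();

  return 0;
}
```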
Save as `svd.cpp` and build with:
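For instance, something along these lines; the include path and the BLAS/LAPACK library names are assumptions for your system:

```sh
g++ -I/path/to/fml/src svd.cpp -o svd -llapack -lblas
```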
You should see the input matrix and its singular values printed to the terminal.
The API is largely the same if we change the object storage, but we have to change the object initialization. For example, if `x` is an object of class `mpimat`, we still call `linalg::svd(x, s)`. The differences lie in the creation of the objects. Here is how we might change the above example to use distributed data:
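A sketch of the distributed version follows. In addition to the names used above, it assumes a `grid` class describing the 2-d process grid, with a `PROC_GRID_SQUARE` shape constant, a `rank0()` query, and a `finalize()` shutdown method; these names are assumptions, so again check the headers and the `examples/` tree.

```cpp
#include <fml/mpi.hh>

// assumption: the classes live in an fml namespace
using namespace fml;

int main()
{
  // set up the 2-d process grid the matrix will be distributed over
  // (grid and PROC_GRID_SQUARE are assumed names)
  grid g = grid(PROC_GRID_SQUARE);

  // the same 3x2 matrix as before, distributed over the grid with 1x1
  // blocking; such tiny blocks are only for demonstration, so that every
  // process owns some of the data
  mpimat<float> x(g, 3, 2, 1, 1);
  x.fill_linspace(1, 6);
  x.print();

  // the same call as in the single-CPU version
  cpuvec<float> s;
  linalg::svd(x, s);

  // print from one process only, to avoid duplicated output
  if (g.rank0())
    s.print();

  // shut down the grid/MPI (assumed cleanup method name)
  g.finalize();

  return 0;
}
```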
In practice, using such small block sizes for an MPI matrix is probably not a good idea; we only do so for the sake of demonstration (we want each process to own some data). We can build this new example via:
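Assuming the distributed version is saved as `svd_mpi.cpp` (a name chosen here for illustration) and that the MPI backend links against a ScaLAPACK library (an assumption; check the `examples/` makefiles), something like:

```sh
mpicxx -I/path/to/fml/src svd_mpi.cpp -o svd_mpi -lscalapack
```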
We can launch the example with multiple processes via
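For example, with two processes (the launcher name and process count are up to you):

```sh
mpirun -np 2 ./svd_mpi
```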
And here we see the same results as before, only now computed with the data distributed across the processes.
tldr:

* CPU: `fml/cpu.hh`
* GPU: `fml/gpu.hh`
* MPI: `fml/mpi.hh`
The project is young and things are still mostly evolving. The current status is:
There are currently "super headers" for the CPU (`fml/cpu.hh`), GPU (`fml/gpu.hh`), and MPI (`fml/mpi.hh`) backends. These include all relevant sub-headers. They are "frozen" in the sense that they will not move and will always include everything; however, as more namespaces are added, those too will be included in the super headers. The headers one folder level deep (e.g. those in `fml/cpu`) are similarly frozen, although more may be added over time. Headers two folder levels deep should be treated as internal and are subject to change.
Internals are evolving and subject to change at basically any time. Notable changes will be mentioned in the changelog.
There are several similar C/C++ projects worth mentioning, among them Armadillo, Eigen, and PETSc.
These are all great libraries which have stood the test of time. Armadillo in particular is worthy of a look, as it has a very nice interface and a very extensive set of functions. However, to my knowledge, all of these focus exclusively on CPU computing. There are some extensions to Armadillo and Eigen for GPU computing, and for gemm-heavy codes you can use nvblas to offload some work to the GPU, but this does not always achieve good performance. None of the above include distributed computing, except for PETSc, which focuses on sparse matrices.
There are probably many other C++ frameworks in this arena, but none to my knowledge have a similar scope to fml.
Probably the biggest influence on my thinking for this library is the pbdR package ecosystem for HPC with the R language, which I have worked on for many years now. Some obvious parallels are:
The basic philosophy of fml is: