*NMF-mGPU* implements the *Non-negative Matrix Factorization*
(*NMF*) algorithm by making use of *Graphics Processing Units*
(*GPUs*). NMF takes a non-negative input matrix (**V**) and returns two non-negative matrices,
**W** and **H**, whose product approximates the former (i.e.,
**V** ≈ **W** ∗ **H**).
If **V** has *n* rows and *m* columns, then the dimensions of **W**
and **H** will be *n* × *k* and *k* × *m*,
respectively. The *factorization rank* ("*k*"), specified by the user, is usually much smaller
than both *n* and *m*.
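
To make the shapes concrete, here is a minimal CPU-only sketch of a factorization in plain Python. It uses the classic multiplicative update rules for the Euclidean objective as an illustration; it is not the *NMF-mGPU* code, and the function names (`nmf`, `matmul`, `transpose`, `frobenius_error`) and parameters (`iterations`, `seed`, `eps`) are assumptions made up for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Sum of squared differences between V and the product W * H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize a non-negative n x m matrix V into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random strictly positive starting point (illustrative choice).
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
    w, h = nmf(v, k=2)
    print(len(w), "x", len(w[0]), "and", len(h), "x", len(h[0]))
    print("squared error:", frobenius_error(v, w, h))
```

For a 4 × 3 input and *k* = 2, `w` has 4 rows of 2 values and `h` has 2 rows of 3 values, matching the *n* × *k* and *k* × *m* dimensions described above.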

This software has been developed using NVIDIA's *CUDA* (*Compute Unified Device Architecture*) framework for GPU computing.
*CUDA* represents a GPU device as a programmable general-purpose *coprocessor* able to perform linear-algebra
operations.

On detached devices with little on-board memory, large datasets can be **transferred blockwise**
from the CPU's main memory to the GPU's memory and processed block by block. In addition, *NMF-mGPU* has been
explicitly optimized for the different existing CUDA architectures.
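
The idea behind blockwise processing can be sketched on the CPU in plain Python: a product such as **W**ᵀ ∗ **V** can be accumulated from row blocks of **V**, so only one block needs to be resident at a time. This is an illustration of the general technique only, not the transfer scheme actually used by *NMF-mGPU*; `row_blocks`, `wt_times_v_blockwise` and `block_size` are hypothetical names.

```python
def row_blocks(n_rows, block_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in chunks."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def wt_times_v_blockwise(w, v, block_size):
    """Compute W^T * V while reading only block_size rows of V (and W) at a time."""
    k, m = len(w[0]), len(v[0])
    acc = [[0.0] * m for _ in range(k)]
    for start, stop in row_blocks(len(v), block_size):
        # On a GPU, rows start..stop-1 would be the portion copied to device memory.
        for r in range(start, stop):
            for i in range(k):
                for j in range(m):
                    acc[i][j] += w[r][i] * v[r][j]
    return acc
```

Because each row block contributes an independent partial sum, the result does not depend on the chosen `block_size`; only the amount of data held at once changes.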

Finally, *NMF-mGPU* also provides a *multi-GPU* version that makes use of multiple GPU devices through
the *MPI* (*Message Passing Interface*) standard.

### If you use this software, please cite the following work:

E. Mejía-Roa, D. Tabas-Madrid, J. Setoain, C. García, F. Tirado and A. Pascual-Montano. **NMF-mGPU: Non-negative
matrix factorization on multi-GPU systems**. *BMC Bioinformatics* 2015, **16**:43.
doi:10.1186/s12859-015-0485-4
[http://www.biomedcentral.com/1471-2105/16/43]