Evelyne Ringoot

GPU and HPC performance for large-scale linear algebra

We make GPU-accelerated large-scale linear algebra fast, easy and portable. I am a PhD candidate in Mathematics and Computational Science at MIT, advised by Professor Alan Edelman in the JuliaLab. I optimize hardware-agnostic GPU kernels for the singular value decomposition, targeting performance beyond vendor-optimized libraries on hardware ranging from laptop GPUs to HPC multicore systems.

Bidiagonalization of banded matrices

In the past, bidiagonalization through bulge-chasing was believed to be a CPU-only algorithm since it is memory bound. Not anymore. Low-level GPU memory has increased, and we present the first GPU algorithm for reducing a banded matrix to bidiagonal form, outperforming the HPC libraries PLASMA and SLATE by orders of magnitude.

Unified GPU Kernels for the SVD

High-level HPC libraries typically rely on low-level hardware-optimized functions. We show that the performance of hardware-specialized functions can be matched or exceeded using abstract functions, with hyperparameters tuned for each data precision and hardware target.

Unified Recursive TRMM and TRSM

A single high-level recursive implementation of triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM), covering all hardware and data types, leverages more general matrix-matrix multiplication (GEMM) through recursion and achieves performance in line with hardware-optimized functions.
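
To illustrate the recursive idea, here is a minimal sketch of a recursive lower-triangular solve in plain Python. It is not the implementation described above: the function names `gemm` and `trsm_lower` are hypothetical, the matrices are nested lists, and the plain `gemm` merely stands in for an optimized GEMM call. The triangular matrix is split into blocks, the two diagonal blocks are solved recursively, and the off-diagonal block is applied as a general matrix-matrix product.

```python
def gemm(A, B):
    """Plain matrix-matrix product on nested lists (stand-in for an optimized GEMM)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trsm_lower(L, B):
    """Solve L X = B for X, with L lower triangular (n x n) and B of size n x m.

    Hypothetical helper for illustration only.
    """
    n = len(L)
    if n == 1:
        # Base case: a 1 x 1 triangular system is a scalar division.
        return [[b / L[0][0] for b in B[0]]]

    k = n // 2
    # Block partition: L = [[L11, 0], [L21, L22]], B = [B1; B2].
    L11 = [row[:k] for row in L[:k]]
    L21 = [row[:k] for row in L[k:]]
    L22 = [row[k:] for row in L[k:]]

    # Solve the top block recursively: L11 X1 = B1.
    X1 = trsm_lower(L11, B[:k])

    # Update the bottom right-hand side with a general product: B2 - L21 X1.
    update = gemm(L21, X1)
    B2 = [[b - u for b, u in zip(brow, urow)] for brow, urow in zip(B[k:], update)]

    # Solve the bottom block recursively: L22 X2 = B2 - L21 X1.
    X2 = trsm_lower(L22, B2)
    return X1 + X2


L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
X_true = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = gemm(L, X_true)
X = trsm_lower(L, B)
```

In this sketch, all work outside the 1 x 1 base case is done by the `gemm` update, which is why a recursive formulation can inherit the performance of whatever GEMM it is given.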