Getting Started
Building
The CEED library, libceed, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces.
It can be built using:
$ make
or, with optimization flags:
$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'
These optimization flags are used for all languages (C, C++, Fortran), and the same Makefile variable can also be set when building the tests and examples (see below).
The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host. Support may need to be manually specified via:
$ make AVX=1
or:
$ make AVX=0
if your compiler does not support gcc-style options, if you are cross-compiling, etc.
To enable CUDA support, add CUDA_DIR=/opt/cuda or an appropriate directory to your make invocation.
To enable HIP support, add ROCM_DIR=/opt/rocm or an appropriate directory.
To enable SYCL support, add SYCL_DIR=/opt/sycl or an appropriate directory.
Note that SYCL backends require building with oneAPI compilers as well:
$ . /opt/intel/oneapi/setvars.sh
$ make SYCL_DIR=/opt/intel/oneapi/compiler/latest/linux SYCLCXX=icpx CC=icx CXX=icpx
The library can be configured for host applications which use OpenMP parallelism via:
$ make OPENMP=1
which allows operators to be created and applied from different threads inside an omp parallel region.
To store these or other arguments as defaults for future invocations of make, use:
$ make configure CUDA_DIR=/usr/local/cuda ROCM_DIR=/opt/rocm OPT='-O3 -march=znver2'
which stores these variables in config.mk.
WebAssembly
libCEED can be built for WASM using Emscripten. For example, one can build the library and the ex2-surface example as a standalone WASM executable, then run it with Wasmer:
$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000
Additional Language Interfaces
The Fortran interface is built alongside the library automatically.
Python users can install using:
$ pip install libceed
or, in a clone of the repository, via:
$ pip install .
Julia users can install using:
$ julia
julia> ]
pkg> add LibCEED
See the LibCEED.jl documentation for more information.
Rust users can include libCEED via Cargo.toml:
[dependencies]
libceed = "0.12.0"
See the Cargo documentation for details.
Testing
The test suite produces TAP output and is run by:
$ make test
or, using the prove tool distributed with Perl (recommended):
$ make prove
Backends
There are multiple supported backends, which can be selected at runtime in the examples:
| CEED resource | Backend | Deterministic Capable |
|---|---|---|
| CPU Native | | |
| | Serial reference implementation | Yes |
| | Blocked reference implementation | Yes |
| | Serial optimized C implementation | Yes |
| | Blocked optimized C implementation | Yes |
| | Serial AVX implementation | Yes |
| | Blocked AVX implementation | Yes |
| CPU Valgrind | | |
| | Memcheck backends, undefined value checks | Yes |
| CPU LIBXSMM | | |
| | Serial LIBXSMM implementation | Yes |
| | Blocked LIBXSMM implementation | Yes |
| CUDA Native | | |
| | Reference pure CUDA kernels | Yes |
| | Optimized pure CUDA kernels using shared memory | Yes |
| | Optimized pure CUDA kernels using code generation | No |
| HIP Native | | |
| | Reference pure HIP kernels | Yes |
| | Optimized pure HIP kernels using shared memory | Yes |
| | Optimized pure HIP kernels using code generation | No |
| SYCL Native | | |
| | Reference pure SYCL kernels | Yes |
| | Optimized pure SYCL kernels using shared memory | Yes |
| MAGMA | | |
| | CUDA MAGMA kernels | No |
| | CUDA MAGMA kernels | Yes |
| | HIP MAGMA kernels | No |
| | HIP MAGMA kernels | Yes |
The /cpu/self/*/serial backends process one element at a time and are intended for meshes with a smaller number of high order elements.
The /cpu/self/*/blocked backends process blocked batches of eight interlaced elements and are intended for meshes with higher numbers of elements.
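To make "blocked batches of eight interlaced elements" more concrete, the sketch below repacks per-element data (element-major, nodes contiguous) into blocks of eight elements in which the values for the same node of all eight elements sit next to each other, so that one vector operation can act on node i of eight elements at once. The [block][node][lane] layout, the BLOCK_SIZE constant, the function names, and padding a partial last block with copies of the final element are assumptions made for illustration; this is not libCEED's internal code.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 8 /* elements per block (assumed, matching the batch size above) */

/* Number of blocks needed for num_elem elements; the last block may be partial. */
static size_t num_blocks(size_t num_elem) {
  return (num_elem + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Hypothetical helper: repack element-major data in[e * num_nodes + i]
   (node i of element e) into the interlaced layout
   out[(b * num_nodes + i) * BLOCK_SIZE + lane], with e = b * BLOCK_SIZE + lane.
   Lanes past the last element are padded with copies of the last element.
   out must hold num_blocks(num_elem) * num_nodes * BLOCK_SIZE values. */
static void interlace_elements(const double *in, size_t num_elem, size_t num_nodes,
                               double *out) {
  assert(in != NULL && out != NULL && num_elem > 0);
  for (size_t b = 0; b < num_blocks(num_elem); b++) {
    for (size_t i = 0; i < num_nodes; i++) {
      for (size_t lane = 0; lane < BLOCK_SIZE; lane++) {
        size_t e = b * BLOCK_SIZE + lane;
        if (e >= num_elem) e = num_elem - 1; /* pad the partial last block */
        out[(b * num_nodes + i) * BLOCK_SIZE + lane] = in[e * num_nodes + i];
      }
    }
  }
}
```

With 10 elements of 3 nodes each, this produces 2 blocks; node 1 of element 2 lands in block 0, lane 2, and the last 6 lanes of block 1 repeat element 9. The serial backends skip this repacking and simply loop over one element at a time.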
The /cpu/self/ref/* backends are written in pure C and provide basic functionality.
The /cpu/self/opt/* backends are written in pure C and use partial e-vectors to improve performance.
The /cpu/self/avx/* backends rely upon AVX instructions to provide vectorized CPU performance.
The /cpu/self/memcheck/* backends rely upon the Valgrind Memcheck tool to help verify that user QFunctions have no undefined values.
To use them, run your code under Valgrind with a Memcheck backend selected, e.g. valgrind ./build/ex1 -ceed /cpu/self/memcheck.
A ‘development’ or ‘debugging’ version of Valgrind with headers is required to use this backend.
This backend can be run in serial or blocked mode and defaults to running in the serial mode if /cpu/self/memcheck is selected at runtime.
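The kind of bug these backends target is a QFunction that leaves some output values unset. As a Valgrind-free illustration of the same idea (a different technique, not how the memcheck backends work), the sketch below fills the output with NaN, calls a QFunction-like routine, and counts the points that were never overwritten. The toy_qfunction signature and all function names are hypothetical and only mimic the shape of a QFunction; unlike Memcheck, this check cannot see reads of undefined inputs or distinguish a legitimately computed NaN.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical QFunction-like signature: Q quadrature points, one input
   array and one output array. */
typedef void (*toy_qfunction)(size_t Q, const double *in, double *out);

/* Correct example: writes every output point. */
static void scale_all(size_t Q, const double *in, double *out) {
  for (size_t q = 0; q < Q; q++) out[q] = 2.0 * in[q];
}

/* Buggy example: the off-by-one loop bound leaves the last output unwritten. */
static void scale_buggy(size_t Q, const double *in, double *out) {
  for (size_t q = 0; q + 1 < Q; q++) out[q] = 2.0 * in[q];
}

/* Poisons out with NaN, runs f, and returns how many output points still
   hold NaN (0 means every point received a non-NaN value). */
static size_t count_unwritten_outputs(toy_qfunction f, size_t Q, const double *in,
                                      double *out) {
  assert(f != NULL);
  for (size_t q = 0; q < Q; q++) out[q] = NAN;
  f(Q, in, out);
  size_t unwritten = 0;
  for (size_t q = 0; q < Q; q++)
    if (isnan(out[q])) unwritten++;
  return unwritten;
}
```

For four points, scale_all reports 0 unwritten outputs and scale_buggy reports 1.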
The /cpu/self/xsmm/* backends rely upon the LIBXSMM package to provide vectorized CPU performance.
If linking against both MKL and LIBXSMM is desired but the Makefile does not detect MKLROOT, linking libCEED against MKL can be forced by setting the environment variable MKL=1.
The LIBXSMM main development branch from 7 April 2024 or newer is required.
The /gpu/cuda/* backends provide GPU performance strictly using CUDA.
The /gpu/hip/* backends provide GPU performance strictly using HIP.
They are based on the /gpu/cuda/* backends.
ROCm version 4.2 or newer is required.
The /gpu/hip/* backends can also run on non-AMD GPUs (e.g., Intel) via chipStar, which implements HIP on top of SPIR-V through Level Zero or OpenCL.
To build against chipStar, set HIP_DIR to the chipStar install prefix (in place of ROCM_DIR); libCEED’s Makefile detects chipStar by inspecting hipconfig and automatically enables the required code paths.
At runtime, chipStar’s own environment variables (e.g., CHIP_BE=level0 or CHIP_BE=opencl, CHIP_DEVICE_TYPE, CHIP_PLATFORM) select the backend and device; see the chipStar documentation for details.
The /gpu/sycl/* backends provide GPU performance strictly using SYCL.
They are based on the /gpu/cuda/* and /gpu/hip/* backends.
The /gpu/*/magma/* backends rely upon the MAGMA package.
To enable the MAGMA backends, the environment variable MAGMA_DIR must point to the top-level MAGMA directory, with the MAGMA library located in $(MAGMA_DIR)/lib/.
By default, MAGMA_DIR is set to ../magma; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to magma/ in libCEED’s parent directory, or set MAGMA_DIR to the proper location.
MAGMA version 2.5.0 or newer is required.
Currently, each MAGMA library installation is only built for either CUDA or HIP.
The corresponding set of libCEED backends (/gpu/cuda/magma/* or /gpu/hip/magma/*) will automatically be built for the version of the MAGMA library found in MAGMA_DIR.
Users can specify a device for all CUDA, HIP, and MAGMA backends by appending :device_id=# to the resource name.
For example:
/gpu/cuda/gen:device_id=1
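A resource string of this form has two parts: the backend path and an optional :device_id=# suffix. The sketch below splits such a string using only the C standard library. The helper split_resource and its conventions (returning -1 when no device is given, truncating an over-long path) are hypothetical; libCEED parses the resource itself, and this is not its parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the backend path of resource (everything before
   the first ':') into path, which has room for path_len bytes, and returns the
   value of a ":device_id=#" suffix, or -1 if it is absent or malformed. */
static int split_resource(const char *resource, char *path, size_t path_len) {
  assert(resource != NULL && path != NULL && path_len > 0);
  const char *colon = strchr(resource, ':');
  size_t n = colon ? (size_t)(colon - resource) : strlen(resource);
  if (n >= path_len) n = path_len - 1; /* truncate to fit the buffer */
  memcpy(path, resource, n);
  path[n] = '\0';

  const char *key = ":device_id=";
  if (!colon || strncmp(colon, key, strlen(key)) != 0) return -1;
  const char *digits = colon + strlen(key);
  char *end = NULL;
  long id = strtol(digits, &end, 10);
  if (end == digits || *end != '\0' || id < 0) return -1; /* not a plain integer */
  return (int)id;
}
```

Given /gpu/cuda/gen:device_id=1, the helper stores /gpu/cuda/gen in path and returns 1; given /cpu/self, it stores the whole string and returns -1.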
Bit-for-bit reproducibility is important in some applications.
However, some libCEED backends use non-deterministic operations, such as atomicAdd, for increased performance.
The backends that are capable of generating reproducible results, with the proper compilation options, are marked Yes in the Deterministic Capable column of the table above.
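The underlying reason is that floating-point addition is not associative: the rounded result of a sum depends on the order in which the contributions are added, and with atomicAdd that order can change from run to run. The sketch below is a CPU-only illustration in IEEE 754 double precision; the function name and the idea of passing an explicit order array are assumptions for demonstration, not libCEED code.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates x[order[0]], x[order[1]], ... into a running sum, mimicking
   contributions that arrive in a particular order (as with atomicAdd, where
   the arrival order is not fixed). The volatile accumulator forces every
   partial sum to be rounded to double precision, even on x87 hardware. */
static double sum_in_order(const double *x, const size_t *order, size_t n) {
  assert(x != NULL && order != NULL);
  volatile double acc = 0.0;
  for (size_t k = 0; k < n; k++) acc = acc + x[order[k]];
  return acc;
}
```

For example, summing {1e16, -1e16, 1.0} in index order (0, 1, 2) gives 1.0, while the order (1, 2, 0) gives 0.0: adjacent doubles near 1e16 are 2 apart, so -1e16 + 1.0 rounds back to -1e16 before 1e16 is added.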