Getting Started

Building

The CEED library, libceed, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces. It can be built using:

$ make

or, with optimization flags:

$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'

These optimization flags are used by all languages (C, C++, Fortran). The same makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host. Support may need to be manually specified via:

$ make AVX=1

or:

$ make AVX=0

if your compiler does not support gcc-style options, if you are cross compiling, etc.

To enable CUDA support, add CUDA_DIR=/opt/cuda or an appropriate directory to your make invocation. To enable HIP support, add ROCM_DIR=/opt/rocm or an appropriate directory. To enable SYCL support, add SYCL_DIR=/opt/sycl or an appropriate directory. Note that SYCL backends require building with oneAPI compilers as well:

$ . /opt/intel/oneapi/setvars.sh
$ make SYCL_DIR=/opt/intel/oneapi/compiler/latest/linux SYCLCXX=icpx CC=icx CXX=icpx

The library can be configured for host applications which use OpenMP parallelism via:

$ make OPENMP=1

which allows operators to be created and applied from different threads inside an omp parallel region.

To store these or other arguments as defaults for future invocations of make, use:

$ make configure CUDA_DIR=/usr/local/cuda ROCM_DIR=/opt/rocm OPT='-O3 -march=znver2'

which stores these variables in config.mk.

WebAssembly

libCEED can be built for WASM using Emscripten. For example, one can build the library and run a standalone WASM executable using:

$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000

Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

$ pip install libceed

or, in a clone of the repository, via:

$ pip install .

Julia users can install using:

$ julia
julia> ]
pkg> add LibCEED

See the LibCEED.jl documentation for more information.

Rust users can include libCEED via Cargo.toml:

[dependencies]
libceed = "0.12.0"

See the Cargo documentation for details.

Testing

The test suite produces TAP output and is run by:

$ make test

or, using the prove tool distributed with Perl (recommended):

$ make prove
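TAP (the Test Anything Protocol) is a line-oriented text format, which is why a harness such as prove can summarize it. As a rough sketch of what such a harness does, the snippet below counts passing and failing lines. The sample lines and the `summarize_tap` helper are made up for illustration; they are not actual libCEED test output or part of libCEED.

```python
def summarize_tap(lines):
    """Count 'ok' and 'not ok' result lines in TAP output.

    Hypothetical helper for illustration only; real harnesses such as
    prove also handle plans, directives, and diagnostics.
    """
    passed = failed = 0
    for line in lines:
        line = line.strip()
        if line.startswith("not ok"):
            failed += 1
        elif line.startswith("ok"):
            passed += 1
    return passed, failed


# Made-up sample in TAP format (not real libCEED output).
sample = """1..3
ok 1 - first check
not ok 2 - second check
ok 3 - third check
""".splitlines()

print(summarize_tap(sample))  # (passed, failed)
```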

Backends

There are multiple supported backends, which can be selected at runtime in the examples:

CEED resource	Backend	Deterministic Capable

CPU Native
`/cpu/self/ref/serial`	Serial reference implementation	Yes
`/cpu/self/ref/blocked`	Blocked reference implementation	Yes
`/cpu/self/opt/serial`	Serial optimized C implementation	Yes
`/cpu/self/opt/blocked`	Blocked optimized C implementation	Yes
`/cpu/self/avx/serial`	Serial AVX implementation	Yes
`/cpu/self/avx/blocked`	Blocked AVX implementation	Yes

CPU Valgrind
`/cpu/self/memcheck/*`	Memcheck backends, undefined value checks	Yes

CPU LIBXSMM
`/cpu/self/xsmm/serial`	Serial LIBXSMM implementation	Yes
`/cpu/self/xsmm/blocked`	Blocked LIBXSMM implementation	Yes

CUDA Native
`/gpu/cuda/ref`	Reference pure CUDA kernels	Yes
`/gpu/cuda/shared`	Optimized pure CUDA kernels using shared memory	Yes
`/gpu/cuda/gen`	Optimized pure CUDA kernels using code generation	No

HIP Native
`/gpu/hip/ref`	Reference pure HIP kernels	Yes
`/gpu/hip/shared`	Optimized pure HIP kernels using shared memory	Yes
`/gpu/hip/gen`	Optimized pure HIP kernels using code generation	No

SYCL Native
`/gpu/sycl/ref`	Reference pure SYCL kernels	Yes
`/gpu/sycl/shared`	Optimized pure SYCL kernels using shared memory	Yes

MAGMA
`/gpu/cuda/magma`	CUDA MAGMA kernels	No
`/gpu/cuda/magma/det`	CUDA MAGMA kernels	Yes
`/gpu/hip/magma`	HIP MAGMA kernels	No
`/gpu/hip/magma/det`	HIP MAGMA kernels	Yes

The /cpu/self/*/serial backends process one element at a time and are intended for meshes with a small number of high-order elements. The /cpu/self/*/blocked backends process blocked batches of eight interlaced elements and are intended for meshes with a larger number of elements.
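To make the idea of blocked batches of eight interlaced elements more concrete, here is a small sketch in plain Python. It assumes that "interlaced" means the element index varies fastest within a block, and that a short final batch is padded by repeating its last element. Both are assumptions made for this illustration, and the function names are hypothetical; the actual data layout inside libCEED may differ in detail.

```python
BLOCK_SIZE = 8  # batch size stated in the text above


def interlace_block(elements):
    """Reorder per-element node values so the element index varies fastest."""
    num_nodes = len(elements[0])
    return [elem[node] for node in range(num_nodes) for elem in elements]


def blocked_layout(elements, block_size=BLOCK_SIZE):
    """Group elements into interlaced batches of block_size elements."""
    blocks = []
    for start in range(0, len(elements), block_size):
        batch = elements[start:start + block_size]
        # Assumption: pad a short final batch by repeating its last element.
        batch = batch + [batch[-1]] * (block_size - len(batch))
        blocks.append(interlace_block(batch))
    return blocks


# 10 elements with 2 nodes each; the value 10*e + n encodes (element e, node n).
elements = [[10 * e + n for n in range(2)] for e in range(10)]
blocks = blocked_layout(elements)

print(len(blocks))     # number of batches
print(blocks[0][:8])   # node 0 of the first eight elements, stored contiguously
```

With this layout, the same node of eight consecutive elements sits in contiguous memory, which is the access pattern that vectorized loops over elements tend to favor.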

The /cpu/self/ref/* backends are written in pure C and provide basic functionality.

The /cpu/self/opt/* backends are written in pure C and use partial e-vectors to improve performance.

The /cpu/self/avx/* backends rely upon AVX instructions to provide vectorized CPU performance.

The /cpu/self/memcheck/* backends rely upon the Valgrind Memcheck tool to help verify that user QFunctions have no undefined values. To use, run your code with Valgrind and the Memcheck backends, e.g. valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial. A ‘development’ or ‘debugging’ version of Valgrind with headers is required to use this backend. This backend can be run in serial or blocked mode and defaults to serial mode if /cpu/self/memcheck is selected at runtime.

The /cpu/self/xsmm/* backends rely upon the LIBXSMM package to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but the Makefile is not detecting MKLROOT, linking libCEED against MKL can be forced by setting the environment variable MKL=1. LIBXSMM version 2.0 or newer is required.

The /gpu/cuda/* backends provide GPU performance strictly using CUDA.

The /gpu/hip/* backends provide GPU performance strictly using HIP. They are based on the /gpu/cuda/* backends. ROCm version 4.2 or newer is required.

The /gpu/hip/* backends can also run on non-AMD GPUs (e.g., Intel) via chipStar, which implements HIP on top of SPIR-V through Level Zero or OpenCL. To build against chipStar, set HIP_DIR to the chipStar install prefix (in place of ROCM_DIR); libCEED’s Makefile detects chipStar by inspecting hipconfig and automatically enables the required code paths. At runtime, chipStar’s own environment variables (e.g., CHIP_BE=level0 or CHIP_BE=opencl, CHIP_DEVICE_TYPE, CHIP_PLATFORM) select the backend and device; see the chipStar documentation for details.

The /gpu/sycl/* backends provide GPU performance strictly using SYCL. They are based on the /gpu/cuda/* and /gpu/hip/* backends.

The /gpu/*/magma/* backends rely upon the MAGMA package. To enable the MAGMA backends, the environment variable MAGMA_DIR must point to the top-level MAGMA directory, with the MAGMA library located in $(MAGMA_DIR)/lib/. By default, MAGMA_DIR is set to ../magma; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to magma/ in libCEED’s parent directory, or set MAGMA_DIR to the proper location. MAGMA version 2.5.0 or newer is required. Currently, each MAGMA library installation is only built for either CUDA or HIP. The corresponding set of libCEED backends (/gpu/cuda/magma/* or /gpu/hip/magma/*) will automatically be built for the version of the MAGMA library found in MAGMA_DIR.

Users can specify a device for all CUDA, HIP, and MAGMA backends by appending :device_id=# to the resource name, where # is the device number. For example:

/gpu/cuda/gen:device_id=1
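As a rough illustration of how such a resource string breaks down, the sketch below splits it into the backend name and the device number. `split_resource` is a hypothetical helper written for this example, not part of libCEED; libCEED interprets the resource string itself, so applications normally pass the full string through unchanged.

```python
def split_resource(resource):
    """Split '<backend>:device_id=<n>' into (backend, device_id).

    Hypothetical helper for illustration only. device_id is None when
    the resource string carries no ':device_id=' suffix.
    """
    backend, sep, option = resource.partition(":")
    device_id = None
    if sep:
        key, _, value = option.partition("=")
        if key != "device_id":
            raise ValueError(f"unsupported option: {key!r}")
        device_id = int(value)
    return backend, device_id


print(split_resource("/gpu/cuda/gen:device_id=1"))
print(split_resource("/cpu/self/opt/blocked"))
```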

Bit-for-bit reproducibility is important in some applications. However, some libCEED backends use non-deterministic operations, such as atomicAdd, for increased performance. The backends that can generate reproducible results, given the proper compilation options, are marked Yes in the Deterministic Capable column of the table above.
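One reason unordered accumulation can break bit-for-bit reproducibility is that floating-point addition is not associative, so the order in which contributions are summed can change the last bits of the result. The sketch below shows this in plain Python with made-up values; it does not use libCEED or any GPU kernels.

```python
def accumulate(values):
    """Sum values one at a time, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


# Made-up contributions to a single entry, e.g. from several elements.
contributions = [0.1, 0.2, 0.3]

in_order = accumulate(contributions)             # ((0.1 + 0.2) + 0.3)
reordered = accumulate(reversed(contributions))  # ((0.3 + 0.2) + 0.1)

print(in_order == reordered)  # False: same inputs, different bits
print(abs(in_order - reordered))
```

When several threads add into the same entry atomically, the order of the additions can vary from run to run, so the result may differ in the same way.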