Ginkgo  Generated from pipelines/1068515030 branch based on master. Ginkgo version 1.7.0
A numerical linear algebra library targeting many-core architectures
The custom-matrix-format program

The custom matrix format example..

This example depends on simple-solver, poisson-solver, three-pt-stencil-solver, .

Table of contents
  1. Introduction
  2. The commented program
  1. Results
  2. The plain program

Introduction

This example solves a 1D Poisson equation:

$ u : [0, 1] \rightarrow R\\ u'' = f\\ u(0) = u0\\ u(1) = u1 $

using a finite difference method on an equidistant grid with K discretization points (K can be controlled with a command line parameter). The discretization is done via the second order Taylor polynomial:

$ u(x + h) = u(x) - u'(x)h + 1/2 u''(x)h^2 + O(h^3)\\ u(x - h) = u(x) + u'(x)h + 1/2 u''(x)h^2 + O(h^3) / +\\ ---------------------- \\ -u(x - h) + 2u(x) + -u(x + h) = -f(x)h^2 + O(h^3) $

For an equidistant grid with K "inner" discretization points $x1, ..., xk, $and step size $ h = 1 / (K + 1)$, the formula produces a system of linear equations

$ 2u_1 - u_2 = -f_1 h^2 + u0\\ -u_(k-1) + 2u_k - u_(k+1) = -f_k h^2, k = 2, ..., K - 1\\ -u_(K-1) + 2u_K = -f_K h^2 + u1\\ $

which is then solved using Ginkgo's implementation of the CG method preconditioned with block-Jacobi. It is also possible to specify on which executor Ginkgo will solve the system via the command line. The function $`f` $is set to $`f(x) = 6x`$ (making the solution $`u(x) = x^3`$), but that can be changed in the main function.

The intention of this example is to show how a custom linear operator can be created and integrated into Ginkgo to achieve performance benefits.

About the example

The commented program

#include <iostream>
#include <map>
#include <string>
#include <omp.h>
#include <ginkgo/ginkgo.hpp>

A CUDA kernel implementing the stencil, which will be used if running on the CUDA executor. Unfortunately, NVCC has serious problems interpreting some parts of Ginkgo's code, so the kernel has to be compiled separately.

template <typename ValueType>
void stencil_kernel(std::size_t size, const ValueType* coefs,
const ValueType* b, ValueType* x);

A stencil matrix class representing the 3pt stencil linear operator. We include the gko::EnableLinOp mixin which implements the entire LinOp interface, except the two apply_impl methods, which get called inside the default implementation of apply (after argument verification) to perform the actual application of the linear operator. In addition, it includes the implementation of the entire PolymorphicObject interface.

It also includes the gko::EnableCreateMethod mixin which provides a default implementation of the static create method. This method will forward all its arguments to the constructor to create the object, and return an std::unique_ptr to the created object.

template <typename ValueType>
class StencilMatrix : public gko::EnableLinOp<StencilMatrix<ValueType>>,
public gko::EnableCreateMethod<StencilMatrix<ValueType>> {
public:

This constructor will be called by the create method. Here we initialize the coefficients of the stencil.

StencilMatrix(std::shared_ptr<const gko::Executor> exec,
gko::size_type size = 0, ValueType left = -1.0,
ValueType center = 2.0, ValueType right = -1.0)
: gko::EnableLinOp<StencilMatrix>(exec, gko::dim<2>{size}),
coefficients(exec, {left, center, right})
{}
protected:
using coef_type = gko::array<ValueType>;

Here we implement the application of the linear operator, x = A * b. apply_impl will be called by the apply method, after the arguments have been moved to the correct executor and the operators checked for conforming sizes.

For simplicity, we assume that there is always only one right hand side and the stride of consecutive elements in the vectors is 1 (both of these are always true in this example).

void apply_impl(const gko::LinOp* b, gko::LinOp* x) const override
{

we only implement the operator for dense RHS. gko::as will throw an exception if its argument is not Dense.

auto dense_b = gko::as<vec>(b);
auto dense_x = gko::as<vec>(x);

we need separate implementations depending on the executor, so we create an operation which maps the call to the correct implementation

struct stencil_operation : gko::Operation {
stencil_operation(const coef_type& coefficients, const vec* b,
vec* x)
: coefficients{coefficients}, b{b}, x{x}
{}

OpenMP implementation

void run(std::shared_ptr<const gko::OmpExecutor>) const override
{
auto b_values = b->get_const_values();
auto x_values = x->get_values();
#pragma omp parallel for
for (std::size_t i = 0; i < x->get_size()[0]; ++i) {
auto coefs = coefficients.get_const_data();
auto result = coefs[1] * b_values[i];
if (i > 0) {
result += coefs[0] * b_values[i - 1];
}
if (i < x->get_size()[0] - 1) {
result += coefs[2] * b_values[i + 1];
}
x_values[i] = result;
}
}

CUDA implementation

void run(std::shared_ptr<const gko::CudaExecutor>) const override
{
stencil_kernel(x->get_size()[0], coefficients.get_const_data(),
b->get_const_values(), x->get_values());
}

We do not provide an implementation for reference executor. If not provided, Ginkgo will use the implementation for the OpenMP executor when calling it in the reference executor.

const coef_type& coefficients;
const vec* b;
vec* x;
};
this->get_executor()->run(
stencil_operation(coefficients, dense_b, dense_x));
}

There is also a version of the apply function which does the operation x = alpha * A * b + beta * x. This function is commonly used and can often be better optimized than implementing it using x = A * b. However, for simplicity, we will implement it exactly like that in this example.

void apply_impl(const gko::LinOp* alpha, const gko::LinOp* b,
const gko::LinOp* beta, gko::LinOp* x) const override
{
auto dense_b = gko::as<vec>(b);
auto dense_x = gko::as<vec>(x);
auto tmp_x = dense_x->clone();
this->apply_impl(b, tmp_x.get());
dense_x->scale(beta);
dense_x->add_scaled(alpha, tmp_x);
}
private:
coef_type coefficients;
};

Creates a stencil matrix in CSR format for the given number of discretization points.

template <typename ValueType, typename IndexType>
void generate_stencil_matrix(gko::matrix::Csr<ValueType, IndexType>* matrix)
{
const auto discretization_points = matrix->get_size()[0];
auto row_ptrs = matrix->get_row_ptrs();
auto col_idxs = matrix->get_col_idxs();
auto values = matrix->get_values();
IndexType pos = 0;
const ValueType coefs[] = {-1, 2, -1};
row_ptrs[0] = pos;
for (int i = 0; i < discretization_points; ++i) {
for (auto ofs : {-1, 0, 1}) {
if (0 <= i + ofs && i + ofs < discretization_points) {
values[pos] = coefs[ofs + 1];
col_idxs[pos] = i + ofs;
++pos;
}
}
row_ptrs[i + 1] = pos;
}
}

Generates the RHS vector given f and the boundary conditions.

template <typename Closure, typename ValueType>
void generate_rhs(Closure f, ValueType u0, ValueType u1,
{
const auto discretization_points = rhs->get_size()[0];
auto values = rhs->get_values();
const ValueType h = 1.0 / (discretization_points + 1);
for (int i = 0; i < discretization_points; ++i) {
const ValueType xi = ValueType(i + 1) * h;
values[i] = -f(xi) * h * h;
}
values[0] += u0;
values[discretization_points - 1] += u1;
}

Prints the solution u.

template <typename ValueType>
void print_solution(ValueType u0, ValueType u1,
{
std::cout << u0 << '\n';
for (int i = 0; i < u->get_size()[0]; ++i) {
std::cout << u->get_const_values()[i] << '\n';
}
std::cout << u1 << std::endl;
}

Computes the 1-norm of the error given the computed u and the correct solution function correct_u.

template <typename Closure, typename ValueType>
double calculate_error(int discretization_points,
Closure correct_u)
{
const auto h = 1.0 / (discretization_points + 1);
auto error = 0.0;
for (int i = 0; i < discretization_points; ++i) {
using std::abs;
const auto xi = (i + 1) * h;
error +=
abs(u->get_const_values()[i] - correct_u(xi)) / abs(correct_u(xi));
}
return error;
}
int main(int argc, char* argv[])
{

Some shortcuts

using ValueType = double;
using RealValueType = gko::remove_complex<ValueType>;
using IndexType = int;

Figure out where to run the code

if (argc == 2 && (std::string(argv[1]) == "--help")) {
std::cerr << "Usage: " << argv[0] << " [executor]" << std::endl;
std::exit(-1);
}
const auto executor_string = argc >= 2 ? argv[1] : "reference";
const unsigned int discretization_points =
argc >= 3 ? std::atoi(argv[2]) : 100u;
std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
exec_map{
{"omp", [] { return gko::OmpExecutor::create(); }},
{"cuda",
[] {
}},
{"hip",
[] {
}},
{"dpcpp",
[] {
}},
{"reference", [] { return gko::ReferenceExecutor::create(); }}};

executor where Ginkgo will perform the computation

const auto exec = exec_map.at(executor_string)(); // throws if not valid

executor used by the application

const auto app_exec = exec->get_master();

problem:

auto correct_u = [](ValueType x) { return x * x * x; };
auto f = [](ValueType x) { return ValueType{6} * x; };
auto u0 = correct_u(0);
auto u1 = correct_u(1);

initialize vectors

auto rhs = vec::create(app_exec, gko::dim<2>(discretization_points, 1));
generate_rhs(f, u0, u1, rhs.get());
auto u = vec::create(app_exec, gko::dim<2>(discretization_points, 1));
for (int i = 0; i < u->get_size()[0]; ++i) {
u->get_values()[i] = 0.0;
}
const RealValueType reduction_factor{1e-7};

Generate solver and solve the system

cg::build()
.with_criteria(
gko::stop::Iteration::build().with_max_iters(discretization_points),
reduction_factor))
.on(exec)

notice how our custom StencilMatrix can be used in the same way as any built-in type

->generate(StencilMatrix<ValueType>::create(exec, discretization_points,
-1, 2, -1))
->apply(rhs, u);
std::cout << "\nSolve complete."
<< "\nThe average relative error is "
<< calculate_error(discretization_points, u.get(), correct_u) /
discretization_points
<< std::endl;
}

Results

Comments about programming and debugging

The plain program

/*******************************<GINKGO LICENSE>******************************
Copyright (c) 2017-2023, the Ginkgo authors
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************<GINKGO LICENSE>*******************************/
#include <iostream>
#include <map>
#include <string>
#include <omp.h>
#include <ginkgo/ginkgo.hpp>
template <typename ValueType>
void stencil_kernel(std::size_t size, const ValueType* coefs,
const ValueType* b, ValueType* x);
template <typename ValueType>
class StencilMatrix : public gko::EnableLinOp<StencilMatrix<ValueType>>,
public gko::EnableCreateMethod<StencilMatrix<ValueType>> {
public:
StencilMatrix(std::shared_ptr<const gko::Executor> exec,
gko::size_type size = 0, ValueType left = -1.0,
ValueType center = 2.0, ValueType right = -1.0)
: gko::EnableLinOp<StencilMatrix>(exec, gko::dim<2>{size}),
coefficients(exec, {left, center, right})
{}
protected:
using coef_type = gko::array<ValueType>;
void apply_impl(const gko::LinOp* b, gko::LinOp* x) const override
{
auto dense_b = gko::as<vec>(b);
auto dense_x = gko::as<vec>(x);
struct stencil_operation : gko::Operation {
stencil_operation(const coef_type& coefficients, const vec* b,
vec* x)
: coefficients{coefficients}, b{b}, x{x}
{}
void run(std::shared_ptr<const gko::OmpExecutor>) const override
{
auto b_values = b->get_const_values();
auto x_values = x->get_values();
#pragma omp parallel for
for (std::size_t i = 0; i < x->get_size()[0]; ++i) {
auto coefs = coefficients.get_const_data();
auto result = coefs[1] * b_values[i];
if (i > 0) {
result += coefs[0] * b_values[i - 1];
}
if (i < x->get_size()[0] - 1) {
result += coefs[2] * b_values[i + 1];
}
x_values[i] = result;
}
}
void run(std::shared_ptr<const gko::CudaExecutor>) const override
{
stencil_kernel(x->get_size()[0], coefficients.get_const_data(),
b->get_const_values(), x->get_values());
}
const coef_type& coefficients;
const vec* b;
vec* x;
};
this->get_executor()->run(
stencil_operation(coefficients, dense_b, dense_x));
}
void apply_impl(const gko::LinOp* alpha, const gko::LinOp* b,
const gko::LinOp* beta, gko::LinOp* x) const override
{
auto dense_b = gko::as<vec>(b);
auto dense_x = gko::as<vec>(x);
auto tmp_x = dense_x->clone();
this->apply_impl(b, tmp_x.get());
dense_x->scale(beta);
dense_x->add_scaled(alpha, tmp_x);
}
private:
coef_type coefficients;
};
template <typename ValueType, typename IndexType>
void generate_stencil_matrix(gko::matrix::Csr<ValueType, IndexType>* matrix)
{
const auto discretization_points = matrix->get_size()[0];
auto row_ptrs = matrix->get_row_ptrs();
auto col_idxs = matrix->get_col_idxs();
auto values = matrix->get_values();
IndexType pos = 0;
const ValueType coefs[] = {-1, 2, -1};
row_ptrs[0] = pos;
for (int i = 0; i < discretization_points; ++i) {
for (auto ofs : {-1, 0, 1}) {
if (0 <= i + ofs && i + ofs < discretization_points) {
values[pos] = coefs[ofs + 1];
col_idxs[pos] = i + ofs;
++pos;
}
}
row_ptrs[i + 1] = pos;
}
}
template <typename Closure, typename ValueType>
void generate_rhs(Closure f, ValueType u0, ValueType u1,
{
const auto discretization_points = rhs->get_size()[0];
auto values = rhs->get_values();
const ValueType h = 1.0 / (discretization_points + 1);
for (int i = 0; i < discretization_points; ++i) {
const ValueType xi = ValueType(i + 1) * h;
values[i] = -f(xi) * h * h;
}
values[0] += u0;
values[discretization_points - 1] += u1;
}
template <typename ValueType>
void print_solution(ValueType u0, ValueType u1,
{
std::cout << u0 << '\n';
for (int i = 0; i < u->get_size()[0]; ++i) {
std::cout << u->get_const_values()[i] << '\n';
}
std::cout << u1 << std::endl;
}
template <typename Closure, typename ValueType>
double calculate_error(int discretization_points,
Closure correct_u)
{
const auto h = 1.0 / (discretization_points + 1);
auto error = 0.0;
for (int i = 0; i < discretization_points; ++i) {
using std::abs;
const auto xi = (i + 1) * h;
error +=
abs(u->get_const_values()[i] - correct_u(xi)) / abs(correct_u(xi));
}
return error;
}
int main(int argc, char* argv[])
{
using ValueType = double;
using RealValueType = gko::remove_complex<ValueType>;
using IndexType = int;
if (argc == 2 && (std::string(argv[1]) == "--help")) {
std::cerr << "Usage: " << argv[0] << " [executor]" << std::endl;
std::exit(-1);
}
const auto executor_string = argc >= 2 ? argv[1] : "reference";
const unsigned int discretization_points =
argc >= 3 ? std::atoi(argv[2]) : 100u;
std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
exec_map{
{"omp", [] { return gko::OmpExecutor::create(); }},
{"cuda",
[] {
}},
{"hip",
[] {
}},
{"dpcpp",
[] {
}},
{"reference", [] { return gko::ReferenceExecutor::create(); }}};
const auto exec = exec_map.at(executor_string)(); // throws if not valid
const auto app_exec = exec->get_master();
auto correct_u = [](ValueType x) { return x * x * x; };
auto f = [](ValueType x) { return ValueType{6} * x; };
auto u0 = correct_u(0);
auto u1 = correct_u(1);
auto rhs = vec::create(app_exec, gko::dim<2>(discretization_points, 1));
generate_rhs(f, u0, u1, rhs.get());
auto u = vec::create(app_exec, gko::dim<2>(discretization_points, 1));
for (int i = 0; i < u->get_size()[0]; ++i) {
u->get_values()[i] = 0.0;
}
const RealValueType reduction_factor{1e-7};
cg::build()
.with_criteria(
gko::stop::Iteration::build().with_max_iters(discretization_points),
reduction_factor))
.on(exec)
->generate(StencilMatrix<ValueType>::create(exec, discretization_points,
-1, 2, -1))
->apply(rhs, u);
std::cout << "\nSolve complete."
<< "\nThe average relative error is "
<< calculate_error(discretization_points, u.get(), correct_u) /
discretization_points
<< std::endl;
}
gko::matrix::Csr::get_col_idxs
index_type * get_col_idxs() noexcept
Returns the column indexes of the matrix.
Definition: csr.hpp:897
gko::matrix::Csr
CSR is a matrix format which stores only the nonzero coefficients by compressing each row of the matr...
Definition: matrix.hpp:54
gko::LinOp
Definition: lin_op.hpp:146
gko::matrix::Dense
Dense is a matrix format which explicitly stores all values of the matrix.
Definition: dense_cache.hpp:48
gko::EnableCreateMethod
This mixin implements a static create() method on ConcreteType that dynamically allocates the memory,...
Definition: polymorphic_object.hpp:776
gko::size_type
std::size_t size_type
Integral type used for allocation quantities.
Definition: types.hpp:120
gko::abs
constexpr xstd::enable_if_t<!is_complex_s< T >::value, T > abs(const T &x)
Returns the absolute value of the object.
Definition: math.hpp:1104
gko::HipExecutor::create
static std::shared_ptr< HipExecutor > create(int device_id, std::shared_ptr< Executor > master, bool device_reset, allocation_mode alloc_mode=default_hip_alloc_mode, CUstream_st *stream=nullptr)
Creates a new HipExecutor.
gko
The Ginkgo namespace.
Definition: abstract_factory.hpp:48
gko::array
An array is a container which encapsulates fixed-sized arrays, stored on the Executor tied to the arr...
Definition: array.hpp:55
gko::stop::ResidualNorm
The ResidualNorm class is a stopping criterion which stops the iteration process when the actual resi...
Definition: residual_norm.hpp:138
gko::matrix::Csr::get_row_ptrs
index_type * get_row_ptrs() noexcept
Returns the row pointers of the matrix.
Definition: csr.hpp:916
gko::dim< 2 >
gko::solver::Cg
CG or the conjugate gradient method is an iterative type Krylov subspace method which is suitable for...
Definition: cg.hpp:74
gko::CudaExecutor::create
static std::shared_ptr< CudaExecutor > create(int device_id, std::shared_ptr< Executor > master, bool device_reset, allocation_mode alloc_mode=default_cuda_alloc_mode, CUstream_st *stream=nullptr)
Creates a new CudaExecutor.
gko::solver::initial_guess_mode::rhs
the input is right hand side
gko::OmpExecutor::create
static std::shared_ptr< OmpExecutor > create(std::shared_ptr< CpuAllocatorBase > alloc=std::make_shared< CpuAllocator >())
Creates a new OmpExecutor.
Definition: executor.hpp:1373
gko::matrix::Dense::get_const_values
const value_type * get_const_values() const noexcept
Returns a pointer to the array of values of the matrix.
Definition: dense.hpp:865
gko::LinOp::get_size
const dim< 2 > & get_size() const noexcept
Returns the size of the operator.
Definition: lin_op.hpp:239
gko::remove_complex
typename detail::remove_complex_s< T >::type remove_complex
Obtain the type which removed the complex of complex/scalar type or the template parameter of class b...
Definition: math.hpp:354
gko::DpcppExecutor::create
static std::shared_ptr< DpcppExecutor > create(int device_id, std::shared_ptr< Executor > master, std::string device_type="all", dpcpp_queue_property property=dpcpp_queue_property::in_order)
Creates a new DpcppExecutor.
gko::EnableLinOp
The EnableLinOp mixin can be used to provide sensible default implementations of the majority of the ...
Definition: lin_op.hpp:906
gko::Operation
Operations can be used to define functionalities whose implementations differ among devices.
Definition: executor.hpp:287
gko::matrix::Csr::get_values
value_type * get_values() noexcept
Returns the values of the matrix.
Definition: csr.hpp:878