The batched-solver program

Using and interfacing with a batched solver.

Table of contents
  1. Introduction
  2. The commented program
  3. Results
  4. Comments about programming and debugging
  5. The plain program

Using batched solvers

This example shows how to use Ginkgo batched solvers with data coming from an application. The "application" in this case is just a function in the example itself; nevertheless, the steps to be taken are shown.

A ‘batch’ here means a set of small linear systems that can be solved independently, where each individual system is too small to occupy an entire computing device on its own. An additional requirement is that all the systems share the same sparsity pattern.
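As a rough sketch (this is not a Ginkgo API, just the memory arrangement this example assumes the application hands over), the data for a batch of N such systems looks like this:

// Shared by all N matrices (identical sparsity pattern):
//   row_ptrs : nrows + 1 entries
//   col_idxs : nnz entries
// Per-system data, stored contiguously, one system after the other:
//   values   : N * nnz entries (values of system i start at offset i * nnz)
//   rhs      : N * nrows entries (RHS of system i starts at offset i * nrows)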

The commented program

The example uses a few type aliases (value_type, index_type, size_type, vec_type, real_vec_type, mtx_type, bicgstab) and standard includes; they are listed in full in the plain program at the end of this page. The small helper below, in namespace detail, converts a batch object into a vector of the corresponding non-batched objects, one per batch item; it is used later to print per-system results.

namespace detail {

template <typename InputType>
auto unbatch(const InputType* batch_object)
{
auto unbatched_mats =
std::vector<std::unique_ptr<typename InputType::unbatch_type>>{};
for (size_type b = 0; b < batch_object->get_num_batch_items(); ++b) {
unbatched_mats.emplace_back(
batch_object->create_const_view_for_item(b)->clone());
}
return unbatched_mats;
}
} // namespace detail

'Application' structures and functions

Structure to simulate application data related to the linear systems to be solved.

We use raw pointers below to demonstrate how to handle the situation where the application only provides raw pointers. Ideally, one would use Ginkgo's gko::array class here. In this example, we assume that the data is already in a format that can be given directly to a batch::matrix::Csr object.

struct ApplSysData {

Number of small systems in the batch.

size_type nsystems;

Number of rows in each system.

int nrows;

Number of non-zeros in each system matrix.

int nnz;

Row pointers for one matrix

const index_type* row_ptrs;

Column indices of non-zeros for one matrix

const index_type* col_idxs;

Nonzero values for all matrices in the batch, concatenated

const value_type* all_values;

RHS vectors for all systems in the batch, concatenated

const value_type* all_rhs;
};
/*
 * Generates a batch of tridiagonal systems.
 *
 * @param nrows Number of rows in each system.
 * @param nsystems Number of systems in the batch.
 * @param exec The device executor to use for the solver.
 * @note Normally, the application may not deal with Ginkgo executors, nor do
 * we need it to. Here, we use the executor for backend-independent device
 * memory allocation. The application, for example, might assume Hip (for AMD
 * GPUs) and use `hipMalloc` directly.
 */
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
std::shared_ptr<gko::Executor> exec);

Deallocate application data.

void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec);
int main(int argc, char* argv[])
{

Print ginkgo version information

std::cout << gko::version_info::get() << std::endl;
if (argc == 2 && (std::string(argv[1]) == "--help")) {
std::cerr << "Usage: " << argv[0]
<< " [executor] [num_systems] [num_rows] [print_residuals] "
"[num_reps]"
<< std::endl;
std::exit(-1);
}

Where do you want to run your solver?

The gko::Executor class is one of the cornerstones of Ginkgo. Currently, Ginkgo provides a gko::OmpExecutor, which uses OpenMP multi-threading in most of its kernels; a gko::ReferenceExecutor, a single-threaded specialization of the OpenMP executor; and gko::CudaExecutor, gko::HipExecutor and gko::DpcppExecutor, which run the code on NVIDIA, AMD and Intel GPUs, respectively.

Note
Thanks to this abstraction, you only ever need to change the executor; all other Ginkgo functions and routines automatically work and run on that executor without any other changes.
const auto executor_string = argc >= 2 ? argv[1] : "reference";
std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
exec_map{
{"omp", [] { return gko::OmpExecutor::create(); }},
{"cuda",
[] { return gko::CudaExecutor::create(0, gko::OmpExecutor::create()); }},
{"hip",
[] { return gko::HipExecutor::create(0, gko::OmpExecutor::create()); }},
{"dpcpp",
[] { return gko::DpcppExecutor::create(0, gko::OmpExecutor::create()); }},
{"reference", [] { return gko::ReferenceExecutor::create(); }}};

executor where Ginkgo will perform the computation

const auto exec = exec_map.at(executor_string)(); // throws if not valid
const size_type num_systems = argc >= 3 ? std::atoi(argv[2]) : 2;
const int num_rows = argc >= 4 ? std::atoi(argv[3]) : 32; // per system

Whether to print the residuals or not.

const bool print_residuals =
argc >= 5 ? (std::string(argv[4]) == "true") : false;

The number of repetitions for the timing.

const int num_reps = argc >= 6 ? std::atoi(argv[5]) : 20;

Generate data

The "application" generates the batch of linear systems on the device

auto appl_sys = appl_generate_system(num_rows, num_systems, exec);

Create batch_dim objects to describe the dimensions of the batch matrix and the batch vectors.

auto batch_mat_size =
gko::batch_dim<2>(num_systems, gko::dim<2>(num_rows, num_rows));
auto batch_vec_size =
gko::batch_dim<2>(num_systems, gko::dim<2>(num_rows, 1));

Use of application-allocated memory

We can either work on the existing memory allocated in the application, or we can copy it for the linear solve. Ginkgo expects the nonzero values for all the small matrices to be allocated contiguously, one matrix after the other.

auto vals_view = gko::array<value_type>::const_view(
exec, num_systems * appl_sys.nnz, appl_sys.all_values);
auto rowptrs_view = gko::array<index_type>::const_view(exec, num_rows + 1,
appl_sys.row_ptrs);
auto colidxs_view = gko::array<index_type>::const_view(exec, appl_sys.nnz,
appl_sys.col_idxs);
auto A = gko::share(mtx_type::create_const(
exec, batch_mat_size, std::move(vals_view), std::move(colidxs_view),
std::move(rowptrs_view)));
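If the application cannot keep its arrays alive for the duration of the solve, the data can instead be copied into Ginkgo-owned memory and handed to the matrix creation in place of the const views. A minimal sketch under that assumption (vals_owned is a hypothetical owning copy; the same pattern applies to the row pointers and column indices):

gko::array<value_type> vals_owned(exec, num_systems * appl_sys.nnz);
exec->copy_from(exec.get(), num_systems * appl_sys.nnz, appl_sys.all_values,
vals_owned.get_data());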

RHS and solution vectors

A batch_stride object specifies the access stride within the individual matrices (vectors) in the batch. In this case, a stride of 1 is the common value for all the matrices. We create the RHS by again reusing the application's allocation.

auto b_view = gko::array<value_type>::const_view(
exec, num_systems * num_rows, appl_sys.all_rhs);
auto b = vec_type::create_const(exec, batch_vec_size, std::move(b_view));

Create initial guess as 0 and copy to device

auto x = vec_type::create(exec);
auto host_x = vec_type::create(exec->get_master(), batch_vec_size);
for (size_type isys = 0; isys < num_systems; isys++) {
for (int irow = 0; irow < num_rows; irow++) {
host_x->at(isys, irow, 0) = gko::zero<value_type>();
}
}
x->copy_from(host_x.get());
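Since the initial guess is simply zero, an equivalent and slightly shorter approach (a sketch using only calls already seen in this example) is to allocate x directly with the batch dimensions and fill it on the device:

auto x = vec_type::create(exec, batch_vec_size);
x->fill(gko::zero<value_type>());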

Create the batch solver factory

const real_type reduction_factor{1e-10};

Create a batched solver factory with relevant parameters.

auto solver =
bicgstab::build()
.with_max_iterations(500)
.with_tolerance(reduction_factor)
.with_tolerance_type(gko::batch::stop::tolerance_type::relative)
.on(exec)
->generate(A);
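The chained build()...on(exec) call produces a solver factory that is immediately used to generate a solver for A. If several batches with the same settings need to be solved, the factory can be kept in its own variable and reused; a sketch, where A2 is a hypothetical second batch matrix:

auto solver_factory =
bicgstab::build()
.with_max_iterations(500)
.with_tolerance(reduction_factor)
.with_tolerance_type(gko::batch::stop::tolerance_type::relative)
.on(exec);
auto solver = solver_factory->generate(A);
// auto solver2 = solver_factory->generate(A2);  // hypothetical second batch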

Batch logger

Create a logger to obtain the iteration counts and "implicit" residual norms for every system after the solve.

std::shared_ptr<const gko::batch::log::BatchConvergence<value_type>> logger =
gko::batch::log::BatchConvergence<value_type>::create();

Generate and solve

add the logger to the solver

solver->add_logger(logger);

Solve the batch system

auto x_clone = gko::clone(x);

Warmup

for (int i = 0; i < 3; ++i) {
x_clone->copy_from(x.get());
solver->apply(b, x_clone);
}
double apply_time = 0.0;
for (int i = 0; i < num_reps; ++i) {
x_clone->copy_from(x.get());
exec->synchronize();
std::chrono::steady_clock::time_point t1 =
std::chrono::steady_clock::now();
solver->apply(b, x_clone);
exec->synchronize();
std::chrono::steady_clock::time_point t2 =
std::chrono::steady_clock::now();
auto time_span =
std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
apply_time += time_span.count();
}
x->copy_from(x_clone.get());

This is not necessary, but one might want to remove the logger before the next solve using the same solver object.

solver->remove_logger(logger.get());

Check result

Compute norm of RHS on the device and automatically copy to host

auto norm_dim = gko::batch_dim<2>(num_systems, gko::dim<2>(1, 1));
auto host_b_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_b_norm->fill(0.0);
b->compute_norm2(host_b_norm);

we need constants on the device

auto one = vec_type::create(exec, norm_dim);
one->fill(1.0);
auto neg_one = vec_type::create(exec, norm_dim);
neg_one->fill(-1.0);

allocate and compute the residual

auto res = vec_type::create(exec, batch_vec_size);
res->copy_from(b);
A->apply(one, x, neg_one, res);

allocate and compute residual norm

auto host_res_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_res_norm->fill(0.0);
res->compute_norm2(host_res_norm);
auto host_log_resid = gko::make_temporary_clone(
exec->get_master(), &logger->get_residual_norm());
auto host_log_iters = gko::make_temporary_clone(
exec->get_master(), &logger->get_num_iterations());
if (print_residuals) {
std::cout << "Residual norm sqrt(r^T r):\n";

"unbatch" converts a batch object into a vector of objects of the corresponding single type, eg. batch::matrix::Dense --> std::vector<Dense>.

auto unb_res = detail::unbatch(host_res_norm.get());
auto unb_bnorm = detail::unbatch(host_b_norm.get());
for (size_type i = 0; i < num_systems; ++i) {
std::cout << " System no. " << i
<< ": residual norm = " << unb_res[i]->at(0, 0)
<< ", implicit residual norm = "
<< host_log_resid->get_const_data()[i]
<< ", iterations = "
<< host_log_iters->get_const_data()[i] << std::endl;
const real_type relresnorm =
unb_res[i]->at(0, 0) / unb_bnorm[i]->at(0, 0);
if (!(relresnorm <= reduction_factor)) {
std::cout << "System " << i << " converged only to "
<< relresnorm << " relative residual." << std::endl;
}
}
}
std::cout << "Solver type: "
<< "batch::bicgstab"
<< "\nMatrix size: " << A->get_common_size()
<< "\nNum batch entries: " << A->get_num_batch_items()
<< "\nEntire solve took: " << apply_time / num_reps << " seconds."
<< std::endl;

Ginkgo objects are cleaned up automatically, but the "application" still needs to clean up its own data in this case.

appl_clean_up(appl_sys, exec);
return 0;
}

Generate the matrix and the vectors. This function emulates the generation of the data from the application.
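For concreteness, with nrows = 4 the generator below produces nnz = 3*4 - 2 = 10 nonzeros per system, with the shared CSR pattern and per-system values (writing h_j for spacings[isys * nrows + j]):

const index_type row_ptrs[] = {0, 2, 5, 8, 10};
const index_type col_idxs[] = {0, 1,   0, 1, 2,   1, 2, 3,   2, 3};
// values of system isys (stored at offset isys * nnz in all_values):
//   {2/h_0, -1,   -1, 2/h_1, -1,   -1, 2/h_2, -1,   -1, 2/h_3}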

ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
std::shared_ptr<gko::Executor> exec)
{
const int nnz = nrows * 3 - 2;
std::default_random_engine rgen(15);
std::normal_distribution<real_type> distb(0.5, 0.1);
std::vector<real_type> spacings(nsystems * nrows);
std::generate(spacings.begin(), spacings.end(),
[&]() { return distb(rgen); });
std::vector<value_type> allvalues(nnz * nsystems);
for (size_type isys = 0; isys < nsystems; isys++) {
allvalues.at(isys * nnz) = 2.0 / spacings.at(isys * nrows);
allvalues.at(isys * nnz + 1) = -1.0;
for (int irow = 0; irow < nrows - 2; irow++) {
allvalues.at(isys * nnz + 2 + irow * 3) = -1.0;
allvalues.at(isys * nnz + 2 + irow * 3 + 1) =
2.0 / spacings.at(isys * nrows + irow + 1);
allvalues.at(isys * nnz + 2 + irow * 3 + 2) = -1.0;
}
allvalues.at(isys * nnz + 2 + (nrows - 2) * 3) = -1.0;
allvalues.at(isys * nnz + 2 + (nrows - 2) * 3 + 1) =
2.0 / spacings.at((isys + 1) * nrows - 1);
assert(isys * nnz + 2 + (nrows - 2) * 3 + 2 == (isys + 1) * nnz);
}
std::vector<index_type> rowptrs(nrows + 1);
rowptrs.at(0) = 0;
rowptrs.at(1) = 2;
for (int i = 2; i < nrows; i++) {
rowptrs.at(i) = rowptrs.at(i - 1) + 3;
}
rowptrs.at(nrows) = rowptrs.at(nrows - 1) + 2;
assert(rowptrs.at(nrows) == nnz);
std::vector<index_type> colidxs(nnz);
colidxs.at(0) = 0;
colidxs.at(1) = 1;
const int nnz_per_row = 3;
for (int irow = 1; irow < nrows - 1; irow++) {
colidxs.at(2 + (irow - 1) * nnz_per_row) = irow - 1;
colidxs.at(2 + (irow - 1) * nnz_per_row + 1) = irow;
colidxs.at(2 + (irow - 1) * nnz_per_row + 2) = irow + 1;
}
colidxs.at(2 + (nrows - 2) * nnz_per_row) = nrows - 2;
colidxs.at(2 + (nrows - 2) * nnz_per_row + 1) = nrows - 1;
assert(2 + (nrows - 2) * nnz_per_row + 1 == nnz - 1);
std::vector<value_type> allb(nrows * nsystems);
for (size_type isys = 0; isys < nsystems; isys++) {
const value_type bval = distb(rgen);
std::fill(allb.begin() + isys * nrows,
allb.begin() + (isys + 1) * nrows, bval);
}
index_type* const row_ptrs = exec->alloc<index_type>(nrows + 1);
exec->copy_from(exec->get_master().get(), static_cast<size_type>(nrows + 1),
rowptrs.data(), row_ptrs);
index_type* const col_idxs = exec->alloc<index_type>(nnz);
exec->copy_from(exec->get_master().get(), static_cast<size_type>(nnz),
colidxs.data(), col_idxs);
value_type* const all_values = exec->alloc<value_type>(nsystems * nnz);
exec->copy_from(exec->get_master().get(), nsystems * nnz, allvalues.data(),
all_values);
value_type* const all_b = exec->alloc<value_type>(nsystems * nrows);
exec->copy_from(exec->get_master().get(), nsystems * nrows, allb.data(),
all_b);
return {nsystems, nrows, nnz, row_ptrs, col_idxs, all_values, all_b};
}
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec)
{

In general, the application would control non-const pointers, in which case the const_casts below would not be needed.

exec->free(const_cast<index_type*>(appl_data.row_ptrs));
exec->free(const_cast<index_type*>(appl_data.col_idxs));
exec->free(const_cast<value_type*>(appl_data.all_values));
exec->free(const_cast<value_type*>(appl_data.all_rhs));
}

Results

The following is the expected result on the reference executor:

Solver type: batch::bicgstab
Matrix size: (32, 32)
Num batch entries: 2
Entire solve took: 2.18558e-05 seconds.
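The timing depends on the machine and the chosen executor. Assuming the built binary is named batched-solver (the name is not fixed by the example itself), the run above corresponds to the default arguments and could be reproduced with, for example:

./batched-solver reference 2 32 false 20

where the positional arguments are the executor, the number of systems, the number of rows per system, whether to print per-system residuals, and the number of timed repetitions.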

Comments about programming and debugging

The plain program

#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <random>
#include <string>
#include <vector>
#include <ginkgo/ginkgo.hpp>
using value_type = double;
using real_type = gko::remove_complex<value_type>;
using index_type = int;
using size_type = gko::size_type;
using vec_type = gko::batch::MultiVector<value_type>;
using real_vec_type = gko::batch::MultiVector<real_type>;
using mtx_type = gko::batch::matrix::Csr<value_type, index_type>;
using bicgstab = gko::batch::solver::Bicgstab<value_type>;
namespace detail {
template <typename InputType>
auto unbatch(const InputType* batch_object)
{
auto unbatched_mats =
std::vector<std::unique_ptr<typename InputType::unbatch_type>>{};
for (size_type b = 0; b < batch_object->get_num_batch_items(); ++b) {
unbatched_mats.emplace_back(
batch_object->create_const_view_for_item(b)->clone());
}
return unbatched_mats;
}
} // namespace detail
struct ApplSysData {
size_type nsystems;
int nrows;
int nnz;
const index_type* row_ptrs;
const index_type* col_idxs;
const value_type* all_values;
const value_type* all_rhs;
};
/*
* Generates a batch of tridiagonal systems.
*
* @param nrows Number of rows in each system.
* @param nsystems Number of systems in the batch.
* @param exec The device executor to use for the solver.
* @note Normally, the application may not deal with Ginkgo executors, nor do
* we need it to. Here, we use the executor for backend-independent device
* memory allocation. The application, for example, might assume Hip (for AMD
* GPUs) and use `hipMalloc` directly.
*/
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
std::shared_ptr<gko::Executor> exec);
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec);
int main(int argc, char* argv[])
{
std::cout << gko::version_info::get() << std::endl;
if (argc == 2 && (std::string(argv[1]) == "--help")) {
std::cerr << "Usage: " << argv[0]
<< " [executor] [num_systems] [num_rows] [print_residuals] "
"[num_reps]"
<< std::endl;
std::exit(-1);
}
const auto executor_string = argc >= 2 ? argv[1] : "reference";
std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
exec_map{
{"omp", [] { return gko::OmpExecutor::create(); }},
{"cuda",
[] { return gko::CudaExecutor::create(0, gko::OmpExecutor::create()); }},
{"hip",
[] { return gko::HipExecutor::create(0, gko::OmpExecutor::create()); }},
{"dpcpp",
[] { return gko::DpcppExecutor::create(0, gko::OmpExecutor::create()); }},
{"reference", [] { return gko::ReferenceExecutor::create(); }}};
const auto exec = exec_map.at(executor_string)(); // throws if not valid
const size_type num_systems = argc >= 3 ? std::atoi(argv[2]) : 2;
const int num_rows = argc >= 4 ? std::atoi(argv[3]) : 32; // per system
const bool print_residuals =
argc >= 5 ? (std::string(argv[4]) == "true") : false;
const int num_reps = argc >= 6 ? std::atoi(argv[5]) : 20;
auto appl_sys = appl_generate_system(num_rows, num_systems, exec);
auto batch_mat_size =
gko::batch_dim<2>(num_systems, gko::dim<2>(num_rows, num_rows));
auto batch_vec_size =
gko::batch_dim<2>(num_systems, gko::dim<2>(num_rows, 1));
auto vals_view = gko::array<value_type>::const_view(
exec, num_systems * appl_sys.nnz, appl_sys.all_values);
auto rowptrs_view = gko::array<index_type>::const_view(exec, num_rows + 1,
appl_sys.row_ptrs);
auto colidxs_view = gko::array<index_type>::const_view(exec, appl_sys.nnz,
appl_sys.col_idxs);
auto A = gko::share(mtx_type::create_const(
exec, batch_mat_size, std::move(vals_view), std::move(colidxs_view),
std::move(rowptrs_view)));
auto b_view = gko::array<value_type>::const_view(
exec, num_systems * num_rows, appl_sys.all_rhs);
auto b = vec_type::create_const(exec, batch_vec_size, std::move(b_view));
auto x = vec_type::create(exec);
auto host_x = vec_type::create(exec->get_master(), batch_vec_size);
for (size_type isys = 0; isys < num_systems; isys++) {
for (int irow = 0; irow < num_rows; irow++) {
host_x->at(isys, irow, 0) = gko::zero<value_type>();
}
}
x->copy_from(host_x.get());
const real_type reduction_factor{1e-10};
auto solver =
bicgstab::build()
.with_max_iterations(500)
.with_tolerance(reduction_factor)
.with_tolerance_type(gko::batch::stop::tolerance_type::relative)
.on(exec)
->generate(A);
std::shared_ptr<const gko::batch::log::BatchConvergence<value_type>> logger =
gko::batch::log::BatchConvergence<value_type>::create();
solver->add_logger(logger);
auto x_clone = gko::clone(x);
for (int i = 0; i < 3; ++i) {
x_clone->copy_from(x.get());
solver->apply(b, x_clone);
}
double apply_time = 0.0;
for (int i = 0; i < num_reps; ++i) {
x_clone->copy_from(x.get());
exec->synchronize();
std::chrono::steady_clock::time_point t1 =
std::chrono::steady_clock::now();
solver->apply(b, x_clone);
exec->synchronize();
std::chrono::steady_clock::time_point t2 =
std::chrono::steady_clock::now();
auto time_span =
std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
apply_time += time_span.count();
}
x->copy_from(x_clone.get());
solver->remove_logger(logger.get());
auto norm_dim = gko::batch_dim<2>(num_systems, gko::dim<2>(1, 1));
auto host_b_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_b_norm->fill(0.0);
b->compute_norm2(host_b_norm);
auto one = vec_type::create(exec, norm_dim);
one->fill(1.0);
auto neg_one = vec_type::create(exec, norm_dim);
neg_one->fill(-1.0);
auto res = vec_type::create(exec, batch_vec_size);
res->copy_from(b);
A->apply(one, x, neg_one, res);
auto host_res_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_res_norm->fill(0.0);
res->compute_norm2(host_res_norm);
auto host_log_resid = gko::make_temporary_clone(
exec->get_master(), &logger->get_residual_norm());
auto host_log_iters = gko::make_temporary_clone(
exec->get_master(), &logger->get_num_iterations());
if (print_residuals) {
std::cout << "Residual norm sqrt(r^T r):\n";
auto unb_res = detail::unbatch(host_res_norm.get());
auto unb_bnorm = detail::unbatch(host_b_norm.get());
for (size_type i = 0; i < num_systems; ++i) {
std::cout << " System no. " << i
<< ": residual norm = " << unb_res[i]->at(0, 0)
<< ", implicit residual norm = "
<< host_log_resid->get_const_data()[i]
<< ", iterations = "
<< host_log_iters->get_const_data()[i] << std::endl;
const real_type relresnorm =
unb_res[i]->at(0, 0) / unb_bnorm[i]->at(0, 0);
if (!(relresnorm <= reduction_factor)) {
std::cout << "System " << i << " converged only to "
<< relresnorm << " relative residual." << std::endl;
}
}
}
std::cout << "Solver type: "
<< "batch::bicgstab"
<< "\nMatrix size: " << A->get_common_size()
<< "\nNum batch entries: " << A->get_num_batch_items()
<< "\nEntire solve took: " << apply_time / num_reps << " seconds."
<< std::endl;
appl_clean_up(appl_sys, exec);
return 0;
}
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
std::shared_ptr<gko::Executor> exec)
{
const int nnz = nrows * 3 - 2;
std::default_random_engine rgen(15);
std::normal_distribution<real_type> distb(0.5, 0.1);
std::vector<real_type> spacings(nsystems * nrows);
std::generate(spacings.begin(), spacings.end(),
[&]() { return distb(rgen); });
std::vector<value_type> allvalues(nnz * nsystems);
for (size_type isys = 0; isys < nsystems; isys++) {
allvalues.at(isys * nnz) = 2.0 / spacings.at(isys * nrows);
allvalues.at(isys * nnz + 1) = -1.0;
for (int irow = 0; irow < nrows - 2; irow++) {
allvalues.at(isys * nnz + 2 + irow * 3) = -1.0;
allvalues.at(isys * nnz + 2 + irow * 3 + 1) =
2.0 / spacings.at(isys * nrows + irow + 1);
allvalues.at(isys * nnz + 2 + irow * 3 + 2) = -1.0;
}
allvalues.at(isys * nnz + 2 + (nrows - 2) * 3) = -1.0;
allvalues.at(isys * nnz + 2 + (nrows - 2) * 3 + 1) =
2.0 / spacings.at((isys + 1) * nrows - 1);
assert(isys * nnz + 2 + (nrows - 2) * 3 + 2 == (isys + 1) * nnz);
}
std::vector<index_type> rowptrs(nrows + 1);
rowptrs.at(0) = 0;
rowptrs.at(1) = 2;
for (int i = 2; i < nrows; i++) {
rowptrs.at(i) = rowptrs.at(i - 1) + 3;
}
rowptrs.at(nrows) = rowptrs.at(nrows - 1) + 2;
assert(rowptrs.at(nrows) == nnz);
std::vector<index_type> colidxs(nnz);
colidxs.at(0) = 0;
colidxs.at(1) = 1;
const int nnz_per_row = 3;
for (int irow = 1; irow < nrows - 1; irow++) {
colidxs.at(2 + (irow - 1) * nnz_per_row) = irow - 1;
colidxs.at(2 + (irow - 1) * nnz_per_row + 1) = irow;
colidxs.at(2 + (irow - 1) * nnz_per_row + 2) = irow + 1;
}
colidxs.at(2 + (nrows - 2) * nnz_per_row) = nrows - 2;
colidxs.at(2 + (nrows - 2) * nnz_per_row + 1) = nrows - 1;
assert(2 + (nrows - 2) * nnz_per_row + 1 == nnz - 1);
std::vector<value_type> allb(nrows * nsystems);
for (size_type isys = 0; isys < nsystems; isys++) {
const value_type bval = distb(rgen);
std::fill(allb.begin() + isys * nrows,
allb.begin() + (isys + 1) * nrows, bval);
}
index_type* const row_ptrs = exec->alloc<index_type>(nrows + 1);
exec->copy_from(exec->get_master().get(), static_cast<size_type>(nrows + 1),
rowptrs.data(), row_ptrs);
index_type* const col_idxs = exec->alloc<index_type>(nnz);
exec->copy_from(exec->get_master().get(), static_cast<size_type>(nnz),
colidxs.data(), col_idxs);
value_type* const all_values = exec->alloc<value_type>(nsystems * nnz);
exec->copy_from(exec->get_master().get(), nsystems * nnz, allvalues.data(),
all_values);
value_type* const all_b = exec->alloc<value_type>(nsystems * nrows);
exec->copy_from(exec->get_master().get(), nsystems * nrows, allb.data(),
all_b);
return {nsystems, nrows, nnz, row_ptrs, col_idxs, all_values, all_b};
}
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec)
{
exec->free(const_cast<index_type*>(appl_data.row_ptrs));
exec->free(const_cast<index_type*>(appl_data.col_idxs));
exec->free(const_cast<value_type*>(appl_data.all_values));
exec->free(const_cast<value_type*>(appl_data.all_rhs));
}