Generic FAQ

Why are some arrays indexed using the 'int' type and others using the 'FEAT::Index' type?

Let us begin with the two golden rules of thumb for successfully indexing in FEAT3:

All arrays whose length is determined at runtime are indexed using the 'FEAT::Index' type.
All arrays whose length is fixed at compile time are indexed using the built-in 'int' type.

The reason for using FEAT::Index, which is typedefed to 'unsigned long' by default, is simple: we want to be able to define this type as a 32-bit or 64-bit unsigned integer type, depending on what is best for the platform that FEAT3 is being compiled on. This rules out hard-coding a built-in type name like 'int' or 'unsigned long', so a typedef is the only proper way to go.
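As an illustration only, such a configurable index type could be sketched as follows. The namespace Sketch and the macros SKETCH_INDEX_U32 and SKETCH_INDEX_U64 are hypothetical names invented for this example; how FEAT3 actually selects the type is not shown in this FAQ.

```cpp
#include <cstdint>
#include <type_traits>

namespace Sketch // hypothetical namespace, not FEAT3 itself
{
#if defined(SKETCH_INDEX_U32)   // hypothetical configuration macro
  typedef std::uint32_t Index;  // force a 32-bit unsigned index
#elif defined(SKETCH_INDEX_U64) // hypothetical configuration macro
  typedef std::uint64_t Index;  // force a 64-bit unsigned index
#else
  typedef unsigned long Index;  // default mentioned in the text above
#endif

  // Code written against Index stays unchanged when the typedef changes.
  inline Index sum_of_indices(Index n)
  {
    Index s = 0;
    for(Index i = 0; i < n; ++i)
      s += i;
    return s;
  }
}
```

Switching the index width then only requires a different macro at compile time; none of the code that uses Index has to be touched.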

Now the main question is: Why don't we always use FEAT::Index for indexing?

The reason for this is pragmatic and rather technical: Entries of "small" containers like Tiny::Vector and Tiny::Matrix are very often addressed explicitly with literals ("constants") to implement various formulae, whereas entries of "large" containers like the LAFEM containers are usually only addressed using loop variables. Although it would be more consistent to use the FEAT::Index type for addressing all types of array elements, one would have to manually cast each explicit index literal to the Index type to avoid compiler warnings about implicit int-to-Index conversions. This was the original approach, and it turned out to be much more painful than it sounds.

Let's take a look at an example: Let a, b and c denote three Tiny::Vector<double,3> objects, and suppose we want to compute the 3D cross product c := a x b. The code for the cross product using 'int' for indexing looks like this:

c(0) = a(1) * b(2) - a(2) * b(1);
c(1) = a(2) * b(0) - a(0) * b(2);
c(2) = a(0) * b(1) - a(1) * b(0);

If the Tiny::Vector entries were addressed using the FEAT::Index type, one would have to write the following code instead:

c(Index(0)) = a(Index(1)) * b(Index(2)) - a(Index(2)) * b(Index(1));
c(Index(1)) = a(Index(2)) * b(Index(0)) - a(Index(0)) * b(Index(2));
c(Index(2)) = a(Index(0)) * b(Index(1)) - a(Index(1)) * b(Index(0));

Because the latter approach turned out to be difficult to read and tedious to write, the decision was made to use the int type for indexing compile-time arrays to keep the literals short and clear, thus sacrificing consistency.
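To make the trade-off concrete, here is a minimal self-contained sketch of a small fixed-size vector whose access operator takes a plain 'int'. SmallVector and cross are hypothetical names for this example and are not the real Tiny::Vector implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for a small fixed-size vector (not Tiny::Vector).
template<typename T_, int n_>
class SmallVector
{
public:
  T_ v[n_];

  // Element access with a plain 'int', so index literals need no cast.
  T_& operator()(int i)
  {
    assert((i >= 0) && (i < n_));
    return v[i];
  }

  const T_& operator()(int i) const
  {
    assert((i >= 0) && (i < n_));
    return v[i];
  }
};

// 3D cross product c := a x b, written with plain integer literals.
inline SmallVector<double, 3> cross(const SmallVector<double, 3>& a,
                                    const SmallVector<double, 3>& b)
{
  SmallVector<double, 3> c;
  c(0) = a(1) * b(2) - a(2) * b(1);
  c(1) = a(2) * b(0) - a(0) * b(2);
  c(2) = a(0) * b(1) - a(1) * b(0);
  return c;
}
```

Because the length is a template parameter of type 'int', the range check can compare against it directly without mixing signed and unsigned types.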

Some source files in the kernel subdirectories have "eickt" in their filename. What are these files for?

These source files are used by the build system to explicitly instantiate common kernel templates, if the --eickt option has been passed to the configure script.

By design, most classes defined in the FEAT3 kernel are implemented as class templates and by design of the C++ language, class templates and – even more importantly – their member functions are only compiled if the class template is instantiated (i.e. used) in a source file. The advantage of this approach is that class templates are only compiled if they actually need to be compiled in an application/tutorial/unit-test source file.

However, there are two downsides to this:
Firstly, all class templates that are used in an application have to be recompiled whenever the application source file changes, even though the kernel class templates themselves have not changed since the last compilation. This means that you spend a lot of time recompiling the same unchanged kernel code over and over again. (Note that some modern compilers try to reduce this recompilation time as well as they can, which may or may not work out well.)

Secondly, a class template that is used by many applications/tutorials/unit-tests (think of SparseMatrixCSR as an example) has to be compiled separately for each application, although the kernel class template is identical for all of them. This severely increases overall compilation time if you are compiling a whole bunch of applications that share many class templates. (The nightly regression test system is a perfect example, because it compiles all applications/unit-tests.)

The eickt mechanism reduces this recompilation overhead to some extent by pre-compiling commonly used kernel class templates in the FEAT3 kernel. For instance, the LAFEM::SparseMatrixCSR class template is often used with the <Mem::Main,double,Index> template parameter combination, so the class LAFEM::SparseMatrixCSR<Mem::Main,double,Index> is explicitly instantiated in the kernel library.
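The underlying C++ language feature is explicit template instantiation. The following single-file sketch shows the general idea with a hypothetical class template Scaler; the comments indicate which part would live in a header and which in a separately compiled source file. Whether the FEAT3 headers use 'extern template' declarations in exactly this way is an assumption of this sketch.

```cpp
#include <cassert>

// --- would live in a header ---
template<typename DT_>
class Scaler // hypothetical class template, not part of FEAT3
{
public:
  explicit Scaler(DT_ factor);
  DT_ apply(DT_ x) const;

private:
  DT_ _factor;
};

template<typename DT_>
Scaler<DT_>::Scaler(DT_ factor) : _factor(factor) {}

template<typename DT_>
DT_ Scaler<DT_>::apply(DT_ x) const { return _factor * x; }

// Explicit instantiation declaration: tells every including source file
// that Scaler<double> is compiled elsewhere and need not be instantiated.
extern template class Scaler<double>;

// --- would live in one separately compiled source file ---
// Explicit instantiation definition: compiles all members exactly once.
template class Scaler<double>;

// --- application code ---
inline double scale_demo()
{
  Scaler<double> s(2.0);
  const double y = s.apply(3.0);
  assert(y == 6.0);
  return y;
}
```

With this split, a change to the application code no longer triggers a recompilation of the members of Scaler<double>; only a change to the header or to the instantiating source file does.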

The primary advantage of this eickt mechanism is that recompiling an application can be significantly faster if the kernel code has not changed since the last compilation, because the precompiled class templates will not have to be recompiled. The drawback is that the (initial) compilation of the kernel will be significantly slower because the class templates have to be precompiled every time one of the relevant kernel files changes.

Should I use the --eickt option for configure or not?

It depends on whether you are actively working on kernel code or only writing application code without changing kernel code. In the latter case, using the eickt mechanism may speed up your overall compilation times, so it might be a good idea to use it. On the other hand, if you are actively modifying kernel code, then using eickt may significantly increase your compilation times whenever some of the eickt sources have to be precompiled again due to your changes, even if you did not change any of the precompiled class templates themselves.

In a nutshell: If you change kernel files frequently, eickt is usually a bad idea; otherwise it is usually a good idea.

LAFEM FAQ

Why can't I use the assignment operator '=' to copy LAFEM containers?

The short answer is: efficiency.

The standard (copy) assignment operator has been deleted for LAFEM containers to force the programmer to decide explicitly whether a container has to be

cloned (via the clone() member function)
moved (via the std::move function)
copied (via the copy() member function)

The easiest way to create a new stand-alone copy (aka 'clone') of an existing LAFEM container is to use the clone() member function:

LAFEM::DenseVector<Mem::Main, double, Index> vector_src(10u); // create source vector
LAFEM::DenseVector<Mem::Main, double, Index> vector_dst;      // create empty destination vector
vector_src.format(1.0);                                       // set all values to 1.0 (or initialize vector_src in some other manner)
vector_dst = vector_src.clone();                              // create clone of source vector as destination vector
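The design can be illustrated with a small mock container. MockVector is a hypothetical stand-in and not the real LAFEM implementation; in particular, deleting the copy constructor as well is an assumption of this sketch, since the text above only mentions the copy assignment operator.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a LAFEM container (not the real implementation).
class MockVector
{
public:
  MockVector() {}
  explicit MockVector(std::size_t n) : _data(n, 0.0) {}

  // Moving is allowed: it only transfers ownership of the array.
  MockVector(MockVector&& other) : _data(std::move(other._data)) {}
  MockVector& operator=(MockVector&& other)
  {
    _data = std::move(other._data);
    return *this;
  }

  // Implicit copying is forbidden (copy constructor: assumption of this sketch).
  MockVector(const MockVector&) = delete;
  MockVector& operator=(const MockVector&) = delete;

  // Set all entries to the given value.
  void format(double value) { _data.assign(_data.size(), value); }

  // Explicit deep copy, returned by value and then moved into the target.
  MockVector clone() const
  {
    MockVector result(_data.size());
    result._data = _data;
    return result;
  }

  std::size_t size() const { return _data.size(); }

  double& operator()(std::size_t i)
  {
    assert(i < _data.size());
    return _data[i];
  }

private:
  std::vector<double> _data;
};
```

With this mock, 'dst = src;' is rejected by the compiler, whereas 'dst = src.clone();' and 'dst = std::move(src);' both compile, so the potentially expensive deep copy is always visible at the call site.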

What is the difference between the clone(), copy() and convert() member functions?

The copy() member function is more or less a memcpy equivalent for LAFEM containers, i.e. it only performs a plain bit-wise copy of the (index and data) array element values of one LAFEM container to another. In consequence, the copy() member function only works between two LAFEM containers of the same class template with the same index and data types. The copy() member function always assumes that the source and destination containers are compatible upon call, i.e. all index and data arrays of the source and destination containers are already allocated with matching sizes. The copy() member function never allocates new arrays in the destination container and never frees its existing arrays; it therefore aborts program execution with a failed assertion if the dimensions of the source and destination containers do not match.

Finally, the copy() member function can be used to copy the array element values between different memory types, i.e. you can use it to perform a device-to-host or host-to-device copy between two instances of a LAFEM container, e.g.:

Index n = 10u;                                               // vector size
LAFEM::DenseVector<Mem::Main, double, Index> vector_main(n); // vector in main memory
LAFEM::DenseVector<Mem::CUDA, double, Index> vector_cuda(n); // vector in cuda memory
vector_main.format(1.0);                                     // set all values to 1.0 (or initialize vector_main in some other manner)
vector_cuda.copy(vector_main);                               // copy main vector to cuda vector
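The contract of copy() described above (matching sizes are required, nothing is allocated or freed) can be sketched with a host-only mock. MockArray is a hypothetical class for illustration; it does not model different memory types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a LAFEM container (host memory only).
class MockArray
{
public:
  explicit MockArray(std::size_t n) : _data(n, 0.0) {}

  std::size_t size() const { return _data.size(); }
  const double* elements() const { return _data.data(); }

  // Set all entries to the given value.
  void format(double value) { _data.assign(_data.size(), value); }

  // Plain element-wise copy: never allocates, never frees.
  void copy(const MockArray& src)
  {
    // Source and destination must already have matching sizes.
    assert(src.size() == size());
    if(!_data.empty())
      std::memcpy(_data.data(), src._data.data(), _data.size() * sizeof(double));
  }

private:
  std::vector<double> _data;
};
```

Since the destination array is reused as it is, its address is the same before and after the call, and a size mismatch is reported by the assertion rather than silently repaired by a reallocation.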

The clone() member function creates a new copy of an existing LAFEM container with the same memory, data and index type. Note that there exist two overloads of the clone() function: one that takes the source container (of the exact same type) as a function argument (in analogy to the copy function), e.g.

LAFEM::DenseVector<Mem::Main, double, Index> vector_src(10u); // create source vector
LAFEM::DenseVector<Mem::Main, double, Index> vector_dst;      // create empty destination vector
vector_src.format(1.0);                                       // set all values to 1.0 (or initialize vector_src in some other manner)
vector_dst.clone(vector_src);                                 // create destination vector as clone of source

as well as a second overload, which returns the new container as a return value, e.g.

LAFEM::DenseVector<Mem::Main, double, Index> vector_src(10u); // create source vector
vector_src.format(1.0);                                       // set all values to 1.0 (or initialize vector_src in some other manner)
auto vector_dst = vector_src.clone();                         // create clone of source vector as destination vector

In addition to this functionality, both clone() overloads of some LAFEM containers support various clone modes, which are specified using the LAFEM::CloneMode enumeration.

The convert() member function is used to convert one LAFEM container into another (compatible) container, possibly using a different LAFEM container class template and/or different memory, index and/or data types. This member function is used in one of the following scenarios:

A clone of a LAFEM container class template using a different memory, index or data type has to be created.
A LAFEM container has to be converted to a different (compatible) container class template, e.g. one wants to convert a LAFEM::SparseMatrixBanded object into a corresponding LAFEM::SparseMatrixCSR object.

In both scenarios, both the source and the destination object must already exist, although the destination object may be empty (uninitialized). For example:

LAFEM::DenseVector<Mem::Main, double, Index> vector_src(10u); // create double-precision source vector
LAFEM::DenseVector<Mem::Main, float,  int  > vector_dst;      // create empty single-precision destination vector
vector_src.format(1.0);                                       // set all values to 1.0 (or initialize vector_src in some other manner)
vector_dst.convert(vector_src);                               // create destination vector as converted source vector
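In contrast to copy(), a conversion has to (re)allocate the destination and translate every entry. A minimal sketch of the data type conversion, using the hypothetical class template MockTypedVector (not the real LAFEM implementation, and ignoring memory and index types):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a LAFEM container templated on its data type.
template<typename DT_>
class MockTypedVector
{
public:
  MockTypedVector() {}
  explicit MockTypedVector(std::size_t n) : _data(n, DT_(0)) {}

  std::size_t size() const { return _data.size(); }

  // Set all entries to the given value.
  void format(DT_ value) { _data.assign(_data.size(), value); }

  const DT_& operator()(std::size_t i) const
  {
    assert(i < _data.size());
    return _data[i];
  }

  // Convert from a vector with a possibly different data type:
  // resizes the destination and casts each entry individually.
  template<typename DT2_>
  void convert(const MockTypedVector<DT2_>& src)
  {
    _data.resize(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
      _data[i] = static_cast<DT_>(src(i));
  }

private:
  std::vector<DT_> _data;
};
```

Note that the destination may start out empty here, which mirrors the remark above that the destination object only has to exist and does not need to be initialized.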
