Notations
1. Definitions
1.1. Matrices
Matrix: A matrix is a linear transformation between finite-dimensional vector spaces.
Assembling a matrix: Assembling a matrix means defining its action as entries stored in a sparse or dense format. For example, in the finite element context, the storage format is sparse to take advantage of the many zero entries.
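As an illustration of "entries stored in a sparse format", here is a minimal sketch in plain C++ (not Feel++'s internal storage): it accumulates (row, column, value) triplets into a sparse map, summing duplicate contributions the way a finite element assembly loop would.

```cpp
#include <iostream>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

int main()
{
    // (row, column, value) contributions, e.g. produced by element loops;
    // duplicate (row, column) pairs are summed during assembly
    std::vector<std::tuple<int, int, double>> triplets = {
        {0, 0, 2.0}, {0, 1, -1.0},
        {1, 0, -1.0}, {1, 1, 2.0}, {1, 1, 2.0}, {1, 2, -1.0},
        {2, 1, -1.0}, {2, 2, 2.0}
    };

    std::map<std::pair<int, int>, double> A;   // sparse storage: only nonzeros are kept
    for (auto const& [i, j, v] : triplets)
        A[{i, j}] += v;

    for (auto const& [ij, v] : A)
        std::cout << "A(" << ij.first << "," << ij.second << ") = " << v << "\n";
}
```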
- Symmetric matrix: A = A^T.
- Positive definite (resp. positive semi-definite) matrix: all eigenvalues are > 0 (resp. ≥ 0), or equivalently x^T A x > 0 for all x ≠ 0 (resp. x^T A x ≥ 0 for all x).
- Negative definite (resp. negative semi-definite) matrix: all eigenvalues are < 0 (resp. ≤ 0), or equivalently x^T A x < 0 for all x ≠ 0 (resp. x^T A x ≤ 0 for all x).
- Indefinite matrix: there exist both positive and negative eigenvalues (Stokes, Helmholtz), or equivalently there exist x, y such that x^T A x > 0 > y^T A y.
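As a small concrete illustration of these definitions (an added example, not from the original notes), consider the symmetric matrices

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad \lambda(A) = \{1, 3\},
\qquad
B = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \quad \lambda(B) = \{3, -1\}.
```

A is positive definite, while B is indefinite: x = (1,1)^T gives x^T B x = 6 > 0 and y = (1,-1)^T gives y^T B y = -2 < 0.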
1.2. Preconditioners
1.2.1. Definition
Let A ∈ R^{n×n} be a matrix and x, b ∈ R^n be vectors; we wish to solve Ax = b.
- Definition: a preconditioner P is a method for constructing a matrix (just a linear operator, not assembled!) P^{-1} = P(A, A_p) using a matrix A and extra information A_p, such that the spectrum of P^{-1}A (left preconditioning) or AP^{-1} (right preconditioning) is well-behaved. Preconditioning improves the conditioning of the original linear system.
Left preconditioning: we solve P^{-1}Ax = P^{-1}b and build the Krylov space {P^{-1}b, (P^{-1}A)P^{-1}b, (P^{-1}A)^2 P^{-1}b, …}.
Right preconditioning: we solve (AP^{-1})(Px) = b and build the Krylov space {b, (AP^{-1})b, (AP^{-1})^2 b, …}.
Note that the product P^{-1}A or AP^{-1} is never assembled.
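To make this last remark concrete, here is a minimal sketch in plain C++ (not the Feel++/PETSc API): the action of the left-preconditioned operator is obtained by chaining a matrix-vector product with a preconditioner solve, here Jacobi with P = diag(A), without ever forming P^{-1}A.

```cpp
#include <iostream>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;   // dense here only for brevity

Vec apply(Mat const& A, Vec const& x)           // y = A x
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

Vec precSolve(Mat const& A, Vec const& y)       // z = P^{-1} y with P = diag(A) (Jacobi)
{
    Vec z(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        z[i] = y[i] / A[i][i];
    return z;
}

int main()
{
    Mat A = {{4, -1, 0}, {-1, 4, -1}, {0, -1, 4}};
    Vec x = {1, 2, 3};
    Vec z = precSolve(A, apply(A, x));          // action of P^{-1}A; the product is never formed
    for (double zi : z) std::cout << zi << ' ';
    std::cout << '\n';
}
```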
1.2.2. Properties
Let us now describe some properties of preconditioners:
- P^{-1} is dense; P is often not available and is not needed.
- A is rarely used by P, but A_p = A is common.
- A_p is often a sparse matrix, the preconditioning matrix.
Here are some numerical methods to solve the system Ax = b:
- Matrix-based: Jacobi, Gauss-Seidel, SOR, ILU(k), LU.
- Parallel: Block-Jacobi, Schwarz, Multigrid, FETI-DP, BDDC.
- Indefinite: Schur-complement, Domain Decomposition, Multigrid.
1.3. Preconditioner strategies
- Relaxation: split the matrix into lower, diagonal and upper parts: A = L + D + U.
- Jacobi: the cheapest preconditioner, P^{-1} = D^{-1}.
# sequential
pc-type=jacobi
# parallel
pc-type=block_jacobi
- Successive over-relaxation (SOR):
  - Implemented as a sweep.
  - ω = 1 corresponds to Gauss-Seidel.
  - Very effective at removing high-frequency components of the residual.
# sequential
pc-type=sor
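The following is a minimal sketch of one forward SOR sweep in plain C++ (illustrative names only, not the library implementation); with ω = 1 it reduces to a Gauss-Seidel sweep.

```cpp
#include <iostream>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// One forward SOR sweep: x_i <- x_i + omega * (b_i - sum_j a_ij x_j) / a_ii,
// using already-updated values for j < i; omega = 1 gives Gauss-Seidel.
void sorSweep(Mat const& A, Vec const& b, Vec& x, double omega)
{
    for (std::size_t i = 0; i < A.size(); ++i)
    {
        double r = b[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            r -= A[i][j] * x[j];
        x[i] += omega * r / A[i][i];
    }
}

int main()
{
    Mat A = {{4, -1, 0}, {-1, 4, -1}, {0, -1, 4}};
    Vec b = {1, 2, 3}, x(3, 0.0);
    for (int sweep = 0; sweep < 25; ++sweep)
        sorSweep(A, b, x, 1.0);                  // omega = 1: Gauss-Seidel
    std::cout << x[0] << ' ' << x[1] << ' ' << x[2] << '\n';
}
```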
1.3.1. Factorization
Factorization proceeds in two phases:
- symbolic factorization: find where fill occurs, using only the sparsity pattern;
- numeric factorization: compute the factors.
- LU decomposition:
  - Exact preconditioner.
  - Expensive: for an m×m sparse matrix with bandwidth b, it traditionally requires O(mb^2) time and O(mb) space.
  - The bandwidth scales as m^{(d-1)/d} in d dimensions.
  - Optimal in 2D: O(m log m) space, O(m^{3/2}) time.
  - Optimal in 3D: O(m^{4/3}) space, O(m^2) time.
  - Symbolic factorization is problematic in parallel.
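As a worked illustration of these bounds (my own example, assuming a banded ordering on a structured grid with n points per direction):

```latex
\text{2D: } m = n^2,\; b \approx n = m^{1/2}
  \;\Rightarrow\; O(mb^2) = O(m^2)\ \text{time},\quad O(mb) = O(m^{3/2})\ \text{space};
\qquad
\text{3D: } m = n^3,\; b \approx n^2 = m^{2/3}
  \;\Rightarrow\; O(mb^2) = O(m^{7/3})\ \text{time},\quad O(mb) = O(m^{5/3})\ \text{space}.
```

Both are noticeably worse than the optimal (nested-dissection) bounds quoted above, which is why fill-reducing reorderings matter for sparse direct solvers.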
1.3.2. 1-level Domain decomposition
Notation: domain size L, subdomain size H, element size h.
- Overlapping / Schwarz:
  - Solve Dirichlet problems on overlapping subdomains.
  - No overlap: its ∈ O(L/√(Hh)).
  - Overlap δ: its ∈ O(L/√(Hδ)).
pc-type=gasm # has a coarse grid preconditioner
pc-type=asm
- Neumann-Neumann:
  - Solve Neumann problems on non-overlapping subdomains.
  - its ∈ O((L/H)(1 + log(H/h))).
  - Tricky null space issues (floating subdomains).
  - Requires the subdomain matrices, not the globally assembled matrix.
  - Note: multilevel variants knock off the leading L/H; both overlapping and non-overlapping methods achieve this bound.
- BDDC and FETI-DP:
  - Neumann problems on subdomains with coarse grid correction.
  - its ∈ O(1 + log(H/h)).
1.3.3. Multigrid
Hierarchy: interpolation and restriction operators Π↑ : X_coarse → X_fine and Π↓ : X_fine → X_coarse.
- Geometric: define the problem on multiple levels, use the grid to compute the hierarchy.
- Algebraic: define the problem only on the finest level, use the matrix structure to build the hierarchy.
Galerkin approximation: assemble the coarse matrix as A_coarse = Π↓ A_fine Π↑.
Application of multigrid preconditioner (V-cycle)
-
Apply pre-smoother on fine level (any preconditioner).
-
Restrict residual to coarse level with Π↓.
-
Solve on coarse level Acoarsex=r.
-
Interpolate result back to fine level with Π↑.
-
Apply post-smoother on fine level (any preconditioner).
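Here is that two-grid sketch in plain C++ (dense storage, a 1D Poisson matrix and linear interpolation; all names are illustrative, the coarse solve is approximated by Jacobi sweeps for brevity, and the Galerkin coarse operator is built as A_coarse = P^T A P):

```cpp
#include <iostream>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

Vec apply(Mat const& A, Vec const& x)                  // y = A x
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

void smooth(Mat const& A, Vec const& b, Vec& x, int sweeps)  // damped Jacobi smoother
{
    for (int s = 0; s < sweeps; ++s)
    {
        Vec Ax = apply(A, x);
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] += 0.6 * (b[i] - Ax[i]) / A[i][i];
    }
}

int main()
{
    const int nf = 7, nc = 3;                          // fine / coarse sizes
    Mat A(nf, Vec(nf, 0.0)), P(nf, Vec(nc, 0.0));      // A_fine and interpolation Pi_up
    for (int i = 0; i < nf; ++i)                       // 1D Poisson stencil [-1 2 -1]
    {
        A[i][i] = 2.0;
        if (i > 0)      A[i][i - 1] = -1.0;
        if (i < nf - 1) A[i][i + 1] = -1.0;
    }
    for (int j = 0; j < nc; ++j)                       // linear interpolation
    {
        P[2 * j + 1][j] = 1.0;
        P[2 * j][j] = 0.5;
        P[2 * j + 2][j] = 0.5;
    }
    Mat Ac(nc, Vec(nc, 0.0));                          // Galerkin coarse operator
    for (int i = 0; i < nc; ++i)                       // A_coarse = P^T A P
        for (int j = 0; j < nc; ++j)
            for (int k = 0; k < nf; ++k)
                for (int l = 0; l < nf; ++l)
                    Ac[i][j] += P[k][i] * A[k][l] * P[l][j];

    Vec b(nf, 1.0), x(nf, 0.0);
    smooth(A, b, x, 3);                                // 1. pre-smoothing
    Vec r = apply(A, x);
    for (int i = 0; i < nf; ++i) r[i] = b[i] - r[i];   //    fine residual
    Vec rc(nc, 0.0);                                   // 2. restriction with Pi_down = P^T
    for (int j = 0; j < nc; ++j)
        for (int i = 0; i < nf; ++i)
            rc[j] += P[i][j] * r[i];
    Vec ec(nc, 0.0);
    smooth(Ac, rc, ec, 50);                            // 3. "coarse solve" (Jacobi here)
    for (int i = 0; i < nf; ++i)                       // 4. interpolate and correct
        for (int j = 0; j < nc; ++j)
            x[i] += P[i][j] * ec[j];
    smooth(A, b, x, 3);                                // 5. post-smoothing
    std::cout << "x[3] = " << x[3] << "\n";
}
```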
Multigrid convergence properties:
- Textbook: P^{-1}A is spectrally equivalent to the identity.
  - Constant number of iterations to converge up to discretization error.
- Most theory applies to SPD systems:
  - variable coefficients (e.g. discontinuous): low-energy interpolants;
  - mesh- and/or physics-induced anisotropy: semi-coarsening / line smoothers;
  - complex geometry: difficult to have meaningful coarse levels.
- Deeper algorithmic difficulties:
  - nonsymmetric problems (e.g. advection, shallow water, Euler);
  - indefinite problems (e.g. incompressible flow, Helmholtz).
- Performance considerations:
  - aggressive coarsening is critical in parallel;
  - most theory uses SOR smoothers, but ILU is often more robust;
  - the coarsest level is usually solved semi-redundantly with a direct solver.
- Multilevel Schwarz is essentially the same method in different language:
  - assume strong smoothers, emphasize aggressive coarsening.
2. Principles
Feel++ abstracts the PETSc library and provides a subset of its features that is sufficient in most cases. It interfaces with the following PETSc components: Mat, Vec, KSP, PC and SNES.
- Vec: vector handling library.
- Mat: matrix handling library.
- KSP: Krylov subspace library, implements various iterative solvers.
- PC: preconditioner library, implements various preconditioning strategies.
- SNES: nonlinear solver library, implements various nonlinear solution strategies.
All linear algebra is encapsulated within backends, selected with the command line option --backend=<backend> or the config file option backend=<backend>, which provide an interface to several libraries:
| Library | Format | Backend |
| --- | --- | --- |
| PETSc | sparse | petsc |
| Eigen | sparse | |
| Eigen | dense | |

The default backend is petsc.
3. Some generic examples
The configuration files .cfg
allow for a wide range of options to solve a linear or non-linear system.
We now consider the following example:
[import:"marker1"](../../codes/mylaplacian.cpp)
To execute this example:
# sequential
./feelpp_tut_laplacian
# parallel on 4 cores
mpirun -np 4 ./feelpp_tut_laplacian
3.1. Direct solver
Both cholesky and lu factorisations are available. However, the parallel implementation depends on the availability of MUMPS. The configuration is very simple:
# no iterative solver
ksp-type=preonly
# cholesky factorisation
pc-type=cholesky
Using the PETSc backend allows one to choose between different packages to compute the factorization:
| Package | Description | Parallel |
| --- | --- | --- |
| petsc | PETSc own implementation | yes |
| mumps | MUltifrontal Massively Parallel sparse direct Solver | yes |
| umfpack | Unsymmetric MultiFrontal package | no |
| pastix | Parallel Sparse matriX package | yes |
To choose between these factorization packages:
# choose mumps
pc-factor-mat-solver-package=mumps
# choose umfpack (sequential)
pc-factor-mat-solver-package=umfpack
In order to perform a Cholesky-type factorisation, the underlying matrix must be set to be SPD:
// matrix
auto A = backend->newMatrix(_test=...,_trial=...,_properties=SPD);
// bilinear form
auto a = form2( _test=..., _trial=..., _properties=SPD );
3.2. Using iterative solvers
3.2.1. Using CG and ICC(3)
To use the conjugate gradient solver with an ICC(3) preconditioner and a relative tolerance of 1e-12:
ksp-rtol=1.e-12
ksp-type=cg
pc-type=icc
pc-factor-levels=3
4. Solving the Laplace problem
We start with the quickstart Laplacian example. Recall that we wish to, given a domain Ω, find u such that
-Δu = f in Ω, u = g on ∂Ω.
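For reference, the variational problem that is assembled and handed to the algebraic solver reads (a standard statement, assuming the Dirichlet datum g is imposed on the trial space):

```latex
\text{find } u \in H^1(\Omega),\ u|_{\partial\Omega} = g, \text{ such that }
\int_\Omega \nabla u \cdot \nabla v \,dx = \int_\Omega f\, v \,dx
\quad \forall v \in H^1_0(\Omega).
```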
4.2. Viewing KSP solvers
shell> mpirun -np 2 feelpp_qs_laplacian --ksp-monitor=1 --ksp-view=1
0 KSP Residual norm 8.953261456448e-01
1 KSP Residual norm 7.204431786960e-16
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=1000
tolerances: relative=1e-13, absolute=1e-50, divergence=100000
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: shell
Shell:
linear system matrix = precond matrix:
Matrix Object: 2 MPI processes
type: mpiaij
rows=525, cols=525
total: nonzeros=5727, allocated nonzeros=5727
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
5. Solvers and preconditioners
You can now change the Krylov subspace solver using the --ksp-type option and the preconditioner using the --pc-type option.
For example,
- to solve using the conjugate gradient solver (cg) and the default preconditioner, use the following:
./feelpp_qs_laplacian --ksp-type=cg --ksp-view=1 --ksp-monitor=1
- to solve using the algebraic multigrid preconditioner (gamg) with cg as the solver, use the following:
./feelpp_qs_laplacian --ksp-type=cg --ksp-view=1 --ksp-monitor=1 --pc-type=gamg
6. Block factorisation
6.1. Stokes
We now turn to the quickstart Stokes example. Recall that we wish to, given a domain Ω, find (u, p) such that
-Δu + ∇p = f in Ω, ∇·u = 0 in Ω, u = g on ∂Ω.
This problem is indefinite. Possible solution strategies are
- Uzawa,
- penalty (techniques from optimisation),
- the augmented Lagrangian approach (Glowinski, Le Tallec).
Note that the inf-sup condition must be satisfied. In particular, for a multigrid strategy, the smoother needs to preserve it.
6.1.1. General approach for saddle point problems
The Krylov subspace solvers of choice for indefinite problems are MINRES and GMRES. As to preconditioning, we look first at the saddle point matrix M and its block factorization M = LDL^T; indeed we have

M = \begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ B^T A^{-1} & I \end{pmatrix}
    \begin{pmatrix} A & 0 \\ 0 & -S \end{pmatrix}
    \begin{pmatrix} I & A^{-1} B \\ 0 & I \end{pmatrix},
\qquad S = B^T A^{-1} B.
- Elman, Silvester and Wathen propose three preconditioners built from these blocks, where S̃ ≈ S = B^T A^{-1} B approximates the Schur complement and Ã^{-1} ≈ A^{-1} approximates the inverse of A.
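As one hedged illustration of such a block preconditioner (a common choice in the literature, not necessarily the exact list from the reference), the block upper-triangular preconditioner built from these factors is

```latex
P = \begin{pmatrix} \tilde{A} & B \\ 0 & -\tilde{S} \end{pmatrix},
```

whose application requires one (approximate) solve with Ã and one with S̃ per Krylov iteration; with exact blocks the preconditioned operator has minimal polynomial of degree two, so GMRES converges in at most two iterations.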