**Elementary Variable Types: What are integer, real, complex, logical, or character variables? Storage demands? Precision of floating point numbers? What is a (parameter) constant?**

Memory is organized in bytes (sets of 8 bits). One “word” is usually 4 or 8 bytes.

- Integer: Integer numbers, -1, 0, 1, 2, … Represented exactly by converting them to the binary system.
- Real: Floating point numbers, 1.3e-6, 4.44e4, … Takes one word of memory.
- Complex: Complex floating point numbers, represented by two floating point numbers, (3.0e-6,1.3e4). Takes two words of memory.
- Logical: Boolean; True or False.
- Character: character strings, ‘this is a string’. Strings have a certain length. Encoded character by character using a table (usually the ASCII code).
- Floating point numbers: Split into sign, exponent, and mantissa.

The accuracy of floating point numbers depends on the number of bytes per word:

- 32 Bit ~ 6 decimal digits == Single Precision
- 64 Bit ~ 15 decimal digits == Double Precision
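Since these notes are about Fortran, the following is only a quick cross-check in Python (whose native float is already double precision): round-tripping a value through a 4-byte IEEE single shows how many digits survive.

```python
import struct

def to_single(x):
    # Round-trip through a 4-byte IEEE-754 single precision value.
    return struct.unpack('f', struct.pack('f', x))[0]

pi = 3.14159265358979  # accurate to ~15 digits (double precision)
pi_single = to_single(pi)

print(pi_single)            # only ~7 significant digits survive
print(abs(pi_single - pi))  # rounding error of order 1e-7
```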

These are declared in the declaration block as:

```fortran
real(4) x   ! Single precision
real(8) y   ! Double precision
real(k) z   ! Another precision defined by k, e.g. k=16 -> 128 bits
```

The default for integers is usually 32 bit (range -2**31 to 2**31-1). But it can be increased, e.g.

```fortran
integer(8) m
```

Values of the “kind” parameter depend on the compiler, but most compilers provide double precision floating point numbers. Parameters are constants: their values are fixed at compile time, and they are protected from being overwritten.

```fortran
real(8), parameter :: pi = 3.14159265358979_8  ! kind suffix keeps full precision
```

**Arrays and allocation: Declaration? How are one- and multi-dimensional arrays stored in Fortran? Static, automatic, and dynamic allocation of arrays.**

Declaration:

```fortran
real(8), allocatable :: mata(:,:)
```

Allocation:

```fortran
allocate(mata(ndim1,ndim2), stat=ierr)
if (ierr.ne.0) then
   ! react to error condition
end if
```

- Arrays are stored in successive memory blocks in a given order.
- Matrices are not stored in some sort of funky matrix-shaped arrangement in the memory, but contiguously as a one-dimensional array (see the examples below) in column-major order, i.e. the matrix is stored in memory column by column.
- Indices run fastest from first to last (the opposite of C and Python, which are row major).
- Array operations are done element-wise. For instance, the product of two arrays gives simply the product of each pair of components (not the mathematical matrix multiplication!)
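To make column-major storage and element-wise products concrete, here is a small illustrative sketch in Python with made-up data (Fortran does both natively):

```python
# A 2x3 matrix stored column-major (Fortran order): columns are contiguous.
# Element (i,j), 1-based, lives at linear position (i-1) + (j-1)*nrows.
nrows, ncols = 2, 3
a = [[1, 2, 3],
     [4, 5, 6]]  # logical view, a[i][j] with 0-based indices

column_major = [a[i][j] for j in range(ncols) for i in range(nrows)]
print(column_major)  # [1, 4, 2, 5, 3, 6] -- column by column

# Element-wise product, as in the Fortran expression c = a * b:
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [xi * yi for xi, yi in zip(x, y)]
print(z)  # [4.0, 10.0, 18.0] -- NOT a mathematical matrix product
```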

Symmetric matrices can be packed by storing only one triangle as a one-dimensional array, e.g. element (*i*,*j*) with *i* ≤ *j* stored at position *i* + *j*(*j*−1)/2.

Q: Difference between static, automatic and dynamic allocation?

- Static: the array size is a constant known at compile time; storage is reserved for the whole run.
- Automatic: the array size depends on dummy arguments; storage is allocated on entry to the procedure and released on exit.
- Dynamic: the array is declared allocatable and explicitly allocated/deallocated at run time.

**Array operations**

Array operations can be performed efficiently with BLAS (Basic Linear Algebra Subprograms). It is organized in 3 levels:

- Vector operations
- Matrix-vector operations
- Matrix-matrix operations

Data transfer from memory to the CPU often takes longer than the arithmetic itself. Each level therefore achieves higher performance than the one below it, because it needs fewer memory accesses per arithmetic operation (Level 3 >> Level 2 > Level 1, in terms of efficiency).
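The claim can be made concrete with the standard operation and data-movement counts for each level; the numbers below are rough estimates, not measurements:

```python
n = 1000

# Level 1 (e.g. dot product): ~2n flops, ~2n numbers moved.
level1 = (2 * n) / (2 * n)

# Level 2 (e.g. matrix-vector product): ~2n^2 flops, ~n^2 + 2n numbers moved.
level2 = (2 * n**2) / (n**2 + 2 * n)

# Level 3 (e.g. matrix-matrix product): ~2n^3 flops, ~3n^2 numbers moved.
level3 = (2 * n**3) / (3 * n**2)

print(level1, level2, level3)
# Level 3 performs O(n) flops per memory access, which is why it is fastest.
```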

About LAPACK library from Wikipedia:

**LAPACK** (Linear Algebra PACKage) is a software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition.

LAPACK (…) depends upon the Basic Linear Algebra Subprograms (BLAS) (…)

**Assignment expressions**

Variables are assigned values with “=”. A value can be a number, a logical expression, a mathematical operation…

Arithmetic assignment: variable = arithmetic expression

Operators: +, -, *, /, ** (there is no % operator in Fortran; use the mod intrinsic for remainders)

Concerning exponentiation: use integer exponents when possible. Do not use floating point exponents with a negative base of integer or real type; use a complex base instead.

Common intrinsic functions that can be used in arithmetic expressions: abs, sqrt, log, log10, exp, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh (all these have one and only one argument).

**Control structures in Fortran:**

**If block and logical if**

```fortran
IF (logical-expression-1) THEN
   statements-1
ELSE IF (logical-expression-2) THEN
   statements-2
ELSE IF (logical-expression-3) THEN
   statements-3
ELSE IF (.....) THEN
   ...........
ELSE
   statements-ELSE
END IF
```

**Select case construct**

```fortran
SELECT CASE (selector)
CASE (label-list-1)
   statements-1
CASE (label-list-2)
   statements-2
CASE (label-list-3)
   statements-3
.............
CASE (label-list-n)
   statements-n
CASE DEFAULT
   statements-DEFAULT
END SELECT
```

**Do loops with and without counter**

In Fortran, there are at least 3 types of loops:

- A counter-loop called the “DO” loop which performs the loop for a specified number of times based on an integer counter.
- A loop which repeats itself “while” a certain condition remains true. In Pascal and C, this type of loop is called the while-loop.
- *I don’t care about this one*: An infinite loop which never ends; it is stopped by an external interrupt. For VMS and UNIX, a control-C will do the trick. In the newer languages, this type of loop is forbidden.

Do loop:

```fortran
do [sl] [counter]=[initial count],[final count],[increment]
   [loop body]
[sl] continue
```

While loop:

```fortran
[sl] continue
   [loop body]
   if ([condition] is true) go to [sl]
[next program statement]
```

**Cycle and exit statements for do loops**

The `CYCLE` and `EXIT` statements specify that the remaining statements in the current iteration of a particular active (enclosing) `DO` loop are to be skipped.

`CYCLE` specifies that these statements are skipped, but that the `END DO` statement marking the end of the `DO` loop is executed, i.e. the next iteration, if any, is started. If the statement marking the end of the `DO` loop is not `END DO` (in other words, if the loop is not a block `DO`), the `CYCLE` statement does not execute that statement, but does start the next iteration (if any).

`EXIT` specifies that the loop specified by the `DO` construct is terminated.

The `DO` loop affected by `CYCLE` and `EXIT` is the innermost enclosing `DO` loop when the following forms are used:

```fortran
CYCLE
EXIT
```

Otherwise, the following forms specify the construct name of the pertinent `DO` loop:

```fortran
CYCLE construct-name
EXIT construct-name
```

**Subroutines and functions**

**Syntax in Fortran**

Functions: use a function if you need to do a complicated calculation that has only one result which you may or may not want to subsequently use in an expression.

```fortran
answer = functionname(argument1, argument2, ...)
```

Function schematic example. Main things:

- The function definition has the same structure as that of a program, and it is done outside the main program.
- The name of the function (AVRAGE) is defined in the declaration block of the program. This is the same variable that the function returns.
- Variables used inside the function and passed to it from the main program don’t need to have the same names inside and outside the function (in the example, A, B, C vs. X, Y, Z).

```fortran
      PROGRAM FUNDEM
C     Declarations for main program
      REAL A,B,C
      REAL AV, AVSQ1, AVSQ2
      REAL AVRAGE
C     Enter the data
      DATA A,B,C/5.0,2.0,3.0/
C     Calculate the average of the numbers
      AV = AVRAGE(A,B,C)
      AVSQ1 = AVRAGE(A,B,C)**2
      AVSQ2 = AVRAGE(A**2,B**2,C**2)
      PRINT *,'Statistical Analysis'
      PRINT *,'The average of the numbers is: ',AV
      PRINT *,'The average squared of the numbers: ',AVSQ1
      PRINT *,'The average of the squares is: ',AVSQ2
      END

      REAL FUNCTION AVRAGE(X,Y,Z)
      REAL X,Y,Z,SUM
      SUM = X + Y + Z
      AVRAGE = SUM/3.0
      RETURN
      END
```

Subroutines, on the other hand, can return several results. However, calls to subroutines cannot be placed in an expression.

```fortran
call subroutinename(argument1, argument2, ...)
```

Example of subroutines:

```fortran
      PROGRAM SUBDEM
      REAL A,B,C,SUM,SUMSQ
      CALL INPUT(A,B,C)
      CALL CALC(A,B,C,SUM,SUMSQ)
      CALL OUTPUT(SUM,SUMSQ)
      END

      SUBROUTINE INPUT(X,Y,Z)
      REAL X,Y,Z
      PRINT *,'ENTER THREE NUMBERS => '
      READ *,X,Y,Z
      RETURN
      END

      SUBROUTINE CALC(A,B,C,SUM,SUMSQ)
      REAL A,B,C,SUM,SUMSQ
      SUM = A + B + C
      SUMSQ = SUM**2
      RETURN
      END

      SUBROUTINE OUTPUT(SUM,SUMSQ)
      REAL SUM,SUMSQ
      PRINT *,'The sum of the numbers you entered is: ',SUM
      PRINT *,'And the square of the sum is: ',SUMSQ
      RETURN
      END
```

Subroutines have the same structure as programs too, with declaration and statement blocks, and they are also placed outside the main program. Note that the name of the subroutine is not declared in the main program. The variables passed to subroutines and/or obtained from them are not required to have the same names in the program and in the subroutine.

**Argument passing**

Fortran passes arguments by **reference**. When the subroutine is called, the memory addresses of the arguments are put on the stack, and these are used as the addresses of the subroutine’s arguments. All modifications are seen by the calling routine.

Arguments can also be passed by **value**. This is done in C, C++, or Java, for example. Instead of putting the memory addresses on the stack, as in passing by reference, *copies of the argument values* are put on the stack. These copies are used in the subroutine, and modifications are *not* seen by the calling routine.
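The visible difference between the two conventions can be mimicked in Python (strictly speaking Python passes object references, but mutating a list argument behaves like Fortran’s pass-by-reference, while rebinding a number behaves like pass-by-value):

```python
def scale_in_place(vec, factor):
    # Mutation through the reference: the caller sees the change,
    # like a Fortran subroutine modifying a dummy argument.
    for i in range(len(vec)):
        vec[i] *= factor

def scale_copy(x, factor):
    # Rebinding a local name: the caller's variable is untouched,
    # like pass-by-value in C.
    x = x * factor
    return x

data = [1.0, 2.0, 3.0]
scale_in_place(data, 2.0)
print(data)  # [2.0, 4.0, 6.0] -- modified in the caller

n = 10
scale_copy(n, 2.0)
print(n)     # 10 -- unchanged in the caller
```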

Passing by reference:

- Good: simpler to understand, no efficiency issues with arrays.
- Bad: less data separation.

Passing by value:

- Good: allows clear separation of data.
- Bad: you don’t usually want to make copies of large arrays…

**Intent attribute for arguments**

The intent attribute can be used for variables passed into or out of functions and subroutines. Its usage is recommended for readability. It helps the compiler to detect errors, and allows it to do additional optimisations.

**intent(in)** means that the variable’s value can enter the procedure, but cannot be changed inside it.

**intent(out)** means the variable is set inside the procedure and sent back to the main program with any initial values ignored.

**intent(inout)** means that the variable comes in with a value and leaves with a value (default).

A little example:

```fortran
subroutine square_cube(i, isquare, icube)
   integer, intent(in)  :: i               ! input
   integer, intent(out) :: isquare, icube  ! output
   isquare = i**2
   icube = i**3
end subroutine square_cube
```

**Interface blocks**

The interface block declares the interfaces of external functions and subroutines; it goes in the declaration part of the program unit that calls them, while the procedures themselves come after the main program has ended. (Procedures contained in modules, or internal to the program, get an explicit interface automatically.)

Example:

```fortran
PROGRAM (...)
   interface
      function vector_add(a, b, n)
         integer, intent(in) :: n
         real, intent(in)    :: a(:), b(:)
         real                :: vector_add(size(a))
      end function vector_add
   end interface
   (...)
END PROGRAM
```

An explicit interface is not required in general, but it is mandatory if, e.g., **pointers**, assumed-shape arrays, or functions returning arrays are involved, and in a few other situations.

**Matrix eigenvalue problems: How does one solve small & dense eigenvalue problems? How does one solve large & sparse eigenvalue problems? (some key ideas)**

Sparse and dense matrices (from Wikipedia):

A **sparse matrix** is a matrix populated primarily with zeros (Stoer & Bulirsch 2002, p. 619). The term itself was coined by Harry M. Markowitz. If the majority of elements differ from zero, then it is common to refer to the matrix as a **dense matrix**. When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are relatively slow and consume large amounts of memory when applied to large sparse matrices. Sparse data is by nature easily compressed, and this compression almost always results in significantly less computer data storage usage. Indeed, some very large sparse matrices are infeasible to manipulate using standard dense algorithms.

For small enough eigenvalue problems (**small and dense matrices**), the LAPACK eigenvalue solvers are efficient enough. Dense matrices correspond to strongly coupled systems.

The LAPACK subroutines used depend on the type of matrix (symmetric, Hermitian, general, banded, etc.).

See also:

Eigenvalue problems have also provided a fertile ground for the development of higher performance algorithms. These algorithms generally all consist of three phases:

- Reduction of the original dense matrix to a condensed form by orthogonal transformations,
- Solution of condensed form, and
- Optional backtransformation of the solution of the condensed form to the solution of the original matrix.

- Reduction of a *symmetric* matrix to tridiagonal form to solve a symmetric eigenvalue problem: LAPACK routine xSYTRD applies a symmetric block update using the Level 3 BLAS routine xSYR2K; Level 3 BLAS account for at most half the work.
- Reduction of a *rectangular* matrix to bidiagonal form to compute a singular value decomposition: LAPACK routine xGEBRD applies a block update using two calls to the Level 3 BLAS routine xGEMM; Level 3 BLAS account for at most half the work.
- Reduction of a *nonsymmetric* matrix to Hessenberg form to solve a nonsymmetric eigenvalue problem: LAPACK routine xGEHRD applies a block update. Level 3 BLAS account for at most three-quarters of the work.

**Large sparse matrices** correspond to loosely coupled systems (i.e. each element of the system is coupled to only a few other elements or neighbours). These are common in problems that involve solving partial differential equations. In such cases, algorithms which work efficiently for small dense matrices can be slow for large sparse matrices, so special algorithms need to be used.

We want to solve an eigenvalue problem of the form **Au**_{i} = λ_{i}**u**_{i}.

**A** is assumed to be a Hermitian *m*×*m* matrix.

The computational cost of diagonalizing a dense *n*×*n* matrix is:

- Operation counts scale as *kn*^{3}, with 1 < *k* < 30
- Memory demands ≲ 2*n*^{2} + 5*n*

Something that we want to avoid is whole-array operations on large arrays: the compiler might create a temporary copy of an already huge array! Instead, we want to perform operations using *single components* as input, if possible.

Whenever possible, we want to avoid storing a large matrix **A** at all. Instead, we can use matrix-vector products such as **σ** = **Ab** and store only **σ** and **b** (much less stuff to put in memory).
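A minimal sketch of the idea, with a made-up 4×4 sparse matrix kept only as (row, column, value) triplets; the product **σ** = **Ab** never forms a dense matrix:

```python
# Sparse symmetric 4x4 matrix stored as (row, col, value) triplets
# (0-based indices); zero entries are simply absent.
triplets = [(0, 0, 2.0), (1, 1, 2.0), (2, 2, 2.0), (3, 3, 2.0),
            (0, 1, 1.0), (1, 0, 1.0), (2, 3, 1.0), (3, 2, 1.0)]
n = 4

def matvec(b):
    # sigma = A @ b using only the stored nonzeros.
    sigma = [0.0] * n
    for i, j, v in triplets:
        sigma[i] += v * b[j]
    return sigma

b = [1.0, 1.0, 0.0, 0.0]
print(matvec(b))  # [3.0, 3.0, 0.0, 0.0]
```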

To find iterative solutions, we choose a starting guess for the eigenpairs that we want to obtain within a defined precision. Then the **residuals** (errors) for approximate eigenpairs (λ̃_{i}, **x**_{i}) are **r**_{i} = **Ax**_{i} − λ̃_{i}**x**_{i}.

We don’t need to find all eigenpairs, only the ones that interest us.

**Rayleigh quotient**: Can be defined for *any* nonzero vector **x** as:

ρ(**x**) = (**x**^{H}**Ax**) / (**x**^{H}**x**)

Then ρ(**x**) always lies between the smallest and largest eigenvalues of **A**.

If **x** is an eigenvector **u**_{i} of **A**, then ρ(**u**_{i}) = λ_{i}.
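These properties are easy to verify numerically; the sketch below uses a made-up 2×2 symmetric matrix whose eigenpairs are known:

```python
def rayleigh(A, x):
    # rho(x) = (x^T A x) / (x^T x) for a real symmetric A.
    Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return sum(xi * axi for xi, axi in zip(x, Ax)) / sum(xi * xi for xi in x)

A = [[2.0, 1.0],
     [1.0, 2.0]]  # eigenpairs: (3, [1, 1]) and (1, [1, -1])

print(rayleigh(A, [1.0, 1.0]))   # 3.0 -- an eigenvector gives its eigenvalue
print(rayleigh(A, [1.0, -1.0]))  # 1.0
print(rayleigh(A, [1.0, 0.0]))   # 2.0 -- any other x gives a value in [1, 3]
```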

Next, I introduce 3 different algorithms to deal with large sparse matrices:

**The Power Iterations**:

Read this from wikipedia:

In mathematics, the **power iteration** is an eigenvalue algorithm: given a matrix **A**, the algorithm will produce a number λ (the eigenvalue) and a nonzero vector **v** (the eigenvector), such that **Av** = λ**v**. The power iteration is a very simple algorithm. It does not compute a matrix decomposition, and hence it can be used when **A** is a very large sparse matrix. However, it will find only one eigenvalue (the one with the greatest absolute value) and it may converge only slowly.

And check the explanation at the beginning of either of these documents:

http://www.cs.cornell.edu/~bindel/class/cs6210-f09/lec26.pdf

http://www.math.usm.edu/lambers/mat610/sum10/lecture14.pdf

In sum: Power Iterations provide the dominant eigenvector (the one with the largest-magnitude eigenvalue) by repeatedly applying **A** to a starting vector. Expanding the starting vector in the eigenbasis, the contribution of each non-dominant eigenvector is suppressed by the factor (λ_{i}/λ_{1})^{k}, which tends to zero as *k* becomes larger, for *i* != 1.

Algorithm:

For one eigenpair:

- The iterations converge linearly to the “dominant” (most extremal) eigenvector of **A**, the one with the largest-magnitude eigenvalue.
- The convergence of this method depends on the ratios between the eigenvalues.
- Extremal eigenvalues are easier to find than inner eigenvalues.
- A good starting guess is important for convergence.
- We can increase convergence if we can apply some cheap way of transforming the eigenvalue that we want to find into an isolated extremal eigenvalue (a *shift*).
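The whole algorithm fits in a few lines; a pure-Python sketch with a made-up 2×2 test matrix (illustration only, not tuned code):

```python
def power_iteration(A, x, steps=100):
    # Repeatedly apply A and renormalize; x converges to the
    # eigenvector with the largest-magnitude eigenvalue.
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    # The Rayleigh quotient of the (normalized) result estimates the eigenvalue.
    Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    lam = sum(xi * axi for xi, axi in zip(x, Ax))
    return lam, x

A = [[2.0, 1.0],
     [1.0, 2.0]]  # dominant eigenpair: lambda = 3, u ~ [1, 1]/sqrt(2)

lam, u = power_iteration(A, [1.0, 0.0])
print(lam)  # ~3.0
print(u)    # ~[0.7071, 0.7071]
```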

But power iterations are not very efficient or stable…

During the power iterations, a set of vectors **u**, **Au**, **A**^{2}**u**, … is generated. The vector space spanned by these vectors is the **Krylov space**. These vectors contain information about eigenpairs other than the most extremal one, but this information is discarded during power iterations.

About the Krylov space (not necessary, but somewhat clarifying): wikipedia

**The Lanczos Methods**

Lanczos methods exploit all vectors in the Krylov space.

- Use an orthonormal basis {**v**_{1}, **v**_{2}, …, **v**_{m}} for the Krylov space (found by Gram-Schmidt orthogonalization). Working with an orthonormal basis improves numerical stability.
- Project the eigenvalue problem into the Krylov space to find a fast solution.

General algorithm:

In the orthonormal basis of the Krylov space, the reduced **A** matrix becomes tridiagonal.
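This can be checked directly: building an orthonormal Krylov basis by Gram-Schmidt and projecting **A** onto it gives a tridiagonal matrix. A pure-Python sketch with a made-up 4×4 symmetric matrix (not a production Lanczos implementation, which would use the three-term recurrence instead of full Gram-Schmidt):

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def krylov_basis(A, u, m):
    # Orthonormalize u, A u, A^2 u, ... by Gram-Schmidt.
    basis = []
    v = u
    for _ in range(m):
        for q in basis:  # project out the existing basis vectors
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        v = [vi / norm for vi in v]
        basis.append(v)
        v = matvec(A, v)  # next Krylov vector
    return basis

A = [[4.0, 1.0, 2.0, 0.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
V = krylov_basis(A, [1.0, 0.0, 0.0, 0.0], 4)

# Projected matrix T_ij = v_i . (A v_j); for symmetric A it is tridiagonal.
T = [[dot(vi, matvec(A, vj)) for vj in V] for vi in V]
print(abs(T[0][2]), abs(T[0][3]), abs(T[1][3]))  # ~0: outside the tridiagonal band
```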

This makes computing the eigenvalues (~30*m*^{2} FLOPs) and eigenvectors (~3*m*^{3} FLOPs) of the reduced problem very efficient.

After solving the reduced eigenvalue problem, the *norms of the residuals* are obtained. If these are smaller than the desired *tolerance*, the eigenvectors of the full problem can be recovered by transforming back from the Krylov basis.

The good:

- Better convergence and more robust than power iterations.
- Little memory demand.
- Operation count per iteration only slightly larger than power iterations.

The bad:

- Gram-Schmidt is numerically unstable, is not suited for the computation of many eigenvalues, and needs a reasonably good starting vector.
- Cannot deal with degenerate eigenvalues without further modification.
- Can only find eigenvectors that have some overlap with starting vector. If the starting vector is an eigenvector, we can’t find any additional eigenvectors.

(What is Gram-Schmidt? It basically enforces orthogonality vector by vector, and at the end it normalizes each basis vector.)

**The Jacobi-Davidson Methods**

These methods improve the stability and convergence with respect to Lanczos. Instead of exploiting the Krylov subspace, one can also try to find better basis vectors directly.

In the **Block-Davidson algorithm**, we construct new basis vectors in each iteration step, roughly one per non-converged eigenvector.

A bit more detail on how it works and how we compute **t**: for an approximate eigenpair (λ̃, **x**) with residual **r** = **Ax** − λ̃**x**, the ideal correction **t** would satisfy

(**A** − λ̃**I**)**t** = −**r**

We then make use of an approximate matrix **Ã** ≈ **A** which is easier to solve, e.g. a diagonal matrix.

Then, we obtain **t** by solving:

(**Ã** − λ̃**I**)**t** = −**r**
The good:

- Robust and fast convergence.
- No problem with degenerate eigenvalues.
- Very flexible.

The bad:

- Higher CPU cost per iteration (*k* = number of eigenpairs, *m* = number of vectors of the reduced space).
- Higher storage and I/O demands.

**Ordinary differential equations: xxx**

**how does one solve them on a computer?**

**key ideas and characteristics of basic algorithms: Euler, Velocity Verlet, Leapfrog**

**how does one solve initial value problems and how two-point boundary problems?**

**Root searching: bisection and secant method xxx**

**Computational “Costs”:**

Computational cost can be divided in:

- CPU time
- Main memory
- Disc space

Usually only CPU time is really critical for scientific computations.

**What is an operation or FLOP count?**

FLOPS = FLoating point Operations Per Second, a measure of the processing speed of a computer. The operation count (or FLOP count) of a program, by contrast, is the total number of floating point operations it performs.

**How can the operation count for simple linear algebra operations be estimated?**

Another important quantity to assess the performance required from a computer is the operation count *N*_{op} of a program.

For linear algebra tasks, it can be estimated as follows:

- “Cheap” operations are neglected.
- Only floating point multiplications are counted. E.g. for the multiplication of an *n1*×*n2* matrix and an *n2*×*n3* matrix, *N*_{op} = *n1*·*n2*·*n3*.
- For other operations, *N*_{op} must be evaluated analogously.
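The *n1*·*n2*·*n3* estimate can be verified by instrumenting a naive triple-loop matrix multiplication (Python sketch with all-ones test matrices):

```python
def matmul_count(A, B):
    # Naive (n1 x n2) @ (n2 x n3) product, counting the multiplications.
    n1, n2, n3 = len(A), len(B), len(B[0])
    C = [[0.0] * n3 for _ in range(n1)]
    muls = 0
    for i in range(n1):
        for k in range(n3):
            for j in range(n2):
                C[i][k] += A[i][j] * B[j][k]
                muls += 1
    return C, muls

n1, n2, n3 = 3, 4, 5
A = [[1.0] * n2 for _ in range(n1)]
B = [[1.0] * n3 for _ in range(n2)]
C, muls = matmul_count(A, B)
print(muls)     # 60 == n1*n2*n3
print(C[0][0])  # 4.0 == n2 (each entry sums n2 products of ones)
```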

**What are “wall time” and “cpu time”?**

Wall time, or real time, is the time elapsed during the computation, from the start of the program to the end, as it would be measured by a watch.

CPU time: time that the machine has spent doing computations and processing instructions.

**What determines the wall time of a program?**

The wall time is determined by the CPU time, but also by the “waits” incurred due to **I/O** communication, time spent waiting for **availability** of processing power, and “**sleep**” times, where the program is temporarily interrupted.