Fortran

Elementary Variable Types: What are integer, real, complex, logical, or character variables? Storage demands? Precision of floating point numbers? What is a (parameter) constant?

Memory is organized in bytes (groups of 8 bits). One “word” is usually 4 or 8 bytes.

  • Integer: Integer numbers, -1, 0, 1, 2, … Represented exactly by converting them to the binary system.
  • Real: Floating point numbers, 1.3e-6, 4.44e4, … Takes one word of memory.
  • Complex: Complex floating point numbers, represented by two floating point numbers, (3.0e-6,1.3e4). Takes two words of memory.
  • Logical: Boolean; True or False.
  • Character: character strings, ‘this is a string’. Strings have a certain length. Encoded character by character using a table (usually the ASCII code).
  • Floating point numbers: Split into sign, exponent, and mantissa.

[figure: floating-point representation (sign, exponent, mantissa)]

The accuracy of floating-point numbers depends on the number of bytes per word:

  • 32 Bit ~ 6 decimal digits == Single Precision
  • 64 Bit ~ 15 decimal digits == Double Precision

These are declared in the declaration block as:

real(4)   x !Single precision
real(8)   y !Double precision
real(k)   z !Another precision defined by k, e.g. k=16 -> 128 Bits

The default for integers is usually 32 bit (range from -2**31 to 2**31 - 1). But it can be increased, e.g.

integer(8) m

The values of the “kind parameter” depend on the compiler, but most compilers provide double precision floating point numbers. Parameters are constants whose values are fixed at compile time (not computed at run time); they are also protected from being overwritten.

real(8), parameter :: pi = 3.14159265358979d0   ! the d0 suffix keeps the literal in double precision
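The kind numbers 4 and 8 are compiler dependent; a portable alternative (a minimal sketch) uses the standard selected_real_kind intrinsic:

program kinds_demo
  implicit none
  ! request at least 15 significant digits and decimal exponent range 307 (double precision)
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp), parameter :: pi = 3.14159265358979_dp
  real(dp) :: x
  x = 2.0_dp * pi
  print *, 'kind =', dp, '  2*pi =', x
end program kinds_demo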

 Arrays and allocation: Declaration? How are one- and multi-dimensional arrays stored in Fortran? Static, automatic, and dynamic allocation of arrays. 

Declaration:

real(8),allocatable :: mata(:,:)

Allocation:

allocate(mata(ndim1,ndim2), stat=ierr)
if (ierr /= 0) then
  ! react to the error condition, e.g.
  write(*,*) 'allocation of mata failed'
  stop
end if
  • Arrays are stored in successive memory locations in a fixed order.
  • Matrices are not stored in some sort of funky matrix-shaped arrangement in memory; they are stored successively as an array (see the two examples below) in column-major order, i.e. consecutive elements of a column are adjacent in memory.
  • Indices run fastest from the first (leftmost) to the last (the opposite of the row-major order used by C and Python/NumPy by default).
  • Array operations are done element-wise. For instance, the product of two arrays is simply the product of each pair of components (not the mathematical matrix multiplication!).

[figure: column-major storage of a matrix]
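Because of column-major storage, the innermost loop should run over the first (column) index; a minimal sketch:

program loop_order
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n,n), s
  integer :: i, j
  call random_number(a)
  s = 0.0d0
  ! fast: the first index i varies fastest, matching the memory layout
  do j = 1, n
     do i = 1, n
        s = s + a(i,j)
     end do
  end do
  print *, s
end program loop_order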

Symmetric matrices can be packed as:

[figure: packed storage of a symmetric matrix]

Q: Difference between static, automatic and dynamic allocation?
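A short answer with a sketch (the names are made up): a static array has its size fixed at compile time; an automatic array gets its size from a dummy argument (or another variable) on entry to a procedure; a dynamic (allocatable) array is sized explicitly at run time with allocate/deallocate.

subroutine alloc_kinds(n)
  implicit none
  integer, intent(in) :: n
  real(8) :: a(100)               ! static: size known at compile time
  real(8) :: b(n)                 ! automatic: sized on entry from the argument n
  real(8), allocatable :: c(:)    ! dynamic: sized whenever we decide to allocate
  integer :: ierr
  allocate(c(2*n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'
  a = 0.0d0; b = 1.0d0; c = 2.0d0
  deallocate(c)
end subroutine alloc_kinds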

Array operations

Arrays can be operated on efficiently with BLAS (Basic Linear Algebra Subprograms), which is organized in three levels:

  1. Vector operations
  2. Matrix-vector operations
  3. Matrix-matrix operations

Data transfer from memory to the CPU often takes longer than the arithmetic itself. Each higher level therefore achieves higher performance, because it needs fewer memory accesses per arithmetic operation (Level 3 >> Level 2 > Level 1, in terms of efficiency).
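For example, the Level 3 operation C := alpha*A*B + beta*C is provided by the BLAS routine dgemm (a sketch; it assumes a BLAS library is linked, e.g. with -lblas):

program dgemm_demo
  implicit none
  integer, parameter :: n = 100
  real(8) :: a(n,n), b(n,n), c(n,n)
  call random_number(a)
  call random_number(b)
  c = 0.0d0
  ! C := 1.0*A*B + 0.0*C  (Level 3 BLAS)
  call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
  print *, c(1,1)
end program dgemm_demo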

About LAPACK library from Wikipedia:

LAPACK (Linear Algebra PACKage) is a software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition.

LAPACK (…) depends upon the Basic Linear Algebra Subprograms (BLAS) (…)

Assignment expressions

Variables are assigned values with “=”. A value can be a number, a logical expression, a mathematical operation…

Arithmetic assignment: variable = arithmetic expression

Operators: +, -, *, /, ** (there is no % operator in Fortran; use the mod intrinsic for remainders).

Concerning exponentiation: use integer exponents when possible. Do not use floating-point exponents with a negative base of integer or real type; use a complex base instead.
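A small illustration of this rule (a sketch):

program expo_demo
  implicit none
  real(8)    :: x
  complex(8) :: z
  x = (-8.0d0)**3                       ! fine: integer exponent works for a negative base
  ! x = (-8.0d0)**(1.0d0/3.0d0)         ! invalid: real exponent with a negative real base
  z = (-8.0d0, 0.0d0)**(1.0d0/3.0d0)    ! ok: a complex base gives the principal root
  print *, x, z
end program expo_demo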

Common intrinsic functions that can be used in arithmetic expressions: abs, sqrt, log, log10, exp, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh (all these have one and only one argument).

Control structures in Fortran:

If block and logical if

IF (logical-expression-1) THEN
   statements-1
ELSE IF (logical-expression-2) THEN
   statements-2
ELSE IF (logical-expression-3) THEN
   statement-3
ELSE IF (.....) THEN
   ...........
ELSE
   statements-ELSE
END IF

Select case construct

SELECT CASE (selector)
   CASE (label-list-1)
      statements-1
   CASE (label-list-2)
      statements-2
   CASE (label-list-3)
      statements-3
     .............
   CASE (label-list-n)
      statements-n
   CASE DEFAULT
      statements-DEFAULT
END SELECT
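A concrete little example of both constructs (a sketch, with made-up values):

program control_demo
  implicit none
  integer :: n
  n = 7
  if (n < 0) then
     print *, 'negative'
  else if (n == 0) then
     print *, 'zero'
  else
     print *, 'positive'
  end if
  select case (mod(n, 3))
  case (0)
     print *, 'divisible by 3'
  case (1, 2)
     print *, 'not divisible by 3'
  case default
     print *, 'unreachable for this selector'
  end select
end program control_demo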

Do loops with and without counter

In Fortran, there are at least 3 types of loops:

  1. A counter-loop called the “DO” loop which performs the loop for a specified number of times based on an integer counter.
  2. A loop which repeats itself “while” a certain condition remains true. In Pascal and C, this type of loop is called the while-loop.
  3. (I don’t care about this one.) An infinite loop which never ends; it is stopped by an external interrupt (on VMS and UNIX, a Ctrl-C will do the trick). In newer languages this type of loop is discouraged. (Source)

Do loop:

        do [sl] [counter]=[initial count],[final count],[increment]
           [loop body]
[sl]    continue

While loop:

[sl]    continue
           [loop body]
        if([condition] is true) go to [sl]
        [next program statement]
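The templates above use the old label/continue style; in modern free-form Fortran the same two loops are typically written with do ... end do and do while (a minimal sketch):

program loops_demo
  implicit none
  integer :: i, total
  real(8) :: err
  total = 0
  do i = 1, 10, 2            ! counter, initial value, final value, increment
     total = total + i
  end do
  print *, 'sum of odd numbers up to 10:', total
  err = 1.0d0
  do while (err > 1.0d-6)    ! repeats while the condition remains true
     err = err / 2.0d0
  end do
  print *, 'final err:', err
end program loops_demo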

Cycle and exit statements for do loops (see this)

The CYCLE and EXIT statements specify that the remaining statements in the current iteration of a particular active (enclosing) DO loop are to be skipped. CYCLE specifies that these statements are skipped, but the END DO statement that marks the end of the DO loop is executed—that is, the next iteration, if any, is started. If the statement marking the end of the DO loop is not END DO—in other words, if the loop is not a block DO—the CYCLE statement does not execute that statement, but does start the next iteration (if any). EXIT specifies that the loop specified by the DO construct is terminated. The DO loop affected by CYCLE and EXIT is the innermost enclosing DO loop when the following forms are used:

     CYCLE
     EXIT

Otherwise, the following forms specify the construct name of the pertinent DO loop:

     CYCLE construct-name
     EXIT construct-name
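A minimal sketch of both statements:

program cycle_exit_demo
  implicit none
  integer :: i
  do i = 1, 20
     if (mod(i, 2) == 0) cycle   ! skip the rest of this iteration for even i
     if (i > 9) exit             ! leave the loop entirely once i exceeds 9
     print *, i                  ! prints the odd numbers 1, 3, 5, 7, 9
  end do
end program cycle_exit_demo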

Subroutines and functions: see this

Syntax in Fortran

Functions: use a function if you need to do a complicated calculation that has only one result which you may or may not want to subsequently use in an expression.

answer = functionname(argument1, argument2, ...)

Function schematic example. Main things:

  • The function definition has the same structure as that of a program, and it is placed outside the main program.
  • The function name (AVRAGE) is declared with its type in the declaration block of the main program. Inside the function, the same name acts as the variable that holds the return value.
  • Variables used inside the function and passed to it from the main program do not need to have the same names inside and outside the function (in the example, A, B, C vs. X, Y, Z).
      PROGRAM FUNDEM
C     Declarations for main program
      REAL A,B,C
      REAL AV, AVSQ1, AVSQ2
      REAL AVRAGE
C     Enter the data
      DATA A,B,C/5.0,2.0,3.0/

C     Calculate the average of the numbers
      AV = AVRAGE(A,B,C)
      AVSQ1 = AVRAGE(A,B,C) **2
      AVSQ2 = AVRAGE(A**2,B**2,C**2)

      PRINT *,'Statistical Analysis'
      PRINT *,'The average of the numbers is:',AV
      PRINT *,'The average squared of the numbers: ',AVSQ1
      PRINT *,'The average of the squares is: ', AVSQ2
      END

      REAL FUNCTION AVRAGE(X,Y,Z)
      REAL X,Y,Z,SUM
      SUM = X + Y + Z
      AVRAGE = SUM /3.0
      RETURN
      END

Subroutines, on the other hand, can return several results. However, calls to subroutines cannot be placed in an expression.

call subroutinename (argument1, argument2, ...)

Example of subroutines:

      PROGRAM SUBDEM
      REAL A,B,C,SUM,SUMSQ
      CALL INPUT(A,B,C)
      CALL CALC(A,B,C,SUM,SUMSQ)
      CALL OUTPUT(SUM,SUMSQ)
      END

      SUBROUTINE INPUT(X, Y, Z)
      REAL X,Y,Z
      PRINT *,'ENTER THREE NUMBERS => '
      READ *,X,Y,Z
      RETURN
      END

      SUBROUTINE CALC(A,B,C, SUM,SUMSQ)
      REAL A,B,C,SUM,SUMSQ
      SUM = A + B + C
      SUMSQ = SUM **2
      RETURN
      END

      SUBROUTINE OUTPUT(SUM,SUMSQ)
      REAL SUM, SUMSQ
      PRINT *,'The sum of the numbers you entered are: ',SUM
      PRINT *,'And the square of the sum is:',SUMSQ
      RETURN
      END

Subroutines have the same structure as programs too, with declaration and statement blocks. They are also placed outside the main program (which is why they are “called”). Note that the name of the subroutine is not declared in the main program. The variables passed to subroutines and/or obtained from them are not required to have the same names in the program and in the subroutine.

Argument passing

Fortran passes arguments by reference. When the subroutine is called, the memory addresses of the arguments are put on the stack and used as the addresses of the subroutine's dummy arguments. All modifications are therefore seen by the calling routine.

Arguments can also be passed by value, as is done in C, C++, or Java, for example. Instead of putting the memory addresses on the stack, copies of the argument values are put on the stack. These copies are used in the subroutine, and modifications are not seen by the calling routine.

Passing by reference:

  • Good: simpler to understand, no efficiency issues with arrays.
  • Bad: less data separation.

Passing by value:

  • Good: allows clear separation of data.
  • Bad: you don’t usually want to make copies of large arrays…

Intent attribute for arguments

The intent attribute can be used when variables are passed in or out functions and subroutines. Its usage is recommended for readability. It helps the compiler to detect errors, and allows it to do additional optimisations.

intent(in) means that the variable's value can be used inside the procedure but not changed.

intent(out) means the variable is set inside the procedure and sent back to the caller, with any initial value ignored.

intent(inout) means that the variable comes in with a value and leaves with a (possibly modified) value; this is also the behaviour when no intent is given.

Little example from here:

subroutine square_cube(i,isquare,icube)
  integer, intent(in)  :: i             ! input
  integer, intent(out) :: isquare,icube ! output
  isquare = i**2
  icube   = i**3
end subroutine square_cube

Interface blocks (check this out for more info)

An interface block declares the name, arguments, and result of an external function or subroutine; it goes in the declaration part of the program (or procedure) that calls it. For procedures defined in modules or after a CONTAINS statement, an explicit interface is provided automatically.

Example:

		
                PROGRAM main
                  interface
                    function vector_add(a, b)
                      real, intent(in) :: a(:), b(:)
                      real :: vector_add(size(a))
                    end function vector_add
                  end interface
                  (...)
                END PROGRAM main

An explicit interface is not always required, but it is needed in certain situations, e.g. when the procedure has assumed-shape array or pointer arguments, optional arguments, or an array-valued result.

* Matrix eigenvalue problems: How does one solve small & dense eigenvalue problems? How does one solve large & sparse eigenvalue problems? (some key ideas)

Sparse and dense matrices: from wikipedia

A sparse matrix is a matrix populated primarily with zeros (Stoer & Bulirsch 2002, p. 619). The term itself was coined by Harry M. Markowitz. If the majority of elements differ from zero, then it is common to refer to the matrix as a dense matrix.

When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are relatively slow and consume large amounts of memory when applied to large sparse matrices. Sparse data is by nature easily compressed, and this compression almost always results in significantly less computer data storage usage. Indeed, some very large sparse matrices are infeasible to manipulate using standard dense algorithms.

For small enough eigenvalue problems (small, dense matrices), the LAPACK eigensolvers are efficient enough. Dense matrices correspond to strongly coupled systems.

The LAPACK subroutines used depend on the type of matrix:

[figure/table: LAPACK eigensolver routines by matrix type]
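For instance, a small dense real symmetric matrix can be diagonalized with the LAPACK routine dsyev (a sketch; it assumes LAPACK is linked, e.g. with -llapack):

program dsyev_demo
  implicit none
  integer, parameter :: n = 3
  real(8) :: a(n,n), w(n), work(3*n-1)
  integer :: info
  a = reshape([2.0d0, 1.0d0, 0.0d0, &
               1.0d0, 2.0d0, 1.0d0, &
               0.0d0, 1.0d0, 2.0d0], [n, n])
  ! 'V' = compute eigenvectors too, 'U' = the upper triangle of A is referenced
  call dsyev('V', 'U', n, a, n, w, work, size(work), info)
  if (info /= 0) stop 'dsyev failed'
  print *, 'eigenvalues:', w       ! the eigenvectors are returned in the columns of a
end program dsyev_demo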

See also the beginning of this site:

Eigenvalue problems have also provided a fertile ground for the development of higher performance algorithms. These algorithms generally all consist of three phases:

  1. Reduction of the original dense matrix to a condensed form by orthogonal transformations,
  2. Solution of condensed form, and
  3. Optional backtransformation of the solution of the condensed form to the solution of the original matrix.
  • Reduction of a symmetric matrix to tridiagonal form to solve a symmetric eigenvalue problem: LAPACK routine xSYTRD applies a symmetric block update using the Level 3 BLAS routine xSYR2K; Level 3 BLAS account for at most half the work.
  • Reduction of a rectangular matrix to bidiagonal form to compute a singular value decomposition: LAPACK routine xGEBRD applies a block update using two calls to the Level 3 BLAS routine xGEMM; Level 3 BLAS account for at most half the work.
  • Reduction of a nonsymmetric matrix to Hessenberg form to solve a nonsymmetric eigenvalue problem: LAPACK routine xGEHRD applies a block update. Level 3 BLAS account for at most three-quarters of the work.

Large sparse matrices correspond to loosely coupled systems (i.e. each element of the system is coupled to only a few other elements or neighbours). These are common in problems that involve solving partial differential equations. In such cases, algorithms which work efficiently for small dense matrices can be slow for large sparse matrices, so special algorithms need to be used.

We want to solve an eigenvalue problem of the form:

A u_i = λ_i u_i,   i = 1, …, m

A is assumed to be a Hermitian m*m matrix.

The computational cost of diagonalizing a dense n*n matrix is:

  • Operation counts scale as k·n³, with 1 < k < 30
  • Memory demands are roughly 2n² + 5n

Something that we want to avoid is whole-array operations on large arrays: the compiler might create a temporary copy of an already huge array! Instead, we want to perform operations on single components, if possible.

Whenever possible, we want to avoid storing a large matrix A explicitly. Instead, we can use matrix-vector products such as σ = Ab and store only σ and b (much less to keep in memory).
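For example, if A is a simple tridiagonal (Laplacian-like) operator, the product σ = Ab can be formed without ever storing A (a sketch; the routine name is made up):

subroutine apply_a(n, b, sigma)
  ! computes sigma = A*b for a tridiagonal "Laplacian" A without storing A
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: b(n)
  real(8), intent(out) :: sigma(n)
  integer :: i
  do i = 1, n
     sigma(i) = 2.0d0 * b(i)
     if (i > 1) sigma(i) = sigma(i) - b(i-1)
     if (i < n) sigma(i) = sigma(i) - b(i+1)
  end do
end subroutine apply_a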

To find solutions iteratively, we choose a starting guess for the eigenpairs that we want to obtain within a defined precision. The residuals (errors) of the approximate eigenpairs (λ_i, x_i) are then:

r_i = A x_i − λ_i x_i

We don’t need to find all eigenpairs, only the ones that interest us.

Rayleigh quotient: it can be defined for any nonzero vector x as:

ρ(x) = (x† A x) / (x† x)

Then:

λ_min ≤ ρ(x) ≤ λ_max

If x is an eigenvector ui of A, then:

ρ(u_i) = λ_i

Next, I introduce 3 different algorithms to deal with large sparse matrices:

The Power Iterations:

Read this from wikipedia:

In mathematics, the power iteration is an eigenvalue algorithm: given a matrix A, the algorithm will produce a number λ (the eigenvalue) and a nonzero vector v (the eigenvector), such that Av = λv.

The power iteration is a very simple algorithm. It does not compute a matrix decomposition, and hence it can be used when A is a very large sparse matrix. However, it will find only one eigenvalue (the one with the greatest absolute value) and it may converge only slowly.

And check the explanation in the beginning of either of these documents to understand something:
http://www.cs.cornell.edu/~bindel/class/cs6210-f09/lec26.pdf
http://www.math.usm.edu/lambers/mat610/sum10/lecture14.pdf

In sum: power iteration provides the dominant eigenvector (the one whose eigenvalue has the largest magnitude). The starting vector is expanded in the eigenvectors of A, and repeated application of A suppresses the other eigenvectors: after k iterations the contribution of eigenvector i is weighted by the factor

(λ_i / λ_1)^k

which tends to zero as k becomes larger, for i ≠ 1.

Algorithm:

For one eigenpair, each step applies A to the current guess and renormalizes:

u^(k+1) = A u^(k) / ||A u^(k)||,   λ_1 ≈ ρ(u^(k))

[figure: power iteration algorithm, general form]
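A minimal sketch of the idea in Fortran (the matrix here is small and explicit just for illustration; for a large sparse A one would call a matrix-free routine such as the apply_a sketch above):

program power_iteration
  implicit none
  integer, parameter :: n = 3
  integer :: k
  real(8) :: a(n,n), u(n), au(n), lambda
  a = reshape([2.0d0, 1.0d0, 0.0d0, &
               1.0d0, 2.0d0, 1.0d0, &
               0.0d0, 1.0d0, 2.0d0], [n, n])
  u = 1.0d0                          ! starting guess (must overlap the dominant eigenvector)
  u = u / sqrt(sum(u*u))
  do k = 1, 100
     au = matmul(a, u)               ! apply A
     lambda = dot_product(u, au)     ! Rayleigh quotient estimate of the eigenvalue
     u = au / sqrt(sum(au*au))       ! renormalize
  end do
  print *, 'dominant eigenvalue approx.:', lambda
end program power_iteration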

  • The iterations converge linearly to the “dominant” eigenvector (or most extremal one) of A, i.e. the one whose eigenvalue has the largest magnitude.
  • The convergence of this method depends on the ratios between the eigenvalues.
  • Extremal eigenvalues are easier to find than inner eigenvalues.
  • A good starting guess is important for convergence.
  • We can increase convergence if we can apply some cheap way of transforming the eigenvalue that we want to find into an isolated extremal eigenvalue (shift).

But power iterations are not very efficient or stable…

During the power iterations, a set of vectors u, Au, A²u, … is generated. The space spanned by these vectors is the Krylov space. These vectors carry information about eigenpairs other than the most extremal one, but this information is discarded during power iteration.

About the Krylov space (not necessary, but somewhat clarifying): wikipedia

The Lanczos Methods

Lanczos methods exploit all vectors in the Krylov space.

  • Use an orthonormal basis {v1, v2, …, vm} for the Krylov space (found by Gram-Schmidt orthogonalization). Working with an orthonormal basis improves numerical stability.
  • Project the eigenvalue problem into Krylov space to find a fast solution.

General algorithm:

[figure: Lanczos algorithm]

In the orthonormal basis of the Krylov space, the reduced A matrix becomes tridiagonal.

[figure: the tridiagonal reduced matrix]

This provides a very efficient computation for eigenvalues (~30m² FLOPs) and eigenvectors (~3m³ FLOPs).

After solving the eigenvalue problem, the norms of the residuals are obtained. If these are smaller than the desired tolerance, the full space of eigenvectors can be obtained:

[figure: back-transformation of the eigenvectors of the reduced problem to the full space]

The good:

  • Better convergence and more robust than power iterations.
  • Little memory demand.
  • Operation count per iteration only slightly larger than power iterations.

The bad:

  • Gram-Schmidt orthogonalization is numerically unstable; the method is not well suited for computing many eigenvalues and needs a reasonably good starting vector.
  • Cannot deal with degenerate eigenvalues without further modification.
  • Can only find eigenvectors that have some overlap with starting vector. If the starting vector is an eigenvector, we can’t find any additional eigenvectors.

(What is Gram-Schmidt? Check this; it basically enforces orthogonality vector by vector and normalizes each basis vector.)
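A minimal sketch of the (modified) Gram-Schmidt procedure applied to the columns of a matrix V:

subroutine mgs(n, m, v)
  ! orthonormalizes the m columns of v(n,m) by modified Gram-Schmidt
  implicit none
  integer, intent(in)    :: n, m
  real(8), intent(inout) :: v(n, m)
  integer :: i, j
  do j = 1, m
     do i = 1, j - 1
        ! subtract the projection onto the already-orthonormal column i
        v(:, j) = v(:, j) - dot_product(v(:, i), v(:, j)) * v(:, i)
     end do
     v(:, j) = v(:, j) / sqrt(sum(v(:, j)**2))   ! normalize the new basis vector
  end do
end subroutine mgs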

The Jacobi-Davidson Methods

These methods improve the stability and convergence with respect to Lanczos. Instead of exploiting the Krylov subspace, one tries to construct better basis vectors directly.

In the Block-Davidson algorithm, new basis vectors are constructed in each iteration step, roughly one per non-converged eigenvector.

[figure: Block-Davidson algorithm]

A bit more detail on how it works and how we compute t:

[figures: Davidson residual and correction equations]

We then make use of the following:

[figures: approximations used to obtain the correction equation; the exact matrix is replaced by an approximate matrix that is easier to solve, e.g. a diagonal matrix]

Then, we obtain t by solving this:

[figure: the equation solved for the correction vector t]

The good:

  • Robust and fast convergence.
  • No problem with degenerate eigenvalues.
  • Very flexible

The bad:

  • Higher CPU cost per iteration.
    [figure: CPU cost per iteration, where k = number of eigenpairs and m = number of vectors in the reduced space]
  • Higher storage and I/O demands.

* Ordinary differential equations: xxx

how does one solve them on a computer?

key ideas and characteristics of basic algorithms: Euler, Velocity Verlet, Leapfrog

how does one solve initial value problems and how two-point boundary problems?

* Root searching: bisection and secant method xxx

* Computational “Costs”:

Computational cost can be divided in:

  • CPU time
  • Main memory
  • Disc space

Only CPU time is really critical for scientific computations.

What is an operation or FLOP count?

A FLOP is a FLoating point OPeration; the operation (FLOP) count of a program is the number of such operations it performs. FLOPS (FLOPs per second) measures the processing speed of a computer.

How can the operation count for simple linear algebra operations be estimated?

Another important quantity to assess the performance required from a computer is the operation count Nop of a program.

For linear algebra tasks, it can be estimated:

  • “Cheap” operations are neglected.
  • Only floating point multiplications are counted, e.g. multiplying an n1×n2 matrix by an n2×n3 matrix gives Nop = n1·n2·n3.
  • For other operations, Nop must be evaluated case by case.

What are “wall time” and “cpu time”?

Wall time, or real time, is the time elapsed during the computation, from the start of the program to the end, as it would be measured by a watch.

CPU time: time that the machine has spent doing computations and processing instructions.

What determines the wall time of a program?

The wall time is determined by the CPU time, but also by the “waits” incurred due to I/O communication, time spent waiting for processing power to become available, and “sleep times”, where the program is temporarily suspended.
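Both times can be measured from within a Fortran program with the standard intrinsics cpu_time and system_clock (a minimal sketch):

program timing_demo
  implicit none
  integer :: count0, count1, rate, i
  real(8) :: t0, t1, x
  call system_clock(count0, rate)   ! wall-clock ticks and tick rate
  call cpu_time(t0)                 ! CPU seconds used so far
  x = 0.0d0
  do i = 1, 10000000
     x = x + sin(dble(i))
  end do
  call cpu_time(t1)
  call system_clock(count1)
  print *, 'result (so the loop is not optimized away):', x
  print *, 'cpu  time (s):', t1 - t0
  print *, 'wall time (s):', dble(count1 - count0) / dble(rate)
end program timing_demo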
