Python

* Basics: – Difference between Fortran and Python? (compile step, static vs. dynamic typing)

Fortran is a compiled language, whereas Python is an interpreted language that needs no separate compilation step. Python programs can therefore be tested faster, since there is no need to recompile after each change in the code. The price is speed: computationally demanding programs run faster in Fortran. One can get the best of both by performing the number crunching with Fortran and using Python as glue.

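The second major difference is typing. Fortran is statically typed: the type of each variable is fixed at compile time and is normally declared explicitly. Python is dynamically typed: a name has no declared type, and the type belongs to the object the name currently refers to, which usually implies less work from the programmer’s side. This is not the same as weak typing: Python is still strongly typed, in the sense that it does not silently convert between unrelated types (e.g. 'a' + 1 raises a TypeError).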

– Basic datatypes (integer division!) and type conversion rules

Every value in python, whatever its datatype, is an object.

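Integer, float, string, and boolean (True or False). Python has no separate character type; a character is just a string of length one. Think of the numeric and boolean types as made of single elements. Beware of integer division. In Python 2, dividing two integers truncates the result, as shown below. In Python 3, / always returns a float (3 / 2 gives 1.5), and the truncating division is written // (3 // 2 gives 1):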

>>> 3 / 2
1

To specify that a number is of float type, simply write it as 3. (adding a point):

>>> 3./2
1.5

Floating datatypes have priority over integers in arithmetic operations. This means that if I add for example an integer and a float, the result will be a float.
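The division and promotion rules above can be checked directly. What follows is a minimal sketch, assuming Python 3, where `/` is always true division and `//` is the floor division that reproduces the integer result; the variable names are made up for illustration.

```python
# Assumes Python 3: "/" is true division, "//" is floor division.
int_quotient = 3 // 2        # floor division keeps the integer type
true_quotient = 3 / 2        # true division always returns a float
mixed_sum = 1 + 2.5          # int + float is promoted to float

# Explicit conversions between the basic types
as_int = int(3.9)            # truncates toward zero
as_float = float("2.5")      # parses a string into a float
as_text = str(42)            # any object can be turned into a string

print(int_quotient, true_quotient, mixed_sum, as_int, as_float, as_text)
```

Explicit conversions like `int()` and `float()` are the safe way to move between types; nothing is converted behind your back except the int-to-float promotion in arithmetic.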

– Compound datatypes (mutable and immutable): e.g. tuple, string, list, dictionary

Strings: “…”, tuples (..,..), lists […,…], dictionaries {key: value, key: value}. Think of compound datatypes as collections of objects. The elements of a sequence (string, tuple, list) can be accessed one by one, each through its own index. Strings can be thought of as sequences of characters. Lists and dictionaries are mutable; tuples are immutable, that is, their elements cannot be changed once written. Example:

>>> a=(1,2,3)
>>> a[2]
3
>>> a[2] = 4
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Note: strings behave like tuples, they’re immutable.

Dictionaries are similar to lists, except that the key (think of it as the index) is also user defined.
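A short sketch of what "user defined keys" means in practice. The data (element symbols and approximate atomic masses) is hypothetical and only there for illustration.

```python
# Hypothetical example data: atomic masses keyed by element symbol.
masses = {"H": 1.008, "O": 15.999}

masses["C"] = 12.011          # add a new key/value pair (dicts are mutable)
oxygen = masses["O"]          # look up a value by its key
has_nitrogen = "N" in masses  # membership test acts on the keys

for symbol, mass in masses.items():
    print(symbol, mass)
```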

– Basic usage of compound datatypes (to be used for what?)

Compound datatypes are a way to store collections of objects.

– Variables are not really “variables” but “name-tags” on objects

A name-tag points to an object in memory. Example:

>>> s = 'hello'
>>> l = s
>>> l
'hello'
>>> s = 'hey'
>>> l
'hello'

Up there, we don’t create two ‘hello’s, there is one ‘hello’ and two name-tags pointing to it (s and l). If we point the name-tag s somewhere else (e.g. ‘hey’), absolutely nothing happens to ‘hello’ and the other name-tags pointing to it.
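The same experiment can be made explicit with the `is` operator, which tests whether two name-tags sit on the very same object. A minimal sketch, reusing the names from the example above:

```python
s = "hello"
l = s                  # second name-tag on the same object
same_before = l is s   # True: both names refer to one object

s = "hey"              # move the name-tag s to a different object
same_after = l is s    # False: l still refers to "hello"

print(l, s, same_before, same_after)
```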

– What are “side effects”? how to avoid them (if not wanted)?

An example of a side effect:

>>> s = [1,2,3]
>>> l = [4,5,s]
>>> l
[4, 5, [1, 2, 3]]
>>> s[1]
2
>>> s[1] = 17
>>> l
[4, 5, [1, 17, 3]]

The logic is:

  1. We create a list s.
  2. We create a list l whose last element points to s.
  3. We modify one object in s.
  4. We see this change also in l, since its last element is simply pointing at s.

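To avoid side effects, an explicit copy of the list can be made. Note that a plain copy is shallow: it duplicates the outer list only, so nested lists (like s inside l above) are still shared. For nested data, use copy.deepcopy:

listcopy = list(listname) # Shallow copy. listname[:] also works, and in
                          # Python 3 so does listname.copy()
import copy
deepcopy = copy.deepcopy(listname) # Copies the nested objects as well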
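Applied to the side-effect example above, the difference between copying only the outer list and copying everything looks like this. It is a sketch using the standard `copy` module; the names `shallow` and `deep` are made up for illustration.

```python
import copy

s = [1, 2, 3]
l = [4, 5, s]

shallow = list(l)          # new outer list, but the inner list is still s
deep = copy.deepcopy(l)    # the inner list is duplicated as well

s[1] = 17                  # modify the original inner list

print(l)        # sees the change
print(shallow)  # also sees the change: the inner list is shared
print(deep)     # unaffected
```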

– indexing and slicing

Sequences (e.g. strings, tuples, lists) can be indexed with o[index]. Dictionaries are indexed by keys. Indexing begins at zero. Slicing takes a slice of a sequence. Pay attention to the indices, and to where the slice is being made: the slice starts at first and stops just before last. The syntax is: o[first:last] Examples:

>>> a = [0,1,2,3,4,5]
>>> a[:]
[0, 1, 2, 3, 4, 5]
>>> a[0:1] # Think of the slicing index as the commas, rather than
           # the elements. It indicates where we slice.
[0]
>>> a[0:-1]
[0, 1, 2, 3, 4]
>>> a[1:-1]
[1, 2, 3, 4]
>>> a[1:2]
[1]
>>> a[1:3]
[1, 2]
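A few more slicing forms that follow the same rules: open-ended slices, negative indices, and an optional third field for the step. The variable names are only illustrative.

```python
a = [0, 1, 2, 3, 4, 5]

last = a[-1]           # negative indices count from the end
tail = a[3:]           # omitted "last" means "up to the end"
every_other = a[::2]   # the third field is the step
reversed_a = a[::-1]   # a negative step walks backwards
word = "python"[1:4]   # strings are sliced exactly like lists

print(last, tail, every_other, reversed_a, word)
```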

– Use of operators like “+” or “*” for lists or strings

“+” joins strings and lists. Example:

>>> 'hello' + 'world'
'helloworld'

“*” repeats a string or list an integer number of times. Example:

>>> 'hello'*3
'hellohellohello'
>>> a = [1,2,3]
>>> a*3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

– I/O and string operations

In text mode, whatever is read from a file is a string. For reading from and writing to files, we use the built-in function open and the file methods write, read, readline, and close:

f = open("filename", mode) # mode is a string: "w" (write), "r" (read only),
                           # "a" (append), "r+" (read & write)
f.write("text") # "text" is a string. To write a new line, we use "\n"
f.read() # Reads the whole file as one string
f.readline() # Reads one line; call it repeatedly to go line by line
f.close() # Always close a file once it is not needed

Extras:

f.closed # Boolean attribute: True if the file is closed, otherwise False

with open("file","w") as name:
  name.write("text") # Automatically closes the file
                     # after the with statement.
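Putting the pieces together, here is a small round trip: write numbers to a file, read them back, and convert the strings explicitly. It is a sketch assuming Python 3; the file name `notes.txt` and the temporary directory are made up for the example.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:       # "w" creates or overwrites the file
    f.write("1.5\n2.5\n")

with open(path, "r") as f:
    lines = f.readlines()        # list of strings, one per line

# Everything read from a text file is a string, so convert explicitly
values = [float(line.strip()) for line in lines]
total = sum(values)

print(lines, values, total)
```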

* Control Flow:

Examples with each control block. Note the general syntax:

Control statement [condition or iteration]: # Note the colon
  Statement # Note the indentation
Additional control statements: # Again the colon
  Statement # Indentation of whatever happens inside the block
Statement # No indentation indicates that we are outside the 
          # control block

Important: The indentation must be consistent. Choose a fixed number of spaces to indent in each python program. Control blocks inside other control blocks are further indented:

Control block 1:
  Statement
  Control block 2:
    Statement
  Statement outside CB 2 but inside CB 1
Statement outside CB 1

– if/elif/else

In [3]: if a == 1:
   ...:   print(1)
   ...: elif a == 2:
   ...:   print(2)
   ...: else:
   ...:   print('A lot')

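– while

The loop repeats for as long as its condition is true. A break is not required in general, but it can be used to make sure that the program gets out of the control block under some conditions. With while True, as below, break is the only way out: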

while True:
    n = raw_input("Please enter 'hello':") # Python 2; use input() in Python 3
    if n.strip() == 'hello':
        break

– for

>>> words = ['cat', 'window']
>>> for w in words:
...     print(w, len(w)) # Python 3; in Python 2: print w, len(w)

– How to write the code? What is special about Python?

See the comment above about indentation.

* Writing Code:

– what is a “namespace”?

A namespace is a space containing a set of names (or variables). Examples of different namespaces are the namespace of the program, the namespace of a function inside the program, and the namespace of a module.

Each object has its own namespace.

Inside a function, the namespace is local to the function. The namespace of the main program is another one, and is called global. Do not be confused by that: the global namespace is NOT a “shared” namespace available to all objects (as it is in Fortran). Global is just the namespace corresponding to the program. A function can read a nametag from the global namespace directly. Only if it needs to assign to that nametag must it first declare it with the syntax “global name-tag”. Example:

a = 5
def add3():
 global a # Needed because the function assigns to the global name a
 a = a + 3
add3()
print(a) # Prints 8
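The three situations (reading a global name, shadowing it with a local one, and rebinding it) can be compared side by side. A minimal sketch; the function names are hypothetical.

```python
counter = 0                # name in the global (module) namespace

def read_counter():
    return counter + 1     # reading a global name needs no declaration

def shadow_counter():
    counter = 100          # creates a new local name; the global is untouched
    return counter

def increment_counter():
    global counter         # required because we rebind the global name
    counter = counter + 1

read_result = read_counter()
shadow_result = shadow_counter()
after_shadow = counter     # still the original value
increment_counter()

print(read_result, shadow_result, after_shadow, counter)
```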

A name inside a module can only be used in the main program if it is imported. Example:

>>> pi # Name "pi" does not exist
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
NameError: name 'pi' is not defined
>>> import math # One way to import it
>>> math.pi
3.141592653589793
>>> from math import pi # Another way
>>> pi
3.141592653589793

So a namespace is a way to understand the scope of a name (= variable). A good way to think about namespaces is as boxes where you dump in names. One box cannot have two equal names. If we put a name into a box where there is one name called the same, the new name replaces the old one. In particular, importing names from modules should be done with some care, to make sure that something important is not being lost.

More about namespaces and all that, here.

– what is a module and how to use it?

A module is a file, separate from the main program, which contains objects. These objects can be easily made available to the main program by importing (see above).

– defining subroutines/functions

Example:

def square( a ):
   # This function returns the square of the argument
   a = a**2
   return a

Note the def keyword, the syntax similar to a control block, and the return statement, by which we set the output that is handed back to the main program when the function is executed.

– basics of object orientation:

Every object belongs to a class, and can have methods (functions of the object) and attributes. Classes can be predefined, or user defined.

xxx add scheme of namespace and “objectspace”

how to construct a class?

Example of constructing a new class called NewClass (I know…):

class NewClass(object):
  def __init__(self, method):
    self.method = method # Store the argument as an attribute

This is kinda the minimum possible class (not exactly, but almost. See this for details).

Note the “object” in parentheses. It is the base class that NewClass inherits from, and so it defines the type in the “object tree”. The object type is the default base for a new class (in Python 3 it can be left out). The self.method is an attribute of each object of this class. It is simply a nametag stored inside the namespace of that object. Note also the syntax similar to a function.

After defining a class, we can create objects of this class and access anything stored inside them (nametags, methods…) with the dot syntax, e.g. name.method.

why is it useful? (what is “self”? why is __init__ needed?)

New classes allow us to create, hold, and access user defined collections of data in an organised way. They also allow us to define ways to operate on this collection of data, and manipulate it.

The self is a way of calling the object of this class, within the class. A class must be defined without any knowledge about the nametags of the objects that the user or the programmer will want to create when they are needed. The self simply represents the nametag for this unknown object. Every method inside the class must have at least one argument, and this is the self argument.

The __init__ (double underscore) method is the way that objects of a class are initialised in python. It is called automatically every time a new object of the class is created.
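To make the roles of `self` and `__init__` concrete, here is a small sketch. The class `Particle`, its attributes, and the method `heavier_than` are hypothetical, invented only for this illustration.

```python
class Particle(object):
    """Hypothetical class used only for illustration."""

    def __init__(self, name, mass):
        # __init__ runs automatically when an object is created
        self.name = name      # attributes live in the object's namespace
        self.mass = mass

    def heavier_than(self, other):
        # self is the object the method was called on
        return self.mass > other.mass


hydrogen = Particle("H", 1.008)    # calls __init__ with self = new object
oxygen = Particle("O", 15.999)

print(oxygen.heavier_than(hydrogen))
```

The class is written once, without knowing the nametags `hydrogen` or `oxygen`; inside the class both are simply called `self` when their turn comes.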

* Using Numpy:

– what is numpy good for? how to use it?

Numpy has built-in methods specifically designed for working with multidimensional arrays and performing fast element-wise operations. This makes numpy very good for avoiding explicit loops over lists of objects, which speeds up execution.

Numpy also has methods to compute vector-vector, vector-matrix, and matrix-matrix products:

np.dot(a,b) # Scalar product for 1-D arrays, matrix product for 2-D arrays

To use numpy, install it and import it into python as a module:

import numpy as np # "np" is an arbitrary shorthand name

– basic objects: arrays (important methods, attributes; how to generate them)

Array generation. Examples:

a = np.array([[1,2,3],[4,5,6]]) # Note () for argument delimiter,
# [] for outer list, and [] for inner lists.
a = np.zeros([3,3],dtype="float64") # 3 x 3 array filled with 0
# Double precision
a = np.arange(1000) # 1-D array with values from 0 to 999
############################
# Next some general examples:
np.arange(start,end,step) # end itself is not included
np.ones((ndim,...)) # The argument is a tuple
np.zeros((ndim,...))
np.empty((n,...)) # Does not initialise the array
np.eye(n) # n x n unit matrix (diagonal = 1, non-diag. = 0)
np.linspace(start,stop,num_elements) # Equally spaced points
np.diag(np.array([1,2,...])) # Diagonal matrix with diagonal =
# 1,2,...
np.random.rand(4) # 4 random floats in [0, 1)
np.random.randint(0,21,15) # 15 random int from 0 to 20 (upper bound
# is excluded; replaces the deprecated random_integers)

Main attributes and methods for arrays:

a.dtype # Datatype of the array
a.shape # Shape of the array (size along each dimension)
len(a.shape) # Number of dimensions
a.ndim # Also number of dimensions
a.ravel() # "Flattens" array
a.reshape((2,3)) # Changes the shape (opposite to ravel)
np.sort(a,axis=1) # Returns a copy sorted along axis 1
a[:,np.newaxis] # Inserts a new dimension of size 1

The main attribute of an array is its datatype: e.g. “int64”, “int32”, “float64”. It can be set when the array is created. The syntax is:

a = np.array(content, dtype="datatype")
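The attributes and methods listed above, tried on a small array. A minimal sketch; the variable names are chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype="float64")

shape = a.shape                    # (2, 3): two rows, three columns
ndim = a.ndim                      # 2
flat = a.ravel()                   # the six elements as a 1-D array
reshaped = flat.reshape((3, 2))    # same data, new shape
column = flat[:, np.newaxis]       # extra dimension of size 1 at the end

print(a.dtype, shape, ndim, flat.shape, reshaped.shape, column.shape)
```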

– indexing and slicing

Indexing and slicing work the same as in regular python, with one index or slice per dimension, separated by commas. Indexing begins with the element number 0.

A range can be indicated by a colon (e.g. a[1:3]), and a single colon indicates the whole range (see example below).

a = np.array([[1,2,3],[4,5,6]])
a[0,0]
>> 1
a[0,:] # Only the first row, returned as a 1-D array.
>> [1 2 3]

See that the ordering of rows and columns goes as:

a[row_number,column_number]

In numpy, the last index changes quickest by default (row-major or “C” order), which is the opposite of Fortran.

– what is “broadcasting”? how do numerical operations work for arrays (and how to exploit this)?

Broadcasting can take place when operating with two arrays of different shape. The smaller array is repeated (“stretched”) along the larger one as far as needed. The simplest example would be the addition of the number 1 and any array “a”. The number 1 can be thought of as an array with a single element. When it operates, it operates on each element of a, thus it is the same as adding to a an array of the same shape as a but made up of ones everywhere.

The same concept would happen to a small array which “fits” a larger array.

xxx Add some visual scheme of broadcasting

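Conditions for array operation. The shapes are compared dimension by dimension, starting from the last one (missing leading dimensions count as size one). If neither of these is fulfilled, there will be an error:

  • Arrays have equal shape, or
  • Arrays have different shape, and in each pair of differing dimensions, one of the two has size one.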

The trick for exploiting this is manipulating the arrays, adding and/or shifting dimensions so that the second condition is fulfilled.
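A short sketch of the conditions above, including the trick of adding a dimension with `np.newaxis` so that an otherwise incompatible array fits. The arrays are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))   # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([100, 200])         # shape (2,)

plus_scalar = a + 1                # the scalar acts on every element
plus_row = a + row                 # (2, 3) + (3,): row added to each row of a
plus_col = a + col[:, np.newaxis]  # (2, 3) + (2, 1): one value per row of a

# a + col would raise a ValueError: last dimensions 3 and 2 do not match
print(plus_scalar, plus_row, plus_col, sep="\n")
```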

– how to use numpy implicit loops to speed up things

Simple case: we have two vectors, and we want to multiply them element by element. Instead of constructing a loop that goes through each element and performs the operation, we simply construct the two arrays and multiply them. The loop still happens, but inside numpy’s compiled code rather than in the python interpreter, which is what speeds up the program.
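The simple case described above, written both ways so the results can be compared. A sketch with made-up vectors; no timing is shown, only that the two approaches agree.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1000)
y = np.linspace(1.0, 2.0, 1000)

# Explicit python loop over the elements
loop_product = np.empty_like(x)
for i in range(len(x)):
    loop_product[i] = x[i] * y[i]

# Implicit loop: one expression, executed inside numpy
vector_product = x * y

print(np.allclose(loop_product, vector_product))
```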

* Example Cases (relevant):

– compute a COM for a molecule

xxx
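One possible sketch for this case, not taken from the original notes: the centre of mass is the mass-weighted average of the atomic positions, R = sum(m_i r_i) / sum(m_i). The geometry and masses below are hypothetical, roughly water-like values chosen for illustration.

```python
import numpy as np

# Hypothetical water-like geometry (angstrom) and approximate
# atomic masses (atomic mass units), in the order O, H, H
positions = np.array([[ 0.000, 0.000, 0.000],
                      [ 0.957, 0.000, 0.000],
                      [-0.240, 0.927, 0.000]])
masses = np.array([15.999, 1.008, 1.008])

# masses[:, np.newaxis] has shape (3, 1) and broadcasts over x, y, z
weighted = masses[:, np.newaxis] * positions
com = weighted.sum(axis=0) / masses.sum()

print(com)
```

No loop over atoms is needed: broadcasting does the per-atom weighting and `sum(axis=0)` adds up the contributions.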

– enforce periodic boundary conditions

xxx
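One possible sketch for this case, not taken from the original notes. It assumes an orthorhombic box with one corner at the origin; the box size and coordinates are made up. Positions are wrapped back into the box, and the separation between two particles is reduced to its minimum image.

```python
import numpy as np

# Assumption: orthorhombic box with one corner at the origin
box = np.array([10.0, 10.0, 10.0])
positions = np.array([[1.0, 11.0, -0.5],
                      [9.5,  5.0, 25.0]])

# Wrap every coordinate back into [0, box); box broadcasts over the rows
wrapped = positions - box * np.floor(positions / box)

# Minimum-image separation between the two particles
delta = positions[1] - positions[0]
delta -= box * np.round(delta / box)

print(wrapped)
print(delta)
```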

– use numpy for numerics in non time critical parts of simulation codes

xxx

* Advanced Python/Numpy usage (no details, only the basic concepts)

– what is scipy, usage for fitting or optimization

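Scipy is an open source library of algorithms and mathematical tools for the Python programming language. Many of its routines wrap compiled Fortran and C libraries. The basic data structure in scipy is the array provided by numpy.

Scipy is built on top of numpy: numpy provides the arrays, and scipy adds the algorithms (fitting, optimization, integration…) that work on them.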

xxx map

– parallelization with mpi4py

Parallelization can be used in python by importing modules prepared for this:

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

With mpi4py, the same program runs on every processor, and rank tells each copy which one it is. Objects that are to be shared amongst the processors have to be communicated explicitly, for instance broadcast from one processor to all the others. Example:

if size > 1: # "If nproc > 1, then broadcast self.xyz"
 self.xyz = comm.bcast(self.xyz)

To use MPI on a computer, one must make sure that an MPI library such as OpenMPI is loaded, e.g.:

module add openmpi-gcc/1.3.3

Parallel python can be run like this:

mpirun -np 2 python program.py

– wrapping of F90 code into Python using f2py

F2py wraps fortran code so that it can be called from within python. It is part of numpy, and arrays are passed and returned as numpy arrays. The wrapper takes care of the interfacing internally, so the user hardly has to think about it.

It is probably more efficient to combine python and fortran than to compute everything in fortran. Fortran is better used to perform computationally demanding calculations, while python is by far more flexible for configuration, settings control, postprocessing…
