Based on lectures from http://github.com/jrjohansson/scientific-python-lectures
The numpy
package (module) is used in almost all numerical computation using Python. It is a package that provide high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices), performance is very good.
from numpy import *
numpy
arrays¶There are a number of ways to initialize new numpy arrays, for example from
arange
, linspace
, zeros
, ones
, etc.M = array([[1,2],[3,4]]) # from list
M.shape, M.size, M.dtype
M = array([[1,2],[3,4]], dtype=complex) # from list
print(M)
M.shape, M.size, M.dtype
x = arange(0,10,0.5) # linear mesh start:stop:increment
print(x)
y = linspace(0,10,21) # linear mesh start,stop,number of points
print(y)
z = logspace(-3,10,10) # log mesh, 10^start, 10^stop, number of points
print(z)
zeros((3,3))
ones((3,3))
from numpy import random
random.rand(5,5) # uniform distributed (5x5) matrix
random.randn(5,5) # standard normal distribution random matrix
Very common form is comma-separated values (CSV) or tab-separated values (TSV). To read data from such files into Numpy arrays we can use the numpy.loadtxt
or numpy.genfromtxt
File stockholm_td_adj.dat.txt
contains Stockholm temperature over the years. The columns are [$year$,$month$,$day$,$T_{average}$,$T_{min}$,$T_{max}$]
## Check if file exists
!tail stockholm_td_adj.dat.txt
data = loadtxt('stockholm_td_adj.dat.txt')
data.shape
# inline figures from matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
# time in years when we have year/month/day
t = data[:,0]+data[:,1]/12.+data[:,2]/365
plt.plot(t, data[:,3])
# a bit more extended in x-direction
plt.figure(figsize=(14,4))
plt.plot(t, data[:,3])
plt.title('temperature in Stockholm')
plt.xlabel('year')
plt.ylabel('temperature [C]');
Lets save the data in the form [t,$T_{average}$]
vstack((t,data[:,3])).T.shape
savetxt('StockholmT.dat', vstack((t,data[:,3])).T)
!tail StockholmT.dat
save('ST_data',data)
!ls -ltr
data2=load('ST_data.npy')
allclose(data,data2) # how close are the two sets of data
print(data[0]) # first row from the file
print(data[:,0]) # years
array(data[-3650::365,0],dtype=int) # the years with 365 spacings, and then last years
Fancy indexing Index is itself an array of integer numbers, i.e, which rows or columns?
data[[0,365,2*365,3*365],0]
Using mask to pick data*
Create a mask of [True,....False....]
values, and pick from the array only columns/rows where True
.
How to compute average temperature in the year of 1973?
# Create mask for the year 1973
mask = logical_and(data[:,0] >= 1973, data[:,0] < 1974)
data[mask,0]; # All should have 1973
T1973 = data[mask,3]
print('Average temperature in 1973=', sum(T1973)/len(T1973))
# where tells you the index where True
indices = where(mask)
X = data[indices,0]; # This gives similar data in 1973, but not identical
print(T1973.shape, X1973.shape)
print('Average temperature in 1973=', sum(X1973[0,:])/len(X1973[0,:]))
What is the mean monthly temperatures for each month of the year?
Let's do Ferbrurary first
Febr=data[:,1]==2
mean(data[Febr,3])
Now loop for all months
monthly_mean=[mean(data[data[:,1]==month,3]) for month in range(1,13)]
plt.bar(range(1,13),monthly_mean);
plt.xlabel('month')
plt.ylabel('average temperature')
It is implemented in low level fortran/C code, and is much more efficient than code written in Python.
A = random.rand(3,3)
print(A)
A*A # It is not matrix-matrix product, but element-wise product
Matrix product or matrix-vector product can be performed by dot
command
dot(A,A)
v1 = random.rand(3)
print(v1)
print( dot(A,v1) ) # matrix.vector product
print( dot(v1,v1) ) # length of vector^2
A*v1
Slightly less efficient, but nicer code can be obtained by matrix
clas
M = matrix(A)
v = matrix(v1).T # create a column vector
M*M # this is now matrix-matrix product
M*v # this is matrix*vector product
v.T*M # vector*matrix product
v.T*v # inner-product
Array/Matrix transformations
.T
or transpose(M)
transposes matrix.H
hermitian conjugateconjugate(M)
conjugatesreal(M)
and imag(M)
takes real and imaginary part of the matrixLibrary linalg
:
linalg.det(A)
linalg.inv(A)
or just M.I
linalg.eig
, linalg.eigvals
, linalg.eigh
linalg.svd()
linalg.solve()
linalg.cholesky()
from numpy import linalg
print( linalg.det(A) )
linalg.inv(A)
M.I
The eigenvalue problem for a matrix $A$:
$\displaystyle A v_n = \lambda_n v_n$
where $v_n$ is the $n$th eigenvector and $\lambda_n$ is the $n$th eigenvalue.
To calculate eigenvalues of a matrix, use the eigvals
(symmetric/hermitian eigvalsh
) and for calculating both eigenvalues and eigenvectors, use the function eig
(or eigh
):
linalg.eigvals(A)
A = array([[1,2,3], [4,5,6], [7,8,9]])
b = array([1,2,3])
x = linalg.solve(A,b)
print(x)
dot(A,x)-b
print( sum(v1) )
print( cumsum(v1) )
print( trace(A) )
print( diag(A) )
print( sum(diag(A)) )
A.shape
Ag = reshape(A, (9,1)) # this is not new data
Ag[0]=10
A # we change A when we change Ag
Ax = A.flatten() # flatten creates 1D array of all data, but creates a copy
Ax[0]=100 # changing a copy
print(A)
Every function written in Python is very slow. However numpy type operations are fast, because they are written in fortran/C
Temp = data[:,3]
Temp**2 # this is fast, written in C
array([Temp[i]**2 for i in range(len(Temp))]) # This is slow, written in Python
What if we have a function that can not simply work on arrays?
For example, theta function?
def Theta(x):
if x>=0:
return 1
else:
return 0
# Does not work on array
Theta(Temp)
We can vectorize Theta, to make it applicable to arrays.
This is simply achieved by call to numpy function vectorize
, which will create low-level routine from your function
Theta_vec = vectorize(Theta)
# This is very fast now, and creates 0 or ones
positive_temperatures=Theta_vec(Temp)
positive_temperatures
How to calculate number of days in a year with positive temperatures?
# Boolean array to select data with positive temperatures
positives = array(positive_temperatures, dtype=bool)
# keeps data with positive temperatures only
kept = data[positives,0]
# now we just need to check how many of these data are in each year
years = list(range(1800,2013,1))
hist = histogram(kept, bins=years)
plt.plot(hist[1][:-1], hist[0]);