Processing sea level data with Python
Yesterday I was asked to do an emergency job. We had a file of sea level data (many files, actually), where each page looked like this:
Mth Year Gaps Good Minimum Maximum Mean St Devn
1 1897 218 526 0.120 1.040 0.542 0.196
2 1897 0 672 0.060 1.100 0.524 0.190
3 1897 0 744 0.120 1.010 0.557 0.171
4 1897 0 720 0.300 1.070 0.655 0.161
5 1897 0 744 0.240 1.280 0.648 0.198
6 1897 0 720 0.180 1.580 0.729 0.215
7 1897 22 722 0.300 1.370 0.641 0.215
8 1897 0 744 0.060 0.910 0.518 0.154
9 1897 0 720 0.060 1.010 0.547 0.174
10 1897 0 744 0.030 0.850 0.465 0.160
11 1897 0 720 -0.060 0.980 0.499 0.198
12 1897 0 744 0.030 1.010 0.498 0.196
1 1898 0 744 0.030 1.130 0.527 0.218
2 1898 1 671 0.180 1.160 0.664 0.190
3 1898 742 2 0.370 0.430 0.400 0.030
4 1898 0 720 0.090 1.220 0.620 0.202
5 1898 0 744 0.430 1.340 0.827 0.194
6 1898 0 720 0.340 1.710 0.781 0.221
7 1898 0 744 0.240 1.650 0.736 0.232
8 1898 0 744 0.150 1.190 0.622 0.188
9 1898 0 720 0.270 1.220 0.569 0.156
10 1898 0 744 0.240 1.070 0.628 0.167
11 1898 0 720 0.120 1.010 0.534 0.185
12 1898 0 744 0.090 0.980 0.541 0.192
http://www.bom.gov.au/ntc/IDO70000/IDO70000_62230_SLD.txt
What we wanted was a mean sea level for each year, taken as the average of that year's monthly means.
Here is the program:
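"""
Calculate mean sea level for a number of years...
"""
import sys

# Maps each year (kept as a string, as read from the file) to its monthly means.
data = {}
with open(sys.argv[1]) as infile:
    for line in infile:
        fields = line.split()
        # Data rows have exactly 8 whitespace-separated fields; the header
        # line shown above splits into 9 because "St Devn" is two words.
        if len(fields) == 8:
            year = fields[1]
            if year not in data:
                data[year] = []
            data[year].append(float(fields[6]))  # 7th column is the monthly mean

# Print the year, the mean of its monthly means, and the number of months.
for year in sorted(data):
    print(year, sum(data[year]) / len(data[year]), len(data[year]))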
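How does this work? First we declare data to be a dictionary. This dictionary will be indexed on year, so data will look something like this after we have read the first few lines:
data = {'1897': [0.542, 0.524, ... 0.498], '1898': [0.527, ...]}
Note that the keys are strings, because the year is stored exactly as it was read from the file. This is all done in the first for loop. The second for loop then iterates over this dictionary in sorted year order and prints, for each year, the mean of its monthly values and the number of months that went into it.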
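To see the dictionary approach at work without needing the data file, here is a minimal sketch that applies the same grouping to a few rows copied from the sample page above. The function name yearly_means and the sample list are made up for this illustration, and collections.defaultdict is swapped in for the explicit "is the year already in data?" check; it is one possible variant, not the program that was actually run.

```python
from collections import defaultdict


def yearly_means(lines):
    """Group the monthly means by year and average them.

    Returns a dict mapping year (a string) to (mean, number_of_months).
    The name yearly_means is hypothetical, chosen for this sketch.
    """
    data = defaultdict(list)  # a missing year starts out as an empty list
    for line in lines:
        fields = line.split()
        if len(fields) == 8:  # same row filter as the program above
            data[fields[1]].append(float(fields[6]))
    return {year: (sum(values) / len(values), len(values))
            for year, values in data.items()}


# A few rows copied from the sample page, plus its header line.
sample = [
    "Mth Year Gaps Good Minimum Maximum Mean St Devn",
    "1 1897 218 526 0.120 1.040 0.542 0.196",
    "2 1897 0 672 0.060 1.100 0.524 0.190",
    "1 1898 0 744 0.030 1.130 0.527 0.218",
]

result = yearly_means(sample)
for year in sorted(result):
    mean, months = result[year]
    print(year, round(mean, 3), months)
```

The header line is skipped because it splits into nine fields, so only the three data rows end up in the dictionary. With defaultdict the append works even the first time a year is seen, which removes one branch from the loop.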