Yesterday I was asked to do an emergency job. We had a file of sea level data (several files, actually), where each page looked like this:

Mth Year  Gaps     Good     Minimum    Maximum     Mean     St Devn
  1 1897   218      526      0.120      1.040      0.542     0.196
  2 1897     0      672      0.060      1.100      0.524     0.190
  3 1897     0      744      0.120      1.010      0.557     0.171
  4 1897     0      720      0.300      1.070      0.655     0.161
  5 1897     0      744      0.240      1.280      0.648     0.198
  6 1897     0      720      0.180      1.580      0.729     0.215
  7 1897    22      722      0.300      1.370      0.641     0.215
  8 1897     0      744      0.060      0.910      0.518     0.154
  9 1897     0      720      0.060      1.010      0.547     0.174
 10 1897     0      744      0.030      0.850      0.465     0.160
 11 1897     0      720     -0.060      0.980      0.499     0.198
 12 1897     0      744      0.030      1.010      0.498     0.196
  1 1898     0      744      0.030      1.130      0.527     0.218
  2 1898     1      671      0.180      1.160      0.664     0.190
  3 1898   742        2      0.370      0.430      0.400     0.030
  4 1898     0      720      0.090      1.220      0.620     0.202
  5 1898     0      744      0.430      1.340      0.827     0.194
  6 1898     0      720      0.340      1.710      0.781     0.221
  7 1898     0      744      0.240      1.650      0.736     0.232
  8 1898     0      744      0.150      1.190      0.622     0.188
  9 1898     0      720      0.270      1.220      0.569     0.156
 10 1898     0      744      0.240      1.070      0.628     0.167
 11 1898     0      720      0.120      1.010      0.534     0.185
 12 1898     0      744      0.090      0.980      0.541     0.192

http://www.bom.gov.au/ntc/IDO70000/IDO70000_62230_SLD.txt

What we wanted was a mean for each year.

Here is the program:

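"""
Calculate mean sea level for a number of years...
"""

import sys

data = {}  # year (as a string) -> list of monthly mean sea levels

with open(sys.argv[1]) as infile:
    for line in infile:
        fields = line.split()

        # Data rows have exactly 8 columns; the header shown above and blank lines do not.
        if len(fields) == 8:
            year = fields[1]
            if year not in data:
                data[year] = []
            data[year].append(float(fields[6]))  # column 7 is the monthly mean

# Print each year, the mean of its monthly means, and the number of months used.
for year in sorted(data):
    print(year, sum(data[year]) / len(data[year]), len(data[year]))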

How does this work? First we create data as an empty dictionary. It is keyed by year, and each value is a list of that year's monthly means. The year is kept as the string read from the file, so after the page above has been read, data looks something like this:

data = {'1897': [0.542, 0.524, ..., 0.498], '1898': [0.527, ...]}
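To see that shape appear, here is a small sketch that runs the same grouping step on four rows copied from the table above, instead of reading a file. It swaps the explicit membership check for collections.defaultdict, which is my substitution and not what the program above does; the name sample is also just for this illustration.

```python
from collections import defaultdict

# Four rows copied from the table above (two months from each year).
sample = """\
  1 1897   218      526      0.120      1.040      0.542     0.196
  2 1897     0      672      0.060      1.100      0.524     0.190
  1 1898     0      744      0.030      1.130      0.527     0.218
  2 1898     1      671      0.180      1.160      0.664     0.190
"""

data = defaultdict(list)  # a year seen for the first time starts as an empty list

for line in sample.splitlines():
    fields = line.split()
    if len(fields) == 8:                     # data rows have exactly 8 columns
        year = fields[1]                     # still a string, e.g. '1897'
        data[year].append(float(fields[6]))  # column 7 is the monthly mean

print(dict(data))
# prints {'1897': [0.542, 0.524], '1898': [0.527, 0.664]}
```

Note that the keys are strings. Sorting them still gives chronological order here because every year has four digits.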

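The first for loop builds this dictionary. The second for loop then iterates over the years in sorted order and prints, for each one, the year, the mean of its monthly values, and how many months went into that mean.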
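One caveat about the result: the program gives every month equal weight, so March 1898, which has only 2 good readings, counts as much as a month with 744. If that matters for your purposes, a possible variation is to weight each monthly mean by the Good column. The sketch below assumes that Good is the number of valid readings behind each monthly mean, which is my reading of the table and not something the file states; the names sample and totals are made up for the illustration.

```python
# Three rows for 1898 copied from the table above (months 2, 3 and 4).
sample = """\
  2 1898     1      671      0.180      1.160      0.664     0.190
  3 1898   742        2      0.370      0.430      0.400     0.030
  4 1898     0      720      0.090      1.220      0.620     0.202
"""

totals = {}  # year -> [sum of (monthly mean * good), sum of good]

for line in sample.splitlines():
    fields = line.split()
    if len(fields) == 8:
        year = fields[1]
        good = int(fields[3])    # assumed: number of valid readings that month
        mean = float(fields[6])
        entry = totals.setdefault(year, [0.0, 0])
        entry[0] += mean * good
        entry[1] += good

for year in sorted(totals):
    weighted_sum, count = totals[year]
    if count:  # skip a year that has no valid readings at all
        print(year, weighted_sum / count)
```

For these three rows the weighted mean comes out at roughly 0.64, while the plain mean of the three monthly values is roughly 0.56, because the nearly empty March no longer pulls the result down.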