Handling CSV files - an example

In this chapter, we’ll look at handling structured data from spreadsheets in Python. The CSV module’s main documentation is at CSV File Reading and Writing.

Example dataset

For our example we will be using aquaculture data which is freely available from Fiskeridirektoratet.

You can get a copy of the dataset also from this local link Akvakulturregisteret.csv (Source: Fiskeridirektoratet, 13.04.2020).

The dataset contains an overview of all registered aquaculture setups in Norway, showing which species they grow, where they’re located, and so on.

The CSV file format is the most common import and export format for spreadsheets and databases, and is a plain text file. Let’s open it in an editor to see what columns it contains. Spreadsheets often have the columns explained in the first few lines:

AKVAKULTURTILLATELSER PR. 13.04.2020;;;;;;;;;;;;;(Enhet i stk. oppgis i 1000) ;;;;;;;;(Enhet i stk. oppgis i 1000) ;;;
TILL_NR;ORG.NR/PERS.NR;NAVN;ADRESSE;POSTNR;POSTSTED;TILDELINGSTIDSPUNKT;TIDSBEGRENSET;TILL_KOMNR;TILL_KOM;FORMÅL;PRODUKSJONSFORM;ART;TILL_KAP;TILL_ENHET;LOK_NR;LOK_NAVN;LOK_KOMNR;LOK_KOM;LOK_PLASS;VANNMILJØ;LOK_KAP;LOK_ENHET;N_GEOWGS84;Ø_GEOWGS84

Here, it looks like the second line is what we need. The rest of the file contains the data, one line for each data point, and the entries are separated with semicolons.

CSV module

Since the data is stored as plain text, we could use our usual file reading code to read it in, and use line.split(';') to separate out the columns.

The python library also comes with a specialised module to read CSV files, that does a lot of the repetitive work for us. Take a look at the CSV File Reading and Writing section in the Python manual. Let’s try to use the simple example they give there with our data instead. Compared to the first example on the Python CSV website, I have changed the delimiter option to ; and removed the quotechar option, since it looks like no quotes are used in our CSV file.

import csv
with open('Akvakulturregisteret.csv', newline='') as csvfile:
    akvareader = csv.reader(csvfile, delimiter=';')
    for row in akvareader:
        print(row)

Trying to run this, you may see UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 230: invalid continuation byte. This indicates that the encoding of the text in the file is not in the UTF-8 format that your Python expects as standard. Given it’s a Norwegian dataset, the other likely encoding is the older iso-8859-1 format. Let’s specify that explicitly:

import csv
with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile:
    akvareader = csv.reader(csvfile, delimiter=';')
    for row in akvareader:
        print(row)

This seems to work! We get a printout of all datasets.

Application examples

The last two fields in each entry are latitude and longitude. Let’s try to plot the locations on a map. First, we check if we can extract the values:

import csv

lats = []
lons = []

with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile:
    akvareader = csv.reader(csvfile, delimiter=';')
    for row in akvareader:
        lat = float(row[-2]) # latitude is second last
        lon = float(row[-1]) # longitude is last
        lats.append(lat)
        lons.append(lon)

print(lats,lons)

We get stuck on the first line in the CSV-file: ValueError: could not convert string to float: ''. Right, they didn’t contain any data, just the header information. One way of skipping them is to handle this exception with try ... except. If we can’t get a valid latitude, let’s continue to the next line:

import csv

lats = []
lons = []

with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile:
    akvareader = csv.reader(csvfile, delimiter=';')
    for row in akvareader:
        try:
            lat = float(row[-2]) # latitude is second last
            lon = float(row[-1]) # longitude is last
        except ValueError:
            continue
        lats.append(lat)
        lons.append(lon)

print(lats,lons)

This works and prints long lists of numbers. Let’s plot them:

import csv

lats = []
lons = []

with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile:
    akvareader = csv.reader(csvfile, delimiter=';')
    for row in akvareader:
        try:
            lat = float(row[-2]) # latitude is second last
            lon = float(row[-1]) # longitude is last
        except ValueError:
            continue
        lats.append(lat)
        lons.append(lon)

try:
    import matplotlib.pyplot as plt
    plt.plot(lons,lats,'+')
    plt.show()
except (ImportError, ModuleNotFoundError) as e:
    print(f'Import of matplotlib failed: {e}')
    print(f'We have {len(lats)} latitudes and {len(lons)} longitudes')
../../_images/akvakultur.png