Handling CSV files - an example =============================== In this chapter, we'll look at handling structured data from spreadsheets in Python. The CSV module's main documentation is at `CSV File Reading and Writing `_. Example dataset --------------- For our example we will be using aquaculture data which is freely available from Fiskeridirektoratet. You can get a copy of the dataset also from this local link :download:`Akvakulturregisteret.csv` (Source: Fiskeridirektoratet, 13.04.2020). The dataset contains an overview of all registered aquaculture setups in Norway, showing which species they grow, where they're located, and so on. The CSV file format is the most common import and export format for spreadsheets and databases, and is a plain text file. Let's open it in an editor to see what columns it contains. Spreadsheets often have the columns explained in the first few lines:: AKVAKULTURTILLATELSER PR. 13.04.2020;;;;;;;;;;;;;(Enhet i stk. oppgis i 1000) ;;;;;;;;(Enhet i stk. oppgis i 1000) ;;; TILL_NR;ORG.NR/PERS.NR;NAVN;ADRESSE;POSTNR;POSTSTED;TILDELINGSTIDSPUNKT;TIDSBEGRENSET;TILL_KOMNR;TILL_KOM;FORMÅL;PRODUKSJONSFORM;ART;TILL_KAP;TILL_ENHET;LOK_NR;LOK_NAVN;LOK_KOMNR;LOK_KOM;LOK_PLASS;VANNMILJØ;LOK_KAP;LOK_ENHET;N_GEOWGS84;Ø_GEOWGS84 Here, it looks like the second line is what we need. The rest of the file contains the data, one line for each data point, and the entries are separated with semicolons. CSV module ---------- Since the data is stored as plain text, we could use our usual file reading code to read it in, and use :code:`line.split(';')` to separate out the columns. The python library also comes with a specialised module to read CSV files, that does a lot of the repetitive work for us. Take a look at the `CSV File Reading and Writing `_ section in the Python manual. Let's try to use the simple example they give there with our data instead. Compared to the first example on the Python CSV website, I have changed the *delimiter* option to :code:`;` and removed the *quotechar* option, since it looks like no quotes are used in our CSV file. :: import csv with open('Akvakulturregisteret.csv', newline='') as csvfile: akvareader = csv.reader(csvfile, delimiter=';') for row in akvareader: print(row) Trying to run this, you may see :code:`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 230: invalid continuation byte`. This indicates that the encoding of the text in the file is not in the UTF-8 format that your Python expects as standard. Given it's a Norwegian dataset, the other likely encoding is the older :code:`iso-8859-1` format. Let's specify that explicitly:: import csv with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile: akvareader = csv.reader(csvfile, delimiter=';') for row in akvareader: print(row) This seems to work! We get a printout of all datasets. Application examples -------------------- The last two fields in each entry are latitude and longitude. Let's try to plot the locations on a map. First, we check if we can extract the values:: import csv lats = [] lons = [] with open('Akvakulturregisteret.csv', newline='', encoding='iso-8859-1') as csvfile: akvareader = csv.reader(csvfile, delimiter=';') for row in akvareader: lat = float(row[-2]) # latitude is second last lon = float(row[-1]) # longitude is last lats.append(lat) lons.append(lon) print(lats,lons) We get stuck on the first line in the CSV-file: :code:`ValueError: could not convert string to float: ''`. Right, they didn't contain any data, just the header information. One way of skipping them is to handle this exception with :code:`try ... except`. If we can't get a valid latitude, let's continue to the next line: .. literalinclude:: akva2.py This works and prints long lists of numbers. Let's plot them: .. literalinclude:: akva3.py .. image:: akvakultur.png