
This is my first blog post of the series “Bits of python”, a journey into the Python programming language done with simple examples. From time to time I’ll write about Python and Django and how they interact with Unix and Linux systems. All the scripts available are stripped versions of my daily work, mostly file parsers, databases administration and reporting tools. This first example is about how to import a CSV (Comma-separated values) files into a MySQL database and can be very handy when you work with spreadsheets and databases.
The python modules used in this example are, the CSV and the python-mysqldb, the first is already present into python and for the second one you can install it just with apt-get install python-mysqldb.
The sample code consist in three files, presidents.csv, presidents.sql and importCSV.py, can be downloaded from here, once downloaded the archive can be unpacked with tar xvfz csvintomysql.tar.gz . Now it’s time to take a look to our file, let’s go into csvintomysql directory, cd csvintomysql.
The presidents.csv file is the list of all the presidents of the United States of America and it look like this (the list goes more long):
Presidency , President, Took office, Left office, Party, Home State
1,George Washington,1789-04-30,1797-03-04,Independent ,Virginia
2,John Adams,1797-03-04,1801-03-04,Federalist ,Massachusetts
3,Thomas Jefferson,1801-03-04,1809-03-04,Democratic-Republican ,Virginia
4,James Madison,1809-03-04,1817-03-04,Democratic-Republican ,Virginia
5,James Monroe,1817-03-04,1825-03-04,Democratic-Republican ,Virginia
As you can see there are several fields delimited by a comma that need to be exported and inserted into the Mysql database using the python CSV module.
The presidents.sql file is the series of MySQL instruction to create a database called csvdb and a table called presidents. In order to keep simple the creation of the database you need to log into MySQL as root and from the database prompt type:
mysql> source presidents.sql
Query OK, 1 row affected (0.00 sec)
Database changed
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
The importCSV.py is the Python script that parse and insert the CSV data into the database, the connection credential are already configured to connect to the database csvdb with csv as user and csv as password. Please note, if you want to modify the credential you need to change the settings in the first place, precisley in the presidents.sql file. Do not use this script in production environment!!!!! till you changed the Mysql login credential for the csv user.
#!/usr/bin/python
#simple python csv parser that export Comma
#Separated Value data into a MySQL database.
#phillip@bailey.st
import csv
import MySQLdb
# open the connection to the MySQL server.
# using MySQLdb
mydb = MySQLdb.connect(host='localhost',
user='csv',
passwd='csv',
db='csvdb')
cursor = mydb.cursor()
# read the presidents.csv file using the python
# csv module http://docs.python.org/library/csv.html
csv_data = csv.reader(file('presidents.csv'))
# execute the for clicle and insert the csv into the
# database.
for row in csv_data:
cursor.execute('INSERT INTO PRESIDENTS(PRESIDENCY ,PRESIDENT \
,TOOKOFFICE ,LEFTOFFICE ,PARTY ,HOMESTATE)' \
'VALUES(%s, %s, %s, %s, %s, %s)', row)
#close the connection to the database.
cursor.close()
print "Import to MySQL is over"
Line 5 and 6 are the import module declaration.
Line 9 to 12 are the MySQL connection credential.
Line 13 prepare the cursor to handle the connection.
Line 16 the csv.reader will read and parse the presidents.csv file.
Line 19 a for loop will be performed.
Line 21 will execute the insert into PRESIDENTS table query.
Line 25 will close the database connection.
You can run the script with:
root@:# python importCSV.py
Import to MySQL is over
After the
mysql> use csvdb;
select * from PRESIDENTS;
+----+---------------------+------------+-----------------------------+------------+------------+-------------------------------------------+---------------+
| id | cur_timestamp | PRESIDENCY | PRESIDENT | TOOKOFFICE | LEFTOFFICE | PARTY | HOMESTATE |
+----+---------------------+------------+-----------------------------+------------+------------+-------------------------------------------+---------------+
| 1 | 2012-02-22 23:45:05 | 1 | George Washington | 1789-04-30 | 1797-03-04 | Independent | Virginia |
| 2 | 2012-02-22 23:45:05 | 2 | John Adams | 1797-03-04 | 1801-03-04 | Federalist | Massachusetts |
| 3 | 2012-02-22 23:45:05 | 3 | Thomas Jefferson | 1801-03-04 | 1809-03-04 | Democratic-Republican | Virginia |
| 4 | 2012-02-22 23:45:05 | 4 | James Madison | 1809-03-04 | 1817-03-04 | Democratic-Republican | Virginia |
| 5 | 2012-02-22 23:45:05 | 5 | James Monroe | 1817-03-04 | 1825-03-04 | Democratic-Republican
+----+---------------------+------------+-----------------------------+------------+------------+-------------------------------------------+---------------+
I hope the example will be useful to understand how to work with spreadsheets and CSV data sets. If you have doubts or questions, please leave a comment on the post. All the code is freely available at http://bailey.st/code/bitsofpy/csvintomysql/
Best.