Bits of python: import a CSV file into a MySQL database.

facebooktwittergoogle_plusredditpinterestlinkedinmailby feather

This is my first blog post of the series “Bits of python”, a journey into the Python programming language done with simple examples. From time to time I’ll write about Python and Django and how they interact with Unix and Linux systems. All the scripts available are stripped versions of my daily work, mostly file parsers, databases administration and reporting tools. This first example is about how to import a CSV (Comma-separated values) files into a MySQL database and can be very handy when you work with spreadsheets and databases.

The python modules used in this example are, the CSV  and the python-mysqldb, the first is already present into python and for the second one you can install it just with apt-get install python-mysqldb.

The sample code consist in three files, presidents.csv, presidents.sql and importCSV.py, can be downloaded from here, once downloaded the archive can be unpacked with tar xvfz csvintomysql.tar.gz . Now it’s time to take a look to our file, let’s go into csvintomysql directory, cd csvintomysql.

The presidents.csv file is the list of all the presidents of the United States of America and it look like this (the list goes more long):

Presidency , President, Took office, Left office, Party, Home State

[sourcecode language="bash"]
1,George Washington,1789-04-30,1797-03-04,Independent ,Virginia
2,John Adams,1797-03-04,1801-03-04,Federalist ,Massachusetts
3,Thomas Jefferson,1801-03-04,1809-03-04,Democratic-Republican ,Virginia
4,James Madison,1809-03-04,1817-03-04,Democratic-Republican ,Virginia
5,James Monroe,1817-03-04,1825-03-04,Democratic-Republican ,Virginia
[/sourcecode]

As you can see there are several fields delimited by a comma that need to be exported and inserted into the Mysql database using the python CSV module.

The presidents.sql file is the series of MySQL instruction to create a database called csvdb and a table called presidents. In order to keep simple the creation of the database you need to log into MySQL as root and from the database prompt type:

[sourcecode language="bash"]

mysql> source presidents.sql
Query OK, 1 row affected (0.00 sec)

Database changed
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

[/sourcecode]

The importCSV.py is the Python script that parse and insert the CSV data into the database, the connection credential are already configured to connect to the database csvdb with csv as user and csv as password. Please note, if you want to modify the credential you need to change the settings in the first place, precisley in the presidents.sql file. Do not use this script in production environment!!!!! till you changed the Mysql login credential for the csv user.

[sourcecode language="python"]
#!/usr/bin/python
#simple python csv parser that export Comma
#Separated Value data into a MySQL database.
#phillip@bailey.st
import csv
import MySQLdb
# open the connection to the MySQL server.
# using MySQLdb
mydb = MySQLdb.connect(host=’localhost’,
user=’csv’,
passwd=’csv’,
db=’csvdb’)
cursor = mydb.cursor()
# read the presidents.csv file using the python
# csv module http://docs.python.org/library/csv.html
csv_data = csv.reader(file(‘presidents.csv’))
# execute the for clicle and insert the csv into the
# database.
for row in csv_data:

cursor.execute(‘INSERT INTO PRESIDENTS(PRESIDENCY ,PRESIDENT \
,TOOKOFFICE ,LEFTOFFICE ,PARTY ,HOMESTATE)’ \
‘VALUES(%s, %s, %s, %s, %s, %s)’, row)
#close the connection to the database.
cursor.close()
print "Import to MySQL is over"
[/sourcecode]

Line 5 and 6 are the import module declaration.

Line 9 to 12 are the MySQL connection credential.

Line 13 prepare the cursor to handle the connection.

Line 16 the csv.reader will read and parse the presidents.csv file.

Line 19 a for loop will be performed.

Line 21 will execute the insert into PRESIDENTS table query.

Line 25 will close the database connection.

You can run the script with:

root@:# python importCSV.py
Import to MySQL is over

After the

mysql> use csvdb;

select * from PRESIDENTS;

[sourcecode language="bash"]

+—-+———————+————+—————————–+————+————+——————————————-+—————+
| id | cur_timestamp | PRESIDENCY | PRESIDENT | TOOKOFFICE | LEFTOFFICE | PARTY | HOMESTATE |
+—-+———————+————+—————————–+————+————+——————————————-+—————+
| 1 | 2012-02-22 23:45:05 | 1 | George Washington | 1789-04-30 | 1797-03-04 | Independent | Virginia |
| 2 | 2012-02-22 23:45:05 | 2 | John Adams | 1797-03-04 | 1801-03-04 | Federalist | Massachusetts |
| 3 | 2012-02-22 23:45:05 | 3 | Thomas Jefferson | 1801-03-04 | 1809-03-04 | Democratic-Republican | Virginia |
| 4 | 2012-02-22 23:45:05 | 4 | James Madison | 1809-03-04 | 1817-03-04 | Democratic-Republican | Virginia |
| 5 | 2012-02-22 23:45:05 | 5 | James Monroe | 1817-03-04 | 1825-03-04 | Democratic-Republican
+—-+———————+————+—————————–+————+————+——————————————-+—————+

[/sourcecode]

I hope the example will be useful to understand how to work with spreadsheets and CSV data sets. If you have doubts or questions, please leave a comment on the post. All the code is freely available at http://bailey.st/code/bitsofpy/csvintomysql/

Best.


Comments

  1. After trying to run the script, and it not working, I had to go back and redo. I finally realized I needed to close the connection to the database, and THEN run script, which finally worked.

  2. script getting executed completely but data not getting inserted in mysql table. why so? Tried your code only just by changing mysql database username,password and database name

  3. When trying to insert this data into our table I receive this error message below

    (1064, “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘CHANGE,VOLUME,EARNINGS_DATE)VALUES(‘Ticker’,’Company’,’Sector’,’Industry’,’Count’ at line 1″)

    Here is my INSERT statement
    for row in csv_data:
    cursor.execute(‘INSERT INTO STOCK_DATA(TICKER,COMPANY,SECTOR,INDUSTRY,COUNTRY,MARKET_CAP,P_E,FORWARD_PE,PEG,P_S,P_B,P_CASH,P_FREE_CASH_FLOW,DIVIDEND_YIELD,PAYOUT_RATIO,EPS,EPS_GROWTH_THIS_YEAR,EPS_GROWTH_NEXT_YEAR, \
    EPS_GROWTH_PAST_5_YEARS,EPS_GROWTH_NEXT_5_YEARS,SALSES_GROWTH_PAST_5_YEARS,EPS_GROWTH_QUARTER_OVER_QUARTER,SALES_GROWTH_QUARTER_OVER_QUARTER,SHARES_OUTSTANDING,SHARES_FLOAT,INSIDER_OWNERSHIP,INSIDER_TRANSACTIONS, \
    INSTITUTIONAL_OWNERSHIP,INSTITUTIONAL_TRANSACTIONS,FLOAT_SHORT,SHORT_RATIO,RETURN_ON_ASSETS,RETURN_ON_EQUITY,RETURN_ON_INVESTMENT,CURRENT_RATIO,QUCIK_RATIO,LT_DEPT_EQUITY,TOTAL_DEBT_EQUITY,GROSS_MARGIN,_OPERATING_MARGIN, \
    PROFIT_MARGIN,PERFORMANCE_WEEK,PERFORMANCE_MONTH,PERFORMANCE_QUARTER,PERFORMANCE_HALF_YEAR,PERFORMANCE_YEAR,PERFORMANCE_YTD,BETA,AVERAGE_TRUE_RANGE,VOLATILITY_WEEK,VOLATILITY_MONTH,20_DAY_SIMPLE_MOVING_AVERAGE, \
    50_DAY_SIMPLE_MOVING_AVERAGE,200_DAY_SIMPLE_MOVING_AVERAGE,50_DAY_HIGH,50_DAY_LOW,52_WEEK_HIGH,52_WEEK_LOW,RELATIVE_STRENGTH_INDEX,CHANGE_FROM_OPEN,GAP,ANALYST_RECOM,AVERAGE_VOLUME,RELATIVE_VOLUME,PRICE,CHANGE,VOLUME,EARNINGS_DATE)’ \
    ‘VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)’, row)

Leave a Reply

Your email address will not be published / Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">