Load CSV to .mdb using pyodbc and pandas

Background story: I work in finance (I'm not a developer, so any help is much appreciated). My department currently relies heavily on Excel and VBA to automate as many of our tasks as possible. The company just validated a Python distribution and we're now allowed to use it, so I thought I'd give it a try.

Challenge: My first challenge was to load a CSV file into an MS Access database (not all of us are tech-savvy enough to work purely with dev tools and databases, so we need to keep things easy for everybody).

I found bits and pieces of other people's code around the internet and put them together. It works, but it turned out to be a Frankenstein.

What it's doing and why:

  1. Load the CSV into a variable
  2. Strip out the first rows (the source file is not really a CSV; it has rubbish rows at the start)
  3. Export to a CSV in the temp folder (because I could not figure out how to load into pandas from a variable)
  4. Load the CSV into SQLite using pandas (because pandas can infer the data type of each column)
  5. Export the "create table" statement to a variable
  6. Create the table in the .mdb file using pyodbc
  7. Load the data into the .mdb table row by row (it's very slow)
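
For steps 2 to 4, a temp file may not be needed: pandas.read_csv accepts any file-like object (such as io.StringIO wrapping a string), has its own skiprows parameter, and infers column dtypes by itself without going through SQLite. A minimal sketch, assuming a semicolon-delimited file; the sample report text below is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical report: two rubbish lines, then a real semicolon-delimited CSV.
raw = (
    "Report generated 2016-02-01\n"
    "Company XYZ\n"
    "Date;Account;Amount\n"
    "2016-01-31;4000;1500.50\n"
    "2016-01-31;4010;-20.00\n"
)

# pandas reads from a file-like object the same way as from a path,
# so the string never has to be written to a temp file.
# skiprows=2 drops the two rubbish lines; the third line becomes the header.
df = pd.read_csv(io.StringIO(raw), sep=";", skiprows=2, parse_dates=["Date"])
print(df.dtypes)
```

With a real file you would pass the path instead, e.g. pd.read_csv(csv_path, sep=";", skiprows=skip_rows + 1); the + 1 mirrors the question's index > skip_rows check, which drops lines 0 through skip_rows.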

The current code is a patchwork of other people's code; it's ugly and slow. What would you change to make it more efficient?

The goal is code that loads a CSV into an .mdb file, ideally creating the table with the correct data type for each column.
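
One way to get Access-friendly column types without the SQLite round trip is to map each pandas dtype to a Jet SQL type and build the CREATE TABLE statement directly. A minimal sketch; the helper name access_create_table and the type mapping are assumptions to adjust (for example, LONG is 32-bit, so larger integers would need DOUBLE, and long text may need MEMO):

```python
import pandas as pd

# Assumed mapping from numpy/pandas dtype "kind" codes to Access (Jet) SQL types.
# Anything not listed (object/string columns) falls back to TEXT(255).
JET_TYPES = {
    "i": "LONG",      # signed integers (Access LONG is 32-bit)
    "u": "LONG",      # unsigned integers (may overflow for large values)
    "f": "DOUBLE",    # floating point
    "b": "BIT",       # booleans (Yes/No)
    "M": "DATETIME",  # datetime64
}


def access_create_table(df, table_name):
    """Build an Access CREATE TABLE statement from a DataFrame's dtypes (hypothetical helper)."""
    cols = []
    for name, dtype in df.dtypes.items():
        jet_type = JET_TYPES.get(dtype.kind, "TEXT(255)")
        # Square brackets quote identifiers in Access SQL; spaces become underscores
        # to match the question's column naming.
        cols.append("[{}] {}".format(str(name).replace(" ", "_"), jet_type))
    return "CREATE TABLE [{}] ({})".format(table_name, ", ".join(cols))


# Example with made-up data.
sample = pd.DataFrame({
    "Posting Date": pd.to_datetime(["2016-01-31"]),
    "Amount": [1500.5],
    "Qty": [3],
    "Account": ["4000"],
})
ddl = access_create_table(sample, "GL")
print(ddl)
```

With pyodbc, the resulting string would be run with cur.execute(ddl) before the rows are inserted.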

import csv
import os
import pyodbc
import pandas
import sqlite3
import tempfile
import time

def load_csv_to_access(access_path, table_name, csv_path, skip_rows):

    # open CSV file, load to a variable, output to a temp file excluding the first non-CSV rows
    txt = ""
    with open(csv_path) as csv_file:
        for index, line in enumerate(csv_file):  # skip the first rows (lines 0..skip_rows)
            if index > skip_rows:
                txt += line
    temp_filename = time.strftime("%y%m%d%H%M%S") + '.csv'
    temp_filepath = os.path.join(tempfile.gettempdir(), temp_filename)
    with open(temp_filepath, 'w') as temp_file:  # closing the file flushes it before pandas reads it
        temp_file.write(txt)  # create temp csv
    print("1: temp file created: " + temp_filepath)

    # Use pandas and SQLite to infer the data type of the CSV fields
    df = pandas.read_csv(temp_filepath, delimiter=';', index_col=0, engine='python')
    df.columns = df.columns.str.replace(' ', '_')
    # SQLite is only used to generate a CREATE TABLE statement, so an in-memory database is enough
    sqlite_con = sqlite3.connect(':memory:')
    df.to_sql(table_name, sqlite_con, if_exists='replace')
    sqlite_query_string = "SELECT sql FROM sqlite_master WHERE name = ?"
    create_table_string = sqlite_con.execute(sqlite_query_string, (table_name,)).fetchone()[0]
    sqlite_con.close()
    print("2: Data type inferred")

    # Connect to the Access DB, create the table and load the temp CSV
    access_string = "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=" + access_path + "; Provider=MSDASQL;"
    print(access_string)
    con = pyodbc.connect(access_string)
    cur = con.cursor()
    cur.execute(create_table_string)  # SQLite-generated DDL; may need adjusting to be valid Access SQL
    print("3: MS Access table created: " + table_name)

    print("4: Loading data rows:")
    with open(temp_filepath, 'r') as f:
        reader = csv.reader(f, delimiter=';')
        columns = next(reader)
        # Create the insert query (replace spaces with underscores to avoid db issues)
        query = "insert into " + table_name + "({0}) values ({1})"
        query = query.format(','.join(columns).replace(' ', '_'), ','.join('?' * len(columns)))
        for index, data in enumerate(reader):
            cur.execute(query, data)  # insert row by row
            print(index)  # for debugging
    con.commit()  # pyodbc does not autocommit by default; uncommitted rows are rolled back on close
    con.close()
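
Step 7 calls execute (and print) once per row. A sketch that builds the INSERT once, hands all rows to cursor.executemany and commits once at the end; insert_csv_rows is a hypothetical helper, and it is demonstrated against an in-memory SQLite database so it runs without Access. pyodbc uses the same ? placeholders, so the same function should accept a pyodbc connection to the .mdb, though how much faster executemany is depends on the ODBC driver:

```python
import csv
import io
import sqlite3


def insert_csv_rows(con, table_name, csv_file, delimiter=";"):
    """Insert every data row of an open CSV file into table_name in one batch.

    con is any DB-API connection that uses '?' placeholders (pyodbc, sqlite3).
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    columns = [c.replace(" ", "_") for c in next(reader)]  # header row
    query = "INSERT INTO [{}] ({}) VALUES ({})".format(
        table_name,
        ", ".join("[{}]".format(c) for c in columns),  # bracket-quoted names
        ", ".join("?" * len(columns)),                  # one placeholder per column
    )
    rows = [row for row in reader if row]  # skip blank lines
    cur = con.cursor()
    cur.executemany(query, rows)  # one call for all rows
    con.commit()                  # single commit at the end
    return len(rows)


# Demo: in-memory SQLite stands in for the .mdb file (made-up data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE [GL] ([Date] TEXT, [Account] TEXT, [Amount] REAL)")
sample = io.StringIO("Date;Account;Amount\n2016-01-31;4000;1500.5\n2016-01-31;4010;-20\n")
inserted = insert_csv_rows(demo, "GL", sample)
```

Dropping the per-row print also removes console output from the inner loop.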

MS Access can query CSV files directly and run a Make-Table query to produce a table from them. However, some cleaning is needed first to remove the rubbish rows. The code below opens two files, one for reading and the other for writing. Assuming the rubbish rows only have data in the first column of the CSV, the if logic writes out any line that has data in the second column (adjust as needed):

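import os
import csv
import pyodbc

# Copy only the lines that have data in the second column to a clean file
with open(r'C:\Path\To\Raw.csv', 'r', newline='') as reader, open(r'C:\Path\To\Clean.csv', 'w', newline='') as writer:
    read_csv = csv.reader(reader)
    write_csv = csv.writer(writer, lineterminator='\n')

    for line in read_csv:
        if len(line) > 1 and len(line[1]) > 0:  # len check avoids IndexError on one-field rows
            write_csv.writerow(line)

access_path = r"C:\Path\To\Access\DB.mdb"
con = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};".format(access_path))

# Make-Table query reading straight from the clean CSV (Database= is the folder that holds it)
strSQL = "SELECT * INTO [TableName] FROM [text;HDR=Yes;FMT=Delimited(,);" + \
         r"Database=C:\Path\To].Clean.csv"
cur = con.cursor()
cur.execute(strSQL)
con.commit()

con.close()                             # CLOSE CONNECTION
os.remove(r'C:\Path\To\Clean.csv')      # DELETE CLEAN TEMP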


[Screenshot: clean CSV]

[Screenshot: MS Access table]

Notice that Access infers column types, such as the date in the first column.
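
If Access guesses a type wrong, or the file uses another delimiter (the question's source uses ;), the Access text driver also reads a schema.ini file placed in the same folder as the CSV, where the delimiter and each column's type can be spelled out. A minimal sketch that writes one; write_schema_ini and the example columns are hypothetical, and the function overwrites any existing schema.ini in that folder:

```python
import os
import tempfile


def write_schema_ini(folder, csv_name, columns, delimiter=";"):
    """Write a schema.ini describing csv_name for the Access text driver (hypothetical helper).

    columns is a list of (column_name, jet_type) pairs, e.g. ("Amount", "Double").
    Overwrites any existing schema.ini in folder.
    """
    lines = [
        "[{}]".format(csv_name),                # section name = file name
        "ColNameHeader=True",                   # first line holds column names
        "Format=Delimited({})".format(delimiter),
    ]
    for i, (name, jet_type) in enumerate(columns, start=1):
        lines.append('Col{}="{}" {}'.format(i, name, jet_type))
    path = os.path.join(folder, "schema.ini")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Demo in a throwaway folder; in practice use the folder that holds Clean.csv.
demo_dir = tempfile.mkdtemp()
ini_path = write_schema_ini(
    demo_dir,
    "Clean.csv",
    [("Date", "DateTime"), ("Account", "Text Width 20"), ("Amount", "Double")],
)
```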

