Import a huge dataset from SQL Server to HDF5

好久不见 · Posted on 2019-12-06 08:22:36

Question


I am trying to import ~12 million records with 8 columns into Python. Because of the data's size, my laptop's memory is not sufficient for this, so I am now trying to export the SQL data into an HDF5 file. It would be very helpful if someone could share a snippet of code that queries the data from SQL and saves it to HDF5 in chunks. I am open to any other file format that would be easier to use.

I plan to do some basic exploratory analysis, and later on I might build some decision-tree/linear-regression models using pandas.

import pyodbc
import pandas as pd

con = pyodbc.connect('Trusted_Connection=yes',
                     driver='{ODBC Driver 13 for SQL Server}',
                     server='SQL_ServerName')

# With chunksize set, read_sql returns an iterator of DataFrames
# (one per chunk), not a single DataFrame.
df = pd.read_sql("select * from table_a", con,
                 index_col=['Accountid'], chunksize=1000)

Answer 1:


Try this:

# stream the query results in chunks of 100,000 rows
sql_reader = pd.read_sql("select * from table_a", con, chunksize=10**5)

hdf_fn = '/path/to/result.h5'
hdf_key = 'my_huge_df'
store = pd.HDFStore(hdf_fn)
cols_to_index = [<LIST OF COLUMNS THAT WE WANT TO INDEX in HDF5 FILE>]

# append each chunk to the HDF5 table; index=False defers index
# creation so the appends stay fast
for chunk in sql_reader:
    store.append(hdf_key, chunk, data_columns=cols_to_index, index=False)

# index the data columns in the HDFStore once, after all appends
store.create_table_index(hdf_key, columns=cols_to_index, optlevel=9, kind='full')
store.close()
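
For the exploratory analysis mentioned in the question, the stored table can then be queried without loading all 12 million rows at once. Here is a minimal sketch, assuming 'Accountid' was included in cols_to_index (the column name and the filter value 12345 are taken from the question's code and are illustrative; adjust them to your schema):

import pandas as pd

hdf_fn = '/path/to/result.h5'
hdf_key = 'my_huge_df'

# Because the chunks were appended as a table with data_columns set,
# read_hdf can filter on those columns via `where` and return only
# the matching rows instead of reading the whole file.
subset = pd.read_hdf(hdf_fn, hdf_key, where='Accountid == 12345')

# For full passes over the data, read_hdf also supports chunked
# iteration, mirroring the chunked write above.
n_rows = 0
for chunk in pd.read_hdf(hdf_fn, hdf_key, chunksize=10**5):
    n_rows += len(chunk)  # replace with your per-chunk analysis
print(n_rows)

Deferring index creation to a single create_table_index call after the loop is the reason for index=False in the append: rebuilding the index on every chunk would make the writes progressively slower.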


Source: https://stackoverflow.com/questions/45200903/import-huge-data-set-from-sql-server-to-hdf5
