Python: unzipping special files into memory and getting them into a DataFrame

巧了我就是萌 posted on 2019-12-24 09:26:12

Question


I'm quite stuck with some code I'm writing in Python. I'm a beginner and maybe it is really easy, but I just can't see it. Any help would be appreciated, so thank you in advance :)

Here is the problem: I have to read some special data files with the extension .fen into a pandas DataFrame. These .fen files are inside a zipped .fenx file that contains the .fen file and a .cfg configuration file.

In the code I've written, I use the zipfile library to unzip the files and then load them into the DataFrame. The code is the following:

import zipfile
import numpy as np
import pandas as pd

def readfenxfile(Directory,File):

    fenxzip = zipfile.ZipFile(Directory+ '\\' + File, 'r')
    fenxzip.extractall()
    fenxzip.close()

    cfgGeneral,cfgDevice,cfgChannels,cfgDtypes=readCfgFile(Directory,File[:-5]+'.CFG')
    # readCfgFile reads the .cfg file and returns some important data.
    # Only cfgDtypes matters here: it describes the data types inside the .fen file, and its field names become the column labels of the final DataFrame.
    if cfgChannels is not None:
        dtDtype=eval('np.dtype([' + cfgDtypes + '])')
        dt=np.fromfile(Directory+'\\'+File[:-5]+'.fen',dtype=dtDtype)
        dt=pd.DataFrame(dt)
    else:
        dt=[]

    return dt,cfgChannels,cfgDtypes

Now, the extractall() method saves the unzipped files to the hard drive. The .fenx files can be quite big, so storing them (and deleting them afterwards) is really slow. I would like to do the same as I do now, but load the .fen and .cfg files into memory instead of onto the hard drive.

I have tried things like fenxzip.read('whateverthenameofthefileis.fen') and some other methods like .open() from the zipfile library, but I can't get what .read() returns into a numpy array in any way I have tried.

I know this can be a difficult question to answer, because you don't have the files to try it and see what happens. But if anyone has any ideas, I would be glad to read them. :) Thank you very much!


Answer 1:


Here is the solution I finally found, in case it is helpful to anyone. It uses tempfile.SpooledTemporaryFile, a temporary file object that is held in memory until its size exceeds max_size.

import zipfile
import tempfile
import numpy as np
import pandas as pd

def readfenxfile(Directory,File,ExtractDirectory):
    # ExtractDirectory is not used in this version, because nothing is extracted to disk.

    fenxzip = zipfile.ZipFile(Directory + '\\' + File, 'r')

    # Copy the .fen member of the archive into a spooled temporary file.
    fenfile=tempfile.SpooledTemporaryFile(max_size=10000000000,mode='w+b')
    fenfile.write(fenxzip.read(File[:-5]+'.fen'))
    # Here readCfgFile receives the open archive instead of a directory.
    cfgGeneral,cfgDevice,cfgChannels,cfgDtypes=readCfgFile(fenxzip,File[:-5]+'.CFG')

    if cfgChannels is not None:
        dtDtype=eval('np.dtype([' + cfgDtypes + '])')
        fenfile.seek(0)
        dt=np.fromfile(fenfile,dtype=dtDtype)
        dt=pd.DataFrame(dt)
    else:
        dt=[]
    fenfile.close()
    fenxzip.close()
    return dt,cfgChannels,cfgDtypes
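
As a possible alternative, the bytes returned by ZipFile.read() can be handed straight to np.frombuffer, which skips the temporary file altogether. This may matter because np.fromfile generally expects a real operating-system file, so a SpooledTemporaryFile may end up being rolled over to disk anyway when its file descriptor is requested. The following is only a minimal sketch under some assumptions: the names parse_dtype and read_fen_from_zip are hypothetical, cfgDtypes is assumed to be a string of literal tuples such as "('time','<f8'),('value','<i2')", and the .fen file is assumed to hold flat binary records with no header. Since the real files are not available, the demo builds a small archive in memory. It also swaps eval for ast.literal_eval, which only accepts Python literals.

```python
import ast
import io
import zipfile

import numpy as np
import pandas as pd


def parse_dtype(cfg_dtypes):
    """Build a structured dtype from a string such as "('t','<f8'),('v','<i2')".

    ast.literal_eval only accepts Python literals, so unlike eval it cannot
    run arbitrary code that happens to be stored in the .cfg file.
    """
    return np.dtype(ast.literal_eval("[" + cfg_dtypes + "]"))


def read_fen_from_zip(fenx, member, cfg_dtypes):
    """Read one .fen member of a .fenx archive into a DataFrame, in memory.

    fenx may be a path or a binary file-like object; member is the name of
    the .fen file inside the archive.
    """
    with zipfile.ZipFile(fenx, "r") as archive:
        raw = archive.read(member)  # decompressed bytes, never written to disk
    records = np.frombuffer(raw, dtype=parse_dtype(cfg_dtypes))
    return pd.DataFrame(records)


# Demo with made-up data: build a small archive in memory instead of
# opening a real .fenx file.
demo_dtype = "('time','<f8'),('value','<i2')"
demo_data = np.array([(0.0, 1), (0.5, 2), (1.0, 3)], dtype=parse_dtype(demo_dtype))
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w") as archive:
    archive.writestr("demo.fen", demo_data.tobytes())
demo_archive.seek(0)

df = read_fen_from_zip(demo_archive, "demo.fen", demo_dtype)
print(df)
```

With a real file, the call would look something like read_fen_from_zip(Directory + '\\' + File, File[:-5] + '.fen', cfgDtypes). Note that the whole decompressed .fen file is held in memory as a bytes object, so very large archives still need enough RAM.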


Source: https://stackoverflow.com/questions/43258102/python-unziping-special-files-into-memory-and-getting-them-into-a-dataframe
