Efficient way of reading a large txt file in Python

隐瞒了意图╮ 2021-01-27 06:35

I'm trying to open a txt file with 4605227 rows (305 MB).

The way I have done this before is:

data = np.loadtxt('file.txt', delimiter='\t', dtype=st


        
3 Answers
  • 2021-01-27 07:18

    You can read it directly into a pandas DataFrame, e.g.:

    import pandas as pd
    pd.read_csv(path)
    

    If you want to read faster, you can use Modin, a drop-in replacement for pandas:

    import modin.pandas as pd
    pd.read_csv(path)
    

    https://github.com/modin-project/modin

  • 2021-01-27 07:23

    Rather than reading it in with NumPy, you could read it directly into a pandas DataFrame, e.g. using the pandas.read_csv function with something like:

    import pandas as pd
    df = pd.read_csv('file.txt', delimiter='\t', usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
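
If the whole file does not fit comfortably in memory, read_csv can also hand the data back piece by piece through its chunksize parameter. The following is a minimal sketch, assuming a tab-separated file with a header row. The helper count_rows_in_chunks, the sample file, and the chunk size are hypothetical and exist only for illustration.

```python
import os
import tempfile

import pandas as pd


def count_rows_in_chunks(path, chunksize=100_000):
    """Read a tab-separated file in chunks and return the total row count."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    for chunk in pd.read_csv(path, sep="\t", chunksize=chunksize):
        total += len(chunk)  # replace with real per-chunk processing
    return total


# Tiny hypothetical sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\tb\tc\n")
    for i in range(10):
        tmp.write(f"{i}\t{i * 2}\tx{i}\n")
    sample_path = tmp.name

total_rows = count_rows_in_chunks(sample_path, chunksize=4)
os.remove(sample_path)
```

Each chunk is an ordinary DataFrame, so filtering or aggregating per chunk keeps peak memory close to the size of one chunk rather than the whole file.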
    
  • 2021-01-27 07:26

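    Method 1:

    You can read the file in chunks. readlines() accepts an optional size hint, so each call returns only about that many bytes' worth of complete lines instead of the whole file.

    BUFFERSIZE = 1024 * 1024  # assumed size hint (about 1 MB); tune as needed
    with open('inputTextFile', 'r') as inputFile:
        buffer_lines = inputFile.readlines(BUFFERSIZE)
        while buffer_lines:
            # logic goes here: process the lines in buffer_lines
            buffer_lines = inputFile.readlines(BUFFERSIZE)  # fetch the next batch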
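
The batching idea in Method 1 can be wrapped in a generator so the calling code stays a plain for loop. This is only a sketch: iter_line_batches, the sample file, and the size hints are hypothetical names and values chosen for illustration.

```python
import os
import tempfile


def iter_line_batches(path, size_hint=1024 * 1024):
    """Yield lists of complete lines, roughly size_hint bytes per batch."""
    with open(path, "r") as handle:
        while True:
            batch = handle.readlines(size_hint)
            if not batch:  # an empty list means end of file
                break
            yield batch


# Tiny hypothetical sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1000):
        tmp.write(f"{i}\tvalue{i}\n")
    sample_path = tmp.name

row_count = 0
batch_count = 0
for batch in iter_line_batches(sample_path, size_hint=2048):
    batch_count += 1
    for line in batch:
        fields = line.rstrip("\n").split("\t")  # per-line logic goes here
        row_count += 1

os.remove(sample_path)
```

Only one batch of lines is held in memory at a time, so the size hint trades memory use against the number of read calls.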
    

    Method 2:

    You can also use the mmap module. The link below explains its usage.

    import mmap

    with open("hello.txt", "r+b") as f:
        # memory-map the file, size 0 means whole file
        mm = mmap.mmap(f.fileno(), 0)
        # read content via standard file methods
        print(mm.readline())  # prints b"Hello Python!\n"
        # read content via slice notation
        print(mm[:5])  # prints b"Hello"
        # update content using slice notation;
        # note that new content must have same size
        mm[6:] = b" world!\n"
        # ... and read again using standard file methods
        mm.seek(0)
        print(mm.readline())  # prints b"Hello  world!\n"
        # close the map
        mm.close()
    

    https://docs.python.org/3/library/mmap.html
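
For simply reading a large file, the map can be opened read-only and scanned line by line. The sketch below assumes a non-empty file, since mapping an empty file raises an error. The helper count_lines_mmap and the sample file are hypothetical.

```python
import mmap
import os
import tempfile


def count_lines_mmap(path):
    """Count lines by scanning a read-only memory map of the file."""
    count = 0
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; ACCESS_READ needs no write access.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b"" once the end of the map is reached.
            for _ in iter(mm.readline, b""):
                count += 1
    return count


# Tiny hypothetical sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    for i in range(500):
        tmp.write(f"{i}\tvalue{i}\n".encode("ascii"))
    sample_path = tmp.name

line_count = count_lines_mmap(sample_path)
os.remove(sample_path)
```

Lines come back as bytes, so they need to be decoded before being split into text fields.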
