How to deal with a very large text file?

清歌不尽 2021-02-07 15:19

I'm currently writing something that needs to handle very large text files (a few GiB at least). What's needed here (and this is fixed) is:

  • CSV-based, following
7 answers
  •  傲寒 (original poster)
    2021-02-07 15:35

    How about a table of offsets at somewhat regular intervals in the file, so you can restart parsing somewhere near the spot you are looking for?

    The idea would be that these would be byte offsets where the encoding is in its initial state (i.e. if the data was ISO-2022 encoded, then this spot would be in the ASCII-compatible mode). Any index into the data would then consist of a pointer into this table plus whatever is required to find the actual row. If you place the restart points such that the area between any two adjacent points fits into the mmap window, then you can omit the check/remap/restart code from the parsing layer and use a parser that assumes the data is sequentially mapped.
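
    A minimal sketch of such a restart table, not a definitive implementation. It assumes rows are newline-terminated with no newlines inside quoted fields, and that every row start is a safe restart point (true for ASCII or UTF-8, not necessarily for a stateful encoding such as ISO-2022). It uses plain seek/readline rather than mmap; the names build_restart_table, read_row and rows_per_checkpoint are made up for illustration.

```python
import io


def build_restart_table(stream, rows_per_checkpoint=1000):
    """Scan a binary stream once and record the byte offset of every
    rows_per_checkpoint-th row start (rows 0, N, 2N, ...)."""
    table = []
    offset = 0
    for row_number, line in enumerate(stream):
        if row_number % rows_per_checkpoint == 0:
            table.append(offset)
        offset += len(line)  # binary mode, so len() counts bytes
    return table


def read_row(stream, table, row_number, rows_per_checkpoint=1000):
    """Jump to the nearest restart point at or before row_number,
    then parse forward sequentially to the requested row."""
    checkpoint, rows_to_skip = divmod(row_number, rows_per_checkpoint)
    stream.seek(table[checkpoint])
    for _ in range(rows_to_skip):
        stream.readline()
    return stream.readline()


# Tiny in-memory stand-in for a multi-GiB file: five 4-byte rows.
demo = io.BytesIO(b"a,1\nb,2\nc,3\nd,4\ne,5\n")
demo_table = build_restart_table(demo, rows_per_checkpoint=2)
print(demo_table)                                             # [0, 8, 16]
print(read_row(demo, demo_table, 3, rows_per_checkpoint=2))   # b'd,4\n'
```

    An index entry is then just a row number (or a table slot plus a row count), and the work per lookup is bounded by the spacing between restart points. With mmap, the same table would be used to pick which window to map before handing it to the sequential parser.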
