How to deal with a very large text file?

清歌不尽 2021-02-07 15:19

I'm currently writing something that needs to handle very large text files (at least a few GiB). The fixed requirements are:

CSV-based, following

别那么骄傲 (OP)

2021-02-07 15:34
It's very difficult to maintain a 1:1 mapping between a sequence of Java chars (which are effectively UTF-16 code units) and bytes, which could be anything depending on your file encoding. Even with UTF-8, the "obvious" mapping of 1 byte to 1 char only works for ASCII. Neither UTF-16 nor UTF-8 guarantees that a Unicode character fits in a single char or byte: characters outside the Basic Multilingual Plane need a surrogate pair (two chars) in UTF-16, and every non-ASCII character takes two to four bytes in UTF-8.
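To make the mismatch concrete, here is a small Java sketch (the class name `CharVsByte` and the sample characters are illustrative, not from the question) that compares `String.length()`, the code-point count, and the UTF-8 byte length for a few characters. Unicode escapes are used so the result does not depend on the encoding the source file is compiled with.

```java
import java.nio.charset.StandardCharsets;

public class CharVsByte {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A, e-acute, euro sign, and an emoji outside the Basic Multilingual Plane.
        String[] samples = {"A", "\u00e9", "\u20ac", "\ud83d\ude00"};
        for (String s : samples) {
            System.out.printf("U+%04X: chars=%d, code points=%d, UTF-8 bytes=%d%n",
                    s.codePointAt(0),                 // the Unicode code point
                    s.length(),                       // UTF-16 code units (Java chars)
                    s.codePointCount(0, s.length()),  // actual characters
                    utf8Bytes(s));
        }
    }
}
```

"A" is one char and one byte, "é" is one char but two UTF-8 bytes, "€" is one char but three bytes, and U+1F600 is two chars (a surrogate pair) and four bytes. Neither char indexes nor code-point indexes can therefore be used directly as byte offsets.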

I would maintain my window into the file as a byte buffer, not a char buffer. Then, to find line endings in the byte buffer, I'd encode the Java string "\r\n" (or possibly just "\n") as a byte sequence using the same encoding as the file, and use that byte sequence to search for line endings in the buffer. The position of a line ending in the buffer plus the offset of the buffer from the start of the file is exactly the byte position of that line ending in the file.
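A minimal sketch of that byte-level search, assuming the window is already a `ByteBuffer` whose index 0 sits at file position `windowOffset` (how the window is filled, e.g. with `FileChannel.read` or `FileChannel.map`, is left out). `LineEndingFinder` and `findLineEndings` are hypothetical names. The sketch assumes UTF-8 or another ASCII-compatible encoding, where the bytes of `\r` and `\n` never occur inside a multi-byte character. For UTF-16 you would encode with `UTF_16LE` or `UTF_16BE` (plain `UTF_16` prepends a byte-order mark) and only accept matches at even offsets. A separator that straddles two windows is also missed unless consecutive windows overlap by at least `sep.length - 1` bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineEndingFinder {
    /**
     * Returns the absolute file positions at which the encoded separator occurs
     * in the window. windowOffset is the file position of buffer index 0.
     * Assumes a non-empty separator and an ASCII-compatible charset such as UTF-8.
     */
    static List<Long> findLineEndings(ByteBuffer window, long windowOffset,
                                      String separator, Charset charset) {
        // Encode the separator with the file's charset, e.g. "\r\n" -> {0x0D, 0x0A} in UTF-8.
        byte[] sep = separator.getBytes(charset);
        List<Long> positions = new ArrayList<>();
        int last = window.limit() - sep.length;
        for (int i = window.position(); i <= last; i++) {
            // Compare the separator bytes at buffer index i (absolute get, position unchanged).
            int j = 0;
            while (j < sep.length && window.get(i + j) == sep[j]) {
                j++;
            }
            if (j == sep.length) {
                // Buffer index + window offset = byte position in the file.
                positions.add(windowOffset + i);
                i += sep.length - 1; // continue after this separator
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Pretend this window starts 1000 bytes into the file (illustrative offset).
        byte[] data = "id,name\r\n1,Zo\u00eb\r\n2,Bob\r\n".getBytes(StandardCharsets.UTF_8);
        ByteBuffer window = ByteBuffer.wrap(data);
        List<Long> ends = findLineEndings(window, 1000L, "\r\n", StandardCharsets.UTF_8);
        System.out.println(ends); // [1007, 1015, 1022]
    }
}
```

The "ë" in the sample row shows why bytes are the right unit: the second "\r\n" is at char index 14 of the Java string but at byte index 15 of the encoded data, and only the byte index maps back to the file.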

Appending lines is just a case of seeking to the end of the file and adding your new lines. Changing lines is trickier. I would keep a list or map from the byte position of each changed line to its new content. When ready to write the changes:
1. Sort the list of changes by byte position.
2. Read the original file up to the next change and write it to a temporary file.
3. Write the changed line to the temporary file.
4. Skip the changed line in the original file.
5. Go back to step 2 unless you have reached the end of the original file.
6. Move the temporary file over the original file.
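The rewrite procedure above might look like this in Java 11+, as a sketch under several assumptions: lines end in `\n`, the encoding is ASCII-compatible (e.g. UTF-8), each key in the `TreeMap` is the byte position where a distinct changed line starts (as found by a search like the line-ending one), and each value is the new line without its terminator. A `TreeMap` keeps the changes sorted by byte position, which covers step 1. `LineRewriter`, `applyChanges` and `appendLines` are hypothetical names; `appendLines` shows the simple append case and assumes the file already ends with a newline.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineRewriter {
    /** Keys: byte position where each changed line starts. Values: new text without '\n'. */
    static void applyChanges(Path file, TreeMap<Long, String> changes, Charset cs) throws IOException {
        // Same directory as the original, so the final move stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "rewrite", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            long pos = 0; // current byte position in the original file
            // Step 1: TreeMap iterates in ascending byte-position order.
            for (Map.Entry<Long, String> change : changes.entrySet()) {
                // Step 2: copy the unchanged bytes up to the changed line.
                pos += copy(in, out, change.getKey() - pos);
                // Step 3: write the replacement line.
                out.write((change.getValue() + "\n").getBytes(cs));
                // Step 4: skip the old line, including its '\n'.
                int b;
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') {
                        break;
                    }
                }
            }
            // Step 5 ends when no changes remain: copy the rest of the file.
            in.transferTo(out);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // leave the original untouched on failure
            throw e;
        }
        // Step 6: move the temporary file over the original.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Copies up to n bytes (fewer at end of file) and returns how many were copied.
    private static long copy(InputStream in, OutputStream out, long n) throws IOException {
        byte[] buf = new byte[8192];
        long copied = 0;
        while (copied < n) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n - copied));
            if (r == -1) {
                break;
            }
            out.write(buf, 0, r);
            copied += r;
        }
        return copied;
    }

    // Appending needs no rewrite: open the file in APPEND mode and write at the end.
    static void appendLines(Path file, List<String> lines, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        Files.writeString(file, sb, cs, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".csv");
        Files.writeString(f, "id,name\n1,Ann\n2,Bob\n", StandardCharsets.UTF_8);
        TreeMap<Long, String> changes = new TreeMap<>();
        changes.put(14L, "2,Robert"); // "2,Bob" starts at byte 14
        applyChanges(f, changes, StandardCharsets.UTF_8);
        appendLines(f, List.of("3,Cy"), StandardCharsets.UTF_8);
        System.out.print(Files.readString(f, StandardCharsets.UTF_8));
        Files.delete(f);
    }
}
```

Running `main` prints the lines `id,name`, `1,Ann`, `2,Robert` and `3,Cy`. Writing to a temporary file and moving it only at the end keeps the original intact if the rewrite fails partway, and keeping the temporary file in the same directory typically lets the move be a rename rather than a full copy.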