Linux/perl mmap performance

面向向阳花 2021-02-14 10:46

I'm trying to optimize handling of large datasets using mmap. A dataset is in the gigabyte range. The idea was to mmap the whole file into memory, allowing multiple processes t

9 answers

北恋 (original poster)

2021-02-14 11:44

OK, here's another update. Sys::Mmap and PerlIO's ":mmap" layer both work fine in Perl, but only for files up to 2 GB (the magic signed 32-bit limit). Once the file is larger than 2 GB, the following problems appear:

When using Sys::Mmap and substr to access the file, substr seems to accept only a 32-bit integer for the position parameter, even on systems where perl supports 64-bit integers. There's at least one bug report about it:

#62646: Maximum string length with substr

When using open(my $fh, "<:mmap", "bigfile.bin") on a file larger than 2 GB, perl seems to either hang or insist on reading the whole file on the first read (I'm not sure which; I never ran it long enough to see whether it completed), which makes performance dead slow.

I haven't found a workaround for either problem, so I'm currently stuck with slow, non-mmap'ed file operations on these files. Unless I find one, I may have to implement the processing in C or another language that handles mmap'ing huge files better.
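
For reference, the non-mmap'ed fallback mentioned above might look roughly like the sketch below. It is not the code from the original post. `read_at` is a hypothetical helper name, and the small temporary file only stands in for bigfile.bin. It assumes a perl built with large-file support (check with `perl -V:uselargefiles`), in which case the offset passed to sysseek is not restricted to 32 bits.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: return up to $length bytes starting at byte $offset.
# Uses unbuffered sysseek/sysread, so no mmap and no substr on a huge string.
sub read_at {
    my ($fh, $offset, $length) = @_;
    defined sysseek($fh, $offset, SEEK_SET)
        or die "sysseek to $offset failed: $!";
    my $buf = '';
    while (length($buf) < $length) {
        # Append to $buf at its current end; sysread may return short counts.
        my $n = sysread($fh, $buf, $length - length($buf), length($buf));
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # reached end of file
    }
    return $buf;
}

# Demo on a tiny temporary file (a stand-in for the multi-gigabyte dataset).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "0123456789ABCDEF";
close $tmp or die "close failed: $!";

open(my $fh, '<:raw', $path) or die "open failed: $!";
my $chunk = read_at($fh, 10, 4);    # 4 bytes starting at offset 10
print "$chunk\n";
close $fh;
```

Each call costs a seek plus a read system call, which is the overhead mmap was meant to avoid, so this only illustrates the slow path rather than fixing it.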
