Linux/perl mmap performance


I'm trying to optimize handling of large datasets using mmap. A dataset is in the gigabyte range. The idea was to mmap the whole file into memory, allowing multiple processes t


9 Answers

遇见更好的自我

2021-02-14 11:27

On 32-bit systems the address space for mmap()s is rather limited (and varies from OS to OS). Be aware of that if you're using multi-gigabyte files and you are only testing on a 64-bit system. (I would have preferred to write this in a comment but I don't have enough reputation points yet)
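A quick way to see which case applies is to ask perl how it was built. This is only a sketch: `fits_address_space` is a hypothetical helper, and its 2 GiB cutoff for 32-bit builds is a conservative assumption, since the real limit varies by OS.

```perl
use strict;
use warnings;
use Config;

# Pointer size in bytes for this perl build: 4 means a 32-bit address space.
my $ptr_bytes = $Config{ptrsize};
my $addr_bits = $ptr_bytes * 8;

# Hypothetical helper: could a mapping of $size bytes plausibly fit?
# The 2 GiB cutoff for 32-bit builds is an assumption, not an OS guarantee.
sub fits_address_space {
    my ($size) = @_;
    return 1 if $ptr_bytes >= 8;
    return $size < 2**31 ? 1 : 0;
}

printf "pointers: %d-bit, integers: %d-bit, large files: %s\n",
    $addr_bits,
    $Config{ivsize} * 8,
    $Config{uselargefiles} ? 'yes' : 'no';
```

Running this on both the test machine and the deployment machine shows whether they differ in address-space width.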

南方客

2021-02-14 11:36

That does sound surprising. Why not try a pure C version?

Or try your code on a different OS/perl version.

逝去的感伤

2021-02-14 11:40

Your access to that file had better be truly random to justify a full mmap. If your usage isn't evenly distributed, you're probably better off with a loop: seek, read into a freshly malloc'ed area, process it, free it, and repeat. Work with chunks that are multiples of 4 KB, say 64 KB or so.

I once benchmarked a lot of string pattern matching algorithms. Mmapping the entire file was slow and pointless. Reading into a static buffer of about 32 KB was better, but still not particularly good. Reading into a freshly malloc'ed chunk, processing it, and then letting it go allows the kernel to work wonders under the hood. The difference in speed was enormous, but then again pattern matching is very cheap complexity-wise, so more emphasis must be put on handling efficiency than is perhaps usually needed.
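The read-process-release loop described above can be sketched in Perl roughly as follows. Perl has no explicit malloc/free, so a fresh lexical buffer per pass stands in for it. The helper name `count_newlines_chunked` and the temp file are illustrative assumptions, and counting a single byte sidesteps the extra work a multi-byte pattern would need at chunk boundaries.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

use constant CHUNK => 64 * 1024;   # 64 KB, a multiple of the 4 KB page size

# Hypothetical helper: count newline bytes by streaming the file in chunks.
sub count_newlines_chunked {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $count = 0;
    while (1) {
        # A fresh buffer each pass; it is released when it goes out of scope.
        my $n = sysread($fh, my $buf, CHUNK);
        die "read $path: $!" unless defined $n;
        last if $n == 0;               # end of file
        $count += ($buf =~ tr/\n//);   # tr counts matches without modifying $buf
    }
    close $fh;
    return $count;
}

# Stand-in data: 50,000 short lines, large enough to span several chunks.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line\n" x 50_000;
close $tmp or die "close: $!";

print count_newlines_chunked($path), "\n";
```

Whether this beats mmap for a given workload is something to measure rather than assume; the sketch only shows the shape of the loop.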

小鲜肉

2021-02-14 11:40

If you have a relatively recent version of Perl, you shouldn't be using Sys::Mmap. You should be using PerlIO's mmap layer.
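For reference, opening a file through the `:mmap` layer looks roughly like this. It is a minimal sketch: the temp file of 4-byte records and the `record_at` helper are made up for illustration and are not the poster's data format.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small stand-in file of 1000 big-endian 32-bit records.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('N*', 0 .. 999);
close $tmp or die "close: $!";

# The :mmap layer maps the file and uses the mapping as PerlIO's read buffer.
open(my $fh, '<:mmap', $path) or die "open $path: $!";

# Hypothetical helper: fetch the 4-byte record with index $i.
sub record_at {
    my ($handle, $i) = @_;
    seek($handle, $i * 4, 0) or die "seek: $!";
    my $got = read($handle, my $rec, 4);
    die "short read at record $i" unless defined $got && $got == 4;
    return unpack('N', $rec);
}

print record_at($fh, 500), "\n";
```

The handle behaves like any other read handle, so existing `seek`/`read` code does not need to change when the layer is added.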

Can you post the code you are using?

北恋

2021-02-14 11:44

OK, here's another update. Using Sys::Mmap or PerlIO's ":mmap" layer both work fine in perl, but only for files up to 2 GB (the magic 32-bit limit). Once the file is larger than 2 GB, the following problems appear:

Using Sys::Mmap and substr for accessing the file, it seems that substr only accepts a 32-bit int for the position parameter, even on systems where perl supports 64-bit. There's at least one bug posted about it:

#62646: Maximum string length with substr

Using open(my $fh, "<:mmap", "bigfile.bin"), once the file is larger than 2 GB, it seems perl will either hang or insist on reading the whole file on the first read (not sure which, I never ran it long enough to see if it completed), leading to dead slow performance.

I haven't found any workaround for either of these, and I'm currently stuck with slow (non-mmap'ed) file operations for working on these files. Unless I find a workaround I may have to implement the processing in C or another higher-level language that supports mmap'ing huge files better.
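The non-mmap fallback mentioned here amounts to positioned reads. A minimal sketch, assuming a perl built with large-file support so that offsets past 2 GB are representable; `read_at` is a hypothetical helper, and a small temp file stands in for the multi-gigabyte one.

```perl
use strict;
use warnings;
use Config;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Offsets beyond 2 GB need a perl built with large-file support.
warn "this perl lacks large-file support\n" unless $Config{uselargefiles};

# Hypothetical helper: read up to $len bytes starting at byte $offset.
sub read_at {
    my ($fh, $offset, $len) = @_;
    defined sysseek($fh, $offset, SEEK_SET) or die "sysseek: $!";
    my $buf = '';
    while (length($buf) < $len) {
        # Append to $buf; sysread may return fewer bytes than requested.
        my $n = sysread($fh, $buf, $len - length($buf), length($buf));
        die "sysread: $!" unless defined $n;
        last if $n == 0;   # hit end of file
    }
    return $buf;
}

# Stand-in data: 10,000 bytes of repeating digits.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "0123456789" x 1000;
close $tmp or die "close: $!";

open(my $in, '<:raw', $path) or die "open $path: $!";
print read_at($in, 25, 4), "\n";
```

This does not address the mmap problems themselves; it only shows the plain seek-and-read path that remains usable on large files.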

轮回少年

2021-02-14 11:44

If I may plug my own module: I'd advise using File::Map instead of Sys::Mmap. It's much easier to use, and is less crash-prone than Sys::Mmap.

