Linux/perl mmap performance

面向向阳花 2021-02-14 10:46

I'm trying to optimize handling of large datasets using mmap. A dataset is in the gigabyte range. The idea was to mmap the whole file into memory, allowing multiple processes t

9 Answers
  •  猫巷女王i
    2021-02-14 11:47

    OK, found the problem. As suspected, neither Linux nor Perl was to blame. To open and access the file I do something like this:

    #!/usr/bin/perl
    # Create 1 GB file if you do not have one:
    # dd if=/dev/urandom of=test.bin bs=1048576 count=1000
    use strict; use warnings;
    use Sys::Mmap;
    
    open (my $fh, "

    If you test that code, there are no delays like those I found in my original code, and after creating the minimal sample (always do that, right!) the reason suddenly became obvious.

    The error was that my code treated the $mh scalar as a handle, something lightweight that can be moved around easily (read: passed by value). It turns out that it is actually a gigabyte-long string, definitely not something you want to move around without creating an explicit reference (Perl lingo for a "pointer"/handle value). So if you need to store it in a hash or similar, make sure you store \$mh, and dereference it when you need to use it, as in ${$hash->{mh}}, typically as the first parameter to substr or similar.
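
The store-a-reference advice above can be sketched roughly as follows. This is a minimal illustration, not the original code: a plain 16 MB string stands in for the mmap'ed scalar (an assumption, so the sketch runs without Sys::Mmap or a test file), and the names %state and read_chunk are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the mmap'ed scalar (assumption): a plain 16 MB string.
# In the real code $mh would be the scalar that Sys::Mmap maps onto the file.
my $mh = 'x' x (16 * 1024 * 1024);
substr($mh, 1000, 5) = 'HELLO';

# Store a reference to the scalar, not the scalar itself. Only a
# pointer-sized value lands in the hash; the big string is not copied.
my %state = (mh => \$mh);

# Hypothetical helper: read $len bytes at $offset through the reference.
# The dereferenced scalar is the first argument to substr.
sub read_chunk {
    my ($state, $offset, $len) = @_;
    return substr(${ $state->{mh} }, $offset, $len);
}

print read_chunk(\%state, 1000, 5), "\n";    # prints "HELLO"

# Writes through the reference reach the original buffer.
substr(${ $state{mh} }, 0, 3) = 'abc';
print substr($mh, 0, 3), "\n";               # prints "abc"
```

By contrast, writing (mh => $mh) would duplicate the entire string into the hash, which is presumably the kind of copy that caused the delays described above.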
