I\'m trying to optimize handling of large datasets using mmap. A dataset is in the gigabyte range. The idea was to mmap the whole file into memory, allowing multiple processes t
Ok, found the problem. As suspected, neither linux or perl were to blame. To open and access the file I do something like this:
#!/usr/bin/perl
# Create 1 GB file if you do not have one:
# dd if=/dev/urandom of=test.bin bs=1048576 count=1000
use strict; use warnings;
use Sys::Mmap;
open (my $fh, "
If you test that code, there are no delays like those I found in my original code, and after creating the minimal sample (always do that, right!) the reason suddenly became obvious.
The error was that I in my code treated the $mh
scalar as a handle, something which is light weight and can be moved around easily (read: pass by value). Turns out, it's actually a GB long string, definitively not something you want to move around without creating an explicit reference (perl lingua for a "pointer"/handle value). So if you need to store in in a hash or similar, make sure you store \$mh
, and deref it when you need to use it like ${$hash->{mh}}
, typically as the first parameter in a substr or similar.