I have an app that reads a giant chunk of textual data into a scalar, sometimes even GBs in size. I use substr on that scalar to read most of the data into another
Your analysis is correct.
$ perl -MDevel::Peek -e'
my $x; $x .= "x" for 1..100;
Dump($x);
substr($x, 50, length($x), "");
Dump($x);
'
SV = PV(0x24208e0) at 0x243d550
...
CUR = 100 # length($x) == 100
LEN = 120 # 120 bytes are allocated for the string buffer.
SV = PV(0x24208e0) at 0x243d550
...
CUR = 50 # length($x) == 50
LEN = 120 # 120 bytes are allocated for the string buffer.
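The same two numbers can also be read programmatically, which is handy when the dumps get long. This is a minimal sketch, assuming the core B module's svref_2object and the CUR and LEN methods of B::PV; the helper name buffer_stats is made up for this illustration.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Hypothetical helper: returns (CUR, LEN) for the scalar behind $ref,
# i.e. the bytes in use and the bytes allocated for its string buffer.
sub buffer_stats {
    my ($ref) = @_;
    my $sv = svref_2object($ref);
    return ($sv->CUR, $sv->LEN);
}

my $x;
$x .= "x" for 1..100;                  # build the string as in the dump above
my ($cur_before, $len_before) = buffer_stats(\$x);

substr($x, 50, length($x), "");        # drop everything from offset 50 onward
my ($cur_after, $len_after) = buffer_stats(\$x);

print "before: CUR=$cur_before LEN=$len_before\n";
print "after:  CUR=$cur_after LEN=$len_after\n";
```

The exact LEN value depends on the perl build and its allocator, so only the relationship matters: CUR drops to 50 while LEN should stay where it was.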
Not only does Perl overallocate strings, it doesn't even free variables that go out of scope, instead reusing them the next time the scope is entered.
$ perl -MDevel::Peek -e'
sub f {
my ($set) = @_;
my $x;
if ($set) { $x = "abc"; $x .= "def"; }
Dump($x);
}
f(1);
f(0);
'
SV = PV(0x3be74b0) at 0x3c04228 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = (POK,pPOK) # POK: Scalar contains a string
PV = 0x3c0c6a0 "abcdef"\0 # The string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x3be74b0) at 0x3c04228 # Could be a different scalar at the same address,
REFCNT = 1 # but it's truly the same scalar
FLAGS = () # No "OK" flags: undef
PV = 0x3c0c6a0 "abcdef"\0 # The same string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
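The reuse shown in the two dumps above can be checked without reading addresses by eye. A rough sketch, again assuming B's svref_2object and B::PV's LEN method; the can('LEN') guard is there because a scalar that has never held a string is not a B::PV at all.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Returns the allocated size (LEN) of the lexical's string buffer,
# or 0 if the scalar has never been upgraded to hold a string.
sub f {
    my ($set) = @_;
    my $x;
    if ($set) { $x = "abc"; $x .= "def"; }
    my $sv  = svref_2object(\$x);
    my $len = $sv->can('LEN') ? $sv->LEN : 0;
    return $len;                       # separate statement, so the temporary \$x is gone by now
}

my $first  = f(1);                     # $x is assigned a string
my $second = f(0);                     # $x is never assigned, yet a buffer is still attached

print "first call:  LEN=$first\n";
print "second call: LEN=$second\n";
```

If the lexical is reused as described, the second call reports the same nonzero LEN as the first even though $x is undef there.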
The logic is that if you needed the memory once, there's a strong chance you'll need it again.
For the same reason, assigning undef to a scalar ($x = undef) doesn't free its string buffer. Perl does give you a way to free the buffer when you want to, though: passing the scalar to undef (undef $x) forces its internal buffers to be freed.
$ perl -MDevel::Peek -e'
my $x = "abc"; $x .= "def"; Dump($x);
$x = undef; Dump($x);
undef $x; Dump($x);
'
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = (POK,pPOK) # POK: Scalar contains a string
PV = 0x37e8290 "abcdef"\0 # The string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = () # No "OK" flags: undef
PV = 0x37e8290 "abcdef"\0 # The string buffer is still allocated
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = () # No "OK" flags: undef
PV = 0 # The string buffer has been freed.
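For a script that wants to confirm the buffer is really gone, the difference between the two forms can be measured directly. This is a sketch under the same assumptions as before (B's svref_2object and B::PV's LEN method); the helper name allocated is hypothetical.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Hypothetical helper: allocated size (LEN) of a scalar's string buffer,
# or 0 if the scalar has no string slot at all.
sub allocated {
    my ($ref) = @_;
    my $sv = svref_2object($ref);
    return $sv->can('LEN') ? $sv->LEN : 0;
}

my $x = "abc"; $x .= "def";
my $len_initial = allocated(\$x);

$x = undef;                            # assignment: the value goes, the buffer stays
my $len_after_assign = allocated(\$x);

undef $x;                              # undef function: the buffer is released
my $len_after_undef = allocated(\$x);

print "initial:       LEN=$len_initial\n";
print "after \$x=undef: LEN=$len_after_assign\n";
print "after undef \$x: LEN=$len_after_undef\n";
```

For the multi-gigabyte scalar in the question, the practical consequence is to call undef on the big scalar once the data you need has been copied out of it.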