Raku : is there a SUPER fast way to turn an array into a string without the spaces separating the elements? [closed]

∥☆過路亽.° 提交于 2020-02-29 20:12:10

问题


I need to convert thousands of binary byte strings, each about a megabyte long, into ASC strings. This is what I have been doing, and seems too slow:

sub fileToCorrectUTF8Str ($fileName) { # binary file
    my $finalString = "";
    my $fileBuf = slurp($fileName, :bin);    
    for @$fileBuf { $finalString = $finalString ~ $_.chr; };    
    return $finalString;
}

~@b turns @b into string with all elements separated by space, but this is not what I want. If @b = < a b c d >; the ~@b is "a b c d"; but I just want "abcd", and I want to do this REALLY fast.

So, what is the best way? I can't really use hyper for parallelism because the final string is constructed sequentially. Or can I?


回答1:


TL;DR On an old rakudo, .decode is about 100X times as fast.

In longer form to match your code:

sub fileToCorrectUTF8Str ($fileName) { # binary file
  slurp($fileName, :bin).decode
}

Performance notes

First, here's what I wrote for testing:

# Create million and 1 bytes long file:
spurt 'foo', "1234\n6789\n" x 1e5 ~ 'Z', :bin;

# (`say` the last character to check work is done)
say .decode.substr(1e6) with slurp 'foo', :bin;

# fileToCorrectUTF8Str 'foo' );

say now - INIT now;

On TIO.run's 2018.12 rakudo, the above .decode weighs in at about .05 seconds per million byte file instead of about 5 seconds for your solution.

You could/should of course test on your system and/or using later versions of rakudo. I would expect the difference to remain in the same order, but for the absolute times to improve markedly as the years roll by.[1]

Why is it 100X as fast?

Well, first, @ on a Buf / Blob explicitly forces raku to view the erstwhile single item (a buffer) as a plural thing (a list of elements aka multiple items). That means high level iteration which, for a million element buffer, is immediately a million high level iterations/operations instead of just one high level operation.

Second, using .decode not only avoids iteration but only incurs relatively slow method call overhead once per file whereas when iterating there are potentially a million .chr calls per file. Method calls are (at least semantically) late-bound which is in principle relatively costly compared to, for example, calling a sub instead of a method (subs are generally early bound).

That all said:

  • Remember Caveat Empty[1]. For example, rakudo's standard classes generate method caches, and it's plausible the compiler just in-lines the method anyway, so it's possible there is negligible overhead for the method call aspect.

  • See also the doc's Performance page, especially Use existing high performance code.

Is the Buf.Str error message LTA?

Update See Liz++'s comment.

If you try to use .Str on a Buf or Blob (or equivalent, such as using the ~ prefix on it) you'll get an exception. Currently the message is:

Cannot use a Buf as a string, but you called the Str method on it

The doc for .Str on a Buf/Blob currently says:

In order to convert to a Str you need to use .decode.

It's arguably LTA that the error message doesn't suggest the same thing.

Then again, before deciding what to do about this, if anything, we need to consider what, and how, folk could learn from anything that goes wrong, including signals about it, such as error messages, and also what and how they do in fact currently learn, and bias our reactions toward building the right culture and infrastructure.

In particular, if folk can easily connect between an error message they see, and online discussion that elaborates on it, that needs to be taken into account and perhaps encouraged and/or made easier.

For example, there's now this SO covering this issue with the error message in it, so a google is likely to get someone here. Leaning on that might well be a more appropriate path forward than changing the error message. Or it might not. The change would be easy...

Please consider commenting below and/or searching existing rakudo issues to see if improvement of the Buf.Str error message is being considered and/or whether you wish to open an issue to propose it be altered. Every rock moved is at least great exercise, and, as our collective effort becomes increasingly wise, improves (our view of) the mountain.

Footnotes

[1] As the well known Latin saying Caveat Empty goes, both absolute and relative performance of any particular raku feature, and more generally any particular code, is always subject to variation due to factors including one's system's capabilities, its load during the time it's running the code, and any optimization done by the compiler. Thus, for example, if your system is "empty", then your code may run faster. Or, as another example, if you wait a year or three for the compiler to get faster, advances in rakudo's performance continue to look promising.



来源:https://stackoverflow.com/questions/60334347/raku-is-there-a-super-fast-way-to-turn-an-array-into-a-string-without-the-spac

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!