How can I randomize the lines in a file using standard tools on Red Hat Linux?
I don\'t have the shuf
command, so I am looking for something like a
A one-liner for python:
python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); print ''.join(lines)," myFile
And for printing just a single random line:
python -c "import random, sys; print random.choice(open(sys.argv[1]).readlines())," myFile
But see this post for the drawbacks of python's random.shuffle()
. It won't work well with many (more than 2080) elements.
On OSX, grabbing latest from http://ftp.gnu.org/gnu/coreutils/ and something like
./configure make sudo make install
...should give you /usr/local/bin/sort --random-sort
without messing up /usr/bin/sort
shuf
is the best way.
sort -R
is painfully slow. I just tried to sort 5GB file. I gave up after 2.5 hours. Then shuf
sorted it in a minute.
cat yourfile.txt | while IFS= read -r f; do printf "%05d %s\n" "$RANDOM" "$f"; done | sort -n | cut -c7-
Read the file, prepend every line with a random number, sort the file on those random prefixes, cut the prefixes afterwards. One-liner which should work in any semi-modern shell.
EDIT: incorporated Richard Hansen's remarks.
Mac OS X with DarwinPorts:
sudo port install unsort
cat $file | unsort | ...
And a Perl one-liner you get!
perl -MList::Util -e 'print List::Util::shuffle <>'
It uses a module, but the module is part of the Perl code distribution. If that's not good enough, you may consider rolling your own.
I tried using this with the -i
flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still displays the shuffled file to stdout, but this time it deletes the original. I suggest you don't use it.
Consider a shell script:
#!/bin/sh
if [[ $# -eq 0 ]]
then
echo "Usage: $0 [file ...]"
exit 1
fi
for i in "$@"
do
perl -MList::Util -e 'print List::Util::shuffle <>' $i > $i.new
if [[ `wc -c $i` -eq `wc -c $i.new` ]]
then
mv $i.new $i
else
echo "Error for file $i!"
fi
done
Untested, but hopefully works.