I want to create a file of a specific size containing only printable strings in bash.
My first thought was to use /dev/urandom:
dd if=/d
Based on your request, you can use strings on urandom:
dd if=<(strings </dev/urandom) bs=4K count=25600 of=/tmp/file
... or even:
dd bs=4K count=25600 if=/dev/urandom |
tr \\000-\\037\\200-\\377 \\040-\\077\\040-\\077\\040-\\140 >/tmp/file
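One caveat with the tr ranges above: they translate bytes 0–31 and 128–255, but leave DEL (octal 177), which is not printable, untouched. Below is a sketch of a variant that also maps DEL, plus a quick size and printability check (assumes GNU dd and tr; /tmp/demo and the smaller count are arbitrary demo values):

```shell
# Map control bytes, DEL and high bytes into the printable range, then verify
dd bs=4K count=4 if=/dev/urandom 2>/dev/null |
  tr '\000-\037\177-\377' '\040-\077\040-\077\040-\140' > /tmp/demo
wc -c < /tmp/demo                  # 16384: tr preserves the byte count
tr -d ' -~' < /tmp/demo | wc -c    # 0: no bytes outside the printable range
```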
You can do it the awk way and customize the character set. This solution is aimed at Windows bash users (MINGW), because there are no dd or random tools in the default MINGW environment.
random_readable.sh, a Bash script that generates N random characters from a defined alphabet:
#!/bin/sh
if [ -z "$1" ]; then
    echo "Pass file size as initial parameter" >&2
    exit 1
fi
SIZE=$1
seed=$( date +%s )
awk -v size="$SIZE" -v seed="$seed" '
# add characters from the range (a .. b) to the alphabet
function add_range(a, b) {
    for (idx = a; idx <= b; idx++)
        alphabet[idx] = sprintf("%c", idx)
}
BEGIN {
    srand(seed)
    # create the alphabet dictionary
    add_range(32, 126)    # all printable
    ## uncomment the following lines instead to get [a-zA-Z0-9<operators>]
    # add_range(48,57)    # numbers
    # add_range(65,90)    # LETTERS
    # add_range(97,122)   # letters
    # add_range(33,47)    # operators: !"# .. etc
    # copy the alphabet into an indexed array
    idx = 0
    for (k in alphabet)
        alphanums[idx++] = alphabet[k]
    alphabet_len = idx
    # iterate to emit "size" random characters
    for (idx = 0; idx < size; idx++)
        printf("%s", alphanums[int(rand() * alphabet_len)])
}
'
Creating a file:
./random_readable.sh 100 > output.txt
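For a quick sanity check, the same idea can be collapsed into a self-contained one-liner with the full printable alphabet (ASCII 32–126) hard-coded; size=100 and output.txt are just example values:

```shell
awk -v size=100 -v seed="$(date +%s)" 'BEGIN {
    srand(seed)
    for (i = 0; i < size; i++)
        printf "%c", 32 + int(rand() * 95)   # 95 printable chars: " " .. "~"
}' > output.txt
wc -c < output.txt   # 100
```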
What about this?
size=1048576 # 1MB
fname="strings.txt"
while read line ; do
    # Append strings to the file ...
    strings <<< "${line}" >> "${fname}"
    fsize="$(du -b "${fname}" | awk '{print $1}')"
    # ... until it is bigger than the desired size
    if [ ${fsize} -gt ${size} ] ; then
        # Now truncate the file to the desired size and exit the loop
        truncate -s "${size}" "${fname}"
        break
    fi
done < /dev/urandom
I admit that it is not very efficient. A faster attempt would be to use dd:
size=1048576
fname="strings.txt"
truncate -s0 "${fname}"
while true ; do
    dd if=/dev/urandom bs="${size}" count=1 | strings >> "${fname}"
    fsize="$(du -b "${fname}" | awk '{print $1}')"
    if [ ${fsize} -gt ${size} ] ; then
        truncate -s "${size}" "${fname}"
        break
    fi
done
A conversion of @MarekNowaczyk's answer to plain bash:
#!/bin/bash
(( $# )) || { echo "Pass file size as initial parameter" >&2; exit 1; }
size=$1
mk_range(){ name=$1; shift; printf -v "$name" '%b' "$(printf '\\U%08x' "$@")"; }
add_chars(){ local var; mk_range var "$@"; chars+=$var; }
## comment out any of the following lines to drop that range from the alphabet.
add_chars {48..57} # 0-9 numbers
add_chars {65..90} # A-Z LETTERS
add_chars {97..122} # a-z letters
add_chars {32,{33..47},{58..64},{91..96},{123..126}} # other printable chars (DEL excluded).
# convert list of characters to an array of characters.
[[ $chars =~ ${chars//?/(.)} ]] && arr=("${BASH_REMATCH[@]:1}");
alphabet_len=${#arr[@]}
# loop to print random characters
for ((i=0;i<size;i++)); do
    idx=$((RANDOM%alphabet_len))
    printf '%s' "${arr[idx]}"
done
# Add a trailing new line.
echo
This code does not ensure that the resulting random distribution is uniform; it was written as an example. In particular, RANDOM % alphabet_len introduces a modulo bias whenever 32768 is not an exact multiple of alphabet_len. To ensure a uniform distribution in the output, we would have to use careful arbitrary-precision arithmetic to change the base (count of output characters).
Also, RANDOM is not a CSPRNG.
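If the modulo bias matters, one common remedy is rejection sampling: throw away RANDOM values that fall into the incomplete final bucket. A sketch for drawing a single index (alphabet_len=95 is an example value; bash's RANDOM spans 0..32767):

```shell
alphabet_len=95
limit=$(( 32768 - 32768 % alphabet_len ))   # reject values >= limit (here 32680)
idx=$RANDOM
while (( idx >= limit )); do idx=$RANDOM; done
idx=$(( idx % alphabet_len ))               # now uniform over 0..94
printf '%d\n' "$idx"
```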
The correct way is to use a transformation like base64 to convert the random bytes to characters. That will not erase any of the randomness from the source, it will only convert it to some other form.
For a 1 MegaByte file (a little bit bigger, actually):
dd if=/dev/urandom bs=786438 count=1 | base64 > /tmp/file
The resulting file will contain characters in the ranges A–Z, a–z and 0–9, plus +, / and =.
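Note that GNU base64 wraps its output at 76 columns, so the file will also contain newline characters. If you want one unbroken line of base64 characters, -w0 disables wrapping (a sketch assuming GNU coreutils; 768 input bytes encode to exactly 1024 characters):

```shell
dd if=/dev/urandom bs=768 count=1 2>/dev/null | base64 -w0 > /tmp/demo64
wc -c < /tmp/demo64   # 1024: 768 bytes * 4/3, with no newlines added
```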
Below is the reason for the file to be a little bigger, and a solution.
You could add a filter to translate from that list to some other list (of the same size or less) with tr:
tr 'A-Za-z0-9+/' 'a-z0-9A-Z$%' < /tmp/file
I have left the = outside of the translation because, for a uniform random distribution, it is better to leave out the trailing characters, which will (almost) always be =.
The size of the file will be expanded from the original size read from /dev/urandom by a factor of 4/3. That is because we are transforming 256 byte values into 64 different characters: 6 bits are taken from the stream of bytes to encode each character, so when 4 characters have been encoded (6*4 = 24 bits), only three bytes have been consumed (8*3 = 24).
So we need a count of bytes that is a multiple of 3 (to get an exact result) and a multiple of 4 (because we will divide by it), i.e. a multiple of 12.
We can not get a random file of exactly 1024 bytes (1k) or 1024*1024 = 1,048,576 bytes (1M) because both are not exact multiple of 3. But we can produce a file a little bigger and truncate it (if such precision is needed):
wanted_size=$((1024*1024))
file_size=$(( ((wanted_size/12)+1)*12 ))
read_size=$((file_size*3/4))
echo "wanted=$wanted_size file=$file_size read=$read_size"
dd if=/dev/urandom bs=$read_size count=1 | base64 > /tmp/file
truncate -s "$wanted_size" /tmp/file
The last step to truncate to the exact value is optional.
As you are going to extract so many random values from urandom, please do not use /dev/random (use /dev/urandom), or your app will block for a long time and the rest of the computer will be starved of randomness.
I'd recommend that you install the haveged package:
haveged uses HAVEGE (HArdware Volatile Entropy Gathering and Expansion) to maintain a 1M pool of random bytes used to fill /dev/random whenever the supply of random bits in dev/random falls below the low water mark of the device.
If that is possible.
You can use one of the following:
1. Use truncate: you should have a baseline text file with a size larger than what you need; then run:
truncate -s 5M filename
DESCRIPTION
Shrink or extend the size of each FILE to the specified size
[...]
-s, --size=SIZE
set or adjust the file size by SIZE
2. Use tail: this option requires a reference text file too.
tail -c 1MB reference_big.txt > 1mb.txt
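Beware of GNU size suffixes here: 1MB means 1000*1000 bytes, while 1M means 1024*1024. A quick demo that fabricates a reference file and then carves out exactly 1 MiB (file names are arbitrary; assumes GNU coreutils):

```shell
head -c 2M /dev/urandom | base64 > reference_big.txt   # plenty of printable text
tail -c 1M reference_big.txt > 1mb.txt                 # keep the last 1048576 bytes
wc -c < 1mb.txt   # 1048576
```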