How to concatenate files with the same prefix (and many prefixes)?

死守一世寂寞 2021-01-15 23:21

I have many files that have the same prefix, only the bit after the underscore is different. And I have many prefixes as well! Underscore does not appear anywhere else in th

4 Answers
  • 2021-01-15 23:49

    You can do something like:

    cat /path/prefix* >> new_file
    

    It will cat (that is, concatenate the files and print them to standard output) every file whose name starts with /path/prefix; the * matches whatever follows the prefix. Note that >> appends, so new_file grows if you run the command twice; use > to overwrite instead.

    Before running it, do ls /path/prefix* to make sure the pattern matches all the files you want, and only those.

    Example

    $ ls
    aa_bb  prefix_23  prefix_235  prefix_nnn
    $ ls prefix_*
    prefix_23  prefix_235  prefix_nnn
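
As a minimal sketch of the above, the following builds a throwaway directory containing the file names from the example and concatenates everything starting with prefix_. The file contents and the variable name workdir are assumptions made for illustration. It uses > rather than >> so that rerunning it does not append a second copy.

```shell
#!/bin/sh
# Hypothetical demo data: names mirror the example listing, contents are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'other\n'  > aa_bb
printf 'first\n'  > prefix_23
printf 'second\n' > prefix_235
printf 'third\n'  > prefix_nnn

# Dry run: check which files the glob matches before concatenating.
ls prefix_*

# The glob expands in sorted order; '>' truncates new_file on every run.
cat prefix_* > new_file
cat new_file
```

The output file is deliberately given a name that does not itself start with prefix_, otherwise it would be matched by the glob on the next run.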
    
  • 2021-01-15 23:53

    I had a similar problem: I had many files and wanted to group and cat them by prefix. I used this little script:

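    # Print each distinct prefix (the text before the first underscore) once.
    ls | awk -F '_' '!x[$1]++{print $1}' | while read -r line
    do
        # Quote the prefix and match the underscore, so prefix "a" does not also pick up "ab_*".
        cat "$line"_* > "all_$line.txt"
    done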
    

    ls lists all the files in the directory.

    In awk, -F '_' sets the underscore as the field delimiter, so $1 is the prefix. The expression !x[$1]++ is true only the first time a given prefix is seen, so each prefix is printed once (like uniq, but without needing sorted input).

    Then we run a loop on all prefixes and cat all the files with the same prefix.
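
A variant of the same idea that avoids parsing ls output could look like the sketch below. It assumes the prefix is everything before the first underscore, and it writes the merged files to a separate directory (called merged here, a name chosen for illustration) so they are not matched by the glob on a later run. The demo files are made up.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a1\n' > alpha_1
printf 'a2\n' > alpha_2
printf 'b1\n' > beta_1

outdir=merged            # assumed output directory, kept out of the glob
mkdir -p "$outdir"
rm -f "$outdir"/*.txt    # start clean so reruns do not append twice

for f in *_*; do
    [ -f "$f" ] || continue          # skip directories and an unmatched glob
    prefix=${f%%_*}                  # strip everything from the first underscore on
    cat "$f" >> "$outdir/$prefix.txt"
done
```

Each file is visited once and appended to the output for its prefix, so no separate list of prefixes is needed.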

  • 2021-01-16 00:01

    If the number of files is very large, plain shell globbing (prefix_* and the like) can fail, because the expanded argument list may exceed the system's limit ("Argument list too long").

    In that case you can let find append them one by one:

    find dir -type f -name 'prefix_*' -exec cat {} \; >> result
    

    This appends every file matching prefix_* to the file result, one at a time, in the order find encounters them (not necessarily sorted). result shouldn't exist beforehand; if in doubt, rm result first.

    If you have lots of different prefixes, you can of course append one group after the other without removing the result file in between.
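
For many prefixes and many files at once, one possible sketch lets find hand the file names to a small inline sh script, which routes each file to an output named after its prefix. The directory names dir and out, the demo files, and the rule that the prefix is the text before the first underscore are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p dir/sub out
printf 'x\n' > dir/foo_1
printf 'y\n' > dir/bar_1
printf 'z\n' > dir/sub/foo_2

# find passes the paths as arguments ({} +), so odd file names are safe
# and no single command line grows beyond the system limit.
find dir -type f -name '*_*' -exec sh -c '
    for path do
        name=${path##*/}        # file name without its directory
        prefix=${name%%_*}      # text before the first underscore
        cat "$path" >> "out/$prefix"
    done
' sh {} +
```

The order within each output follows find's traversal order, so this only suits cases where the order of the pieces does not matter.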

    All the other options the Unix tool find offers can be used as well of course. But if you need help with that, feel free to ask again.

  • 2021-01-16 00:09

    I had to do something very similar, and I don't feel the previous answers solve your problem: they require a huge amount of manual input when there are many different prefixes, rather than a few prefixes with lots of files each. If I knew the pattern of your prefixes I could give more specific advice, but for now I'll assume they are numbers with leading zeros (as with my files). I'll assume the following layout, although the approach doesn't depend on it:

    ~/test01/001-test.txt
    ~/test01/002-test.txt
    ~/test01/003-test.txt
    
    ~/test02/001-test.txt
    ~/test02/002-test.txt
    ~/test02/003-test.txt
    

    Once this is set up, I change into a merge directory, where I want all the merged files to be written, and run cat in a for loop.

    cd ~/merge
    
    for i in {001..003}; do cat ../test*/"$i"*.txt > "$i"-merge.txt ; done
    

    This uses 001, 002, and 003 as prefixes, looks in all of the test directories for files that start with each prefix, and merges them in glob order, that is, sorted by path (test01 before test02). Zero-padded brace expansion such as {001..003} needs bash 4 or later. The end result will appear in:

    ~/merge/001-merge.txt
    ~/merge/002-merge.txt
    ~/merge/003-merge.txt
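
Brace expansion such as {001..003} cannot take variable bounds, so with thousands of prefixes it may be easier to generate the zero-padded numbers with printf. The sketch below assumes the layout shown above, recreated under a scratch directory; the variable last and the file contents are illustrative only.

```shell
#!/bin/sh
# Recreate the assumed layout (test01, test02, merge) in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/test01" "$workdir/test02" "$workdir/merge"
for d in test01 test02; do
    for p in 001 002 003; do
        printf '%s %s\n' "$d" "$p" > "$workdir/$d/$p-test.txt"
    done
done

cd "$workdir/merge" || exit 1
last=3                                   # highest prefix number (assumed)
n=1
while [ "$n" -le "$last" ]; do
    i=$(printf '%03d' "$n")              # 1 -> 001, 2 -> 002, ...
    cat ../test*/"$i"-*.txt > "$i-merge.txt"
    n=$((n + 1))
done
```

For 5000 prefixes, set last=5000 and widen the format to '%04d' if the file names use four digits.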
    

    I know this is quite late, but hopefully it helps someone else. I have to do this with 5000 prefixes, so I completely understand.
