How can I re-add a Unicode byte order mark in Linux?

-上瘾入骨i 2021-02-13 15:22

I have a rather large SQL file that starts with the byte order mark FF FE. I split this file into 100,000-line chunks using the Unicode-aware Linux split tool. But whe

7 answers
  • 2021-02-13 15:24

    For a general-purpose solution—something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32—I would use vim’s 'bomb' option:

    $ echo 'hello' > foo
    $ xxd < foo
    0000000: 6865 6c6c 6f0a                           hello.
    $ vim -e -s -c ':set bomb' -c ':wq' foo
    $ xxd < foo
    0000000: efbb bf68 656c 6c6f 0a                   ...hello.
    

    (-e means run in ex mode instead of visual mode; -s means don’t print status messages; -c means “execute this command”)

  • 2021-02-13 15:28
    $ printf '\xEF\xBB\xBF' > bom.txt
    

    Then check:

    $ grep -rl $'\xEF\xBB\xBF' .
    ./bom.txt
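
    Note that grep -rl lists files containing those bytes anywhere, not only at the start. To check what a file actually begins with, something like the following could work. This is only a sketch: bom_of is a hypothetical helper name, and it assumes head, od and tr are available.

```shell
#!/usr/bin/env bash
# bom_of FILE: print which byte order mark (if any) FILE starts with.
# Hypothetical helper for illustration, not part of any standard tool.
bom_of() {
  local hex
  # Dump the first four bytes as one lowercase hex string, e.g. "efbbbf68".
  hex=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
  case "$hex" in
    efbbbf*)  echo "UTF-8" ;;
    fffe0000) echo "UTF-32LE" ;;   # checked before UTF-16LE: same first two bytes
    0000feff) echo "UTF-32BE" ;;
    fffe*)    echo "UTF-16LE" ;;
    feff*)    echo "UTF-16BE" ;;
    *)        echo "none" ;;
  esac
}
```

    Usage: bom_of bom.txt. One caveat: FF FE 00 00 is ambiguous, since a UTF-16LE file whose first character is NUL starts with the same bytes, so the sketch guesses UTF-32LE there.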
    
  • 2021-02-13 15:35

    Building on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. A useful side effect is that it also turns plain ASCII files into UTF-8 with a BOM, since ASCII text is already valid UTF-8.

  • 2021-02-13 15:39

    Something like this (back up first):

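    for i in *.sql              # glob directly; parsing ls breaks on spaces
    do
      [ -e "$i" ] || continue   # skip the literal pattern if nothing matches
      cp "$i" "$i.temp"         # keep the original content
      printf '\xFF\xFE' > "$i"  # overwrite the file with the UTF-16LE BOM
      cat "$i.temp" >> "$i"     # append the original content after the BOM
      rm "$i.temp"
    done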
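
    The loop above prepends the BOM unconditionally, so a file that already has one would end up with two. A variant that checks first might look like this. It is a sketch only: add_utf16le_bom is a hypothetical name, and it assumes the files really are UTF-16LE and that mktemp, head and od are available.

```shell
#!/usr/bin/env bash
# add_utf16le_bom FILE...: prepend FF FE to each file unless it is already there.
# Hypothetical helper; assumes every FILE is UTF-16 little-endian.
add_utf16le_bom() {
  local f tmp first2
  for f in "$@"; do
    # First two bytes as hex, e.g. "fffe"; empty for an empty file.
    first2=$(head -c 2 -- "$f" | od -An -tx1 | tr -d ' \n')
    [ "$first2" = "fffe" ] && continue      # BOM already present, skip
    tmp=$(mktemp) || return 1
    # '\377\376' is the octal spelling of FF FE.
    if { printf '\377\376'; cat -- "$f"; } > "$tmp"; then
      cat -- "$tmp" > "$f"                  # rewrite in place, keeping permissions
    fi
    rm -f -- "$tmp"
  done
}
```

    Usage: add_utf16le_bom *.sql. Running it a second time leaves the files unchanged.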
    
  • 2021-02-13 15:39

    To add BOMs to all the files whose names start with "foo-", you can use sed. With GNU sed, giving -i a suffix (for example -i.bak) keeps a backup of each file.

    sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*
    

    Running this under strace shows that sed creates a temporary file whose name starts with "sed". If you know for sure that no BOM is present yet, you can simplify the command:

    sed -i '1s/^/\xff\xfe/' foo-*
    

    Make sure UTF-16 is really what you need, because the BOM differs between encodings: FF FE is the UTF-16 little-endian mark, while UTF-8, for example, uses EF BB BF.
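
    To see the first sed command at work, here is a small demo in a scratch directory. It is a sketch under assumptions: GNU sed (for -i, \xHH and \?), made-up chunk names foo-aa and foo-ab, and LC_ALL=C added so that sed treats the data as raw bytes. One chunk already has the BOM and the other does not.

```shell
#!/usr/bin/env bash
# Demo only: file names and contents are invented for illustration.
demo_dir=$(mktemp -d)
printf '\377\376a\000\n\000' > "$demo_dir/foo-aa"   # chunk that already has FF FE
printf 'b\000\n\000'         > "$demo_dir/foo-ab"   # chunk without a BOM

# Replace an optional existing BOM on line 1 with exactly one BOM.
# -i.bak keeps the untouched originals as foo-aa.bak and foo-ab.bak.
LC_ALL=C sed -i.bak '1s/^\(\xff\xfe\)\?/\xff\xfe/' "$demo_dir"/foo-*

# Show the resulting bytes of each chunk.
for f in "$demo_dir"/foo-a?; do
  printf '%s:' "${f##*/}"
  od -An -tx1 "$f"
done
```

    Both chunks should now start with a single ff fe, and the .bak files hold the originals.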

  • 2021-02-13 15:42

    Try uconv (part of ICU):

    uconv --add-signature
    