I have a rather large SQL file which starts with the byte order mark FFFE. I have split this file into 100,000-line chunks using the Unicode-aware Linux split tool. But whe
For a general-purpose solution (something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32), I would use vim’s 'bomb' option:
$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a ...hello.
(-e means run in ex mode instead of visual mode; -s means don’t print status messages; -c means “run this ex command”.)
$ printf '\xEF\xBB\xBF' > bom.txt
Then check:
$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt
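Note that the grep above matches the BOM bytes anywhere in a file, not only at the start. As a rough sketch, the first bytes can be inspected directly; bom_of is a hypothetical helper name, and it assumes head -c and od are available:

```shell
# bom_of FILE: print the name of the BOM found at the start of FILE, or "none".
# Hypothetical helper for illustration; not part of any tool mentioned here.
bom_of() {
    # Hex-dump the first four bytes as one lowercase string, e.g. "efbbbf68".
    head_hex=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    case "$head_hex" in
        efbbbf*)  echo "UTF-8" ;;
        fffe0000) echo "UTF-32LE" ;;   # tested before UTF-16LE, which shares FF FE
        0000feff) echo "UTF-32BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        feff*)    echo "UTF-16BE" ;;
        *)        echo "none" ;;
    esac
}
```

A UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE; that ambiguity is inherent in the byte patterns, so treat the result as a hint.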
Based on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, this also turns a plain ASCII file into UTF-8 with BOM, since ASCII is a subset of UTF-8.
Something like this (back up first):
for i in *.sql
do
    # set the original content aside, then rebuild the file with the BOM first
    cp "$i" "$i.temp"
    printf '\xFF\xFE' > "$i"
    cat "$i.temp" >> "$i"
    rm "$i.temp"
done
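Running a loop like this twice would stack a second BOM on every file. A minimal sketch of a variant that skips files already starting with FF FE follows; add_utf16le_bom is a hypothetical function name, and the octal escapes are used because not every shell's printf understands \x:

```shell
# add_utf16le_bom FILE...: prepend FF FE to each file that lacks it.
# Hypothetical helper for illustration; back up your files first.
add_utf16le_bom() {
    for f in "$@"; do
        # First two bytes as hex, e.g. "fffe"; empty for an empty file.
        first=$(head -c 2 -- "$f" | od -An -tx1 | tr -d ' \n')
        if [ "$first" = "fffe" ]; then
            continue    # already has the UTF-16LE BOM
        fi
        tmp=$(mktemp "$f.XXXXXX") || return 1
        # Build BOM + content in a temp file, then copy it back over the
        # original so the original file keeps its permissions.
        { printf '\377\376'; cat -- "$f"; } > "$tmp" && cat -- "$tmp" > "$f"
        rm -f -- "$tmp"
    done
}
```

Usage would be something like add_utf16le_bom *.sql, which is safe to repeat because files that already carry the BOM are left alone.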
To add BOMs to all the files that start with "foo-", you can use sed. sed has an option to make a backup (give -i a suffix, e.g. -i.bak).
sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*
strace-ing this shows that sed creates a temp file with a name starting with "sed". If you know for sure there is no BOM already, you can simplify the command:
sed -i '1s/^/\xff\xfe/' foo-*
Make sure UTF-16 is really the encoding you need, because other encodings use a different BOM (e.g. UTF-8 uses EF BB BF).
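For reference, the BOM bytes per encoding can be captured in a small lookup. This is only a sketch; print_bom and its lowercase encoding names are made up for illustration:

```shell
# print_bom ENCODING: write the BOM bytes for ENCODING to stdout.
# Hypothetical helper; the encoding names are an arbitrary convention.
print_bom() {
    case "$1" in
        utf-8)    printf '\357\273\277' ;;      # EF BB BF
        utf-16le) printf '\377\376' ;;          # FF FE
        utf-16be) printf '\376\377' ;;          # FE FF
        utf-32le) printf '\377\376\000\000' ;;  # FF FE 00 00
        utf-32be) printf '\000\000\376\377' ;;  # 00 00 FE FF
        *) echo "unknown encoding: $1" >&2; return 1 ;;
    esac
}
```

For example, { print_bom utf-16le; cat chunk; } > chunk.with-bom would write a copy of a file named chunk with the UTF-16LE BOM in front.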
Try uconv
uconv --add-signature