How can I re-add a unicode byte order marker in linux?


I have a rather large SQL file that starts with the byte order marker FFFE. I have split this file into 100,000-line chunks using the Unicode-aware Linux split tool. But whe

7 answers

被撕碎了的回忆

2021-02-13 15:24
For a general-purpose solution—something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32—I would use vim’s 'bomb' option:
```
$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a                           hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a                   ...hello.
```
(-e runs vim in ex mode instead of visual mode; -s suppresses status messages; -c runs the given command.)
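Before adding a BOM it can help to check which one, if any, a file already has. The following is a minimal sketch, not part of the original answer: `detect_bom` is a hypothetical helper name, and it assumes `head -c` and `od` are available. It only inspects the first four bytes, so a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE.

```shell
# Hypothetical helper: report which byte-order mark (if any) starts a file.
detect_bom() {
  # Render the first four bytes as lowercase hex without spaces, e.g. "efbbbf68".
  sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    efbbbf*)  echo "UTF-8" ;;
    fffe0000) echo "UTF-32LE" ;;   # must be tested before the UTF-16LE pattern
    0000feff) echo "UTF-32BE" ;;
    fffe*)    echo "UTF-16LE" ;;
    feff*)    echo "UTF-16BE" ;;
    *)        echo "none" ;;
  esac
}

# Usage on a throwaway file (octal escapes are the same bytes as \xEF\xBB\xBF).
demo=$(mktemp)
printf '\357\273\277hello\n' > "$demo"
detect_bom "$demo"
rm -f "$demo"
```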
我在风中等你

2021-02-13 15:28
```
$ printf '\xEF\xBB\xBF' > bom.txt
```
Then check:
```
$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt
```
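The printf command above only creates a file that holds the BOM. One way to get it in front of existing content is to concatenate the two files. This is a sketch under assumptions: `chunk.sql` and `chunk.with-bom.sql` are made-up file names, and the bytes shown are the UTF-8 BOM; for a UTF-16LE file like the one in the question the bytes would be FF FE instead.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '\357\273\277' > bom.txt        # octal form of \xEF\xBB\xBF
printf 'SELECT 1;\n' > chunk.sql       # stand-in for a chunk that lacks a BOM

# cat writes the BOM first, then the original content, into a new file.
cat bom.txt chunk.sql > chunk.with-bom.sql

# Show the first three bytes of the result in hex.
head -c 3 chunk.with-bom.sql | od -An -tx1
```

Writing to a new file rather than redirecting back onto `chunk.sql` avoids truncating the input before it is read.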
悲哀的现实

2021-02-13 15:35

Based on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, it also turns ASCII files into UTF-8 with BOM, since ASCII text is already valid UTF-8.


心在旅途

2021-02-13 15:39

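Something like this (back up first):

```
# Iterate over the glob directly; parsing ls output breaks on unusual file names.
for i in *.sql
do
  cp "$i" "$i.temp"          # keep a temporary copy of the original content
  printf '\xFF\xFE' > "$i"   # overwrite the file with the UTF-16LE BOM
  cat "$i.temp" >> "$i"      # append the original content after the BOM
  rm "$i.temp"
done
```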


小鲜肉

2021-02-13 15:39
To add BOMs to all the files whose names start with "foo-", you can use sed. sed's -i option can also take a suffix (for example -i.bak) to keep a backup of each file.
```
sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*
```
Running strace on this shows that sed creates a temp file with a name starting with "sed". If you know for sure there is no BOM already, you can simplify the command:
```
sed -i '1s/^/\xff\xfe/' foo-*
```
Make sure UTF-16 is really the encoding you need, because other encodings use a different BOM; for example, the UTF-8 BOM is EF BB BF.
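If sed's handling of raw bytes is a concern, the same "add the BOM only when it is missing" idea can be written without regular expressions. This is a rough sketch rather than the answer's method: `add_utf16le_bom` is a hypothetical function name, and it assumes `head -c`, `od`, and `mktemp` are available.

```shell
# Hypothetical helper: prepend FF FE to each named file unless it already starts with it.
add_utf16le_bom() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      continue                      # skip unmatched globs and non-regular files
    fi
    first=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "fffe" ]; then
      continue                      # BOM already present, leave the file alone
    fi
    tmp=$(mktemp) || return 1
    # Build BOM + original content in a temp file, then copy it back so the
    # original file keeps its permissions.
    { printf '\377\376'; cat "$f"; } > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Usage: add_utf16le_bom foo-*
```

Running it twice on the same files is harmless, because the second pass finds the BOM and skips them.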
无人及你

2021-02-13 15:42
Try uconv
```
uconv --add-signature
```