Best way to convert text files between character sets?

再見小時候

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

21 answers

别跟我提以往

2020-11-22 05:01
Stand-alone utility approach
```
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
```
```
-f ENCODING  the encoding of the input
-t ENCODING  the encoding of the output
```
You don't have to specify either option. If you omit one, iconv uses the character encoding of your current locale, which is usually UTF-8.
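The question asks about ISO-8859-15 specifically, in both directions. Here is a minimal sketch of that round trip with the same iconv tool, assuming GNU (glibc) iconv. The file names are placeholders made up for this example:

```shell
#!/bin/sh
# Sketch: convert UTF-8 <-> ISO-8859-15 (Latin-9) with iconv.
# The file names are hypothetical placeholders.
set -e

# Sample UTF-8 input containing the euro sign (U+20AC, bytes E2 82 AC),
# which ISO-8859-15 encodes as the single byte 0xA4.
printf 'Price: 10 \342\202\254\n' > in-utf8.txt

# UTF-8 -> ISO-8859-15. If the input contains a character that Latin-9
# cannot represent, iconv stops with an error (iconv's -c option drops
# such characters instead).
iconv -f UTF-8 -t ISO-8859-15 in-utf8.txt > out-latin9.txt

# ISO-8859-15 -> UTF-8. Every Latin-9 byte maps to a Unicode character,
# so this direction does not lose information.
iconv -f ISO-8859-15 -t UTF-8 out-latin9.txt > back-utf8.txt

# The round trip should reproduce the original bytes exactly.
cmp in-utf8.txt back-utf8.txt && echo "round trip OK"
```

Run it in a scratch directory, since it creates three small files there. If it prints `round trip OK`, the euro sign survived both conversions.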
囚心锁ツ

2020-11-22 05:01
One-liner using find, with automatic character set detection

The character encoding of each matching text file is detected automatically, and every matching file is converted to UTF-8:
```
$ find . -type f -iname '*.txt' -exec sh -c 'iconv -f $(file -bi "$1" |sed -e "s/.*[ ]charset=//") -t utf-8 -o converted "$1" && mv converted "$1"' -- {} \;
```
To do this, -exec runs a subshell (sh -c) for each file, with the one-liner as its script. The -- that follows the script fills $0, so the filename {} is passed in as the positional argument "$1". The UTF-8 output is first written to a temporary file named converted (in the current directory), which then replaces the original file.

Here, file -bi means:
- -b, --brief: do not prepend filenames to output lines (brief mode).
- -i, --mime: output MIME type strings rather than the more traditional human-readable ones. For example, it may print text/plain; charset=us-ascii rather than ASCII text.

The sed command then cuts this down to just us-ascii, which is the form iconv requires.

The find command is very useful for this kind of file-management automation.
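The one-liner can trip over files that file cannot classify as text: it reports charset=binary for empty files and charset=unknown-8bit for some 8-bit text, and iconv rejects both names. The following is a sketch of a more defensive variant, assuming GNU iconv and the file utility are installed; the function names convert_one and convert_tree_to_utf8 are made up for this example. It skips such files, leaves files that are already UTF-8 or ASCII alone, and uses a per-file temporary file:

```shell
#!/bin/sh
# Sketch: convert every *.txt file under a directory to UTF-8, guessing
# each file's source charset with `file -bi`.
# Assumptions: GNU iconv and the `file` utility are installed; the
# function names are hypothetical.

convert_one() {
    f=$1
    # "text/plain; charset=iso-8859-1" -> "iso-8859-1"
    charset=$(file -bi "$f" | sed -e 's/.*charset=//')
    case $charset in
        utf-8|us-ascii)
            return 0 ;;                      # already valid UTF-8
        binary|unknown-8bit|'')
            echo "skipping $f (charset: $charset)" >&2
            return 0 ;;                      # iconv cannot use this name
    esac
    # Convert into a temporary file next to the original ...
    tmp=$(mktemp "$f.XXXXXX") || return 1
    if iconv -f "$charset" -t UTF-8 "$f" > "$tmp"; then
        # ... then copy it back over the original, which keeps the
        # original file's permissions and ownership.
        cat "$tmp" > "$f"
        rm -f "$tmp"
    else
        echo "failed to convert $f from $charset" >&2
        rm -f "$tmp"
        return 1
    fi
}

convert_tree_to_utf8() {
    # Quoting '*.txt' keeps the shell from expanding it before find runs.
    # Note: file names containing newlines are not handled.
    find "$1" -type f -iname '*.txt' -print |
    while IFS= read -r f; do
        convert_one "$f"
    done
}

convert_tree_to_utf8 "${1:-.}"
```

Pass a directory as the first argument (it defaults to the current directory). Because the charset comes from file's heuristics, spot-check a few results: file does not distinguish ISO-8859-15 from ISO-8859-1, so a Latin-9 euro sign (byte 0xA4) would come out as ¤.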
耶瑟儿～

2020-11-22 05:02

Try Notepad++

On Windows I was able to use Notepad++ to do the conversion from ISO-8859-1 to UTF-8. Click "Encoding" and then "Convert to UTF-8".

-上瘾入骨i

2020-11-22 05:04
Try VIM

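If you have Vim, you can use the command below. It has not been tested with every encoding.

The cool part is that you don't have to know the source encoding: Vim guesses it when it opens the file, trying the encodings listed in its 'fileencodings' option in order.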
```
vim +"set nobomb | set fenc=utf8 | x" filename.txt
```
Be aware that this command modifies the file in place.

Explanation part!
1. + : used by Vim to run a command right after opening a file. Usually used to open a file at a specific line: vim +14 file.txt
2. | : separator between multiple commands (like ; in bash)
3. set nobomb : do not write a UTF-8 byte order mark (BOM)
4. set fenc=utf8 : set the file's new encoding to UTF-8
5. x : save and close the file
6. filename.txt : path to the file
7. " : the quotes are needed because of the pipes (otherwise bash would treat them as shell pipes)
夕颜

2020-11-22 05:04

In PowerShell:

```
function Recode($InCharset, $InFile, $OutCharset, $OutFile) {
    # Read input file in the source encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($InCharset)
    $Text = [System.IO.File]::ReadAllText($InFile, $Encoding)

    # Write output file in the destination encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($OutCharset)
    [System.IO.File]::WriteAllText($OutFile, $Text, $Encoding)
}

Recode Windows-1252 "$pwd\in.txt" utf8 "$pwd\out.txt"
```

For a list of supported encoding names:

https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding

迷失自我

2020-11-22 05:05
DOS/Windows: use a code page
```
chcp 65001>NUL
type ascii.txt > unicode.txt
```
The chcp command changes the active code page; 65001 is Microsoft's code page number for UTF-8. After the code page is set, output from subsequent commands uses that code page. For a plain ASCII input file the result is trivially valid UTF-8, because ASCII is a subset of UTF-8.