What is the fastest, easiest tool or method to convert text files between character sets?
Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.
Stand-alone utility approach
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
-f ENCODING the encoding of the input
-t ENCODING the encoding of the output
You don't have to specify either of these arguments. They will default to your current locale, which is usually UTF-8.
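For the two encodings named in the question, a round trip might look like the sketch below. The file names and the scratch directory are made up for illustration, and it assumes an iconv build that knows the name ISO-8859-15 (glibc and GNU libiconv both list it under `iconv -l`).

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Sample UTF-8 input: "cafe" with e-acute, a space, and a euro sign.
# Octal escapes keep the script independent of its own encoding.
printf 'caf\303\251 \342\202\254\n' > "$workdir/utf8.txt"

# UTF-8 -> ISO-8859-15
iconv -f UTF-8 -t ISO-8859-15 "$workdir/utf8.txt" > "$workdir/latin9.txt"

# ISO-8859-15 -> UTF-8
iconv -f ISO-8859-15 -t UTF-8 "$workdir/latin9.txt" > "$workdir/roundtrip.txt"

# The round trip should reproduce the original bytes exactly.
if cmp -s "$workdir/utf8.txt" "$workdir/roundtrip.txt"; then
    echo "round trip OK"
fi
```

Going from UTF-8 to ISO-8859-15 only works for characters that exist in the target set; when iconv meets one that does not, it reports an error and exits with a non-zero status.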
The following command automatically detects the character encoding of every matching text file and converts each one to UTF-8:
$ find . -type f -iname '*.txt' -exec sh -c 'iconv -f "$(file -bi "$1" | sed -e "s/.*[ ]charset=//")" -t utf-8 -o converted "$1" && mv converted "$1"' -- {} \;
To perform these steps, a subshell sh is used with -exec, running a one-liner with the -c flag, and passing the filename as the positional argument "$1" with -- {}. In between, the UTF-8 output file is temporarily named converted.
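The steps above can also be sketched as a small shell function, which is easier to read than the one-liner. This is only an illustration: the function name to_utf8 and the demo file are hypothetical, it uses mktemp for the temporary file instead of a fixed name, and it assumes a `file` implementation where -i prints the MIME type and charset (as on most Linux systems).

```shell
# Hypothetical helper: convert one file to UTF-8 in place.
to_utf8() {
    src=$1
    # Same detection as the one-liner: keep only the charset from file's MIME output.
    charset=$(file -bi "$src" | sed -e 's/.*[ ]charset=//')
    case $charset in
        utf-8|us-ascii|binary) return 0 ;;   # already UTF-8 compatible, or not text
    esac
    tmp=$(mktemp) || return 1
    if iconv -f "$charset" -t UTF-8 "$src" > "$tmp"; then
        cat "$tmp" > "$src"                  # overwrite contents, keep the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"                         # conversion failed, leave the original untouched
        return 1
    fi
}

# Demo on a throwaway ISO-8859-1 file containing "cafe" with e-acute (byte 0xE9).
demo_dir=$(mktemp -d)
printf 'caf\351\n' > "$demo_dir/latin1.txt"
to_utf8 "$demo_dir/latin1.txt"
```

Skipping files reported as binary, and leaving the original untouched when iconv fails, guards against damaging files whose encoding was guessed wrongly.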
Whereby file -bi means:
-b, --brief
Do not prepend filenames to output lines (brief mode).
-i, --mime
Causes the file command to output mime type strings rather than the more traditional human readable ones. Thus it may say for example text/plain; charset=us-ascii rather than ASCII text.
The sed command cuts this to only us-ascii, as is required by iconv.
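To see the detection step on its own, something like the following can be run. The sample file is a throwaway created just for the demonstration, and the behaviour of -i is assumed to be that of the Linux `file` command.

```shell
# Create a small plain-ASCII sample file.
sample=$(mktemp)
printf 'plain ascii text\n' > "$sample"

# Full MIME string as printed by file, e.g. "text/plain; charset=us-ascii".
mime=$(file -bi "$sample")

# Strip everything up to and including " charset=", leaving only the encoding name.
charset=$(printf '%s\n' "$mime" | sed -e 's/.*[ ]charset=//')

printf 'mime:    %s\ncharset: %s\n' "$mime" "$charset"
rm -f "$sample"
```

The value left in charset is what the one-liner passes to iconv -f.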
The find command is very useful for such file management automation.
On Windows I was able to use Notepad++ to do the conversion from ISO-8859-1 to UTF-8. Click "Encoding" and then "Convert to UTF-8".
If you have vim you can use the command below. It has not been tested with every encoding.
The cool part about this is that you don't have to know the source encoding:
vim +"set nobomb | set fenc=utf8 | x" filename.txt
Be aware that this command modifies the file in place.
+ : Used by vim to directly enter a command when opening a file. Usually used to open a file at a specific line: vim +14 file.txt
| : Separator of multiple commands (like ; in bash)
set nobomb : no UTF-8 BOM
set fenc=utf8 : Set the new encoding to UTF-8
x : Save and close the file
filename.txt : path to the file
" : quotes are here because of the pipes (otherwise bash would treat them as shell pipes)
In powershell:
function Recode($InCharset, $InFile, $OutCharset, $OutFile) {
    # Read input file in the source encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($InCharset)
    $Text = [System.IO.File]::ReadAllText($InFile, $Encoding)
    # Write output file in the destination encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($OutCharset)
    [System.IO.File]::WriteAllText($OutFile, $Text, $Encoding)
}
Recode Windows-1252 "$pwd\in.txt" utf8 "$pwd\out.txt"
For a list of supported encoding names:
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding
DOS/Windows: use Code page
chcp 65001>NUL
type ascii.txt > unicode.txt
The chcp command can be used to change the code page. Code page 65001 is Microsoft's name for UTF-8. After the code page is set, the output generated by the following commands will use that code page.