I'm a Java developer and I'm using Ubuntu to develop. The project was created in Windows with Eclipse and it's using the Windows-1252 encoding.
To convert to UTF-8
Go back to Windows, tell Eclipse to change the encoding to UTF-8, then go back to Unix and run d2u on the files.
Actually, vim does allow what you're looking for. Enter vim, and type the following commands:
:args **/*.java
:argdo set ff=unix | update | next
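The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn:
set ff=unix sets the file format, and with it the line endings, to Unix style
update writes the file out, but only if it has been changed
next moves on to the next file in the argument list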
To overcome the error
Ambiguous output in step `CR-LF..data'
a simple solution might be to add the -f flag to force the conversion.
There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
Did you try the Python script by Bryan Maupin found here? (I've modified it a little to be more generic.)
#!/usr/bin/env python3
"""Convert a cp1252 (or latin1) text file to UTF-8."""
import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

# Open both files in binary mode so decoding and encoding are explicit.
with open(input_file_name, 'rb') as input_file, \
        open(output_file_name, 'wb') as output_file:
    for line_number, input_line in enumerate(input_file, start=1):
        try:
            # First try to decode the line using cp1252 (Windows, Western Europe).
            output_line = input_line.decode('cp1252').encode('utf8')
        except UnicodeDecodeError as error:
            # Report the failure on stderr, then fall back to latin1
            # (ISO 8859-1), which maps every byte and so cannot fail.
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))
            output_line = input_line.decode('latin1').encode('utf8')
        output_file.write(output_line)
You can use that script with
$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql
The tr command can also do this:
tr -d '\15\32' < winfile.txt > unixfile.txt
and should be available to you.
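For anyone who wants to see what that tr invocation does byte by byte, here is a minimal Python sketch. The function name strip_dos_bytes is made up for illustration. The octal escapes \15 and \32 are the carriage return (0x0D) and the DOS end-of-file marker Ctrl-Z (0x1A).

```python
def strip_dos_bytes(data: bytes) -> bytes:
    """Delete every CR (0x0D) and Ctrl-Z (0x1A) byte, like tr -d '\\15\\32'."""
    # bytes.translate(None, delete) drops the listed bytes and keeps the rest.
    return data.translate(None, b"\r\x1a")


if __name__ == "__main__":
    sample = b"class A {}\r\n// end\r\n\x1a"
    print(strip_dos_bytes(sample))
```

Like tr, this removes every CR, including a lone one that is not followed by a line feed, so it is only appropriate when the files contain no carriage returns you want to keep.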
You'll need to run tr from within a script, since it only reads standard input and cannot take file names as arguments. For example, create a file myscript.sh:
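#!/bin/bash
# Strip CR (\15) and Ctrl-Z (\32) from every .java file, then recode it to UTF-8.
# -print0 with read -d '' keeps file names that contain spaces intact.
find . -iname '*.java' -print0 | while IFS= read -r -d '' f; do
    echo "$f"
    tr -d '\15\32' < "$f" > "$f.tr"
    mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"
done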
Running myscript.sh would process all the Java files in the current directory and its subdirectories.
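If recode is not installed, the same two steps can probably be done in one pass from Python. This is only a sketch under a few assumptions: the files really are cp1252, the function name convert_tree is invented for this example, and bytes that cp1252 leaves undefined are replaced rather than treated as fatal.

```python
from pathlib import Path


def convert_tree(root: str) -> list:
    """Rewrite every *.java file under root as UTF-8 with LF line endings.

    Assumes the files are currently cp1252-encoded; returns the paths rewritten.
    """
    converted = []
    for path in sorted(Path(root).rglob("*")):
        # Match the extension case-insensitively, like find -iname.
        if not path.is_file() or path.suffix.lower() != ".java":
            continue
        raw = path.read_bytes()
        # Drop CR and Ctrl-Z bytes, the same bytes that tr -d deletes above.
        cleaned = raw.translate(None, b"\r\x1a")
        # errors="replace" is an assumption: undefined cp1252 bytes become U+FFFD.
        text = cleaned.decode("cp1252", errors="replace")
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted


if __name__ == "__main__":
    import sys
    for p in convert_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(p)
```

Run it once only, ideally on a copy or a clean checkout: a second run would read the new UTF-8 bytes as cp1252 and garble every non-ASCII character.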