How can I normalize the EOL character in Java?

若如初见, submitted on 2019-12-03 01:15:15
lalli

Combining the two answers (by Visage & eumiro):

EDIT: After reading the comment, I see that System.getProperty("line.separator") is of no use here.
Before sending the file to the server, open it, replace all the EOLs, and write it back.
Make sure to use DataStreams to do so, and write in binary.

String fileString;
//..
//read from the file
//..
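//for windows: replace CRLF first, then any remaining lone CR (old Mac style)
fileString = fileString.replaceAll("\\r\\n", "\n");
fileString = fileString.replaceAll("\\r", "\n");
//..
//write to file in binary mode, using an explicit charset
//(UTF-8 assumed; needs import java.nio.charset.StandardCharsets)
//and try-with-resources so the stream is always closed:
try (DataOutputStream os = new DataOutputStream(new FileOutputStream("fname.txt"))) {
    os.write(fileString.getBytes(StandardCharsets.UTF_8));
}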
//..
//send file
//..

The replaceAll method takes two arguments: the first is the pattern to replace and the second is the replacement. The first argument is treated as a regular expression, so backslashes are interpreted twice:

the Java compiler turns the source literal "\\r\\n" into the four characters \r\n
the regex engine then interprets \r\n as CR+LF

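A small sketch to make the two levels of escaping concrete (the class and method names are hypothetical). The source literal "\\r\\n" is four characters long, while "\r\n" is two, yet both match the same CR+LF pair when used as a regex:

```java
class RegexEscapeDemo {

    // "\\r\\n" in source is the 4-character string \r\n; the regex engine
    // interprets the escapes \r and \n as CR and LF.
    static String viaRegexEscapes(String s) {
        return s.replaceAll("\\r\\n", "\n");
    }

    // "\r\n" in source is already the 2-character string CR LF; the regex
    // engine matches those two characters literally.
    static String viaLiteralChars(String s) {
        return s.replaceAll("\r\n", "\n");
    }

    public static void main(String[] args) {
        System.out.println("\\r\\n".length()); // 4: backslash, r, backslash, n
        System.out.println("\r\n".length());   // 2: CR, LF
        String input = "one\r\ntwo\r\n";
        System.out.println(viaRegexEscapes(input).equals(viaLiteralChars(input))); // true
    }
}
```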
Could you try this?

content.replaceAll("\\r\\n?", "\n")
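The pattern \r\n? matches a CR optionally followed by an LF, so CRLF and lone CR are both handled in a single pass, while lone LFs are left alone. A minimal sketch (the class name is hypothetical) that also precompiles the pattern: String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the regex on every call.

```java
import java.util.regex.Pattern;

class SinglePassNormalizer {

    // CR followed by an optional LF; the greedy ? makes a CRLF match as one unit.
    private static final Pattern CR_OR_CRLF = Pattern.compile("\\r\\n?");

    static String normalize(String content) {
        return CR_OR_CRLF.matcher(content).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(normalize("a\r\nb\rc\nd").equals("a\nb\nc\nd")); // true
    }
}
```

The file content itself would be read first, e.g. with Files.readAllBytes and an explicit charset.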

Had to do this for a recent project. The method below normalizes the line endings in the given file to the line ending of the OS the JVM is running on. So if your JVM is running on Linux, this will normalize all line endings to LF (\n).

It also works on very large files, because it reads and writes line by line through buffered streams instead of loading the whole file into memory.

public static void normalizeFile(File f) {      
    File temp = null;
    BufferedReader bufferIn = null;
    BufferedWriter bufferOut = null;        

    try {           
        if(f.exists()) {
            // Create a new temp file to write to
            temp = new File(f.getAbsolutePath() + ".normalized");
            temp.createNewFile();

            // Get a stream to read from the un-normalized file
            FileInputStream fileIn = new FileInputStream(f);
            DataInputStream dataIn = new DataInputStream(fileIn);
            bufferIn = new BufferedReader(new InputStreamReader(dataIn));

            // Get a stream to write to the normalized file
            FileOutputStream fileOut = new FileOutputStream(temp);
            DataOutputStream dataOut = new DataOutputStream(fileOut);
            bufferOut = new BufferedWriter(new OutputStreamWriter(dataOut));

            // For each line in the un-normalized file
            String line;
            while ((line = bufferIn.readLine()) != null) {
                // Write the original line plus the operating-system dependent newline
                bufferOut.write(line);
                bufferOut.newLine();                                
            }

            bufferIn.close();
            bufferOut.close();

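            // Replace the original file with the normalized one. Unlike
            // delete() followed by renameTo(), Files.move throws on failure,
            // so the original is not lost if the rename does not succeed.
            java.nio.file.Files.move(temp.toPath(), f.toPath(),
                    java.nio.file.StandardCopyOption.REPLACE_EXISTING);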
        } else {
            // If the file doesn't exist...
            log.warn("Could not find file to open: " + f.getAbsolutePath());
        }
    } catch (Exception e) {
        log.warn(e.getMessage(), e);
    } finally {
        // Clean up: temp only still exists here if something went wrong
        FileUtils.deleteQuietly(temp);
        IOUtils.closeQuietly(bufferIn);
        IOUtils.closeQuietly(bufferOut);
    }
}
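On Java 7 and later, the same line-by-line approach can be written with java.nio.file. This is a minimal sketch, not the poster's code: NioNormalizer is a hypothetical name, and the caller is assumed to know the file's charset (e.g. UTF-8), whereas InputStreamReader/OutputStreamWriter without a charset argument use the platform default. BufferedReader.readLine() accepts \n, \r and \r\n as line terminators, so all three get normalized.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class NioNormalizer {

    // Rewrites 'file' so that every line ends with 'eol' (assumed charset: 'charset').
    static void normalizeFile(Path file, Charset charset, String eol) throws IOException {
        // Temp file in the same directory, so the final move stays on one file system
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "eol", ".tmp");
        try {
            try (BufferedReader in = Files.newBufferedReader(file, charset);
                 BufferedWriter out = Files.newBufferedWriter(temp, charset)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line);
                    out.write(eol);
                }
            }
            // Replace the original only after the temp file is completely written
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; removes the leftover temp file on failure
            Files.deleteIfExists(temp);
        }
    }
}
```

For example, normalizeFile(Paths.get("data.txt"), StandardCharsets.UTF_8, System.lineSeparator()) mirrors the method above (the file name is hypothetical). Like the original, it appends eol after the last line even if the file did not end with one.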

Use

System.getProperty("line.separator")

That will give you the (local) EOL character(s). You can then analyse the incoming file to determine which 'flavour' it uses and convert accordingly.

Alternatively, get your clients to standardise!
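The analysis of the incoming file mentioned above could look like this minimal sketch (EolSniffer and its Flavour enum are hypothetical names). It counts each kind of line ending in the text and reports the single flavour used, MIXED when several occur, or NONE when there are none:

```java
class EolSniffer {

    enum Flavour { LF, CRLF, CR, MIXED, NONE }

    static Flavour detect(CharSequence text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        if (kinds == 0) return Flavour.NONE;
        if (kinds > 1) return Flavour.MIXED;
        return crlf > 0 ? Flavour.CRLF : (lf > 0 ? Flavour.LF : Flavour.CR);
    }

    public static void main(String[] args) {
        // Compare the local separator's flavour with that of some incoming text
        String local = System.getProperty("line.separator");
        System.out.println("local: " + detect(local) + ", incoming: " + detect("x\r\ny\r\n"));
    }
}
```

If the incoming flavour differs from the local one (or is MIXED), the text needs converting.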

Here is a comprehensive helper class to deal with EOL issues. It is partially based on the solution posted by tyjen.

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

/**
 * Helper class to deal with end-of-line markers in text files.
 * 
 * Loosely based on these examples:
 *  - http://stackoverflow.com/a/9456947/1084488 (cc by-sa 3.0)
 *  - http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/tomcat/buildutil/CheckEol.java (Apache License v2.0)
 * 
 * This file is posted here to meet the "ShareAlike" requirement of cc by-sa 3.0:
 *    http://stackoverflow.com/a/27930311/1084488
 * 
 * @author Matthias Stevens
 */
public class EOLUtils
{

    /**
     * Unix-style end-of-line marker (LF)
     */
    private static final String EOL_UNIX = "\n";

    /**
     * Windows-style end-of-line marker (CRLF)
     */
    private static final String EOL_WINDOWS = "\r\n";

    /**
     * "Old Mac"-style end-of-line marker (CR)
     */
    private static final String EOL_OLD_MAC = "\r";

    /**
     * Default end-of-line marker on current system
     */
    private static final String EOL_SYSTEM_DEFAULT = System.getProperty( "line.separator" );

    /**
     * The supported end-of-line marker modes
     */
    public static enum Mode
    {
        /**
         * Unix-style end-of-line marker ("\n")
         */
        LF,

        /**
         * Windows-style end-of-line marker ("\r\n") 
         */
        CRLF,

        /**
         * "Old Mac"-style end-of-line marker ("\r")
         */
        CR
    }

    /**
     * The default end-of-line marker mode for the current system
     */
    public static final Mode SYSTEM_DEFAULT = ( EOL_SYSTEM_DEFAULT.equals( EOL_UNIX ) ? Mode.LF
        : ( EOL_SYSTEM_DEFAULT.equals( EOL_WINDOWS ) ? Mode.CRLF
        : ( EOL_SYSTEM_DEFAULT.equals( EOL_OLD_MAC ) ? Mode.CR : null ) ) );

    static
    {
        // Just in case...
        if ( SYSTEM_DEFAULT == null )
        {
            throw new IllegalStateException( "Could not determine system default end-of-line marker" );
        }
    }

    /**
     * Determines the end-of-line {@link Mode} of a text file.
     * 
     * @param textFile the file to investigate
     * @return the end-of-line {@link Mode} of the first line ending found in the given file
     * @throws Exception if the file cannot be read or contains no end-of-line marker
     */
    public static Mode determineEOL( File textFile )
        throws Exception
    {
        if ( !textFile.exists() )
        {
            throw new IOException( "Could not find file to open: " + textFile.getAbsolutePath() );
        }

        FileInputStream fileIn = new FileInputStream( textFile );
        BufferedInputStream bufferIn = new BufferedInputStream( fileIn );
        try
        {
            int prev = -1;
            int ch;
            while ( ( ch = bufferIn.read() ) != -1 )
            {
                if ( ch == '\n' )
                {
                    if ( prev == '\r' )
                    {
                        return Mode.CRLF;
                    }
                    else
                    {
                        return Mode.LF;
                    }
                }
                else if ( prev == '\r' )
                {
                    return Mode.CR;
                }
                prev = ch;
            }
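            // A CR as the very last character is an old-Mac line ending too
            if ( prev == '\r' )
            {
                return Mode.CR;
            }
            throw new Exception( "Could not determine end-of-line marker mode" );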
        }
        catch ( IOException ioe )
        {
            throw new Exception( "Could not determine end-of-line marker mode", ioe );
        }
        finally
        {
            // Clean up:
            IOUtils.closeQuietly( bufferIn );
        }
    }

    /**
     * Checks whether the given text file has Windows-style (CRLF) line endings.
     * 
     * @param textFile the file to investigate
     * @return {@code true} if the first line ending in the file is CRLF
     * @throws Exception
     */
    public static boolean hasWindowsEOL( File textFile )
        throws Exception
    {
        return Mode.CRLF.equals( determineEOL( textFile ) );
    }

    /**
     * Checks whether the given text file has Unix-style (LF) line endings.
     * 
     * @param textFile the file to investigate
     * @return {@code true} if the first line ending in the file is LF
     * @throws Exception
     */
    public static boolean hasUnixEOL( File textFile )
        throws Exception
    {
        return Mode.LF.equals( determineEOL( textFile ) );
    }

    /**
     * Checks whether the given text file has "Old Mac"-style (CR) line endings.
     * 
     * @param textFile the file to investigate
     * @return {@code true} if the first line ending in the file is CR
     * @throws Exception
     */
    public static boolean hasOldMacEOL( File textFile )
        throws Exception
    {
        return Mode.CR.equals( determineEOL( textFile ) );
    }

    /**
     * Checks whether the given text file has line endings that conform to the system default mode (e.g. LF on Unix).
     * 
     * @param textFile the file to investigate
     * @return {@code true} if the first line ending in the file matches {@link #SYSTEM_DEFAULT}
     * @throws Exception
     */
    public static boolean hasSystemDefaultEOL( File textFile )
        throws Exception
    {
        return SYSTEM_DEFAULT.equals( determineEOL( textFile ) );
    }

    /**
     * Convert the line endings in the given file to Unix-style (LF).
     * 
     * @param textFile the file to process
     * @throws IOException
     */
    public static void convertToUnixEOL( File textFile )
        throws IOException
    {
        convertLineEndings( textFile, EOL_UNIX );
    }

    /**
     * Convert the line endings in the given file to Windows-style (CRLF).
     * 
     * @param textFile the file to process
     * @throws IOException
     */
    public static void convertToWindowsEOL( File textFile )
        throws IOException
    {
        convertLineEndings( textFile, EOL_WINDOWS );
    }

    /**
     * Convert the line endings in the given file to "Old Mac"-style (CR).
     * 
     * @param textFile the file to process
     * @throws IOException
     */
    public static void convertToOldMacEOL( File textFile )
        throws IOException
    {
        convertLineEndings( textFile, EOL_OLD_MAC );
    }

    /**
     * Convert the line endings in the given file to the system default mode.
     * 
     * @param textFile the file to process
     * @throws IOException
     */
    public static void convertToSystemEOL( File textFile )
        throws IOException
    {
        convertLineEndings( textFile, EOL_SYSTEM_DEFAULT );
    }

    /**
     * Line endings conversion method.
     * 
     * @param textFile the file to process
     * @param eol the end-of-line marker to use (as a {@link String})
     * @throws IOException 
     */
    private static void convertLineEndings( File textFile, String eol )
        throws IOException
    {
        File temp = null;
        BufferedReader bufferIn = null;
        BufferedWriter bufferOut = null;

        try
        {
            if ( textFile.exists() )
            {
                // Create a new temp file to write to
                temp = new File( textFile.getAbsolutePath() + ".normalized" );
                temp.createNewFile();

                // Get a stream to read from the un-normalized file
                FileInputStream fileIn = new FileInputStream( textFile );
                DataInputStream dataIn = new DataInputStream( fileIn );
                bufferIn = new BufferedReader( new InputStreamReader( dataIn ) );

                // Get a stream to write to the normalized file
                FileOutputStream fileOut = new FileOutputStream( temp );
                DataOutputStream dataOut = new DataOutputStream( fileOut );
                bufferOut = new BufferedWriter( new OutputStreamWriter( dataOut ) );

                // For each line in the un-normalized file
                String line;
                while ( ( line = bufferIn.readLine() ) != null )
                {
                    // Write the original line plus the requested end-of-line marker
                    bufferOut.write( line );
                    bufferOut.write( eol ); // write EOL marker
                }

                // Close buffered reader & writer:
                bufferIn.close();
                bufferOut.close();

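                // Replace the original file with the normalized one. Unlike
                // delete() followed by renameTo(), Files.move throws on failure,
                // so the original is not lost if the rename does not succeed.
                java.nio.file.Files.move( temp.toPath(), textFile.toPath(),
                    java.nio.file.StandardCopyOption.REPLACE_EXISTING );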
            }
            else
            {
                // If the file doesn't exist...
                throw new IOException( "Could not find file to open: " + textFile.getAbsolutePath() );
            }
        }
        finally
        {
            // Clean up: temp only still exists here if something went wrong
            FileUtils.deleteQuietly( temp );
            IOUtils.closeQuietly( bufferIn );
            IOUtils.closeQuietly( bufferOut );
        }
    }

}
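EOLUtils operates on files. For text already in memory, a similar conversion can be sketched on a String (EolStrings and convert are hypothetical names, not part of EOLUtils): first collapse every line ending to LF, then expand each LF to the target marker. Unlike convertLineEndings, it does not add a marker after the last line when the text does not end with one.

```java
class EolStrings {

    // Converts every CRLF, CR or LF in 'text' to 'eol' (e.g. "\n", "\r\n" or "\r").
    static String convert(String text, String eol) {
        // Step 1: collapse all endings to LF (CRLF first, so it does not become two LFs)
        String lfOnly = text.replace("\r\n", "\n").replace("\r", "\n");
        // Step 2: expand LF to the requested marker
        return eol.equals("\n") ? lfOnly : lfOnly.replace("\n", eol);
    }
}
```

For example, convert(text, System.lineSeparator()) plays the role of convertToSystemEOL for a String.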

public static String normalize(String val) {
    return val.replace("\r\n", "\n")
            .replace("\r", "\n");
}
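The order of the two replace calls matters. String.replace works on literal text (no regex), and if lone CRs were converted first, the CR of each CRLF would become an LF, leaving two LFs. A small sketch contrasting both orders (the class and method names are hypothetical):

```java
class ReplaceOrderDemo {

    // Correct order: collapse CRLF first, then convert the remaining lone CRs.
    static String crlfFirst(String s) {
        return s.replace("\r\n", "\n").replace("\r", "\n");
    }

    // Wrong order: "\r\n" becomes "\n\n" before the CRLF replacement can see it.
    static String crFirst(String s) {
        return s.replace("\r", "\n").replace("\r\n", "\n");
    }
}
```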

For HTML:

public static String normalize(String val) {
    return val.replace("\r\n", "<br/>")
            .replace("\n", "<br/>")
            .replace("\r", "<br/>");
}
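If the text comes from users, it probably also needs HTML escaping before line breaks become <br/>; otherwise characters such as < and & would be interpreted as markup. A minimal sketch that does both in one pass, so the inserted tags are never escaped themselves (HtmlLineBreaks is a hypothetical name, and escaping only &, <, > and " is an assumption about what the target context needs):

```java
class HtmlLineBreaks {

    static String toHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\r':
                    // Treat CRLF as a single line break
                    if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                        i++;
                    }
                    sb.append("<br/>");
                    break;
                case '\n': sb.append("<br/>"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```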

A solution that fixes the line endings of every file found by recursively searching a path:

package handleFileLineEnd;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class handleFileEndingMain {

    static int carriageReturnTotal;
    static int newLineTotal;

    public static void main(String[] args)  throws IOException
    {       
        processPath("c:/temp/directories");

        System.out.println("carriageReturnTotal  (files have issue): " + carriageReturnTotal);

        System.out.println("newLineTotal: " + newLineTotal);
    }

    private static void processPath(String path) throws IOException
    {
        File dir = new File(path);
        File[] directoryListing = dir.listFiles();

        if (directoryListing != null) {
            for (File child : directoryListing) {
                if (child.isDirectory())                
                    processPath(child.toString());              
                else
                    checkFile(child.toString());
            }
        } 


    }

    private static void checkFile(String fileName) throws IOException
    {
        Path path = FileSystems.getDefault().getPath(fileName);

        byte[] bytes= Files.readAllBytes(path);

        for (int counter=0; counter<bytes.length; counter++)
        {
            if (bytes[counter] == 13)
            {
                carriageReturnTotal = carriageReturnTotal + 1;

                System.out.println(fileName);
                modifyFile(fileName);
                break;
            }
            if (bytes[counter] == 10)
            {
                newLineTotal = newLineTotal+ 1;
                //System.out.println(fileName);
                break;
            }
        }

    }

    private static void modifyFile(String fileName) throws IOException
    {

        Path path = Paths.get(fileName);
        Charset charset = StandardCharsets.UTF_8;

        String content = new String(Files.readAllBytes(path), charset);
        content = content.replaceAll("\r\n", "\n");
        content = content.replaceAll("\r", "\n");
        Files.write(path, content.getBytes(charset));
    }
}
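On Java 8 and later, the recursion can also be delegated to Files.walk. A minimal sketch (RecursiveEolFixer and fixTree are hypothetical names) that differs from the answer above in a few deliberate ways: it only touches files whose names end with a given suffix, so binary files are skipped; it checks the whole file rather than just the first line ending; and it assumes every matching file is UTF-8 text. Invalid UTF-8 is replaced with U+FFFD when decoded, so that assumption matters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecursiveEolFixer {

    // Normalizes CRLF and lone CR to LF in every regular file under 'root'
    // whose name ends with 'suffix'; returns how many files were rewritten.
    static int fixTree(Path root, String suffix) throws IOException {
        List<Path> files;
        // Collect first, so the directory stream is closed before files are rewritten
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(suffix))
                        .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path p : files) {
            String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            String fixed = content.replace("\r\n", "\n").replace("\r", "\n");
            if (!fixed.equals(content)) { // only rewrite files that contained CRs
                Files.write(p, fixed.getBytes(StandardCharsets.UTF_8));
                changed++;
            }
        }
        return changed;
    }
}
```

For example, fixTree(Paths.get("c:/temp/directories"), ".java") would rewrite the .java files under the directory used in the answer above and return how many were changed.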

Although String.replaceAll() is simpler to code, this should perform better since it doesn't go through the regex infrastructure.

/**
 * Accepts a non-null string and returns the string with all end-of-lines
 * normalized to \n. This means \r\n and \r will both be normalized to \n.
 * <p>
 *     Impl notes: although a regex would have been easier to code, this approach
 *     should be more efficient because it is purpose-built for this use case. A new
 *     StringBuilder is only created, and appended to, once an end-of-line that needs
 *     normalizing is found. If the string contains no such end-of-lines, the input
 *     value is returned as-is.
 * </p>
 *
 * @param inputValue !null, input value that may or may not contain new lines
 * @return the input value with its new lines normalized
 */
static String normalizeNewLines(String inputValue){
    StringBuilder stringBuilder = null;
    int index = 0;
    int len = inputValue.length();
    while (index < len){
        char c = inputValue.charAt(index);
        if (c == '\r'){
            if (stringBuilder == null){
                stringBuilder = new StringBuilder();
                // build up the string builder so it contains all the prior characters
                stringBuilder.append(inputValue.substring(0, index));
            }
            if ((index + 1 < len) &&
                inputValue.charAt(index + 1) == '\n'){
                // this means we encountered a \r\n  ... move index forward one more character
                index++;
            }
            stringBuilder.append('\n');
        }else{
            if (stringBuilder != null){
                stringBuilder.append(c);
            }
        }
        index++;
    }
    return stringBuilder == null ? inputValue : stringBuilder.toString();
}
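The same character-by-character idea also works on a stream, which avoids holding a large file in memory. A minimal sketch, not part of the answer above (StreamingEolNormalizer is a hypothetical name): instead of looking ahead for the LF after a CR, it remembers whether the previous character was a CR and drops the LF of a CRLF pair.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class StreamingEolNormalizer {

    // Copies 'in' to 'out', turning CRLF and lone CR into LF.
    // Does not close either stream; callers should pass buffered ones.
    static void normalize(Reader in, Writer out) throws IOException {
        boolean afterCr = false; // true if the previous char was a CR
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                out.write('\n');
                afterCr = true;
            } else {
                // Skip the LF of a CRLF pair: its CR already produced the LF
                if (!(c == '\n' && afterCr)) {
                    out.write(c);
                }
                afterCr = false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        normalize(new StringReader("one\r\ntwo\rthree\n"), out);
        System.out.println(out.toString().equals("one\ntwo\nthree\n")); // true
    }
}
```

For files, wrap the streams in a BufferedReader/BufferedWriter (e.g. Files.newBufferedReader and Files.newBufferedWriter with an explicit charset) before passing them in, since reading one char at a time from an unbuffered reader is slow.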