How to handle file with different line separator in java?

邮差的信 submitted on 2019-12-14 02:27:29

Question


I have a huge file (more than 3 GB) that contains a single long line in the following format: "1243@818@9287@543"

The data I want to analyze is separated by "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I'm trying the following code, which calls System.setProperty("line.separator", "@"), but it is not working: it prints the complete line, whereas for this test I'd like the output to be:

1243
818
9287
543

How can I change the default line separator to "@"?

package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        System.setProperty("line.separator", "@");

        File testFile = new File("./Mypath/myfile");
        BufferedReader br = new BufferedReader(new FileReader(testFile));
        for(String line; (line = br.readLine()) != null; ) {
            // Process each line.
            System.out.println(line); 
        }
    }

}

Thanks in advance for any help.


Answer 1:


The data I want to analyze is separated by "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I wouldn't do that, as it might break who knows what else that depends on line.separator.

As for why this doesn't work, I'm sorry to say this is a case of RTFM. This is what the Javadoc for BufferedReader.readLine() says:

public String readLine()
                throws IOException
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
Throws: IOException - If an I/O error occurs

The API docs for the readLine() method clearly say that it looks for '\n', '\r' or "\r\n". They do not say it depends on line.separator.

The line.separator property exists only for APIs that need a portable, platform-independent way to find out what the platform's line separator is. That is all. This system property is not for controlling the internal mechanisms of Java's I/O classes.
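A quick way to convince yourself of both points is a small standalone check like the one below. The class name ReadLineDemo and the helper readLines are made up for this illustration, and a StringReader stands in for the real file. It sets line.separator to "@" as in the question and shows that readLine() still splits only on '\n', '\r' and "\r\n", and that System.lineSeparator() (which, per its Javadoc, always returns the initial value of the property) is unaffected:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects every line that BufferedReader.readLine() returns for the given text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String initial = System.lineSeparator();
        System.setProperty("line.separator", "@");

        // readLine() still ends lines only at '\n', '\r' or "\r\n"; '@' is ordinary data.
        System.out.println(readLines("1243@818@9287@543"));      // [1243@818@9287@543]
        System.out.println(readLines("1243\n818\r9287\r\n543")); // [1243, 818, 9287, 543]

        // System.lineSeparator() keeps returning the value the JVM started with.
        System.out.println(System.lineSeparator().equals(initial)); // true
    }
}
```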

I think you are over-complicating things. Just do it the old-fashioned way: read n characters at a time (say, 1024 KB) into a buffer and scan it for each '@' delimiter. That does introduce complications, though, such as the common case where the data between two '@' delimiters gets split across buffers.
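For comparison, here is a rough sketch of that buffer-scanning approach, mainly to show how the split-value complication can be handled: whatever follows the last '@' in a chunk is carried over and completed by the next chunk. ChunkedSplitter, split and chunkSize are hypothetical names, not an existing API. Each value is handed to a Consumer instead of being collected, so a 3 GB input never has to sit in memory at once:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class ChunkedSplitter {
    // Reads chunkSize chars at a time and passes each '@'-separated value to sink.
    static void split(Reader in, int chunkSize, Consumer<String> sink) throws IOException {
        char[] buf = new char[chunkSize];
        StringBuilder carry = new StringBuilder(); // partial value left over from earlier chunks
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            int start = 0;
            for (int i = 0; i < n; i++) {
                if (buf[i] == '@') {
                    carry.append(buf, start, i - start); // complete the current value
                    sink.accept(carry.toString());
                    carry.setLength(0);
                    start = i + 1;
                }
            }
            // Text after the last '@' in this chunk may continue in the next chunk.
            carry.append(buf, start, n - start);
        }
        if (carry.length() > 0) {
            sink.accept(carry.toString()); // the final value has no trailing '@'
        }
    }

    public static void main(String[] args) throws IOException {
        // A deliberately tiny chunk size makes values straddle chunk boundaries.
        split(new StringReader("1243@818@9287@543"), 3, System.out::println);
    }
}
```

For the real file you would pass a FileReader (or a BufferedReader around one) and a realistic chunk size such as 1024 * 1024 instead of the StringReader and 3.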

So I would suggest just reading one character at a time from the BufferedReader. This is not that bad and does not typically hit I/O excessively, since the BufferedReader does... tada... buffering for you.

Append each character to a StringBuilder, and every time you find an '@' delimiter, flush the contents of the StringBuilder to standard output or wherever you need it (since that represents one datum from your '@'-delimited file).

Get the algorithm to work correctly first; optimize later. Below is a sketch of the idea in Java; treat it as a starting point rather than tested code:

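import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        File testFile = new File("./Mypath/myfile");
        int bufferSize = 1024 * 1024; // buffer about a million chars at a time

        try (BufferedReader br = new BufferedReader(new FileReader(testFile), bufferSize)) {
            StringBuilder bld = new StringBuilder();
            int c;
            // read() returns a single char, or -1 at the end of the stream
            while ((c = br.read()) != -1) {
                char z = (char) c;
                if (z == '@') {
                    System.out.println(bld); // one complete datum
                    bld.setLength(0);        // clear the builder for the next datum
                } else if (z != '\n' && z != '\r') { // ignore a trailing line break, if any
                    bld.append(z);
                }
            }
            // The last datum is not followed by '@', so flush it here.
            if (bld.length() > 0) {
                System.out.println(bld);
            }
        }
    }
}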



Answer 2:


read() char by char and append() each char to a StringBuilder until you get '@'.
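One way to package that idea is a helper that behaves like readLine() but stops at a delimiter of your choosing, so the loop from the question barely changes. readToken and TokenReader are hypothetical names for this sketch (there is no such method in the JDK), and a StringReader again stands in for the file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TokenReader {
    // Hypothetical helper: like readLine(), but the "line" ends at the given delimiter.
    // Returns null once the stream is exhausted and no characters were read.
    static String readToken(Reader in, char delimiter) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == delimiter) {
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new StringReader("1243@818@9287@543"))) {
            // Same loop shape as the question's readLine() loop.
            for (String token; (token = readToken(br, '@')) != null; ) {
                System.out.println(token);
            }
        }
    }
}
```

Because read() is called on a BufferedReader, each call is served from its internal buffer rather than going to the disk every time.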




Answer 3:


A possible way to do this (with smaller files) is to use the Scanner class:

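import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        final File file = new File("test.txt");
        try (final Scanner scan = new Scanner(file)) {
            scan.useDelimiter("@"); // tokens are whatever lies between '@' characters
            while (scan.hasNext()) {
                System.out.println(scan.next());
            }
        }
    }
}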

test.txt:

1243@818@9287@543

Output:

1243
818
9287
543

But since your file is very large, you should avoid Scanner and use Jigar's solution with BufferedReader instead. However, if you get the chance to work with smaller files, this might come in handy.




Answer 4:


I'm not sure if this is what you want, but you could read the entire line in as a String and then use the method String.split(String regex), which returns an array of Strings. These Strings will be the numbers between the @ characters. You could then iterate through the array and print each number on its own line, or analyze the data however you want.

For example:

package test;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        File testFile = new File("./Mypath/myfile");

        // Read the single line into memory, then split it on '@'
        try (Scanner fileScanner = new Scanner(testFile)) {
            String myString = fileScanner.nextLine();
            String[] data = myString.split("@");

            // Process data, e.g. print one value per line
            for (String datum : data) {
                System.out.println(datum);
            }
        }
    }
}

If you need to convert the numbers to integers, use Integer.parseInt(String).
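A minimal sketch of that conversion, using a made-up parseFields helper. Keep in mind that this approach holds the whole line in memory: a Java String's length is an int, so it cannot exceed 2^31 - 1 characters, and a single 3 GB line will not fit. It is only practical for smaller inputs:

```java
import java.util.Arrays;

public class ParseDemo {
    // Hypothetical helper: splits an '@'-separated line and parses each field as an int.
    static int[] parseFields(String line) {
        String[] fields = line.split("@");
        int[] numbers = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // trim() guards against stray whitespace such as a trailing line break
            numbers[i] = Integer.parseInt(fields[i].trim());
        }
        return numbers;
    }

    public static void main(String[] args) {
        int[] numbers = parseFields("1243@818@9287@543");
        System.out.println(Arrays.toString(numbers)); // [1243, 818, 9287, 543]
    }
}
```

Integer.parseInt throws a NumberFormatException for a field that is not a valid int (including an empty field produced by "@@"), so real data may need a check before parsing.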

Hope I helped!



Source: https://stackoverflow.com/questions/27049443/how-to-handle-file-with-different-line-separator-in-java
