I am writing a program in Java that requires me to compare the data in 2 files. I have to check each line from file 1 against each line of file 2 and if I find a match write the
Obviously you could just close and reopen the file like this:
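// Note: FileReader here is the question's own wrapper class (it exposes
// a data field and closeFile()), not java.io.FileReader.
while((s1=file1.data.readLine())!=null){
    System.out.println("s1: "+s1);
    FileReader file2=new FileReader("d:\\testfiles\\FILE2.txt");
    while((s2=file2.data.readLine())!=null){
        System.out.println("s2: "+s2);
        //compare s1 and s2;
    }
    file2.closeFile();
}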
But you really don't want to do it that way, since this algorithm's running time is O(n^2). If there were 1,000 lines in file A and 10,000 lines in file B, your inner loop would run 10,000,000 times.
What you should do is read each line and store it in a collection that allows quick checks to see if an item is already contained (probably a HashSet).
If you only need to check to see that every line in file 2 is in file 1, then you just add each line in file one to a HashSet, and then check to see that every line in file 2 is in that set.
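A minimal sketch of that one-directional check, assuming file 1 fits in memory, both files are UTF-8 text, and lines must match exactly (no trimming or case folding). The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the question's code.
class LineSetCheck {

    // Reads every line of a file into a HashSet for fast membership tests.
    static Set<String> readLines(Path file) throws IOException {
        Set<String> lines = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Returns the lines of file2 that do not occur anywhere in file1.
    static List<String> linesMissingFromFirst(Path file1, Path file2) throws IOException {
        Set<String> known = readLines(file1);
        List<String> missing = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file2)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!known.contains(line)) {
                    missing.add(line);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: LineSetCheck <file1> <file2>");
            return;
        }
        List<String> missing = linesMissingFromFirst(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(missing.isEmpty()
                ? "every line of file 2 is in file 1"
                : "lines of file 2 missing from file 1: " + missing);
    }
}
```

Each file is read once, so the work grows with the total number of lines rather than with their product.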
If you need to do a cross comparison where you find every string that's in one but not the other, then you'll need two hash sets, one for each file. (Although there's a trick you could do to use just one)
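The two-set cross comparison could look roughly like this. It is only a sketch: the names are invented, both files are assumed to be UTF-8 and small enough to load completely, and it shows the plain two-set version rather than the single-set trick mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only.
class LineCrossCompare {

    // Loads all lines of a file into a set (duplicates collapse).
    static Set<String> readLines(Path file) throws IOException {
        return new HashSet<>(Files.readAllLines(file));
    }

    // Returns the elements of a that are not in b, leaving both inputs untouched.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

Calling onlyIn(lines1, lines2) and then onlyIn(lines2, lines1) gives the lines unique to each file.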
If the files are so large that you don't have enough memory, then your original O(n^2) method would never have worked anyway.
I believe RandomAccessFile is what you need. It contains RandomAccessFile#seek and RandomAccessFile#getFilePointer. rewind() is seek(0).
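As a rough sketch of the rewind idea, the nested comparison can reuse one open RandomAccessFile and call seek(0) before each inner pass. The class and method names are invented. Note that RandomAccessFile#readLine is unbuffered and maps each byte directly to a char, so this assumes plain ASCII or Latin-1 content, and it is still the slow nested-loop algorithm.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class for illustration only.
class RewindDemo {

    // Counts how many (line of file 1, line of file 2) pairs are equal,
    // rewinding file 2 to its start for every line of file 1.
    static int countMatches(String path1, String path2) throws IOException {
        int matches = 0;
        try (RandomAccessFile file1 = new RandomAccessFile(path1, "r");
             RandomAccessFile file2 = new RandomAccessFile(path2, "r")) {
            String s1;
            while ((s1 = file1.readLine()) != null) {
                file2.seek(0); // rewind: move the file pointer back to the first byte
                String s2;
                while ((s2 = file2.readLine()) != null) {
                    if (s1.equals(s2)) {
                        matches++;
                    }
                }
            }
        }
        return matches;
    }
}
```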
As noted, there are better algorithms; investigate those.
Aside:
FileReader doesn't implement mark and reset, so trashgod's comments are inaccurate. You'd either have to implement a version of this yourself (using RandomAccessFile or the like) or wrap the FileReader in a BufferedReader. However, the latter can end up holding the whole file in memory if you mark the start with a read-ahead limit large enough to cover it.
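You can check this yourself with markSupported(). A small sketch (the class and method names are invented for the example):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical demo class for illustration only.
class MarkSupportDemo {

    // Reports whether a bare FileReader supports mark/reset.
    static boolean fileReaderSupportsMark(String path) throws IOException {
        try (Reader reader = new FileReader(path)) {
            return reader.markSupported();
        }
    }

    // Reports whether the same file wrapped in a BufferedReader supports mark/reset.
    static boolean bufferedReaderSupportsMark(String path) throws IOException {
        try (Reader reader = new BufferedReader(new FileReader(path))) {
            return reader.markSupported();
        }
    }
}
```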
If you know the size of your file in advance, you can use mark(int readAheadLimit) and reset() from the class BufferedReader. The method mark(int readAheadLimit) places a marker at the current position of your BufferedReader, and you can go back to the marker using reset().
When using them, be careful about the number of characters you read before calling reset(): it must not exceed the value you passed as the argument of mark(int readAheadLimit), otherwise reset() may fail.
Assuming a limit of 100 characters, your code should look like:
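import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

class MyFileReader {
    BufferedReader data;
    //read-ahead limit: reset() is only reliable while no more than
    //this many characters have been read since mark()
    int maxNumberOfCharacters = 100;

    public MyFileReader(String fileName)
    {
        try{
            FileInputStream fstream = new FileInputStream(fileName);
            data = new BufferedReader(new InputStreamReader(fstream));
            //mark the current position, in this case the beginning of the file
            data.mark(maxNumberOfCharacters);
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void resetFile(){
        try{
            //go back to the marked position
            data.reset();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void closeFile()
    {
        try{
            //closing the BufferedReader also closes the underlying stream
            data.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}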
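For a file that is longer than 100 characters, the limit has to cover everything you read before reset(). One possible sketch derives the limit from the file length. It assumes the file has no more characters than bytes, which holds for common encodings but is an assumption here, and that the file is small enough to buffer completely. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical demo class, separate from the MyFileReader class above.
class MarkResetDemo {

    // Reads the whole file twice through one BufferedReader and returns
    // the total number of lines seen across both passes.
    static int readTwice(String path) throws IOException {
        File file = new File(path);
        // Assumption: at most one character per byte, so length + 1 is a
        // read-ahead limit that covers reading the file to its end.
        int limit = (int) Math.min(Integer.MAX_VALUE, file.length() + 1);
        int total = 0;
        try (BufferedReader data = new BufferedReader(new FileReader(file))) {
            data.mark(limit); // remember the start of the file
            while (data.readLine() != null) {
                total++;
            }
            data.reset(); // jump back to the start
            while (data.readLine() != null) {
                total++;
            }
        }
        return total;
    }
}
```

The price is that the reader may hold the entire file in its buffer, which is the memory cost the previous answer warns about.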
If you just want to reset the file pointer to the top of the file, reinitialize your BufferedReader. I assume that you are also using a try/catch block to detect the end of the file.
//To read from a file.
BufferedReader read_data_file = new BufferedReader(new FileReader("Datafile.dat"));
Let's say this is how you have your BufferedReader defined. Now, this is how you can check for the end of the file (readLine() returns null there).
boolean has_data = true;
while (has_data)
{
    try
    {
        //readLine() returns null at end of file, so the StringTokenizer
        //constructor below throws NullPointerException once the data runs out.
        record = read_data_file.readLine();
        delimit = new StringTokenizer(record, ",");
        //Reading the input in STRING format.
        cus_ID = delimit.nextToken();
        cus_name = delimit.nextToken();
        //And keep grabbing the data and save it in appropriate fields.
    }
    catch (NullPointerException e)
    {
        System.out.println("\nEnd of Data File... Total " + num_of_records
                + " records were printed. \n \n");
        has_data = false; //To exit the loop.
        /*
        ------> This point is the trouble maker. Your file pointer is pointing at the end of the file.
        -->If you want to again read all the data FROM THE TOP WITHOUT RESTARTING THE PROGRAM:
        Do this--> Reset the buffer reader to the top of the file.
        */
        read_data_file = new BufferedReader(new FileReader("Datafile.dat"));
    }
}
By reinitializing the BufferedReader you reset the read position to the top of the file, so you don't have to restart the program to read the data again. You only need to do this if you want to read the file more than once within the same run. If you loop over the file just once, none of this is necessary: every time the program starts and opens the file, the reader begins at the top.
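A variation on the same idea, sketched with try-with-resources so that each pass opens its own reader and closes it afterwards. Note that it swaps the NullPointerException check for a direct null test on readLine(). The class name, method name and the comma-separated record layout are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class RecordFileReader {

    // Opens a fresh reader on every call, so each pass starts at the top of the file.
    static List<String[]> readRecords(String fileName) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String record;
            while ((record = reader.readLine()) != null) { // null marks end of file
                records.add(record.split(","));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: RecordFileReader <file>");
            return;
        }
        List<String[]> firstPass = readRecords(args[0]);
        List<String[]> secondPass = readRecords(args[0]); // reads from the top again
        System.out.println(firstPass.size() + " records on the first pass, "
                + secondPass.size() + " on the second");
    }
}
```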
Well, Gennady S.'s answer is what I would use to solve your problem.
I am writing a program in Java that requires me to compare the data in 2 files
However, I would rather not code this up again. I would rather use something like http://code.google.com/p/java-diff-utils/