Most Efficient Way to Check File for List of Words

Submitted by 不羁的心 on 2019-12-11 10:04:36

Question


I just had a homework assignment that asked me to add all the Java keywords to a HashSet, then read in a .java file and count how many times any keyword appeared in it.

The route I took was: I created a String[] array that contained all the keywords, created a HashSet, and used Collections.addAll to add the array to the HashSet. Then, as I iterated through the text file, I checked each word with HashSet.contains(currentWordFromFile).
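
The route described above can be sketched roughly as follows. This is only an illustration, not the code behind the pastebin link: the class name KeywordSetSketch, the method countKeywords, the shortened keyword list, and the choice to split words on non-word characters are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class KeywordSetSketch {

    // Hypothetical, shortened keyword list; the assignment would list all of them.
    static final String[] KEYWORDS = { "class", "int", "public", "return", "static", "void" };

    // Counts how many words in the text are contained in the keyword set.
    static int countKeywords(String text) {
        Set<String> keywords = new HashSet<>();
        Collections.addAll(keywords, KEYWORDS);

        int count = 0;
        // Assumption: words are separated by runs of non-word characters.
        for (String word : text.split("\\W+")) {
            if (keywords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "public static int twice(int x) { return 2 * x; }";
        System.out.println(countKeywords(sample));
    }
}
```

A HashSet gives expected constant-time contains checks. A TreeSet could be swapped in behind the same Set interface; its lookups take logarithmic time and it keeps the keywords sorted, which this task does not need.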

Someone recommended using a Hashtable to do this. Then I saw a similar example using a TreeSet. I was just curious: what's the recommended way to do this?

(Complete code here: http://pastebin.com/GdDmCWj0 )


Answer 1:


Try a Map<String, Integer> where the String is the word and the Integer is the number of times the word has been seen.

One benefit of this is that you do not need to process the file twice.
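
A minimal sketch of that idea, assuming Java 8 or later for Map.merge. The class name WordTallySketch and the method tally are made up for illustration and are not part of the answer; the word-splitting rule is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class WordTallySketch {

    // Builds a word -> occurrence count map in a single pass over the text.
    static Map<String, Integer> tally(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: words are separated by runs of non-word characters.
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty()) {
                // merge stores 1 for a new word, otherwise adds 1 to the existing count
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = tally("int a = 1; int b = a;");
        System.out.println(counts.get("int"));
    }
}
```

Once the map is built, the count for any keyword is a single lookup, for example counts.getOrDefault("int", 0), so the file is read only once.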




Answer 2:


You said "had a homework assignment" so I'm assuming you're done with this.

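I would do it a bit differently. Firstly, I think some of the keywords in your String array were incorrect. According to Wikipedia and Oracle, Java has 50 keywords. (The array below has 53 entries because it also includes the literals true, false, and null, which are reserved words but not keywords.) Anyway, I've commented my code fairly well. Here's what I came up with...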

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.HashMap;

public class CountKeywords {

    public static void main(String[] args) {

        String[] theKeywords = { "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "enum", "extends", "false", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try", "void", "volatile", "while" };

        // put each keyword in the map with value 0 
        Map<String, Integer> theKeywordCount = new HashMap<String, Integer>();
        for (String str : theKeywords) {
            theKeywordCount.put(str, 0);
        }

        File file = new File(args[0]);

        // attempt to open and read the file; try-with-resources closes the reader automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {

            String sLine;

            // read lines until reaching the end of the file
            while ((sLine = br.readLine()) != null) {

                // extract the words from the current line in the file by splitting
                // on runs of non-word characters (an empty line yields no keywords)
                for (String word : sLine.split("\\W+")) {

                    // only keywords are keys in the map, so other words are skipped
                    if (theKeywordCount.containsKey(word)) {
                        theKeywordCount.put(word, theKeywordCount.get(word) + 1);
                    }
                }
            }

        } catch (FileNotFoundException exception) {
            // Unable to find file.
            exception.printStackTrace();
        } catch (IOException exception) {
            // Unable to read line.
            exception.printStackTrace();
        }

        // add up how many times each keyword was encountered
        int occurrences = 0;
        for (Integer i : theKeywordCount.values()) {
            occurrences += i;
        }

        System.out.println("\n\nTotal occurrences in file: " + occurrences);
    }
}

Every time I read a word from the file, I first check if it's in the Map; if it isn't, it's not a keyword; if it is, then I update the value the keyword is associated with, i.e., I increment the associated Integer by 1 because we've seen this keyword once more.

Alternatively, you could get rid of that last for loop and just keep a running count, so you would instead have...

if (theKeywordCount.containsKey(word)) {
    occurrences++;
}

... and you print out the counter at the end.

I don't know if this is the most efficient way to do this, but I think it's a solid start.

Let me know if you have any questions. I hope this helps.
Hristo



Source: https://stackoverflow.com/questions/5799693/most-efficient-way-to-check-file-for-list-of-words
