Question
I have been given a large text as input. I have built a HashMap that stores each distinct word as a key and the number of times it occurs as the value (Integer).
Now I have to write a method mostOften(int k): List that returns a List of the first k words, ordered from the highest number of occurrences to the lowest (descending order), using the HashMap I built before. The catch is that whenever two words have the same number of occurrences, they should be sorted alphabetically.
The first idea that came to mind was to swap the keys and values of the given HashMap and put them into a TreeMap. The TreeMap would sort the entries by key (Integer, the number of occurrences of the word), and then I would just pop the last/first k entries from the TreeMap.
But I will certainly have collisions when two or three words have the same number of occurrences. I can compare the words alphabetically, but what Integer should I use as the key for the second word that comes in?
Any ideas on how to implement this, or other options?
Answer 1:
Here's the solution I came up with.
- First, create a class MyWord that stores the String value of the word and the number of occurrences.
- Implement the Comparable interface for this class so that it sorts by number of occurrences first (descending) and then alphabetically when the number of occurrences is the same.
- Then, in the mostOften method, create a new List of MyWord from your original map, adding one MyWord per map entry.
- Sort this list.
- Take the first k items of this list using subList.
- Add those Strings to the List<String> and return it.
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("hello", 5);
        m.put("halo", 5);
        m.put("this", 2);
        m.put("that", 2);
        m.put("good", 1);
        System.out.println(mostOften(m, 3));
    }

    public static List<String> mostOften(Map<String, Integer> m, int k) {
        List<MyWord> l = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : m.entrySet())
            l.add(new MyWord(entry.getKey(), entry.getValue()));
        // Uses MyWord.compareTo: count descending, then word ascending.
        Collections.sort(l);
        List<String> list = new ArrayList<>();
        // Clamp k so that asking for more words than exist does not throw.
        for (MyWord w : l.subList(0, Math.min(k, l.size())))
            list.add(w.word);
        return list;
    }
}

class MyWord implements Comparable<MyWord> {
    public String word;
    public int occurence;

    public MyWord(String word, int occurence) {
        super();
        this.word = word;
        this.occurence = occurence;
    }

    @Override
    public int compareTo(MyWord arg0) {
        // Arguments are swapped on purpose: higher counts sort first.
        int cmp = Integer.compare(arg0.occurence, this.occurence);
        // Equal counts fall back to alphabetical order.
        return cmp != 0 ? cmp : word.compareTo(arg0.word);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + occurence;
        result = prime * result + ((word == null) ? 0 : word.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyWord other = (MyWord) obj;
        if (occurence != other.occurence)
            return false;
        if (word == null) {
            if (other.word != null)
                return false;
        } else if (!word.equals(other.word))
            return false;
        return true;
    }
}
Output : [halo, hello, that]
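For comparison, the same ordering can probably be expressed without a helper class. The following is only a sketch, assuming Java 8 or later is available; the class name MostOftenStreams is made up for illustration and is not part of the answer above. It sorts the map entries by count descending, breaks ties alphabetically, and keeps the first k keys.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class MostOftenStreams {
    static List<String> mostOften(Map<String, Integer> counts, int k) {
        // Count descending first, then word ascending as the tie breaker.
        Comparator<Map.Entry<String, Integer>> byCountThenWord =
                Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey());
        return counts.entrySet().stream()
                .sorted(byCountThenWord)
                .limit(k) // assumes k >= 0; a larger k simply returns every word
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("hello", 5);
        m.put("halo", 5);
        m.put("this", 2);
        m.put("that", 2);
        m.put("good", 1);
        System.out.println(mostOften(m, 3)); // prints [halo, hello, that]
    }
}
```

Like the version above, this sorts all entries, so it costs O(n log n) for n distinct words regardless of k.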
Answer 2:
Hints:
- Look at the javadocs for the Collections.sort methods ... both of them!
- Look at the javadocs for Map.entrySet().
- Think about how to implement a Comparator that compares instances of a class with two fields, using the 2nd as a "tie breaker" when the other compares as equal.
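One way these hints might fit together is sketched below. It is an illustration rather than the answerer's own code: the class name MostOftenComparator is hypothetical, and it assumes k is not negative. The entries from Map.entrySet() are copied into a list, sorted with the two-argument Collections.sort, and the Comparator uses the word as the tie breaker when the counts are equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class MostOftenComparator {
    static List<String> mostOften(Map<String, Integer> counts, int k) {
        // Copy the entries so they can be sorted without touching the map.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // First field: count, descending (note the swapped arguments).
                int byCount = Integer.compare(b.getValue(), a.getValue());
                // Second field: word, ascending, used only when the counts tie.
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        List<String> result = new ArrayList<>();
        // Assumes k >= 0; k is clamped to the number of distinct words.
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(k, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

Because the ordering lives in the Comparator, no wrapper class with its own compareTo is needed.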
Answer 3:
In addition to your Map that stores the word counts, I would use a PriorityQueue of fixed size K, ordered by count. For a fixed K, the work per word depends only on K, so the whole pass is O(N), where N is the number of words in the input. Here is code that uses this approach (it uses K = 10):
In the constructor we read the input stream word by word, filling the counters in the Map.
At the same time we update the priority queue, keeping its maximum size at K (we need the top K words).
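import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Scanner;

public class TopNWordsCounter
{
    // Number of words to keep (the "K" of the description).
    private static final int K = 10;

    public static class WordCount
    {
        String word;
        int count;
        public WordCount(String word)
        {
            this.word = word;
            // Starts at 0: the constructor loop increments once per occurrence.
            this.count = 0;
        }
    }
    private PriorityQueue<WordCount> pq;
    private Map<String, WordCount> dict;
    public TopNWordsCounter(Scanner scanner)
    {
        // Min-heap: the head is the weakest kept word (lowest count; for equal
        // counts, the alphabetically later word), so poll() evicts it.
        pq = new PriorityQueue<>(K, new Comparator<WordCount>()
        {
            @Override
            public int compare(WordCount o1, WordCount o2)
            {
                int cmp = Integer.compare(o1.count, o2.count);
                return cmp != 0 ? cmp : o2.word.compareTo(o1.word);
            }
        });
        dict = new HashMap<>();
        while (scanner.hasNext())
        {
            String word = scanner.next();
            WordCount wc = dict.get(word);
            if (wc == null)
            {
                wc = new WordCount(word);
                dict.put(word, wc);
            }
            if (pq.contains(wc))
            {
                // Remove before changing the count so the heap stays ordered.
                pq.remove(wc);
                wc.count++;
                pq.add(wc);
            }
            else
            {
                wc.count++;
                if (pq.size() < K || wc.count >= pq.peek().count)
                {
                    pq.add(wc);
                }
            }
            if (pq.size() > K)
            {
                pq.poll();
            }
        }
    }
    public List<String> getTopTenWords()
    {
        // Drain a copy (weakest first), then reverse to get most frequent first.
        PriorityQueue<WordCount> copy = new PriorityQueue<>(pq);
        List<String> topTen = new ArrayList<>();
        while (!copy.isEmpty())
        {
            topTen.add(copy.poll().word);
        }
        Collections.reverse(topTen);
        return topTen;
    }
}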
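Since the question starts from a HashMap that has already been filled, the bounded-queue idea could also be applied to that map directly. The sketch below is an assumption-laden illustration, not the answerer's code: the class name MostOftenHeap and the comparator name weakestFirst are hypothetical, and it assumes Java 8 or later for the PriorityQueue(Comparator) constructor. The queue never holds more than k entries, and its head is always the weakest one kept so far.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical class name, used only for this sketch.
class MostOftenHeap {
    static List<String> mostOften(Map<String, Integer> counts, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Head of the queue = weakest kept entry: lowest count, and for
        // equal counts the alphabetically later word.
        Comparator<Map.Entry<String, Integer>> weakestFirst =
                new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = Integer.compare(a.getValue(), b.getValue());
                return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
            }
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so at most k entries remain
            }
        }
        // poll() yields weakest first, so prepend to end up strongest first.
        LinkedList<String> result = new LinkedList<>();
        while (!heap.isEmpty()) {
            result.addFirst(heap.poll().getKey());
        }
        return result;
    }
}
```

With n distinct words this takes O(n log k) time and O(k) extra space, which may matter when k is much smaller than n; for small maps a plain sort is simpler.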
Source: https://stackoverflow.com/questions/20453629/how-to-get-n-most-often-words-in-given-text-sorted-from-max-to-min