Searching anagrams with Java 8

我的未来我决定 submitted on 2021-02-08 06:09:51

Question


I have to write a program that reads a file, finds the anagrams in it, and prints each word followed by its anagrams. The text file is very big: after reading it with a Scanner, listOfWords has 25000 entries.

Output example:

word anagram1 anagram2 anagram3 ...
word2 anagram1 anagram2...

I have code that works, but it is very slow:

  private static List<String> listOfWords = new ArrayList<String>();
  private static List<ArrayList<String>> allAnagrams = new ArrayList<ArrayList<String>>();

  public static void main(String[] args) throws Exception {
    URL url = new URL("www.xxx.pl/textFile.txt");
    Scanner scanner = new Scanner(url.openStream());
    while (scanner.hasNext()) {
      String nextToken = scanner.next();
      listOfWords.add(nextToken);
    }
    scanner.close();

    while (listOfWords.isEmpty() == false) {
      ArrayList<String> anagramy = new ArrayList<String>();
      String wzor = listOfWords.remove(0);
      anagramy.add(wzor);
      char[] ch = wzor.toCharArray();
      Arrays.sort(ch);
      for (int i = 0; i < listOfWords.size(); i++) {
        String slowo = listOfWords.get(i);
        char[] cha = slowo.toCharArray();
        Arrays.sort(cha);
        if (Arrays.equals(ch, cha)) {
          anagramy.add(slowo);
          listOfWords.remove(i);
          i--;
        }
      }
      allAnagrams.add(anagramy);
    }

    for (ArrayList<String> ar : allAnagrams) {
      String result = "";
      if (ar.size() > 1) {
        for (int i = 1; i < ar.size(); i++) {
          result += ar.get(i) + " "; // append, so every anagram is printed, not only the last one
        }
        System.out.println(ar.get(0) + " " + result);
      }
    }
  }

I have to write it with Java 8 streams, but I don't know how. Is it possible to use streams both for reading from the URL and for finding the anagrams? Could you help me find the anagrams with a Stream? My teacher told me the code should be shorter than mine, which reads the whole list first. Only a few lines; is that possible?


Answer 1:


You can read the words from the file into a List, or create a Stream from it directly:

try (InputStream is = new URL("http://www.someurl.pl/file.txt").openConnection().getInputStream();
     BufferedReader reader = new BufferedReader(new InputStreamReader(is));
     Stream<String> stream = reader.lines()) {
       //do something with stream
}

Then just stream over the list and collect the anagrams, where all words that have the same sorted list of characters are considered anagrams:

Map<String, List<String>> anagrams =
    stream.collect(Collectors.groupingBy(w -> sorted(w)));

The sorted method just sorts the letters, as you did in your example:

public static String sorted(String word) {
    char[] chars = word.toCharArray();
    Arrays.sort(chars);
    return new String(chars);
}
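
Putting the pieces of this answer together, the following is a minimal sketch rather than a definitive implementation. The class name AnagramLines and the method anagramLines are hypothetical, and a StringReader stands in for the reader opened on the URL. Because reader.lines() yields whole lines, the sketch assumes the words are separated by whitespace and splits each line first; it also drops repeated words and sorts the output lines, which the answer above does not do.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AnagramLines {
    // Same idea as sorted(...) above: the sorted letters act as the grouping key.
    static String sorted(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Builds one "word anagram1 anagram2 ..." line per group of two or more words.
    static List<String> anagramLines(Reader source) {
        Map<String, List<String>> groups = new BufferedReader(source).lines()
                .flatMap(Pattern.compile("\\s+")::splitAsStream) // lines -> words (assumption)
                .filter(w -> !w.isEmpty())                       // skip empty tokens
                .distinct()                                      // ignore repeated words
                .collect(Collectors.groupingBy(AnagramLines::sorted));
        return groups.values().stream()
                .filter(group -> group.size() > 1)               // keep real anagram groups only
                .map(group -> String.join(" ", group))
                .sorted()                                        // stable output order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The StringReader is a stand-in for an InputStreamReader over the URL.
        anagramLines(new StringReader("abc cab tat aaa\natt tat bbb"))
                .forEach(System.out::println);
    }
}
```

To read from the network instead, the same method could be given the BufferedReader from the try-with-resources block shown earlier.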



Answer 2:


Let's create a separate method that sorts the letters. You can do this with the Stream API as well:

private static String canonicalize(String s) {
    return Stream.of(s.split("")).sorted().collect(Collectors.joining());
}
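
Since split("") creates one String per character, a possible alternative is to sort the char values directly through String.chars(). The sketch below shows both variants side by side; the class name Canonical and the method canonicalizeChars are hypothetical, and whether this version is actually faster for a given word list is an assumption that would need measuring.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Canonical {
    // Variant from the answer: one String per character, sorted and joined.
    static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Alternative sketch: sort the char values as an IntStream and
    // append them to a StringBuilder.
    static String canonicalizeChars(String s) {
        return s.chars()
                .sorted()
                .collect(StringBuilder::new,
                         (sb, c) -> sb.append((char) c),
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("cab"));
        System.out.println(canonicalizeChars("cab"));
    }
}
```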

Now you can read from some Reader, extract the words from it, and group the words by canonical form:

Map<String, Set<String>> map = new BufferedReader(reader).lines()
             .flatMap(Pattern.compile("\\W+")::splitAsStream)
             .collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));

Next, you can remove the single-word groups, using the Stream API for the third time:

return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());

Now you can pass any reader to this code to extract anagrams from it. Here's the complete code:

import java.io.*;
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.*;

public class Anagrams {
    private static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    public static List<Set<String>> getAnagrams(Reader reader) {
        Map<String, Set<String>> map = new BufferedReader(reader).lines()
                .flatMap(Pattern.compile("\\W+")::splitAsStream)
                .collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));
        return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb"))
                .forEach(System.out::println);
    }
}

It prints (HashMap and HashSet do not guarantee an iteration order, so the order may differ):

[att, tat]
[abc, cab]

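If you want to use a URL, just replace the StringReader with new InputStreamReader(new URL("www.xxx.pl/textFile.txt").openStream(), StandardCharsets.UTF_8). The real URL string has to include its protocol (for example http://), otherwise new URL(...) throws a MalformedURLException, and you also need to import java.net.URL and java.nio.charset.StandardCharsets.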


If you want to extract the first element of the anagram set, the solution should be modified slightly:

public static Map<String, Set<String>> getAnagrams(Reader reader) {
    Map<String, List<String>> map = new BufferedReader(reader).lines()
       .flatMap(Pattern.compile("\\W+")::splitAsStream)
       .distinct() // remove repeating words
       .collect(Collectors.groupingBy(Anagrams::canonicalize));
    return map.values().stream()
       .filter(list -> list.size() > 1)
       .collect(Collectors.toMap(list -> list.get(0), 
                                 list -> new TreeSet<>(list.subList(1, list.size()))));
}

Here the result is a map whose key is the first element of each anagram set (the first one to occur in the input file) and whose value is the remaining elements, sorted alphabetically. I take a sublist to skip the first element, then move it into a TreeSet to do the sorting; an alternative would be list.stream().skip(1).sorted().collect(Collectors.toList()).
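
The skip(1).sorted() alternative mentioned above could look roughly like this. It is only a sketch: the class name FirstWordAnagrams is hypothetical, and the values become List<String> instead of Set<String>.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FirstWordAnagrams {
    private static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Key: first word of each anagram group; value: the other words, sorted.
    static Map<String, List<String>> getAnagrams(Reader reader) {
        Map<String, List<String>> map = new BufferedReader(reader).lines()
                .flatMap(Pattern.compile("\\W+")::splitAsStream)
                .distinct() // remove repeating words
                .collect(Collectors.groupingBy(FirstWordAnagrams::canonicalize));
        return map.values().stream()
                .filter(list -> list.size() > 1)
                .collect(Collectors.toMap(
                        list -> list.get(0),
                        // skip the key word, then sort the rest alphabetically
                        list -> list.stream().skip(1).sorted().collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb\ntta"))
                .entrySet().forEach(System.out::println);
    }
}
```

A sorted List keeps the alphabetical order just like the TreeSet does; since distinct() has already removed repeated words, the set semantics are not needed here.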

Example usage:

getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb\ntta\ncabr\nrbac cab crab cabrc cabr"))
        .entrySet().forEach(System.out::println);



Answer 3:


You can try this method:

//---------------Anagram---------------------------------
    String w1 = "Triangle".toLowerCase(), w2 = "Integral".toLowerCase();
    // Map each letter to the number of times it occurs in the word.
    HashMap<String, Integer> w1Map = convertToHashMap(w1);
    HashMap<String, Integer> w2Map = convertToHashMap(w2);

    // Anagrams are different words with identical letter counts;
    // comparing only the key sets would also accept "aab" and "abb".
    if (!w1.equals(w2) && w1Map.equals(w2Map))
        System.out.println(w1 + " and " + w2 + " are anagrams");
    else
        System.out.println(w1 + " and " + w2 + " are not anagrams");

It calls the method below:

public static HashMap<String, Integer> convertToHashMap(String s) {
    HashMap<String, Integer> wordMap = new HashMap<String, Integer>();
    for (int i = 0; i < s.length(); i++) {
        // Add 1 to the count for this letter (a new letter starts at 1).
        wordMap.merge(String.valueOf(s.charAt(i)), 1, Integer::sum);
    }
    return wordMap;
}
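
Two words are anagrams only if every letter occurs the same number of times in both, so a pairwise check of this kind has to compare letter counts rather than just the set of distinct letters. A minimal stream-based sketch of that idea follows; the class name AnagramCheck and the methods letterCounts and areAnagrams are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class AnagramCheck {
    // Counts how often each character occurs in the lower-cased word.
    static Map<Integer, Long> letterCounts(String word) {
        return word.toLowerCase().chars().boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Different words with identical letter counts are treated as anagrams.
    static boolean areAnagrams(String a, String b) {
        return !a.equalsIgnoreCase(b) && letterCounts(a).equals(letterCounts(b));
    }

    public static void main(String[] args) {
        System.out.println(areAnagrams("Triangle", "Integral"));
        System.out.println(areAnagrams("aab", "abb"));
    }
}
```

Comparing every pair of words this way is still quadratic for a whole word list, so for the original task the grouping approach from the earlier answers remains the better fit.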


Source: https://stackoverflow.com/questions/40756599/searching-anagrams-with-java-8
