Collect HashSet / Java 8 / Regex Pattern / Stream API

日久生厌 2020-12-29 09:42

I recently switched my project from JDK 7 to JDK 8, and I am now rewriting some code snippets to use the new features that came with Java 8.

final Matcher         


        
6 answers
  • 2020-12-29 10:11

    Marko's answer demonstrates how to get matches into a stream using a Spliterator. Well done, give that man a big +1! Seriously, make sure you upvote his answer before you even consider upvoting this one, since this one is entirely derivative of his.

    I have only a small bit to add to Marko's answer, which is that instead of representing the matches as an array of strings (with each array element representing a match group), the matches are better represented as a MatchResult which is a type invented for this purpose. Thus the result would be a Stream<MatchResult> instead of Stream<String[]>. The code gets a little simpler, too. The tryAdvance code would be

        if (m.find()) {
            action.accept(m.toMatchResult());
            return true;
        } else {
            return false;
        }
    

    The map call in his email-matching example would change to

        .map(mr -> mr.group(2))
    

    and the OP's example would be rewritten as

    Set<String> set = matcherStream(mtr)
                          .map(mr -> mr.group(0).toLowerCase())
                          .collect(toSet());
    

    Using MatchResult gives a bit more flexibility in that it also provides offsets of match groups within the string, which could be useful for certain applications.
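
    Putting these pieces together, a self-contained sketch might look like the following. The class name MatchResultStreamSketch, the helper methods, and the sample pattern and input are assumptions made for illustration; matcherStream here is a re-creation of Marko's factory adapted to MatchResult, not his original code.

```java
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical class name; only the JDK types and methods are real API.
class MatchResultStreamSketch {

    // Streams one immutable MatchResult snapshot per successful find().
    static Stream<MatchResult> matcherStream(Matcher m) {
        Spliterator<MatchResult> sp = new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if (!m.find()) {
                    return false;
                }
                action.accept(m.toMatchResult());
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    // Collects every full match (group 0), lower-cased, into a set.
    static Set<String> lowerCaseMatches(Pattern p, String input) {
        return matcherStream(p.matcher(input))
                .map(mr -> mr.group(0).toLowerCase())
                .collect(Collectors.toSet());
    }

    // Uses the offsets that MatchResult exposes in addition to the groups.
    static List<Integer> matchOffsets(Pattern p, String input) {
        return matcherStream(p.matcher(input))
                .map(mr -> mr.start())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("[A-Za-z]+"); // assumed sample pattern
        String input = "Foo bar FOO";                // assumed sample input
        System.out.println(lowerCaseMatches(word, input));
        System.out.println(matchOffsets(word, input)); // prints [0, 4, 8]
    }
}
```

    Because toMatchResult() returns a snapshot, each stream element stays valid after the Matcher has moved on to the next match.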

  • 2020-12-29 10:22

    A Matcher-based spliterator implementation can be quite simple if you reuse the JDK-provided Spliterators.AbstractSpliterator:

    public class MatcherSpliterator extends AbstractSpliterator<String[]>
    {
      private final Matcher m;
    
      public MatcherSpliterator(Matcher m) {
        super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
        this.m = m;
      }
    
      @Override public boolean tryAdvance(Consumer<? super String[]> action) {
        if (!m.find()) return false;
        final String[] groups = new String[m.groupCount()+1];
        for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
        action.accept(groups);
        return true;
      }
    }
    

    Note that the spliterator provides all matcher groups, not just the full match. Also note that this spliterator supports parallelism because AbstractSpliterator implements a splitting policy.

    Typically you will use a convenience stream factory:

    public static Stream<String[]> matcherStream(Matcher m) {
      return StreamSupport.stream(new MatcherSpliterator(m), false);
    }
    

    This gives you a powerful basis to concisely write all kinds of complex regex-oriented logic, for example:

    private static final Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");
    public static void main(String[] args) {
      final String emails = "kid@gmail.com, stray@yahoo.com, miks@tijuana.com";
      System.out.println("User has e-mail accounts on these domains: " +
          matcherStream(emailRegex.matcher(emails))
          .map(gs->gs[2])
          .collect(joining(", ")));
    }
    

    Which prints

    User has e-mail accounts on these domains: gmail.com, yahoo.com, tijuana.com
    

    For completeness, your code will be rewritten as

    Set<String> set = matcherStream(mtr).map(gs->gs[0].toLowerCase()).collect(toSet());
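
    To illustrate the note about parallelism, here is a hedged, self-contained variant of the same idea. The class name GroupArrayStreamSketch, the extra parallel flag on matcherStream, and the domains helper are assumptions added for this sketch. It relies on the splitting that AbstractSpliterator inherits, together with the ORDERED characteristic, to keep encounter order when collecting to a list.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical class name; the spliterator logic mirrors MatcherSpliterator.
class GroupArrayStreamSketch {

    // The parallel flag is an assumption; the factory above always passes false.
    static Stream<String[]> matcherStream(Matcher m, boolean parallel) {
        Spliterator<String[]> sp = new Spliterators.AbstractSpliterator<String[]>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String[]> action) {
                if (!m.find()) {
                    return false;
                }
                // Index 0 is the full match, 1..groupCount() are the capture groups.
                String[] groups = new String[m.groupCount() + 1];
                for (int i = 0; i <= m.groupCount(); i++) {
                    groups[i] = m.group(i);
                }
                action.accept(groups);
                return true;
            }
        };
        return StreamSupport.stream(sp, parallel);
    }

    // Extracts the domain part (group 2) of each comma-separated address.
    static List<String> domains(String emails, boolean parallel) {
        Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");
        return matcherStream(emailRegex.matcher(emails), parallel)
                .map(gs -> gs[2])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String emails = "kid@gmail.com, stray@yahoo.com, miks@tijuana.com";
        System.out.println(domains(emails, false)); // prints [gmail.com, yahoo.com, tijuana.com]
        System.out.println(domains(emails, true));  // same list, same order
    }
}
```

    The matching itself still happens one find() at a time; any speed-up from a parallel stream would come from the downstream operations, not from the regex scan.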
    
  • 2020-12-29 10:25

    Here is an implementation using the Spliterator interface.

        // To get the required set
        Set<String> result = StreamSupport.stream(new MatcherGroupIterator(pattern, input), false)
                .map(s -> s.toLowerCase())
                .collect(Collectors.toSet());
        ...
        private static class MatcherGroupIterator implements Spliterator<String> {
          private final Matcher matcher;
    
          public MatcherGroupIterator(Pattern p, String s) {
            matcher = p.matcher(s);
          }
    
          @Override
          public boolean tryAdvance(Consumer<? super String> action) {
            if (!matcher.find()){
                return false;
            }
            action.accept(matcher.group());
            return true;
          }
    
          @Override
          public Spliterator<String> trySplit() {
            return null;
          }
    
          @Override
          public long estimateSize() {
            return Long.MAX_VALUE;
          }
    
          @Override
          public int characteristics() {
            return Spliterator.NONNULL;
          }
      }
    
  • 2020-12-29 10:30

    What about

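    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.HashSet;
    import java.util.Scanner;
    
    public class MakeItSimple {
    
      public static void main(String[] args) throws FileNotFoundException {
        // Scanner splits on whitespace by default, so each token is tested as a whole
        try (Scanner s = new Scanner(new File("C:\\Users\\Admin\\Desktop\\TextFiles\\Emails.txt"))) {
          HashSet<String> set = new HashSet<>();
          while (s.hasNext()) {
            String r = s.next();
            if (r.matches("([^,]+?)@([^,]+)")) {
              set.add(r);
            }
          }
          // print every distinct match in upper case
          set.stream().map(x -> x.toUpperCase()).forEach(System.out::println);
        }
      }
    }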
    
  • 2020-12-29 10:35

    What about Pattern.splitAsStream?

    Stream<String> stream = Pattern.compile(regex).splitAsStream(input);
    

    and then a collector to get a set.

    Set<String> set = stream.map(String::toLowerCase).collect(Collectors.toSet());
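
    One caveat worth spelling out: splitAsStream streams the pieces of the input that lie between matches of the pattern, not the matches themselves, so the regex has to describe the separators. A minimal sketch under that assumption follows; the class name SplitAsStreamSketch, the helper lowerCaseTokens, and the comma-separated sample input are hypothetical.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name; Pattern.splitAsStream is the real Java 8 API.
class SplitAsStreamSketch {

    // separatorRegex describes what sits BETWEEN the tokens we want to keep.
    static Set<String> lowerCaseTokens(String separatorRegex, String input) {
        return Pattern.compile(separatorRegex)
                .splitAsStream(input)
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Assumed sample input: addresses separated by commas and optional spaces.
        String input = "Kid@Gmail.com, stray@yahoo.com, KID@gmail.com";
        // The two spellings of the first address collapse into one set element.
        System.out.println(lowerCaseTokens("\\s*,\\s*", input));
    }
}
```

    If the data cannot be described by its separators, a Matcher-based spliterator as in the other answers is the closer fit.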
    
  • 2020-12-29 10:36

    I don't think you can turn this into a Stream without writing your own Spliterator, but I don't know why you would want to.

    Matcher.find() is a state-changing operation on the Matcher object, so running each find() in a parallel stream would produce inconsistent results. Running the stream serially wouldn't perform better than the Java 7 equivalent and would be harder to understand.
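
    For comparison, the plain loop could be sketched roughly as follows. The question's original snippet is cut off, so the class name PlainLoopSketch, the method lowerCaseMatches, and the sample pattern and input are assumptions rather than the OP's code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; this is the Java 7 style find() loop.
class PlainLoopSketch {

    // Collects every full match, lower-cased, into a set.
    static Set<String> lowerCaseMatches(Pattern p, String input) {
        Set<String> set = new HashSet<>();
        Matcher m = p.matcher(input);
        while (m.find()) {            // each call advances the matcher's internal state
            set.add(m.group().toLowerCase());
        }
        return set;
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("[A-Za-z]+"); // assumed sample pattern
        System.out.println(lowerCaseMatches(word, "Foo bar FOO"));
    }
}
```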
