Regexp to remove nested parentheses

南方客 2021-01-23 17:34

I've been stuck trying to write a regular expression in Java to remove everything inside the parentheses below while preserving everything else. Note that the parenthesis

2 answers
  • 2021-01-23 18:18

    Don't use regex here. As your example shows, a pattern like \\(.*?\\) looks for the shortest match between the first ( it finds and the next ), so for data like

    a (b (c d) e) f 
    

    the regex \(.*?\) will match only

    a (b (c d) e) f
      ^^^^^^^^
    

    and will leave the trailing e) part unmatched.
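
    The mismatch described above is easy to reproduce. The following is a minimal sketch; the class and method names (LazyMatchDemo, stripLazy) are made up for illustration and are not part of the original code.

```java
final class LazyMatchDemo {

    // Removes every shortest "( ... )" match, i.e. applies the regex \(.*?\) from the text.
    static String stripLazy(String input) {
        return input.replaceAll("\\(.*?\\)", "");
    }

    public static void main(String[] args) {
        // The match ends at the first ')', so the outer group is only half removed.
        System.out.println(stripLazy("a (b (c d) e) f")); // prints "a  e) f"
    }
}
```

    The leftover " e)" is exactly the unmatched tail marked in the diagram above.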

    In principle a regex could handle this task, because some regex flavors (PCRE, for example) support recursion, but unfortunately the regex engine used in Java (java.util.regex) doesn't.

    So to remove nested brackets you can write your own simple parser, like the one below
    (I assume the text is well formed, so there are no such things as ({)} or unclosed brackets)

    String data = "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} "
            + "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. "
            + "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 "
            + "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 "
            + "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... "
            + "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. "
            + "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";
    
    StringBuilder buffer = new StringBuilder();

    // Current nesting depth; 0 means we are outside every bracket.
    int parenthesisCounter = 0;

    for (char c : data.toCharArray()) {
        if (c == '(' || c == '{') {
            parenthesisCounter++;       // entering a bracketed section
        } else if (c == ')' || c == '}') {
            parenthesisCounter--;       // leaving a bracketed section
        } else if (parenthesisCounter == 0) {
            buffer.append(c);           // keep only top-level characters
        }
    }
    

    After that you can focus on removing the rest of the unwanted data, as you did before, with

    .replaceAll(Pattern.quote("$") + "[0-9]+", "");
    

    so the result of

    System.out.println(buffer.toString().replaceAll(
            Pattern.quote("$") + "[0-9]+", ""));
    

    will be (shown here with runs of spaces collapsed)

    1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 14... Nxf6 15. Nd2 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 29... Rxe2 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7
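
    The cleanup step above builds its pattern with Pattern.quote("$") + "[0-9]+", which matches a literal dollar sign followed by one or more digits. As a sketch, the same pattern can be written with a plain escape and compiled once for reuse. The class name, method name and sample string below are invented for illustration only.

```java
import java.util.regex.Pattern;

final class DollarCodeCleanup {

    // Equivalent to Pattern.quote("$") + "[0-9]+": a literal '$' followed by digits.
    // Compiled once so it can be reused across many strings.
    private static final Pattern DOLLAR_CODE = Pattern.compile("\\$[0-9]+");

    static String removeDollarCodes(String text) {
        return DOLLAR_CODE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "12. Nd5 $1 c6 $14"; // hypothetical input, not from the question
        // Both calls remove "$1" and "$14" and leave the moves untouched.
        System.out.println(removeDollarCodes(sample));
        System.out.println(sample.replaceAll(Pattern.quote("$") + "[0-9]+", ""));
    }
}
```

    The unescaped "$" would be read as an end-of-input anchor, which is why it has to be quoted or escaped in either form.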

  • 2021-01-23 18:29

    Pshemo's answer is good, but I wanted to show you how it could be done using regular expressions and how I think the parsing can be optimized:

    import java.util.regex.Pattern;
    
    /**
     * Created for http://stackoverflow.com/a/25335225/1266906
     */
    public class RemoveBrackets {
    
        public static void main(String[] args) {
            String testData =
                    "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} " +
                            "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. " +
                            "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 " +
                            "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 " +
                            "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... " +
                            "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. " +
                            "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";
            System.out.println(replaceLoop(testData).replaceAll(Pattern.quote("$") + "[0-9]+", ""));
            System.out.println(copyIterator(testData).replaceAll(Pattern.quote("$") + "[0-9]+", ""));
        }
    
        private static String replaceLoop(String data) {
            Pattern pattern = Pattern.compile("\\([^()]*\\)|\\{[^{}]*\\}");
            String previous, current = data;
            do {
                previous = current;
                current = pattern.matcher(previous).replaceAll("");
            } while (!previous.equals(current));
            return current;
        }
    
        private static String copyIterator(String data) {
            StringBuilder sb = new StringBuilder();
            int start = 0;
            int level = 0;
            for (int i = 0; i < data.length(); i++) {
                switch (data.charAt(i)) {
                    case '(':
                    case '{':
                        if (level == 0 && start >= 0) {
                            sb.append(data.substring(start, i));
                            start = -1;
                        }
                        level++;
                        break;
                    case ')':
                    case '}':
                        level--;
                        if (level == 0) {
                            start = i + 1;
                        } else if (level < 0) {
                            throw new IllegalStateException("Too many closing brackets.");
                        }
                        break;
                }
            }
            if (level > 0) {
                throw new IllegalStateException("Too many opening brackets.");
            }
            if (start >= 0 && start < data.length()) {
                sb.append(data.substring(start, data.length()));
            }
            return sb.toString();
        }
    }
    

    In replaceLoop I remove only the bracket groups that contain no brackets themselves (the innermost ones), so I have to repeat the replacement until the String no longer changes. This can be expensive, especially if you expect deep nesting. As previously stated, the problem is that a pattern can only refer back to characters it has already matched, not to their counterpart or their count. If you know how deeply things are normally nested, you can of course write a pattern that removes all expected levels at once and only seldom requires more than two passes.
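
    One possible way to build such a "known depth" pattern is sketched below. It is an illustration, not the code from the answer: the names DepthLimitedStrip, upToDepth and strip are hypothetical, and for brevity it handles only round parentheses (curly braces would need an analogous pattern). The replace loop is kept so that deeper nesting than expected still gets removed, just with extra passes.

```java
import java.util.regex.Pattern;

final class DepthLimitedStrip {

    // Builds a pattern matching one "( ... )" group nested at most maxDepth levels deep.
    static Pattern upToDepth(int maxDepth) {
        if (maxDepth < 1) {
            throw new IllegalArgumentException("maxDepth must be >= 1");
        }
        // Depth 1: a group with no parentheses inside.
        String group = "\\([^()]*+\\)";
        // Each further level allows either plain text or a group of the previous level.
        for (int depth = 2; depth <= maxDepth; depth++) {
            group = "\\((?:[^()]++|" + group + ")*+\\)";
        }
        return Pattern.compile(group);
    }

    // Same loop as replaceLoop, but each pass removes up to maxDepth levels at once.
    static String strip(String data, int maxDepth) {
        Pattern pattern = upToDepth(maxDepth);
        String previous;
        String current = data;
        do {
            previous = current;
            current = pattern.matcher(previous).replaceAll("");
        } while (!previous.equals(current));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(strip("Rf8 (10... Re8 11. Nb5 (11. Nd5)) 11. Nd5", 2));
    }
}
```

    With maxDepth set to 2, the doubly nested variation in the sample is removed in a single replacement pass; the second pass only confirms that nothing changed.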

    In copyIterator I determine which chunks lie outside all brackets and copy those chunks into a new StringBuilder. Copying whole chunks minimizes the number of times the StringBuilder has to resize, and because copying a block is often no more expensive than copying a single character, the cost per character decreases. Also, with a switch the compiler can emit a table-based dispatch that tests for all 4 relevant characters at once, rather than comparing them one by one as a chain of ifs would (yes, a good compiler should do this for you, but ...).
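
    The chunk-copying idea can be taken one small step further. The sketch below is a variation on copyIterator, not the original method: it assumes that pre-sizing the StringBuilder to the input length and using StringBuilder.append(CharSequence, int, int) instead of substring is acceptable, which avoids both resizing and the temporary String per chunk. The names ChunkCopy and strip are hypothetical.

```java
final class ChunkCopy {

    static String strip(String data) {
        // The result can never be longer than the input, so this capacity avoids any resize.
        StringBuilder sb = new StringBuilder(data.length());
        int chunkStart = 0; // start of the current top-level chunk
        int level = 0;      // current nesting depth
        for (int i = 0; i < data.length(); i++) {
            switch (data.charAt(i)) {
                case '(':
                case '{':
                    if (level == 0) {
                        // Copy the finished top-level chunk without creating a substring.
                        sb.append(data, chunkStart, i);
                    }
                    level++;
                    break;
                case ')':
                case '}':
                    if (level == 0) {
                        throw new IllegalStateException("Too many closing brackets.");
                    }
                    level--;
                    if (level == 0) {
                        chunkStart = i + 1; // next top-level chunk starts after this bracket
                    }
                    break;
                default:
                    break;
            }
        }
        if (level > 0) {
            throw new IllegalStateException("Too many opening brackets.");
        }
        sb.append(data, chunkStart, data.length()); // trailing top-level chunk
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a (b (c d) e) f")); // prints "a  f"
    }
}
```

    Like copyIterator, this treats ( and { as interchangeable, so it checks only the nesting depth and not whether each closing bracket matches the type of its opening bracket.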
