Extract a substring between two given words using regex in Java

囚心锁ツ 2020-12-30 09:59

I would like to extract the substring between two given words using Java.

For example:

This is an important example about regex for my work.


        
3 Answers
  • 2020-12-30 10:26

    For your first question, make the quantifier lazy: put a question mark after it and it will match as little as possible.

    (?<=an).*?(?=for)
    

    I have no idea what the additional . at the end of .*. is good for; it's unnecessary.
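
    As a rough illustration, the lazy lookaround pattern above could be run from Java as sketched below. The class name LazyLookaround and the helper extract are made up for this example, and the input is the sentence from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyLookaround {
    // Lookbehind for "an", lazy body, lookahead for "for".
    private static final Pattern BETWEEN = Pattern.compile("(?<=an).*?(?=for)");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String extract(String text) {
        Matcher matcher = BETWEEN.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "This is an important example about regex for my work.";
        // Brackets make the surrounding spaces visible in the output.
        System.out.println("[" + extract(text) + "]");
    }
}
```

    Because the lookarounds are zero-width, the match keeps the spaces next to "an" and "for"; call trim() on the result if they are not wanted.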

    For your second question, you have to define what a "word" is. Here I would say it is probably just a sequence of non-whitespace characters followed by a whitespace character. Something like this

    \S+\s
    

    and repeat this 3 times like this

    (?<=an)\s(\S+\s){3}(?=for)
    

    To ensure that the pattern matches only whole words, use word boundaries

    (?<=\ban\b)\s(\S+\s){1,5}(?=\bfor\b)
    

    See it online here on Regexr

    {3} matches exactly 3 repetitions; for a minimum of 1 and a maximum of 3, use {1,3}.
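
    To see how the repetition count affects the result, here is a small sketch that builds the word-boundary pattern with a configurable upper bound. The class WordsBetween and the maxWords parameter are assumptions made for this example. The sentence from the question has four words between "an" and "for", so an upper bound of 3 finds nothing while 5 does.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsBetween {
    // Hypothetical helper: matches 1..maxWords whitespace-separated words
    // between the whole words "an" and "for".
    static Optional<String> wordsBetween(String text, int maxWords) {
        Pattern pattern = Pattern.compile(
                "(?<=\\ban\\b)\\s(\\S+\\s){1," + maxWords + "}(?=\\bfor\\b)");
        Matcher matcher = pattern.matcher(text);
        // trim() drops the leading and trailing whitespace that the pattern consumes.
        return matcher.find() ? Optional.of(matcher.group().trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "This is an important example about regex for my work.";
        System.out.println(wordsBetween(text, 5)); // four words fit within 1..5
        System.out.println(wordsBetween(text, 3)); // four words do not fit within 1..3
    }
}
```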

    Alternative:

    As dma_k correctly stated, in your case it's not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.

    You can use capturing groups instead. Just put the part you want to extract in parentheses and it will be put into a capturing group.

    \ban\b(.*?)\bfor\b
    

    See it online here on Regexr

    You can then access this group like this

    System.out.println("I found the text: " + matcher.group(1).toString());
                                                            ^
    

    You have only one pair of parentheses, so it's simple: just pass 1 to matcher.group(1) to access the first capturing group.
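
    For completeness, a minimal end-to-end sketch of the capturing-group variant follows. The class name CaptureGroupDemo and the helper between are invented for this example. Note that matcher.find() has to succeed before matcher.group(1) may be called; otherwise an IllegalStateException is thrown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Group 1 captures everything between the whole words "an" and "for".
    private static final Pattern BETWEEN = Pattern.compile("\\ban\\b(.*?)\\bfor\\b");

    // Hypothetical helper: returns the captured text, or null if nothing matched.
    static String between(String text) {
        Matcher matcher = BETWEEN.matcher(text);
        if (matcher.find()) {
            return matcher.group(1); // already a String, no toString() needed
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "This is an important example about regex for my work.";
        System.out.println("I found the text: [" + between(text) + "]");
    }
}
```

    The captured text still includes the spaces adjacent to "an" and "for", since the pattern does not consume them outside the group.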

  • 2020-12-30 10:38

    public class SubStringBetween {

        // Returns the text between the last occurrence of `before`
        // and the last occurrence of `after`.
        public static String subStringBetween(String sentence, String before, String after) {
            int startSub = subStringStartIndex(sentence, before);
            int stopSub = subStringEndIndex(sentence, after);
            return sentence.substring(startSub, stopSub);
        }

        // Index just past the last occurrence of the delimiter, or 0 if it is absent.
        public static int subStringStartIndex(String sentence, String delimiterBeforeWord) {
            int index = sentence.lastIndexOf(delimiterBeforeWord);
            return index < 0 ? 0 : index + delimiterBeforeWord.length();
        }

        // Index where the last occurrence of the delimiter starts, or 0 if it is absent.
        public static int subStringEndIndex(String sentence, String delimiterAfterWord) {
            int index = sentence.lastIndexOf(delimiterAfterWord);
            return index < 0 ? 0 : index;
        }
    }

  • 2020-12-30 10:50

    Your regex is "an\\s+(.*?)\\s+for". It extracts all characters between "an" and "for", excluding the surrounding whitespace (\s+). The question mark makes the quantifier reluctant (non-greedy). It is needed to prevent .* from consuming as much as possible, which would carry the match through to the last "for" in the input.
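
    The difference between the two quantifiers only shows up when "for" occurs more than once. The sketch below compares both forms on a made-up sentence with two occurrences; the class LazyVsGreedy, the helper firstGroup, and the sample input are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Hypothetical helper: returns group 1 of the first match, or null if none.
    static String firstGroup(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up input with two occurrences of "for".
        String text = "This is an important example for regex for my work.";

        // Reluctant: stops at the first " for" after "an".
        System.out.println(firstGroup("an\\s+(.*?)\\s+for", text));

        // Greedy: runs on to the last " for" in the input.
        System.out.println(firstGroup("an\\s+(.*)\\s+for", text));
    }
}
```

    With this input the reluctant version yields "important example" and the greedy version yields "important example for regex".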
