How to split a string, but also keep the delimiters?

我在风中等你 2020-11-21 06:32

I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
23 Answers
  • 2020-11-21 06:52

    I don't know Java too well, but if you can't find a Split method that does that, I suggest you just make your own.

    string[] mySplit(string s, string delimiter)
    {
        string[] result = s.Split(delimiter);
        for (int i = 0; i < result.Length - 1; i++)
        {
            // Append the delimiter to the end of every item except the last one;
            // you can modify this however you want.
            result[i] += delimiter;
        }
        return result;
    }
    string[] res = mySplit(myString, myDelimiter);
    

    It's not too elegant, but it'll do.
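
    For what it's worth, the same idea could be sketched in Java roughly as follows. The class and method names are made up for illustration, and the sketch assumes the delimiter is a literal string (not a regex) and is not empty.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AppendDelimiterSplit {
    // Hypothetical helper: splits s on a literal delimiter and re-appends
    // the delimiter to every piece except the last one.
    static String[] splitKeepingDelimiter(String s, String delimiter) {
        // Pattern.quote makes split treat the delimiter literally, not as a regex.
        // The -1 limit keeps trailing empty strings, so no delimiter is lost.
        String[] result = s.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < result.length - 1; i++) {
            result[i] += delimiter;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a,, b,, c]
        System.out.println(Arrays.toString(splitKeepingDelimiter("a,b,c", ",")));
    }
}
```

    Because every delimiter stays attached to the piece before it, joining the pieces back together gives the original string.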

  • 2020-11-21 06:53

    A very naive solution that doesn't involve regex would be to perform a string replace on your delimiter, along these lines (assuming a comma as the delimiter):

    fullString.replace(",", "~,~")
    

    Here you can replace the tilde (~) with any marker that is guaranteed not to appear in your text.

    Then, if you split on that marker, I believe you will get the desired result.
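
    A minimal Java sketch of this replace-then-split idea might look like the following. The class name, method name and the MARKER constant are hypothetical, and the sketch assumes the marker never occurs in the input.

```java
import java.util.Arrays;

class MarkerSplit {
    // Hypothetical marker; assumed never to occur in the input text.
    private static final String MARKER = "~";

    static String[] splitKeepingDelimiter(String input, String delimiter) {
        // Surround every delimiter with the marker.
        // String.replace works on literal text, so no regex is involved here.
        String marked = input.replace(delimiter, MARKER + delimiter + MARKER);
        // Split on the marker; "~" has no special meaning in a regex.
        return marked.split(MARKER);
    }

    public static void main(String[] args) {
        // prints [a, ,, b, ,, c]
        System.out.println(Arrays.toString(splitKeepingDelimiter("a,b,c", ",")));
    }
}
```

    Note that two delimiters in a row produce an empty string between them, and an input that starts with a delimiter produces an empty first element.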

  • 2020-11-21 06:55

    Here's a Groovy version based on some of the code above, in case it helps. It's short, anyway. It includes the head and the tail only if they are not empty. The last part is a demo/test case.

    List splitWithTokens(str, pat) {
        def tokens=[]
        def lastMatch=0
        def m = str=~pat
        while (m.find()) {
          if (m.start() > 0) tokens << str[lastMatch..<m.start()]
          tokens << m.group()
          lastMatch=m.end()
        }
        if (lastMatch < str.length()) tokens << str[lastMatch..<str.length()]
        tokens
    }
    
    [['<html><head><title>this is the title</title></head>',/<[^>]+>/],
     ['before<html><head><title>this is the title</title></head>after',/<[^>]+>/]
    ].each { 
       println splitWithTokens(*it)
    }
    
  • 2020-11-21 06:55

    An extremely naive and inefficient solution that nevertheless works: use split twice on the string, once to get the text and once to get the delimiters, and then merge the two arrays.

    // Assumes str holds the input and every non-word character is a delimiter.
    String[] words = str.split("\\W", -1);  // text between delimiters; -1 keeps trailing empty strings
    String[] pieces = str.split("\\w|");    // single non-word characters, mixed with empty strings

    // Keep only the non-empty pieces: these are the delimiters, in order.
    // With the -1 limit above there is always one delimiter fewer than there are words.
    String[] delims = new String[words.length - 1];
    int i = 0;
    for (String piece : pieces) {
        if (!piece.isEmpty()) {
            delims[i++] = piece;
        }
    }

    // Interleave: words go to the even indices, delimiters to the odd ones.
    String[] merged = new String[words.length + delims.length];
    for (i = 0; i < words.length; i++) {
        merged[2 * i] = words[i];
    }
    for (i = 0; i < delims.length; i++) {
        merged[2 * i + 1] = delims[i];
    }
    for (String s : merged) {
        System.out.println(s);
    }
    
  • 2020-11-21 06:56

    I'll post my working versions too (the first is very similar to Markus's).

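    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static String[] splitIncludeDelimeter(String regex, String text){
        List<String> list = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
    
        int now, old = 0;
        while(matcher.find()){
            // each element is the text since the previous match plus the matched delimiter
            now = matcher.end();
            list.add(text.substring(old, now));
            old = now;
        }
    
        // no delimiter found: return the whole text as a single element
        if(list.isEmpty())
            return new String[]{text};
    
        // add the rest of the text as the last element
        String finalElement = text.substring(old);
        list.add(finalElement);
    
        return list.toArray(new String[list.size()]);
    }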
    

    And here is a second solution, which is around 50% faster than the first one:

    public static String[] splitIncludeDelimeter2(String regex, String text){
        List<String> list = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
    
        StringBuffer stringBuffer = new StringBuffer();
        while(matcher.find()){
            // quoteReplacement keeps '$' and '\' in the matched delimiter from being
            // interpreted as group references or escapes
            matcher.appendReplacement(stringBuffer, Matcher.quoteReplacement(matcher.group()));
            list.add(stringBuffer.toString());
            stringBuffer.setLength(0); // clear buffer
        }
    
        matcher.appendTail(stringBuffer); // append the rest of the string
        list.add(stringBuffer.toString());
    
        return list.toArray(new String[list.size()]);
    }
    
  • 2020-11-21 06:56

    I don't think it is possible with String#split, but you can use a StringTokenizer. It won't let you define your delimiter as a regex, though, only as a set of single characters:

    new StringTokenizer("Hello, world. Hi!", ",.!", true); // true for returnDelims
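
    To show what comes out of the tokenizer, here is a small sketch that collects the tokens into a list. The class and method names are made up for illustration; the only real API used is java.util.StringTokenizer with its three-argument constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Hypothetical helper: each character of delimiterChars is treated as a
    // separate one-character delimiter.
    static List<String> splitKeepingDelimiters(String input, String delimiterChars) {
        // The third argument (returnDelims = true) makes the delimiters come back as tokens too.
        StringTokenizer tokenizer = new StringTokenizer(input, delimiterChars, true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per line, in brackets so the whitespace is visible.
        for (String token : splitKeepingDelimiters("Hello, world. Hi!", ",.!")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

    For the input above this yields six tokens: "Hello", ",", " world", ".", " Hi" and "!". The spaces stay attached to the words because a space is not in the delimiter set.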
    