Splitting by a semicolon not surrounded by quote signs

感情败类 2021-01-23 11:07

Well, hello community. I'm workin' on a CSV decoder in PHP (yeah, I know there's already one, but as a challenge for me, since I'm learning it in my free time). Now the prob

3 Answers
  • 2021-01-23 11:49

    Splitting is not a good fit for CSV-style lines.
    You could instead use the tried-and-true \G anchor with a global-match function such as preg_match_all.

    Practical

    Regex: '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~'

    Info:

     \G                            # G anchor, start where last match left off
     (?:                           # leading BOL or ;
          (?: ^ | ; )
          \s*                           # optional whitespaces
     )
     (?|                           # branch reset
          " 
          ( [^"]* )                     # (1), double quoted string data
          "
       |                              # or
          ( [^;]*? )                    # (1), non-quoted field
     )
     (?:                           # trailing optional whitespaces
          \s* 
          (?:
               (?= ; )                       # lookahead for ;
            |  $                             # or EOL
          )
     )
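
    As a minimal sketch of how the pattern above might be applied, the snippet below feeds it to preg_match_all and reads the fields from capture group 1. The function name split_fields is hypothetical and only there for illustration; the pattern itself is copied from the answer unchanged.

```php
<?php
// Hypothetical helper (not from the original answer): applies the
// \G-anchored pattern globally and returns the captured fields.
function split_fields(string $line): array
{
    // \G makes every match start where the previous one ended, so the
    // matches walk the line field by field without skipping anything.
    $pattern = '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~';

    preg_match_all($pattern, $line, $m);

    // Thanks to the branch reset (?|...), both the quoted and the
    // unquoted alternative store the field in group 1.
    return $m[1];
}

print_r(split_fields('"0;0";1;2;3;4'));
```

    Because the pattern consumes optional whitespace around each field, a line such as ' a ; "b;c" ;d' should come back as a, b;c and d.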
    
  • 2021-01-23 11:51

    You can use the built-in function str_getcsv, which also lets you specify a custom delimiter (here ;).

    Try this code snippet

    <?php
    
    $string='"0;0";1;2;3;4';
    print_r(str_getcsv($string,";"));
    

    Output:

    Array
    (
        [0] => 0;0
        [1] => 1
        [2] => 2
        [3] => 3
        [4] => 4
    )
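
    For input with several lines, one possible approach is to split on line breaks first and run str_getcsv on each line. This is only a sketch: the function name parse_semicolon_csv is made up for illustration, and it assumes that no quoted field contains a line break.

```php
<?php
// Hypothetical helper: parses multi-line, semicolon-separated text.
// Assumption: quoted fields never contain a line break.
function parse_semicolon_csv(string $text): array
{
    $rows = [];
    // \R matches any line-break sequence; empty lines are skipped.
    foreach (preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        // Separator ';', enclosure '"' and escape '\' are passed explicitly.
        $rows[] = str_getcsv($line, ';', '"', '\\');
    }
    return $rows;
}

print_r(parse_semicolon_csv("\"0;0\";1;2\na;\"b;c\";d"));
```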
    
  • 2021-01-23 12:04

    It's a bit counter-intuitive, but the simplest way to split a string by regex is often to use preg_match_all in place of preg_split:

    preg_match_all('~("[^"]*"|[^;"]*)(?:;|$)~A', $line, $m);
    $res[] = $m[1];
    

    The A (anchored) modifier forces each match to begin exactly where the previous one ended, starting at the beginning of the string, so nothing between two matches is skipped.

    If you don't want the quotes to be included in the result, you can use the branch reset feature (?|..(..)..|..(..)..):

    preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
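
    A rough sketch of this variant wrapped in a function follows. The name fields_via_match_all is hypothetical. One caveat, as far as I can tell: because the pattern can match an empty string, preg_match_all may report one extra empty match at the very end of the line, so the sketch drops that last match when the line does not end with a semicolon.

```php
<?php
// Hypothetical helper: branch-reset pattern from the answer, plus a
// guard against a spurious empty match at the end of the line.
function fields_via_match_all(string $line): array
{
    preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);

    $fields = $m[1];
    $last   = count($fields) - 1;

    // An empty final match is a real (empty) field only when the line
    // ends with ';'. Otherwise it is an artifact of matching at the
    // end of the string, so it is removed.
    if ($last > 0 && $m[0][$last] === '' && substr($line, -1) !== ';') {
        array_pop($fields);
    }
    return $fields;
}

print_r(fields_via_match_all('"0;0";1;2;3;4'));
```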
    

    Another workaround, this time for preg_split: put the part you want to skip (the quoted field) before the delimiter in the pattern, then discard it from the overall match with \K, so that only the semicolon is treated as the delimiter:

    $res[] = preg_split('~(?:"[^"]*")?\K;~', $line);
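
    Note that with this approach the quotes stay in the result. A small sketch of stripping them afterwards is given below; the helper name split_keep_quoted is hypothetical, and using trim to remove the quotes is my own assumption, not part of the original answer.

```php
<?php
// Hypothetical helper: split on ';' outside quotes using \K, then
// strip the surrounding double quotes from each field.
function split_keep_quoted(string $line): array
{
    // The optional quoted field is consumed first; \K then resets the
    // match start, so only the ';' itself counts as the delimiter.
    $parts = preg_split('~(?:"[^"]*")?\K;~', $line);

    // Assumption: trimming '"' from both ends is enough here, since
    // the pattern does not handle escaped quotes inside a field.
    return array_map(static fn(string $f): string => trim($f, '"'), $parts);
}

print_r(split_keep_quoted('"0;0";1;2;3;4'));
```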
    