Splitting by a semicolon not surrounded by quote signs

Posted by 浪尽此生 on 2019-12-02 06:08:42

It's a bit counter-intuitive, but the simplest way to split a string by regex is often to use preg_match_all in place of preg_split:

preg_match_all('~("[^"]*"|[^;"]*)(?:;|$)~A', $line, $m);
$res[] = $m[1];

The A (anchored) modifier forces each match to begin exactly where the previous one ended, starting from the beginning of the string, so no characters can be skipped between fields.
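As a quick check of how this behaves, here is a minimal sketch that runs the pattern above on a sample line. The helper name split_fields and the sample input are assumptions made for illustration, not part of the original answer. One thing it makes visible: when the line does not end with a semicolon, preg_match_all also reports a zero-length match at the very end of the string, so the sketch drops that extra entry.

```php
<?php
// Hypothetical helper (not from the original answer): wraps the pattern above
// and removes the spurious empty match left at the end of the string.
function split_fields(string $line): array
{
    preg_match_all('~("[^"]*"|[^;"]*)(?:;|$)~A', $line, $m);
    $fields = $m[1];
    // If the line does not end with ';', the last field is closed by '$',
    // and one more zero-length match then succeeds at the end of the string.
    if ($line !== '' && substr($line, -1) !== ';') {
        array_pop($fields);
    }
    return $fields;
}

print_r(split_fields('"0;0";1;2;3;4'));
```

A line that does end with a semicolon, such as 'a;', keeps its final empty field, because in that case the zero-length match is a real (empty) last column.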

If you don't want the quotes to be included in the result, you can use the branch reset feature (?|..(..)..|..(..)..):

preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
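To see what the branch reset group changes, the following sketch compares it with a plain non-capturing alternation on the same input. The sample line and the variable names $plain and $reset are assumptions chosen for the example.

```php
<?php
$line = '"0;0";1;2;3;4'; // sample input, assumed for illustration

// Plain alternation: the two capture groups are numbered 1 and 2.
preg_match_all('~(?:"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $plain);

// Branch reset: both alternatives write to group 1.
preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $reset);

print_r($plain[1]); // only the quoted field is filled in
print_r($plain[2]); // only the unquoted fields are filled in
print_r($reset[1]); // every field in one array, quotes removed
```

Without the branch reset, the fields end up spread over two arrays padded with empty strings. With it, a single array holds all of them. Both variants still end with one empty entry for this input, produced by a zero-length match at the end of the string.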

Another workaround, this time for preg_split: match the part you want to protect (the quoted field) before the delimiter, then discard it from the reported match with the \K feature, so that only the semicolon counts as the separator:

$res[] = preg_split('~(?:"[^"]*")?\K;~', $line);
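The sketch below runs this preg_split pattern on two sample lines (both inputs are assumptions chosen for illustration). It also shows a limitation worth knowing about: the quoted part is only protected when a semicolon follows it, so a quoted field at the very end of the line can still be split.

```php
<?php
// Same pattern as above; the sample inputs are illustrative assumptions.
$pattern = '~(?:"[^"]*")?\K;~';

// A quoted field followed by a delimiter is skipped as one unit.
$ok = preg_split($pattern, '"0;0";1;2;3;4');
print_r($ok);

// Limitation: the final quoted field has no ';' after it, so the engine
// backtracks and then finds the ';' inside the quotes.
$broken = preg_split($pattern, 'x;"a;b"');
print_r($broken);
```

If the last column can be quoted and contain a semicolon, one of the preg_match_all patterns or str_getcsv is probably the safer choice.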

You can use the str_getcsv function, which also lets you specify a custom delimiter (here, ;).

Try this code snippet:

<?php

$string='"0;0";1;2;3;4';
print_r(str_getcsv($string,";"));

Output:

Array
(
    [0] => 0;0
    [1] => 1
    [2] => 2
    [3] => 3
    [4] => 4
)

Splitting is not a good choice for CSV-style lines.
You could instead use the tried-and-true \G anchor with a global-match function such as preg_match_all.

Practical

Regex: '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~'

Info:

 \G                            # G anchor, start where last match left off
 (?:                           # leading BOL or ;
      (?: ^ | ; )
      \s*                           # optional whitespaces
 )
 (?|                           # branch reset
      " 
      ( [^"]* )                     # (1), double quoted string data
      "
   |                              # or
      ( [^;]*? )                    # (1), non-quoted field
 )
 (?:                           # trailing optional whitespaces
      \s* 
      (?:
           (?= ; )                       # lookahead for ;
        |  $                             # or EOL
      )
 )
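
A minimal usage sketch for this regex with preg_match_all follows. The wrapper name parse_line and both sample lines are assumptions added for illustration. Group 1 holds each field without its quotes or surrounding whitespace.

```php
<?php
// Same regex as above, written on one line.
$re = '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~';

// Hypothetical wrapper, named here only for the example.
function parse_line(string $re, string $line): array
{
    preg_match_all($re, $line, $m);
    return $m[1]; // group 1: field content without quotes or padding
}

print_r(parse_line($re, '"0;0";1;2;3;4'));
print_r(parse_line($re, ' "a b" ; c ;d'));
```

Because every match after the first has to start with a semicolon, no extra empty match is produced at the end of these lines.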