Well, hello community. I'm workin' on a CSV decoder in PHP (yeah, I know there's already one, but as a challenge for me, since I'm learning it in my free time). Now the prob
Split is not a good choice for CSV-type lines.
You could use the old tried-and-true \G anchor with a global-match function (preg_match_all in PHP).
Practical
Regex: '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~'
Info:
 \G                          # G anchor, start where last match left off
 (?:                         # leading BOL or ;
      (?: ^ | ; )
      \s*                    # optional whitespaces
 )
 (?|                         # branch reset
      "
      ( [^"]* )              # (1), double quoted string data
      "
   |                         # or
      ( [^;]*? )             # (1), non-quoted field
 )
 (?:                         # trailing optional whitespaces
      \s*
      (?:
           (?= ; )           # lookahead for ;
        |  $                 # or EOL
      )
 )
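To see the \G approach in action, here is a minimal sketch that feeds the regex above to preg_match_all. The helper name splitCsvLine and the sample line are made up for illustration, and the sketch assumes quoted fields never contain an escaped double quote.

```php
<?php
// Hypothetical helper (not from the original answer): splits one
// semicolon-separated line with the \G-anchored pattern shown above.
function splitCsvLine(string $line): array
{
    $pattern = '~\G(?:(?:^|;)\s*)(?|"([^"]*)"|([^;]*?))(?:\s*(?:(?=;)|$))~';

    // Each match starts exactly where the previous one ended (\G),
    // and group 1 holds the field without quotes or padding.
    preg_match_all($pattern, $line, $matches);

    return $matches[1];
}

// Illustrative input: one quoted field containing the delimiter,
// one field padded with spaces.
$fields = splitCsvLine('"0;0"; 1 ;2;3;4');
print_r($fields);
```

Thanks to the branch reset, both the quoted and the unquoted alternative land in group 1, so $matches[1] is already the list of fields.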
You can use the built-in function str_getcsv, which also lets you specify a custom delimiter (;).
Try this code snippet:
<?php
$string='"0;0";1;2;3;4';
print_r(str_getcsv($string,";"));
Output:
Array
(
    [0] => 0;0
    [1] => 1
    [2] => 2
    [3] => 3
    [4] => 4
)
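If the input has several lines, the same call can be applied line by line. This is only a rough sketch: decodeCsvLines is a hypothetical helper name, the enclosure and escape arguments are just the documented defaults written out, and splitting on line breaks first will break quoted fields that themselves contain a newline.

```php
<?php
// Hypothetical helper: decode a block of semicolon-separated lines.
// Assumes no quoted field spans more than one line.
function decodeCsvLines(string $text): array
{
    $rows = [];

    // \R matches any newline sequence (\n, \r\n, \r).
    foreach (preg_split('~\R~', $text) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        // Delimiter ';', enclosure '"', escape '\' passed explicitly.
        $rows[] = str_getcsv($line, ';', '"', '\\');
    }

    return $rows;
}

$rows = decodeCsvLines("\"0;0\";1;2\nfoo;\"bar;baz\";3");
print_r($rows);
```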
It's a bit counter-intuitive, but the simplest way to split a string by regex is often to use preg_match_all in place of preg_split:
preg_match_all('~("[^"]*"|[^;"]*)(?:;|$)~A', $line, $m);
$res[] = $m[1];
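The A modifier anchors every match attempt at the position where the previous match ended, which guarantees that successive matches are contiguous from the start of the string. Because both alternatives can match an empty string, the pattern also matches once more at the very end of the line, so the result carries one extra empty element whenever the line does not end with a delimiter.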
If you don't want the quotes to be included in the result, you can use the branch reset feature (?|..(..)..|..(..)..):
preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
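Here is a small sketch of the branch-reset variant run on the sample line from the str_getcsv answer. The variable names are illustrative, and the array_pop step is my own addition rather than part of the answer: the pattern can match an empty string at the very end of the line, so one surplus empty element shows up unless the line ends with ; (in which case the empty element is a real field).

```php
<?php
// Sample line borrowed from the str_getcsv example above.
$line = '"0;0";1;2;3;4';

// Branch reset: group 1 holds the field, without quotes, for both alternatives.
preg_match_all('~(?|"([^"]*)"|([^;"]*))(?:;|$)~A', $line, $m);
$fields = $m[1];

// The pattern matches one more time (empty) at the end of the line.
// Assumption: drop it unless the line ends with ';', where the empty
// element is a genuinely empty last field.
if ($line !== '' && substr($line, -1) !== ';') {
    array_pop($fields);
}

print_r($fields);
```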
Another workaround, this time for preg_split: include the part you want to skip over before the delimiter, then discard it from the overall match with the \K feature, so that only the ; itself counts as the delimiter:
$res[] = preg_split('~(?:"[^"]*")?\K;~', $line);
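A quick sketch of the \K variant on the same sample line, with illustrative variable names. Since \K only shifts where the reported match starts, the quoted part stays in the preceding piece, quotes included; the trim step afterwards is an assumption about what you want, not part of the answer.

```php
<?php
$line = '"0;0";1;2;3;4';

// The optional quoted string is consumed, then \K resets the match start,
// so only the ';' after it is treated as the delimiter.
$parts = preg_split('~(?:"[^"]*")?\K;~', $line);

// The pieces still carry their quotes; strip them if they are not wanted.
$fields = array_map(
    function (string $part): string {
        return trim($part, '"');
    },
    $parts
);

print_r($parts);
print_r($fields);
```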