Regex with possible empty matches and multi-line match

猫巷女王i 2021-01-23 22:28

I've been trying to "parse" some data using a regex, and I feel as if I'm close, but I just can't seem to bring it all home.
The data that needs parsing gener

4 answers
  •  被撕碎了的回忆
    2021-01-23 22:44

    I would avoid using a regex for this task and instead split it into sub-tasks.

    Basic algorithm outline

    1. Split the string on \n using explode.
    2. Loop over the resulting array:
      1. Split each line on : using explode with a limit of 2.
      2. If the resulting array has fewer than 2 elements, append the whole line to the previous key's value.
      3. Otherwise, use the first element as the key and the second as the value, unless the colon that was split on was escaped (in which case append the key, the colon, and the value to the previous key's value).
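
A minimal sketch of why step 2.1 uses a limit of 2. The sample lines are hypothetical, modeled on the data in the output further down, and the variable names are illustrative only.

```php
<?php
// Hypothetical key/value line whose value itself contains colons (a timestamp).
$line = 'When:01/02/2013 01:23:45';

// Without a limit, every colon splits the line and the timestamp is broken apart.
$unlimited = explode(':', $line);

// With a limit of 2, only the first colon splits; the rest stays in the value.
$limited = explode(':', $line, 2);

// A line with no colon yields a one-element array, which marks it as a
// continuation of the previous key's value.
$continuation = explode(':', 'this line has no colon', 2);

var_dump(count($unlimited), $limited, count($continuation));
```

The element count of the limited split is what the loop can test to tell a new key/value pair from a continuation line.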

    This algorithm assumes that no key contains an escaped colon. Escaped colons in values (i.e. user input) are handled correctly.
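
The escaped-colon case can be sketched in isolation as follows. The sample line is hypothetical, and the check assumes that a backslash immediately before the colon is the only escape convention in use.

```php
<?php
// Hypothetical continuation line containing an escaped colon.
// In a single-quoted PHP string, \: stays a literal backslash followed by a colon.
$line = 'This\: works too.';

$parts = explode(':', $line, 2);

// The text before the first colon ends with a backslash, so that colon was
// escaped and the line belongs to the previous value, not to a new key.
$isEscaped = count($parts) === 2 && substr($parts[0], -1) === '\\';

var_dump($parts, $isEscaped);
```

Because only the character before the first colon is inspected, a key that itself contains an escaped colon would be misread as a continuation line, which is why the assumption above is needed.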

    Code

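    // Assumed sample input and loop setup, inferred from the output below.
    $str = <<<'EOT'
    FooID:123456
    Name:Chuck
    When:01/02/2013 01:23:45
    InternalID:
    User Message:Hello,
    this is nillable, but can be quite long. Text can be spread out over many lines
    This\: works too. And can start with any number of \n's. It can be empty, too.
    What's worse, though is that this CAN contain colons (but they're _"escaped"_
    
    
    using `\`) like so `\:`, and even basic markup!
    EOT;
    $output = array();
    $prevKey = '';
    $split = ':';
    foreach (explode("\n", $str) as $line) {
      $keyValuePair = explode($split, $line, 2);
      // ?: Did the split produce a key/value pair
      if (count($keyValuePair) < 2 && strlen($prevKey) > 0) {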
        // -> Nope, append the value to the previous key's value
        $output[$prevKey] .= "\n" . $keyValuePair[0];
      }
      else {
        // -> Maybe
        // ?: Did we miss an escaped colon
        if (substr($keyValuePair[0], -1) === '\\') {
          // -> Yep, this means this is a value, not a key/value pair append both key and
          // value (including the split between) to the previous key's value ignoring
          // any colons in the rest of the string (allowing dates to pass through)
          $output[$prevKey] .= "\n" . $keyValuePair[0] . $split . $keyValuePair[1];
        }
        else {
          // -> Nope, create a new key with a value
          $output[$keyValuePair[0]] = $keyValuePair[1];
          $prevKey = $keyValuePair[0];
        }
      }
    }
    
    var_dump($output);
    

    Output

    array(5) {
      ["FooID"]=>
      string(6) "123456"
      ["Name"]=>
      string(5) "Chuck"
      ["When"]=>
      string(19) "01/02/2013 01:23:45"
      ["InternalID"]=>
      string(0) ""
      ["User Message"]=>
      string(293) "Hello,
    this is nillable, but can be quite long. Text can be spread out over many lines
    This\: works too. And can start with any number of \n's. It can be empty, too.
    What's worse, though is that this CAN contain colons (but they're _"escaped"_
    
    
    using `\`) like so `\:`, and even basic markup!"
    }
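
Note that the values in this output still contain the `\:` escape sequences. If plain colons are wanted, a post-processing step along these lines might be used. This is a sketch that is not part of the answer's algorithm, and the sample value is hypothetical.

```php
<?php
// Hypothetical parsed value that still contains escaped colons.
$value = 'This\: works too, like so `\:`';

// Replace each backslash-colon pair with a plain colon.
$unescaped = str_replace('\\:', ':', $value);

echo $unescaped, "\n";
```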
    

    Online demo
