问题
I am trying to split a string without regex in a more idiomatic functional approach.
case class Parsed(blocks: Vector[String], block: String, depth: Int)
def processChar(parsed: Parsed, c: Char): Parsed = {
import parsed._
c match {
case '|' if depth == 0
=> parsed.copy(block = "", blocks = blocks :+ block ,
depth = depth)
case '[' => parsed.copy(block = block + c,
depth = depth + 1)
case ']' if depth == 1
=> parsed.copy( block = "", blocks = blocks :+ (block + c),
depth = depth - 1)
case ']' => parsed.copy(block = block + c,
depth = depth - 1)
case _ => parsed.copy(block = block + c)
}
}
val s = "Str|[ts1:tssub2|ts1:tssub2]|BLANK|[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]|[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]|BLANK |BLANK |[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]|[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]|BLANK |BLANK |[s14|s15]"
val parsed = s.foldLeft(Parsed(Vector(), "", 0))(processChar)
parsed.blocks.size //20
parsed.blocks foreach println
I would expect to get the following result (parsed.blocks.size should be 12).
Str
[ts1:tssub2|ts1:tssub2]
BLANK|
[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK
BLANK
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]
BLANK
BLANK
[s14|s15]
However result I am getting is (parsed.blocks.size is 20)
Str
[ts1:tssub2|ts1:tssub2]
BLANK
[INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK
BLANK
[s2|s3|s4|INT16|INT17]
;[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts1:tssub2||]
;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]
BLANK
BLANK
[s14|s15]
To my understanding this is slight variation of parenthesis balancing problem. However in this case ;
would mean kind of continuation.
I have two questions in this case
1) How the extra entry /space after [ts1:tssub2|ts1:tssub2]
came, also after
[INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]]
, [INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
and
;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]
in my result as well ?
2) Here at the moment [s2|s3|s4|INT16|INT17]
and ;[s5|s6|s7|INT18|INT19]
go in as two different entries. However this should be merged as
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
a single entry[So does
[[s8|s9|s10|INT20|INT21]|ts1:tssub2||]
and
;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK])
as well]. Any clues to how to do so ?
回答1:
Question 1
The extra empty string block appears because the very previous case each time is
case ']' if depth == 1
It adds an empty block and decreases the depth. Then we have
case '|' if depth == 0
which also adds another empty block, pushing the previous empty one into the resulting Vector.
Before answering the second question, I would like to suggest another approach to the implementation of this parser, which is slightly more idiomatic. My major criticism about the current one is the usage of an intermediate object (Parsed
) to wrap state and copying it in each and every case. Indeed, we do not need it: more frequent approach is to use a recursive function, especially when depth is involved.
So, without modifying significantly the processing of your case
s, it can be represented as follows:
def parse(blocks: Seq[String],
currentBlock: String,
remaining: String,
depth: Int): Seq[String] =
if (remaining.isEmpty) {
blocks
} else {
val curChar = remaining.head
curChar match {
case '|' if depth == 0 =>
parse(blocks :+ currentBlock, "", remaining.tail, depth)
case '[' =>
parse(blocks, currentBlock + curChar, remaining.tail, depth + 1)
case ']' =>
if (depth == 1)
parse(blocks :+ (currentBlock + curChar), "", remaining.tail, depth - 1)
else
parse(blocks, currentBlock + curChar, remaining.tail, depth - 1)
case _ =>
parse(blocks, currentBlock + curChar, remaining.tail, depth)
}
}
It produces exactly the same output as the original solution.
To fix the issue with empty blocks, we need to change case '|'
:
case '|' if depth == 0 =>
val updatedBlocks = if (currentBlock.isEmpty) blocks
else blocks :+ currentBlock
parse(updatedBlocks, "", remaining.tail, depth)
We just skip the current block if it contains an empty string.
Question 2
To merge the two blocks between ;
char, we need to bring back one parsed block and return it into the currentBlock
reference. This represents an additional case:
case ';' =>
parse(blocks.init, blocks.last + curChar, remaining.tail, depth)
Now, after
val result = parse(Seq(), "", s, 0)
result.foreach(println)
The output is
Str
[ts1:tssub2|ts1:tssub2]
BLANK
[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK
BLANK
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]
BLANK
BLANK
[s14|s15]
And it looks very similar to what you were looking for.
来源:https://stackoverflow.com/questions/49782126/splitting-complex-string-pattern-without-regex-in-a-functional-approach