I loaded a txt file into R as follows:
Election_Parties <- readr::read_lines("Election_Parties.txt")
Let's say the following text
You may use
strsplit(paste(Election_Parties, collapse=" "), "\\s+(?=P\\d+-)", perl=TRUE)[[1]]
See the R demo online.
Output:
[1] "P23-Andalusian Social Democratic Party (Partido Social-Demócrata Andaluz [PSDA])"
[2] "P24-Andalusian Socialist Movement (Movimiento Socialista Andaluz [MSA])"
[3] "P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido Andalucista [PSA-PA])"
[4] "P26-Andalusist Party (Partido Andalucista [PA])"
[5] "P217-Andecha Astur (Andecha Astur [AA])"
The \s+(?=P\d+-) pattern matches 1+ whitespace characters that are followed by P, 1+ digits, and -, but the P<numbers>- part is not consumed, since it sits inside a positive lookahead, which is a zero-width assertion. Because of this lookahead, the perl=TRUE argument is necessary: it makes R process the regex with the PCRE engine, as the default TRE engine does not support lookarounds.
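To make the "not consumed" point concrete, here is a minimal sketch that contrasts the lookahead split with a plain split that consumes the prefix. The sample_lines vector is a hypothetical stand-in for the contents of Election_Parties.txt (three of the entries from the output above, with one record wrapped across two lines), not the real file.

```r
# Hypothetical stand-in for readr::read_lines("Election_Parties.txt");
# the first record is deliberately wrapped across two lines.
sample_lines <- c(
  "P24-Andalusian Socialist Movement",
  "(Movimiento Socialista Andaluz [MSA])",
  "P26-Andalusist Party (Partido Andalucista [PA])",
  "P217-Andecha Astur (Andecha Astur [AA])"
)

# Join everything into one string so that wrapped records are glued back together.
joined <- paste(sample_lines, collapse = " ")

# Lookahead version: only the whitespace is consumed, the "P<digits>-" prefix stays.
kept <- strsplit(joined, "\\s+(?=P\\d+-)", perl = TRUE)[[1]]

# Consuming version: the "P<digits>-" prefix is part of the delimiter and is lost.
consumed <- strsplit(joined, "\\s+P\\d+-", perl = TRUE)[[1]]

print(kept)
print(consumed)
```

With the lookahead, every element still starts with its P<numbers>- identifier; with the consuming pattern, every element except the first loses it.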