I'm trying to clean a bunch of .txt files in a folder using regex. I can't seem to get R to find line breaks.
This is the code I'm using. It works for character subst
You can't do that with xfun::gsub_dir.
Have a look at the source code: the files are read with read_utf8, which basically executes x = readLines(con, encoding = 'UTF-8', warn = FALSE), gsub is then fed with these lines, and when all replacements are done the lines are written back to the file. Since readLines drops the line terminators, the pattern never gets to see a line break.
You need to use some custom function for that. Here is a "quick and dirty" one that will replace all LF symbols with #:
lbr_change_gsub_dir = function(newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
    files = list.files(dir, full.names = TRUE, recursive = recursive)
    for (f in files) {
        # readLines() returns the lines without their line terminators
        x = readLines(f, encoding = encoding, warn = FALSE)
        # write the lines back, joined with the chosen separator instead of LF
        cat(x, sep = newline, file = f)
    }
}
folder <- "C:\\MyFolder\\Here"
lbr_change_gsub_dir(newline="#", dir=folder)
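A quick way to see why a per-line gsub cannot find a line break is to check what readLines actually returns. This is only a minimal sketch; the temporary file and its two lines are made-up sample data, not part of the original question.

```r
# Write a small two-line file to a temporary location (hypothetical sample data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

# readLines() returns one string per line, with the line terminators stripped.
x <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
has_newline <- grepl("\n", x, fixed = TRUE)
print(has_newline)

# Once the lines are joined, the line break is part of the string and can be matched.
joined <- paste(x, collapse = "\n")
joined_has_newline <- grepl("\n", joined, fixed = TRUE)
print(joined_has_newline)

unlink(tmp)
```

None of the individual lines contains "\n", while the joined string does, which is why the line breaks only become matchable after the lines are pasted together.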
If you want to be able to match multiline patterns, paste the lines, collapsing them with newline, and use any pattern you like:
lbr_gsub_dir = function(pattern, replacement, perl = TRUE, newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
    files = list.files(dir, full.names = TRUE, recursive = recursive)
    for (f in files) {
        x <- readLines(f, encoding = encoding, warn = FALSE)
        # collapse the whole file into one string so the pattern can span line breaks
        x <- paste(x, collapse = newline)
        x <- gsub(pattern, replacement, x, perl = perl)
        cat(x, file = f)
    }
}
folder <- "C:\\1"
lbr_gsub_dir("(?m)\\d+\\R(.+)", "\\1", dir = folder)
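This will remove each run of digits that ends a line, together with the line break after it, so a digit-only line is dropped and the non-empty line that follows it is kept.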
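To try this without touching real data, the same read, collapse, replace, write steps can be run on a throwaway file. This is a sketch under assumptions: the temporary folder, the file name sample.txt and its contents are invented for illustration.

```r
# Create a throwaway folder with one sample file (hypothetical data).
tmp_dir <- tempfile("lbr_demo_")
dir.create(tmp_dir)
f <- file.path(tmp_dir, "sample.txt")
writeLines(c("123", "keep me", "456", "and me"), f)

# Same steps as inside lbr_gsub_dir(), applied to the single file.
x <- readLines(f, encoding = "UTF-8", warn = FALSE)
x <- paste(x, collapse = "\n")
result <- gsub("(?m)\\d+\\R(.+)", "\\1", x, perl = TRUE)
cat(result, file = f)

# Read the file back to inspect what was written.
cleaned <- readLines(f, warn = FALSE)
print(cleaned)

unlink(tmp_dir, recursive = TRUE)
```

After the replacement only the two text lines remain in the file. Note that cat() does not append a trailing newline, so the last line of the rewritten file is left unterminated.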