I have a large data set with thousands of columns. The column names include various unwanted characters as follows:
col1_3x_xxx
col2_3y_xyz
col3_3z_zyx
We can try str_extract with the regular expression pattern "^[^_]+(?=_)":
stringr::str_extract(c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx"), "^[^_]+(?=_)")
[1] "col1" "col2" "col3"
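where, in the pattern:
- the first "^" anchors the match at the beginning of the string;
- "[^_]+" matches one or more characters other than "_" (inside the square brackets, "^_" means any character except "_");
- "(?=...)" is a lookahead, so "(?=_)" requires the match to be followed by "_" without including that "_" in the result.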
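To see what the lookahead contributes, compare the pattern with and without it. This sketch uses base R's regexpr()/regmatches() instead of stringr. It assumes perl = TRUE is needed for the lookahead, because base R's default regex engine does not support it:

```r
x <- c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx")

# Without the lookahead, the "_" is consumed and becomes part of the match.
with_underscore <- regmatches(x, regexpr("^[^_]+_", x))

# With the lookahead (?=_), a "_" must follow, but it is not part of the match.
# perl = TRUE switches to the PCRE engine, which supports lookaheads.
without_underscore <- regmatches(x, regexpr("^[^_]+(?=_)", x, perl = TRUE))

with_underscore     # "col1_" "col2_" "col3_"
without_underscore  # "col1"  "col2"  "col3"
```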
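For a data set with thousands of columns, the same pattern can be applied to names(df) directly. Below is a minimal sketch. The data frame df is a hypothetical stand-in for the real data. It uses base R so it runs without stringr. One difference from str_extract(), which returns NA for names that contain no "_": this sketch leaves such names unchanged.

```r
# Hypothetical data frame standing in for the real data set
df <- data.frame(col1_3x_xxx = 1:2, col2_3y_xyz = 3:4, col3_3z_zyx = 5:6)

# Position of the prefix before the first "_" in each name (-1 if no match).
# perl = TRUE is required for the (?=_) lookahead.
m <- regexpr("^[^_]+(?=_)", names(df), perl = TRUE)

# Replace only the names that matched; names without "_" are kept as-is.
new_names <- names(df)
new_names[m != -1] <- regmatches(names(df), m)
names(df) <- new_names

names(df)  # "col1" "col2" "col3"
```

If several columns share the same prefix, this produces duplicate names. Running anyDuplicated(names(df)) afterwards is a quick check for that.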