How to replace square brackets with curly brackets using R's regex?

一生所求 2021-01-14 11:59

Due to conversions between pandoc-citeproc and LaTeX, I'd like to replace this

[@Fotheringham1981]

with this

\\cite{Fotheringham1

3 Answers
  • 2021-01-14 12:53

    You need to use a capturing group.

    x <- c("[@Fotheringham1981]", "df[1,2]")
    # perl=TRUE is used here because ] is backslash-escaped inside the character class
    gsub("\\[@([^\\]]*)\\]", "\\\\cite{\\1}", x, perl=TRUE)
    # [1] "\\cite{Fotheringham1981}" "df[1,2]" 
    

    or

    gsub("\\[@(.*?)\\]", "\\\\cite{\\1}", x)
    # [1] "\\cite{Fotheringham1981}" "df[1,2]"
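
    The difference between a greedy .* and the two patterns above only shows up when one string holds several citations. A minimal sketch, assuming a made-up sentence with two citation keys (the second key, Smith2000, is hypothetical and only there for illustration):

```r
# Hypothetical input: two pandoc-style citations in one string
s <- "See [@Fotheringham1981] and [@Smith2000]."

# Greedy .* runs to the LAST ] in the string, swallowing both citations
greedy <- gsub("\\[@(.*)\\]", "\\\\cite{\\1}", s)

# Lazy .*? stops at the first ] after each [@
lazy <- gsub("\\[@(.*?)\\]", "\\\\cite{\\1}", s)

# A negated character class also stops at the first ]
negated <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", s)

cat(greedy, lazy, negated, sep = "\n")
```

    Only the lazy and negated-class versions turn each citation into its own \cite{...}.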
    
  • 2021-01-14 13:01

    You can use

    gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x)
    

    See IDEONE demo

    Regex breakdown:

    • \\[@ - a literal [@ symbol sequence
    • ([^]]*) - a capture group 1 that matches 0 or more occurrences of any symbol but a ] (note that if ] appears at the beginning of a character class, it does not need escaping)
    • ] - a literal ] symbol

    You do not need perl=TRUE with this pattern because the ] inside the character class is not backslash-escaped. A pattern that escapes it, such as [^\\]], does require that option.

    Also, I believe we should only escape what must be escaped. If there is a way to avoid backslash hell, we should. Thus, you can even use

    gsub("[[]@([^]]*)]", "\\\\cite{\\1}", x)
    

    Here is another demo
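
    As a quick check, the spellings discussed here can be compared side by side. This is only a sketch; the variable names are made up for illustration, and x is re-created so the snippet runs on its own:

```r
x <- c("[@Fotheringham1981]", "df[1,2]")

# Default (TRE) engine, ] placed first in the negated class so it needs no escape
tre_backslash <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x)

# Same engine, with the opening [ written as a one-character class instead of \\[
tre_class <- gsub("[[]@([^]]*)]", "\\\\cite{\\1}", x)

# PCRE engine, with ] backslash-escaped inside the class
pcre_escaped <- gsub("\\[@([^\\]]*)\\]", "\\\\cite{\\1}", x, perl = TRUE)

identical(tre_backslash, tre_class)
identical(tre_backslash, pcre_escaped)

# print() shows "\\cite{...}"; cat() shows the single backslash LaTeX will see
cat(tre_backslash, sep = "\n")
```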

    Why TRE-based regex works better than the PCRE one:

    In R 2.10.0 and later, the default regex engine is a modified version of Ville Laurikari's TRE engine [source]. According to the library's author, matching time grows linearly with the length of the input text, while memory requirements stay almost constant (tens of kilobytes). TRE is also said to have predictable and modest memory consumption and a worst-case matching time that is quadratic in the length of the regular expression. That is why it seems best to rely on TRE rather than on PCRE when dealing with larger documents.
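
    Whether one engine is actually faster for a given document is easy to measure rather than assume. A rough sketch using base R's system.time, with a made-up 10,000-line document; timings will vary with R version, pattern, and input, so no particular result is claimed here:

```r
# Hypothetical test document: 10,000 lines, each containing one citation
doc <- rep("Spatial interaction models [@Fotheringham1981] are widely used.", 1e4)

pattern <- "\\[@([^]]*)]"
replacement <- "\\\\cite{\\1}"

# Default engine (TRE)
time_tre <- system.time(res_tre <- gsub(pattern, replacement, doc))

# PCRE engine
time_pcre <- system.time(res_pcre <- gsub(pattern, replacement, doc, perl = TRUE))

identical(res_tre, res_pcre)  # both engines should produce the same text
time_tre["elapsed"]
time_pcre["elapsed"]
```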

  • 2021-01-14 13:03

    This matches [@ and then sets up a capture group, i.e. everything within (...), in which .*? matches the shortest string up to the next ]:

    gsub("\\[@(.*?)\\]", "\\\\cite{\\1}", x)
    ## [1] "\\cite{Fotheringham1981}" "df[1,2]"
    

    Here is a railroad diagram of the regular expression:

    \[@(.*?)\]
    

    Regular expression visualization

    Debuggex Demo
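
    Whether the @ sits outside or inside the parentheses decides whether it ends up in the replacement. One way to see what group 1 captures is base R's regexec together with regmatches. A small sketch, with variable names chosen only for illustration:

```r
x <- c("[@Fotheringham1981]", "df[1,2]")

# @ outside the group: group 1 holds only the citation key
outside <- regmatches(x, regexec("\\[@(.*?)\\]", x))

# @ inside the group: group 1 keeps the leading @
inside <- regmatches(x, regexec("\\[(@.*?)\\]", x))

outside[[1]]  # full match first, then group 1
inside[[1]]
outside[[2]]  # empty: "df[1,2]" has no [@ and is left alone
```

    With the @ outside the group, \\1 expands to the bare key, which is what \cite{...} expects.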
