Extracting unique numbers from string in R

Happy的楠姐 2020-11-27 05:11

I have a list of strings which contain random characters such as:

list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka


        
7 answers
  • 2020-11-27 05:45

    For the second result (unique single digits), you can use gsub to remove everything that is not a digit, then split what remains into single characters:

    unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
    # [1] 7 6 1 5 2
    

    For the first result (unique whole numbers), strsplit works similarly, this time splitting on runs of non-digits:

    unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
    # [1]   7 667  11   5   2
    

    PS: Don't name your variable list, since list is also the name of a built-in function. I've named your data ll.
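
    The two one-liners above can be unpacked step by step to see what each call contributes. This is only a sketch: ll is rebuilt here from the three strings used in the other answers, and the intermediate variable names are made up for illustration.

```r
# Assumed input: the three strings from the thread, stored in a list called ll
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Step 1: drop every non-digit character, leaving one digit string per element
digits_only <- gsub("[^0-9]", "", unlist(ll))

# Step 2: splitting on "" breaks each digit string into single characters
single_digits <- unlist(strsplit(digits_only, ""))
unique_digits <- unique(as.numeric(single_digits))

# Splitting on runs of non-digits keeps multi-digit numbers together, but a
# string that starts with a non-digit yields a leading "" that converts to NA,
# which is why na.omit() is needed before unique()
pieces <- unlist(strsplit(unlist(ll), "[^0-9]+"))
unique_numbers <- unique(na.omit(as.numeric(pieces)))
```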

  • 2020-11-27 05:50

    Here is yet another answer, this one using gregexpr to find the numbers, and regmatches to extract them:

    l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
    
    temp1 <- gregexpr("[0-9]", l)   # Individual digits
    temp2 <- gregexpr("[0-9]+", l)  # Numbers with any number of digits
    
    as.numeric(unique(unlist(regmatches(l, temp1))))
    # [1] 7 6 1 5 2
    as.numeric(unique(unlist(regmatches(l, temp2))))
    # [1]   7 667  11   5   2
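
    If both variants are needed repeatedly, the gregexpr/regmatches pair can be wrapped in a small function. The sketch below is not part of the answer above: the name unique_numbers and the single_digits argument are hypothetical. It also converts to numeric before calling unique(), so that strings such as "07" and "7" collapse to one value.

```r
# Hypothetical helper built on gregexpr() and regmatches()
unique_numbers <- function(x, single_digits = FALSE) {
  # "[0-9]" matches one digit at a time, "[0-9]+" matches whole runs of digits
  pattern <- if (single_digits) "[0-9]" else "[0-9]+"
  matches <- regmatches(x, gregexpr(pattern, x))
  # Convert first, then deduplicate on the numeric values
  unique(as.numeric(unlist(matches)))
}

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
whole_numbers <- unique_numbers(l)
digits <- unique_numbers(l, single_digits = TRUE)

# A string without any digits yields an empty numeric vector
none <- unique_numbers("no digits here")
```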
    
  • 2020-11-27 05:52

    A stringr solution with str_match_all and the pipe operator. For the first solution:

    library(stringr)
    str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric
    

    Second solution:

    str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric
    

    (Note: I've also called the list ll)

  • 2020-11-27 05:55

    Check out the str_extract_numbers() function from the strex package.

    pacman::p_load(strex)
    list=list()
    list[1] = "djud7+dg[a]hs667"
    list[2] = "7fd*hac11(5)"
    list[3] = "2tu,g7gka5"
    charvec <- unlist(list)
    print(charvec)
    #> [1] "djud7+dg[a]hs667" "7fd*hac11(5)"     "2tu,g7gka5"
    str_extract_numbers(charvec)
    #> [[1]]
    #> [1]   7 667
    #> 
    #> [[2]]
    #> [1]  7 11  5
    #> 
    #> [[3]]
    #> [1] 2 7 5
    unique(unlist(str_extract_numbers(charvec)))
    #> [1]   7 667  11   5   2
    

    Created on 2018-09-03 by the reprex package (v0.2.0).

  • 2020-11-27 05:59

    You could use ?strsplit (as suggested in @Arun's answer to "Extracting numbers from vectors (of strings)"):

    l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
    
    ## split string at non-digits
    s <- strsplit(l, "[^[:digit:]]")
    
    ## convert strings to numeric ("" become NA)
    solution <- as.numeric(unlist(s))
    
    ## remove NA and duplicates
    solution <- unique(solution[!is.na(solution)])
    # [1]   7 667  11   5   2
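
    Because the pattern above treats every single non-digit as a separator, runs of letters or symbols leave empty strings between the pieces. As a possible variation (a sketch, not what the answer above does), those empty strings can be dropped with nzchar() before the conversion, so that no NA values are produced in the first place. The names non_empty and solution2 are made up for illustration.

```r
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Each non-digit is its own separator, so consecutive non-digits leave ""
s <- unlist(strsplit(l, "[^[:digit:]]"))

# Keep only the non-empty pieces, then convert and deduplicate
non_empty <- s[nzchar(s)]
solution2 <- unique(as.numeric(non_empty))
```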
    
  • 2020-11-27 06:03

    A solution using stringi:

    library(stringi)

    # Extract the numbers (list is the variable from the question):
    nums <- stri_extract_all_regex(unlist(list), "[0-9]+")

    # Flatten to a vector and keep the unique numbers:
    nums <- unlist(nums)
    nums <- unique(nums)

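    That gives the first solution.

    For the second solution, I would use substring to split every number into its individual digits (substr(x, 1, 1) alone would keep only the first digit of each number):

    digits <- unique(unlist(lapply(nums, function(x) substring(x, 1:nchar(x), 1:nchar(x)))))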
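
    One caveat when deriving digits from already extracted numbers: looking only at the first character of each number can miss digits that occur in later positions. The sketch below uses a made-up vector (not the data from the question) to show the difference; first_only and all_digits are illustrative names.

```r
# Hypothetical extracted numbers, chosen so that later positions matter
nums <- c("12", "34", "21")

# First character of each number only: the "4" in "34" is never seen
first_only <- unique(substr(nums, 1, 1))

# Every character of each number: all digits are collected
all_digits <- unique(unlist(strsplit(nums, "")))
```

    For the three strings in the question both approaches happen to give the same digits, which is why the difference is easy to overlook.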
    