R: count how many times a word appears in each element of a list

误落风尘 2020-12-20 04:51

I have a list comprised of words.

> head(splitWords2)
[[1]]
 [1] "Some"        "additional"  "information" "that"        "we"          "would"


        
4 Answers
  • 2020-12-20 05:12

    Something like this:

    wordlist <- list(
        c("the","and","it"),
        c("we","and","it")
    )
    library(plyr); library(stringr)  # ldply() comes from plyr, str_count() from stringr
    > ldply(wordlist, function(x) str_count(x, "we"))
      V1 V2 V3
    1  0  0  0
    2  1  0  0
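
The ldply() call above returns one column per word position. If only a total per list element is needed, a base R version is possible. The following is a minimal sketch, not the answerer's method; the helper name count_exact is made up for illustration, and it counts exact matches only (unlike str_count(), which also counts pattern hits inside longer words).

```r
# Same toy data as in the answer above
wordlist <- list(
    c("the", "and", "it"),
    c("we", "and", "it")
)

# Hypothetical helper: number of words in x that are exactly equal to `word`
count_exact <- function(x, word) sum(x == word)

# One integer per list element
counts <- vapply(wordlist, count_exact, integer(1), word = "we")
counts
```

vapply() is used instead of sapply() so that the result is guaranteed to be an integer vector with one entry per list element.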
    
  • 2020-12-20 05:20
    library(qdap)
    
    #create a fake data set like yours:
    words <- list(first = c("a","b","c","a","a","bc", "dBs"), 
        second = c("w","w","q","a"))
    ## termco functions require a sentence-like structure in a data frame, so convert:
    words2 <- list2df(lapply(words, paste, collapse = " "), "wl", "list")[2:1]
    
    
    ## trailing and leading spaces are important in match terms
    ## both a trailing and a leading space will match exactly that term
    termco(text.var=words2$wl, grouping.var=words2$list, match.list=c(" a "))
    termco(words2$wl, words2$list, match.list=c(" b ", " a "))
    
    ## note: with no space at the end, " b" finds any case of b + any.chunk (e.g. "bc")
    termco(words2$wl, words2$list, match.list=c(" b", " a "))
    
    ## no trailing/leading spaces means find any words containing the chunk b
    termco(words2$wl, words2$list, match.list=c("b", " a "))
    
    ## ignore.case = TRUE makes the match case-insensitive
    termco(words2$wl, words2$list, match.list=c("b", " a "), ignore.case=TRUE)
    
    ## Last use yields:
    ## 
    ##     list word.count  term(b) term( a )
    ## 1  first          7 3(42.86)  2(28.57)
    ## 2 second          4        0     1(25)
    ## Also:
    
    
    ## termco2mat turns the termco output into a matrix of raw counts (terms in rows, groups in columns)
    with(words2, termco2mat(termco(wl, list, match.list=c("b", " a "))))
    
    ## Which yields raw.score(percentage):
    ## 
    ##   first second
    ## b     2      0
    ## a     2      1
    

    Note that termco creates a class that is actually a list of data.frames.

    raw  = raw frequency counts (numeric)
    prop = proportion of counts (numeric)
    rnp  = raw and proportion combined (character)

    Using Scott's example:

    words <- list(
        first=c("the","and","it", "we're"),
        second=c("we","and","it")
    )
    words2 <- data.frame(list=names(words), 
        wl=unlist(lapply(words, paste, collapse=" ")))
    
    termco(words2$wl, words2$list, match.list=c(" we ", " we"))
    termco(words2$wl, words2$list, match.list=c(" we ", " we"), short.term = FALSE)
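
The leading/trailing-space rules described in the qdap answer can be imitated in base R, which may help to see why the spaces matter. This is only a sketch of the idea and not how qdap implements termco; the helper count_term and its ignore_case argument are hypothetical names introduced here.

```r
words <- list(first = c("a", "b", "c", "a", "a", "bc", "dBs"),
              second = c("w", "w", "q", "a"))

# Collapse each element to one string and pad it with spaces,
# so the first and last words are also surrounded by spaces
padded <- vapply(words,
                 function(x) paste0(" ", paste(x, collapse = " "), " "),
                 character(1))

# Hypothetical helper: count non-overlapping literal occurrences of `term`
count_term <- function(text, term, ignore_case = FALSE) {
    if (ignore_case) {
        text <- tolower(text)
        term <- tolower(term)
    }
    hits <- gregexpr(term, text, fixed = TRUE)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
}

whole_b  <- vapply(padded, count_term, integer(1), term = " b ")  # the word "b" only
starts_b <- vapply(padded, count_term, integer(1), term = " b")   # words starting with "b"
any_b    <- vapply(padded, count_term, integer(1), term = "b",
                   ignore_case = TRUE)                            # any word containing b or B
```

One limitation of this sketch: matches are non-overlapping, so two adjacent copies of a word that share a single space (as in "a a") are counted once for a term such as " a ".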
    
  • 2020-12-20 05:24

    You could always stick to grep in the base package for simplicity...

    LinesList <- list("1" = letters[1:10], "2" = rep(letters[1:3], 3))
    # grep() applied to the whole list only reports which elements match,
    # so count the matching words inside each element instead
    CountsA <- sapply(LinesList, function(x) sum(grepl("a", x, fixed = TRUE)))
    data.frame(lineNum = names(LinesList), count = CountsA)
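
With grep-style patterns, a bare pattern such as "we" also matches longer words that contain it. The sketch below contrasts an unanchored pattern with an anchored one. It reuses the "we're" example from the qdap answer; the extra word "were" is added here purely for illustration and is not part of the original data.

```r
words <- list(
    first  = c("the", "and", "it", "we're"),
    second = c("we", "and", "it", "were")   # "were" is an illustrative addition
)

# Unanchored: counts every word that contains "we"
contains_we <- vapply(words, function(x) sum(grepl("we", x)), integer(1))

# Anchored with ^ and $: counts only words that are exactly "we"
exactly_we <- vapply(words, function(x) sum(grepl("^we$", x)), integer(1))

data.frame(lineNum = names(words), contains = contains_we, exact = exactly_we)
```

Which of the two is wanted depends on whether contractions and inflected forms should count as the same word.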
    
  • 2020-12-20 05:25

    For one specific word:

    library(stringr)  # provides str_count()
    words <- list(a = c("a","b","c","a","a","b"), b = c("w","w","q","a"))
    $a
    [1] "a" "b" "c" "a" "a" "b"
    
    $b
    [1] "w" "w" "q" "a"
    wt <- data.frame(lineNum = 1:length(words))
    wt$count <- sapply(words, function(x) sum(str_count(x, "a")))
      lineNum count
    1       1     3
    2       2     1
    

    If a vector w contains the words that you want to count:

    w <- c("a","q","e")
    allwords <- lapply(w, function(z) data.frame(lineNum = 1:length(words), 
                count = sapply(words, function(x) sum(str_count(x, z)))))
    names(allwords) <- w
    $a
      lineNum count
    a       1     3
    b       2     1
    
    $q
      lineNum count
    a       1     0
    b       2     1
    
    $e
      lineNum count
    a       1     0
    b       2     0
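
Instead of a list of data frames, the counts for several words can also be collected in a single matrix. This is a base R sketch under the assumption that exact word matches are wanted; the name count_matrix is made up for illustration.

```r
words <- list(a = c("a", "b", "c", "a", "a", "b"), b = c("w", "w", "q", "a"))
w <- c("a", "q", "e")

# Inner vapply: counts of every word in w for one list element.
# Outer vapply: repeat that for each element, then transpose so that
# rows are list elements and columns are the words in w.
count_matrix <- t(vapply(
    words,
    function(x) vapply(w, function(z) sum(x == z), integer(1)),
    integer(length(w))
))
count_matrix
```

A single cell can then be looked up by name, for example count_matrix["b", "q"] for the count of "q" in element b.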
    