Count the number of occurrences of a string using sed?

星月不相逢 2020-12-28 17:18

I have a file which contains "title" written in it many times. How can I find the number of times "title" is written in that file using the sed command provided that "t

6 answers
  • 2020-12-28 17:24

    This might work for you:

    sed '/^title/!d' file | sed -n '$='
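
    How the two stages cooperate can be sketched on a small sample. The function name and the sample data below are made up for illustration. Note that the pipeline prints nothing at all, rather than 0, when no line matches.

    ```shell
    # Hypothetical helper: count the lines of a file that begin with "title".
    count_title_lines() {
        # Stage 1 deletes every line that does not start with "title".
        # Stage 2 prints only the number of the last line it receives ($=).
        sed '/^title/!d' "$1" | sed -n '$='
    }

    sample=$(mktemp)
    printf 'title one\nfoo\ntitle two\nbar title\n' > "$sample"
    count_title_lines "$sample"    # prints 2 ("bar title" does not start with title)
    rm -f "$sample"
    ```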
    
  • 2020-12-28 17:25

    Never say never. Pure sed (although it may require the GNU version).

    #!/bin/sed -nf
    # based on a script from the sed info file (info sed)
    # section 4.8 Numbering Non-blank Lines (cat -b)
    # modified to count lines that begin with "title"
    
    /^title/! be
    
    x
    /^$/ s/^.*$/0/
    /^9*$/ s/^/0/
    s/.9*$/x&/
    h
    s/^.*x//
    y/0123456789/1234567890/
    x
    s/x.*$//
    G
    s/\n//
    h
    
    :e
    
    $ {x;p}
    

    Explanation:

    #!/bin/sed -nf
    # run sed without printing output by default (-n)
    # using the following file as the sed script (-f)
    
    /^title/! be        # if the current line doesn't begin with "title" branch to label e
    
    x                   # swap the counter from hold space into pattern space
    /^$/ s/^.*$/0/      # if pattern space is empty start the counter at zero
    /^9*$/ s/^/0/       # if the counter consists only of nines, prepend a zero to make room for the carry
    s/.9*$/x&/          # mark the position of the last digit before a sequence of nines (if any)
    h                   # copy the marked counter to hold space
    s/^.*x//            # delete everything up to and including the marker
    y/0123456789/1234567890/   # increment the digits that were after the mark
    x                   # swap pattern space and hold space
    s/x.*$//            # delete the marker and everything after it, leaving the leading digits
    G                   # append hold space to pattern space
    s/\n//              # remove the newline, leaving all the digits concatenated
    h                   # save the counter into hold space
    
    :e                  # label e
    
    $ {x;p}             # if this is the last line of input, swap in the counter and print it
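
    To see the increment step on its own, the sketch below feeds a single number through just the counter commands from the script. The function name increment is made up for this illustration; the sed commands are the ones from the script.

    ```shell
    # increment N: print N+1 using only sed (hypothetical helper for illustration)
    increment() {
        printf '%s\n' "$1" | sed '
            /^9*$/ s/^/0/
            s/.9*$/x&/
            h
            s/^.*x//
            y/0123456789/1234567890/
            x
            s/x.*$//
            G
            s/\n//
        '
    }

    increment 0      # prints 1
    increment 129    # prints 130
    increment 199    # prints 200
    increment 999    # prints 1000
    ```

    Only the last digit that is not a nine, plus the nines after it, go through the y command; the digits in front of the marker are carried over unchanged.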
    

    Here are excerpts from a trace of the script using sedsed:

    $ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter 
    PATT:title$
    HOLD:$
    COMM:/^title/ !b e
    COMM:x
    PATT:$
    HOLD:title$
    COMM:/^$/ s/^.*$/0/
    PATT:0$
    HOLD:title$
    COMM:/^9*$/ s/^/0/
    PATT:0$
    HOLD:title$
    COMM:s/.9*$/x&/
    PATT:x0$
    HOLD:title$
    COMM:h
    PATT:x0$
    HOLD:x0$
    COMM:s/^.*x//
    PATT:0$
    HOLD:x0$
    COMM:y/0123456789/1234567890/
    PATT:1$
    HOLD:x0$
    COMM:x
    PATT:x0$
    HOLD:1$
    COMM:s/x.*$//
    PATT:$
    HOLD:1$
    COMM:G
    PATT:\n1$
    HOLD:1$
    COMM:s/\n//
    PATT:1$
    HOLD:1$
    COMM:h
    PATT:1$
    HOLD:1$
    COMM::e
    COMM:$ {
    PATT:1$
    HOLD:1$
    PATT:title$
    HOLD:1$
    COMM:/^title/ !b e
    COMM:x
    PATT:1$
    HOLD:title$
    COMM:/^$/ s/^.*$/0/
    PATT:1$
    HOLD:title$
    COMM:/^9*$/ s/^/0/
    PATT:1$
    HOLD:title$
    COMM:s/.9*$/x&/
    PATT:x1$
    HOLD:title$
    COMM:h
    PATT:x1$
    HOLD:x1$
    COMM:s/^.*x//
    PATT:1$
    HOLD:x1$
    COMM:y/0123456789/1234567890/
    PATT:2$
    HOLD:x1$
    COMM:x
    PATT:x1$
    HOLD:2$
    COMM:s/x.*$//
    PATT:$
    HOLD:2$
    COMM:G
    PATT:\n2$
    HOLD:2$
    COMM:s/\n//
    PATT:2$
    HOLD:2$
    COMM:h
    PATT:2$
    HOLD:2$
    COMM::e
    COMM:$ {
    PATT:2$
    HOLD:2$
    PATT:foo$
    HOLD:2$
    COMM:/^title/ !b e
    COMM:$ {
    PATT:foo$
    HOLD:2$
    . . .
    PATT:10$
    HOLD:10$
    PATT:title$
    HOLD:10$
    COMM:/^title/ !b e
    COMM:x
    PATT:10$
    HOLD:title$
    COMM:/^$/ s/^.*$/0/
    PATT:10$
    HOLD:title$ 
    COMM:/^9*$/ s/^/0/
    PATT:10$
    HOLD:title$
    COMM:s/.9*$/x&/
    PATT:1x0$
    HOLD:title$
    COMM:h
    PATT:1x0$
    HOLD:1x0$
    COMM:s/^.*x//
    PATT:0$
    HOLD:1x0$
    COMM:y/0123456789/1234567890/
    PATT:1$
    HOLD:1x0$
    COMM:x
    PATT:1x0$
    HOLD:1$
    COMM:s/x.*$//
    PATT:1$
    HOLD:1$
    COMM:G
    PATT:1\n1$
    HOLD:1$
    COMM:s/\n//
    PATT:11$
    HOLD:1$
    COMM:h
    PATT:11$
    HOLD:11$
    COMM::e
    COMM:$ {
    COMM:x
    PATT:11$
    HOLD:11$
    COMM:p
    11
    PATT:11$
    HOLD:11$
    COMM:}
    PATT:11$
    HOLD:11$
    

    The ellipsis represents lines of output I omitted here. The line with "11" on it by itself is where the final count is output. That's the only output you'd get when the sedsed debugger isn't being used.

  • 2020-12-28 17:27

    I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.

    I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines but it doesn't matter because grep will be able to count just the lines with the single word 'title'.
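
    A minimal sketch of that pipeline, assuming words are separated by spaces only (tabs and punctuation are not handled). The function name and sample data are hypothetical; grep -x restricts the match to whole lines, so only the bare word title is counted.

    ```shell
    # Hypothetical helper: count space-separated words equal to "title".
    count_word_title() {
        # Put every space-separated word on its own line, then count the
        # lines that consist of exactly "title". Runs of spaces become
        # blank lines, which grep simply does not count.
        tr ' ' '\n' < "$1" | grep -cx 'title'
    }

    sample=$(mktemp)
    printf 'title  title\nsubtitle title\n' > "$sample"
    count_word_title "$sample"    # prints 3 ("subtitle" is not counted)
    rm -f "$sample"
    ```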

  • 2020-12-28 17:32

    Revised answer

    Succinctly, you can't - sed is not the correct tool for the job (it cannot count).

    sed -n '/^title/p' file | grep -c .
    

    This looks for lines starting with title and prints them, feeding the output into grep to count them. Or, equivalently:

    grep -c '^title' file
    

    Original answer - before the question was edited

    Succinctly, you can't - it is not the correct tool for the job.

    grep -c title file
    
    sed -n /title/p file | wc -l
    

    The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title. You could fix that with something like:

    cat file |
    tr ' ' '\n' |
    grep -c title
    

    The 'tr' command converts blanks into newlines, thus putting each space separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement' where there's no space separating the two occurrences of title.
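
    If adjacent occurrences such as 'title-entitlement' should each be counted, one option is grep -o, which prints every match on its own line. This is a sketch rather than part of the approach above: -o is an extension found in GNU and BSD grep, not a POSIX option, and the function name and sample data are hypothetical.

    ```shell
    # Hypothetical helper: count every occurrence of "title", even when
    # several appear in one word or on one line.
    count_occurrences() {
        # grep -o emits one line per match; wc -l counts those lines.
        # tr strips the padding that some wc implementations add.
        grep -o 'title' "$1" | wc -l | tr -d ' '
    }

    sample=$(mktemp)
    printf 'title-entitlement\nno match here\ntitle title\n' > "$sample"
    count_occurrences "$sample"    # prints 4
    rm -f "$sample"
    ```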

  • 2020-12-28 17:33
    sed 's/title/title\n/g' file | grep -c title
    
  • 2020-12-28 17:45

    A single awk command (gawk, for instance) will do. Don't use grep -c, because it counts only the lines that contain "title", regardless of how many times "title" appears on each line.

    $ more file
    #         title
    #  title
    one
    two
    #title
    title title
    three
    title junk title
    title
    four
    fivetitlesixtitle
    last
    
    $ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
    total: 7
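
    The difference from grep -c is easy to see on a single line that contains the word twice. A small sketch (the sample line is made up):

    ```shell
    line='title junk title'

    # grep -c counts matching lines, so this prints 1
    printf '%s\n' "$line" | grep -c 'title'

    # gsub returns the number of substitutions it made, so this prints 2
    # ("&" puts the matched text back, leaving the line unchanged)
    printf '%s\n' "$line" | awk '{ n += gsub(/title/, "&") } END { print n + 0 }'
    ```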
    

    If you only want to count lines whose first field is exactly "title", compare with == instead of matching a regular expression:

    awk '$1 == "title"{++c}END{print c+0}' file
    