I have a file which contains "title" written in it many times. How can I find the number of times "title" is written in that file using the sed command provided that "t
This might work for you:
sed '/^title/!d' file | sed -n '$='
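A small sketch of how that pipeline does the counting: the first sed deletes every line that does not start with title, and the second prints only the line number of the last line it receives (`$=`), which equals the number of surviving lines. The function name count_title_lines and the sample input are made up for illustration. One caveat: if nothing matches, the second sed gets no input and prints nothing at all rather than 0.

```shell
# Hypothetical helper (not from the answer): count the lines on stdin
# that begin with "title".
count_title_lines() {
    # Stage 1 keeps only lines starting with "title";
    # stage 2 prints the number of the last line it sees.
    sed '/^title/!d' | sed -n '$='
}

# Two of these four lines start with "title", so this prints 2.
printf 'title one\nfoo\ntitle two\nsubtitle\n' | count_title_lines
```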
Never say never. Pure sed (although it may require the GNU version).
#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"
/^title/! be
x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
:e
$ {x;p}
Explanation:
#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)
/^title/! be # if the current line doesn't begin with "title" branch to label e
x # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/ # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/ # if the counter consists only of nines, prepend a zero so the carry has a digit to land on
s/.9*$/x&/ # mark the position of the last digit before a sequence of nines (if any)
h # copy the marked counter to hold space
s/^.*x// # delete everything before the marker
y/0123456789/1234567890/ # increment the digits that were after the mark
x # swap pattern space and hold space
s/x.*$// # delete everything after the marker leaving the leading digits
G # append hold space to pattern space
s/\n// # remove the newline, leaving all the digits concatenated
h # save the counter into hold space
:e # label e
$ {x;p} # if this is the last line of input, swap in the counter and print it
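To see the increment step in isolation, here is a sketch that applies just the counter commands from the script to one number per input line. The function name increment is hypothetical, and unlike the real script it treats each input line as the counter itself instead of keeping the counter in hold space between lines.

```shell
# Hypothetical helper (not from the answer): add 1 to each number on stdin
# using the same marker-and-carry commands as the counting script.
increment() {
    sed '
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
'
}

# Prints 1, 10, 130 and 200, one per line.
printf '0\n9\n129\n199\n' | increment
```

For 129, the marker step gives 1x29, the part after the marker (29) is rotated to 30 by the y command, and it is then glued back onto the untouched leading digit to give 130.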
Here are excerpts from a trace of the script using sedsed:
$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter
PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
. . .
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$
The ellipsis represents lines of output I omitted here. The line with "11" on it by itself is where the final count is output. That's the only output you'd get when the sedsed debugger isn't being used.
I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.
I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines but it doesn't matter because grep will be able to count just the lines with the single word 'title'.
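A minimal sketch of that idea, assuming words are separated by plain spaces only. The function name count_title_words is made up for illustration. It adds grep's -x option so that only lines consisting of exactly title are counted, which is one way to get "just the lines with the single word".

```shell
# Hypothetical helper (not from the answer): count space-separated words
# on stdin that are exactly "title".
count_title_words() {
    # tr puts each word on its own line (runs of spaces become blank lines);
    # grep -c counts lines, and -x requires the whole line to match.
    tr ' ' '\n' | grep -cx 'title'
}

# "title" appears twice as a standalone word; "subtitle" and
# "title-entitlement" are not exact matches, so this prints 2.
printf 'title  page title\nsubtitle title-entitlement\n' | count_title_words
```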
Succinctly, you can't - sed is not the correct tool for the job (it cannot count).
sed -n '/^title/p' file | grep -c '^'
This looks for lines starting with title and prints them, feeding the output into grep to count them (grep -c needs a pattern; '^' matches every line). Or, equivalently:
grep -c '^title' file
Succinctly, you can't - it is not the correct tool for the job.
grep -c title file
sed -n /title/p file | wc -l
The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title. You could fix that with something like:
cat file |
tr ' ' '\n' |
grep -c title
The 'tr' command converts blanks into newlines, thus putting each space separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement' where there's no space separating the two occurrences of title.
sed 's/title/title\n/g' file | grep -c title
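How that one-liner counts occurrences rather than lines, as a sketch: the substitution breaks the line after every title, so no output line can contain more than one occurrence, and grep -c then counts one line per occurrence. Writing \n in the replacement is a GNU sed feature as far as I know; the version below uses a backslash followed by a real newline, which should behave the same way in other seds too. The function name count_title_occurrences is hypothetical.

```shell
# Hypothetical helper (not from the answer): count every occurrence of
# "title" on stdin, including several on one line.
count_title_occurrences() {
    # Break the line after each "title", then count lines containing it.
    sed 's/title/title\
/g' | grep -c 'title'
}

# Two occurrences in "title-entitlement" and two in "fivetitlesixtitle",
# so this prints 4.
printf 'title-entitlement\nfoo\nfivetitlesixtitle\n' | count_title_occurrences
```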
Just one gawk command will do. Don't use grep -c, because it only counts lines with "title" in them, regardless of how many "title"s there are on each line.
$ more file
# title
# title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last
$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7
If you only want lines whose first field is exactly "title", compare $1 with "==" instead of matching with ~:
awk '$1 == "title"{++c}END{print c}' file