How to find the number of occurrences of a string in a file using the Windows command line?

前端未结


I have a huge file with e-mail addresses and I would like to count how many of them are in this file. How can I do that using the Windows command line?

I have tried


9 Answers

暗喜

2021-01-02 00:10
This is how I do it, using an AND condition with FINDSTR (to count number of errors in a log file):
```
SET COUNT=0
REM Count the lines that contain both "Assertion" and "has status VALID"
FOR /F "tokens=4*" %%a IN ('TYPE "soapui.log" ^| FINDSTR.exe /I /R^
 /C:"Assertion" ^| FINDSTR.exe /I /R /C:"has status VALID"') DO (
  SET /A COUNT+=1
)
SET /A PASSNUM=%COUNT%
```
NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".
温柔的废话

2021-01-02 00:11
I would install the unix tools on your system (handy in any case :-), then it's really simple - look e.g. here:

Count the number of occurrences of a string using sed?

(Using awk:
```
awk "$1 ~ /title/ {++c} END {print c}" FS=: myFile.txt
```
).

You can get the Windows unix tools here:

http://unxutils.sourceforge.net/
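
A footnote to the awk one-liner: as written it counts the lines whose first `:`-separated field matches `title`, not the individual occurrences. A minimal sketch of an occurrence count for the e-mail case, assuming an `awk` from the Unix tools is on the PATH; `sample.txt` and its contents are made up for illustration:

```shell
# Hypothetical sample data: three addresses on a single line
echo a@example.com b@example.com c@example.com > sample.txt

# gsub() returns how many matches it replaced on the current line, so summing
# its return value counts occurrences rather than lines. x is an unset
# variable (an empty replacement); the edited line itself is discarded.
awk "{ n += gsub(/@/, x) } END { print n + 0 }" sample.txt
```

For the sample line this prints 3. The awk line uses only double quotes, so it should also paste into cmd.exe unchanged (drop the `#` comment lines there).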

长情又很酷

2021-01-02 00:13

Maybe it's a little late, but the following script worked for me (the source file contained quote characters, which is why I used the 'usebackq' parameter). The caret sign (^) acts as the escape character in the Windows batch scripting language.

```
@setlocal enableextensions enabledelayedexpansion
SET TOTAL=0
FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
    SET LN=%%I
    FOR %%J IN ("!LN!") do (
        FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
            @SET /A TOTAL=!TOTAL!+%%K
        )
    )
)
ECHO Number of occurrences is !TOTAL!
```


南旧

2021-01-02 00:14
OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.

Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:
```
find "@" datafile.txt | find "@" | sed "s/@/@\n/g" | find /n "@" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/">CountChars.bat
```
Explanation (assuming the file with the data is named "Datafile.txt"):

1) The 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) FIND to strip off the unwanted header info.

2) Pipe the above results to SED, which will search for each "@" character and replace it with itself+ "\n" (which is a "new line" aka a CRLF) which gets each "@" on its own line in the output stream...

3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that (increasingly with each line) sets the variable equal to that line's number.

4) Isolate each line's numeric part and preface the isolated number per the above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"

In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:

a) look for a "[" (which must be "escaped" by prefacing it with "\")

b) begin saving (or "tokenizing") what follows, up to the closing "]"
```
    --> in other words it ignores the brackets but stores the number
    --> the ".*" that follows the bracket wildcards whatever follows the "]"
```
c) the stuff between the \( and the \) is "tokenized", which means it can be referred to later, in the "WhatToReplaceItWith" section. The first stuff that's tokenized is referred to via "\1", the second as "\2", etc.

So... we're ignoring the [ and the ] and we're saving the number that lies between the brackets and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string: Set /a NumFound= + the saved, or "tokenized" number, i.e. ...the first line will read: Set /a NumFound=1 ...& the next line reads: Set /a NumFound=2 etc. etc.

Thus, if you have 1,283 email addresses, your results will have 1,283 lines.

The last one executed = the one that matters.

If you use the ">" character to redirect all of the above output to a batch file, i.e.: > CountChars.bat

...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
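
If the environment variable is not needed, a shorter variant of the same split-then-count idea pipes the split lines straight into a line counter. This is only a sketch: it assumes a `sed` that treats `\n` in the replacement as a line break (GNU sed does, and the command above relies on it) and a `grep` from the same tool set; `sample.txt` is a made-up test file.

```shell
# Hypothetical sample data: three addresses on a single line
echo a@example.com b@example.com c@example.com > sample.txt

# Break the line after every "@", then count the lines that contain "@"
sed "s/@/@\n/g" sample.txt | grep -c @
```

With GNU sed this prints 3 for the sample line. On a plain Windows prompt the built-in `find /c "@"` can take the place of `grep -c @`.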
悲&欢浪女

2021-01-02 00:19
Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.
```
findstr /c:"@" mail.txt | find /c /v "GarbageStringDefNotInYourResults"
```
So you are counting the lines resulting from your findstr command that do not have the garbage string in them. Kind of a hack, but it could work for you. Alternatively, just use find /c on the string you do care about being there. Lastly, you mentioned one address per line, so in this case the above works, but with multiple addresses per line it breaks.
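
To see that limitation concretely, here is a small sketch using the Unix tools mentioned elsewhere in this thread (assumed to be on the PATH). `grep -c` counts matching lines the same way `find /c` does, and `sample.txt` is a made-up file with two addresses on its first line:

```shell
# Hypothetical sample data: two addresses on line 1, one on line 2
echo a@example.com b@example.com > sample.txt
echo c@example.com >> sample.txt

# Counts lines that contain "@": reports 2
grep -c @ sample.txt

# Deletes everything except "@" and counts the remaining bytes: reports 3
tr -cd @ < sample.txt | wc -c
```

The line count undercounts as soon as one line holds more than one address, which is why the character count is the safer number for a file where addresses share a line.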
死守一世寂寞

2021-01-02 00:30
Use this:
```
type file.txt | find /i "@" /c
```
