How to find the number of occurrences of a string in file using windows command line?

生来不讨喜 2021-01-01 23:35

I have a huge file with e-mail addresses, and I would like to count how many of them are in this file. How can I do that using the Windows command line?

I have tried

9 Answers
  • 2021-01-02 00:10

    This is how I do it, using an AND condition with FINDSTR (to count the number of errors in a log file):

    SET COUNT=0
    REM Count the lines that contain both "Assertion" and "has status VALID".
    REM REM is safer than :: for comments inside a parenthesized block.
    FOR /F "delims=" %%a IN ('FINDSTR /I /C:"Assertion" "soapui.log" ^| FINDSTR /I /C:"has status VALID"') DO (
      SET /A COUNT+=1
    )
    SET /A PASSNUM=%COUNT%
    

    NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".
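    To make the NOTE concrete, here is a rough Python model (not Windows tooling) of two chained case-insensitive filters compared with a true occurrence count. The function names and the sample log lines are made up for illustration:

```python
def count_lines_with_all(lines, *phrases):
    """Count lines containing every phrase, case-insensitively.

    Models two chained FINDSTR /I /C:"..." filters followed by a line count.
    """
    wanted = [p.lower() for p in phrases]
    return sum(1 for line in lines if all(p in line.lower() for p in wanted))


def count_occurrences(lines, phrase):
    """Count every non-overlapping, case-insensitive occurrence of phrase."""
    p = phrase.lower()
    return sum(line.lower().count(p) for line in lines)


# Hypothetical log lines, not real soapui.log output.
log = [
    "Assertion 1 has status VALID",
    "Assertion 2 has status FAILED",
    "Assertion 3 has status VALID; retry has status VALID",
]

print(count_lines_with_all(log, "Assertion", "has status VALID"))  # 2
print(count_occurrences(log, "has status VALID"))  # 3
```

    The third line matches once as a line but contributes two occurrences, which is exactly the gap the NOTE describes.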

  • 2021-01-02 00:11

    I would install the Unix tools on your system (handy in any case :-)); then it's really simple. Look, e.g., here:

    Count the number of occurrences of a string using sed?

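    Using awk (from cmd.exe, put the awk program in double quotes, because cmd does not treat single quotes as quoting characters):

    awk "$1 ~ /title/ {++c} END {print c}" FS=: myFile.txt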

    You can get Unix tools for Windows here:

    http://unxutils.sourceforge.net/
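    The awk one-liner counts lines whose first field, using ":" as the field separator, matches the regular expression title. A rough Python equivalent for readers without the Unix tools (the function name and sample lines are hypothetical):

```python
import re


def awk_first_field_count(lines, pattern, fs=":"):
    """Rough model of: awk '$1 ~ /pattern/ {++c} END {print c}' FS=:"""
    regex = re.compile(pattern)
    c = 0
    for line in lines:
        first_field = line.split(fs, 1)[0]  # $1 is the text before the first ":"
        if regex.search(first_field):  # "~" is an unanchored regex match
            c += 1
    # Unlike awk's "print c", this returns 0 rather than "" when nothing matched.
    return c


lines = ["title: Foo", "subtitle: Bar", "author: title"]
print(awk_first_field_count(lines, "title"))  # 2
```

    Because the match is unanchored, "subtitle" also counts; anchor the pattern (e.g. ^title$) if only an exact first field should match.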

  • 2021-01-02 00:13

    Maybe it's a little bit late, but the following script worked for me (the source file contained quote characters, which is why I used the 'usebackq' parameter). The caret sign (^) acts as an escape character in the Windows batch scripting language.

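    @setlocal enableextensions enabledelayedexpansion
    SET TOTAL=0
    REM Read file.txt line by line; FOR /F skips empty lines.
    FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
        SET LN=%%I
        REM Quote the line so special characters survive the ECHO below.
        FOR %%J IN ("!LN!") do (
            REM FIND /C reports matching lines, so a single line yields 1 or 0.
            FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
                @SET /A TOTAL=!TOTAL!+%%K
            )
        )
    )
    ECHO Number of occurrences is !TOTAL!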
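    Because FIND /C reports how many lines match, each pass through the inner loop adds 1 or 0, so TOTAL ends up as (roughly) the number of non-empty lines containing the phrase rather than the number of occurrences. A small Python model of that behaviour, with a hypothetical function name and sample lines:

```python
def nested_loop_total(lines, phrase):
    """Model of the batch loop: add FIND /I /C's per-line result (0 or 1)."""
    total = 0
    for line in lines:
        if not line:  # FOR /F skips empty lines
            continue
        total += 1 if phrase.lower() in line.lower() else 0
    return total


lines = ["ann@example.com; bob@example.com", "", "carol@example.com"]
print(nested_loop_total(lines, "@"))  # 2 (lines containing "@")
print(sum(line.count("@") for line in lines))  # 3 (actual "@" characters)
```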
    
  • 2021-01-02 00:14

    OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.

    Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:

    find "@" datafile.txt | find "@" | sed "s/@/@\n/g" | find /n "@" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/">CountChars.bat
    

    Explanation (assuming the file with the data is named "Datafile.txt"):

    1) The first FIND includes 3 lines of header info, which throws off a line-count approach, so pipe its results to a second (identical) FIND to strip off the unwanted header info.

    2) Pipe the above results to SED, which searches for each "@" character and replaces it with itself plus "\n" (a newline). This puts a line break after every "@", so each "@" ends up on its own line in the output stream...

    3) When you pipe the above output from SED into the FIND /N command, you add line numbers to the beginning of each line. Now all you have to do is isolate the numeric portion of each line and prefix it with "SET /a", turning each line into a batch statement that sets the variable to that line's number (so the value increases with each line).

    4) Isolate each line's numeric part and prefix the isolated number as described above, via:
    | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"

    In the above snippet, you're piping the previous commands' output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:

    a) look for a "[" (which must be "escaped" by prefixing it with "\")

    b) begin saving (or "tokenizing") what follows, up to the closing "]"

        --> in other words, it ignores the brackets but stores the number
        --> the ".*" after the "]" matches whatever follows the "]"
    

    c) the stuff between the \( and the \) is "tokenized", which means it can be referred to later, in the "WhatToReplaceItWith" section. The first tokenized group is referred to as "\1", the second as "\2", etc.

    So... we're ignoring the [ and the ] and we're saving the number that lies between the brackets and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string: Set /a NumFound= + the saved, or "tokenized" number, i.e. ...the first line will read: Set /a NumFound=1 ...& the next line reads: Set /a NumFound=2 etc. etc.

    Thus, if you have 1,283 email addresses, your results will have 1,283 lines.

    The last one executed = the one that matters.

    If you use the ">" character to redirect all of the above output to a batch file, i.e.: > CountChars.bat

    ...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
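    The stages of the pipeline can be simulated in Python to see what CountChars.bat would contain. This is only a sketch: it assumes a GNU-style sed, where "\n" in the replacement inserts a newline, and that FIND /N numbers each line by its position in its input. simulate_pipeline is a hypothetical name.

```python
import re


def simulate_pipeline(text):
    """Return the lines the pipeline would write to CountChars.bat."""
    # find "@" datafile.txt | find "@": keep only lines containing "@"
    lines = [line for line in text.splitlines() if "@" in line]
    # sed "s/@/@\n/g": break the stream after every "@"
    pieces = "\n".join(line.replace("@", "@\n") for line in lines).split("\n")
    # find /n "@": keep lines with "@", prefixed by their input position as [n]
    numbered = [f"[{n}]{p}" for n, p in enumerate(pieces, start=1) if "@" in p]
    # sed "s/\[\(.*\)\].*/Set \/a NumFound=\1/": keep only the number
    return [re.sub(r"\[(.*)\].*", r"Set /a NumFound=\1", p) for p in numbered]


for stmt in simulate_pipeline("ann@example.com, bob@example.org"):
    print(stmt)
# Set /a NumFound=1
# Set /a NumFound=2
```

    Under these assumptions, the last statement holds the count only when all addresses sit on one line, as in the question. If the data spans several lines, the trailing piece of each line (which has no "@") still occupies a line number, so with "a@x" and "b@y" on separate lines the last statement becomes Set /a NumFound=3.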

  • 2021-01-02 00:19

    Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.

    findstr /c:"@" mail.txt | find /c /v "GarbageStringDefNotInYourResults"
    

    So you are counting the lines resulting from your findstr command that do not contain the garbage string. It's kind of a hack, but it could work for you. Alternatively, just use find /c on the string you do care about. Lastly, you mentioned one address per line, so in that case the above works, but with multiple addresses per line it breaks.
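    The /V trick works because FIND /C /V counts every line that lacks the garbage string, which is every line FINDSTR printed. Modelled in Python (hypothetical names and sample data), the result is the number of lines with an "@", which falls short of the address count once a line holds more than one:

```python
GARBAGE = "GarbageStringDefNotInYourResults"


def findstr_then_find_count(lines, needle="@"):
    """Model of: findstr /c:"@" mail.txt | find /c /v "Garbage..." """
    matched = [line for line in lines if needle in line]  # findstr /c:"@"
    return sum(1 for line in matched if GARBAGE not in line)  # find /c /v


mail = ["ann@example.com", "bob@example.com; carol@example.com"]
print(findstr_then_find_count(mail))  # 2 lines
print(sum(line.count("@") for line in mail))  # 3 addresses
```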

  • 2021-01-02 00:30

    Use this (note that FIND /C counts the lines that contain "@", not the individual addresses):

    type file.txt | find /c /i "@"
    