Question
I am completely new to writing PowerShell scripts. So far I have been using plain batch files for this purpose, as this is what my company requires. Inside this batch file I use nested for loops to compare two .txt files. In detail, I want to do the following:
- File 1 contains lots of strings. Each string is on a separate line, preceded by a number and a semicolon, like so:
658;RMS
- File 2 is some long text.
The aim is to count the number of occurrences of each string from File 1 in File 2, e.g. RMS is counted 300 times.
As my previous code has some huge drawbacks in terms of runtime (File 1 has approx. 400 lines and File 2 approx. 500,000), I read that Select-String in PowerShell is much more efficient. However, from the tutorials I have read it is not clear to me how to proceed here, apart from the fact that I have to run the PowerShell code from inside my .bat file. My biggest problem is that I am not sure how and where to place my 'variables', i.e. the two input files 1 and 2.
So far I have been testing Select-String like this:
powershell -command "& {Select-String -Path *.txt -Pattern "RMS"}"
My assumption is that I should use piping, something like this:
powershell -command "& {<<path to file one, should read line by line>> | Select-String -Path File2.txt -Pattern "value of file 1"}"
However, I cannot get this to work. PowerShell seems to expect some kind of PSObject before the first pipe?
Answer 1:
For optimal performance, I would approach this task like so.
- Read the file with the terms as a CSV (it is a CSV, with a ";" delimiter)
- Read the other file into a single string
- For each term, count how often it occurs in the target string (using .IndexOf())
For example
# file1 has no header row, so name the two columns here
$data = Import-Csv "file1.txt" -Delimiter ";" -Header ID,Term
# read the whole of file2 as one string
$target = Get-Content "file2.txt" -Raw
$counts = @{}
foreach ($term in $data.Term) {
    $index = -1
    $count = 0
    do {
        # search again, starting one character after the previous hit
        $index = $target.IndexOf($term, $index + 1)
        if ($index -gt -1) { $count++ } else { break }
    } while ($true)
    $counts[$term] = $count
}
$counts
Notes
- Import-Csv uses the first line of the input file as the header by default. Since file1 has no header line, -Header supplies the column names; if your file already has a header, remove the -Header parameter.
- Get-Content reads the input file into an array of lines by default. For this approach, having the entire file as one big string is the right thing, and that is what -Raw does.
- @{} creates an empty hashtable.
- $data.Term accesses one column of the CSV.
- .IndexOf() is case-sensitive. PowerShell itself is case-insensitive by default, but native .NET methods like this one do not change their behavior. This may or may not be what you need; if you don't care about case, use .ToLower() on both $target and $term.
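As an alternative to lower-casing both strings, .NET's String.IndexOf also has an overload that takes a StringComparison value, which makes the search itself case-insensitive. A minimal sketch, keeping the same "restart one character after the last hit" logic as the loop above (the function name Get-TermCount is hypothetical, not part of the answer):

```powershell
# Counts how often $Term occurs in $Text, ignoring case.
# Uses the .NET overload String.IndexOf(String, Int32, StringComparison),
# so neither string has to be lower-cased (and copied) first.
function Get-TermCount {
    param(
        [string]$Text,
        [string]$Term
    )
    # An empty term would match at every position; treat it as "no hits".
    if ([string]::IsNullOrEmpty($Term)) { return 0 }

    $count = 0
    $index = $Text.IndexOf($Term, 0, [StringComparison]::OrdinalIgnoreCase)
    while ($index -ge 0) {
        $count++
        # Resume one character after the previous hit, like the loop above,
        # so overlapping occurrences are counted too.
        $index = $Text.IndexOf($Term, $index + 1, [StringComparison]::OrdinalIgnoreCase)
    }
    return $count
}
```

Inside the foreach loop above, the do/while block would then become $counts[$term] = Get-TermCount -Text $target -Term $term.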
Answer 2:
Select-String is useful, but it isn't magic :)
With performance in mind, I would approach it like this:
- For each line in File2:
  - Test for occurrences of all terms in File1
This way, you only need to read and evaluate File2 once:
# prepare hashtable to keep track of count
$count = @{}
# read terms to search for from file1 (keep only the part after the ';')
$termsToFind = Get-Content .\file1.txt | ForEach-Object {
    $_ -split ';' | Select-Object -Last 1
}
# loop over lines in file2, count the words we're searching for
Get-Content .\file2.txt | ForEach-Object {
    foreach ($term in $termsToFind) {
        # Using `Regex.Matches()` will help us find multiple occurrences of the same term;
        # \b...\b matches whole words only, and Escape() makes the term match literally
        $count[$term] += [regex]::Matches($_, "\b$([regex]::Escape($term))\b").Count
    }
}
Now $count will be a hashtable where the key is the term from file1 and the value is the number of occurrences of that term.
To write the result in the same format as file1:
$count.GetEnumerator() |ForEach-Object { $_.Value,$_.Key -join ';' } |Set-Content output.txt
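A plain @{} hashtable does not keep its keys in any particular order, so the lines in output.txt may come out in a different order than in file1. A minimal sketch, assuming you want to keep file1's order, uses an [ordered] dictionary pre-seeded with every term and otherwise the same whole-word regex count (the function name Get-OrderedTermCount is hypothetical):

```powershell
# Counts whole-word occurrences of each term, keeping the terms in the
# order they were passed in (i.e. file1's order).
function Get-OrderedTermCount {
    param(
        [string[]]$Terms,
        [string[]]$Lines
    )
    $count = [ordered]@{}
    # Pre-seed every term with 0: this fixes the key order and also
    # lists terms that never occur.
    foreach ($term in $Terms) { $count[$term] = 0 }

    foreach ($line in $Lines) {
        foreach ($term in $Terms) {
            $count[$term] += [regex]::Matches($line, "\b$([regex]::Escape($term))\b").Count
        }
    }
    # Dictionaries are not unrolled by the pipeline, so this returns one object.
    return $count
}
```

Usage with the variables from the answer: $count = Get-OrderedTermCount -Terms $termsToFind -Lines (Get-Content .\file2.txt), followed by the same GetEnumerator() | ... | Set-Content output.txt line.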
Answer 3:
If you check the docs, you'll see that you can't pipe the -Pattern value to Select-String. Instead, you can use parentheses to make the output of a command become the -Pattern argument:
powershell select-string -pattern (get-content file1) -path file2
Since -Pattern is at position 0 and -Path at position 1, the parameter names can be left out. -Pattern also accepts an array, which is why all lines of file1 can be passed at once:
powershell select-string (get-content file1) file2
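Note that get-content file1 returns whole lines such as 658;RMS, and each one is treated as a regular expression, so the number prefix has to be stripped first; also, Select-String outputs matching lines rather than per-term counts. A minimal sketch building on this answer, assuming whole-word, case-insensitive matching (Select-String's default) and terms made of word characters, joins all terms into one escaped pattern and counts each match with -AllMatches (the function name Get-SelectStringTermCount is hypothetical):

```powershell
# Counts each term from a "number;term" file in a text file with a
# single Select-String pass over the text file.
function Get-SelectStringTermCount {
    param(
        [string]$TermsFile,  # e.g. file1.txt, lines like "658;RMS"
        [string]$TextFile    # e.g. file2.txt
    )
    # Keep only the part after the last ';' and drop blank lines.
    $terms = Get-Content $TermsFile |
        ForEach-Object { ($_ -split ';')[-1].Trim() } |
        Where-Object { $_ }

    # One pattern for all terms, \b(?:RMS|ABC|...)\b; each term is escaped
    # because -Pattern is a regular expression.
    $pattern = '\b(?:' + (($terms | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')\b'

    $counts = @{}
    foreach ($term in $terms) { $counts[$term] = 0 }
    # -AllMatches returns every hit on a line, not just the first one.
    # The hashtable keys are case-insensitive, so "rms" is added to "RMS".
    Select-String -Path $TextFile -Pattern $pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $counts[$_.Value]++ }
    return $counts
}
```

To call it from the batch file with the two file names as parameters, one option is to save the function in a script (count-terms.ps1 is a hypothetical name) followed by the line Get-SelectStringTermCount -TermsFile $args[0] -TextFile $args[1], and run it with: powershell -NoProfile -ExecutionPolicy Bypass -File count-terms.ps1 file1.txt file2.txt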
Source: https://stackoverflow.com/questions/62003411/using-select-string-for-checking-two-txt-files-in-powershell