Question
I would like to search through a file (std_serverX.out) for values of the string cpu= that are 11 characters or longer. This file can contain a million lines or more.
To restrict the search further, I would like the search for cpu= to start only after the first occurrence of the string Java Thread Dump. In my source file, Java Thread Dump does not appear until approximately line 1013169 of a file 1057465 lines long, so the roughly 96% of the file that precedes it is unnecessary.
Here is a section of the file that I would like to search:
cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05 [reset 1288865.05] s allocated=86688238148864 B (78.84 TB) [reset 86688238148864 B (78.84 TB)] defined_classes=468
io= file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 [reset file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 ]
user="Guest" application="JavaEE/ResetPassword" tid=0x0000000047a8b000 nid=0x1b10 / 6928 runnable [_thread_blocked (_call_back), stack(0x0000000070de0000,0x0000000070fe0000)] [0x0000000070fdd000] java.lang.Thread.State: RUNNABLE
Above, you can see that cpu=191362359.38 is 12 characters long (including full stop and 2 decimal places). How do I match it so that values of cpu= smaller than 11 characters are ignored and not printed to file?
Here is what I have so far:
Get-Content -Path .\std_server*.out | Select-String '(cpu=)' | out-File -width 1024 .\output.txt
I have stripped my command down to its absolute basics so I do not get confused by other search requirements.
Also, I want this command to be as basic as possible, so that it can be run as a single command line in PowerShell. So no advanced scripts or defined variables, if we can avoid it... :)
This is related to a previous message I opened which got complicated by my not defining precisely my requirements.
Thanks in advance for your help.
Antóin
Answer 1:
This regex looks for cpu= followed by exactly nine digits, a literal ., and one or more digits. The whole pipeline can be typed on one line:
Get-Content -Path .\std_server*.out |
Select-String -Pattern 'cpu=\d{9}\.\d+' -AllMatches |
Select-Object -ExpandProperty matches |
Select-Object -ExpandProperty value
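Note that the pattern above requires exactly nine digits before the decimal point, whereas the question asks for values of 11 characters or more. A minimal sketch of a length-based variant follows, assuming the value consists only of digits and dots. The two sample lines are made up for illustration and modelled on the excerpt in the question.

```powershell
# Sample lines (hypothetical, modelled on the excerpt in the question)
$samples = @(
    'cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05',
    'cpu=1234.56 [reset 1234.56] ms elapsed=12.05'
)

# [\d.]{11,} requires at least 11 digits or dots immediately after "cpu="
$hits = $samples |
    Select-String -Pattern 'cpu=[\d.]{11,}' |
    ForEach-Object { $_.Matches[0].Value }

$hits
```

Only the first sample is reported, because 1234.56 is just seven characters long. The same pattern can be dropped into the Select-String call in the pipeline above.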
Answer 2:
It can certainly be done, but piping a million lines through the pipeline, when you know the first 96% of them are irrelevant, is not going to be very fast or efficient.
A faster approach would be to use a StreamReader and just skip over the lines until the Java Thread Dump string is found:
$CPULines = @()
foreach($file in Get-Item .\std_server*.out)
{
    # Create stream reader from file
    $Reader = New-Object -TypeName 'System.IO.StreamReader' -ArgumentList $file.FullName
    $JTDFound = $false
    # Read file line by line; compare against $null so that blank lines do not end the loop early
    while($null -ne ($line = $Reader.ReadLine()))
    {
        # Keep looking until 'Java Thread Dump' is found
        if(-not $JTDFound)
        {
            $JTDFound = $line.Contains('Java Thread Dump')
        }
        else
        {
            # Then, if a value matching your description is found, add that line to our results
            if($line -match '^cpu=([\d\.]{11,})\s')
            {
                $CPULines += $line
            }
        }
    }
    # Dispose of the stream reader
    $Reader.Dispose()
}
# Write output to file
$CPULines | Out-File .\output.txt
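One thing to watch with ReadLine() loops: PowerShell treats an empty string as false, so a loop whose condition is the line itself stops at the first blank line, while comparing the result against $null reads to the end. A small sketch of the difference, using a StringReader over a made-up three-line string as a stand-in for the log file:

```powershell
# Three lines of hypothetical input, the middle one blank
$text = "one`n`nthree"

# Condition on the line itself: stops at the blank line
$reader = New-Object -TypeName 'System.IO.StringReader' -ArgumentList $text
$naive = 0
while (($line = $reader.ReadLine())) { $naive++ }
$reader.Dispose()

# Condition on $null: ReadLine() returns $null only at end of input
$reader = New-Object -TypeName 'System.IO.StringReader' -ArgumentList $text
$count = 0
while ($null -ne ($line = $reader.ReadLine())) { $count++ }
$reader.Dispose()

"naive loop read $naive line(s), null-checked loop read $count line(s)"
```

StreamReader.ReadLine() behaves the same way as StringReader.ReadLine() here, so this matters if the thread dump section of the log contains blank lines.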
Source: https://stackoverflow.com/questions/35221414/match-select-string-of-11-characters-and-also-starting-after-a-certain-point-i