Match select-string of > 11 characters and also starting after a certain point in file

旧巷老猫 submitted on 2019-12-25 03:45:22

Question


I would like to search through a file (std_serverX.out) for occurrences of the string cpu= whose value is 11 characters or longer. The file can contain up to, or more than, 1 million lines.

To restrict the search further, I would like the search for cpu= to start only after the first occurrence of the string Java Thread Dump. In my source file, Java Thread Dump does not appear until approximately line 1013169 of a 1057465-line file, so the roughly 96% of the file that precedes it is unnecessary.

Here is a section of the file that I would like to search:

cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05 [reset 1288865.05] s allocated=86688238148864 B (78.84 TB) [reset 86688238148864 B (78.84 TB)] defined_classes=468 
io= file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 [reset file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 ] 
user="Guest" application="JavaEE/ResetPassword" tid=0x0000000047a8b000 nid=0x1b10 / 6928 runnable [_thread_blocked (_call_back), stack(0x0000000070de0000,0x0000000070fe0000)] [0x0000000070fdd000] java.lang.Thread.State: RUNNABLE

Above, you can see that the value in cpu=191362359.38 is 12 characters long (including the full stop and 2 decimal places). How do I match it so that cpu= values shorter than 11 characters are ignored and not written to the output file?

Here is what I have so far:

Get-Content -Path .\std_server*.out | Select-String '(cpu=)' | Out-File -Width 1024 .\output.txt

I have stripped my command down to its absolute basics so I do not get confused by other search requirements.

Also, I want this command to be as basic as possible, so that it can be run as a single command line in PowerShell. So no advanced scripts or defined variables, if we can avoid it... :)

This is related to a previous question I opened, which got complicated because I did not define my requirements precisely.

Thanks in advance for your help.

Antóin


Answer 1:


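This regex looks for cpu= followed by exactly 9 digits, a literal ., and then 1 or more digits, so the shortest value it accepts is 11 characters long. The whole pipeline can be written on one line. Note that \d{9} will not match a value with 10 or more digits before the decimal point; use \d{9,} if those should be included as well.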

Get-Content -Path .\std_server*.out | 
 Select-String -Pattern 'cpu=\d{9}\.\d+' -AllMatches | 
  Select-Object -ExpandProperty matches  | 
    Select-Object -ExpandProperty value
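
To see how the quantifier affects what is matched, here is a minimal sketch run against in-memory strings rather than the real file. Only the first sample line is taken from the question; the other two, and the variable names $samples, $exact and $atLeast, are made up for illustration.

```powershell
# Hypothetical sample lines; only the first one comes from the question.
$samples = @(
    'cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05'
    'cpu=1234.56 [reset 1234.56] ms elapsed=12.00'
    'cpu=1234567890.12 [reset 1234567890.12] ms elapsed=99.00'
)

# \d{9}  : exactly nine digits must sit between 'cpu=' and the dot
# \d{9,} : nine or more digits are accepted
$exact   = $samples | Select-String -Pattern 'cpu=\d{9}\.\d+'  | ForEach-Object { $_.Matches[0].Value }
$atLeast = $samples | Select-String -Pattern 'cpu=\d{9,}\.\d+' | ForEach-Object { $_.Matches[0].Value }

"Exactly 9 digits : $($exact -join ', ')"
"9 or more digits : $($atLeast -join ', ')"
```

With these samples, the first pattern should pick up only the 9-digit value, while the second also picks up the 10-digit one. Neither pattern looks for the Java Thread Dump marker, so that restriction still has to be handled separately.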



Answer 2:


It can certainly be done, but piping a million lines through the pipeline, when you know the first 96% of them are irrelevant, is not going to be fast or efficient.

A faster approach would be to use a StreamReader and just skip over the lines until the Java Thread Dump string is found:

$CPULines = @()

foreach($file in Get-Item .\std_server*.out)
{

    # Create stream reader from file
    $Reader = New-Object -TypeName 'System.IO.StreamReader' -ArgumentList $file.FullName
    $JTDFound = $false

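    # Read file line by line; compare against $null so that blank lines
    # (empty strings, which are falsy) do not end the loop early
    while($null -ne ($line = $Reader.ReadLine()))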
    {
        # Keep looking until 'Java Thread Dump' is found 
        if(-not $JTDFound)
        {
            $JTDFound = $line.Contains('Java Thread Dump')
        }
        else
        {
            # Then, if a value matching your description is found, add that line to our results
            if($line -match '^cpu=([\d\.]{11,})\s')
            {
                $CPULines += $line
            }
        }
    }

    # dispose of the stream reader
    $Reader.Dispose()
}

# Write output to file
$CPULines |Out-File .\output.txt
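
If a more compact form is preferred over raw speed, one possible alternative is to let Select-String locate the marker line first and then skip that many lines. The following is only a sketch under assumptions: it writes a tiny stand-in file to the temp directory (the file name std_server_sample.out and its contents are invented for the demonstration), it handles a single file, and it reads that file twice, so it is likely to be slower than the StreamReader loop on a very large file.

```powershell
# Build a tiny stand-in for std_serverX.out (hypothetical sample data).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'std_server_sample.out'
@(
    'cpu=999999999999.99 ms appears before the dump and must be ignored'
    '===== Java Thread Dump ====='
    'cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05'
    ''
    'cpu=1234.56 [reset 1234.56] ms elapsed=12.00'
) | Set-Content -Path $path

# -List stops at the first match in the file; LineNumber is 1-based.
$start = (Select-String -Path $path -Pattern 'Java Thread Dump' -SimpleMatch -List).LineNumber

# Skip everything up to and including the marker line, then keep only
# lines whose cpu= value is at least 11 characters of digits and dots.
$result = Get-Content -Path $path |
    Select-Object -Skip $start |
    Where-Object { $_ -match '^cpu=[\d.]{11,}\s' }

$result
```

Against the real file, $path would be replaced by the actual file name and $result piped to Out-File .\output.txt. The sketch assumes the marker is present; a file without it would need a separate check.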


Source: https://stackoverflow.com/questions/35221414/match-select-string-of-11-characters-and-also-starting-after-a-certain-point-i
