Get last n lines or bytes of a huge file in Windows (like Unix's tail). Avoid time consuming options

Submitted by 自闭症网瘾萝莉.ら on 2019-12-18 12:12:38

Question


I need to retrieve the last n lines of huge files (1-4 GB) in Windows 7. Due to corporate restrictions, I cannot run any command that is not built in. The problem is that all the solutions I found appear to read the whole file, so they are extremely slow.

Can this be accomplished, fast?

Notes:

  1. I managed to get the first n lines, fast.
  2. It is ok if I get the last n bytes. (I used this https://stackoverflow.com/a/18936628/2707864 for the first n bytes).

The solutions in "Unix tail equivalent command in Windows Powershell" did not work for me. Using -Wait does not make it fast. I do not have -Tail (and I do not know whether it would be fast).

PS: There are quite a few related questions about head and tail, but they do not focus on speed. Therefore, answers that were useful or accepted there may not be useful here. E.g.,

Windows equivalent of the 'tail' command

CMD.EXE batch script to display last 10 lines from a txt file

Extract N lines from file using single windows command

https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent

powershell to get the first x MB of a file

https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command


Answer 1:


How about this (reads last 8 bytes for demo):

$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
# Jump to 8 bytes before the end; nothing earlier in the file is read.
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()
}
$fs.Close()

UPDATE. To interpret the bytes as a string (be sure to select the correct encoding; UTF-8 is used here):

$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
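One caveat with the snippet above: Seek throws if $N is larger than the file, and Read is not guaranteed to fill the buffer in one call. A minimal sketch of a guarded version follows. The function name Get-TailBytes and its parameters are hypothetical, made up for illustration, and UTF-8 is only an assumed default.

```powershell
# Hypothetical helper (not part of the original answer): returns the last
# $Count bytes of a file decoded as text, clamping $Count to the file size.
function Get-TailBytes
{
    param(
        [string]$Path,
        [int]$Count,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )
    $fs = [IO.File]::OpenRead($Path)
    try
    {
        # Never seek before the start of the file.
        $n = [int][Math]::Min([long]$Count, $fs.Length)
        $fs.Seek(-$n, [System.IO.SeekOrigin]::End) | Out-Null
        $buffer = New-Object Byte[] $n
        $read = 0
        # Read may return fewer bytes than requested, so loop until done.
        while ($read -lt $n)
        {
            $got = $fs.Read($buffer, $read, $n - $read)
            if ($got -le 0) { break }
            $read += $got
        }
        $Encoding.GetString($buffer, 0, $read)
    }
    finally
    {
        # Release the file handle even if something above throws.
        $fs.Close()
    }
}
```

It could be called as Get-TailBytes -Path "C:\10GBfile.dat" -Count 200. Passing a full path is safer, because [IO.File]::OpenRead resolves relative paths against the process working directory, which may differ from the current PowerShell location.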

UPDATE 2. To read the last M lines, read the file in chunks from the end until the result contains more than M newline sequences:

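$M = 3
$fpath = "C:\10GBfile.dat"

$result = ""
$seq = "`r`n"        # line terminator; use "`n" for Unix-style files
$buffer_size = 10    # bytes per read; tiny for the demo, increase for real files
$bytes = New-Object 'System.Collections.Generic.List[byte]'

$fs = [IO.File]::OpenRead($fpath)
$pos = $fs.Length
# Read backwards until more than M separators are found or the start of the
# file is reached. M + 1 separators guarantee M complete lines even when the
# file ends with a newline.
while ($pos -gt 0 -and ([regex]::Matches($result, $seq)).Count -le $M)
{
    $count = [int][Math]::Min($buffer_size, $pos)
    $pos -= $count
    $fs.Seek($pos, [System.IO.SeekOrigin]::Begin) | Out-Null
    $buffer = New-Object Byte[] $count
    $fs.Read($buffer, 0, $count) | Out-Null
    # Accumulate bytes and decode them together, so that multi-byte
    # characters split across chunk boundaries are decoded correctly.
    $bytes.InsertRange(0, $buffer)
    $result = [System.Text.Encoding]::UTF8.GetString($bytes.ToArray())
}
$fs.Close()

# Drop one trailing separator so the last line is not reported as empty.
($result -replace "$seq\z", "") -split $seq | Select -Last $M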

Try a larger $buffer_size: ideally it matches the expected average line length, so that fewer disk operations are needed. Also pay attention to $seq: it may be `r`n or just `n. This is very rough code, without any error handling or optimization.




Answer 2:


If you have PowerShell 3 or higher, you can use the -Tail parameter for Get-Content to get the last n lines.

Get-Content -Tail 5 PATH_TO_FILE

On a 34 MB text file on my local SSD, this returned in 1 millisecond, versus 8.5 seconds for Get-Content | Select -Last 5.
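To check the difference on your own data, both calls can be timed with Measure-Command. This is only a rough sketch, assuming PowerShell 3 or higher. The sample file name and size are arbitrary choices for the demonstration, and the timings will vary with the disk and the file.

```powershell
# Build a throwaway sample file (name and size are arbitrary, for illustration only).
$sample = Join-Path ([IO.Path]::GetTempPath()) 'tail-demo.txt'
1..50000 | ForEach-Object { "line $_" } | Set-Content $sample

# Time the -Tail variant, which does not need to walk the file from the start.
$fast = Measure-Command { Get-Content -Tail 5 $sample }
# Time the pipeline variant, where Get-Content emits every line before
# Select-Object keeps the last five.
$slow = Measure-Command { Get-Content $sample | Select-Object -Last 5 }

"{0:N1} ms with -Tail, {1:N1} ms with Select-Object -Last" -f $fast.TotalMilliseconds, $slow.TotalMilliseconds
Remove-Item $sample
```

Measure-Command discards the output of the script block and returns a TimeSpan, so only the elapsed time is compared here.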




Answer 3:


Building on the awesome answer by Aziz Kabyshev, which solves the speed issue, and after some googling, I ended up using this script:

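# Usage: myscript.ps1 <number of bytes> <file path>
$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
# Seek straight to the last $Args[0] bytes instead of reading the whole file.
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
    # Treats each byte as one character, so this assumes single-byte (ASCII-like) text.
    $mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr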

which I call from a batch file containing

@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"

(thanks to How to run a PowerShell script from a batch file).




Answer 4:


This is not an answer, but a long comment in reply to sancho.s' answer.

When you want to use small PowerShell scripts from a batch file, I suggest the method below, which is simpler and keeps all the code in the same batch file:

@PowerShell  ^
   $fpath = '%~2';  ^
   $fs = [IO.File]::OpenRead($fpath);  ^
   $fs.Seek(-%1, 'End') ^| Out-Null;  ^
   $mystr = '';  ^
   for ($i = 0; $i -lt %1; $i++)  ^
   {  ^
      $mystr = ($mystr) + ([char[]]($fs.ReadByte()));  ^
   }  ^
   $fs.Close();  ^
   Write-Host $mystr
%End PowerShell%


Source: https://stackoverflow.com/questions/36507343/get-last-n-lines-or-bytes-of-a-huge-file-in-windows-like-unixs-tail-avoid-ti
