问题
I need to retrieve the last n lines of huge files (1-4 Gb), in Windows 7. Due to corporate restrictions, I cannot run any command that is not built-in. The problem is that all solutions I found appear to read the whole file, so they are extremely slow.
Can this be accomplished, fast?
Notes:
- I managed to get the first n lines, fast.
- It is ok if I get the last n bytes. (I used this https://stackoverflow.com/a/18936628/2707864 for the first n bytes).
Solutions here Unix tail equivalent command in Windows Powershell did not work.
Using -wait
does not make it fast. I do not have -tail
(and I do not know if it will work fast).
PS: There are quite a few related questions for head
and tail
, but not focused on the issue of speed. Therefore, useful or accepted answers there may not be useful here. E.g.,
Windows equivalent of the 'tail' command
CMD.EXE batch script to display last 10 lines from a txt file
Extract N lines from file using single windows command
https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent
powershell to get the first x MB of a file
https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command
回答1:
How about this (reads last 8 bytes for demo):
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
$fs.ReadByte()
}
UPDATE. To interpret bytes as string (but be sure to select correct encoding - here UTF8 is used):
$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
UPDATE 2. To read last M lines, we'll be reading the file by portions until there are more than M newline char sequences in the result:
$M = 3
$fpath = "C:\10GBfile.dat"
$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size
$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
$fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
$fs.Read($buffer, 0, $buffer_size) | Out-Null
$result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()
($result -split $seq) | Select -Last $M
Try playing with bigger $buffer_size
- this ideally is equal to expected average line length to make fewer disk operations. Also pay attention to $seq - this could be \r\n
or just \n
.
This is very dirty code without any error handling and optimizations.
回答2:
If you have PowerShell 3 or higher, you can use the -Tail
parameter for Get-Content
to get the last n
lines.
Get-content -tail 5 PATH_TO_FILE;
On a 34MB text file on my local SSD, this returned in 1 millisecond vs. 8.5 seconds for get-content |select -last 5
回答3:
With the awesome answer by Aziz Kabyshev, which solves the issue of speed, and with some googling, I ended up using this script
$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
$mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr
which I call from a batch file containing
@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"
(thanks to How to run a PowerShell script from a batch file).
回答4:
This is not an answer, but a large comment as reply to sancho.s' answer.
When you want to use small PowerShell scripts from a Batch file, I suggest you to use the method below, that is simpler and allows to keep all the code in the same Batch file:
@PowerShell ^
$fpath = %2; ^
$fs = [IO.File]::OpenRead($fpath); ^
$fs.Seek(-%1, 'End') ^| Out-Null; ^
$mystr = ''; ^
for ($i = 0; $i -lt %1; $i++) ^
{ ^
$mystr = ($mystr) + ([char[]]($fs.ReadByte())); ^
} ^
Write-Host $mystr
%End PowerShell%
来源:https://stackoverflow.com/questions/36507343/get-last-n-lines-or-bytes-of-a-huge-file-in-windows-like-unixs-tail-avoid-ti