In the code sample below, I am using 10 kilobytes for that chunking. I found that number suited most of my needs. However, you can greatly increase it when a large number of lines needs to be returned (I used 4 MB for my million-line test). You can also do a little automatic tuning by scaling the byte count to the number of lines you are seeking. One thing to be aware of when passing files to this code: if you pass a file name to System.IO.File/FileStream without a full path, it will not assume the file is located in the path of the executing script, so a successful Test-Path is not a valid check. Instead, .NET resolves relative paths against its own current directory, which you can see by running the following in PowerShell:
[System.IO.Directory]::GetCurrentDirectory()

More than likely, it will point to the home directory of the profile the shell is executed under.
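If the file you want to read lives relative to your PowerShell location, one option is to resolve it to a full path first so the .NET classes see an unambiguous name. A minimal sketch, where the log file name is only a placeholder:

# Resolve the relative name against PowerShell's current location (not .NET's working
# directory), then pass the fully qualified path to the function defined below.
$fullPath = (Resolve-Path ".\really-huge.log").ProviderPath
$lastLines = Read-EndOfFileByByteChunk $fullPath 100 10240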
Also be aware that this tail-like function does not handle Unicode log files. The method I am using to decode the bytes is ASCII dependent; I am not yet using System.Text.UnicodeEncoding in the code. Currently ASCII meets all of my needs for reading log files, but I am still interested in adding that compatibility to this function. I am also assuming that all log files denote the end of a line with a carriage return and line feed (CHR 13 + CHR 10), which is how the majority of text files are written on Windows. UNIX and old-style Macintosh text files will not work properly with this code. You will need to change the "\r\n" delimiter in the Regex::Split call for those text file formats; a rough sketch of that kind of change follows.
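The sketch below is not the updated code mentioned in the updates further down; it only illustrates the two pieces that would have to change, the encoder and the split delimiter. Note that a variable-width encoding such as UTF-8 would also require rethinking the byte-offset arithmetic in the function, since one character no longer equals one byte.

# Assumption: UTF-8 covers the non-ASCII files in question; adjust the encoder as needed.
$encoding = New-Object System.Text.UTF8Encoding
# Split on CR+LF (Windows), LF (UNIX), or bare CR (old Macintosh).
$chunkOfText.AddRange(([System.Text.RegularExpressions.Regex]::Split($encoding.GetString($bytesRead), "\r\n|\r|\n")))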
UPDATE: I have now finished an update that provides the "tail -f" functionality for continuously reading updates to a text file. Read about it in my blog post, Replicating UNIX "tail -f" in PowerShell.
UPDATE: I have updated the code to handle Unicode text files and non-Windows newlines. You can review the code here.
Function Read-EndOfFileByByteChunk($fileName, $totalNumberOfLines, $byteChunk) {
    if($totalNumberOfLines -lt 1) { $totalNumberOfLines = 1 }
    if($byteChunk -le 0) { $byteChunk = 10240 }
    $linesOfText = New-Object System.Collections.ArrayList
    if([System.IO.File]::Exists($fileName)) {
        $fileStream = New-Object System.IO.FileStream($fileName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
        $asciiEncoding = New-Object System.Text.ASCIIEncoding
        $fileSize = $fileStream.Length
        $byteOffset = $byteChunk
        [byte[]] $bytesRead = New-Object byte[] $byteChunk
        $totalBytesProcessed = 0
        $lastReadAttempt = $false
        do {
            # If the next seek would run past the start of the file, shrink the
            # chunk to whatever bytes remain and flag this as the final read.
            if($byteOffset -ge $fileSize) {
                $byteChunk = $fileSize - $totalBytesProcessed
                [byte[]] $bytesRead = New-Object byte[] $byteChunk
                $byteOffset = $fileSize
                $lastReadAttempt = $true
            }
            # Seek backward from the end of the file and read one chunk.
            $fileStream.Seek((-$byteOffset), [System.IO.SeekOrigin]::End) | Out-Null
            $fileStream.Read($bytesRead, 0, $byteChunk) | Out-Null
            $chunkOfText = New-Object System.Collections.ArrayList
            $chunkOfText.AddRange(([System.Text.RegularExpressions.Regex]::Split($asciiEncoding.GetString($bytesRead), "\r\n")))
            # The first element is usually a partial line; push the next offset
            # back so the following chunk re-reads that line in full.
            $firstLineLength = ($chunkOfText[0].Length)
            $byteOffset = ($byteOffset + $byteChunk) - ($firstLineLength)
            if($lastReadAttempt -eq $false -and $chunkOfText.count -lt $totalNumberOfLines) {
                $chunkOfText.RemoveAt(0)
            }
            $totalBytesProcessed += ($byteChunk - $firstLineLength)
            $linesOfText.InsertRange(0, $chunkOfText)
        } while($totalNumberOfLines -ge $linesOfText.count -and $lastReadAttempt -eq $false -and $totalBytesProcessed -lt $fileSize)
        $fileStream.Close()
        # Remove the final split element (empty when the file ends with CR+LF),
        # then trim any extra lines from the front of the collection.
        if($linesOfText.count -gt 1) { $linesOfText.RemoveAt($linesOfText.count - 1) }
        $deltaLines = ($linesOfText.count - $totalNumberOfLines)
        if($deltaLines -gt 0) {
            $linesOfText.RemoveRange(0, $deltaLines)
        }
    } else {
        $linesOfText.Add("[ERROR] $fileName not found") | Out-Null
    }
    return $linesOfText
}
#--------------------------------------------------------------------------------------------------#
$fileName = "C:\Logs\really-huge.log"  # Your really big log file
$numberOfLines = 100                   # Number of lines from the end of the really big log file to return
$byteChunk = 10240                     # Size of bytes read per seek during the search for lines to return
####################################################################################################
## This is a possible self-tuning method you can use but will blow up memory on an enormous
## number of lines to return
## $byteChunk = $numberOfLines * 256
####################################################################################################
$lastLines = @()
$lastLines = Read-EndOfFileByByteChunk $fileName $numberOfLines $byteChunk
foreach($lineOfText in $lastLines) {
    Write-Output $lineOfText
}
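If you want to see how different chunk sizes behave against your own logs, wrapping the call in Measure-Command is a quick way to compare them. This is only an illustrative timing harness, and the log path is a placeholder.

# Compare two chunk sizes against the same large log file (placeholder path).
$log = "C:\Logs\really-huge.log"
foreach($size in 10240, 4MB) {
    $elapsed = Measure-Command { Read-EndOfFileByByteChunk $log 100 $size | Out-Null }
    Write-Output ("Chunk size {0}: {1} ms" -f $size, $elapsed.TotalMilliseconds)
}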