Monday, March 7, 2011

Unix Tail-like Functionality in PowerShell

A common tool I use in shell scripts on Unix/Linux/Mac OS X servers is tail. While there are command-line ports of tail for Windows, I need something I can integrate into a script that reads the end of a large log file, searches it for information, and acts on the result. I don't want to distribute third-party software along with the script to accomplish the task. Get-Content piped to Select-Object is not suitable for large files, because the whole file is read through the pipeline just to keep the last few lines.

After researching the file I/O capabilities of .NET, I found that the System.IO.FileStream class had just what I needed. Using this class, I read the target text file byte by byte from the end of the file until I have collected the requested number of lines. Lines are delimited by a line feed (byte 10), and carriage returns (byte 13) are discarded. The time it takes to obtain the data grows with the number of bytes read, so it depends on both the number of lines requested and the number of characters per line. It works very well for 500 lines or fewer in my typical log files (I tested files up to 1 gigabyte) and is much faster than using:
Get-Content "C:\Logs\really-huge.log" | Select-Object -last 100
The code meets 95% of my needs, but I am sure I can optimize it so it comes close to matching the speed of tail from the Unix distributions I commonly use. It's my first stab at tackling the problem. One interesting part of the code is that I use System.Collections.ArrayList instead of a standard PowerShell array. Because I am reading the file in reverse, I need to return the data in the proper order. The ArrayList object lets me insert each line at the first position, so I don't have to reverse the array after collecting the data. I also noticed that using System.Convert to convert each byte to a character was faster than using PowerShell's native [char] cast. When returning a large number of lines the difference was significant: about 0.5 seconds per 100 lines.
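The two points above can be tried in isolation. The following is a minimal sketch, not part of the original function: the sample lines and the iteration count are made up for illustration, and the timings it prints will vary by machine and PowerShell version.

```powershell
# Lines in the order they would be met when reading a file backwards (made-up sample data).
$linesInReverse = @("third", "second", "first")

$linesOfText = New-Object System.Collections.ArrayList
foreach ($line in $linesInReverse) {
    # Insert at index 0 so each earlier line lands in front of the later ones.
    $linesOfText.Insert(0, $line)
}
$orderedLines = $linesOfText -join ","
Write-Output $orderedLines

# Rough timing of the two byte-to-character conversions (100000 iterations is an arbitrary choice).
$byte = 65
$convertTime = Measure-Command { foreach ($i in 1..100000) { $null = [System.Convert]::ToChar($byte) } }
$castTime = Measure-Command { foreach ($i in 1..100000) { $null = [char]$byte } }
Write-Output ("System.Convert: {0} ms, [char]: {1} ms" -f $convertTime.TotalMilliseconds, $castTime.TotalMilliseconds)
```

The ArrayList ends up in file order without a separate reversal step. The timing lines only report what they measure on the machine they run on.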

I will keep working on this to close that 5% and will update this post with a link to a follow-up post describing the improvements.

UPDATE: I have rewritten this function in a new blog post and it is lightning fast. This code is deprecated and should only be used for amusement purposes.
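Function Read-EndOfFile($fileName,$totalNumberOfLines) {
 # Open read-only with shared read/write so a log that is still being written to can be read.
 $fileStream = New-Object System.IO.FileStream($fileName,[System.IO.FileMode]::Open,[System.IO.FileAccess]::Read,[System.IO.FileShare]::ReadWrite)
 $linesOfText = New-Object System.Collections.ArrayList
 $byteOffset = 1
 $lineOfText = ""
 try {
  $fileLength = $fileStream.Length
  # Walk backwards one byte at a time until enough lines are collected or the start of the file is reached.
  while ($byteOffset -le $fileLength -and $linesOfText.Count -lt $totalNumberOfLines) {
   $fileStream.Seek((-$byteOffset), [System.IO.SeekOrigin]::End) | Out-Null
   $byte = $fileStream.ReadByte()
   if($byte -eq 10) {
    # Line feed: the line collected so far is complete. A newline that ends the file is skipped.
    if($byteOffset -gt 1) {
     $linesOfText.Insert(0, $lineOfText)
    }
    $lineOfText = ""
   } elseif($byte -ne 13) {
    # Prepend the character; carriage returns are dropped. Assumes a single-byte encoding such as ASCII.
    $lineOfText = [System.Convert]::ToChar($byte) + $lineOfText
   }
   $byteOffset++
  }
  # The start of the file was reached before enough line feeds were found: keep the first line too.
  if($byteOffset -gt $fileLength -and $fileLength -gt 0 -and $linesOfText.Count -lt $totalNumberOfLines) {
   $linesOfText.Insert(0, $lineOfText)
  }
 } finally {
  $fileStream.Close()
 }
 return $linesOfText
}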
#--------------------------------------------------------------------------------------------------#
$fileName = "C:\Logs\really-huge.log" # Your really big log file
$numberOfLines = 100 # Number of lines from the end of the really big log file to return

if([System.IO.File]::Exists($fileName) -and $numberOfLines -gt 0) {
 $lastLines = Read-EndOfFile $fileName $numberOfLines

 foreach($lineOfText in $lastLines) {
  Write-Output $lineOfText
 }
}
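One way to cut down on the per-byte Seek and ReadByte calls is to read the end of the file in a single block and split it into lines. The sketch below is only an illustration of that idea, not the rewritten function mentioned in the update above. The name Read-EndOfFileChunk, the $chunkSize parameter, and its 64 KB default are hypothetical, and the code assumes a single-byte encoding such as ASCII.

```powershell
# Hypothetical helper (not from the original post): read the tail of a file in blocks.
Function Read-EndOfFileChunk($fileName, $totalNumberOfLines, [long]$chunkSize = 65536) {
    $fileStream = New-Object System.IO.FileStream($fileName,[System.IO.FileMode]::Open,[System.IO.FileAccess]::Read,[System.IO.FileShare]::ReadWrite)
    try {
        while ($true) {
            # Read up to $chunkSize bytes from the end of the file.
            $bytesToRead = [int][System.Math]::Min($chunkSize, $fileStream.Length)
            $reachedStart = ($bytesToRead -eq $fileStream.Length)
            $buffer = New-Object byte[] $bytesToRead
            $fileStream.Seek((-$bytesToRead), [System.IO.SeekOrigin]::End) | Out-Null
            $bytesRead = 0
            while ($bytesRead -lt $bytesToRead) {
                $count = $fileStream.Read($buffer, $bytesRead, $bytesToRead - $bytesRead)
                if ($count -le 0) { break }
                $bytesRead += $count
            }

            # Split the block into lines (assumes a single-byte encoding).
            $text = [System.Text.Encoding]::ASCII.GetString($buffer, 0, $bytesRead)
            $lines = New-Object System.Collections.ArrayList
            foreach ($piece in ($text -split "\r?\n")) { $lines.Add($piece) | Out-Null }

            # Drop the empty piece that follows a newline at the end of the file.
            if ($lines.Count -gt 0 -and $lines[$lines.Count - 1] -eq "") { $lines.RemoveAt($lines.Count - 1) }
            # Unless the block starts at the beginning of the file, its first piece may be a partial line.
            if (-not $reachedStart -and $lines.Count -gt 0) { $lines.RemoveAt(0) }

            if ($reachedStart -or $lines.Count -ge $totalNumberOfLines) {
                return $lines | Select-Object -Last $totalNumberOfLines
            }
            # Not enough complete lines in this block: try again with a block twice the size.
            $chunkSize *= 2
        }
    } finally {
        $fileStream.Close()
    }
}
```

It is called the same way as Read-EndOfFile, for example Read-EndOfFileChunk $fileName $numberOfLines. Doubling the block size on a miss keeps the common case to a single read while still handling files with very long lines.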
