Tuesday, March 8, 2011

Speed Up Reading Excel Files in PowerShell

One of the most frustrating aspects of PowerShell for me is the extremely sluggish performance the ComObject is for Excel. I deal with spreadsheets on a daily basis and have processes to read them and compare information. I have spent hours upon hours trying to figure out how to optimize the process of reading cells. One day, I said to myself, "Why can't I read Excel documents at near the speed I can read a text file?" (in more colorful language than this). The proverbial lightbulb went off above my head. That one question lead to the code below. If I can't read Excel files at the speed I want and I can with text files, why not convert the Excel spreadsheet to a text based format? A comma separated value text file!

Using PowerShell to load an Excel spreadsheet is quick. The slowdown is reading the cells/rows. So the solution is to load the Excel spreadsheet and immediately save it in the temp directory as a .csv file. PowerShell has a wonderful commandlet called Import-Csv that will allow you to read in a comma separated value text file as PSObject and associate the first row as a NoteProperty for each column in the spreadsheet. As long as your Excel spreadsheet is straight forward, this works perfectly and blazing fast. I have some processes that are now 20 times faster using this method.
Function Remove-File($fileName) {
 if(Test-Path -path $fileName) { Remove-Item -path $fileName }
}
#--------------------------------------------------------------------------------------------------#
$excelFile = "\\server.ad.mydomain.local\excelreports\myreport.xlsx"
if(Test-Path -path $excelFile) {
 $csvFile = ($env:temp + "\" + ((Get-Item -path $excelFile).name).Replace(((Get-Item -path $excelFile).extension),".csv"))
 
 Remove-File $csvFile
 
 $excelObject = New-Object -ComObject Excel.Application   
 $excelObject.Visible = $false  
 $workbookObject = $excelObject.Workbooks.Open($excelFile) 
 $workbookObject.SaveAs($csvFile,6) # http://msdn.microsoft.com/en-us/library/bb241279.aspx
 $workbookObject.Saved = $true
 $workbookObject.Close()
 [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbookObject) | Out-Null
 $excelObject.Quit()
 [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelObject) | Out-Null
 [System.GC]::Collect()
 [System.GC]::WaitForPendingFinalizers() 
 
 $spreadsheetDataObject = Import-Csv -path $csvFile # Use the $spreadsheetDataObject for your analysis
 
 Remove-File $csvFile
}

5 comments:

  1. I want to have PS copy each tab of my excel files into separate .csv files.

    I am not sure if I understand your code completely.

    Can you direct me to I would need to change to acomplished what I described above?

    Thnaks

    ReplyDelete
  2. typo:
    Can you direct me to what I would need to change to acomplish what I described above?

    ReplyDelete
  3. Fascinating! So, if I am uploading information to a server that is simply string information, I suppose I could simply use regex to accomplish the same as an excel macro. I suppose I could also parse the file and then save it as a new excel workbook, and use a macro in conjunction with that workbook. Hmmmm... Of course, an interesting scenario would be taking information from an excel workbook (i.e. copying the workbook), parsing it, and then passing it back to the workbook, a la a series of .csv columns/rows or as a string. Curious! Great post.

    ReplyDelete
  4. S-W-E-E-T !!!!! Works like a charm, and it's simple. AND it cleans up the temp file after itself!

    ReplyDelete
  5. Thanks very much for posting!!
    Ernie :)

    ReplyDelete