Input file:
column1;column2;column3
data1a;data2a;data3a
data1b;data2b;data3b
Goal: output file with reordered columns, say
Edit: Benchmarking info below.
I would not use the PowerShell CSV-related cmdlets. I would use either System.IO.StreamReader or Microsoft.VisualBasic.FileIO.TextFieldParser for reading in the file line-by-line to avoid loading the entire thing in memory, and I would use System.IO.StreamWriter to write it back out. The TextFieldParser internally uses a StreamReader, but handles parsing delimited fields so you don't have to, making it very useful if the CSV format is not straightforward (e.g., has delimiter characters in quoted fields).
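To illustrate the quoted-field point, here is a minimal sketch that parses one line both ways. The sample line is hypothetical (it is not from the input file above), and the snippet assumes the Microsoft.VisualBasic assembly can be loaded in your PowerShell host.

```powershell
# Sketch only: compares naive splitting with TextFieldParser on one sample line.
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample line: the second field contains a quoted ';'
$line = 'data1a;"data;2a";data3a'

# A plain split knows nothing about quotes and cuts the quoted field in two
$naive = $line.Split(';')

# TextFieldParser treats the quoted text as one field
$stringReader = New-Object System.IO.StringReader $line
$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $stringReader
$parser.SetDelimiters(';')
$parsed = $parser.ReadFields()
$parser.Close()

"Split:           $($naive.Count) fields"
"TextFieldParser: $($parsed.Count) fields"
```

Note that the parser returns the field without its enclosing quotes, so a field like this would need to be re-quoted before being written back out with a plain join.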
I would also not do this in PowerShell at all, but rather in a .NET application, as it will be much faster than a PowerShell script even if they use the same objects.
Here's C# for a simple version, assuming no quoted fields and ASCII encoding:
using System.IO;
using System.Text;
using Microsoft.VisualBasic.FileIO; // requires a reference to the Microsoft.VisualBasic assembly

static class Program {
    static void Main() {
        string source = @"D:\test.csv";
        string dest = @"D:\test2.csv";
        using ( var reader = new TextFieldParser( source, Encoding.ASCII ) ) {
            using ( var writer = new StreamWriter( dest, false, Encoding.ASCII ) ) {
                reader.SetDelimiters( ";" );
                while ( !reader.EndOfData ) {
                    // Read one line as an array of fields
                    var fields = reader.ReadFields();
                    // Swap the second and third columns
                    swap( fields, 1, 2 );
                    writer.WriteLine( string.Join( ";", fields ) );
                }
            }
        }
    }

    // Swaps two elements of the array in place
    static void swap( string[] arr, int a, int b ) {
        string t = arr[ a ];
        arr[ a ] = arr[ b ];
        arr[ b ] = t;
    }
}
Here's the PowerShell version:
# Load the assembly that contains TextFieldParser
# (LoadWithPartialName is obsolete; Add-Type does the same job)
Add-Type -AssemblyName Microsoft.VisualBasic
$source = 'D:\test.csv'
$dest = 'D:\test2.csv'
$reader = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$writer = New-Object System.IO.StreamWriter $dest
# Swaps two elements of the array in place
function swap($f, $a, $b) { $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t }
$reader.SetDelimiters(';')
while ( !$reader.EndOfData ) {
    $fields = $reader.ReadFields()
    # Swap the second and third columns
    swap $fields 1 2
    $writer.WriteLine([string]::Join(';', $fields))
}
$reader.Close()
$writer.Close()
I benchmarked both of these against a 3-column CSV file with 10,000,000 rows. The C# version took 171.132 seconds (just under 3 minutes). The PowerShell version took 2,364.995 seconds (39 minutes, 25 seconds).
Edit: Why mine takes so darn long.
The swap function is a huge bottleneck in my PowerShell version. Replacing it with '{0};{1};{2}'-style output like Roman Kuzmin's answer cut it down to less than 9 minutes. Replacing TextFieldParser more than halved the remaining time, to under 4 minutes.
However, a .NET console app version of Roman Kuzmin's answer took 20 seconds.
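A rough way to see the function-call overhead in isolation is to time both approaches on an in-memory array. This is only a sketch, not the benchmark reported above: the iteration count `$n` and the sample values are arbitrary assumptions, and the timings will vary by machine and PowerShell version.

```powershell
# Sketch only: times a swap-function call against a format-string reorder.
function swap($f, $a, $b) { $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t }

$fields = 'data1a', 'data2a', 'data3a'
$n = 100000   # arbitrary iteration count, chosen for illustration

# Variant 1: call the swap function, then join the fields
$withFunction = Measure-Command {
    for ($i = 0; $i -lt $n; $i++) {
        swap $fields 1 2
        $null = [string]::Join(';', $fields)
    }
}

# Variant 2: build the reordered line directly with a format string
$withFormat = Measure-Command {
    for ($i = 0; $i -lt $n; $i++) {
        $null = '{0};{1};{2}' -f $fields[0], $fields[2], $fields[1]
    }
}

"swap + join   : {0:N0} ms" -f $withFunction.TotalMilliseconds
"format string : {0:N0} ms" -f $withFormat.TotalMilliseconds
```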
I'd do it this way:
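# Collect the reordered lines in memory, then write them out in one go
$new_csv = New-Object System.Collections.ArrayList
Get-Content mycsv.csv | ForEach-Object {
    # Split on ';', take the columns in the order 0, 2, 1, and re-join;
    # ArrayList.Add returns the new index, so discard it
    $new_csv.Add((($_ -split ';')[0,2,1]) -join ';') > $null
}
$new_csv | Out-File myreordered.csv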
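Here is a solution suitable for millions of records (assuming that your data do not have embedded ';'):
# Note: .NET resolves relative paths against the process working directory,
# which may differ from the current PowerShell location; full paths are safer.
$reader = [System.IO.File]::OpenText('data1.csv')
$writer = New-Object System.IO.StreamWriter 'data2.csv'
for (;;) {
    $line = $reader.ReadLine()
    # ReadLine returns $null at the end of the file
    if ($null -eq $line) {
        break
    }
    $data = $line.Split(';')
    # Write the columns in the order 1, 3, 2
    $writer.WriteLine('{0};{1};{2}', $data[0], $data[2], $data[1])
}
$reader.Close()
$writer.Close()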
Import-Csv C:\Path\To\Original.csv -Delimiter ';' | Select-Object Column1, Column3, Column2 | Export-Csv C:\Path\To\Newfile.csv -Delimiter ';' -NoTypeInformation
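The sample input in the question is semicolon-delimited, while the CSV cmdlets default to a comma, so the delimiter has to be passed explicitly. A minimal in-memory sketch of the same pipeline, using ConvertFrom-Csv and ConvertTo-Csv on the question's sample rows so that no files are needed:

```powershell
# Sketch only: the sample rows from the question, held in memory
$lines = 'column1;column2;column3',
         'data1a;data2a;data3a',
         'data1b;data2b;data3b'

# Parse with ';' as the delimiter, reorder by property name, serialize again
$reordered = $lines |
    ConvertFrom-Csv -Delimiter ';' |
    Select-Object column1, column3, column2 |
    ConvertTo-Csv -Delimiter ';' -NoTypeInformation

$reordered
```

By default the cmdlets wrap every value in double quotes on output, which matters if whatever consumes the file expects bare values.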
It's great that people came up with solutions based on pure .NET. However, I would fight for simplicity where possible. That's why I upvoted all of you ;)
Why? I tried generating 1.000.000 records, storing them in a CSV, and then reordering the columns. Generating the CSV was in my case much more demanding than the reordering. Look at the results.
It took only 1,8 minutes to reorder the columns. For me that's a pretty decent result. Is it OK for me? -> Yes, I don't need to look for a quicker solution; it's good enough -> it saved me time for some other interesting stuff ;)
# generate some csv; objects have several properties
measure-command {
    1..1mb |
        % {
            $date = get-date
            New-Object PsObject -Property @{
                Column1 = $date
                Column2 = $_
                Column3 = $date.Ticks/$_
                Hour = $date.Hour
                Minute = $date.Minute
                Second = $date.Second
                ReadableTime = $date.ToLongTimeString()
                ReadableDate = $date.ToLongDateString()
            }
        } |
        Export-Csv d:\temp\exported.csv
}
TotalMinutes : 6,100025295
# reorder the columns
measure-command {
    Import-Csv d:\temp\exported.csv |
        Select ReadableTime, ReadableDate, Hour, Minute, Second, Column1, Column2, Column3 |
        Export-Csv d:\temp\exported2.csv
}
TotalMinutes : 2,33151559833333