How to use powershell to reorder CSV columns

Backend · unresolved · 5 answers · 2057 views
Asked by 感动是毒, 2020-12-05 19:54

Input file:

column1;column2;column3
data1a;data2a;data3a
data1b;data2b;data3b

Goal: an output file with the columns reordered, say with columns 2 and 3 swapped:

column1;column3;column2
data1a;data3a;data2a
data1b;data3b;data2b
5 Answers
  • 2020-12-05 20:08

    Edit: Benchmarking info below.

    I would not use the Powershell csv-related cmdlets. I would use either System.IO.StreamReader or Microsoft.VisualBasic.FileIO.TextFieldParser for reading in the file line-by-line to avoid loading the entire thing in memory, and I would use System.IO.StreamWriter to write it back out. The TextFieldParser internally uses a StreamReader, but handles parsing delimited fields so you don't have to, making it very useful if the CSV format is not straightforward (e.g., has delimiter characters in quoted fields).
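To illustrate that last point, here is a small sketch (the file name and sample data are made up) showing that TextFieldParser treats a delimiter inside a quoted field as part of the field, where a naive `-split ';'` would not:

```powershell
# Hypothetical demo: a quoted field containing ';' is parsed as one field
# by TextFieldParser, while a naive split breaks it into pieces.
Add-Type -AssemblyName Microsoft.VisualBasic

$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'quoted-demo.csv'
Set-Content -Path $tmp -Value 'data1;"data2;with;semicolons";data3'

$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $tmp
$parser.SetDelimiters(';')
$parser.HasFieldsEnclosedInQuotes = $true   # this is the default
$fields = $parser.ReadFields()
$parser.Close()

"TextFieldParser: $($fields.Count) fields"                   # 3 fields, quotes stripped
"Naive split:     $(('data1;"data2;with;semicolons";data3' -split ';').Count) fields"   # 5 fields
```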

    I would also not do this in Powershell at all, but rather in a .NET application, as it will be much faster than a Powershell script even if they use the same objects.

    Here's C# for a simple version, assuming no quoted fields and ASCII encoding:

    // Requires a project reference to the Microsoft.VisualBasic assembly.
    using System.Text;

    class Program {
        static void Main() {
            string source = @"D:\test.csv";
            string dest = @"D:\test2.csv";

            using ( var reader = new Microsoft.VisualBasic.FileIO.TextFieldParser( source, Encoding.ASCII ) ) {
                using ( var writer = new System.IO.StreamWriter( dest, false, Encoding.ASCII ) ) {
                    reader.SetDelimiters( ";" );
                    while ( !reader.EndOfData ) {
                        var fields = reader.ReadFields();
                        swap( fields, 1, 2 );
                        writer.WriteLine( string.Join( ";", fields ) );
                    }
                }
            }
        }

        static void swap( string[] arr, int a, int b ) {
            string t = arr[ a ];
            arr[ a ] = arr[ b ];
            arr[ b ] = t;
        }
    }
    

    Here's the Powershell version:

    Add-Type -AssemblyName Microsoft.VisualBasic
    
    $source = 'D:\test.csv'
    $dest = 'D:\test2.csv'
    
    $reader = new-object Microsoft.VisualBasic.FileIO.TextFieldParser $source
    $writer = new-object System.IO.StreamWriter $dest
    
    function swap($f,$a,$b){ $t = $f[$a]; $f[$a] = $f[$b]; $f[$b] = $t}
    
    $reader.SetDelimiters(';')
    while ( !$reader.EndOfData ) {
        $fields = $reader.ReadFields()
        swap $fields 1 2
        $writer.WriteLine([string]::join(';', $fields))
    }
    
    $reader.close()
    $writer.close()
    

    I benchmarked both of these against a 3-column csv file with 10,000,000 rows. The C# version took 171.132 seconds (just under 3 minutes). The Powershell version took 2,364.995 seconds (39 minutes, 25 seconds).

    Edit: Why mine takes so darn long.

    The swap function is a huge bottleneck in my Powershell version. Replacing it with '{0};{1};{2}'-style format-string output, as in Roman Kuzmin's answer, cut the time to less than 9 minutes. Replacing the TextFieldParser with a plain StreamReader more than halved the remaining time, to under 4 minutes.

    However, a .NET console app version of Roman Kuzmin's answer took 20 seconds.
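Putting those two fixes together, the fast PowerShell variant might look like this sketch: a plain StreamReader in place of the TextFieldParser, and a format string in place of the swap function. The function name is made up, and it assumes exactly three ';'-delimited columns with no quoted fields.

```powershell
# Sketch of the optimized version: StreamReader instead of TextFieldParser,
# format-string output instead of the swap function. Assumes exactly three
# ';'-delimited columns and no quoted fields. The function name is made up.
function Copy-CsvSwapped($source, $dest) {
    $reader = New-Object System.IO.StreamReader $source
    $writer = New-Object System.IO.StreamWriter $dest
    while ($null -ne ($line = $reader.ReadLine())) {
        $f = $line.Split(';')
        # write columns in the order 1, 3, 2
        $writer.WriteLine('{0};{1};{2}', $f[0], $f[2], $f[1])
    }
    $reader.Close()
    $writer.Close()
}

# usage (paths are placeholders):
# Copy-CsvSwapped 'D:\test.csv' 'D:\test2.csv'
```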

  • 2020-12-05 20:15

    I'd do it this way:

    $new_csv = New-Object System.Collections.ArrayList
    Get-Content mycsv.csv | % {
        # Add() returns the new element's index, so discard it
        $new_csv.Add((($_ -split ';')[0,2,1]) -join ';') > $null
    }
    $new_csv | Out-File myreordered.csv
    
  • 2020-12-05 20:20

    Here is a solution suitable for millions of records (assuming your data does not contain embedded ';' characters):

    $reader = [System.IO.File]::OpenText('data1.csv')
    $writer = New-Object System.IO.StreamWriter 'data2.csv'
    for(;;) {
        $line = $reader.ReadLine()
        if ($null -eq $line) {
            break
        }
        $data = $line.Split(";")
        $writer.WriteLine('{0};{1};{2}', $data[0], $data[2], $data[1])
    }
    $reader.Close()
    $writer.Close()
    
  • 2020-12-05 20:21
    Import-Csv C:\Path\To\Original.csv -Delimiter ';' | Select-Object Column1, Column3, Column2 | Export-Csv C:\Path\To\Newfile.csv -Delimiter ';' -NoTypeInformation
    
  • 2020-12-05 20:23

    It's great that people came up with solutions based on pure .NET. However, I would argue for simplicity where possible. That's why I upvoted all of you ;)

    Why? I tried generating 1,000,000 records, storing them in a CSV, and then reordering the columns. Generating the csv was, in my case, much more demanding than the reordering. Look at the results.

    It took only 1.8 minutes to reorder the columns. For me that's a pretty decent result. Is it ok for me? -> Yes, I don't need to look for a quicker solution; it's good enough -> that saves my time for other interesting stuff ;)

    # generate some csv; objects have several properties
    measure-command { 
        1..1mb | 
        % { 
            $date = get-date
            New-Object PsObject -Property @{
                Column1=$date
                Column2=$_
                Column3=$date.Ticks/$_ 
                Hour = $date.Hour
                Minute = $date.Minute
                Second = $date.Second
                ReadableTime = $date.ToLongTimeString()
                ReadableDate = $date.ToLongDateString()
            }} | 
        Export-Csv d:\temp\exported.csv 
    }
    
    TotalMinutes      : 6,100025295
    
    # reorder the columns
    measure-command { 
        Import-Csv d:\temp\exported.csv | 
            Select ReadableTime, ReadableDate, Hour, Minute, Second, Column1, Column2, Column3 | 
            Export-Csv d:\temp\exported2.csv 
    }
    
    TotalMinutes      : 2,33151559833333
    