Quickest way to organize categorized data in a text file and convert to CSV

粉色の甜心 2021-01-24 03:25

I have a text file with hundreds of rows. Field names and values are separated by a colon, and one empty line separates each data set. It looks something like this:

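    icon:rain
    temperatureHigh:55.37
    temperatureLow:42.55
    humidity:0.97
    windSpeed:6.7
    precipType:rain
    precipProbability:0.97

    icon:partly-cloudy-day
    temperatureHigh:34.75
    temperatureLow:27.1
    humidity:0.8
    windSpeed:15.32
    precipType:snow
    precipProbability:0.29
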
5 answers
  • 2021-01-24 04:09

    The simplest approach is to split your data at two consecutive newlines and convert each chunk into a hashtable via ConvertFrom-StringData (replace : with = first, because the cmdlet expects key=value lines). The hashtables can then be converted to custom objects and exported to a CSV.

    $data = Get-Content 'C:\path\to\input.txt' -Raw
    
    $data -replace ':', '=' -split '\r?\n\r?\n' | ForEach-Object {
        [PSCustomObject]($_ | ConvertFrom-StringData)
    } | Export-Csv 'C:\path\to\output.csv' -NoType
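
    To see what happens to a single data set, here is a minimal sketch that uses an inline sample string instead of the input file. The variable names ($chunk, $hash, $record) and the three sample lines are only for illustration.

    ```powershell
    # One data set, joined with LF so the example does not depend on file line endings.
    $chunk = 'icon:rain', 'temperatureHigh:55.37', 'temperatureLow:42.55' -join "`n"

    # ConvertFrom-StringData expects key=value lines, hence the replacement.
    $hash = $chunk -replace ':', '=' | ConvertFrom-StringData

    # The hashtable keys become the property (and later CSV column) names.
    $record = [PSCustomObject]$hash

    $record.icon
    $record.temperatureHigh
    ```

    Note that every value stays a string, and that replacing every colon would also alter values that themselves contain a colon.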
    

    Note that the above requires PowerShell v3 or newer. For older PowerShell versions you need to adjust the code as below:

    $data = Get-Content 'C:\path\to\input.txt' | Out-String
    
    $data -replace ':', '=' -split '\r?\n\r?\n' | ForEach-Object {
        $prop = $_ | ConvertFrom-StringData
        New-Object -Type PSObject -Property $prop
    } | Export-Csv 'C:\path\to\output.csv' -NoType
    

    Hashtables do not preserve key order, so if you want the fields of the CSV in a particular order, put a Select-Object between the ForEach-Object and Export-Csv:

    ... | ForEach-Object {
        ...
    } | Select-Object icon, temperatureHigh, ... | Export-Csv ...
    

    Import-Csv expects the input data organized as one dataset per row. It cannot be used for blocks of key:value pairs like your input data has.

    ConvertTo-Csv requires the same preparation as Export-Csv in the sample code above. The only difference is that the output isn't written to a file.
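
    As a rough illustration of the last two points, here is a sketch with a hand-built object (sample values, not read from your file). Select-Object fixes the column order, and ConvertTo-Csv returns the CSV lines as strings instead of writing them to a file.

    ```powershell
    # Hand-built sample record; the property order is deliberately "wrong".
    $record = [PSCustomObject]@{ temperatureHigh = '55.37'; icon = 'rain' }

    # Select-Object sets the column order; ConvertTo-Csv returns an array of strings.
    $csv = $record | Select-Object icon, temperatureHigh | ConvertTo-Csv -NoTypeInformation

    $csv[0]   # header line: "icon","temperatureHigh"
    $csv[1]   # data line:   "rain","55.37"
    ```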

  • 2021-01-24 04:20

    Here is a way to do what you want in simple, and hopefully clear, code. I avoided sophisticated PowerShell objects, methods, and functions to keep it easy to follow. The input is expected to be in a text file called in1.txt. I assume that each data set has at most 7 lines (before a blank line or end-of-file is encountered). I did not make it generic or include error checking. There are many other ways you can do this. If you have any comments, let me know.

    #======================
    # Function used by code
    #======================
    
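    Function func-PrintSet
    {
        # Uses $arr1 (field names) and $arr2 (values) filled in by the main code.
        # Each call prints the quoted names, then the quoted values.
        $del = ','
        $q = '"'

        # -join avoids the trailing delimiter that string concatenation leaves behind.
        ($arr1 | ForEach-Object { $q + $_ + $q }) -join $del
        ($arr2 | ForEach-Object { $q + $_ + $q }) -join $del
    }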
    
    #=====================
    # Main code
    #=====================
    
    # simple initialization of arrays.
    
    $arr1=0,0,0,0,0,0,0
    $arr2=0,0,0,0,0,0,0
    $i=-1
    $reader = [System.IO.File]::OpenText("in1.txt")
    while ($null -ne ($line = $reader.ReadLine())) 
    {
        IF ($line)
        {
    
             $items = $line.split(':')
             $i=$i+1
             $arr1[$i]= $items[0]
             $arr2[$i]= $items[1]
        }
        ELSE
        {
    
            func-PrintSet   
            $i=-1
        }
    }
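    # Print the last set (skipped if the file ended with a blank line), then release the file.
    if ($i -ge 0) { func-PrintSet }
    $reader.Close()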
    
    "Done :)"
    
    # Code end
    
  • 2021-01-24 04:22

    Regex is the way to go:

    $data = @'
    icon:rain
    temperatureHigh:55.37
    temperatureLow:42.55
    humidity:0.97
    windSpeed:6.7
    precipType:rain
    precipProbability:0.97
    
    icon:partly-cloudy-day
    temperatureHigh:34.75
    temperatureLow:27.1
    humidity:0.8
    windSpeed:15.32
    precipType:snow
    precipProbability:0.29
    
    icon:clear-day
    temperatureHigh:47
    temperatureLow:31.72
    humidity:0.64
    windSpeed:9.27
    precipType:rain
    precipProbability:0.01
    
    '@
    
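    # Header: keep only the field names of the first data set.
    $head = $data
    $head = $head -replace '([^\s]+):([^\s]+)', '"$1",'   # name:value -> "name",
    $head = $head -replace '\r?\n\r?\n', '::'             # mark data set boundaries (LF or CRLF)
    $head = $head -replace '\r?\n', ''                    # join each data set onto one line
    $head = $head -replace '(.*?)::.*', '$1'              # keep the first data set only
    $head = $head -replace ',\s*$', ''                    # drop the trailing comma
    $head
    
    # Rows: keep only the values, one line per data set.
    $rows = $data
    $rows = $rows -replace '([^\s]+):([^\s]+)', '"$2",'   # name:value -> "value",
    $rows = $rows -replace '\r?\n\r?\n', '::'
    $rows = $rows -replace '\r?\n', ''
    $rows = $rows + "::"
    $rows = $rows -replace '::', "`n"                     # one data set per line
    $rows = $rows -replace ',\s*\n', "`n"                 # drop the trailing commas
    $rows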
    

    Output:

    "icon","temperatureHigh","temperatureLow","humidity","windSpeed","precipType","precipProbability"
    "rain","55.37","42.55","0.97","6.7","rain","0.97"
    "partly-cloudy-day","34.75","27.1","0.8","15.32","snow","0.29"
    "clear-day","47","31.72","0.64","9.27","rain","0.01"
    
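    The key step is the capture-group replacement, so here is a small sketch of it applied to a single line. The variable names ($line, $name, $value) are only for illustration.

    ```powershell
    $line = 'icon:rain'

    # $1 is the text before the colon, $2 the text after it.
    # Single quotes keep PowerShell from expanding $1 and $2 as variables.
    $name  = $line -replace '([^\s]+):([^\s]+)', '"$1",'
    $value = $line -replace '([^\s]+):([^\s]+)', '"$2",'

    $name                        # "icon",
    $value                       # "rain",
    $value -replace ',\s*$', ''  # "rain"
    ```

    Because neither group matches whitespace and the first group is greedy, this pattern assumes that values contain no spaces and no additional colons.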
  • 2021-01-24 04:24

    Try this:

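    $CurrentElement = [pscustomobject]@{}
    
    # Read the file line by line; an empty line closes the current record, which is then emitted.
    Get-Content "c:\temp\test.txt" | ForEach-Object {
    
        if ($_ -eq "")
        {
            $CurrentElement
            $CurrentElement = [pscustomobject]@{}
        }
        else
        {
            # Split on the first colon only, so values may contain colons.
            $Row = $_ -split ':', 2
            Add-Member -InputObject $CurrentElement -MemberType NoteProperty -Name $Row[0] -Value $Row[1]
        }
    
    } -End {
        # Emit the last record when the file does not end with an empty line.
        if (@($CurrentElement.PSObject.Properties).Count -gt 0) { $CurrentElement }
    } | Export-Csv "c:\temp\result.csv" -NoTypeInformation
    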
    
  • 2021-01-24 04:24

    Here's another way to do the job, using a combination of simple regex patterns and string operators.

    $InStuff = @'
    column1:value1
    column2:value2
    column3:value3
    column4:value4
    column5:value5
    
    column1:value6
    column2:value7
    column3:value8 
    column4:value9
    column5:value10
    
    column1:value11 
    column2:value12
    column3:value13 
    column4:value14
    column5:value15
    '@
    
    
    $SplitInStuff = $InStuff -split ([environment]::NewLine * 2)
    
    $HeaderLine = ($SplitInStuff[0] -replace '(?m):.+$').Split([environment]::NewLine) -join ', '
    
    $CSV_Text = [System.Collections.Generic.List[string]]::new()
    $CSV_Text.Add($HeaderLine)
    
    foreach ($SIS_Item in $SplitInStuff)
        {
        $CSV_Text.Add(($SIS_Item  -replace '(?m)^.+:').Split([environment]::NewLine).Where({$_}) -join ', ')
        }
    
    $Results = $CSV_Text |
        ConvertFrom-Csv
    
    # on screen
    $Results |
        Format-Table
    
    # to CSV
    $Results |
        Export-Csv -LiteralPath "$env:TEMP\JohnnyCarino_ReformatedData.csv" -NoTypeInformation
    

    output ...

    column1  column2 column3  column4 column5
    -------  ------- -------  ------- -------
    value1   value2  value3   value4  value5 
    value6   value7  value8   value9  value10
    value11  value12 value13  value14 value15
    

    CSV file content ...

    "column1","column2","column3","column4","column5"
    "value1","value2","value3","value4","value5"
    "value6","value7","value8 ","value9","value10"
    "value11 ","value12","value13 ","value14","value15"
    
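    The two (?m) patterns do most of the work above, so here is a small sketch of them on a single data set. The variable names ($block, $names, $values) and the LF-joined sample are assumptions made for illustration.

    ```powershell
    # One data set, joined with LF so the sketch does not depend on the platform newline.
    $block = 'column1:value1', 'column2:value2', 'column3:value3' -join "`n"

    # (?m) lets ^ and $ match at every line; removing ":value" leaves the names.
    $names = ($block -replace '(?m):.+$') -split "`n"

    # Removing "name:" from the start of every line leaves the values.
    $values = ($block -replace '(?m)^.+:') -split "`n"

    $names -join ', '     # column1, column2, column3
    $values -join ', '    # value1, value2, value3
    ```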