Merge two CSV files while adding new and overwriting existing entries

Submitted by 吃可爱长大的小学妹 on 2019-12-24 07:16:02

Question


I have a configuration.csv that holds template data like this:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | UPDATE | ALL  |
| other/path | item1 | value2 | SOME |

and a customization.csv that holds service-specific configuration:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item2 | value3 | ALL  |
| new/path   | item3 | value3 | SOME |

My goal is to merge them and end up with something like this:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | value3 | ALL  |
| other/path | item1 | value2 | SOME |
| new/path   | item3 | value3 | SOME |

This should add any new entries and update any existing ones. No single column can be used for unique identification: path and item need to be combined, as the pair is guaranteed to be unique.


Answer 1:


I suggest using Compare-Object. Because the values from customization.csv should persist, pass that file's rows as -ReferenceObject.

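## Q:\Test\2019\03\01\SO_54948111.ps1

$conf = Import-Csv '.\configuration.csv'
$cust = Import-Csv '.\customization.csv'

# Compare on the composite key (path, item). With -PassThru, a row whose key
# exists in both files is taken from the reference side, so customization wins.
$NewData = Compare-Object -ReferenceObject $cust -DifferenceObject $conf -Property path,item -PassThru -IncludeEqual |
    Select-Object -Property * -ExcludeProperty SideIndicator

$NewData
$NewData | Export-Csv '.\NewData.csv' -NoTypeInformation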

Sample output

> Q:\Test\2019\03\01\SO_54948111.ps1

path       item  value  type
----       ----  -----  ----
some/path  item2 value3 ALL
some/path  item1 value1 ALL
other/path item1 value2 SOME
new/path   item3 value3 SOME
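
To see why the reference side wins, it can help to look at the SideIndicator property before it is stripped. The following is a minimal sketch, assuming inline sample data piped through ConvertFrom-Csv in place of the two files; the variable name $result is only illustrative.

```powershell
# Inline stand-ins for Import-Csv (assumption: same columns as the files above).
$conf = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$cust = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# Keep SideIndicator this time to see where each row came from:
#   '==' key present on both sides (the reference object is passed through)
#   '<=' key only in the reference set (customization)
#   '=>' key only in the difference set (configuration)
$result = Compare-Object -ReferenceObject $cust -DifferenceObject $conf `
    -Property path, item -PassThru -IncludeEqual

$result | Select-Object path, item, value, type, SideIndicator
```

The row marked '==' carries value3, which is the customization value for some/path, item2.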



Answer 2:


After a lot of searching, I figured that the easiest way to manipulate the entries without recreating the managing framework is through a hashtable. Along the way I had to account for two edge cases:

  1. additional commas in the values
  2. empty values

The final solution I got is this:

$configuration = Import-Csv .\configuration.csv
$customization = Import-Csv .\customization.csv
$merged = New-Object System.Collections.ArrayList
$hashTable = @{}

#initializing the hashTable with the defaults
foreach ($entry in $configuration)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}

#updating the hashTable with customization that add or overwrite existing entries
foreach ($entry in $customization)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}

#the regex handles multiple commas and empty values.
#-split also returns an empty string before and after the captured groups, so indexing starts at 1
foreach ($key in $hashTable.keys)
{
    $psobject = [PSCustomObject]@{
        path  = ($key -split '(.*),(.*)')[1]
        item  = ($key -split '(.*),(.*)')[2]
        value = ($hashTable[$key] -split '(.*),(.*)')[1]
        type  = ($hashTable[$key] -split '(.*),(.*)')[2]
    }
    [void] $merged.Add($psobject)
}
Write-Output $merged

Once imported, I transform configuration.csv into a hashTable whose keys are composed of path and item. I then do the same with customization.csv using the same hashTable, which overwrites the values of any existing keys or adds them as new.

The third loop converts the hashTable to PSCustomObject similar to what Import-Csv does. I split each of the key and value attributes while accounting for multiple commas and also empty values.
NOTE: the regex splits on the last occurrence of the separator (a comma here, but any separator works). To split on the first occurrence instead, use (.*?),(.*). In my case only the value column could contain an instance of the separator.
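
The difference between the greedy and the lazy pattern can be checked on a small string. This is a sketch with hypothetical sample strings ('a,b,ALL' and ',ALL' are not from the original data); it only exercises the two patterns quoted above.

```powershell
# Hypothetical stored string whose value part itself contains a comma.
$stored = 'a,b,ALL'

# Greedy first group: the split lands on the last comma.
$greedy = $stored -split '(.*),(.*)'

# Lazy first group: the split lands on the first comma.
$lazy = $stored -split '(.*?),(.*)'

# Empty value in front of the separator.
$empty = ',ALL' -split '(.*),(.*)'

# Index 0 and the last index are the empty strings around the match,
# so the captured groups sit at indexes 1 and 2.
'greedy: value=[{0}] type=[{1}]' -f $greedy[1], $greedy[2]   # greedy: value=[a,b] type=[ALL]
'lazy:   first=[{0}] rest=[{1}]' -f $lazy[1], $lazy[2]       # lazy:   first=[a] rest=[b,ALL]
'empty:  value=[{0}] type=[{1}]' -f $empty[1], $empty[2]     # empty:  value=[] type=[ALL]
```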

If the CSV had a unique column, then a solution similar to this answer could've been used.

Another alternative is to build the keys from the concatenation of all columns, which filters out any duplicate rows in the CSV, but the splitting can get tricky depending on the values in the columns.
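
One way to sidestep the splitting altogether is to store the whole row object as the hashtable value, so nothing has to be parsed back out. This is a variation on the approach above rather than the code from the answer. The sketch assumes inline sample data in place of the two files, an ordered dictionary to keep the template order, and '|' as a separator that never occurs in path or item.

```powershell
# Inline stand-ins for Import-Csv (assumption: same columns as the files above).
$configuration = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$customization = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# An ordered dictionary keeps first-insertion order, so template rows stay in place.
$rows = [ordered]@{}

foreach ($entry in @($configuration) + @($customization)) {
    # Composite key; '|' is an assumed separator that does not occur in path or item.
    $key = '{0}|{1}' -f $entry.path, $entry.item
    # Rows processed later (the customization rows) replace earlier ones with the same key.
    $rows[$key] = $entry
}

$merged = @($rows.Values)
$merged
```

Because the values are the original row objects, commas or empty strings in the value column need no special handling, and $merged can be piped straight to Export-Csv.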




Answer 3:


Your idea of 'using the same hashTable which overwrites any existing key values or add them as new' only works if the path, item combination is unique on each side, because it also overwrites any duplicates within a single file. Consider this Join-Object cmdlet.

$configuration = ConvertFrom-SourceTable '

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | UPDATE | ALL  |
| other/path | item1 | value2 | SOME |
| other/path | item1 | value3 | ALL  |
'

$customization = ConvertFrom-SourceTable '

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item2 | value3 | ALL  |
| new/path   | item3 | value3 | SOME |
| new/path   | item3 | value4 | ALL  |
'

Using the Merge-Object, alias Merge, proxy command (see help):

$configuration | Merge $customization -on path, item

path       item  value  type
----       ----  -----  ----
some/path  item1 value1 ALL
some/path  item2 value3 ALL
other/path item1 value2 SOME
other/path item1 value3 ALL
new/path   item3 value3 SOME
new/path   item3 value4 ALL
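
Whether a file actually contains duplicate path, item pairs can be checked with built-in cmdlets before choosing an approach. The following is a minimal sketch, assuming the duplicated sample rows from this answer supplied inline through ConvertFrom-Csv; the variable name $duplicates is only illustrative.

```powershell
# Inline sample with one duplicated key (other/path, item1).
$configuration = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
other/path,item1,value3,ALL
'@ | ConvertFrom-Csv

# Group on both key columns; any group with more than one row is a duplicate key.
$duplicates = @($configuration |
    Group-Object -Property path, item |
    Where-Object { $_.Count -gt 1 })

foreach ($group in $duplicates) {
    Write-Warning ('Duplicate key: {0} ({1} rows)' -f $group.Name, $group.Count)
}
```

If $duplicates is empty for both files, a hashtable keyed on path and item will not silently drop rows.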


Source: https://stackoverflow.com/questions/54948111/merge-two-csv-files-while-adding-new-and-overwriting-existing-entries
