Question
I have a configuration.csv that holds template data like this:
| path | item | value | type |
|------------|-------|--------|------|
| some/path | item1 | value1 | ALL |
| some/path | item2 | UPDATE | ALL |
| other/path | item1 | value2 | SOME |
and a customization.csv that holds service-specific configuration:
| path | item | value | type |
|------------|-------|--------|------|
| some/path | item2 | value3 | ALL |
| new/path | item3 | value3 | SOME |
My goal is to merge them and end up with something like this:
| path | item | value | type |
|------------|-------|--------|------|
| some/path | item1 | value1 | ALL |
| some/path | item2 | value3 | ALL |
| other/path | item1 | value2 | SOME |
| new/path | item3 | value3 | SOME |
This should add any new entries and update any existing ones. No single column can be used as a unique identifier: path and item need to be combined, as together they are guaranteed to be unique.
Answer 1:
I suggest using Compare-Object. Since the values from customization.csv should persist, pass that file's rows as -ReferenceObject:
## Q:\Test\2019\03\01\SO_54948111.ps1
$conf = Import-Csv '.\configuration.csv'
$cust = Import-Csv '.\customization.csv'
$NewData = Compare-Object -ReferenceObject $cust -DifferenceObject $conf -Property path,item -PassThru -IncludeEqual |
    Select-Object -Property * -ExcludeProperty SideIndicator
$NewData
$NewData |Export-Csv '.\NewData.csv' -NoTypeInformation
Sample output
> Q:\Test\2019\03\01\SO_54948111.ps1
path item value type
---- ---- ----- ----
some/path item2 value3 ALL
some/path item1 value1 ALL
other/path item1 value2 SOME
new/path item3 value3 SOME
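To see why the customization values win, it can help to keep SideIndicator around for a moment. The following is a minimal sketch, assuming the question's sample rows are supplied inline through ConvertFrom-Csv rather than read from the two files:

```powershell
# Inline copies of the question's sample data, so the sketch runs without files.
$conf = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$cust = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# Compare on path+item only; -PassThru emits the original row objects.
$result = Compare-Object -ReferenceObject $cust -DifferenceObject $conf `
    -Property path, item -PassThru -IncludeEqual

# SideIndicator tells where each emitted row came from:
#   '==' path+item exist on both sides (the reference row, i.e. customization, is emitted)
#   '<=' only in customization.csv
#   '=>' only in configuration.csv
$result | Format-Table path, item, value, type, SideIndicator
```

For the pair that exists in both files, the row from the reference side is passed through, which is why some/path, item2 ends up with value3 instead of UPDATE.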
Answer 2:
After a lot of searching, I figured the easiest way to manipulate the entries without recreating the managing framework would be through a hashtable. During the process I had to account for two edge cases:
- additional commas in the values
- empty values
The final solution I got is this:
$configuration = Import-Csv .\configuration.csv
$customization = Import-Csv .\customization.csv
$merged = New-Object System.Collections.ArrayList
$hashTable = @{}
# initializing the hashTable with the defaults
foreach ($entry in $configuration)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}
# updating the hashTable with customizations that add or overwrite existing entries
foreach ($entry in $customization)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}
# the regex handles multiple commas and empty values.
# -split returns an empty string before and after the captured groups, so we start from index 1
foreach ($key in $hashTable.keys)
{
    $psobject = [PSCustomObject]@{
        path  = ($key -split '(.*),(.*)')[1]
        item  = ($key -split '(.*),(.*)')[2]
        value = ($hashTable[$key] -split '(.*),(.*)')[1]
        type  = ($hashTable[$key] -split '(.*),(.*)')[2]
    }
    [void] $merged.Add($psobject)
}
Write-Output $merged
Once imported, I transform configuration.csv into a hashtable whose keys are composed of path and item. I then do the same with customization.csv using the same hashtable, which overwrites the values of any existing keys or adds them as new entries.
The third loop converts the hashtable entries to PSCustomObject instances similar to what Import-Csv produces. I split each key and value while accounting for multiple commas and empty values.
NOTE: the regex splits on the last occurrence of the separator (here a comma, but you can pick any separator). If you want to split on the first occurrence instead, use (.*?),(.*). In my case only the value column could contain the separator.
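A quick way to check the difference between the two patterns, as a small sketch using a made-up string (the 'a,b,c' sample is only an illustration, not data from the CSV files):

```powershell
# Hypothetical sample that contains the separator twice.
$sample = 'a,b,c'

# Greedy first group: the string is divided at the LAST comma.
$greedy = $sample -split '(.*),(.*)'

# Lazy first group: the string is divided at the FIRST comma.
$lazy = $sample -split '(.*?),(.*)'

# Elements 0 and 3 are the empty strings around the match,
# so the two captured groups sit at index 1 and 2.
"greedy: '{0}' | '{1}'" -f $greedy[1], $greedy[2]   # greedy: 'a,b' | 'c'
"lazy:   '{0}' | '{1}'" -f $lazy[1], $lazy[2]       # lazy:   'a' | 'b,c'
```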
If the CSV had a unique column, then a solution similar to this answer could've been used.
Another alternative is to build the keys from all columns concatenated, which filters out any duplicate rows in the CSV, but the splitting can get tricky depending on the values in the columns.
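As a sketch of a possible variation (not the code I ended up with), the hashtable could hold the whole row object instead of a joined string, so nothing has to be split afterwards and commas or empty values need no special handling. It assumes the sample rows are supplied inline, that '|' never occurs in path or item, and that an ordered dictionary is acceptable for keeping the first-seen row order:

```powershell
# Inline copies of the sample data, so the sketch runs without files.
$configuration = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$customization = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# Ordered dictionary: re-assigning an existing key keeps its original position.
$byKey = [ordered]@{}

# Defaults first, customizations second, so later rows overwrite earlier ones.
foreach ($entry in @($configuration) + @($customization)) {
    # ASSUMPTION: '|' never appears in path or item, so the key is unambiguous.
    $byKey["$($entry.path)|$($entry.item)"] = $entry
}

# The values are the original row objects, ready for Export-Csv.
$merged = @($byKey.Values)
$merged | Format-Table path, item, value, type
```

The same uniqueness caveat applies: path and item together must identify a row within each file.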
Answer 3:
Your idea of 'using the same hashTable which overwrites any existing key values or add them as new' will only work if the path, item pair is unique on each side, because you will also overwrite any duplicates...
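A small sketch of that pitfall, assuming two hypothetical rows in the same file that share path and item; the Group-Object check is just one possible way to spot such rows before merging:

```powershell
# Two rows with the same path+item, supplied inline for illustration.
$rows = @'
path,item,value,type
other/path,item1,value2,SOME
other/path,item1,value3,ALL
'@ | ConvertFrom-Csv

# Same keying scheme as the hashtable approach: the second row replaces the first.
$hashTable = @{}
foreach ($entry in $rows) {
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}
$hashTable.Count                      # 1, although the input had 2 rows
$hashTable['other/path,item1']        # value3,ALL

# Detect path+item pairs that occur more than once on one side.
$duplicates = $rows | Group-Object -Property path, item | Where-Object Count -gt 1
$duplicates | Select-Object Name, Count
```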
Consider this Join-Object cmdlet.
$configuration =
ConvertFrom-SourceTable '
| path | item | value | type |
|------------|-------|--------|------|
| some/path | item1 | value1 | ALL |
| some/path | item2 | UPDATE | ALL |
| other/path | item1 | value2 | SOME |
| other/path | item1 | value3 | ALL |
'
$customization=
ConvertFrom-SourceTable '
| path | item | value | type |
|------------|-------|--------|------|
| some/path | item2 | value3 | ALL |
| new/path | item3 | value3 | SOME |
| new/path | item3 | value4 | ALL |
'
Using the Merge-Object (alias Merge) proxy command (see help):
$configuration | Merge $customization -on path, item
path item value type
---- ---- ----- ----
some/path item1 value1 ALL
some/path item2 value3 ALL
other/path item1 value2 SOME
other/path item1 value3 ALL
new/path item3 value3 SOME
new/path item3 value4 ALL
Source: https://stackoverflow.com/questions/54948111/merge-two-csv-files-while-adding-new-and-overwriting-existing-entries