Question
I have a file that contains campaign names and IDs. The two fields are separated by a pipe (|), and the IDs within a field are separated by spaces. I want to find all rows in a data file (thorn þ delimited) that contain those IDs, and output the matching rows into a separate file per name. The data file is usually 4-7 GB, sometimes larger.
campaigns.txt:
    Name|NameID
    FirstName|123 212 445 39
    SecondName|313 939
    ThirdName|219
Data ID File:
    DateþIDþCode
    10-22-14þ123þAbc
    10-24-16þ212þPow
    09-18-15þ219
So I would want 3 files created: FirstName.txt contains 2 rows, SecondName.txt contains 0 rows, and ThirdName.txt contains 1 row.
I cobbled together some code from various sources and came up with this. However, I'm wondering if there's a better way than having to read through the data file multiple times. Any thoughts out there?
    $campaigns = Import-Csv "campaigns.txt" -Delimiter "|"
    $datafile = "5282_10-19-2016"
    $encoding = [Text.Encoding]::GetEncoding('iso-8859-1')
    echo "Starting.."
    Get-Date -Format g
    foreach ($campaign in $campaigns) {
        $campaignname = $campaign.CampaignName
        $campaignids = $campaign.CampaignID.split(" ")
        echo "Looking for $campaignname - $campaignids"
        $writer = New-Object System.IO.StreamWriter($campaignname + "_filtered.txt")
        foreach ($campaignid in $campaignids) {
            $datareader = New-Object System.IO.StreamReader($datafile, $encoding)
            while ($dataline = $datareader.ReadLine()) {
                if ($dataline -match $campaignid) {
                    $data = $dataline.Split("þ")
                    $writer.WriteLine('{0}|{1}|{2}|{3}|{4}|{5}|{6}|{7}', $data[0], $data[3], $data[5], $data[8], $data[12], $data[14], $data[19], $data[20])
                }
            }
        }
        $writer.Close()
    }
    echo "Done!"
    Get-Date -Format g
Answer 1:
Process the huge data file just once.
Look up the campaign name for each row's ID in a hashtable built from campaigns.txt.
Assuming there are not many campaigns (say, fewer than 1000), keep one StreamWriter open per campaign and write each matching row to the corresponding writer.
    # Map each campaign ID to its campaign name
    $campaignByID = @{}
    foreach ($c in (Import-Csv 'campaigns.txt' -Delimiter '|')) {
        foreach ($id in ($c.CampaignID -split ' ')) {
            $campaignByID[$id] = $c.CampaignName
        }
    }

    # One StreamWriter per campaign, created on first use
    $campaignWriters = @{}
    # $datafile and $encoding are defined as in the question's script
    $datareader = New-Object IO.StreamReader($datafile, $encoding)
    while (!$datareader.EndOfStream) {
        $data = $datareader.ReadLine().Split('þ')
        # The ID is the second field of each data row
        $campaignName = $campaignByID[$data[1]]
        if ($campaignName) {
            $writer = $campaignWriters[$campaignName]
            if (!$writer) {
                $writer = $campaignWriters[$campaignName] =
                    New-Object IO.StreamWriter($campaignName + '_filtered.txt')
            }
            # Write only the selected columns, pipe-delimited
            $writer.WriteLine(($data[0,3,5,8,12,14,19,20] -join '|'))
        }
    }
    $datareader.Close()
    foreach ($writer in $campaignWriters.Values) {
        $writer.Close()
    }
To display progress, use Write-Progress with a percentage based on $datareader.BaseStream.Position / $datareader.BaseStream.Length * 100. Don't update it for every line of the data file, because that will slow down the processing. Update it about once per second instead, for example by keeping a datetime variable and refreshing the progress display only when a second has elapsed.
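The throttled progress update described above could look roughly like the sketch below. The function name Read-LinesWithProgress, its parameters, and the activity text are made up for illustration and are not part of the original answer. Note that BaseStream.Position reflects the reader's buffered position, so the percentage is only approximate.

```powershell
# Hypothetical helper (not from the original answer): reads a file line by line
# and refreshes the progress bar at most once per $IntervalSeconds.
function Read-LinesWithProgress {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Text.Encoding] $Encoding = [Text.Encoding]::GetEncoding('iso-8859-1'),
        [double] $IntervalSeconds = 1
    )
    # .NET resolves relative paths against its own current directory,
    # so convert to a full file system path first.
    $fullPath = Convert-Path $Path
    $reader = New-Object IO.StreamReader($fullPath, $Encoding)
    try {
        $nextUpdate = [datetime]::Now
        while (!$reader.EndOfStream) {
            $line = $reader.ReadLine()
            if ([datetime]::Now -ge $nextUpdate) {
                # Approximate percentage from the underlying stream position
                $percent = [int]($reader.BaseStream.Position / $reader.BaseStream.Length * 100)
                Write-Progress -Activity 'Filtering' -Status "$percent%" -PercentComplete $percent
                $nextUpdate = [datetime]::Now.AddSeconds($IntervalSeconds)
            }
            $line   # emit the line to the pipeline
        }
    }
    finally {
        $reader.Close()
        Write-Progress -Activity 'Filtering' -Completed
    }
}
```

In the loop from the answer, the same time check can be placed directly inside the while block rather than wrapping the reader in a function; passing every line through the pipeline, as this sketch does, adds some overhead on a multi-gigabyte file.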
Answer 2:
try this ;)
    $campaigns = Import-Csv C:\temp\campaigns.txt -Delimiter "|"
    $datafile = Import-Csv C:\temp\5282_10-19-2016.txt -Delimiter "þ" -Encoding Default
    $DirResult = "C:\temp\root"
    # Expand each campaign into one object per ID, then filter the data rows per ID
    $campaigns | ForEach-Object {
        foreach ($item in $_.NameID.Split(" ")) { New-Object PSObject -Property @{ Name = $_.Name; ValID = $item } }
    } | ForEach-Object {
        # Rows are appended to one file per campaign name, as the question asks
        $datafile | Where-Object ID -eq $_.ValID | Export-Csv -Append -Delimiter "|" -Path ("$DirResult\" + $_.Name + "_filtered.txt") -NoTypeInformation
    }
Source: https://stackoverflow.com/questions/40198906/find-strings-in-one-file-in-another-and-output-certain-columns