Question
I have some data in tab-delimited form that gives the result of device identification from user agents (UAs), but there are several rows where the devices are wrongly identified and I need to change them to the correct ones.
For instance, there are cases where an iPhone or HTC Wildfire UA is identified as another phone. For these cases I need to update the device information with the correct device by searching for certain keywords in the UA. For example,
781 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC_Wildfire_A3333 Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17 htc_wildfire_ver1_suba3333 HTC Wildfire Android
This is correct, but a similar case is wrong:
775 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC Wildfire Build/ERE27) AppleWebKit/525.10+ (KHTML, like Gecko) Version/3.0.4 Mobile Safari/523.12.2 (AdMob-ANDROID-20100709) T-Mobile Pulse Android
So I need to do something like this. I know that if the UA column contains both HTC and Wildfire, it is that phone. I want to find all the UAs that contain the strings HTC and Wildfire but whose columns 3 and 4 (manufacturer and model) are wrong, and update them with the correct device information from row 781, which I know is correct. I would hard-code in the script that row 781 is correct, and for every row where the device is not correctly identified I would copy in the values from column 3 onwards of row 781.
Of course this is one case and there are several cases like this and I would repeat the same logic for each of them. Also there are other columns besides these four that I've not shown.
How would I accomplish this in a Perl script (preferably, but a bash solution is also OK)?
Answer 1:
- Create a file (devices) with all distinct (UA, Manufacturer, Model) triples by looping over the input file, storing the triple as keys in a hash; write sorted keys into devices
- Manually edit devices (delete 'wrong' lines)
- Load devices into a hash, use UA as key, (Manufacturer, Model) as value. Loop over the input file, use UA field of current line to lookup the device, change both fields using the good value from the hash (if necessary).
use strict;
use warnings;

# Demo data: (UA, Model) pairs standing in for the real log lines
my @Log = (
    [ 'HTC', 'badModelHTC' ],
    [ 'ABC', 'badModelABC' ],
    [ 'HTC', 'goodModelHTC' ],
    [ 'ABC', 'badModelABC' ],
    [ 'ABC', 'goodModelABC' ],
    [ 'HTC', 'goodModelHTC' ],
    [ 'ABC', 'badModelABC' ],
);
my %Devs;

# Step 1: collect all distinct (UA, Model) combinations as hash keys
print "----------- Log org\n";
for (@Log) {
    printf "%s %s\n", @{$_};
    my $key = join '-', @{$_};
    $Devs{$key} = $_->[1];
}

# Step 2: drop the wrong entries
print "----------- Devs org\n";
for ( sort keys %Devs ) {
    printf "%s => %s\n", $_, $Devs{$_};
    if (/bad/) {
        delete $Devs{$_};    # fake manual removal
    }
}

# fake manual shortening of keys, so that the UA alone is the key
my %Tmp = %Devs;
%Devs = ();
for ( keys %Tmp ) {
    $Devs{ ( split /-/, $_ )[0] } = $Tmp{$_};
}
print "----------- Devs corrected\n";
for ( sort keys %Devs ) {
    printf "%s => %s\n", $_, $Devs{$_};
}

# Step 3: look up each log entry by UA and overwrite its model
print "----------- Log corrected\n";
for (@Log) {
    $_->[1] = $Devs{ $_->[0] };
    printf "%s %s\n", @{$_};
}
output:
----------- Log org
HTC badModelHTC
ABC badModelABC
HTC goodModelHTC
ABC badModelABC
ABC goodModelABC
HTC goodModelHTC
ABC badModelABC
----------- Devs org
ABC-badModelABC => badModelABC
ABC-goodModelABC => goodModelABC
HTC-badModelHTC => badModelHTC
HTC-goodModelHTC => goodModelHTC
----------- Devs corrected
ABC => goodModelABC
HTC => goodModelHTC
----------- Log corrected
HTC goodModelHTC
ABC goodModelABC
HTC goodModelHTC
ABC goodModelABC
ABC goodModelABC
HTC goodModelHTC
ABC goodModelABC
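The demo works on in-memory pairs and looks devices up by the exact UA. For the keyword rule described in the question (a UA containing both HTC and Wildfire gets the device fields of row 781), a minimal sketch could look like the following. It assumes the columns are id, UA, manufacturer, model, followed by any further fields; the `@rules` table, the `fix_rows` helper, and the demo rows (shortened UAs, plus a made-up iPhone row 900) are illustrative only and not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical rule table: every keyword must occur in the UA; good_id names
# a row known to be correct (781 is the example given in the question).
my @rules = (
    { keywords => [ 'HTC', 'Wildfire' ], good_id => 781 },
);

# Takes chomped tab-delimited lines and returns the corrected lines.
# Assumed columns: id, UA, manufacturer, model, then any further fields.
sub fix_rows {
    my @rows  = map { [ split /\t/, $_, -1 ] } @_;
    my %by_id = map { $_->[0] => $_ } @rows;

    for my $row (@rows) {
        for my $rule (@rules) {
            # Skip the rule if its reference row is not in the input.
            my $good = $by_id{ $rule->{good_id} } or next;
            # Skip unless every keyword occurs in the UA (case-insensitive).
            next if grep { index( lc $row->[1], lc $_ ) < 0 }
                @{ $rule->{keywords} };
            # Keep id and UA, copy column 3 onwards from the reference row.
            @$row = ( @{$row}[ 0, 1 ], @{$good}[ 2 .. $#$good ] );
            last;
        }
    }
    return map { join "\t", @$_ } @rows;
}

# Demo rows with shortened UAs; row 900 is made up for illustration.
my @demo = (
    join( "\t", 781, 'Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC_Wildfire_A3333 Build/ERE27)', 'HTC', 'Wildfire', 'Android' ),
    join( "\t", 775, 'Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC Wildfire Build/ERE27)', 'T-Mobile', 'Pulse', 'Android' ),
    join( "\t", 900, 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X)', 'Apple', 'iPhone', 'iOS' ),
);
print "$_\n" for fix_rows(@demo);
```

To run this against a real file, the demo rows could be replaced by lines read from the file and chomped before being passed to `fix_rows`. Further cases, such as the iPhone one, would become additional entries in `@rules`, each pointing at its own known-good row.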
Source: https://stackoverflow.com/questions/4931783/updating-a-row-in-a-data-file-with-values-from-another-row