Does anyone know any unix commands/perl script that would insert a specific character (that can be entered as either hex (ie 7C) or as the actual character (ie |)) in the po
use strict;
use warnings;
use File::Slurp qw(read_file);
my ($from, $to, $every, $fname) = @ARGV;
my $counter = 0;
my $in = read_file $fname;
my $out = $in;
# copy is important because pos magic attached to $in resets with substr
while ($in =~ /\Q$from/gms) {
    $counter++;
    # pos($in) is the END of the match, so step back by the match length
    substr $out, pos($in) - length($from), length($from), $to unless $counter % $every;
}
print $out;
If the $from and $to parameters have different lengths, each replacement shifts the later offsets in $out, so you still need to adjust the second parameter of substr (the offset) to make it work correctly.
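One way to sidestep the offset bookkeeping entirely is to build the output string piece by piece instead of patching it in place. The following is only a sketch of that idea in awk, not the script above; replace_every is a hypothetical helper name, and it assumes the delimiters contain no backslashes (awk interprets escape sequences in -v values). As in the script above, the count runs across the whole input.

```shell
# replace_every FROM TO N   (hypothetical helper; reads stdin, writes stdout)
# Replaces every Nth occurrence of the literal string FROM with TO.
# FROM and TO may have different lengths because the output is rebuilt
# rather than edited in place.
replace_every() {
    awk -v from="$1" -v to="$2" -v every="$3" '
    {
        out = ""
        rest = $0
        # index() does a literal (non-regex) search for the delimiter
        while ((p = index(rest, from)) > 0) {
            count++    # never reset, so the count is global across lines
            out = out substr(rest, 1, p - 1) (count % every == 0 ? to : from)
            rest = substr(rest, p + length(from))
        }
        print out rest
    }'
}

# Example: printf 'a,b,c,d,e,f,g\n' | replace_every , ' | ' 3
```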
Small Perl hack to solve the problem. It uses the index function to find the commas, modulus to pick the right one to replace, and substr to perform the replacement.
use strict;
use warnings;
while (<>) {
    my $x = index($_, ",");
    my $i = 0;
    while ($x != -1) {
        $i++;
        unless ($i % 3) {
            $_ = substr($_, 0, $x) . "|" . substr($_, $x + 1);
        }
        $x = index($_, ",", $x + 1);
    }
    print;
}
Run with perl script.pl file.csv.
Note: You can place the declaration my $i before the while(<>) loop in order to do a global count, instead of a separate count for each line. Not quite sure I understood your question in that regard.
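To make the difference between the two counting modes concrete, here is a small sketch in awk rather than Perl. The helper name every_third and its 0/1 argument are made up for illustration; it hard-codes the comma, the pipe, and "every third" from the script above.

```shell
# every_third PER_LINE   (hypothetical helper; reads stdin, writes stdout)
# PER_LINE=1 restarts the comma count on each line, PER_LINE=0 counts globally.
every_third() {
    awk -F, -v per_line="$1" '
    {
        if (per_line == 1) i = 0    # per-line mode: reset the counter
        line = $1
        for (f = 2; f <= NF; f++) {
            # the separator before field f is comma number ++i
            line = line ((++i % 3) ? "," : "|") $f
        }
        print line
    }'
}
```

With the two-line input "a,b,c" and "d,e,f,g", the global mode replaces the comma after "d" (it is the third comma overall), while the per-line mode replaces the comma after "f".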
This processes the input file one line at a time (no slurping :). For hex input, just pass '\x7C', or whatever, as $1.
#!/bin/bash
b="${1:-,}" # the "before" field delimiter
n="${2:-3}" # the number of fields in a group
a="${3:-|}"; [[ $a == [\|] ]] && a='\|' # the "after" group delimiter
sed -nr "x;G; /(([^$b]+$b){$((n-1))}[^$b]+)$b/{s//\1$a/g}
s/.*\n//; h; /.*$a/{s///; x}; p" input_file
Here it is again, with some comments.
sed -nr "x;G # pat = hold + pat
/(([^$b]+$b){$((n-1))}[^$b]+)$b/{s//\1$a/g}
s/.*\n// # del fields from prev line
h # hold = mod*\n
/.*$a/{ s/// # pat = unmodified
x # hold = unmodified, pat = mod*\n
}
p # print line" input_file
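On the hex input mentioned above: if you would rather hand the script an actual character than an escape sequence, the conversion can be done in the shell first. This is a minimal sketch, not part of the script above; to_char is a hypothetical helper, and treating any argument made of exactly two hex digits as a hex code is an assumption (a literal two-character delimiter such as "ab" would be misread).

```shell
# to_char ARG   (hypothetical helper)
# Prints the character for a two-digit hex code such as 7C,
# or ARG itself when it does not look like a hex code.
to_char() {
    case $1 in
        [0-9A-Fa-f][0-9A-Fa-f])
            # hex -> three-digit octal -> character via a printf octal escape
            printf "\\$(printf '%03o' "0x$1")"
            ;;
        *)
            printf '%s' "$1"
            ;;
    esac
}

# Example: delim=$(to_char 7C)   # same as delim='|'
```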
I have an idea, as a Perl one-liner run from the shell:
perl -pe 's/,/(++$n % 3 == 0) ? "|" : $&/ge' data.txt
That will do the trick.
How about a nice, simple awk one-liner?
awk -v RS=, '{ORS=(++i%3?",":"|");print}' file.csv
One minor bug just occurred to me: it will print a , or | as the very last character. To avoid this, we need to alter it slightly:
awk -v RS=, '{ORS=(++i%3?",":"|");print}END{print ""}' file.csv | sed '$d'
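Another way around the trailing-delimiter problem, sketched here as an alternative rather than as the fix above, is to print the separator before each record instead of after it, so nothing is left dangling at the end and no sed pass is needed. The function name join_every_third is made up for illustration; like the one-liner above, it treats a newline as part of a field and counts commas across the whole file.

```shell
# join_every_third   (hypothetical helper; reads stdin, writes stdout)
# Splits the input on commas and rejoins it, using | for every third comma.
join_every_third() {
    awk -v RS=, '{
        printf "%s%s", sep, $0        # sep is empty before the first record
        sep = (++i % 3 ? "," : "|")   # separator that goes before the next record
    }'
}

# Example: join_every_third < file.csv
```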
# Get params and create part of the regex.
my $delim = "\\" . shift;
my $n = shift;
my $repl = shift;
my $wild = '.*?';
my $pattern = ($wild . $delim) x ($n - 1);
# Slurp.
$/ = undef;
my $text = <>;
# Replace and print.
$text =~ s/($pattern$wild)$delim/$1$repl/sg;
print $text;