I have a huge flat file with 100K records, each spanning 3000 columns. I need to remove a segment of the data from starting position 300 to position 500 before archiving. This is sensitive data.
Assuming that "position" means column, you can use cut to select the columns you want:
cut -f 1-299,501-3000 CutMe.txt
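For instance, here is a minimal illustration on a hypothetical 6-column line (not your real data): keeping fields 1-2 and 5-6 drops the middle segment.
printf 'a\tb\tc\td\te\tf\n' | cut -f 1-2,5-6
This prints a, b, e and f, still tab-separated.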
If your data is delimited by commas instead of tabs, then use -d:
cut -d, -f 1-299,501-3000 CutMe.txt
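Note that cut writes to standard output rather than editing the file in place, so redirect the result to a new file (Trimmed.txt here is just a placeholder name):
cut -d, -f 1-299,501-3000 CutMe.txt > Trimmed.txt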
If position means character, you can do the same with cut -c:
cut -c 1-299,501-3000 CutMe.txt
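Character ranges work the same way; for example (again a toy input):
printf 'abcdefghij\n' | cut -c 1-3,8-10
prints abchij, i.e. characters 4-7 are removed.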
Using sed:
sed -r -i.bak 's/(.{299}).{201}/\1/' file
The -r option enables extended regex. If you need to make it portable, you can remove that option and escape the parentheses and braces instead. The -i option makes changes in place. I have added the extension .bak to safeguard against any mess-up; you can remove it if you don't need to keep a backup of the original.
For the solution, we capture the first 299 characters in a capture group and match the next 201 characters (positions 300 through 500) that we need to remove. We then substitute the entire matched portion with the captured group alone.
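For reference, here is a sketch of the portable form mentioned above, with the parentheses and braces escaped for basic regular expressions (note that -i itself is a GNU/BSD extension rather than POSIX):
sed -i.bak 's/\(.\{299\}\).\{201\}/\1/' file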
Assuming "position" means "character":
awk '{print substr($0,1,299) substr($0,501)}' file
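awk cannot edit a file in place (unless you have GNU awk's -i inplace extension), so write to a temporary file and move it back over the original; file.tmp is just a placeholder name:
awk '{print substr($0,1,299) substr($0,501)}' file > file.tmp && mv file.tmp file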
If it doesn't, then edit your question to add some REPRESENTATIVE sample input and expected output (e.g. 5 lines of 6 columns each, not thousands of lines of thousands of columns).