I am trying to use a shell script (well a \"one liner\") to find any common lines between around 50 files. Edit: Note I am looking for a line (lines) that a
2 * n
files)From @mjgpy3 answer, you just have to make a for loop and use comm
, like this:
#!/bin/bash
tmp1="/tmp/tmp1$RANDOM"
tmp2="/tmp/tmp2$RANDOM"
cp "$1" "$tmp1"
shift
for file in "$@"
do
comm -1 -2 "$tmp1" "$file" > "$tmp2"
mv "$tmp2" "$tmp1"
done
cat "$tmp1"
rm "$tmp1"
Save in a comm.sh
, make it executable, and call
./comm.sh *.sp
assuming all your filenames end with .sp
.
Looking at the other answers, I wanted to give one that opens once each file without using any temporary file, and supports duplicated lines. Additionally, let's process the files in parallel.
Here you go (in python3):
#!/bin/env python
import argparse
import sys
import multiprocessing
import os
EOLS = {'native': os.linesep.encode('ascii'), 'unix': b'\n', 'windows': b'\r\n'}
def extract_set(filename):
with open(filename, 'rb') as f:
return set(line.rstrip(b'\r\n') for line in f)
def find_common_lines(filenames):
pool = multiprocessing.Pool()
line_sets = pool.map(extract_set, filenames)
return set.intersection(*line_sets)
if __name__ == '__main__':
# usage info and argument parsing
parser = argparse.ArgumentParser()
parser.add_argument("in_files", nargs='+',
help="find common lines in these files")
parser.add_argument('--out', type=argparse.FileType('wb'),
help="the output file (default stdout)")
parser.add_argument('--eol-style', choices=EOLS.keys(), default='native',
help="(default: native)")
args = parser.parse_args()
# actual stuff
common_lines = find_common_lines(args.in_files)
# write results to output
to_print = EOLS[args.eol_style].join(common_lines)
if args.out is None:
# find out stdout's encoding, utf-8 if absent
encoding = sys.stdout.encoding or 'utf-8'
sys.stdout.write(to_print.decode(encoding))
else:
args.out.write(to_print)
Save it into a find_common_lines.py
, and call
python ./find_common_lines.py *.sp
More usage info with the --help
option.