Comparing two files in the Linux terminal


There are two files, "a.txt" and "b.txt", and both contain a list of words. Now I want to check which words are extra in "a.txt".


10 answers

长情又很酷

2020-12-22 17:42

You can also use colordiff, which displays the output of diff in color.

As for vimdiff, it also lets you compare files over SSH, for example:

vimdiff /var/log/secure scp://192.168.1.25/var/log/secure

Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html


时光取名叫无心

2020-12-22 17:43

Use comm -13 (requires sorted files):

$ cat file1
one
two
three
$ cat file2
one
two
three
four
$ comm -13 <(sort file1) <(sort file2)
four


我在风中等你

2020-12-22 17:43

You can use awk for this. Test files:

$ cat a.txt
one
two
three
four
four
$ cat b.txt
three
two
one

The awk program:

$ awk '
NR==FNR {                    # process b.txt, the first file
    seen[$0]                 # store the word as a key in the seen hash
    next                     # move on to the next word in b.txt
}                            # process a.txt, or any files after the first
!($0 in seen)' b.txt a.txt   # if the word is not in seen, output it

Duplicates are output:

four
four

To avoid duplicates, add each newly encountered word from a.txt to the seen hash:

$ awk '
NR==FNR {
    seen[$0]
    next
}
!($0 in seen) {              # if the word is not in seen
    seen[$0]                 # add the unseen a.txt word to seen to avoid duplicates
    print                    # and output it
}' b.txt a.txt

Output:

four
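The deduplicating awk program above can be wrapped in a small shell function for reuse. This is only a sketch: the function name extra_lines and the temporary directory are assumptions made for illustration, and the sample data repeats the test files shown earlier.

```shell
# extra_lines is a hypothetical helper name, not part of awk or coreutils.
# Usage: extra_lines A B  -> prints each line of A that is absent from B, once.
extra_lines() {
    awk '
        NR == FNR { seen[$0]; next }        # first file (B): remember every line
        !($0 in seen) { seen[$0]; print }   # second file (A): print unseen lines once
    ' "$2" "$1"
}

# Demo data copied from the test files above.
tmp=$(mktemp -d)
printf '%s\n' one two three four four > "$tmp/a.txt"
printf '%s\n' three two one > "$tmp/b.txt"

extra_lines "$tmp/a.txt" "$tmp/b.txt"   # prints: four
rm -rf "$tmp"
```

One caveat with the NR==FNR idiom: if the first file (b.txt here) is empty, awk treats a.txt as the first file and prints nothing, so that case needs separate handling.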

If the word lists are comma-separated, like:

$ cat a.txt
four,four,three,three,two,one
five,six
$ cat b.txt
one,two,three

you need a couple of extra for loops:

awk -F, '                         # comma-separated input
NR==FNR {
    for(i=1;i<=NF;i++)            # loop over all comma-separated fields
        seen[$i]
    next
}
{
    for(i=1;i<=NF;i++)
        if(!($i in seen)) {
            seen[$i]              # this time we buffer the output (below):
            buffer=buffer (buffer==""?"":",") $i
        }
    if(buffer!="") {              # output non-empty buffers after each record in a.txt
        print buffer
        buffer=""
    }
}' b.txt a.txt

Output this time:

four
five,six
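If the line grouping and original order of the comma-separated lists do not matter, a different technique may be simpler: split the lists into one word per line with tr and let comm do the comparison. This is a swapped-in alternative to the awk loops above, not the answer's own method, and the helper name extra_csv_words is made up for this sketch.

```shell
# extra_csv_words is a hypothetical helper name used only for this sketch.
# Usage: extra_csv_words A B -> prints words found in A but not in B, sorted.
extra_csv_words() {
    work=$(mktemp -d)
    # One word per line, sorted and deduplicated, because comm needs sorted input.
    tr ',' '\n' < "$1" | sort -u > "$work/a.words"
    tr ',' '\n' < "$2" | sort -u > "$work/b.words"
    # -23 hides the "only in B" and "in both" columns.
    comm -23 "$work/a.words" "$work/b.words"
    rm -rf "$work"
}

# Demo data copied from the comma-separated example above.
tmp=$(mktemp -d)
printf '%s\n' 'four,four,three,three,two,one' 'five,six' > "$tmp/a.txt"
printf '%s\n' 'one,two,three' > "$tmp/b.txt"

extra_csv_words "$tmp/a.txt" "$tmp/b.txt"
rm -rf "$tmp"
```

Unlike the awk version, this prints one word per line in sorted order (five, four, six) rather than keeping the words grouped by their original record.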


醉梦人生

2020-12-22 17:45

If you have vim installed, try this:

vimdiff file1 file2

or

vim -d file1 file2

You will find it fantastic.


夕颜

2020-12-22 17:51

Sort them and use comm:

comm -23 <(sort a.txt) <(sort b.txt)

comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding column. Therefore comm -23 a b lists only the entries that are unique to a. I use the <(...) syntax to sort the files on the fly; if they are already sorted, you don't need it.
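To make the three columns concrete, here is a small sketch that selects each one in turn. The word lists and file names are invented for illustration, and the files are sorted up front instead of using <(...), so it also runs in shells without process substitution.

```shell
# Sample word lists are made up for this sketch; they are not from the question.
tmp=$(mktemp -d)
printf '%s\n' one two three four | sort > "$tmp/a.sorted"
printf '%s\n' one two five | sort > "$tmp/b.sorted"

only_a=$(comm -23 "$tmp/a.sorted" "$tmp/b.sorted")   # column 1: unique to a
only_b=$(comm -13 "$tmp/a.sorted" "$tmp/b.sorted")   # column 2: unique to b
both=$(comm -12 "$tmp/a.sorted" "$tmp/b.sorted")     # column 3: present in both

printf 'only in a:\n%s\n' "$only_a"
printf 'only in b:\n%s\n' "$only_b"
printf 'in both:\n%s\n' "$both"
rm -rf "$tmp"
```

With this data, "only in a" lists four and three, "only in b" lists five, and "in both" lists one and two.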


灰色年华

2020-12-22 17:52

If you prefer the diff output style from git diff, you can use it with the --no-index flag to compare files not in a git repository:

git diff --no-index a.txt b.txt

Using a couple of files with around 200k file name strings in each, I benchmarked (with the built-in time command) this approach vs some of the other answers here:

git diff --no-index a.txt b.txt         # ~1.2s
comm -23 <(sort a.txt) <(sort b.txt)    # ~0.2s
diff a.txt b.txt                        # ~2.6s
sdiff a.txt b.txt                       # ~2.7s
vimdiff a.txt b.txt                     # ~3.2s

comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.

Update 2018-03-25: You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:

This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.
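When git diff --no-index is used from a script, its exit status can answer the question "do these files differ?" without parsing any output. The sketch below assumes git is installed; the helper name files_differ and the sample files are made up for illustration. The flag is passed explicitly so the behavior does not depend on where the script runs.

```shell
# files_differ is a hypothetical helper name; it assumes git is on PATH.
# Returns 0 (true) when the two paths differ, non-zero otherwise.
files_differ() {
    status=0
    # --quiet suppresses the diff output so only the exit status is used:
    # 0 means identical, 1 means differences were found.
    git diff --no-index --quiet -- "$1" "$2" || status=$?
    [ "$status" -eq 1 ]
}

if command -v git >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    printf '%s\n' one two three four > "$tmp/a.txt"
    printf '%s\n' one two three > "$tmp/b.txt"
    if files_differ "$tmp/a.txt" "$tmp/b.txt"; then
        echo "a.txt and b.txt differ"
    fi
    rm -rf "$tmp"
fi
```

Any other exit status (for example, when a path does not exist) is treated as "not known to differ" here; a real script would probably want to report that case separately.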
