Comparing two text files and counting number of occurrences

廉价感情. Submitted on 2019-12-11 00:17:28

Question


I'm trying to write a blog post about the dangers of having a common access point name.

So I did some wardriving to get a list of access point names, and I downloaded a list of the 1000 most common access point names (for which rainbow tables exist) from Renderlab.

But how can I compare those two text files to see how many of my collected access point names are open to attacks from rainbow tables?

The text files are built like this:

collected.txt:

linksys
internet
hotspot

The list of the most common access point names is called SSID.txt:

default
NETGEAR
Wireless
WLAN
Belkin54g

So the script should sort the lines, compare them, and show how many times the lines from collected.txt are found in SSID.txt.

Does that make any sense? Any help would be appreciated :)


Answer 1:


If you don't mind using a Python script:

from collections import Counter

# load the common SSIDs into a set, one name per line
with open('SSID.txt', 'r') as content_file:
    SSID = set(line.strip() for line in content_file)

found = Counter()                           # summary of found names
with open('collected.txt', 'r') as file1:   # open file 1 for reading
    for line in file1:
        name = line.strip()                 # drop the trailing newline
        if name and name in SSID:           # exact match, not a substring test
            found[name] += 1

for name in found:
    print(found[name], name)                # print the count and the name

Run it in the directory containing collected.txt and SSID.txt; it prints a list like this:

5 NETGEAR
3 default
(...)

The script reads collected.txt line by line and checks each line against the contents of SSID.txt. It can easily be modified to take the file names from the command line.
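As a rough sketch of that modification, the version below takes both file names as command-line arguments. The function name count_matches and the script name count_ssids.py are made up for illustration, and it assumes one access point name per line with exact, case-sensitive matching:

```python
import sys
from collections import Counter


def count_matches(collected_path, ssid_path):
    """Count how often each line of collected_path exactly matches a line of ssid_path."""
    # Hypothetical helper: the name and signature are illustrative only.
    with open(ssid_path) as ssid_file:
        known = {line.strip() for line in ssid_file if line.strip()}
    counts = Counter()
    with open(collected_path) as collected_file:
        for line in collected_file:
            name = line.strip()
            if name in known:  # exact, case-sensitive comparison (an assumption)
                counts[name] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Most frequent names first, in the same "count name" layout as above.
        for name, count in count_matches(sys.argv[1], sys.argv[2]).most_common():
            print(count, name)
    else:
        print("usage: count_ssids.py COLLECTED_FILE SSID_FILE", file=sys.stderr)
```

It could then be called as: python count_ssids.py collected.txt SSID.txt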




Answer 2:


First, take a look at a simple tutorial on the sdiff command, such as "How do I Compare two files under Linux or UNIX". Notepad++ also supports this.




Answer 3:


To find the number of times each line in file A appears in file B, you can do:

awk 'FNR==NR{a[$0]=1; next} $0 in a { count[$0]++ }
    END { for( i in a ) print i, count[i]+0 }' A B

If you want the output sorted, pipe it to sort, but there is no need to sort just to find the counts. Note that the "$0 in a" condition can be omitted at the cost of consuming more memory, which may be a problem if file B is very large.
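For comparison, the same two-pass idea might look roughly like this in Python. This is only a sketch: count_a_in_b is a made-up name, and it takes any iterables of lines (open file objects would work) instead of file names. Only the distinct lines of A are held in memory, B is streamed, and lines of A that never occur in B are reported with a count of 0:

```python
from collections import Counter


def count_a_in_b(a_lines, b_lines):
    """For each distinct line in a_lines, count how many times it occurs in b_lines."""
    # Hypothetical helper mirroring the awk one-liner; not part of any library.
    wanted = {line.rstrip("\n") for line in a_lines}  # first pass: remember A
    counts = Counter()
    for line in b_lines:                              # second pass: stream B
        line = line.rstrip("\n")
        if line in wanted:                            # same role as "$0 in a"
            counts[line] += 1
    # A Counter returns 0 for missing keys, so unmatched lines of A get 0.
    return {line: counts[line] for line in wanted}


# Small in-memory demo using names from the question.
ssid = ["default\n", "NETGEAR\n", "Wireless\n"]
collected = ["NETGEAR\n", "linksys\n", "NETGEAR\n"]
for name, n in sorted(count_a_in_b(ssid, collected).items()):
    print(name, n)
```

Dropping the membership test would still give correct counts for the lines of A, but the counter would then also store every distinct line of B, which is the memory trade-off mentioned above.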



Source: https://stackoverflow.com/questions/22686690/comparing-two-text-files-and-counting-number-of-occurrences
