I want to compare two files (take each line from the first file and look it up in the whole second file) to see the differences between them, and write the lines from fileA.txt that are missing in fileB.txt to the end of fileB.txt.
read in two files and convert to set
find union of two sets
sort union set based on time
join set to string with new line
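import datetime

file1 = "fileA.txt"
file2 = "fileB.txt"

# read both files into sets of lines (newlines stripped, blank lines skipped)
with open(file1) as f:
    sa = set(line.rstrip('\n') for line in f if line.strip())
with open(file2) as f:
    sb = set(line.rstrip('\n') for line in f if line.strip())

# sort key: the leading "Mon DD HH:MM:SS" timestamp of each line
def timestamp(line):
    return datetime.datetime.strptime(' '.join(line.split()[:3]), '%b %d %H:%M:%S')

print('\n'.join(sorted(sa.union(sb), key=timestamp)))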
Oct 9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct 9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct 9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct 9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Try this in bash:
cat fileA.txt fileB.txt | sort -M | uniq > new_file.txt
sort -M: sorts by month. An initial string, consisting of any amount of whitespace followed by a month name abbreviation, is folded to upper case and compared in the order 'JAN' < 'FEB' < ... < 'DEC'. Invalid names compare low to valid names. The `LC_TIME' locale determines the month spellings.
uniq: filters out repeated lines in a file.
|: passes the output of one command to another for further processing.
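What this will do is take the two files, sort them in the way described above, keep the unique lines and store them in new_file.txt. Note that -M only orders by month; lines within the same month fall back to a plain comparison of the whole line, so if the day of the month is not padded to a fixed width ("Oct 9" vs "Oct 11"), sorting on explicit keys is safer: sort -k1,1M -k2,2n -k3,3 fileA.txt fileB.txt | uniq > new_file.txt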
Note: This is not a Python solution, but you have tagged the question with linux, so I thought it might interest you. Also, you can find more detailed info about the commands used here.
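If a Python version is preferred, the literal task from the question (appending the lines of fileA.txt that are missing from fileB.txt to the end of fileB.txt) could be sketched roughly as below. The function name append_missing_lines is made up for illustration, and the sketch assumes both files are small enough to read into memory and that lines are compared as exact strings. It does not re-sort fileB.txt by time; the missing lines are simply appended in the order they appear in fileA.txt.

```python
from pathlib import Path


def append_missing_lines(src_path, dst_path):
    """Append lines found in src_path but not in dst_path to the end of dst_path.

    Hypothetical helper for illustration. Returns the appended lines
    (without trailing newlines), in the order they appear in src_path.
    """
    dst = Path(dst_path)
    dst_text = dst.read_text() if dst.exists() else ""
    seen = set(dst_text.splitlines())

    missing = []
    for line in Path(src_path).read_text().splitlines():
        if line and line not in seen:
            seen.add(line)  # also skips duplicates within the source file
            missing.append(line)

    if missing:
        with dst.open("a") as out:
            # make sure the appended block starts on a fresh line
            if dst_text and not dst_text.endswith("\n"):
                out.write("\n")
            out.write("\n".join(missing) + "\n")
    return missing


if __name__ == "__main__":
    # file names taken from the question; skipped if fileA.txt is absent
    if Path("fileA.txt").is_file():
        added = append_missing_lines("fileA.txt", "fileB.txt")
        print(f"appended {len(added)} line(s)")
```

Running it a second time appends nothing, since every line of fileA.txt is then already present in fileB.txt.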