I have two text files, file1 and file2.
File1 contains a bunch of random words, and file2 contains words that I want to remove from file1.
Get the words from each file:
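# "with" closes both files automatically once they have been read
with open("/path/to/file1", "r") as f1, open("/path/to/file2", "r") as f2:
    file1_raw = f1.read()
    file2_raw = f2.read()
# split() with no argument breaks on any run of whitespace
file1_words = file1_raw.split()
file2_words = file2_raw.split()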
If you want the unique words from file1 that aren't in file2:
result = set(file1_words).difference(set(file2_words))
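A minimal, self-contained sketch of the set approach, with short made-up strings standing in for the contents of the two files. Note that the repeated "the" appears only once in the result, because a set keeps each word once.

```python
# Made-up sample text standing in for the contents of file1 and file2.
file1_raw = "the quick brown fox jumped over the lazy dog"
file2_raw = "quick brown"

file1_words = file1_raw.split()
file2_words = file2_raw.split()

# Words that occur in file1 but not in file2, each one only once.
result = set(file1_words).difference(set(file2_words))

# Sets are unordered, so sort before printing to get stable output.
print(sorted(result))
```

set.difference() accepts any iterable, so set(file1_words).difference(file2_words) works as well without building the second set yourself.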
If you want to remove the words from the text of file1 itself:
for w in file2_words:
    file1_raw = file1_raw.replace(w, "")
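One caveat with that loop: str.replace() matches substrings, so removing "the" also turns "there" into "re". A sketch that removes whole words only could use the standard re module. It assumes a "word" is what the regex \b boundary delimits (runs of letters, digits, and underscores), so in a hyphenated word such as "well-known" the part "known" would still be matched. The sample strings are made up for illustration.

```python
import re

# Made-up sample data standing in for file1's text and file2's words.
file1_raw = "there is the fox and the dog"
file2_words = ["the", "dog"]

# Naive version: str.replace() also strips "the" out of "there".
naive = file1_raw
for w in file2_words:
    naive = naive.replace(w, "")

# One pattern matching any listed word, but only as a whole word.
# re.escape() protects words that contain regex metacharacters.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, file2_words)) + r")\b")
cleaned = pattern.sub("", file1_raw)

# Collapse the extra spaces left where words were removed.
cleaned = " ".join(cleaned.split())
print(cleaned)
```

The final split/join step also flattens line breaks into single spaces; drop it if the original layout of file1 matters.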
If you read the words into a set (one for each file), you can use set.difference(). This works if you don't care about the order of the output.
If you care about the order, read the first file into a list, the second into a set, and remove all the elements in the list that are in the set.
a = ["a", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
b = {"quick", "brown"}  # a set, so each "in" check is fast
c = [x for x in a if x not in b]  # keeps the original order of a
print(c)
This prints: ['a', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
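Combining the order-preserving approach with reading the two files, a sketch might look like this. The function name remove_listed_words is hypothetical, words are assumed to be whitespace-separated and compared exactly (case and punctuation matter), and the demo writes two temporary files in place of the real file1 and file2.

```python
import os
import tempfile


def remove_listed_words(path1, path2):
    """Hypothetical helper: words of path1, in order, minus any word in path2."""
    with open(path2) as f2:
        stop_words = set(f2.read().split())  # set for fast membership tests
    with open(path1) as f1:
        return [w for w in f1.read().split() if w not in stop_words]


# Demo with throwaway files standing in for file1 and file2.
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "file1.txt")
    path2 = os.path.join(tmp, "file2.txt")
    with open(path1, "w") as f:
        f.write("a quick brown fox jumped over the lazy dog\n")
    with open(path2, "w") as f:
        f.write("quick\nbrown\n")
    kept = remove_listed_words(path1, path2)

print(kept)
```

The list comprehension keeps repeated words from file1. If you also want each word only once while preserving order, list(dict.fromkeys(kept)) does that, because dicts keep insertion order in Python 3.7 and later.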