Question
I'm having trouble getting proper results from an inverted index in Python. I load a list of strings into the variable strlist, and my inverse index function should loop over the strings and return each word along with where it occurs. Here is what I have so far:
    def inverseIndex(strlist):
        d={}
        for x in range(len(strlist)):
            for y in strlist[x].split():
                for index, word in set(enumerate([y])):
                    if word in d:
                        d=d.update(index)
                    else:
                        d._setitem_(index,word)
                    break
                break
            break
        return d
Now when I run inverseIndex(strlist), all it returns is {0:'This'}, whereas what I want is a dictionary mapping all the words in strlist to the set d.
Is my initial approach wrong? Am I tripping up in the if/else? Any and all help pointing me in the right direction is greatly appreciated.
Answer 1:
Based on what you're saying, I think you're trying to get some data like this:
    input = ["hello world", "foo bar", "red cat"]
    data_wanted = {
        "foo" : 1,
        "hello" : 0,
        "cat" : 2,
        "world" : 0,
        "red" : 2,
        "bar" : 1
    }
So what you should be doing is adding the words as keys to a dictionary, and having their values be the index of the substring in strlist in which they are located.
    def locateWords(strlist):
        d = {}
        for i, substr in enumerate(strlist): # gives you the index and the item itself
            for word in substr.split():
                d[word] = i  # a later occurrence overwrites an earlier one
        return d
If the word occurs in more than one string in strlist, you should change the code to the following:
    def locateWords(strlist):
        d = {}
        for i, substr in enumerate(strlist):
            for word in substr.split():
                if word not in d:
                    d[word] = [i]  # first time this word is seen
                else:
                    d[word].append(i)
        return d
This changes the values to lists, which contain the indices of the substrings in strlist that contain that word.
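If each word should map to a set of positions rather than a list (closer to what the question describes), the same loop can be sketched with collections.defaultdict. This is only a minimal sketch, not code from the question or the answer; the function name inverse_index_sets and the sample strings are made up for illustration.

```python
from collections import defaultdict

def inverse_index_sets(strlist):
    """Map each word to the set of indices of the strings that contain it."""
    index = defaultdict(set)  # a missing key starts out as an empty set
    for i, substr in enumerate(strlist):
        for word in substr.split():
            index[word].add(i)  # a set silently ignores a repeated index
    return dict(index)  # plain dict, so looking up an unknown word raises KeyError

sample = ["hello world", "hello foo", "foo foo bar"]  # hypothetical input
result = inverse_index_sets(sample)
print(result["hello"])
print(result["foo"])
```

With a set, a word that appears twice in the same string (like "foo" in the last sample string) is recorded only once for that string, whereas the list version above would append the index twice.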
Some of your code's problems explained
- {} is not a set, it's a dictionary.
- break forces a loop to terminate immediately - you didn't want to end the loop early because you still had data to process.
- d.update(index) will give you a TypeError: 'int' object is not iterable. This method actually takes a mapping or an iterable of key-value pairs and updates the dictionary with it. Normally you would use a list of tuples for this: [("foo",1), ("hello",0)]. It just adds the data to the dictionary.
- You don't normally want to use d.__setitem__ (which you typed wrong anyway). You'd just use d[key] = value.
- You can iterate using a "for each" style loop instead, like my code above shows. Looping over the range means you are looping over the indices. (Not exactly a problem, but it could lead to extra bugs if you're not careful to use the indices properly.)
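A few of these points are easy to check in an interpreter. The snippet below is only an illustrative sketch; the variable names and the sample words are made up.

```python
# {} creates an empty dict; an empty set has to be written set()
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)
print(type(empty_set).__name__)

# break leaves the loop at once, so only the first item is processed
seen = []
for word in ["This", "is", "a", "test"]:
    seen.append(word)
    break
print(seen)

# d[key] = value is the usual spelling; __setitem__ is what Python calls for it
d = {}
d["foo"] = 1
d.__setitem__("bar", 2)  # works, but is rarely written by hand
print(d)
```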
It looks like you are coming from another programming language in which braces indicate sets and there is a keyword which ends control blocks (like if, fi). It's easy to confuse syntax when you're first starting - but if you run into trouble running the code, look at the exceptions you get and search for them on the web!
P.S. I'm not sure why you wanted a set - if there are duplicates, you probably want to know all of their locations, not just the first or the last one or anything in between. Just my $0.02.
Answer 2:
break is not an end-of-block marker; it means "if you hit this line of code, exit the loop immediately". You probably don't want all those break statements.
I'm not sure what you think the update method does.
    d.update(index)
will try to treat index as a dict or a sequence of key-value pairs and add all the mappings in index to d. Since index is a number, this doesn't seem to be what you expect update to do. Also, update returns None, which is the Python equivalent of not returning anything, so you probably don't want to assign its value to d.
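The behaviour of update described here can be seen in a short sketch. The keys and values are made up for illustration; only the dict.update calls themselves are standard Python.

```python
d = {"foo": 1}

d.update({"hello": 0})               # another dict: its mappings are added to d
d.update([("cat", 2), ("red", 2)])   # a sequence of (key, value) pairs works too

returned = d.update({"bar": 1})      # update changes d in place ...
print(returned)                      # ... and returns None

# Passing a number fails, because an int is neither a mapping nor iterable
try:
    d.update(5)
except TypeError:
    raised = True
else:
    raised = False

print(d)
```

This is why d = d.update(...) is harmful: after that assignment d is None, and the next use of d fails.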
I'm not sure what you expect
    for index, word in set(enumerate([y])):
to do. Let's go over what it does. [y] creates a 1-element list whose only element is y. enumerate([y]) will then return an iterator yielding a single element, the tuple (0, y). set(enumerate([y])) will then take all the items from that iterator (so just one item) and make a set containing those items. Finally, for index, word in set(enumerate([y])): will iterate over that one-item set, executing a single loop iteration with index == 0 and word == y. This is probably not what you were trying to do.
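The walk-through above can be reproduced step by step. In this sketch the value of y is a made-up example word; everything else is standard Python.

```python
y = "This"  # hypothetical word taken from one of the strings

one_item_list = [y]                    # ['This']
pairs = set(enumerate(one_item_list))  # a set holding the single tuple (0, 'This')
print(pairs)

# The loop body therefore runs exactly once, always with index 0
iterations = []
for index, word in pairs:
    iterations.append((index, word))
print(iterations)
```

Because the index is always 0 here, it says nothing about which string the word came from, which explains the {0:'This'} result in the question.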
The __setitem__ special method (which has two underscores on each side) is called by Python to implement element assignment.
    d.__setitem__(index, word)
is better written as
    d[index] = word
If you want to iterate over strlist, then instead of using range(len(strlist)), you can iterate over strlist directly.
    for x in range(len(strlist)):
        for y in strlist[x].split():
is equivalent to
    for string in strlist:
        for y in string.split():
since looping over strlist will give the items of strlist.
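If the position is needed as well, which is the case for an inverted index, enumerate supplies it while still looping over the items directly. The comparison below is a small sketch with a made-up strlist; it collects (index, word) pairs both ways to show that they match.

```python
strlist = ["This is", "a test"]  # hypothetical input

# Index-based loop, as in the question
by_index = []
for x in range(len(strlist)):
    for y in strlist[x].split():
        by_index.append((x, y))

# Direct loop; enumerate yields (position, item) pairs
direct = []
for x, string in enumerate(strlist):
    for y in string.split():
        direct.append((x, y))

print(by_index == direct)
```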
I hope that helps.
Source: https://stackoverflow.com/questions/17554977/inverted-index-in-python-not-returning-desired-results