Question
I have a dataset (a_list_of_sentences) in the form of a list of lists of lists, where each innermost list consists of a word and its syntactic dependency, and these lists are grouped into sentences, like this:
[[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
[['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
[['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']]]
I want to find the sentences in which there is a quantifier (e.g. 'every', 'all') followed by a word whose syntactic dependency is subject ('nsubj') or object ('dobj') and distinguish between these two cases. For my purposes, the subject or the object could be either the first word after a quantifier or the second word after a quantifier. I tried to do that using enumerate(), in this way:
for sentence in a_list_of_sentences:
    for i, j in enumerate(sentence):
        if "dobj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "dobj")
        elif "nsubj" in sentence[i]:
            if "all" in sentence[i-1] or "all" in sentence[i-2] or "every" in sentence[i-1] or "every" in sentence[i-2]:
                print(sentence, "nsubj")
However, this code reports quantifiers in object position as being in both subject and object position: sentences like [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] show up in both print outputs:
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] nsubj
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']] dobj
Do you know what I am doing wrong and how I can fix it?
Thank you very much!!!
Answer 1:
The problem is that list indexes can be negative (if they couldn't, you'd get an IndexError). Negative indexes count from the end of the list, so sentence[i-1] and sentence[i-2] wrap around to the last elements when i is 0 or 1.
Check [SO]: Understanding slice notation for more details.
Below is a cleaner variant.
code00.py:
#!/usr/bin/env python3

import sys


def main(*argv):
    sentences = [
        [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
        [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
        [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
    ]
    quantifiers = ["all", "every"]
    syntactic_roles = ["nsubj", "dobj"]
    for sentence in sentences:
        #print(sentence)
        quantifier_idx = -1
        for idx, (word, syntactic_role) in enumerate(sentence):
            if quantifier_idx > -1 and idx - quantifier_idx in [1, 2] and syntactic_role in syntactic_roles:
                print(" ".join(item[0] for item in sentence) + " - " + syntactic_role)
                break
            if word in quantifiers:
                quantifier_idx = idx


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
Output:
e:\Work\Dev\StackOverflow\q059500488>"c:\Install\pc064\Python\Python\03.08.01\python.exe" code00.py
Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] 64bit on win32

mary loves every man - dobj
mary loves all men - dobj
all students love mary - nsubj

Done.
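If you prefer to keep the backward-looking approach from the question, one option is to clamp the lookup window at the start of the list so it can never wrap around. The following is only a sketch: the function name quantified_roles and the constants QUANTIFIERS and ROLES are made up for illustration, and unlike code00.py it collects every match in a sentence instead of stopping at the first one.

```python
QUANTIFIERS = {"all", "every"}
ROLES = {"nsubj", "dobj"}


def quantified_roles(sentence):
    """Return the roles of words that have a quantifier one or two positions before them."""
    found = []
    for i, (word, role) in enumerate(sentence):
        if role not in ROLES:
            continue
        # max(0, i - 2) clamps the start of the window, so the slice
        # never reaches back past index 0 and never wraps to the end.
        window = sentence[max(0, i - 2):i]
        if any(prev_word in QUANTIFIERS for prev_word, _ in window):
            found.append(role)
    return found


sentences = [
    [["mary", "nsubj"], ["loves", "ROOT"], ["every", "det"], ["man", "dobj"]],
    [["mary", "nsubj"], ["loves", "ROOT"], ["all", "det"], ["men", "dobj"]],
    [["all", "det"], ["students", "nsubj"], ["love", "ROOT"], ["mary", "dobj"]],
]

for sentence in sentences:
    print(" ".join(word for word, _ in sentence), quantified_roles(sentence))
```

With the sample data this reports dobj for the first two sentences and nsubj for the third.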
Answer 2:
Lists accept negative indexes, which count from the end of the list. The example below prints c.
mylist = ['a', 'b', 'c']
print(mylist[-1])
So if we take your first sentence:
[['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]
The elif branch prints on the first word of the sentence because:
- mary is an nsubj
- sentence[i-2] with i = 0 is sentence[-2], which is ['every', 'det']
The if branch then also prints on the last word of the sentence because:
- man is a dobj
- sentence[i-1] with i = 3 is sentence[2], which is ['every', 'det']
I suggest that you look forward instead of backward, for instance with the following code:
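To see every lookup the original loop performs on that sentence, here is a small trace. It is only an illustration: the helper name trace_lookups is made up, and it records each lookup as a (word, offset, word found, wrapped) tuple rather than reproducing the original if/elif logic.

```python
def trace_lookups(sentence):
    """List which element sentence[i - 1] and sentence[i - 2] resolve to for every i."""
    trace = []
    for i, (word, _) in enumerate(sentence):
        for offset in (1, 2):
            j = i - offset
            # A negative j does not raise IndexError; it counts from the end.
            trace.append((word, offset, sentence[j][0], j < 0))
    return trace


sentence = [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']]

for word, offset, looked_up, wrapped in trace_lookups(sentence):
    note = "wrapped around" if wrapped else "ok"
    print(f"{word}: sentence[i-{offset}] -> {looked_up} ({note})")
```

The line for mary with offset 2 shows the lookup landing on every, which is the false nsubj match.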
quantifiers = ['every', 'all']
for sentence in a_list_of_sentences:
    max_index = len(sentence) - 1
    for word_index, word in enumerate(sentence):
        if word[0] in quantifiers:
            # check the first word after the quantifier, if there is one
            if max_index > word_index:
                if sentence[word_index+1][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+1][1] == 'dobj':
                    print(sentence, "dobj")
            # check the second word after the quantifier, if there is one
            if max_index > word_index + 1:
                if sentence[word_index+2][1] == 'nsubj':
                    print(sentence, "nsubj")
                elif sentence[word_index+2][1] == 'dobj':
                    print(sentence, "dobj")
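The same forward-looking idea can be written without explicit bounds checks, because a slice that runs past the end of a list simply comes back shorter. This is a sketch under assumptions: the function name classify and its default arguments are made up for illustration, and it returns (quantifier, role) pairs instead of printing.

```python
def classify(sentence, quantifiers=("every", "all"), roles=("nsubj", "dobj")):
    """Return (quantifier, role) pairs for roles found one or two words after a quantifier."""
    results = []
    for idx, (word, _) in enumerate(sentence):
        if word not in quantifiers:
            continue
        # sentence[idx + 1:idx + 3] holds at most the next two words;
        # near the end of the sentence it is just shorter, never an error.
        for _, role in sentence[idx + 1:idx + 3]:
            if role in roles:
                results.append((word, role))
    return results


a_list_of_sentences = [
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['every', 'det'], ['man', 'dobj']],
    [['mary', 'nsubj'], ['loves', 'ROOT'], ['all', 'det'], ['men', 'dobj']],
    [['all', 'det'], ['students', 'nsubj'], ['love', 'ROOT'], ['mary', 'dobj']],
]

for sentence in a_list_of_sentences:
    print(sentence, classify(sentence))
```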
Finally, a small remark about how you use the index.
In your code, instead of:
for i, j in enumerate(sentence):
    if "dobj" in sentence[i]:
You could write:
for i, j in enumerate(sentence):
    if "dobj" in j:
Source: https://stackoverflow.com/questions/59500488/problem-with-indexes-in-enumerate-python