Question
BACKGROUND: I'm importing the Stanford CoreNLP library into my Clojure project. I was using version 3.5.1 but recently jumped directly to version 3.6.0, skipping 3.5.2. Because I had been getting coreference information from the dcoref annotator, the update required small modifications so that my program used the coref annotator instead.
In the past (v3.5.1), when I created a pipeline with the following annotators:
"tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref, quote, entitymentions"
I could parse a sentence such as "I ate bread" without error.
If I remember correctly, extracting the coreference chains from the resulting annotated document would just return a null value, or maybe an empty array. That's inconsequential, though, because at least the annotated document was created without error.
Now, when I create a pipeline with the following annotators:
"tokenize, ssplit, pos, lemma, ner, parse, depparse, mention, coref, quote, entitymentions"
and then try to parse that same sentence (or any other sentence with only 1 or 0 "mentions"), I get an IndexOutOfBoundsException with the following trace:
actual: java.lang.RuntimeException: Error annotating document with coref
at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:79)
edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
nlp.core$parse_text.invoke (core.clj:199)
nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
nlp.focus_scorer.process_test/fn (process_test.clj:49)
clojure.test$test_var$fn__7670.invoke (test.clj:704)
clojure.test$test_var.invoke (test.clj:704)
clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
clojure.test$default_fixture.invoke (test.clj:674)
clojure.test$test_vars$fn__7692.invoke (test.clj:722)
clojure.test$default_fixture.invoke (test.clj:674)
clojure.test$test_vars.invoke (test.clj:718)
clojure.test$test_all_vars.invoke (test.clj:728)
clojure.test$test_ns.invoke (test.clj:747)
clojure.core$map$fn__4553.invoke (core.clj:2624)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.Cons.next (Cons.java:39)
clojure.lang.RT.boundedLength (RT.java:1735)
clojure.lang.RestFn.applyTo (RestFn.java:130)
clojure.core$apply.invoke (core.clj:632)
clojure.test$run_tests.doInvoke (test.clj:762)
clojure.lang.RestFn.invoke (RestFn.java:408)
user$eval13163.invoke (form-init7737210093072696705.clj:1)
clojure.lang.Compiler.eval (Compiler.java:6782)
clojure.lang.Compiler.eval (Compiler.java:6745)
clojure.core$eval.invoke (core.clj:3081)
clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
clojure.main$repl$fn__7108.invoke (main.clj:258)
clojure.main$repl.doInvoke (main.clj:258)
clojure.lang.RestFn.invoke (RestFn.java:1523)
clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
clojure.lang.AFn.applyToHelper (AFn.java:152)
clojure.lang.AFn.applyTo (AFn.java:144)
clojure.core$apply.invoke (core.clj:630)
clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
clojure.lang.RestFn.invoke (RestFn.java:425)
clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
clojure.lang.AFn.run (AFn.java:22)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList$SubList.rangeCheck (ArrayList.java:1217)
java.util.ArrayList$SubList.get (ArrayList.java:1034)
edu.stanford.nlp.scoref.Clusterer$State.setClusters (Clusterer.java:349)
edu.stanford.nlp.scoref.Clusterer$State.<init> (Clusterer.java:322)
edu.stanford.nlp.scoref.Clusterer.getClusterMerges (Clusterer.java:58)
edu.stanford.nlp.scoref.ClusteringCorefSystem.runCoref (ClusteringCorefSystem.java:63)
edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:68)
edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
nlp.core$parse_text.invoke (core.clj:199)
nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
nlp.focus_scorer.process_test/fn (process_test.clj:49)
clojure.test$test_var$fn__7670.invoke (test.clj:704)
clojure.test$test_var.invoke (test.clj:704)
clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
clojure.test$default_fixture.invoke (test.clj:674)
clojure.test$test_vars$fn__7692.invoke (test.clj:722)
clojure.test$default_fixture.invoke (test.clj:674)
clojure.test$test_vars.invoke (test.clj:718)
clojure.test$test_all_vars.invoke (test.clj:728)
clojure.test$test_ns.invoke (test.clj:747)
clojure.core$map$fn__4553.invoke (core.clj:2624)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.Cons.next (Cons.java:39)
clojure.lang.RT.boundedLength (RT.java:1735)
clojure.lang.RestFn.applyTo (RestFn.java:130)
clojure.core$apply.invoke (core.clj:632)
clojure.test$run_tests.doInvoke (test.clj:762)
clojure.lang.RestFn.invoke (RestFn.java:408)
user$eval13163.invoke (form-init7737210093072696705.clj:1)
clojure.lang.Compiler.eval (Compiler.java:6782)
clojure.lang.Compiler.eval (Compiler.java:6745)
clojure.core$eval.invoke (core.clj:3081)
clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
clojure.main$repl$fn__7108.invoke (main.clj:258)
clojure.main$repl.doInvoke (main.clj:258)
clojure.lang.RestFn.invoke (RestFn.java:1523)
clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
clojure.lang.AFn.applyToHelper (AFn.java:152)
clojure.lang.AFn.applyTo (AFn.java:144)
clojure.core$apply.invoke (core.clj:630)
clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
clojure.lang.RestFn.invoke (RestFn.java:425)
clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
clojure.lang.AFn.run (AFn.java:22)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
java.lang.Thread.run (Thread.java:745)
Am I possibly doing something wrong? I realize that using Clojure instead of Java might be causing some issue, but I never had a problem with version 3.5.1. The error seems to be thrown from the annotation step in edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate, but I'm not sure what I can do about that. The only workaround I can think of is to keep two pipeline objects, one with the coref annotator and one without: parse the sentence without coref, count the mentions, and then parse with coref only if I see more than one mention. That seems like a bit too much.
Answer 1:
3.6.0 features major changes to coreference, and this issue is a bug in Stanford CoreNLP 3.6.0. If you re-download the distribution, the bug should be fixed in the version currently on the site. It should also be fixed in the upcoming Maven release.
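For anyone who cannot switch to the fixed build right away, the two-pipeline workaround mentioned in the question can probably be simplified to "try coref first, fall back on failure". The following is only a rough sketch in Java, not something taken from CoreNLP itself: the class name CorefFallback and the method annotateWithFallback are hypothetical, and the two pipelines are passed in as plain functions so that the sketch runs without CoreNLP on the classpath. It assumes the failure looks like the trace above, a RuntimeException whose cause is an IndexOutOfBoundsException.

```java
import java.util.function.Function;

/**
 * Stopgap sketch: try the coref-enabled pipeline first and fall back to a
 * pipeline without coref when annotation fails the way the trace above shows.
 * CorefFallback and annotateWithFallback are hypothetical names, not CoreNLP API.
 */
final class CorefFallback {

    private CorefFallback() {}

    /**
     * @param text         raw input text
     * @param withCoref    annotates with a pipeline that includes "mention, coref"
     * @param withoutCoref annotates with the same pipeline minus "mention, coref"
     */
    static <T> T annotateWithFallback(String text,
                                      Function<String, T> withCoref,
                                      Function<String, T> withoutCoref) {
        try {
            return withCoref.apply(text);
        } catch (RuntimeException e) {
            // Assumption based on the trace: the bug surfaces as a RuntimeException
            // wrapping an IndexOutOfBoundsException. Anything else is rethrown.
            if (e.getCause() instanceof IndexOutOfBoundsException) {
                return withoutCoref.apply(text);
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Stand-ins for real pipelines, used only to exercise the fallback path.
        Function<String, String> failing = t -> {
            throw new RuntimeException("Error annotating document with coref",
                    new IndexOutOfBoundsException("Index: 0, Size: 0"));
        };
        Function<String, String> plain = t -> "annotated without coref: " + t;

        System.out.println(annotateWithFallback("I ate bread", failing, plain));
    }
}
```

With real pipelines, each function would wrap something along the lines of creating a `new Annotation(text)`, calling `pipeline.annotate(annotation)`, and returning the annotation. The cost is that failing inputs are annotated twice, which may be acceptable for short strings until the fixed release is in place.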
Source: https://stackoverflow.com/questions/34902540/stanford-corenlp-pipeline-coref-parsing-some-short-strings-with-few-mentions