DBpedia Spotlight dataset

拜拜、爱过 submitted on 2019-12-11 07:37:19

Question


I installed DBpedia Spotlight from http://spotlight.dbpedia.org/download/release-0.5/dbpedia-spotlight-quickstart.zip and want to improve its dataset with the files listed at https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Downloads.

Can someone tell me how to use the spotter lexicon and disambiguation index data with the jar files?


Answer 1:


Assuming you have already downloaded and decompressed the files below:

wget http://spotlight.dbpedia.org/download/release-0.5/context-index-compact.tgz
tar zxvf context-index-compact.tgz
wget http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
gunzip surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
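
Before editing the configuration, it can help to confirm that both downloads unpacked as expected. The following is a minimal sketch, not part of DBpedia Spotlight: check_spotlight_files is a hypothetical helper, and it assumes the tarball extracts to the index directory named in the configuration lines of this answer.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DBpedia Spotlight): verify that the
# index directory exists and that the spotter dictionary is a non-empty file.
check_spotlight_files() {
    index_dir=$1
    dict_file=$2
    if [ ! -d "$index_dir" ]; then
        echo "missing index directory: $index_dir" >&2
        return 1
    fi
    if [ ! -s "$dict_file" ]; then
        echo "missing or empty dictionary: $dict_file" >&2
        return 1
    fi
    echo "ok"
}

# Example usage, run from the directory where the files were extracted:
# check_spotlight_files index-withSF-withTypes-compressed \
#     surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary
```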

Now you just need to change the server.properties file to point to your newly extracted files:

org.dbpedia.spotlight.index.dir = index-withSF-withTypes-compressed
org.dbpedia.spotlight.spot.dictionary = surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary

If you are using the largest spotter dictionary, you may need to increase the Java heap space, e.g. by adding -Xmx10G to your java command line.
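
As a rough illustration of where the heap flag goes, the sketch below only prints a launch command so you can inspect it before running it. spotlight_cmd is a hypothetical helper, the jar and properties paths are placeholders you supply, and passing the properties file as the sole argument to the jar is an assumption about how the quickstart server is started.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) a java command line with a
# larger heap. -Xmx is the standard JVM option for the maximum heap size.
spotlight_cmd() {
    jar=$1           # path to the Spotlight jar (placeholder, supply your own)
    conf=$2          # path to server.properties (placeholder)
    heap=${3:-10G}   # heap size; defaults to the 10G suggested above
    printf 'java -Xmx%s -jar %s %s\n' "$heap" "$jar" "$conf"
}

# Example usage:
# spotlight_cmd /path/to/spotlight.jar /path/to/server.properties 10G
```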



Source: https://stackoverflow.com/questions/11088289/dbpedia-spotlight-dataset
