Getting Actor Ids and biographies from the data dumps or Freebase API

丶灬走出姿态 posted on 2019-12-12 10:06:47

Question


Does anyone know the best way to get actor IDs from the Freebase data dumps, and then get the IMDB IDs and biographies from the Freebase API?


Answer 1:


Actors will have the type /film/actor and look like this in the dump:

ns:m.010q36     rdf:type        ns:film.actor.

You can find them all in a few minutes from the compressed dump with a simple grep:

zgrep $'rdf:type\tns:film.actor.' freebase-rdf-<date of dump>.gz | cut -f 1 | cut -d ':' -f 2 > actor-mids.txt

This will generate a list of MIDs in the form m.010q36, which is how the dump writes the MID /m/010q36.

Using the list of MIDs, look for all lines that have one of those MIDs in the first column and one of your desired properties in the second. You could do this using Python, grep, or the tool/language of your choice. Of course, if you're using a programming language like Python, you could roll the initial search into the same script.
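
As a rough illustration of that second pass, here is a minimal Python sketch. It assumes the dump lines are tab-separated (as the zgrep pattern above implies) and that actor-mids.txt holds one MID per line. The names load_mids, filter_triples and WANTED_PREDICATES are made up for this example, and the only predicate listed is the key predicate shown in this answer; add whichever other properties you want.

```python
import gzip
import sys

# Predicates to keep. Only ns:type.object.key appears in this answer;
# add whichever other properties you need.
WANTED_PREDICATES = {"ns:type.object.key"}


def load_mids(path):
    """Read MIDs such as 'm.010q36', one per line, into a set."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def filter_triples(lines, mids, predicates=WANTED_PREDICATES):
    """Yield (mid, predicate, value) for lines about a known MID."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # not a subject/predicate/object line
        subject, predicate, value = parts[:3]
        mid = subject.split(":", 1)[-1]  # 'ns:m.010q36' -> 'm.010q36'
        if predicate not in predicates or mid not in mids:
            continue
        if value.endswith("."):
            value = value[:-1]  # drop the statement terminator
        yield mid, predicate, value


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_actors.py actor-mids.txt <dump>.gz
    mids = load_mids(sys.argv[1])
    with gzip.open(sys.argv[2], "rt", encoding="utf-8") as dump:
        for mid, predicate, value in filter_triples(dump, mids):
            print(mid, predicate, value, sep="\t")
```

Holding the MIDs in a set keeps each lookup cheap, so the dump only has to be streamed once rather than grepped once per actor.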

Wikipedia and IMDB IDs are stored as what Freebase calls keys and look like this (MusicBrainz and Netflix keys are included too):

ns:m.010q36     ns:type.object.key      "/wikipedia/en/Mr$002ERodgers".
ns:m.010q36     ns:type.object.key      "/authority/imdb/name/nm0736872".
ns:m.010q36     ns:type.object.key      "/authority/musicbrainz/87467525-3724-412d-ad3e-595ecb6a3bfd".
ns:m.010q36     ns:type.object.key      "/authority/netflix/role/30006685".
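
To pull the IMDB ID out of key values like these, one option is to split each key at its last slash into a namespace and an ID. This is only a sketch based on the four sample lines above: split_key and imdb_id are hypothetical helper names, and the /authority/imdb/name namespace is taken from the example rather than from any complete list of namespaces.

```python
def split_key(raw):
    """Split a key such as '"/authority/imdb/name/nm0736872".'
    into (namespace, identifier)."""
    key = raw.strip()
    if key.endswith("."):
        key = key[:-1]  # drop the statement terminator
    key = key.strip('"')
    namespace, _, identifier = key.rpartition("/")
    return namespace, identifier


def imdb_id(raw):
    """Return the IMDB name ID if the key is an IMDB key, else None."""
    namespace, identifier = split_key(raw)
    if namespace == "/authority/imdb/name":
        return identifier
    return None


print(imdb_id('"/authority/imdb/name/nm0736872".'))  # nm0736872
print(imdb_id('"/authority/netflix/role/30006685".'))  # None
```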

Keys may be encoded (like the Wikipedia key above). You can find documentation on the Freebase wiki on how to deal with them.
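
For what it's worth, the encoding in the Wikipedia key above looks like a dollar sign followed by four hexadecimal digits giving a Unicode code point ($002E is "."). Assuming that convention holds for all keys (check the Freebase wiki documentation to be sure), decoding could look like the sketch below. decode_key is a made-up name for this example.

```python
import re

# Assumed escape format: '$' followed by exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(key):
    """Replace each $XXXX escape with the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


print(decode_key("Mr$002ERodgers"))  # Mr.Rodgers
```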



Source: https://stackoverflow.com/questions/17532534/gettting-actor-ids-and-biographies-from-the-data-dumps-or-freebase-api
