Question
Does anyone know the best way of getting Actor Ids from Freebase data dumps, and later on getting the IMDB ids and biographies from the Freebase API?
Answer 1:
Actors will have the type /film/actor and look like this in the dump:
ns:m.010q36 rdf:type ns:film.actor.
You can find them all in a few minutes from the compressed dump with a simple grep:
zgrep $'rdf:type\tns:film.actor.' freebase-rdf-<date of dump>.gz | cut -f 1 | cut -d ':' -f 2 > actor-mids.txt
This will generate a list of MIDs in the form m.010q36, which represents the MID /m/010q36.
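If you would rather do this step in Python, here is a minimal sketch of the same filter. It assumes the dump lines are tab-separated with the terminating "." attached to the object, as in the example above; the function name extract_actor_mids is made up for illustration.

```python
def extract_actor_mids(lines):
    """Yield MIDs (e.g. m.010q36) for every line typed as ns:film.actor.

    Assumes tab-separated columns: subject, predicate, object (with trailing '.').
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # not a complete triple
        subject, predicate, obj = parts[0], parts[1], parts[2].strip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        if predicate == "rdf:type" and obj == "ns:film.actor":
            # 'ns:m.010q36' -> 'm.010q36', same as cut -d ':' -f 2
            yield subject.split(":", 1)[-1]
```

To run it over the real dump, pass it a file object opened with gzip.open(path, "rt", encoding="utf-8") and write the results to actor-mids.txt.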
Using the list of MIDs, look for all lines that have one of those MIDs in the first column and one of your desired properties in the second. You could do this using Python, grep, or the tool/language of your choice. Of course, if you're using a programming language like Python, you could roll the initial search into the same program.
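A rough Python sketch of that second pass might look like the following. The function names are hypothetical, and it again assumes tab-separated lines with the trailing "." attached to the object. The set of predicates is whatever you want to keep, for example ns:type.object.key for the keys described below.

```python
import gzip

def iter_dump(path):
    """Stream text lines from the gzip-compressed dump without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield line

def filter_triples(lines, mids, predicates):
    """Yield (mid, predicate, object) where the subject is a wanted MID
    and the predicate is one of the wanted properties."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip anything that is not a full triple
        mid = parts[0].split(":", 1)[-1]  # 'ns:m.010q36' -> 'm.010q36'
        if mid not in mids or parts[1] not in predicates:
            continue
        obj = parts[2].strip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        yield mid, parts[1], obj
```

Load actor-mids.txt into a set first so each membership check is constant time, then call filter_triples(iter_dump(dump_path), mids, {"ns:type.object.key"}).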
Wikipedia and IMDB IDs are stored as what Freebase calls keys and look like this (MusicBrainz & Netflix included too):
ns:m.010q36 ns:type.object.key "/wikipedia/en/Mr$002ERodgers".
ns:m.010q36 ns:type.object.key "/authority/imdb/name/nm0736872".
ns:m.010q36 ns:type.object.key "/authority/musicbrainz/87467525-3724-412d-ad3e-595ecb6a3bfd".
ns:m.010q36 ns:type.object.key "/authority/netflix/role/30006685".
Keys may be encoded (like the Wikipedia key above). You can find documentation on the Freebase wiki on how to deal with them.
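As an illustration, here is a small Python sketch for splitting a key into its namespace and ID and decoding it. It assumes the escape format is "$" followed by four hexadecimal digits of the character's code point, which is what the Wikipedia key above suggests ($002E is "."); check the Freebase documentation before relying on it. The helper names decode_key and split_key are made up.

```python
import re

# Assumption: an escape is '$' plus exactly four hex digits, e.g. $002E for '.'
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")

def decode_key(key):
    """Replace each $XXXX escape with the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)

def split_key(value):
    """Split a quoted key value into (namespace, decoded id).

    '"/authority/imdb/name/nm0736872"' -> ('/authority/imdb/name', 'nm0736872')
    """
    path = value.strip().strip('"')
    namespace, _, ident = path.rpartition("/")
    return namespace, decode_key(ident)
```

Filtering the results on the /authority/imdb/name namespace then gives you the IMDB ID for each actor MID.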
Source: https://stackoverflow.com/questions/17532534/gettting-actor-ids-and-biographies-from-the-data-dumps-or-freebase-api