I recommend gscholar in combination with pdftotext.
Although PDF supports metadata, the fields are seldom populated with correct content. The title field, for example, often contains "None", "Adobe-Photoshop", or some other meaningless string. That is why none of the above tools can reliably derive correct information from PDFs: the real title might be anywhere in the document. Another example: many papers from conference proceedings also carry the title of the conference or the names of the editors, which confuses automatic extraction tools. The results are then plain wrong when you are interested in the real authors of the paper.
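To see how unreliable the embedded metadata is for your own files, you can inspect it before trusting it. The sketch below assumes the `pdfinfo` tool (shipped with the same poppler/xpdf utilities as pdftotext) is installed and prints `Key: value` lines; the helper names and the list of suspicious titles are my own illustrative choices, not part of any tool.

```python
import subprocess

# Placeholder strings often seen in the Title field.
# This list is an assumption for illustration, not an exhaustive catalogue.
SUSPICIOUS_TITLES = {"", "none", "untitled", "adobe-photoshop"}


def parse_pdfinfo(output):
    """Turn the 'Key: value' lines printed by pdfinfo into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info


def title_looks_usable(info):
    """Return False when the Title field is missing or an obvious placeholder."""
    title = info.get("Title", "").strip().lower()
    if title in SUSPICIOUS_TITLES:
        return False
    # Titles that are really file names are another common failure.
    if title.endswith((".doc", ".dvi", ".tex", ".pdf")):
        return False
    return True


def read_pdf_metadata(path):
    """Run pdfinfo on a file and return its metadata as a dict."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

If `title_looks_usable(read_pdf_metadata("paper.pdf"))` returns False, the metadata is not worth using and you need to look at the text of the document itself.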
So I suggest a semi-automatic approach involving Google Scholar.
- First, render the PDF to text so you can extract the author and title.
- Second, copy and paste some of this information into a Google Scholar query. To automate this step, I use the Python script gscholar.py.
So in real life this is what I do:
me@box> pdftotext 10.1.1.90.711.pdf - | head
Computational Geometry 23 (2002) 183–194
www.elsevier.com/locate/comgeo
Voronoi diagrams on the sphere ✩
Hyeon-Suk Na a , Chung-Nim Lee a , Otfried Cheong b,∗
a Department of Mathematics, Pohang University of Science and Technology, South Korea
b Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Received 28 June 2001; received in revised form 6 September 2001; accepted 12 February 2002
Communicated by J.-R. Sack
me@box> gscholar.py "Voronoi diagrams on the sphere Hyeon-Suk"
@article{na2002voronoi,
title={Voronoi diagrams on the sphere},
author={Na, Hyeon-Suk and Lee, Chung-Nim and Cheong, Otfried},
journal={Computational Geometry},
volume={23},
number={2},
pages={183--194},
year={2002},
publisher={Elsevier}
}
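If you do this often, the copy and paste step can be reduced to picking two lines. The following is only a rough sketch: it assumes `pdftotext` is on your PATH, the function names are hypothetical, and you still choose by eye which lines hold the title and the authors. It does not call gscholar.py itself; it only builds the query string.

```python
import subprocess


def nonempty_lines(text):
    """Split text into stripped lines, dropping blank ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def first_lines(path, count=10):
    """Run `pdftotext <path> -` and return the first non-empty lines."""
    result = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    )
    return nonempty_lines(result.stdout)[:count]


def keep_words(line):
    """Drop tokens with no letter or digit, such as footnote marks."""
    return [w for w in line.split() if any(ch.isalnum() for ch in w)]


def build_query(title_line, author_line, author_words=1):
    """Join the cleaned title with the first word(s) of the author line."""
    return " ".join(keep_words(title_line) + keep_words(author_line)[:author_words])


if __name__ == "__main__":
    import sys

    lines = first_lines(sys.argv[1])
    for number, line in enumerate(lines):
        print(number, line)
    # The human part of the semi-automatic approach: pick the right lines.
    title = lines[int(input("title line number: "))]
    authors = lines[int(input("author line number: "))]
    print(build_query(title, authors))
```

For the paper above, picking the title line and the author line gives the query `Voronoi diagrams on the sphere Hyeon-Suk`, which you can then hand to gscholar.py as in the session.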
EDIT: Be careful, Google Scholar might present you with CAPTCHAs if you send many queries. Another great script is bibfetch.