问题
Given a bibTex file, I need to add the respective fields(author, title, journal etc.) to a table in a MySQL database (with a custom schema).
After doing some initial research, I found that there exists Bibutils which I could use to convert a bib file to xml. My initial idea was to convert it to XML and then parse the XML in python to populate a dictionary.
My main questions are:
- Is there a better way I could do this conversion?
- Is there a library which directly parses a bibTex and gives me the fields in python?
(I did find bibliography.parsing, which uses bibutils internally but there is not much documentation on it and am finding it tough to get it to work).
回答1:
Old question, but I am doing the same thing at the moment using the Pybtex library, which has an inbuilt parser:
from pybtex.database.input import bibtex
#open a bibtex file
parser = bibtex.Parser()
bibdata = parser.parse_file("myrefs.bib")
#loop through the individual references
for bib_id in bibdata.entries:
b = bibdata.entries[bib_id].fields
try:
# change these lines to create a SQL insert
print b["title"]
print b["journal"]
print b["year"]
#deal with multiple authors
for author in bibdata.entries[bib_id].persons["author"]:
print author.first(), author.last()
# field may not exist for a reference
except(KeyError):
continue
回答2:
Converting to XML is a fine idea.
XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).
So far as I know, there is no direct, mature bibTeX reader for Python.
回答3:
You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser
Documentation: https://bibtexparser.readthedocs.org
It's very straight forward (I use it in production).
For the record, I am not the developer of this library.
回答4:
You could use the Perl package Bib2ML (aka. Bib2HTML). It contains a bib2sql
tool that generates a SQL database from a BibTeX database, with the following schema:
An alternative tool: bibsql and bibtosql.
Then you can feed it to your schema by writing some SQL conversion queries.
回答5:
My workaround is to use bibtexparser to export relevant fields to a csv file;
import bibtexparser
import pandas as pd
with open("../../bib/small.bib") as bibtex_file:
bib_database = bibtexparser.load(bibtex_file)
df = pd.DataFrame(bib_database.entries)
selection = df[['doi', 'number']]
selection.to_csv('temp.csv', index=False)
And then write the csv to a table in the database, and delete the temp.csv
.
This avoids some complication with pybtex I found.
来源:https://stackoverflow.com/questions/9235853/convert-bibtex-file-to-database-entries-using-python