Given a BibTeX file, I need to add the respective fields (author, title, journal, etc.) to a table in a MySQL database (with a custom schema).
After doing some initial re
My workaround is to use bibtexparser to export the relevant fields to a CSV file:
import bibtexparser
import pandas as pd

with open("../../bib/small.bib") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

df = pd.DataFrame(bib_database.entries)
selection = df[['doi', 'number']]
selection.to_csv('temp.csv', index=False)
I then write the CSV to a table in the database and delete temp.csv.
This avoids some complications I ran into with pybtex.
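For that last step, here is a minimal sketch of loading the CSV into a table and deleting the file afterwards. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs on its own. The `papers` table, its columns, the sample DOI, and the helper name `load_csv_into_table` are assumptions for illustration only. With a MySQL driver such as PyMySQL the placeholder would be `%s` rather than `?`.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_into_table(conn, csv_path, table="papers"):
    """Insert the 'doi' and 'number' columns of csv_path into table, then delete the file.

    Returns the number of rows read from the CSV.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [(row["doi"], row["number"]) for row in csv.DictReader(handle)]
    # Values go through placeholders. The table name is interpolated into the
    # statement, so it must come from trusted code, never from user input.
    conn.executemany(f"INSERT INTO {table} (doi, number) VALUES (?, ?)", rows)
    conn.commit()
    os.remove(csv_path)
    return len(rows)


# Demo with stand-in data (hypothetical DOI) so the sketch runs without a .bib file.
csv_path = os.path.join(tempfile.mkdtemp(), "temp.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as handle:
    handle.write("doi,number\n10.1000/example,3\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (doi TEXT, number TEXT)")
print(load_csv_into_table(conn, csv_path))
```

pandas can also write a DataFrame directly with `DataFrame.to_sql`, which would skip the temporary file, although for MySQL that route needs an SQLAlchemy engine.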
You could use the Perl package Bib2ML (aka. Bib2HTML). It contains a bib2sql
tool that generates a SQL database from a BibTeX database, with the following schema:
Alternative tools: bibsql and bibtosql.
You can then feed the generated database into your own schema by writing some SQL conversion queries.
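To illustrate what such a conversion query might look like, here is a rough sketch run through Python's sqlite3 module so it can be tried without a MySQL server. Both table layouts are invented for the example: `imported_entries` is not the actual schema produced by bib2sql or the other tools, and `papers` is a hypothetical custom schema. In MySQL the cast would be written `CAST(year AS SIGNED)`.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical imported table and target table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE imported_entries (citekey TEXT, title TEXT, journal TEXT, year TEXT);
        CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                             venue TEXT, pub_year INTEGER);
        INSERT INTO imported_entries VALUES
            ('smith2001', 'An Example Title', 'Journal of Examples', '2001'),
            ('doe2005', 'Another Title', NULL, '2005'),
            ('untitled', NULL, 'Journal of Examples', '1999');
    """)
    return conn


def convert_entries(conn):
    """Copy rows into the custom schema, renaming columns and casting the year."""
    conn.execute("""
        INSERT INTO papers (title, venue, pub_year)
        SELECT title, journal, CAST(year AS INTEGER)
        FROM imported_entries
        WHERE title IS NOT NULL
    """)
    conn.commit()


conn = make_demo_db()
convert_entries(conn)
for row in conn.execute("SELECT title, venue, pub_year FROM papers ORDER BY pub_year"):
    print(row)
```

The `WHERE` clause skips rows that would violate the `NOT NULL` constraint on the target table; whether to skip or repair such rows depends on your data.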
Old question, but I am doing the same thing at the moment using the Pybtex library, which has an inbuilt parser:
from pybtex.database.input import bibtex

# open a BibTeX file
parser = bibtex.Parser()
bibdata = parser.parse_file("myrefs.bib")

# loop through the individual references
for bib_id in bibdata.entries:
    entry = bibdata.entries[bib_id]
    b = entry.fields
    try:
        # change these lines to create a SQL insert
        print(b["title"])
        print(b["journal"])
        print(b["year"])
        # deal with multiple authors; name parts are lists of strings
        for author in entry.persons["author"]:
            print(" ".join(author.first_names), " ".join(author.last_names))
    # a field may not exist for a reference
    except KeyError:
        continue
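As a sketch of what "create a SQL insert" could look like once the fields are in hand, the following stores one reference plus its authors in two tables. It takes plain Python values rather than pybtex objects so that it runs without a .bib file, and it uses sqlite3 in place of MySQL. The table names, column names, and the helper `insert_reference` are assumptions, not anything pybtex provides.

```python
import sqlite3


def insert_reference(conn, fields, authors):
    """Insert one reference and its authors; return the new paper id.

    fields:  mapping with optional 'title', 'journal' and 'year' keys
    authors: list of (first_name, last_name) tuples, in citation order
    """
    cur = conn.execute(
        "INSERT INTO papers (title, journal, year) VALUES (?, ?, ?)",
        # .get() yields None (stored as NULL) for a missing field instead of raising KeyError
        (fields.get("title"), fields.get("journal"), fields.get("year")),
    )
    paper_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO paper_authors (paper_id, seq, first_name, last_name) VALUES (?, ?, ?, ?)",
        [(paper_id, seq, first, last) for seq, (first, last) in enumerate(authors, start=1)],
    )
    return paper_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, journal TEXT, year TEXT);
    CREATE TABLE paper_authors (paper_id INTEGER, seq INTEGER, first_name TEXT, last_name TEXT);
""")
paper_id = insert_reference(
    conn,
    {"title": "An Example Title", "year": "2001"},  # no journal: stored as NULL
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)
conn.commit()
print(conn.execute("SELECT * FROM paper_authors WHERE paper_id = ?", (paper_id,)).fetchall())
```

Unlike the loop above, which skips a reference as soon as one field is missing, this version keeps the reference and stores NULL for the gap. Which behaviour is right depends on your schema.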
Converting to XML is a fine idea.
XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).
As far as I know, there is no direct, mature BibTeX reader for Python.
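To give an idea of how little Python the XML route needs, here is a small sketch using the standard-library `xml.etree.ElementTree`. The XML layout (`bibliography`, `entry`, `title`, and so on) is made up for the example; a real BibTeX-to-XML converter will have its own element names, so the tag strings would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, standing in for the output of a BibTeX-to-XML converter.
SAMPLE_XML = """
<bibliography>
  <entry id="smith2001">
    <title>An Example Title</title>
    <journal>Journal of Examples</journal>
    <year>2001</year>
    <author>Smith, Jane</author>
    <author>Doe, John</author>
  </entry>
  <entry id="doe2005">
    <title>Another Title</title>
    <year>2005</year>
    <author>Doe, John</author>
  </entry>
</bibliography>
"""


def entries_from_xml(xml_text):
    """Return one dict per <entry>; missing fields come back as None."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for entry in root.iter("entry"):
        rows.append({
            "id": entry.get("id"),
            "title": entry.findtext("title"),
            "journal": entry.findtext("journal"),
            "year": entry.findtext("year"),
            "authors": [author.text for author in entry.findall("author")],
        })
    return rows


for row in entries_from_xml(SAMPLE_XML):
    print(row)
```

Each dict can then be passed to a parameterized INSERT statement for your own schema.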
You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser
Documentation: https://bibtexparser.readthedocs.org
It's very straightforward (I use it in production).
For the record, I am not the developer of this library.