Convert a BibTeX file to database entries using Python

星月不相逢 2021-02-02 15:41

Given a BibTeX file, I need to add the respective fields (author, title, journal, etc.) to a table in a MySQL database (with a custom schema).

After doing some initial re

5 answers
  • 2021-02-02 15:45

    My workaround is to use bibtexparser to export the relevant fields to a CSV file:

    import bibtexparser
    import pandas as pd
    
    with open("../../bib/small.bib") as bibtex_file:
        bib_database = bibtexparser.load(bibtex_file)
        
    df = pd.DataFrame(bib_database.entries)
    selection = df[['doi', 'number']]
    selection.to_csv('temp.csv', index=False)
    

    Then write the CSV to a table in the database and delete temp.csv.

    This avoids some complications I ran into with pybtex.
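
    The second step (loading the CSV into a table, then deleting it) might look roughly like the sketch below. It uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; the function name load_csv_into_table and the table name refs are made up for illustration. A MySQL driver follows the same DB-API pattern but typically uses %s placeholders instead of ?.

```python
import csv
import os
import sqlite3


def load_csv_into_table(conn, csv_path, table="refs", delete_after=False):
    """Insert every row of the CSV into `table` and return the row count.

    Assumes the CSV has the 'doi' and 'number' columns exported above.
    `table` is a hypothetical name; it is interpolated into the SQL, so it
    must come from trusted code, never from user input.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [(row["doi"], row["number"]) for row in csv.DictReader(handle)]

    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (doi TEXT, number TEXT)")
    # Placeholders keep the values out of the SQL string itself.
    conn.executemany(f"INSERT INTO {table} (doi, number) VALUES (?, ?)", rows)
    conn.commit()

    if delete_after:
        os.remove(csv_path)  # clean up the temporary file
    return len(rows)
```

    With an open connection, load_csv_into_table(conn, "temp.csv", delete_after=True) covers both the insert and the cleanup.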

  • 2021-02-02 15:49

    You could use the Perl package Bib2ML (a.k.a. Bib2HTML). It contains a bib2sql tool that generates an SQL database from a BibTeX database.

    Alternative tools: bibsql and bibtosql.

    Then you can feed it to your schema by writing some SQL conversion queries.
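
    As a rough illustration of such a conversion query, the sketch below copies rows from a staging table into a custom table with INSERT ... SELECT. Every table and column name here is invented for the example; the tables that bib2sql or bibtosql actually generate will differ, so the SELECT list has to be adapted. It runs against an in-memory SQLite database; in MySQL the cast target would be SIGNED rather than INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging table standing in for the tool's generated output.
CREATE TABLE staging_entry (
    citekey TEXT, entrytype TEXT, title TEXT, journal TEXT, year TEXT
);
-- Hypothetical custom schema.
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    cite_key TEXT UNIQUE,
    title TEXT NOT NULL,
    journal TEXT,
    pub_year INTEGER
);
-- Dummy rows, for demonstration only.
INSERT INTO staging_entry VALUES
    ('example2020', 'article', 'An Example Title', 'Journal of Examples', '2020'),
    ('manual2019', 'manual', 'Some Manual', NULL, '2019');
""")

# The conversion query: filter, rename, and cast columns in one statement.
conn.execute("""
    INSERT INTO article (cite_key, title, journal, pub_year)
    SELECT citekey, title, journal, CAST(year AS INTEGER)
    FROM staging_entry
    WHERE entrytype = 'article'
""")
conn.commit()

converted = conn.execute(
    "SELECT cite_key, title, journal, pub_year FROM article"
).fetchall()
```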

  • 2021-02-02 15:50

    Old question, but I am doing the same thing at the moment using the Pybtex library, which has an inbuilt parser:

    from pybtex.database.input import bibtex
    
    # open a BibTeX file
    parser = bibtex.Parser()
    bibdata = parser.parse_file("myrefs.bib")
    
    # loop through the individual references
    for bib_id, entry in bibdata.entries.items():
        b = entry.fields
        try:
            # change these lines to create a SQL insert
            print(b["title"])
            print(b["journal"])
            print(b["year"])
            # deal with multiple authors (name parts are lists of strings)
            for author in entry.persons["author"]:
                print(" ".join(author.first_names), " ".join(author.last_names))
        # a field may not exist for a reference
        except KeyError:
            continue
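
    One way to turn the print lines into SQL inserts is sketched below. It takes plain Python values (a key, a dict of fields, a list of name pairs) so it does not depend on any particular parser, and it uses sqlite3 so that it runs as-is. The publication and publication_author tables and the insert_reference helper are hypothetical; with MySQL the placeholders would typically be %s. Using .get() stores a missing field as NULL instead of skipping the whole reference.

```python
import sqlite3

# Hypothetical two-table layout: one row per reference, one row per author.
SCHEMA = """
CREATE TABLE IF NOT EXISTS publication (
    bib_key TEXT PRIMARY KEY,
    title   TEXT,
    journal TEXT,
    year    TEXT
);
CREATE TABLE IF NOT EXISTS publication_author (
    bib_key    TEXT REFERENCES publication(bib_key),
    position   INTEGER,
    first_name TEXT,
    last_name  TEXT
);
"""


def insert_reference(conn, bib_key, fields, authors):
    """Insert one reference and its authors; absent fields become NULL.

    `fields` is a dict of BibTeX fields; `authors` is a list of
    (first_name, last_name) tuples in citation order.
    """
    conn.execute(
        "INSERT INTO publication (bib_key, title, journal, year) "
        "VALUES (?, ?, ?, ?)",
        (bib_key, fields.get("title"), fields.get("journal"), fields.get("year")),
    )
    conn.executemany(
        "INSERT INTO publication_author (bib_key, position, first_name, last_name) "
        "VALUES (?, ?, ?, ?)",
        [(bib_key, i, first, last) for i, (first, last) in enumerate(authors, start=1)],
    )
```

    Inside the loop above this could be called as insert_reference(conn, bib_id, dict(entry.fields), names), where names is the list of (first, last) pairs built from entry.persons.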
    
  • 2021-02-02 15:52

    Converting to XML is a fine idea.

    XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).
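
    To give a sense of how little Python that is, here is a minimal sketch using only the standard library. The XML layout (a bibliography root with entry children) is an invented, simplified one; real converters emit their own element names and namespaces, so the tag names would need adjusting. The sample data is dummy data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented, simplified XML layout; not the output of any specific converter.
SAMPLE_XML = """
<bibliography>
  <entry id="example2020" type="article">
    <title>An Example Title</title>
    <journal>Journal of Examples</journal>
    <year>2020</year>
  </entry>
  <entry id="nojournal2019" type="misc">
    <title>Another Example</title>
    <year>2019</year>
  </entry>
</bibliography>
"""


def xml_to_rows(xml_text):
    """Return one (id, title, journal, year) tuple per <entry> element.

    findtext() returns None for a missing child, which becomes NULL on insert.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        (
            entry.get("id"),
            entry.findtext("title"),
            entry.findtext("journal"),
            entry.findtext("year"),
        )
        for entry in root.iter("entry")
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication (bib_key TEXT, title TEXT, journal TEXT, year TEXT)"
)
conn.executemany("INSERT INTO publication VALUES (?, ?, ?, ?)", xml_to_rows(SAMPLE_XML))
conn.commit()
```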

    As far as I know, there is no direct, mature BibTeX reader for Python.

  • 2021-02-02 15:56

    You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser

    Documentation: https://bibtexparser.readthedocs.org

    It's very straightforward (I use it in production).

    For the record, I am not the developer of this library.
