bibtexparser: API

bibtexparser — Parsing and writing BibTeX files

BibTeX is a bibliographic data file format.

The bibtexparser module can parse and write BibTeX files. Its API is similar to that of the json module. Parsed data is returned as a simple BibDatabase object whose main attribute, entries, holds the bibliographic sources such as books and journal articles.

The following functions provide a quick and basic way to manipulate a BibTeX file. More advanced features are also available in this module.

Parsing a file is as simple as:

import bibtexparser
with open('bibtex.bib') as bibtex_file:
    bibtex_database = bibtexparser.load(bibtex_file)

And writing:

import bibtexparser
with open('bibtex.bib', 'w') as bibtex_file:
    bibtexparser.dump(bibtex_database, bibtex_file)

bibtexparser.load(bibtex_file, parser=None)

Load BibDatabase object from a file

Parameters:
  • bibtex_file (file) – input file to be parsed
  • parser (BibTexParser) – custom parser to use (optional)
Returns:

bibliographic database object

Return type:

BibDatabase

Example:

import bibtexparser
with open('bibtex.bib') as bibtex_file:
    bibtex_database = bibtexparser.load(bibtex_file)

bibtexparser.loads(bibtex_str, parser=None)

Load BibDatabase object from a string

Parameters:
  • bibtex_str (str or unicode) – input BibTeX string to be parsed
  • parser (BibTexParser) – custom parser to use (optional)
Returns:

bibliographic database object

Return type:

BibDatabase

bibtexparser.dumps(bib_database, writer=None)

Dump BibDatabase object to a BibTeX string

Parameters:
  • bib_database (BibDatabase) – bibliographic database object
  • writer (BibTexWriter) – custom writer to use (optional)
Returns:

BibTeX string

Return type:

unicode

bibtexparser.dump(bib_database, bibtex_file, writer=None)

Dump BibDatabase object as a BibTeX text file

Parameters:
  • bib_database (BibDatabase) – bibliographic database object
  • bibtex_file (file) – file to write to
  • writer (BibTexWriter) – custom writer to use (optional)

Example:

import bibtexparser
with open('bibtex.bib', 'w') as bibtex_file:
    bibtexparser.dump(bibtex_database, bibtex_file)

bibtexparser.bibdatabase — The bibliographic database object

class bibtexparser.bibdatabase.BibDatabase

Bibliographic database object that follows the data structure of a BibTeX file.

comments = None

List of BibTeX comment (@comment{…}) blocks.

entries = None

List of BibTeX entries, for example @book{…}, @article{…}, etc. Each entry is a simple dict of BibTeX field-value pairs, for example 'author': 'Bird, R.B. and Armstrong, R.C. and Hassager, O.'. Each entry always has the following dict keys (in addition to other BibTeX fields):

  • ID (BibTeX key)
  • ENTRYTYPE (entry type in lowercase, e.g. book, article etc.)

entries_dict

Return a dictionary of BibTeX entries. The dict key is the BibTeX entry key.

preambles = None

List of BibTeX preamble (@preamble{…}) blocks.

strings = None

OrderedDict of BibTeX string definitions (@string{…}). In order of definition.

bibtexparser.bparser — Tune the default parser

class bibtexparser.bparser.BibTexParser(data=None, customization=None, ignore_nonstandard_types=True, homogenize_fields=False, interpolate_strings=True, common_strings=False, add_missing_from_crossref=False)

A parser for reading BibTeX bibliographic data files.

Example:

import bibtexparser
from bibtexparser.bparser import BibTexParser

bibtex_str = ...

parser = BibTexParser()
parser.ignore_nonstandard_types = False
parser.homogenize_fields = False
parser.common_strings = False
bib_database = bibtexparser.loads(bibtex_str, parser)

Parameters:
  • customization – function or None (default). Customization to apply to parsed entries.
  • ignore_nonstandard_types – bool (default True). If True, ignore non-standard BibTeX entry types.
  • homogenize_fields – bool (default False). If True, apply common field name replacements (as set in the alt_dict attribute).
  • interpolate_strings – bool (default True). If True, replace BibTeX strings by their values; otherwise use BibDataString objects.
  • common_strings – bool (default False). If True, include common string definitions (e.g. month abbreviations) in the BibTeX file.
  • add_missing_from_crossref – bool (default False). If True, resolve BibTeX references set in the crossref field of BibTeX entries and add the fields from the referenced entry to the referencing entry.

common_strings = None

Load common strings such as month abbreviations. Default: False.

customization = None

Callback function to process BibTeX entries after parsing, for example to create a list from a string with multiple values. By default all BibTeX values are treated as simple strings. Default: None.

homogenize_fields = None

Sanitize BibTeX field names, for example change url to link etc. Field names are always converted to lowercase names. Default: False.

ignore_nonstandard_types = None

Ignore entry types that are not standard BibTeX types (the standard types being book, article, etc.). Default: True.

interpolate_strings = None

Interpolate BibTeX strings (replace them by their values) or keep the structure. Default: True.

parse(bibtex_str, partial=False)

Parse a BibTeX string into an object

Parameters:
  • bibtex_str (str or unicode) – BibTeX string
  • partial (boolean) – If True, only print errors on parsing failures. If False, an exception is raised.

Returns:

bibliographic database

Return type:

BibDatabase

parse_file(file, partial=False)

Parse a BibTeX file into an object

Parameters:
  • file (file) – BibTeX file or file-like object
  • partial (boolean) – If True, only print errors on parsing failures. If False, an exception is raised.

Returns:

bibliographic database

Return type:

BibDatabase

bibtexparser.customization — Functions to customize records

A set of functions useful for customizing BibTeX fields. You can also use them as inspiration for designing your own. Each of them takes a record and returns the modified record.
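
A custom function follows the same shape: it receives a record dict and returns it. The sketch below is a hypothetical example (normalize_title is not part of the library) that collapses runs of whitespace in the title field.

```python
def normalize_title(record):
    """Hypothetical customization: collapse runs of whitespace in the title."""
    if "title" in record:
        record["title"] = " ".join(record["title"].split())
    return record


# Hypothetical record, shaped like a parsed entry.
record = {"ID": "doe2020", "ENTRYTYPE": "article", "title": "An   Example\n Title"}
print(normalize_title(record)["title"])
```

Such a function can be assigned to the customization attribute of a BibTexParser so that it is applied to each entry after parsing.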

bibtexparser.customization.splitname(name, strict_mode=True)

Break a name into its constituent parts: First, von, Last, and Jr.

Parameters:
  • name (string) – a string containing a single name
  • strict_mode (Boolean) – whether to use strict mode
Returns:

dictionary of constituent parts

Raises:

customization.InvalidName – If an invalid name is given and strict_mode = True.

In BibTeX, a name can be represented in any of three forms:
  • First von Last
  • von Last, First
  • von Last, Jr, First

This function attempts to split a given name into its four parts. The returned dictionary has keys of first, last, von and jr. Each value is a list of the words making up that part; this may be an empty list. If the input has no non-whitespace characters, a blank dictionary is returned.

It is capable of detecting some errors with the input name. If the strict_mode parameter is True, which is the default, this results in a customization.InvalidName exception being raised. If it is False, the function continues, working around the error as best it can. The errors that can be detected are listed below along with the handling for non-strict mode:

  • Name finishes with a trailing comma: delete the comma
  • Too many parts (e.g., von Last, Jr, First, Error): merge extra parts into First
  • Unterminated opening brace: add closing brace to end of input
  • Unmatched closing brace: add opening brace at start of word

bibtexparser.customization.getnames(names)

Convert people's names to the form “surname, firstnames” or “surname, initials”.

Parameters: names (list) – a list of names
Returns: list – correctly formatted names

Note

This function is known to be too simple to properly handle the complex rules of BibTeX names. We would like to enhance this in forthcoming releases.

bibtexparser.customization.author(record)

Split the author field into a list of names formatted as “Surname, Firstnames”.

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.editor(record)

Turn the editor field into a dict composed of the original editor name and an editor id (without commas or blanks).

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.journal(record)

Turn the journal field into a dict composed of the original journal name and a journal id (without commas or blanks).

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.keyword(record, sep=', |;')

Split keyword field into a list.

Parameters:
  • record (dict) – the record.
  • sep (string, optional) – pattern used for the splitting regexp.
Returns:

dict – the modified record.
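
To show how a pattern such as the default sep behaves, here is a sketch using re.split from the standard library. It is not the library's implementation: the sample string is made up, and trimming whitespace around each item is an assumption of this sketch.

```python
import re

sep = ", |;"  # default pattern taken from the signature above
raw = "rheology, polymers;fluid dynamics"  # hypothetical keyword field

# Split on either ", " or ";" and trim stray whitespace around each item.
keywords = [item.strip() for item in re.split(sep, raw)]
print(keywords)
```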

bibtexparser.customization.page_double_hyphen(record)

Separate pages by a double hyphen (--).

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.doi(record)

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.type(record)

Put the type into lower case.

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.convert_to_unicode(record)

Convert accents from LaTeX to Unicode style.

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.homogenize_latex_encoding(record)

Homogenize the LaTeX encoding style for BibTeX.

This function is experimental.

Parameters: record (dict) – the record.
Returns: dict – the modified record.

bibtexparser.customization.add_plaintext_fields(record)

For each field in the record, add a plain_ field containing the plaintext, stripped of braces and similar. See https://github.com/sciunto-org/python-bibtexparser/issues/116.

Parameters: record (dict) – the record.
Returns: dict – the modified record.

Exception classes

class bibtexparser.customization.InvalidName

Exception raised by customization.splitname() when an invalid name is input.

bibtexparser.bwriter — Tune the default writer

class bibtexparser.bwriter.BibTexWriter(write_common_strings=False)

Writer to convert a BibDatabase object to a string or file formatted as a BibTeX file.

Example:

import bibtexparser
from bibtexparser.bwriter import BibTexWriter

bib_database = ...

writer = BibTexWriter()
writer.contents = ['comments', 'entries']
writer.indent = '  '
writer.order_entries_by = ('ENTRYTYPE', 'author', 'year')
bibtex_str = bibtexparser.dumps(bib_database, writer)

add_trailing_comma = None

BibTeX syntax allows the comma to be optional at the end of the last field in an entry. Use this to enable writing this last comma in the bwriter output. Default: False.

comma_first = None

BibTeX syntax allows comma-first syntax (common in functional languages). Use this to enable comma-first syntax in the bwriter output.

common_strings = None

Whether common strings are written.

contents = None

List of BibTeX elements to write, valid values are entries, comments, preambles, strings.

display_order = None

Tuple of fields for display order in a single BibTeX entry. Fields not listed here will be displayed alphabetically at the end. Set to ‘[]’ for alphabetical order. Default: ‘[]’

entry_separator = None

Character(s) for separating BibTeX entries. Default: new line.

indent = None

Character(s) for indenting BibTeX field-value pairs. Default: single space.

order_entries_by = None

Tuple of fields for ordering BibTeX entries. Set to None to disable sorting. Default: BibTeX key (‘ID’, ).

write(bib_database)

Converts a bibliographic database to a BibTeX-formatted string.

Parameters: bib_database (BibDatabase) – bibliographic database to be converted to a BibTeX string
Returns: BibTeX-formatted string
Return type: str or unicode

bibtexparser.bibtexexpression — Parser’s core relying on pyparsing

class bibtexparser.bibtexexpression.BibtexExpression

Gives access to pyparsing expressions.

Attributes are pyparsing expressions for the following elements:

  • main_expression: the bibtex file
  • string_def: a string definition
  • preamble_decl: a preamble declaration
  • explicit_comment: an explicit comment
  • entry: an entry definition
  • implicit_comment: an implicit comment

exception ParseException(pstr, loc=0, msg=None, elem=None)

Exception thrown when a parse expression doesn't match the input string. Supported attributes by name are:

  • lineno – returns the line number of the exception text
  • col – returns the column number of the exception text
  • line – returns the line containing the exception text

Example:

from pyparsing import ParseException, Word, nums

try:
    Word(nums).setName("integer").parseString("ABC")
except ParseException as pe:
    print(pe)
    print("column: {}".format(pe.col))

prints:

Expected integer (at char 0), (line:1, col:1)
column: 1

static explain(exc, depth=16)

Method to take an exception and translate the Python internal traceback into a list of the pyparsing expressions that caused the exception to be raised.

Parameters:

  • exc - exception raised during parsing (need not be a ParseException, in support of Python exceptions that might be raised in a parse action)
  • depth (default=16) - number of levels back in the stack trace to list expression and function names; if None, the full stack trace names will be listed; if 0, only the failing input line, marker, and exception string will be shown

Returns a multi-line string listing the ParserElements and/or function names in the exception’s stack trace.

Note: the diagnostic output will include string representations of the expressions that failed to parse. These representations will be more helpful if you use setName to give identifiable names to your expressions. Otherwise they will use the default string forms, which may be cryptic to read.

explain() is only supported under Python 3.

add_log_function(log_fun)

Add notice to logger on entry, comment, preamble, string definitions.

Parameters: log_fun – logger function

set_string_expression_parse_action(fun)

Set the parseAction for string_expression expression.

Note

See set_string_name_parse_action.

set_string_name_parse_action(fun)

Set the parseAction for string name expression.

Note

For some reason pyparsing duplicates the string_name expression so setting its parseAction a posteriori has no effect in the context of a string expression. This is why this function should be used instead.

bibtexparser.bibtexexpression.add_logger_parse_action(expr, log_func)

Register a callback on expression parsing with the adequate message.

bibtexparser.bibtexexpression.field_to_pair(string_, location, token)

Looks for parsed element named ‘Field’.

Returns: (name, value).

bibtexparser.bibtexexpression.in_braces_or_pars(exp)

exp -> (exp)|{exp}

bibtexparser.bibtexexpression.strip_after_new_lines(s)

Removes leading and trailing whitespaces in all but first line.

Parameters: s – string or BibDataStringExpression