Project reference

If you want to cite this work, please simply refer to the GitHub project:

GROBID (2008-2021) <>

Please do not include any particular person's name, so that the emphasis stays on the project and the tool!

We also ask you to cite the current project itself rather than any old research papers, because, yes, we can try to cite a software project in the bibliographical references and not just mention it in a footnote ;) Well, it might (likely) be rejected by reviewers, the editorial style or the editors, but at least you tried!

Here's a BibTeX entry using the Software Heritage project-level permanent identifier:

    @misc{GROBID,
        title = {GROBID},
        howpublished = {\url{}},
        publisher = {GitHub},
        year = {2008--2021},
        archivePrefix = {swh},
        eprint = {1:dir:dab86b296e3c3216e2241968f0d63b68e8209d3c}
    }

Presentations on Grobid

The following presentations are reminders that old machine learning stuff is not like good wine. Please use this project repository for up-to-date information.

GROBID in 30 slides (2015).

GROBID in 20 slides (2012).

P. Lopez. Automatic Extraction and Resolution of Bibliographical References in Patent Documents. First Information Retrieval Facility Conference (IRFC), Vienna, May 2010. LNCS 6107, pp. 120-135. Springer, Heidelberg, 2010.

Evaluation and usages

The following articles are provided for information. Listing them does not mean that we agree with all their statements about Grobid (please refer to the present documentation for the actual features and capabilities of the tool) or with all the various methodologies used for evaluation, but they all explore interesting aspects related to Grobid.

Articles on CRF for bibliographical reference parsing

For archeological purposes, the first paper was the main motivation and influence for starting GROBID.


For end-to-end evaluation:

For layout/zoning identification:

Similar open source tools

Transformer/Layout joint approaches (open source)


Created in the context of PdfPig, the following page is a great collection of resources on Document Layout Analysis: