Project reference

If you want to cite this work, please simply refer to the GitHub project:

GROBID (2008-2022) <https://github.com/kermitt2/grobid>

Please do not include any particular person's name, so that the emphasis stays on the project and the tool!

We also ask you not to cite any old research papers, but the current project itself, because, yes, a software project can be cited in the bibliographical references and not just mentioned in a footnote ;) Well, the citation will likely be rejected by the reviewers, the editors or the editorial style, but at least you tried!

Here's a BibTeX entry using the Software Heritage project-level permanent identifier:

@misc{GROBID,
    title = {GROBID},
    howpublished = {\url{https://github.com/kermitt2/grobid}},
    publisher = {GitHub},
    year = {2008--2022},
    archivePrefix = {swh},
    eprint = {1:dir:dab86b296e3c3216e2241968f0d63b68e8209d3c}
}

Evaluation and usages

The following articles are provided for information only. Listing them does not mean that we agree with all their statements about GROBID (please refer to the present documentation for the actual features and capabilities of the tool) or with all the methodologies used for evaluation, but they all explore interesting aspects related to GROBID.

Articles on CRF for bibliographical reference parsing

For archeological purposes, the first paper was the main motivation and influence for starting GROBID.

Datasets

For end-to-end evaluation:

For layout/zoning identification:

Similar open source tools

Transformer/Layout joint approaches (open source)

Other

Created in the context of PdfPig, the following page is a great collection of resources on Document Layout Analysis: https://github.com/BobLd/DocumentLayoutAnalysis