GROBID Documentation

Getting Started

New to GROBID? Start here to get up and running quickly.

Quick start — install and launch GROBID in minutes
Run with Docker — the easiest way to deploy GROBID
Troubleshooting and FAQ — common issues and solutions

Upgrading

Upgrade guide — what to know when moving between major GROBID versions

User Guide

Everything you need to use GROBID once it's running.

Using the REST API — endpoints, parameters, and client libraries
Understanding the output (TEI) — structure of the TEI XML results
PDF coordinates — extracting bounding boxes for structures in the original PDF
Configuration — tuning GROBID for your use case
Consolidation service — linking extracted references to external metadata
Specialized processes — patents, medical, and other domain-specific workflows

About

Introduction — what GROBID is and what it does
How GROBID works — architecture and processing pipeline
Benchmarks — evaluation methodology and overview of results
References — publications about GROBID
License
Community — mailing list, Discord, and how to get involved

Developer Guide

Building, training, and extending GROBID.

Build from source — set up a development environment
Training and evaluating models — retrain or fine-tune GROBID models
End-to-end evaluation — evaluate full pipeline performance
Deep Learning models — using DL models instead of default CRF
Developer notes — internal conventions and tips for contributors
Recompiling CRF libraries — rebuilding native CRF dependencies

Annotation Guidelines

Guidelines for annotating training data.

Benchmarking

Detailed evaluation results on specific datasets.

Archive

Deprecated features kept for reference.