The RESTful API provides a simple and efficient way to use and deploy GROBID. As an alternative, the present page explains how to embed Grobid directly in your Java application.
After building the project, two core jar files are created: grobid-core-<current version>.onejar.jar and grobid-core-<current version>.jar.
A complete working Maven project illustrating the GROBID Java API can be found here: https://github.com/kermitt2/grobid-example. The example project uses the GROBID Java API to extract header metadata and citations from a PDF and outputs the results in BibTeX format.
Using maven
GROBID releases are available through the JitPack repository.
To configure it, add the following snippet to your pom.xml:
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
Once this repository is configured, the dependencies are managed automatically. Here is an example of a GROBID dependency declaration:
<dependency>
<groupId>com.github.kermitt2</groupId>
<artifactId>grobid</artifactId>
<version>0.7.0</version>
</dependency>
If you want to work with a SNAPSHOT development version, you need to include in your pom file the path to the snapshot Grobid jar file, for instance as follows (if necessary, replace 0.7.1-SNAPSHOT by the valid <current version>):
<dependency>
<groupId>org.grobid</groupId>
<artifactId>grobid-core</artifactId>
<version>0.7.1-SNAPSHOT</version>
<scope>system</scope>
<systemPath>${project.basedir}/lib/grobid-core-0.7.1-SNAPSHOT.jar</systemPath>
</dependency>
Using gradle
Add the following snippet to your build.gradle file:
repositories {
maven { url "https://jitpack.io" }
}
and add the Grobid dependency as well:
implementation 'org.grobid:grobid-core:0.7.0'
implementation 'org.grobid:grobid-trainer:0.7.0'
API call
When using Grobid, you have to initialize a context with the path to the Grobid resources (grobid-home). The following code gives a complete example of usage:
import org.grobid.core.*;
import org.grobid.core.data.*;
import org.grobid.core.factory.*;
import org.grobid.core.mock.*;
import org.grobid.core.utilities.*;
import org.grobid.core.engines.Engine;
import org.grobid.core.main.GrobidHomeFinder;
import java.util.Arrays;
...
String pdfPath = "mypdffile.pdf";
...
try {
    String pGrobidHome = "/Users/lopez/grobid/grobid-home";

    // GrobidHomeFinder can be instantiated without parameters to look for grobid-home
    // in the standard locations (classpath, ../grobid-home, ../../grobid-home).
    // If the location is customised:
    GrobidHomeFinder grobidHomeFinder = new GrobidHomeFinder(Arrays.asList(pGrobidHome));

    // The Grobid YAML configuration must be initialised with the correct grobidHomeFinder,
    // otherwise the default locations are used.
    GrobidProperties.getInstance(grobidHomeFinder);

    System.out.println(">>>>>>>> GROBID_HOME=" + GrobidProperties.getGrobidHome());

    Engine engine = GrobidFactory.getInstance().createEngine();

    // BiblioItem object that receives the extracted header metadata
    BiblioItem resHeader = new BiblioItem();
    // The header is also returned as a TEI string
    String tei = engine.processHeader(pdfPath, 1, resHeader);
} catch (Exception e) {
    // If an exception is raised, print a stack trace
    e.printStackTrace();
}
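Because the example depends on a valid grobid-home path, it can help to check the candidate locations before handing one to GrobidHomeFinder. The following is a minimal sketch using only the Java standard library. The class GrobidHomeCheck and its method firstExistingDirectory are hypothetical names introduced here for illustration; they are not part of the GROBID API and do not reproduce GROBID's own lookup logic. The candidate list simply reuses the custom path and the relative locations mentioned in the comments above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the GROBID API.
class GrobidHomeCheck {

    // Returns the first candidate that resolves to an existing directory, if any.
    static Optional<Path> firstExistingDirectory(List<String> candidates) {
        for (String candidate : candidates) {
            Path path = Paths.get(candidate).toAbsolutePath().normalize();
            if (Files.isDirectory(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed candidates: the custom path from the example, then the relative defaults.
        List<String> candidates = Arrays.asList(
                "/Users/lopez/grobid/grobid-home",
                "../grobid-home",
                "../../grobid-home");

        Optional<Path> home = firstExistingDirectory(candidates);
        if (home.isPresent()) {
            System.out.println("Using grobid-home at " + home.get());
        } else {
            System.out.println("No grobid-home directory found among " + candidates);
        }
    }
}
```

A path that passes this check could then be passed to the GrobidHomeFinder constructor as in the example above. The check only confirms that the directory exists, not that it contains valid GROBID resources.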
Maven skeleton project example
The following archive contains a toy Maven project that integrates Grobid in a third-party Java project: grobid-example.
You need a local grobid-home installation to run GROBID (the resources are not embedded in the jar for various reasons, in particular JNI and safety). The path to grobid-home might need to be changed in the project config file grobid-example/grobid-example.properties according to your installation, for instance:
grobid_example.pGrobidHome=/Users/lopez/grobid/grobid-home
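As an illustration of how such a setting can be consumed, the sketch below reads the grobid_example.pGrobidHome key with java.util.Properties from the standard library. The class GrobidExampleConfig and its method readGrobidHome are hypothetical names introduced here; the actual grobid-example project may load its configuration differently. The sample reads from an in-memory string so that it runs without any file on disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of GROBID or grobid-example.
class GrobidExampleConfig {

    // Property key taken from the grobid-example.properties line shown above.
    static final String GROBID_HOME_KEY = "grobid_example.pGrobidHome";

    // Reads the grobid-home path from a properties source and fails fast if it is missing.
    static String readGrobidHome(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String home = props.getProperty(GROBID_HOME_KEY);
        if (home == null || home.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + GROBID_HOME_KEY);
        }
        return home.trim();
    }

    public static void main(String[] args) {
        // In a real project the Reader would wrap the properties file on disk.
        String sample = "grobid_example.pGrobidHome=/Users/lopez/grobid/grobid-home\n";
        System.out.println(readGrobidHome(new StringReader(sample)));
    }
}
```

Failing fast on a missing key makes a misconfigured installation visible before any GROBID initialization is attempted.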
Then you can test the toy project:
> mvn test
Javadoc
The javadoc of the Grobid project is available here. All the main methods of the Grobid Java API are currently accessible via the single class org.grobid.core.engines.Engine. The various test files under grobid/grobid-core/src/test/java/org/grobid/core/test further illustrate how to use the Grobid Java API.