Benchmarking biorxiv

Header metadata

Evaluation on 2000 random PDF files out of 1998 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 84.48 | 83.59 | 84.03 | 1999 |
| first_author | 96.41 | 95.49 | 95.95 | 1997 |
| title | 77.18 | 75.95 | 76.56 | 2000 |
| all fields (micro avg.) | 86.04 | 85.01 | 85.52 | 5996 |
| all fields (macro avg.) | 86.02 | 85.01 | 85.52 | 5996 |
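
The summary rows can be cross-checked against the per-field rows. The sketch below assumes the usual definitions: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over fields, and the micro average pools counts across fields. Because the raw counts are not reported, micro recall is approximated by weighting each field's recall by its support. The names `rows`, `f1`, `macro_precision`, `macro_recall` and `micro_recall` are illustrative and do not come from the evaluation tool.

```python
# Per-field strict-matching scores copied from the table above:
# label -> (precision, recall, support)
rows = {
    "authors": (84.48, 83.59, 1999),
    "first_author": (96.41, 95.49, 1997),
    "title": (77.18, 75.95, 2000),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Macro average: unweighted mean of the per-field scores.
macro_precision = sum(p for p, _, _ in rows.values()) / len(rows)
macro_recall = sum(r for _, r, _ in rows.values()) / len(rows)

# Micro recall: pool expected items across fields, i.e. weight by support.
total_support = sum(s for _, _, s in rows.values())
micro_recall = sum(r * s for _, r, s in rows.values()) / total_support

print("authors f1:", round(f1(84.48, 83.59), 2))
print("macro precision / recall:", round(macro_precision, 2), round(macro_recall, 2))
print("micro recall / support:", round(micro_recall, 2), total_support)
```

Since the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.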

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 84.93 | 84.04 | 84.49 | 1999 |
| first_author | 96.66 | 95.74 | 96.2 | 1997 |
| title | 79.37 | 78.1 | 78.73 | 2000 |
| all fields (micro avg.) | 87 | 85.96 | 86.48 | 5996 |
| all fields (macro avg.) | 86.99 | 85.96 | 86.47 | 5996 |
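
Soft matching, as described in the heading of this section, compares strings after discarding punctuation, case and space differences. A minimal sketch of such a comparison follows. It assumes ASCII punctuation only (`string.punctuation`), and the helper names `soft_normalize` and `soft_match` are hypothetical; the evaluation tool's actual normalization may differ.

```python
import string


def soft_normalize(text):
    """Lowercase the text and drop punctuation and whitespace characters."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in text.lower() if ch not in drop)


def soft_match(expected, extracted):
    """True when the two strings are equal after normalization."""
    return soft_normalize(expected) == soft_normalize(extracted)


print(soft_match("Deep Learning: A Review", "deep learning a review."))  # True
print(soft_match("Deep Learning: A Review", "Deep Learning"))            # False
```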

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 92.21 | 91.25 | 91.73 | 1999 |
| first_author | 96.92 | 95.99 | 96.45 | 1997 |
| title | 91.77 | 90.3 | 91.03 | 2000 |
| all fields (micro avg.) | 93.64 | 92.51 | 93.07 | 5996 |
| all fields (macro avg.) | 93.63 | 92.51 | 93.07 | 5996 |
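
The 0.8 threshold in the heading of this section reads most naturally as a minimum normalized similarity rather than a raw edit distance. The sketch below illustrates one common way to obtain such a score, assuming the normalization `1 - distance / max(len(a), len(b))`. Both that normalization and the function names are assumptions for illustration, not the tool's confirmed implementation.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def levenshtein_match(expected, extracted, threshold=0.8):
    """True when the normalized similarity reaches the threshold."""
    return levenshtein_similarity(expected, extracted) >= threshold


# One missing character in a 26-character title still counts as a match.
print(levenshtein_match("A study of protein folding", "A study of protein foldin"))
# Three edits over seven characters fall well below the threshold.
print(levenshtein_match("kitten", "sitting"))
```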

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 88.17 | 87.24 | 87.7 | 1999 |
| first_author | 96.41 | 95.49 | 95.95 | 1997 |
| title | 87.6 | 86.2 | 86.9 | 2000 |
| all fields (micro avg.) | 90.73 | 89.64 | 90.18 | 5996 |
| all fields (macro avg.) | 90.73 | 89.65 | 90.18 | 5996 |
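
For readers who want to experiment with a similarity of this kind, Python's standard `difflib.SequenceMatcher` computes a ratio based on an algorithm closely related to Ratcliff/Obershelp "gestalt pattern matching" (twice the number of matching characters divided by the total length of both strings). The sketch below applies the 0.95 threshold from the heading of this section. The function name `ratcliff_obershelp_match` is hypothetical, and the evaluation tool's own implementation may differ in detail.

```python
from difflib import SequenceMatcher


def ratcliff_obershelp_match(expected, extracted, threshold=0.95):
    """True when the SequenceMatcher ratio reaches the threshold."""
    # autojunk=False disables difflib's heuristic for long sequences,
    # which would otherwise skew scores on strings of 200+ characters.
    matcher = SequenceMatcher(None, expected, extracted, autojunk=False)
    return matcher.ratio() >= threshold


# 25 matching characters out of 26 + 25: ratio = 50 / 51, above 0.95.
print(ratcliff_obershelp_match("A study of protein folding", "A study of protein foldin"))
# Only "itt" and "n" match: ratio = 8 / 13, far below 0.95.
print(ratcliff_obershelp_match("kitten", "sitting"))
```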

Instance-level results

Total expected instances:   2000
Total correct instances:    1346 (strict) 
Total correct instances:    1381 (soft) 
Total correct instances:    1701 (Levenshtein) 
Total correct instances:    1570 (RatcliffObershelp)

Instance-level recall:  67.3    (strict) 
Instance-level recall:  69.05   (soft) 
Instance-level recall:  85.05   (Levenshtein) 
Instance-level recall:  78.5    (RatcliffObershelp) 
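
The instance-level recall figures follow directly from the counts above: each is the number of correct instances divided by the number of expected instances, expressed in percent. A short check, using illustrative variable names:

```python
expected_instances = 2000

# Correct-instance counts copied from the report above.
correct_instances = {
    "strict": 1346,
    "soft": 1381,
    "Levenshtein": 1701,
    "RatcliffObershelp": 1570,
}

# Recall in percent, rounded to two decimals as in the report.
instance_recall = {
    name: round(100 * correct / expected_instances, 2)
    for name, correct in correct_instances.items()
}

for name, recall in instance_recall.items():
    print(f"{name}: {recall}")
```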

Evaluation metrics produced in 12.073 seconds