Skip to content

Benchmaking elife

Header metadata

Evaluation on 984 random PDF files out of 982 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

label precision recall f1 support
authors 77.47 76.6 77.03 983
first_author 93.72 92.77 93.24 982
title 88.8 86.18 87.47 984
all fields (micro avg.) 86.65 85.18 85.91 2949
all fields (macro avg.) 86.66 85.18 85.92 2949

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

label precision recall f1 support
authors 77.88 77.01 77.44 983
first_author 93.72 92.77 93.24 982
title 95.81 92.99 94.38 984
all fields (micro avg.) 89.1 87.59 88.34 2949
all fields (macro avg.) 89.14 87.59 88.36 2949

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

label precision recall f1 support
authors 90.02 89.01 89.51 983
first_author 94.03 93.08 93.55 982
title 97.38 94.51 95.93 984
all fields (micro avg.) 93.79 92.2 92.99 2949
all fields (macro avg.) 93.81 92.2 93 2949

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

label precision recall f1 support
authors 83.33 82.4 82.86 983
first_author 93.72 92.77 93.24 982
title 97.28 94.41 95.82 984
all fields (micro avg.) 91.41 89.86 90.63 2949
all fields (macro avg.) 91.45 89.86 90.64 2949

Instance-level results

Total expected instances:   984
Total correct instances:    677 (strict) 
Total correct instances:    727 (soft) 
Total correct instances:    836 (Levenshtein) 
Total correct instances:    783 (ObservedRatcliffObershelp) 

Instance-level recall:  68.8    (strict) 
Instance-level recall:  73.88   (soft) 
Instance-level recall:  84.96   (Levenshtein) 
Instance-level recall:  79.57   (RatcliffObershelp) 

Evaluation metrics produced in 10.247 seconds