Skip to content

Benchmaking elife

Header metadata

Evaluation on 984 random PDF files out of 982 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

label precision recall f1 support
authors 78.27 77.31 77.79 983
first_author 94.13 93.08 93.6 982
title 89.29 86.38 87.81 984
all fields (micro avg.) 87.21 85.59 86.39 2949
all fields (macro avg.) 87.23 85.59 86.4 2949

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

label precision recall f1 support
authors 78.68 77.72 78.2 983
first_author 94.13 93.08 93.6 982
title 96.32 93.19 94.73 984
all fields (micro avg.) 89.67 88 88.82 2949
all fields (macro avg.) 89.71 88 88.84 2949

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

label precision recall f1 support
authors 90.53 89.42 89.97 983
first_author 94.44 93.38 93.91 982
title 97.79 94.61 96.18 984
all fields (micro avg.) 94.23 92.47 93.34 2949
all fields (macro avg.) 94.25 92.47 93.35 2949

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

label precision recall f1 support
authors 83.63 82.6 83.11 983
first_author 94.13 93.08 93.6 982
title 97.79 94.61 96.18 984
all fields (micro avg.) 91.81 90.1 90.95 2949
all fields (macro avg.) 91.85 90.1 90.96 2949

Instance-level results

Total expected instances:   984
Total correct instances:    685 (strict) 
Total correct instances:    735 (soft) 
Total correct instances:    843 (Levenshtein) 
Total correct instances:    787 (ObservedRatcliffObershelp) 

Instance-level recall:  69.61   (strict) 
Instance-level recall:  74.7    (soft) 
Instance-level recall:  85.67   (Levenshtein) 
Instance-level recall:  79.98   (RatcliffObershelp) 

Citation metadata

Evaluation on 984 random PDF files out of 982 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

label precision recall f1 support
authors 79.65 78 78.81 63265
date 95.89 93.36 94.61 63662
first_author 94.78 92.78 93.77 63265
inTitle 95.45 93.77 94.6 63213
issue 1.54 81.25 3.02 16
page 95.75 94.37 95.05 53375
title 90.25 90.09 90.17 62044
volume 97.77 97.76 97.76 61049
all fields (micro avg.) 92.54 91.35 91.94 429889
all fields (macro avg.) 81.39 90.17 80.98 429889

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

label precision recall f1 support
authors 79.79 78.13 78.95 63265
date 95.89 93.36 94.61 63662
first_author 94.87 92.86 93.85 63265
inTitle 95.92 94.24 95.07 63213
issue 1.54 81.25 3.02 16
page 95.75 94.37 95.05 53375
title 95.89 95.72 95.81 62044
volume 97.77 97.76 97.76 61049
all fields (micro avg.) 93.46 92.26 92.86 429889
all fields (macro avg.) 82.18 90.96 81.77 429889

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

label precision recall f1 support
authors 93.41 91.47 92.43 63265
date 95.89 93.36 94.61 63662
first_author 95.31 93.29 94.29 63265
inTitle 96.53 94.84 95.68 63213
issue 1.54 81.25 3.02 16
page 95.75 94.37 95.05 53375
title 97.67 97.5 97.58 62044
volume 97.77 97.76 97.76 61049
all fields (micro avg.) 95.86 94.64 95.25 429889
all fields (macro avg.) 84.24 92.98 83.8 429889

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

label precision recall f1 support
authors 86.95 85.14 86.03 63265
date 95.89 93.36 94.61 63662
first_author 94.8 92.79 93.78 63265
inTitle 95.93 94.25 95.09 63213
issue 1.54 81.25 3.02 16
page 95.75 94.37 95.05 53375
title 97.51 97.34 97.43 62044
volume 97.77 97.76 97.76 61049
all fields (micro avg.) 94.74 93.52 94.12 429889
all fields (macro avg.) 83.27 92.03 82.85 429889

Instance-level results

Total expected instances:       63664
Total extracted instances:      65174
Total correct instances:        41600 (strict) 
Total correct instances:        44379 (soft) 
Total correct instances:        52020 (Levenshtein) 
Total correct instances:        48560 (RatcliffObershelp) 

Instance-level precision:   63.83 (strict) 
Instance-level precision:   68.09 (soft) 
Instance-level precision:   79.82 (Levenshtein) 
Instance-level precision:   74.51 (RatcliffObershelp) 

Instance-level recall:  65.34   (strict) 
Instance-level recall:  69.71   (soft) 
Instance-level recall:  81.71   (Levenshtein) 
Instance-level recall:  76.28   (RatcliffObershelp) 

Instance-level f-score: 64.58 (strict) 
Instance-level f-score: 68.89 (soft) 
Instance-level f-score: 80.75 (Levenshtein) 
Instance-level f-score: 75.38 (RatcliffObershelp) 

Matching 1 :    58266

Matching 2 :    955

Matching 3 :    1234

Matching 4 :    384

Total matches : 60839

Citation context resolution

Total expected references:   63664 - 64.7 references per article
Total predicted references:      65174 - 66.23 references per article

Total expected citation contexts:    109022 - 110.79 citation contexts per article
Total predicted citation contexts:   93048 - 94.56 citation contexts per article

Total correct predicted citation contexts:   89788 - 91.25 citation contexts per article
Total wrong predicted citation contexts:     3260 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)

Precision citation contexts:     96.5
Recall citation contexts:    82.36
fscore citation contexts:    88.87

Evaluation metrics produced in 1098.998 seconds