Skip to content

Benchmaking pmc

Header metadata

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

label precision recall f1 support
authors 92.93 92.79 92.86 1941
first_author 96.54 96.39 96.47 1941
title 84.32 83.89 84.11 1943
all fields (micro avg.) 91.27 91.02 91.15 5825
all fields (macro avg.) 91.27 91.02 91.14 5825

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

label precision recall f1 support
authors 94.89 94.74 94.82 1941
first_author 96.96 96.81 96.88 1941
title 92.03 91.56 91.8 1943
all fields (micro avg.) 94.63 94.37 94.5 5825
all fields (macro avg.) 94.63 94.37 94.5 5825

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

label precision recall f1 support
authors 96.75 96.6 96.67 1941
first_author 97.21 97.06 97.14 1941
title 98.24 97.74 97.99 1943
all fields (micro avg.) 97.4 97.13 97.27 5825
all fields (macro avg.) 97.4 97.13 97.27 5825

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

label precision recall f1 support
authors 95.82 95.67 95.75 1941
first_author 96.54 96.39 96.47 1941
title 96.22 95.73 95.98 1943
all fields (micro avg.) 96.2 95.93 96.06 5825
all fields (macro avg.) 96.2 95.93 96.06 5825

Instance-level results

Total expected instances:   1943
Total correct instances:    1528 (strict) 
Total correct instances:    1696 (soft) 
Total correct instances:    1839 (Levenshtein) 
Total correct instances:    1784 (ObservedRatcliffObershelp) 

Instance-level recall:  78.64   (strict) 
Instance-level recall:  87.29   (soft) 
Instance-level recall:  94.65   (Levenshtein) 
Instance-level recall:  91.82   (RatcliffObershelp) 

Citation metadata

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

Strict Matching (exact matches)

Field-level results

label precision recall f1 support
authors 82.01 75.3 78.51 85778
date 93.51 83.05 87.97 87067
first_author 88.46 81.19 84.67 85778
inTitle 71.85 70.71 71.27 81007
issue 85.83 85.46 85.64 16635
page 93.34 83.24 88 80501
title 78.48 74.5 76.44 80736
volume 94.92 88.5 91.6 80067
all fields (micro avg.) 85.9 79.67 82.67 597569
all fields (macro avg.) 86.05 80.24 83.01 597569

Soft Matching (ignoring punctuation, case and space characters mismatches)

Field-level results

label precision recall f1 support
authors 82.48 75.73 78.96 85778
date 93.51 83.05 87.97 87067
first_author 88.63 81.34 84.83 85778
inTitle 83.25 81.93 82.59 81007
issue 85.83 85.46 85.64 16635
page 93.34 83.24 88 80501
title 89.99 85.44 87.66 80736
volume 94.92 88.5 91.6 80067
all fields (micro avg.) 89.23 82.75 85.87 597569
all fields (macro avg.) 88.99 83.09 85.91 597569

Levenshtein Matching (Minimum Levenshtein distance at 0.8)

Field-level results

label precision recall f1 support
authors 88.13 80.92 84.37 85778
date 93.51 83.05 87.97 87067
first_author 88.82 81.52 85.02 85778
inTitle 84.56 83.21 83.88 81007
issue 85.83 85.46 85.64 16635
page 93.34 83.24 88 80501
title 92.21 87.54 89.82 80736
volume 94.92 88.5 91.6 80067
all fields (micro avg.) 90.55 83.98 87.14 597569
all fields (macro avg.) 90.17 84.18 87.04 597569

Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)

Field-level results

label precision recall f1 support
authors 84.94 77.99 81.32 85778
date 93.51 83.05 87.97 87067
first_author 88.48 81.2 84.69 85778
inTitle 81.86 80.56 81.2 81007
issue 85.83 85.46 85.64 16635
page 93.34 83.24 88 80501
title 91.85 87.2 89.46 80736
volume 94.92 88.5 91.6 80067
all fields (micro avg.) 89.61 83.11 86.24 597569
all fields (macro avg.) 89.34 83.4 86.24 597569

Instance-level results

Total expected instances:       90125
Total extracted instances:      86315
Total correct instances:        38619 (strict) 
Total correct instances:        50592 (soft) 
Total correct instances:        55410 (Levenshtein) 
Total correct instances:        51988 (RatcliffObershelp) 

Instance-level precision:   44.74 (strict) 
Instance-level precision:   58.61 (soft) 
Instance-level precision:   64.2 (Levenshtein) 
Instance-level precision:   60.23 (RatcliffObershelp) 

Instance-level recall:  42.85   (strict) 
Instance-level recall:  56.14   (soft) 
Instance-level recall:  61.48   (Levenshtein) 
Instance-level recall:  57.68   (RatcliffObershelp) 

Instance-level f-score: 43.78 (strict) 
Instance-level f-score: 57.35 (soft) 
Instance-level f-score: 62.81 (Levenshtein) 
Instance-level f-score: 58.93 (RatcliffObershelp) 

Matching 1 :    67552

Matching 2 :    3953

Matching 3 :    1787

Matching 4 :    660

Total matches : 73952

Citation context resolution

Total expected references:   90125 - 46.38 references per article
Total predicted references:      86315 - 44.42 references per article

Total expected citation contexts:    139835 - 71.97 citation contexts per article
Total predicted citation contexts:   111653 - 57.46 citation contexts per article

Total correct predicted citation contexts:   94282 - 48.52 citation contexts per article
Total wrong predicted citation contexts:     17371 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)

Precision citation contexts:     84.44
Recall citation contexts:    67.42
fscore citation contexts:    74.98

Evaluation metrics produced in 1145.951 seconds