Benchmarking bioRxiv
Header metadata
Evaluation on 2000 random PDF files out of 1998 PDF (ratio 1.0).
Strict Matching (exact matches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 84.76 | 83.99 | 84.37 | 1999 |
| first_author | 96.72 | 95.94 | 96.33 | 1997 |
| title | 77.18 | 76.1 | 76.64 | 2000 |
| all fields (micro avg.) | 86.23 | 85.34 | 85.78 | 5996 |
| all fields (macro avg.) | 86.22 | 85.35 | 85.78 | 5996 |
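The macro-average row can be approximately reproduced from the three per-field rows. The sketch below assumes the macro average is the unweighted mean of the per-field scores (the usual definition, and one that fits these numbers). `strict_header_scores` and `macro_average` are illustrative names, not part of the evaluation tool. Because the inputs are already rounded to two decimals, a result can differ from the reported value by 0.01.

```python
# Per-field strict-matching scores copied from the table above:
# (precision, recall, f1)
strict_header_scores = {
    "authors": (84.76, 83.99, 84.37),
    "first_author": (96.72, 95.94, 96.33),
    "title": (77.18, 76.10, 76.64),
}


def macro_average(scores):
    """Return the unweighted mean of each metric across all fields."""
    n_fields = len(scores)
    # zip(*...) regroups the rows into one tuple per metric column.
    columns = zip(*scores.values())
    return tuple(round(sum(column) / n_fields, 2) for column in columns)


macro_p, macro_r, macro_f1 = macro_average(strict_header_scores)
print(macro_p, macro_r, macro_f1)
```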
Soft Matching (ignoring punctuation, case and space characters mismatches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 85.26 | 84.49 | 84.87 | 1999 |
| first_author | 96.97 | 96.19 | 96.58 | 1997 |
| title | 79.46 | 78.35 | 78.9 | 2000 |
| all fields (micro avg.) | 87.24 | 86.34 | 86.79 | 5996 |
| all fields (macro avg.) | 87.23 | 86.35 | 86.79 | 5996 |
Levenshtein Matching (Minimum Levenshtein distance at 0.8)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 92.53 | 91.7 | 92.11 | 1999 |
| first_author | 97.17 | 96.39 | 96.78 | 1997 |
| title | 91.94 | 90.65 | 91.29 | 2000 |
| all fields (micro avg.) | 93.88 | 92.91 | 93.39 | 5996 |
| all fields (macro avg.) | 93.88 | 92.91 | 93.39 | 5996 |
Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 88.44 | 87.64 | 88.04 | 1999 |
| first_author | 96.72 | 95.94 | 96.33 | 1997 |
| title | 87.68 | 86.45 | 87.06 | 2000 |
| all fields (micro avg.) | 90.95 | 90.01 | 90.48 | 5996 |
| all fields (macro avg.) | 90.95 | 90.01 | 90.48 | 5996 |
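The four matching criteria used in these tables can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation tool's actual code. It assumes that the 0.8 Levenshtein threshold is a normalized similarity (1 minus distance over the longer length), and that the fuzzy criteria compare the raw strings. It also substitutes Python's `difflib.SequenceMatcher`, whose `ratio()` is closely related to the Ratcliff/Obershelp measure, for whatever implementation the tool uses. All function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def strict_match(expected, predicted):
    """Exact string equality."""
    return expected == predicted


def _normalize(text):
    # Drop punctuation, whitespace and underscores, then lowercase.
    return re.sub(r"[\W_]+", "", text).lower()


def soft_match(expected, predicted):
    """Equality after ignoring punctuation, case and space characters."""
    return _normalize(expected) == _normalize(predicted)


def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_match(expected, predicted, threshold=0.8):
    """Assumed reading of the threshold: normalized similarity >= 0.8."""
    longest = max(len(expected), len(predicted))
    if longest == 0:
        return True
    similarity = 1.0 - levenshtein_distance(expected, predicted) / longest
    return similarity >= threshold


def ratcliff_obershelp_match(expected, predicted, threshold=0.95):
    """Similarity from difflib, used here as a stand-in for Ratcliff/Obershelp."""
    matcher = SequenceMatcher(None, expected, predicted, autojunk=False)
    return matcher.ratio() >= threshold
```

Under this sketch the fuzzy criteria are strictly more tolerant than strict matching, which fits the ordering of the scores: strict and soft are lowest, and the Levenshtein criterion at 0.8 is highest.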
Instance-level results
Total expected instances: 2000
Total correct instances: 1351 (strict)
Total correct instances: 1388 (soft)
Total correct instances: 1709 (Levenshtein)
Total correct instances: 1576 (RatcliffObershelp)
Instance-level recall: 67.55 (strict)
Instance-level recall: 69.4 (soft)
Instance-level recall: 85.45 (Levenshtein)
Instance-level recall: 78.8 (RatcliffObershelp)
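The instance-level recall values follow directly from the counts above: correct instances divided by expected instances. The short check below assumes that an instance counts as correct only when all of its evaluated fields match under the given criterion. The variable names are illustrative.

```python
expected_instances = 2000

# Correct-instance counts copied from the list above.
correct_instances = {
    "strict": 1351,
    "soft": 1388,
    "Levenshtein": 1709,
    "RatcliffObershelp": 1576,
}


def recall_percent(correct, expected):
    """Recall as a percentage, rounded to two decimals."""
    return round(100.0 * correct / expected, 2)


instance_recall = {
    criterion: recall_percent(count, expected_instances)
    for criterion, count in correct_instances.items()
}

for criterion, value in instance_recall.items():
    print(f"Instance-level recall: {value} ({criterion})")
```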
Citation metadata
Evaluation on 2000 random PDF files out of 1998 PDF (ratio 1.0).
Strict Matching (exact matches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 88.24 | 81.99 | 85 | 97183 |
| date | 91.49 | 84.76 | 88 | 97630 |
| doi | 70.91 | 81.39 | 75.79 | 16894 |
| first_author | 95.08 | 88.27 | 91.55 | 97183 |
| inTitle | 82.59 | 77.93 | 80.19 | 96430 |
| issue | 93.91 | 89.84 | 91.83 | 30312 |
| page | 94.76 | 76.82 | 84.85 | 88597 |
| pmcid | 66.01 | 82.78 | 73.45 | 807 |
| pmid | 69.88 | 79.69 | 74.46 | 2093 |
| title | 84.71 | 82.16 | 83.42 | 92463 |
| volume | 95.89 | 93.57 | 94.71 | 87709 |
| all fields (micro avg.) | 89.7 | 83.81 | 86.65 | 707301 |
| all fields (macro avg.) | 84.86 | 83.56 | 83.93 | 707301 |
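The gap between the micro and macro averages comes from how fields are weighted. The sketch below assumes that `support` is the number of expected values per field. Micro recall is then the support-weighted mean of the per-field recalls, while macro recall weights every field equally, so low-support fields such as `pmcid` and `pmid` count as much as `authors`. The names are illustrative, and the results are approximate because the inputs are rounded.

```python
# Strict-matching recall and support per field, copied from the table above.
strict_citation_rows = {
    "authors": (81.99, 97183),
    "date": (84.76, 97630),
    "doi": (81.39, 16894),
    "first_author": (88.27, 97183),
    "inTitle": (77.93, 96430),
    "issue": (89.84, 30312),
    "page": (76.82, 88597),
    "pmcid": (82.78, 807),
    "pmid": (79.69, 2093),
    "title": (82.16, 92463),
    "volume": (93.57, 87709),
}

total_support = sum(support for _, support in strict_citation_rows.values())

# Micro average: each field contributes in proportion to its support.
micro_recall = sum(
    recall * support for recall, support in strict_citation_rows.values()
) / total_support

# Macro average: every field contributes equally.
macro_recall = sum(
    recall for recall, _ in strict_citation_rows.values()
) / len(strict_citation_rows)

print(total_support, round(micro_recall, 2), round(macro_recall, 2))
```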
Soft Matching (ignoring punctuation, case and space characters mismatches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 89.4 | 83.06 | 86.11 | 97183 |
| date | 91.49 | 84.76 | 88 | 97630 |
| doi | 75.4 | 86.55 | 80.59 | 16894 |
| first_author | 95.51 | 88.67 | 91.96 | 97183 |
| inTitle | 92.07 | 86.88 | 89.4 | 96430 |
| issue | 93.91 | 89.84 | 91.83 | 30312 |
| page | 94.76 | 76.82 | 84.85 | 88597 |
| pmcid | 74.8 | 93.8 | 83.23 | 807 |
| pmid | 73.61 | 83.95 | 78.44 | 2093 |
| title | 92.97 | 90.17 | 91.55 | 92463 |
| volume | 95.89 | 93.57 | 94.71 | 87709 |
| all fields (micro avg.) | 92.5 | 86.42 | 89.36 | 707301 |
| all fields (macro avg.) | 88.16 | 87.1 | 87.33 | 707301 |
Levenshtein Matching (Minimum Levenshtein distance at 0.8)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 94.6 | 87.9 | 91.13 | 97183 |
| date | 91.49 | 84.76 | 88 | 97630 |
| doi | 77.42 | 88.87 | 82.75 | 16894 |
| first_author | 95.66 | 88.8 | 92.1 | 97183 |
| inTitle | 93.11 | 87.86 | 90.41 | 96430 |
| issue | 93.91 | 89.84 | 91.83 | 30312 |
| page | 94.76 | 76.82 | 84.85 | 88597 |
| pmcid | 74.8 | 93.8 | 83.23 | 807 |
| pmid | 73.61 | 83.95 | 78.44 | 2093 |
| title | 95.94 | 93.05 | 94.47 | 92463 |
| volume | 95.89 | 93.57 | 94.71 | 87709 |
| all fields (micro avg.) | 93.84 | 87.67 | 90.65 | 707301 |
| all fields (macro avg.) | 89.2 | 88.11 | 88.36 | 707301 |
Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 91.59 | 85.1 | 88.22 | 97183 |
| date | 91.49 | 84.76 | 88 | 97630 |
| doi | 76.07 | 87.32 | 81.3 | 16894 |
| first_author | 95.13 | 88.31 | 91.59 | 97183 |
| inTitle | 90.79 | 85.67 | 88.15 | 96430 |
| issue | 93.91 | 89.84 | 91.83 | 30312 |
| page | 94.76 | 76.82 | 84.85 | 88597 |
| pmcid | 66.01 | 82.78 | 73.45 | 807 |
| pmid | 69.88 | 79.69 | 74.46 | 2093 |
| title | 95.24 | 92.37 | 93.78 | 92463 |
| volume | 95.89 | 93.57 | 94.71 | 87709 |
| all fields (micro avg.) | 92.87 | 86.77 | 89.72 | 707301 |
| all fields (macro avg.) | 87.34 | 86.02 | 86.4 | 707301 |
Instance-level results
Total expected instances: 98799
Total extracted instances: 96830
Total correct instances: 42789 (strict)
Total correct instances: 53506 (soft)
Total correct instances: 57609 (Levenshtein)
Total correct instances: 54441 (RatcliffObershelp)
Instance-level precision: 44.19 (strict)
Instance-level precision: 55.26 (soft)
Instance-level precision: 59.49 (Levenshtein)
Instance-level precision: 56.22 (RatcliffObershelp)
Instance-level recall: 43.31 (strict)
Instance-level recall: 54.16 (soft)
Instance-level recall: 58.31 (Levenshtein)
Instance-level recall: 55.1 (RatcliffObershelp)
Instance-level f-score: 43.75 (strict)
Instance-level f-score: 54.7 (soft)
Instance-level f-score: 58.9 (Levenshtein)
Instance-level f-score: 55.66 (RatcliffObershelp)
Matching 1 : 77743
Matching 2 : 4503
Matching 3 : 4304
Matching 4 : 2196
Total matches : 88746
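The instance-level precision, recall and f-score above are consistent with the standard definitions applied to the reported counts. The following sketch recomputes them; `prf_percent` is a hypothetical helper, and the last digit may differ from the reported value because of rounding.

```python
expected_instances = 98799
extracted_instances = 96830

# Correct-instance counts copied from the list above.
correct_instances = {
    "strict": 42789,
    "soft": 53506,
    "Levenshtein": 57609,
    "RatcliffObershelp": 54441,
}


def prf_percent(correct, extracted, expected):
    """Return (precision, recall, f-score) as percentages."""
    precision = correct / extracted
    recall = correct / expected
    f_score = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * value, 2) for value in (precision, recall, f_score))


instance_scores = {
    criterion: prf_percent(count, extracted_instances, expected_instances)
    for criterion, count in correct_instances.items()
}

for criterion, (p, r, f) in instance_scores.items():
    print(f"{criterion}: precision={p} recall={r} f-score={f}")
```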
Citation context resolution
Total expected references: 98797 - 49.4 references per article
Total predicted references: 96830 - 48.41 references per article
Total expected citation contexts: 142862 - 71.43 citation contexts per article
Total predicted citation contexts: 131079 - 65.54 citation contexts per article
Total correct predicted citation contexts: 111777 - 55.89 citation contexts per article
Total wrong predicted citation contexts: 19302 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)
Precision citation contexts: 85.27
Recall citation contexts: 78.24
f-score citation contexts: 81.61
Evaluation metrics produced in 1408.979 seconds
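The citation context figures can be cross-checked from the raw counts. The sketch below assumes that wrong predictions are the predicted contexts minus the correct ones, and that the per-article averages divide by the 2000 evaluated articles. Variable names are illustrative.

```python
n_articles = 2000
expected_contexts = 142862
predicted_contexts = 131079
correct_contexts = 111777

# Predictions that did not match an expected citation context.
wrong_contexts = predicted_contexts - correct_contexts

precision = 100 * correct_contexts / predicted_contexts
recall = 100 * correct_contexts / expected_contexts
f_score = 2 * precision * recall / (precision + recall)

# Average number of expected citation contexts per article.
contexts_per_article = expected_contexts / n_articles

print(wrong_contexts)
print(round(precision, 2), round(recall, 2), round(f_score, 2))
print(round(contexts_per_article, 2))
```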