Benchmarking PLOS
Header metadata
Evaluation on 1000 random PDF files out of 998 PDF (ratio 1.0).
Strict Matching (exact matches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 98.97 | 98.97 | 98.97 | 969 |
| first_author | 99.17 | 99.17 | 99.17 | 969 |
| title | 95.67 | 95.1 | 95.39 | 1000 |
| all fields (micro avg.) | 97.92 | 97.72 | 97.82 | 2938 |
| all fields (macro avg.) | 97.94 | 97.75 | 97.84 | 2938 |
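
The two summary rows can be checked against the per-field rows. The sketch below assumes that the macro average is the unweighted mean of the per-field scores, and that `support` counts the expected instances per field, so that micro-averaged recall equals the support-weighted mean of per-field recall. Both are assumptions about how the report was produced, not something it states. The names `STRICT_HEADER`, `macro_average` and `support_weighted_recall` are illustrative only.

```python
from statistics import mean

# field: (precision, recall, f1, support), copied from the strict-matching header table
STRICT_HEADER = {
    "authors": (98.97, 98.97, 98.97, 969),
    "first_author": (99.17, 99.17, 99.17, 969),
    "title": (95.67, 95.10, 95.39, 1000),
}


def macro_average(rows):
    """Unweighted mean of precision, recall and f1 across fields."""
    precision, recall, f1, _support = zip(*rows.values())
    return tuple(round(mean(col), 2) for col in (precision, recall, f1))


def support_weighted_recall(rows):
    """Recall averaged with each field weighted by its support (assumed micro recall)."""
    total = sum(support for *_, support in rows.values())
    weighted = sum(recall * support for _, recall, _, support in rows.values())
    return round(weighted / total, 2)


print(macro_average(STRICT_HEADER))
print(support_weighted_recall(STRICT_HEADER))
```

Micro-averaged precision cannot be reproduced the same way, because the table does not list the number of extracted instances per field.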
Soft Matching (ignoring punctuation, case and space characters mismatches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 98.97 | 98.97 | 98.97 | 969 |
| first_author | 99.17 | 99.17 | 99.17 | 969 |
| title | 99.2 | 98.6 | 98.9 | 1000 |
| all fields (micro avg.) | 99.11 | 98.91 | 99.01 | 2938 |
| all fields (macro avg.) | 99.11 | 98.91 | 99.01 | 2938 |
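
As an illustration of what soft matching could look like, the sketch below compares two strings after removing punctuation and whitespace and lowercasing. It is a minimal approximation, not the evaluation tool's actual normalization: `soft_normalize` and `soft_match` are hypothetical names, and only ASCII punctuation is stripped.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace characters.
_DROP = str.maketrans("", "", string.punctuation + string.whitespace)


def soft_normalize(text: str) -> str:
    """Remove punctuation and whitespace, then lowercase."""
    return text.translate(_DROP).lower()


def soft_match(expected: str, predicted: str) -> bool:
    """True when the two strings are equal after soft normalization."""
    return soft_normalize(expected) == soft_normalize(predicted)


print(soft_match("Deep Learning: A Review.", "deep learning a review"))
print(soft_match("Deep Learning", "Deep Earning"))
```

This kind of normalization explains why only the title scores change between strict and soft matching in the header tables: titles are the field most exposed to punctuation and spacing differences.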
Levenshtein Matching (Minimum Levenshtein distance at 0.8)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 99.38 | 99.38 | 99.38 | 969 |
| first_author | 99.28 | 99.28 | 99.28 | 969 |
| title | 99.5 | 98.9 | 99.2 | 1000 |
| all fields (micro avg.) | 99.39 | 99.18 | 99.28 | 2938 |
| all fields (macro avg.) | 99.39 | 99.19 | 99.29 | 2938 |
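
The 0.8 threshold is easiest to read as a lower bound on a normalized similarity rather than on a raw edit distance. The sketch below assumes the similarity is `1 - distance / max(len(a), len(b))`; the report does not say which normalization was used, so treat this as one plausible reading. All function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def levenshtein_match(expected: str, predicted: str, threshold: float = 0.8) -> bool:
    return levenshtein_similarity(expected, predicted) >= threshold


print(levenshtein_match("benchmarking", "benchmaking"))
print(levenshtein_match("kitten", "sitting"))
```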
Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 99.28 | 99.28 | 99.28 | 969 |
| first_author | 99.17 | 99.17 | 99.17 | 969 |
| title | 99.3 | 98.7 | 99 | 1000 |
| all fields (micro avg.) | 99.25 | 99.05 | 99.15 | 2938 |
| all fields (macro avg.) | 99.25 | 99.05 | 99.15 | 2938 |
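
Python's standard `difflib.SequenceMatcher` uses a matching algorithm closely related to Ratcliff/Obershelp, so it gives a convenient way to experiment with the 0.95 threshold. This is a sketch under that assumption; the scores it yields are not guaranteed to equal those of the evaluation tool, and `ratcliff_obershelp_match` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def ratcliff_obershelp_match(expected: str, predicted: str, threshold: float = 0.95) -> bool:
    """True when the SequenceMatcher ratio reaches the threshold."""
    # autojunk=False disables difflib's heuristic that ignores very frequent characters.
    ratio = SequenceMatcher(None, expected, predicted, autojunk=False).ratio()
    return ratio >= threshold


print(ratcliff_obershelp_match("a" * 40, "a" * 39))
print(ratcliff_obershelp_match("abcd", "abce"))
```

With a threshold this high, short strings must match almost exactly, while long strings tolerate a few character differences.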
Instance-level results
Total expected instances: 1000
Total correct instances: 946 (strict)
Total correct instances: 980 (soft)
Total correct instances: 984 (Levenshtein)
Total correct instances: 982 (RatcliffObershelp)
Instance-level recall: 94.6 (strict)
Instance-level recall: 98 (soft)
Instance-level recall: 98.4 (Levenshtein)
Instance-level recall: 98.2 (RatcliffObershelp)
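
The instance-level recall values follow directly from the counts above, as correct instances divided by expected instances. A short check (variable names are illustrative):

```python
expected_instances = 1000
correct_instances = {
    "strict": 946,
    "soft": 980,
    "Levenshtein": 984,
    "RatcliffObershelp": 982,
}

# Recall in percent: correct / expected.
instance_recall = {
    name: round(100 * count / expected_instances, 2)
    for name, count in correct_instances.items()
}
print(instance_recall)
```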
Citation metadata
Evaluation on 1000 random PDF files out of 998 PDF (ratio 1.0).
Strict Matching (exact matches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 80.99 | 78.01 | 79.47 | 44770 |
| date | 84.35 | 80.62 | 82.44 | 45457 |
| first_author | 91.25 | 87.86 | 89.53 | 44770 |
| inTitle | 81.61 | 83.12 | 82.36 | 42795 |
| issue | 93.41 | 92.1 | 92.75 | 18983 |
| page | 93.75 | 77.48 | 84.84 | 40844 |
| title | 59.85 | 60.22 | 60.03 | 43101 |
| volume | 95.68 | 95.59 | 95.64 | 40458 |
| all fields (micro avg.) | 84.08 | 81.03 | 82.52 | 321178 |
| all fields (macro avg.) | 85.11 | 81.87 | 83.38 | 321178 |
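
Each `f1` value is the harmonic mean of the precision and recall in the same row. The sketch below recomputes it for three rows of the table above. Because the table shows rounded precision and recall, a recomputed value can occasionally differ from the published one in the last digit. `f1_score` is an illustrative helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) taken from the strict-matching citation table
for label, (p, r) in {
    "authors": (80.99, 78.01),
    "page": (93.75, 77.48),
    "title": (59.85, 60.22),
}.items():
    print(label, round(f1_score(p, r), 2))
```

The `page` row shows how the harmonic mean penalizes imbalance: a precision of 93.75 with a recall of 77.48 yields an F1 below their arithmetic mean.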
Soft Matching (ignoring punctuation, case and space characters mismatches)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 81.31 | 78.32 | 79.79 | 44770 |
| date | 84.35 | 80.62 | 82.44 | 45457 |
| first_author | 91.48 | 88.08 | 89.75 | 44770 |
| inTitle | 85.39 | 86.98 | 86.18 | 42795 |
| issue | 93.41 | 92.1 | 92.75 | 18983 |
| page | 93.75 | 77.48 | 84.84 | 40844 |
| title | 91.71 | 92.27 | 91.99 | 43101 |
| volume | 95.68 | 95.59 | 95.64 | 40458 |
| all fields (micro avg.) | 89.15 | 85.91 | 87.5 | 321178 |
| all fields (macro avg.) | 89.63 | 86.43 | 87.92 | 321178 |
Levenshtein Matching (Minimum Levenshtein distance at 0.8)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 90.42 | 87.09 | 88.73 | 44770 |
| date | 84.35 | 80.62 | 82.44 | 45457 |
| first_author | 92.01 | 88.6 | 90.27 | 44770 |
| inTitle | 86.28 | 87.88 | 87.07 | 42795 |
| issue | 93.41 | 92.1 | 92.75 | 18983 |
| page | 93.75 | 77.48 | 84.84 | 40844 |
| title | 94.23 | 94.81 | 94.52 | 43101 |
| volume | 95.68 | 95.59 | 95.64 | 40458 |
| all fields (micro avg.) | 90.97 | 87.67 | 89.29 | 321178 |
| all fields (macro avg.) | 91.27 | 88.02 | 89.53 | 321178 |
Ratcliff/Obershelp Matching (Minimum Ratcliff/Obershelp similarity at 0.95)
Field-level results
| label | precision | recall | f1 | support |
|---|---|---|---|---|
| authors | 84.75 | 81.63 | 83.16 | 44770 |
| date | 84.35 | 80.62 | 82.44 | 45457 |
| first_author | 91.25 | 87.86 | 89.53 | 44770 |
| inTitle | 85.02 | 86.6 | 85.8 | 42795 |
| issue | 93.41 | 92.1 | 92.75 | 18983 |
| page | 93.75 | 77.48 | 84.84 | 40844 |
| title | 93.61 | 94.18 | 93.89 | 43101 |
| volume | 95.68 | 95.59 | 95.64 | 40458 |
| all fields (micro avg.) | 89.81 | 86.55 | 88.15 | 321178 |
| all fields (macro avg.) | 90.23 | 87.01 | 88.51 | 321178 |
Instance-level results
Total expected instances: 48449
Total extracted instances: 47788
Total correct instances: 13501 (strict)
Total correct instances: 22245 (soft)
Total correct instances: 24850 (Levenshtein)
Total correct instances: 23233 (RatcliffObershelp)
Instance-level precision: 28.25 (strict)
Instance-level precision: 46.55 (soft)
Instance-level precision: 52 (Levenshtein)
Instance-level precision: 48.62 (RatcliffObershelp)
Instance-level recall: 27.87 (strict)
Instance-level recall: 45.91 (soft)
Instance-level recall: 51.29 (Levenshtein)
Instance-level recall: 47.95 (RatcliffObershelp)
Instance-level f-score: 28.06 (strict)
Instance-level f-score: 46.23 (soft)
Instance-level f-score: 51.64 (Levenshtein)
Instance-level f-score: 48.28 (RatcliffObershelp)
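
The instance-level scores can be rederived from the three counts: precision is correct over extracted, recall is correct over expected, and the f-score is their harmonic mean. A minimal check, with illustrative names:

```python
EXPECTED = 48449   # total expected instances
EXTRACTED = 47788  # total extracted instances


def instance_scores(correct: int, extracted: int = EXTRACTED, expected: int = EXPECTED):
    """Return (precision, recall, f-score) in percent, rounded to two decimals."""
    precision = correct / extracted
    recall = correct / expected
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return tuple(round(100 * value, 2) for value in (precision, recall, fscore))


for name, correct in [("strict", 13501), ("soft", 22245),
                      ("Levenshtein", 24850), ("RatcliffObershelp", 23233)]:
    print(name, instance_scores(correct))
```

An instance counts as correct only when every field of the reference matches, which is why these scores are far lower than the field-level ones.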
Matching 1 : 35094
Matching 2 : 1260
Matching 3 : 3278
Matching 4 : 1849
Total matches : 41481
Citation context resolution
Total expected references: 48449 - 48.45 references per article
Total predicted references: 47788 - 47.79 references per article
Total expected citation contexts: 69755 - 69.75 citation contexts per article
Total predicted citation contexts: 74480 - 74.48 citation contexts per article
Total correct predicted citation contexts: 57425 - 57.42 citation contexts per article
Total wrong predicted citation contexts: 17055 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)
Precision citation contexts: 77.1
Recall citation contexts: 82.32
F-score citation contexts: 79.63
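
The citation context figures are internally consistent, which the sketch below verifies from the raw totals. The article count of 1000 is inferred from the per-article averages and is an assumption here. Variable names are illustrative.

```python
ARTICLES = 1000  # assumed from the per-article averages above

expected_contexts = 69755
predicted_contexts = 74480
correct_contexts = 57425

# Predicted contexts that did not match an expected one.
wrong_contexts = predicted_contexts - correct_contexts

precision = correct_contexts / predicted_contexts
recall = correct_contexts / expected_contexts
fscore = 2 * precision * recall / (precision + recall)

print("wrong:", wrong_contexts)
print("predicted per article:", round(predicted_contexts / ARTICLES, 2))
print("precision:", round(100 * precision, 2))
print("recall:", round(100 * recall, 2))
print("f-score:", round(100 * fscore, 2))
```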
Evaluation metrics produced in 678.894 seconds