Example Report
Public evidence packet built from a committed benchmark snapshot.
This example report is sourced from a real 500-case benchmark artifact committed to the repository. It is intentionally limited to data the current product records directly: benchmark metadata, aggregate outcomes, latency, failure mix, and category-level performance.
Report header
Snapshot date
2026-03-22
Run ID
20260322T031315Z
Run status
completed
Benchmark version
1.0.0-draft
Outcome metrics
Cases completed
500 / 500
Pass rate
35.80%
Success rate
69.80%
Citation presence
73.40%
Blank answers
151
Errors
2
P50 latency
71,861.89 ms
P95 latency
120,006.87 ms
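As a quick sanity check on the figures above, the reported rates can be converted back into case counts. The sketch below assumes each percentage was computed over all 500 cases, which the packet does not state explicitly. The function and variable names are illustrative and are not taken from the runner.

```python
# Values copied from the outcome metrics in this report.
TOTAL_CASES = 500
REPORTED_RATES_PCT = {
    "pass_rate": 35.80,
    "success_rate": 69.80,
    "citation_presence": 73.40,
}
REPORTED_LATENCY_MS = {"p50": 71_861.89, "p95": 120_006.87}


def rate_to_count(rate_pct: float, total: int) -> int:
    """Convert a percentage into a whole number of cases.

    Assumption: the rate was computed as count / total over all cases.
    """
    return round(rate_pct / 100 * total)


# Case counts implied by each reported rate.
counts = {
    name: rate_to_count(pct, TOTAL_CASES)
    for name, pct in REPORTED_RATES_PCT.items()
}

# Latencies converted from milliseconds to seconds for readability.
latency_s = {name: round(ms / 1000, 2) for name, ms in REPORTED_LATENCY_MS.items()}

print(counts)
print(latency_s)
```

Under that assumption the rates correspond to 179 passing cases, 349 successful cases, and 367 answers with a citation present, and the latencies are roughly 71.86 s (P50) and 120.01 s (P95). The 349 figure equals 500 minus the 151 blank answers, though the packet does not say whether success is defined that way.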
What this public packet includes
Benchmark provenance
Benchmark: UAB Trauma Protocols Clinician Benchmark Draft, with 500 questions across 12 categories. Clinician review is still required before release-gate use.
Failure transparency
The packet exposes failure reason counts directly from the runner artifact rather than rewriting them into generic marketing language.
Current limitation
Indexing-quality diagnostics and retrieval-probe sections are not yet published because those report layers are not implemented in the current runtime pipeline.
Category performance
Failure mix
Failure reason counts
Interpretation notes
This packet is intentionally conservative. It publishes only values present in the benchmark artifact and does not backfill unavailable fields with synthetic retrieval or report-quality scores.
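The "publish only what is present" rule can be illustrated with a minimal sketch. The field names and the `build_public_packet` helper below are hypothetical, since the real artifact schema is not shown in this packet. The point is only that a missing field is left out rather than filled with an estimate.

```python
from typing import Any, Mapping

# Hypothetical field names, chosen for illustration only.
PUBLIC_FIELDS = (
    "run_id",
    "run_status",
    "benchmark_version",
    "pass_rate",
    "retrieval_score",
)


def build_public_packet(artifact: Mapping[str, Any]) -> dict[str, Any]:
    """Copy only the fields that exist in the artifact with a non-None value.

    Nothing is defaulted, interpolated, or synthesized for missing fields.
    """
    return {
        name: artifact[name]
        for name in PUBLIC_FIELDS
        if name in artifact and artifact[name] is not None
    }


# Example artifact using values from this report's header and metrics.
artifact = {
    "run_id": "20260322T031315Z",
    "run_status": "completed",
    "benchmark_version": "1.0.0-draft",
    "pass_rate": 35.80,
    # "retrieval_score" is absent, so it is omitted rather than estimated.
}

packet = build_public_packet(artifact)
print(packet)
```

Here the resulting packet carries four fields and no `retrieval_score` key at all, which mirrors how this report omits unavailable sections instead of showing placeholder scores.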
The runner output currently reports citation support as not enabled, so the public packet makes no citation-correctness claims. It reports only citation presence and category-level pass rate.
Draft benchmark generated from canonical protocol titles. Clinician review required before using for release gates.
