Benchmarks

Transparent evaluation for clinical AI.

Clinical buyers should be able to inspect how a system was tested, where it performs well, and where it does not. GuidelinesIQ evaluates against a fixed clinician benchmark and reviews failures directly on the real app path.

Why publish this

The benchmark exists to make retrieval behavior, citation grounding, and failure cases inspectable. We do not present evaluation as solved; we publish the current state and run the same harness from release to release, so changes are measurable rather than anecdotal.

Current snapshot

Dataset: 500 questions / 12 categories
Pass rate: 35.80%
Citation presence: 73.40%
P50 latency: 71,861.89 ms
P95 latency: 120,006.87 ms

Snapshot from 2026-03-22. The benchmark metadata and an example evidence packet are published from committed JSON rather than inline marketing copy.
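
As an illustration of how headline figures like the ones above can be derived from per-case results, here is a minimal Python sketch. It is not the GuidelinesIQ harness: the `CaseResult` record, its field names, and the nearest-rank percentile method are all assumptions made for the example, since the page does not state how percentiles are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical per-case record; these field names are assumptions.
    passed: bool
    has_citation: bool
    latency_ms: float


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, not the harness's)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Smallest rank whose share of the data is at least pct percent.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results):
    """Aggregate per-case results into snapshot-style summary metrics."""
    n = len(results)
    if n == 0:
        raise ValueError("summarize() needs at least one result")
    latencies = [r.latency_ms for r in results]
    return {
        "cases": n,
        "pass_rate_pct": round(100 * sum(r.passed for r in results) / n, 2),
        "citation_presence_pct": round(
            100 * sum(r.has_citation for r in results) / n, 2
        ),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
    }
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so whichever method is used should stay fixed between releases for the numbers to remain comparable.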

Methodology

Benchmark construction

Questions are organized into 12 categories across a 500-case clinician benchmark. The current public snapshot uses the committed benchmark corpus and the real application path.

Scoring

Responses are graded for successful return, grounding or refusal behavior, citation presence when supported, and pass/fail under the benchmark rubric.
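
The grading order described above can be sketched roughly as follows. This is an assumed rubric for illustration only, not the published one: the `Case` and `Response` types are hypothetical, the interpretation of the `blank_answer`, `citation_title`, and `error` labels is a guess based on their names, and `ungrounded_answer` is a label invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Case:
    # Hypothetical: title of the protocol that should be cited, or None
    # when the corpus does not support an answer and a refusal is expected.
    expected_title: Optional[str]


@dataclass
class Response:
    # Hypothetical response shape; all field names are assumptions.
    ok: bool                      # the request returned without an error
    answer: str = ""
    refused: bool = False
    cited_titles: List[str] = field(default_factory=list)


def grade(case: Case, response: Response) -> str:
    """Return "pass" or a failure label under an assumed rubric."""
    # 1. Successful return.
    if not response.ok:
        return "error"
    # 2. An empty answer that is not an explicit refusal.
    if not response.answer.strip() and not response.refused:
        return "blank_answer"
    # 3. Refusal behavior when the corpus does not support an answer.
    if case.expected_title is None:
        return "pass" if response.refused else "ungrounded_answer"
    # 4. Citation presence when an answer is supported.
    if case.expected_title not in response.cited_titles:
        return "citation_title"
    return "pass"
```

Checking the conditions in a fixed order means each response receives exactly one outcome, which keeps failure counts additive across the benchmark.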

Execution path

The benchmark hits the production chat path rather than a synthetic shortcut so the results reflect real retrieval and generation behavior.

Current results snapshot

clinical-airway: 93.75%
clinical-burn: 0.00%
clinical-cardiac: 68.75%
clinical-hemostasis: 0.00%
clinical-neuro: 56.94%
clinical-ortho: 46.43%
clinical-thoracic: 0.00%
cross-protocol: 100.00%
general-trauma: 27.21%
icu-workflow: 33.33%
safety-policy: 100.00%
workflow: 8.33%
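
A per-category breakdown like the one above can be produced by grouping graded cases by category. The sketch below assumes each graded case is available as a `(category, passed)` pair; that input shape and the function name are assumptions for illustration, not the harness's actual data model.

```python
from collections import defaultdict


def pass_rate_by_category(results):
    """Compute the pass rate (in percent) for each category.

    `results` is an iterable of (category, passed) pairs, an assumed shape.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        if passed:
            passes[category] += 1
    # Sort categories so the output order is stable between runs.
    return {
        category: round(100 * passes[category] / totals[category], 2)
        for category in sorted(totals)
    }
```

Reporting the per-category case counts alongside the percentages would make it easier to tell a small category's 0.00% or 100.00% from the same figure on a large one.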

Benchmark notes

Current public benchmark metadata

Benchmark name: UAB Trauma Protocols Clinician Benchmark Draft
Version: 1.0.0-draft
Generated: 2026-03-22T13:46:07.858705+00:00
Review required: Yes
Canonical protocols: 59
Citations supported: No

Failure mix

blank_answer: 151
citation_title: 180
error: 2
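
A failure mix of this kind is a tally of non-passing outcome labels. A minimal sketch, assuming each graded case carries a single outcome string where `"pass"` marks success (that convention is an assumption, as is the function name):

```python
from collections import Counter


def failure_mix(outcomes):
    """Count failure labels, ignoring cases whose outcome is "pass"."""
    return Counter(label for label in outcomes if label != "pass")
```

For example, `failure_mix(["pass", "blank_answer", "error", "blank_answer"])` counts two `blank_answer` failures and one `error`.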

Draft benchmark generated from canonical protocol titles. Clinician review required before using for release gates.