Benchmarks
Transparent evaluation for clinical AI.
Clinical buyers should be able to inspect how a system was tested, where it performs well, and where it does not. GuidelinesIQ evaluates against a fixed clinician benchmark and reviews failures directly on the real application path.
Why publish this
The benchmark exists to make retrieval behavior, citation grounding, and failure cases inspectable. We do not present evaluation as solved; we publish the current state and use the same harness from release to release so changes are measurable rather than anecdotal.
Current snapshot
Snapshot from 2026-03-22. Benchmark metadata and an example evidence packet are published from committed JSON rather than inline marketing copy.
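As a minimal sketch of what "published from committed JSON" can look like, the snippet below reads a metadata file and prints a few fields. The file path (`benchmarks/metadata.json`) and field names are illustrative assumptions, not the actual repository layout.

```typescript
// Hedged sketch: render benchmark metadata from a committed JSON file
// instead of hard-coding it in page copy. The path and field names below
// are assumptions for illustration only.
import { readFileSync } from "node:fs";

interface BenchmarkMetadata {
  name: string;
  version: string;
  generatedAt: string; // ISO 8601 timestamp
  reviewRequired: boolean;
  canonicalProtocols: number;
  citationsSupported: boolean;
}

const raw = readFileSync("benchmarks/metadata.json", "utf8"); // hypothetical path
const metadata: BenchmarkMetadata = JSON.parse(raw);

console.log(`${metadata.name} (v${metadata.version})`);
console.log(`Generated: ${metadata.generatedAt}`);
console.log(`Clinician review required: ${metadata.reviewRequired ? "Yes" : "No"}`);
```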
Methodology
Benchmark construction
Questions are organized into 12 categories across a 500-case clinician benchmark. The current public snapshot uses the committed benchmark corpus and the real application path.
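To make the construction concrete, here is a minimal sketch of how a benchmark case and its category could be represented. The field names and the `expectedBehavior` label set are assumptions; the committed corpus defines the real schema.

```typescript
// Hedged sketch of a benchmark case record; the real corpus schema may differ.
interface BenchmarkCase {
  id: string;
  category: string; // one of the 12 question categories
  question: string;
  expectedBehavior: "answer_with_citation" | "refuse"; // assumed label set
}

// Count how many of the 500 cases fall into each category.
function countByCategory(cases: BenchmarkCase[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const c of cases) {
    counts.set(c.category, (counts.get(c.category) ?? 0) + 1);
  }
  return counts;
}
```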
Scoring
Responses are graded for successful return, grounding or refusal behavior, citation presence when citations are supported, and pass/fail under the benchmark rubric.
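A minimal sketch of a per-case grade under this rubric, assuming the field names below (the actual harness record may differ):

```typescript
// Hedged sketch of one graded response; field names are assumptions.
interface CaseGrade {
  returnedSuccessfully: boolean;   // a response came back at all
  groundedOrRefused: boolean;      // answered from protocol text or refused appropriately
  citationPresent: boolean | null; // null when the benchmark does not support citations
  pass: boolean;                   // overall pass/fail under the rubric
}

// Aggregate pass rate across all graded cases.
function passRate(grades: CaseGrade[]): number {
  if (grades.length === 0) return 0;
  return grades.filter((g) => g.pass).length / grades.length;
}
```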
Execution path
The benchmark hits the production chat path rather than a synthetic shortcut, so the results reflect real retrieval and generation behavior.
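As an illustration of this execution path, the sketch below posts one benchmark question to a chat endpoint the way the app would. The endpoint URL and request/response shapes are assumptions, not the actual API.

```typescript
// Hedged sketch: run one benchmark case against the production chat path.
// The endpoint URL and payload shape are illustrative assumptions.
async function runCase(question: string): Promise<string> {
  const res = await fetch("https://guidelinesiq.example/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: question }),
  });
  if (!res.ok) {
    throw new Error(`Chat endpoint returned ${res.status}`);
  }
  const body = (await res.json()) as { answer: string };
  return body.answer;
}
```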
Current results snapshot
Benchmark notes
Current public benchmark metadata
- Benchmark name: UAB Trauma Protocols Clinician Benchmark Draft
- Version: 1.0.0-draft
- Generated: 2026-03-22T13:46:07.858705+00:00
- Review required: Yes
- Canonical protocols: 59
- Citations supported: No
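For illustration, the metadata above could look like this once committed. The values mirror the published list; the field names are assumptions.

```typescript
// Hedged sketch of the committed snapshot metadata; field names are assumptions,
// values mirror the published list above.
const snapshotMetadata = {
  name: "UAB Trauma Protocols Clinician Benchmark Draft",
  version: "1.0.0-draft",
  generatedAt: "2026-03-22T13:46:07.858705+00:00",
  reviewRequired: true,
  canonicalProtocols: 59,
  citationsSupported: false,
} as const;
```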
Failure mix
Draft benchmark generated from canonical protocol titles. Clinician review is required before it is used for release gates.
