Corrections and Errors

Benchmark is a measurement instrument. Things break. AI products are nondeterministic. Our parsing has edge cases. Our headline pipeline sometimes misstates numbers. We want to know when we are wrong.

How to report something

GitHub issue (preferred, public record): github.com/news-community/public/issues

Email: [email protected]

Either is fine. Please include:

The page URL where you saw the issue
What looks wrong
What you think the correct answer is
A screenshot if possible

What we triage

Factual errors in AI responses about a candidate. These are errors the AI made, not ours. We log them as findings, surface patterns publicly, and may report systemic issues to the provider.

Methodology bugs. For example: our parser missed a refusal, our aggregator double-counted, or our headline misstated a number. We fix the code, re-run aggregation, and log a version event so the trend chart visibly marks the boundary.
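To make the double-counting case concrete, here is a minimal Python sketch of a refusal-rate aggregation that deduplicates on a response ID before counting. The `Response` record, its field names, and `refusal_rate` are hypothetical illustrations, not Benchmark's actual aggregator; the point is only that counting unique IDs keeps a response ingested twice from inflating the rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    response_id: str  # hypothetical unique ID assigned when a response is captured
    candidate: str
    refused: bool     # output of the refusal parser


def refusal_rate(responses: list[Response], candidate: str) -> Optional[float]:
    """Share of unique responses about `candidate` that were refusals.

    Deduplicates on response_id, so a response ingested twice
    (for example after a retried write) is counted once.
    Returns None when there are no responses for the candidate.
    """
    unique: dict[str, Response] = {}
    for r in responses:
        if r.candidate == candidate:
            unique.setdefault(r.response_id, r)  # keep the first copy only
    if not unique:
        return None
    refusals = sum(1 for r in unique.values() if r.refused)
    return refusals / len(unique)


# Example with made-up rows: "r1" was ingested twice.
rows = [
    Response("r1", "Smith", True),
    Response("r1", "Smith", True),
    Response("r2", "Smith", False),
]
print(refusal_rate(rows, "Smith"))  # 0.5
```

Without the deduplication step the same rows would yield 2/3 instead of 1/2, which is the kind of number a code fix plus a version event is meant to correct.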

Candidate metadata errors. Wrong party affiliation, wrong FEC ID, misspelled name, or wrong race assignment. Each is a single-line YAML fix; the dashboard reflects it after the next aggregation run.
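Some metadata mistakes can be caught mechanically before they reach the dashboard. A minimal sketch, assuming the YAML record has already been loaded into a Python object with hypothetical fields `name`, `party`, `fec_id`, and `race`. The FEC check only tests shape (nine uppercase letters or digits, starting with H, S, or P for House, Senate, or President); it is an assumption-level check, not a lookup against FEC data.

```python
import re
from dataclasses import dataclass

# Assumption: an FEC candidate ID is 9 uppercase letters/digits whose first
# letter is H (House), S (Senate), or P (President). Shape check only.
FEC_ID_SHAPE = re.compile(r"[HSP][0-9A-Z]{8}")


@dataclass
class Candidate:
    name: str
    party: str
    fec_id: str
    race: str  # hypothetical race key, e.g. "2024-house-zz-01"


def validate(c: Candidate) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field_name in ("name", "party", "race"):
        if not getattr(c, field_name).strip():
            problems.append(f"{field_name} is empty")
    if not FEC_ID_SHAPE.fullmatch(c.fec_id):
        problems.append(f"fec_id {c.fec_id!r} does not look like an FEC candidate ID")
    return problems


# Made-up records for illustration.
print(validate(Candidate("Jane Doe", "DEM", "H0ZZ00001", "2024-house-zz-01")))  # []
print(validate(Candidate("Jane Doe", "DEM", "12345", "2024-house-zz-01")))
```

A check like this only flags malformed fields. A well-formed but wrong party affiliation or a plausible misspelling still needs a reader's report.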

Editorial framing concerns. A headline we wrote is leading or misleading. We rewrite it and log the change. When our summarizer and verifier disagree about framing, we already publish the disagreement (see /audit).

Response time

We are a two-person project working on this in our spare time. Realistic SLA: we triage within a few days, often the same day. Critical errors (numerical hallucinations published in headlines) get priority.
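The priority rule can be read as a sort key. A minimal sketch, assuming each incoming report carries a hypothetical `critical` flag set when it concerns a numerical hallucination in a published headline: critical reports go first, and within each group the oldest report is handled first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Report:
    url: str           # page where the reader saw the issue
    received: datetime
    critical: bool     # hypothetical flag: numerical hallucination in a published headline


def triage_order(reports: list[Report]) -> list[Report]:
    """Critical reports first, then oldest first within each group."""
    # `not r.critical` is False for critical reports, and False sorts before True.
    return sorted(reports, key=lambda r: (not r.critical, r.received))


queue = triage_order([
    Report("/a", datetime(2024, 5, 1, 9), critical=False),
    Report("/b", datetime(2024, 5, 2, 9), critical=True),
    Report("/c", datetime(2024, 4, 30, 9), critical=False),
])
print([r.url for r in queue])  # ['/b', '/c', '/a']
```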

The public correction record

Every methodology change, model snapshot change, prompt change, or weighting change fires a version event that appears as an orange dashed line on every trend chart in the dashboard. This is our public correction log. The full version event history is at /audit.
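One way to model version events and their chart markers, as a minimal sketch: the `VersionEvent` record, the kind labels, and both helper functions are hypothetical, not the dashboard's actual code. `markers_in_range` returns the dates where a chart covering a given window would draw its dashed lines, and `segment_of` tells which side of those lines a data point falls on, assuming an event takes effect on its own date.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the four change kinds that fire a version event.
KINDS = {"methodology", "model_snapshot", "prompt", "weighting"}


@dataclass(frozen=True)
class VersionEvent:
    day: date
    kind: str
    note: str

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown version event kind: {self.kind!r}")


def markers_in_range(events: list[VersionEvent], start: date, end: date) -> list[date]:
    """Dates where a chart covering [start, end] draws a dashed marker line."""
    return sorted({e.day for e in events if start <= e.day <= end})


def segment_of(day: date, events: list[VersionEvent]) -> int:
    """Count of version events on or before `day`.

    Points with the same segment number sit between the same pair of
    markers, so they were produced under the same versioned setup.
    """
    boundaries = sorted(e.day for e in events)
    return bisect_right(boundaries, day)


events = [
    VersionEvent(date(2024, 3, 1), "prompt", "reworded candidate query"),
    VersionEvent(date(2024, 5, 10), "model_snapshot", "provider snapshot update"),
]
print(markers_in_range(events, date(2024, 1, 1), date(2024, 4, 1)))  # [datetime.date(2024, 3, 1)]
print(segment_of(date(2024, 3, 1), events))  # 1
```

Rejecting unknown kinds at construction time keeps a typo from silently producing an event that no chart knows how to draw.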

What we do not correct

AI response randomness itself. The same query gets different answers on different days. That variability is what we are measuring. We capture, version, and surface it rather than smooth it away.
Candidate-positive or candidate-negative findings. We report what the AI said. If you think a finding makes your preferred candidate look bad, that is a finding about the AI, not an editorial choice we would reverse.
Correctly computed numbers whose methodology you disagree with. We welcome methodology critique (it goes in docs/decisions/), but we do not retroactively change historical numbers to match a new view.
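For the methodology-critique case, the rule against retroactive changes can be enforced by keying every aggregate on both its date and the methodology version that produced it, so a new view adds rows next to the old ones instead of overwriting them. A minimal in-memory sketch; `AggregateStore`, the version labels, and the values are hypothetical, and a real store would persist these rows.

```python
from datetime import date


class AggregateStore:
    """In-memory aggregates keyed by (day, methodology_version).

    A row, once written, is never overwritten: a new methodology is
    recorded under a new version label alongside the old numbers.
    """

    def __init__(self) -> None:
        self._rows: dict[tuple[date, str], float] = {}

    def write(self, day: date, version: str, value: float) -> None:
        key = (day, version)
        if key in self._rows:
            raise ValueError(f"aggregate for {day} under {version!r} already exists")
        self._rows[key] = value

    def read(self, day: date, version: str) -> float:
        return self._rows[(day, version)]

    def versions_for(self, day: date) -> list[str]:
        """Methodology versions with a row for `day`, in the order they were written."""
        return [v for (d, v) in self._rows if d == day]


store = AggregateStore()
store.write(date(2024, 1, 1), "v1", 0.25)
store.write(date(2024, 1, 1), "v2", 0.30)  # new view: added next to v1, not over it
print(store.versions_for(date(2024, 1, 1)))  # ['v1', 'v2']
```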