Corrections and Errors
The benchmark is a measurement instrument, and instruments break. AI products return nondeterministic answers. Our parsing has edge cases. Our headline pipeline sometimes misstates numbers. We want to know when we are wrong.
How to report something
GitHub issue (preferred, public record): github.com/news-community/public/issues
Email: [email protected]
Either is fine. Please include:
- The page URL where you saw the issue
- What looks wrong
- What you think the correct answer is
- A screenshot if possible
What we triage
Factual errors in AI responses about a candidate. These are errors the AI made, not ours. We log them as findings, surface patterns publicly, and may report systemic issues to the provider.
Methodology bugs. For example, our parser missed a refusal, our aggregator double-counted, or our headline misstated a number. We fix the code, re-run aggregation, and log a version event so the trend chart visibly marks the boundary.
Candidate metadata errors. Wrong party affiliation, wrong FEC ID, misspelled name, wrong race assignment. Each is a single-line YAML fix; the dashboard reflects it on the next aggregate run.
Editorial framing concerns. We wrote a headline in a way that is leading or misleading. We rewrite and log the change. When our summarizer and verifier disagree about framing, we already publish the disagreement (see /audit).
Response time
We are a two-person project doing this in our spare time. Realistic SLA: we triage within a few days, often the same day. Critical errors (numerical hallucinations published in headlines) get priority.
The public correction record
Every methodology change, model snapshot change, prompt change, or weighting change fires a version event that appears as an orange dashed line on every trend chart in the dashboard. This is our public correction log. The full version event history is at /audit.
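As an illustration only, a version event can be thought of as a small dated record, and a trend chart as asking which events fall inside its date range. The sketch below assumes a hypothetical schema: the names VersionEvent, ChangeKind, and boundaries_in_range are invented for this example and are not the project's actual code or data format.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeKind(Enum):
    # The four change types named in the text above (labels are hypothetical).
    METHODOLOGY = "methodology"
    MODEL_SNAPSHOT = "model_snapshot"
    PROMPT = "prompt"
    WEIGHTING = "weighting"


@dataclass(frozen=True)
class VersionEvent:
    # Hypothetical record shape; the real schema may differ.
    day: date
    kind: ChangeKind
    note: str


def boundaries_in_range(events, start, end):
    """Return the events dated within [start, end], oldest first.

    A chart covering that range would draw one marker per returned event.
    """
    in_range = (e for e in events if start <= e.day <= end)
    return sorted(in_range, key=lambda e: e.day)
```

Keeping events immutable (frozen=True) fits the idea of a correction log: a past event is never edited, only followed by a newer one.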
What we do not correct
- AI response randomness itself. The same query gets different answers on different days. That variability is what we are measuring. We capture, version, and surface it rather than smooth it away.
- Candidate-positive or candidate-negative findings. We report what the AI said. If you think a finding makes your preferred candidate look bad, that is a finding about the AI, not an editorial choice we would reverse.
- Correctly computed numbers whose methodology you disagree with. We are open to methodology critique (it goes in docs/decisions/), but we do not retroactively change historical numbers to match a new view.