Data validation
Every number on CivAccount must trace to a .gov.uk, ONS or open-government document. In a few clicks you should be able to open the document and find the number yourself.
The rule
Zero hallucination. Zero estimation. Zero algorithms to guess or fill gaps. If a figure cannot be traced to a specific row, cell or page of a public government document, it does not render.
Where a value is live on the site but does not yet meet that bar, you'll see a “Data validation in progress” notice next to it. The notice points at the source we do have, so you can check the figure against that document while we add the row-level citation.
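The rule and the notice together amount to a three-way display decision. The sketch below is illustrative only: the figure shape, the field names (`source`, `url`, `locator`) and the state names are assumptions, not the site's actual data model.

```javascript
// Hypothetical figure shape: { value, source: { url, locator } }.
// `locator` stands for the row, cell or page reference.
function displayState(figure) {
  const source = figure.source;
  if (!source || !source.url) {
    return "hidden"; // no traceable document: the figure does not render
  }
  if (!source.locator) {
    return "in_progress"; // source known, row-level citation still missing
  }
  return "cited"; // traceable to a specific row, cell or page
}
```

There is deliberately no branch that fills in a missing value: a figure is either traceable, flagged, or absent.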
How verification works
1. Scrape from the canonical source
Each data category has a canonical publisher: MHCLG for council tax and revenue outturn, ONS for population, DEFRA for waste, DfT for roads, Ofsted for children's services, LGBCE for councillors, and each council's own transparency pages for leadership and local spend. We scrape directly from those publishers.
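One way to keep that mapping explicit is a small lookup table that fails loudly for an unknown category instead of guessing. This is a sketch: the category keys and the `publisherFor` helper are hypothetical, and only the publisher names come from the list above.

```javascript
// Category keys are illustrative; "council" means the council's own
// transparency pages.
const CANONICAL_PUBLISHERS = new Map([
  ["council_tax", "MHCLG"],
  ["revenue_outturn", "MHCLG"],
  ["population", "ONS"],
  ["waste", "DEFRA"],
  ["roads", "DfT"],
  ["childrens_services", "Ofsted"],
  ["councillors", "LGBCE"],
  ["leadership", "council"],
  ["local_spend", "council"],
]);

function publisherFor(category) {
  const publisher = CANONICAL_PUBLISHERS.get(category);
  if (publisher === undefined) {
    // No fallback publisher: an unmapped category is an error, not a guess.
    throw new Error(`No canonical publisher configured for "${category}"`);
  }
  return publisher;
}
```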
2. Archive the raw file
Every source file is preserved with its fetch date, so citations don't rot when a government site reorganises. The archive is being made public (see below).
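As a rough illustration, an archive entry might pair the source URL with its fetch date and derive a dated storage path. The field names and path layout here are assumptions, not the project's actual archive format, and the URL in the usage note is a placeholder.

```javascript
// Builds a metadata record for one archived raw file.
function archiveEntry(sourceUrl, fetchedAt) {
  const fetchDate = fetchedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const { hostname, pathname } = new URL(sourceUrl);
  // Last path segment as the file name; "index" when the path is just "/".
  const fileName = pathname.split("/").filter(Boolean).pop() || "index";
  return {
    sourceUrl,
    fetchDate,
    archivePath: `${hostname}/${fetchDate}/${fileName}`,
  };
}
```

For example, `archiveEntry("https://example.org/data/band-d.csv", new Date("2025-01-15T09:30:00Z"))` yields an `archivePath` of `example.org/2025-01-15/band-d.csv`, so two fetches of the same file on different days never overwrite each other.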
3. Extract with the method recorded
For CSVs: cell lookup by ONS code and column name. For PDFs: text extraction with page number recorded. For council pages: HTML scraping with the selector preserved. The extraction method is part of the citation.
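For the CSV case, the lookup and its citation can be produced in one step. The sketch below assumes a simple CSV with a header row and no quoted commas (a real pipeline would use a proper CSV parser); the function name, citation fields and sample data are all made up for illustration.

```javascript
// Finds one cell by ONS code and column name, and returns it with a citation
// that records how it was extracted. Returns null rather than guessing.
function lookupCell(csvText, codeColumn, onsCode, valueColumn) {
  const [headerLine, ...rows] = csvText.trim().split(/\r?\n/);
  const headers = headerLine.split(",");
  const codeIdx = headers.indexOf(codeColumn);
  const valueIdx = headers.indexOf(valueColumn);
  if (codeIdx === -1 || valueIdx === -1) return null; // column missing
  for (let i = 0; i < rows.length; i++) {
    const cells = rows[i].split(",");
    if (cells[codeIdx] === onsCode) {
      return {
        value: cells[valueIdx], // kept as the source's own string
        citation: {
          method: "csv_cell_lookup",
          onsCode,
          column: valueColumn,
          row: i + 2, // 1-based line in the file, counting the header
        },
      };
    }
  }
  return null; // code not found
}

// Made-up sample data, not real council figures.
const sampleCsv =
  "ons_code,name,band_d\nE00000001,Exampleshire,1000.00\nE00000002,Sampleton,1200.50";
console.log(lookupCell(sampleCsv, "ons_code", "E00000002", "band_d"));
```

Keeping the value as the source's own string avoids rounding or reformatting a number before a reader compares it with the document.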
4. Sample-verify by human
Automated extraction is reviewed on a rolling sample. Where a value was extracted by OCR or an LLM tool, a human re-reads the source and records the verification. We don't publish values that haven't been confirmed against the source.
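A review queue along these lines could combine both rules: every OCR- or LLM-extracted value, plus a rotating slice of everything else. The round-robin scheme, the `sampleEvery` parameter and the `method` labels are assumptions; the text above only says "rolling sample".

```javascript
// Extraction methods that always get a human re-read.
const ALWAYS_REVIEW = new Set(["ocr", "llm"]);

// Picks the records to review in a given round. Over `sampleEvery`
// consecutive rounds, every record is selected at least once.
function reviewQueue(records, round, sampleEvery = 10) {
  return records.filter(
    (record, index) =>
      ALWAYS_REVIEW.has(record.method) ||
      index % sampleEvery === round % sampleEvery
  );
}
```

A deterministic rotation is used here instead of random sampling so that coverage over time is predictable; a random sample would be an equally valid reading of "rolling".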
5. Render with a cited source link
Every number carries a tappable source popover. The link opens the document at the location the figure came from. If the source URL returns a 404, we fall back to the archived copy.
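The fallback can be sketched as a pure function over the result of an availability check. The citation fields (`sourceUrl`, `archiveUrl`) are hypothetical, and `liveStatus` is assumed to come from a separate HTTP check that is not shown; only the 404 case is handled because that is the only one the text describes.

```javascript
// Chooses which URL the popover should open.
function resolveSourceLink(citation, liveStatus) {
  if (liveStatus !== 404) {
    return { url: citation.sourceUrl, archived: false };
  }
  if (citation.archiveUrl) {
    return { url: citation.archiveUrl, archived: true };
  }
  return null; // live source gone and no archived copy to open
}
```

Returning the `archived` flag lets the popover tell the reader they are looking at a preserved copy and not the live page.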
Current validation status
Snapshot as of publication. A live report is generated by scripts/validate/audit-provenance.mjs.
Verified sources, row-level citations in progress
Council tax (Band D 2021–2026, historical series), revenue-expenditure service budgets, population, DEFRA waste destinations and recycling rate, DfT road condition and length, Ofsted children’s services rating, LGBCE councillor counts, MHCLG housing supply, council reserves, capital programme. All sourced from published national CSVs. Row-level citation UI is the next step.
Data validation in progress
Supplier totals, currently from Contracts Finder (a .gov.uk register of contract ceilings, not payments). Grant entries for councils without a raw source file in the repo. Staff FTE, where we are re-tracing the build path against ONS Public Sector Personnel. CEO salary, cabinet, councillor allowances, salary bands and MTFS figures, which are PDF-scraped, with row-level citations and re-verification queued.
Quiet affirmation (raw source on file)
Grants for Barnet, Birmingham, Cambridgeshire, Camden, Epping Forest, Essex, South Oxfordshire, Trafford, Vale of White Horse — each sourced from a 360Giving CSV/XLSX or spending CSV preserved in the repo.
Removed
The red/amber/green (RAG) colour coding on performance indicators used CivAccount-invented thresholds with no statutory basis. We’ve removed the colour; the metric, value, period and any published target stay.
How to spot-check a number
- Tap any figure on the site — a popover shows the source title, data year, and an “Open source document” link.
- Click the link. The document opens directly in a new tab.
- For a CSV source, look up the row by ONS code or council name. For a PDF, use the page reference shown.
- If the document has moved, the popover falls back to the archived copy from when we last verified.
- If the number doesn't match what the source says, tap “Report incorrect data” on the popover — every report is triaged and the triage decision is recorded with the original citation.
Source archive
Raw scraped files are preserved alongside their citations. We are open about the automation involved: extraction is scripted and verification is sampled, so reports of discrepancies against the live source are welcome.
Public archive repository
The civaccount-source-archive repo (scaffold in progress) will mirror the raw files that back every citation, with fetch dates and source URLs. Until it's public, raw files live alongside the private dataset. Citation links on the site open the live source directly, so you don't need the archive to verify a figure.
Report a discrepancy
Every popover includes a “Report incorrect data” link that pre-fills a form with the council, field, current value, and source we cited. Reports go into a triage queue. We respond in one of three ways: correcting the rendered value, updating the citation, or (if the source itself is wrong) publishing the discrepancy alongside the figure.
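Pre-filling can be as simple as encoding the four fields into the form's query string. This is a minimal sketch: the base URL and parameter names are placeholders, not the site's real form.

```javascript
// Builds a pre-filled report link from the figure the reader tapped.
function reportLink(baseUrl, { council, field, currentValue, sourceUrl }) {
  const params = new URLSearchParams({
    council,
    field,
    value: String(currentValue),
    source: sourceUrl, // the citation as shown to the reader
  });
  return `${baseUrl}?${params.toString()}`;
}
```

Carrying the cited source in the link means the triage record shows exactly what the reader was comparing against, even if the citation is updated later.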