How to improve your published open spending data
Suggestions on how to publish better open spending data for use by both people and computers.
1. Publish in CSV format
- CSV is a universal data exchange format for tabular data and can be read by all spreadsheet software and programming languages
- Don't use PDF for tabular data - this cannot be accurately processed by any common software
- Don't use XLS or XLSX files - not everyone has Microsoft Excel and the formats can cause difficulties for other software
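As a rough illustration of the CSV recommendation above, the sketch below writes spending records to CSV using Python's standard csv module. The column names and the single sample record are invented for the example and are not taken from any real dataset.

```python
import csv
import io


def write_spend_csv(rows, fieldnames, out):
    """Write dict rows to an open text file as CSV with one header row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Invented sample record, for illustration only.
sample_rows = [
    {"Date": "2016-01-15", "Amount": "1250.00", "Supplier": "Example Supplies Ltd"},
]

buffer = io.StringIO()
write_spend_csv(sample_rows, ["Date", "Amount", "Supplier"], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function could be given a file opened with open(path, "w", newline="", encoding="utf-8").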
2. Publish easily accessible files
- Preferably one click to download the file - not a succession of separate pages before you get to the download link
- Use sensible unique file names which relate to the data eg 'LAname-spend-January-2016' not 'download' for every file!
- Add a version number to the filename if it has to be republished eg 'LAname-spend-January-2016-v1'
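One possible way to generate file names following the pattern above is sketched here. The function name spend_filename and the '.csv' extension are assumptions made for the example; the pattern itself comes from the suggestions above.

```python
import re


def spend_filename(authority, month, year, version=None):
    """Build a name like 'LAname-spend-January-2016' (hypothetical helper)."""
    stem = f"{authority}-spend-{month}-{year}"
    # Replace any runs of whitespace so the name is safe to use in a URL.
    stem = re.sub(r"\s+", "-", stem.strip())
    if version is not None:
        stem += f"-v{version}"
    return stem + ".csv"  # the extension is an assumption, not from the guidance


name = spend_filename("LAname", "January", 2016)
republished = spend_filename("LAname", "January", 2016, version=1)
```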
3. Publish clean data
- Do at least a minimal level of quality control on the data before publishing it *
- Are the dates in a sensible range? Financial transactions are unlikely to be over 6 years old or several years in the future.
- Dates should be in a consistent and unambiguous format throughout the file
- Dates should not be in a 5 digit Excel serial number format - who will understand these?
- Amount fields should contain only numbers - not £ signs
- Amount values should normally exclude VAT
- Key fields (ie date, amount, supplier name) should not be blank or contain #REF
- Unusual non-ascii characters that cannot easily be translated should be avoided
- Preferably there should be just one header row at the beginning of the file
- Preferably the header row should be reasonably consistent across similar files (and without simple spelling errors)
- Preferably there should not be a total row at the end of the file
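Several of the checks above can be automated before publication. The following is a minimal sketch only: the column names (Date, Amount, Supplier), the YYYY-MM-DD date format, and the 'one year ahead' limit are all assumptions chosen for the example, and the sample rows are invented. It flags blank key fields, #REF values, 5 digit Excel serial dates, out-of-range dates and non-numeric amounts.

```python
import csv
import io
import re
from datetime import date, datetime, timedelta

# Assumed column names; real files will vary.
KEY_FIELDS = ("Date", "Amount", "Supplier")
AMOUNT_RE = re.compile(r"^-?\d+(\.\d{1,2})?$")   # plain number, optional pence
EXCEL_SERIAL_RE = re.compile(r"^\d{5}$")         # eg 42370


def excel_serial_to_date(serial):
    """Convert an Excel serial number (1900 date system) to a date."""
    return date(1899, 12, 30) + timedelta(days=serial)


def check_row(row, today):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for field in KEY_FIELDS:
        value = (row.get(field) or "").strip()
        if not value:
            problems.append(f"{field} is blank")
        elif "#REF" in value:
            problems.append(f"{field} contains #REF")

    raw_date = (row.get("Date") or "").strip()
    if EXCEL_SERIAL_RE.match(raw_date):
        guess = excel_serial_to_date(int(raw_date)).isoformat()
        problems.append(f"Date is an Excel serial number (possibly {guess})")
    elif raw_date and "#REF" not in raw_date:
        try:
            parsed = datetime.strptime(raw_date, "%Y-%m-%d").date()
        except ValueError:
            problems.append("Date is not in YYYY-MM-DD format")
        else:
            # Assumed limits: 6 years back, 1 year ahead.
            oldest = today - timedelta(days=6 * 365)
            latest = today + timedelta(days=365)
            if not oldest <= parsed <= latest:
                problems.append("Date is outside the expected range")

    raw_amount = (row.get("Amount") or "").strip()
    if raw_amount and "#REF" not in raw_amount and not AMOUNT_RE.match(raw_amount):
        problems.append("Amount is not a plain number")
    return problems


# Invented sample data, for illustration only.
SAMPLE = """Date,Amount,Supplier
2016-01-15,1250.00,Example Supplies Ltd
42370,£300,Example Supplies Ltd
2016-01-20,#REF!,
"""

report = [check_row(row, today=date(2016, 3, 1))
          for row in csv.DictReader(io.StringIO(SAMPLE))]
```

Here report holds one list of problems per data row, so an empty list means the row passed every check in the sketch. The header, total row and non-ascii checks are not covered.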
4. Publish more data rather than less
- Publishing just the very basic requirements (Amount, Date and Supplier) is not very useful
- Directorate/Service, Expenditure Type and Procurement/Merchant category are now mandatory fields
- Adding extra data helps the user and probably involves little extra work to provide it
- Standard codes eg for SeRCOP or ProClass are helpful but purely internal codes are probably not worth publishing
5. Publish redacted records
- Redaction is appropriate and necessary in specific circumstances - see pages 7/8 of the LGA Local Transparency Guidance document (pdf).
- Records where data has been redacted should still be published with the sensitive data replaced by text like 'Redacted - personal data'
- The average overall rate of redaction is around 15% - if your rate is 0% or 50% then it is worth checking why
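The redaction approach above, and a simple check of the resulting redaction rate, might look something like the sketch below. The helper names, the column names and the sample rows are all invented for illustration; only the replacement text 'Redacted - personal data' comes from the suggestions above.

```python
REDACTED_TEXT = "Redacted - personal data"


def redact(row, fields):
    """Return a copy of the row with the named fields replaced by the marker."""
    return {k: (REDACTED_TEXT if k in fields else v) for k, v in row.items()}


def redaction_rate(rows):
    """Share of rows in which at least one value starts with 'Redacted'."""
    rows = list(rows)
    if not rows:
        return 0.0
    redacted = sum(
        1 for row in rows
        if any(str(v).startswith("Redacted") for v in row.values())
    )
    return redacted / len(rows)


# Invented sample data, for illustration only.
rows = [
    {"Date": "2016-01-15", "Amount": "1250.00", "Supplier": "Example Supplies Ltd"},
    {"Date": "2016-01-16", "Amount": "80.00", "Supplier": "Example Services Ltd"},
    {"Date": "2016-01-17", "Amount": "45.50", "Supplier": "A. Person"},
    {"Date": "2016-01-18", "Amount": "300.00", "Supplier": "Example Supplies Ltd"},
]
rows[2] = redact(rows[2], {"Supplier"})
rate = redaction_rate(rows)
```

The record stays in the file with its date and amount intact, so totals still add up, and the rate can be compared with the typical figure mentioned above.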
6. Provide licence and other explanatory information
- Always provide the licence details - preferably the Open Government Licence
- Provide additional explanatory notes about the data and any relevant issues eg if VAT is included for some good reason
7. Make use of published standards and guidance
- The basic DCLG publication requirements are set out in the Local authority data transparency code
- The LGA has detailed guidance for local authorities in its Publishing local spending data guide
- A useful summary is on data.gov.uk - Local spending data guidance
- Guidance for central government is at HM Treasury guidance for publishing spend over £25,000
* AppGov will provide a free quality control service in some circumstances
For more information or to make suggestions please contact firstname.lastname@example.org