Data Break: Complaints to San Diego Unified’s Ethics Hotline


With our open government narrative, we’ve been pointing out flaws in the ways our public records laws are put into practice, and discussing how an open data policy would allow San Diegans to extract useful information from public records.

Open government reforms could take a while to crystallize here, but we want to provide a glimpse into the future by unlocking data from documents that aren’t easy to analyze in their current form.

Let’s start with a report on ethics hotline complaints that the San Diego Unified school board’s audit and finance committee released last month in a form that is far from open.

The hotline, which has been live since 2006, is the place where school district employees and the public can confidentially report allegations of wasteful spending, theft of district funds and misconduct.

Problem 1: The document is not searchable

Public records can be lengthy and dense. And sometimes all you want to see is one specific thing you’re interested in. In this case, one of those interesting things is fraud.

If San Diego Unified had made this document searchable before posting it online, I would be able to use a simple keyboard command to find the word fraud.

But I can’t do that with this document in its current form.
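To make concrete what that keyboard command does once a document's text is machine-readable, here is a minimal Python sketch. It assumes the report's text has already been extracted into one string per page; the `find_keyword` helper and the sample pages are hypothetical stand-ins, not text from the actual report.

```python
def find_keyword(pages, keyword):
    """Return (page_number, line) pairs for every line containing the keyword.

    The match is case-insensitive, like a typical "find" command.
    """
    needle = keyword.lower()
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if needle in line.lower():
                hits.append((page_number, line.strip()))
    return hits


# Made-up sample pages for illustration; the counts are deliberately left out.
sample_pages = [
    "Policy issues: (count)\nOther: (count)",
    "Fraud: (count)\nTheft of time: (count)",
]

for page_number, line in find_keyword(sample_pages, "fraud"):
    print(f"page {page_number}: {line}")
```

None of this works on a scanned image, which holds pixels rather than text. That is the gap a searchable release would close.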

Problem 2: It’s still not data

There are ways to make documents searchable after they’ve been published, but the process can be time-consuming and difficult.

In this case, the fix was easy enough. I told my document reader to “recognize” the text. But I couldn’t easily copy the information I wanted into a spreadsheet program that would enable me to see how different chunks of information relate to one another.

The hotline complaint categories were ordered alphabetically. But maybe I want to arrange the number of complaints in each category from largest to smallest so I can see what the biggest problem areas are. Or maybe I want to view the complaints as proportionate slices of a pie.

There are programs that will do that for me, but again, the cost and the amount of time it takes to make that happen vary. And that’s why open data advocates are calling on public agencies to save you the headache by releasing information in a common, easy-to-use form.
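As a rough illustration of why a common format matters, here is a short Python sketch of the sorting and pie-slice arithmetic described above. It assumes the complaint counts were published as a simple comma-separated file. The column names, the `rank_categories` helper and the numbers are all placeholders invented for this example, not the district's figures; only the category names come from the report.

```python
import csv
import io

# Placeholder numbers for illustration only; these are NOT the district's figures.
SAMPLE_CSV = """category,complaints
Fraud,4
Other,10
Policy issues,1
Theft of time,5
"""


def rank_categories(csv_text):
    """Return (category, count, share_of_total) tuples, largest count first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = [(row["category"], int(row["complaints"])) for row in rows]
    total = sum(count for _, count in counts)
    ranked = sorted(counts, key=lambda item: item[1], reverse=True)
    # Guard against an empty or all-zero table to avoid dividing by zero.
    return [(name, count, count / total if total else 0.0) for name, count in ranked]


for name, count, share in rank_categories(SAMPLE_CSV):
    print(f"{name}: {count} ({share:.0%})")
```

With data in this form, reordering the alphabetical list or computing each category's slice of the pie takes a few lines rather than an afternoon of retyping.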

Problem 3: It’s not clear what the categories mean

Nothing in the tables that San Diego Unified released tells us precisely what the categories mean. We can easily guess what they mean for designations like “theft of time.” But what about “policy issues” and “other”? And what kinds of fraud are they talking about?

Since they’re referring to abstract concepts here, a glossary of terms would help. To their credit, the district’s auditors give a few examples of fraud on their website, but the hotline report doesn’t direct the reader there.

Solution: Sometimes you have to break the data open manually

There wasn’t a whole lot of information in the ethics hotline report that San Diego Unified released, so I put the statistics into a spreadsheet program, and then used a free program to make some interactive graphics.

Here’s what I did with the 32 cases that San Diego Unified School District’s Office of Audits and Investigations completed from July 2012 to June 2013:

 

And here’s what I did with the 123 cases that remained open as of Sept. 4, 2013:

 

We don’t have a complete picture of what’s happening with San Diego Unified’s ethics hotline here, but we’re further along than we would have been otherwise.



Joel Hoffmann

Joel Hoffmann is an investigative reporter for Voice of San Diego, focusing on county government, the San Diego Unified School District and the Unified Port of San Diego. You can reach him directly at joel.hoffmann@voiceofsandiego.org.


3 comments
ScrippsDad

As more and more data is gathered (this is one example where the database will just continue to grow, assuming you never stop "suggesting/reporting"), creating search, organization, and other report parameters to properly analyze and evaluate the data gets more and more difficult and complex. The good news is that "big data" system developers are recognizing this and developing ways to organize and search to create meaningful reports for analysis of this type of information and data. The bad news is, how do you take these typically large enterprise tools and cost-effectively apply them to smaller institutions such as school districts? I believe the real challenge is to have somebody on staff or under contract who understands the possible solutions and ways that this can be done. When done properly, it can be done affordably and provide quality information. Just think about all the various information, not just this example, that can be entered into, searched, tracked and analyzed in a big data system. Just take and follow one student from kindergarten through high school and all the various parameters that could be attached to that student, from personal data to performance data, to qualitative data (e.g. IEP info) to quantitative data (ALL test scores, etc.). Using big data systems, this information becomes readily available and can be administered in a way that protects confidentiality but still provides openness of information/data. In these systems, with proper design, there really is no limitation to the amount of information that can be tracked, organized and searched under a single relational database structure. This also eliminates the need for multiple databases, multiple entries, multiple oversight and maintenance, etc., making a truly open system.

Eric Busboom

Joel, thanks for exposing these issues. These sorts of problems plague a lot of data releases, usually from agencies that don't have data release as part of their mission. I've found that most of the time, the problems are the result of having no one suitable in charge of data release, not any intent to obfuscate or limit usability of data. In this case, I'd guess that the person who was responsible for creating the report had no responsibility to publish it, and the person who was responsible for publishing it had the paper version and found it easier to scan it to a PDF/image rather than track down the person who produced the report originally. The fix requires a mandate to publish that forces an institutional change, with data creators and compilers being directly connected to the data publishers. While these sorts of things are really frustrating, I find that the simplest, and often most productive, way to understand them is a combination of (a) organizations are really complicated and (b) most people want to get their jobs done so they can eat dinner with their kids.