‘The Next Frontier of Open Data’

Photo by Sam Hodgson

Eric Busboom is director of the San Diego Regional Data Library.

Eric Busboom may not be the kind of guy who comes to mind when you think of a librarian. But maybe he’s what a librarian should be as more of us rely on smartphones and computers to make sense of the world.

“Librarians have a stuffy image,” Busboom told me. “They’re amazingly forward-thinking. They’ve got an incredible service-oriented outlook. They aren’t beholden to books. They tend to think of their jobs as connecting the public to information, whatever that information is.”

As CEO of Clarinova Inc., Busboom has spent nearly six years compiling government data and putting it into a form that customers can easily use.

But in January, Busboom founded the San Diego Regional Data Library, to “make our social programs, civic groups, and government organizations more effective; our citizens better informed; and our policy makers able to make better decisions,” according to the library’s website.

Translation: The library is collecting information on crime, traffic patterns, streetlights, alcohol permits and other quality-of-life issues to help San Diegans learn more about one another and the communities they live in.

But Busboom doesn’t want to throw a “pile of files” online. He wants to help San Diegans understand what’s in those files. He wants to teach us how to make sense of that information with computer programs that are free for anyone to use.

You’ve been in the open data game for a while. How long exactly?

The library opened up operations in January. It’s been about nine months. But the previous year was spent in developing the concept. I spent a lot of time interviewing people. I probably talked to about 90 interview subjects so far across the country, so we’ve got a pretty good handle on how data works in a lot of other cities.

Looking into the future, what do you see for the San Diego Data Library?

Well, San Diego’s an interesting market for this. Our organization is what’s called a data intermediary, and the West Coast has very few. There’s a couple of data projects in Oregon and Seattle. But as far as this kind of an organization that really serves to be a conduit for data, the only group that exists on the West Coast is in Oakland.

But if you go to the East Coast, from D.C. up to Boston there’s dozens and dozens of them. And there’s a bunch around Dallas and Austin and a lot of other cities.

So we knew when we started the organization — I and the board members and the staff that I had working on it — that it was going to be a little different here. There’s a reason why the West Coast doesn’t have these things. And there’s something structurally different. So that’s been a real challenge.

Basically: Where do you find the money? It’s always difficult to fund a pure data project. It’s easier to fund something when you’re trying to solve a problem.

Whatever we come up with in San Diego will be unique to San Diego.

But so far that seems to be coming together. We figured out what the market is. There’s stuff that’s coming together, and the region is really opening up to how data-sharing works, and we’re a key part of that community.

When I talked to the guy from the independent budget analyst’s office [Deputy Director Jeff Kawar], he said that one of the things he was looking into is whether or not [open data] would be beneficial to everyone. He was sort of suggesting that perhaps this was being driven by commercial interests. How do you respond to that?

The commercial interests are one of the really important outputs of this. Not for people to make money. Whether somebody’s making a living off of it or not is not my particular interest. But I think if you look at what the federal government has been doing over the last 50 years, there’s an enormous commercial value to Census data in terms of marketing.

If you’re a business owner and you need that information, getting it from a company that can provide it to you is vastly cheaper and simpler than trying to extract it from the government.

The same thing is true with your GPS in your car, which operates off of GPS data and road data, which is largely public.

That commercial aspect is actually one of the end games. You know you’re successful when companies can take that data and do something useful with it. And those commercial uses in no way exclude the civic use of data.

And I think that having more commercial use creates an ecosystem where the civic use becomes more useful because you have all this other help in getting data.

I’d much rather buy some of the data sets that we use than spend weeks trying to fix them.

In our field, with reporting, [the nonprofit group] Investigative Reporters and Editors, they sell licenses to data sets they’ve cleaned up.

There’s a lot of people who are skeptical of how commercial interests and nonprofits work. And that’s one of the reasons our aim on the library is to make it a nonprofit.

Most data intermediaries around the country have a fee-for-service component and a grant component. And we expect that our fee-for-service is going to be most of our funding and the grant component will be very small. That’s just the way that San Diego and the West Coast work.

So we’re going to have to sell data. We’re going to have to sell something.

But it’s very important for me and, I think, for the viability in the long term to have a guarantee that there’s an entity that’s serving a civic goal, that we’ve defined what that civic and social interest for data is.

We’re going to try to make it as cheap as possible. We’ll give everything away for free if we can.

Compelling arguments have been made for keeping certain kinds of data private. Is there any data you can think of in San Diego County that probably should remain private for compelling reasons?

Oh, there’s lots of it. And that’s true in all of the public data, too. What you get from the Census data is all based on questionnaires and interviews. And the Census has your name and your address on all the questions you answered.

Generally, they release all that data disassembled and reaggregated so you can’t tell who’s who. That level of information, anything that’s personally identifiable, won’t make it out into the public.

What do you think is the most important data or data set that’s not open right now but ought to be?

There’s not really a lot that’s in the city or in the county that has broad-scale important value. With SANGIS [a data library managed by the city and county and hosted by the San Diego Association of Governments], we’ve actually got most of that. You can have complaints about the quality or how often it’s updated. But for the most part, the will to release that exists.

The biggest source of data that we don’t have that’s really socially valuable is in nonprofits. It’s social service data, information about homelessness and mental health. What’s going on at hospitals? What diseases do we have? It’s economic activity. There’s a lot of little things like that that are really socially valuable, but they’re not coming out of the city.

And that’s the kind of stuff that I think is really the next frontier of open data.

Clarification: This post has been updated to better reflect SANDAG’s role in SANGIS.


Joel Hoffmann

Joel Hoffmann is an investigative reporter for Voice of San Diego, focusing on county government, the San Diego Unified School District and the Unified Port of San Diego. You can reach him directly at joel.hoffmann@voiceofsandiego.org.

13 comments
Derek Hofmann

I can answer #3: yes, the data is often freely available, but you still have to consolidate and process and format it before it's useful. In the same way, Linux is free, but people still pay for Windows because it requires less work to do what you want.

Alison Greenlee

So being a newbie to the whole open data scene maybe you can answer some questions for me. 1. What is so "structurally different" about the West Coast? 2. How would open data NOT be beneficial to everyone unless you had something to hide? 3. Why would we pay for something that was free in the first place?

Jim Jones

“They’re amazingly forward-thinking.” That's worth a good chuckle; it would be hard to find a group more stuck in the past and slow to innovate than the library industry. That their newest monument to bloated and unneeded government resembles a Faberge egg made for a king from over a century ago is fitting.

David Hall

"So we’re going to have to sell data. We’re going to have to sell something." Bingo. Making money off of public data. That's his entire business model, right there.

Jim Jones

The thing is, if you have the tools and the raw data it is pretty easy to structure it. If they make raw open data free other people will structure it for free.

Eric Busboom

Thanks for the questions, Alison. For (1), I think the difference is that the West Coast doesn't have a 100-year-old tradition of large-scale civic philanthropy. Around 1900, Carnegie, Vanderbilt, Eastman and many others were engaged in large-scale civic projects, like Carnegie building 2,500 libraries around the country. (San Diego had one, but tore it down in 1952.) Many of their foundations still exist today and have several billion dollars under management. Our philanthropists are working internationally (Gates, Hewitt), engaged in higher education and the arts (Getty, Jacobs) or have a really low profile (Ted Waitt). Locally, we have a few relatively small foundations, like Parker, Price or Jacobs (Joseph, not Irwin). Generally, these foundations are too small to get involved in general data projects. For (2), consider a list of elderly people who were victims of financial fraud, or a list of homes that have permits to install alarm systems. The data may be public, but publicizing it would be irresponsible. Derek's dead on for (3). Consider the company Geolytics, which sells 2010 Census data for $1,000. The data is free to download, but enough people want to reduce the effort of processing it to be willing to pay $1,000.

Eric Busboom

David, a lot of people would much rather pay $50 for a data set than spend 10 hours cleaning it up to use it. Unless you like working for less than minimum wage, I suspect you would too. With the typical fully loaded rate for a technical worker at about $70/hr, buying data rather than devoting the time to prepare it makes a lot of financial sense, and for the Library, having a revenue stream means that we can provide other data to the community for free. No revenue means no Library. And public data is public data; having someone sell it doesn't change anything. If you don't mind working for $5/hr, you can download it yourself. I'll even send you the URLs, for free.

Seth Hall

Is that bad, David?
