This story is part of our reporting series, “Covid Year Two: After the Vaccine.”
I don’t think I ever cried, but I came close.
My colleague Jesse Marx had the brilliant idea to match our Covid-19 death certificates to San Diego County’s voter roll.
“We can do that, right?” he said.
Without thinking twice, with the confidence of a dummy, I said: “Oh yeah, we can definitely do that.”
When blind confidence meets patience, amazing things can happen.
It took weeks of trial and error – and learning a new programming language – but we were right. It was possible. We matched thousands of people who died from Covid to their voter record. Those matches, followed by an age adjustment of the data, showed us that Republicans were significantly more likely to die than Democrats in the post-vaccine world.
This was our road map:
It all began with the Covid death records. In 2020, Voice of San Diego sued San Diego County to access Covid-related death certificates – which up until that point were not very accessible to the public, despite being public documents. We won and got access to the documents. But, as movies about genies make abundantly clear, getting your wish is a trap.
The county wouldn’t give us digital copies of the records. Instead, we had to send a team of reporters, freelancers and interns to the county archives in Santee, where we manually logged the information on roughly 6,400 death certificates into a database.
The worst part is that we didn’t really know what stories the database would produce – if any. The database did, however, produce new information that county officials had never released. We found out immigrants died at disproportionate rates, as did farm workers, construction workers and people with relatively little education.
Our year one database had roughly 4,200 rows (each one representing a real person who died). Year two had roughly 2,200.
The San Diego County voter roll, on the other hand, contains roughly 1.9 million rows of data – one for each voter. That’s too big for Excel, which caps a worksheet at 1,048,576 rows. And Excel wouldn’t have done everything we needed it to, anyway.
There is a program that a small number of journalists know how to use. But it is so mysterious and ineffable that it is known simply by one letter: R. R is a programming language that can do many things. One of them is matching thousands of death records to millions of voter records.
The first step wasn’t that hard. I created a column in both files that combined a person’s first name, last name and birthdate. I looked for exact matches, which turned up more than a thousand.
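For readers curious what that first step looks like, here is a minimal sketch in Python rather than the R the article used. It assumes each record is a dict with hypothetical fields `first`, `last` and `dob`, and it assumes names are upper-cased and trimmed before comparison; the article only says the column combined the three fields. The function names `make_key` and `exact_matches` are illustrative.

```python
from datetime import date


def make_key(first: str, last: str, dob: date) -> str:
    """Combine first name, last name and birthdate into one match key.

    Upper-casing and trimming are assumptions, so that "Maria " and
    "MARIA" produce the same key.
    """
    return f"{first.strip().upper()}|{last.strip().upper()}|{dob.isoformat()}"


def exact_matches(deaths: list, voters: list) -> list:
    """Pair each death record with every voter record whose key is identical.

    Records are dicts with hypothetical fields "first", "last" and "dob".
    """
    # Index the much larger voter roll once, so each death record becomes a
    # dictionary lookup instead of a scan of ~1.9 million rows.
    index = {}
    for v in voters:
        index.setdefault(make_key(v["first"], v["last"], v["dob"]), []).append(v)

    pairs = []
    for d in deaths:
        for v in index.get(make_key(d["first"], d["last"], d["dob"]), []):
            pairs.append((d, v))
    return pairs
```

With this approach a voter listed as “Hernandes” would not pair with a death record for “Hernandez”; near misses like that are what a later fuzzy step is for.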
Next, Jesse realized that the voter file did not contain hyphens, apostrophes or any other punctuation in a person’s last name, as in “St. John” or “O’Donnell” or “Lupin-Waller.” I removed all the punctuation from the last names in the Covid death records, and that got us several dozen more matches.
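The punctuation cleanup can be sketched in a few lines of Python. This assumes the voter file simply deletes punctuation rather than replacing it with a space; the article doesn’t say which. Checking each character’s Unicode category catches both straight and curly apostrophes, which matters when records are typed in by hand. The name `strip_punctuation` is hypothetical.

```python
import unicodedata


def strip_punctuation(name: str) -> str:
    """Drop every Unicode punctuation character, then collapse repeated spaces.

    Any character whose Unicode category starts with "P" is removed, which
    covers periods, hyphens, and both ' and ’ apostrophes.
    """
    kept = "".join(ch for ch in name if not unicodedata.category(ch).startswith("P"))
    return " ".join(kept.split())


for raw in ["St. John", "O’Donnell", "Lupin-Waller"]:
    print(raw, "->", strip_punctuation(raw))
# St. John -> St John
# O’Donnell -> ODonnell
# Lupin-Waller -> LupinWaller
```

Applying the same function to both files, not just the death records, is a cheap safeguard in case the voter file is less consistent than it looks.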
At this point, it was depressingly clear that we hadn’t traveled far enough. We needed to search for near matches, because we had likely made errors entering data from the death certificates, and so had the people who created the voter roll.
Enter fuzzyjoin, an R package that lets a user search for near matches. It also lets the user set a parameter for how conservative or liberal R should be when searching for those near matches. I’ll spare you the details and gloss over the fact that this took me more than a week to accomplish. The operation was too much for my computer, and it kept crashing. But eventually I ran a successful fuzzy join.
This returned about 3,000 potential matches, and Jesse went through each one individually to decide whether it was an actual match. If, say, “Hernandez” was spelled with a “z” in one database and an “s” in the other, the fuzzy join caught it. Because we purposely set the parameters wide, it also caught much fuzzier potential matches: only around 10 percent of them turned out to be actual matches. Jesse used each person’s address and date of birth, contained in both files, to help figure out which ones were real.
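The idea behind a near-match search can be sketched with edit distance: count the single-character insertions, deletions or substitutions needed to turn one key into another, and keep any pair under a threshold. fuzzyjoin’s string-distance joins take a maximum-distance argument that plays a similar role, but the article doesn’t say which distance measure or threshold was used, so this Levenshtein version and the names `levenshtein`, `fuzzy_candidates` and `max_dist` are illustrative assumptions, not the team’s actual setup.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_candidates(death_keys: list, voter_keys: list, max_dist: int = 2) -> list:
    """Return (death_key, voter_key, distance) for every pair within max_dist.

    A larger max_dist casts a wider net and hands more false positives to
    the human reviewer.
    """
    out = []
    for d in death_keys:
        for v in voter_keys:
            dist = levenshtein(d, v)
            if dist <= max_dist:
                out.append((d, v, dist))
    return out
```

This brute-force version compares every death record with every voter; at roughly 2,200 × 1.9 million, that is about 4.2 billion comparisons, which hints at why fuzzy joins at this scale are heavy. Restricting comparisons to plausible pairs (for example, the same birth year) is a common way to cut that down, at the cost of missing matches with typos in that field.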
In the end, we matched 41 percent of the roughly 4,200 death records in year one and 57 percent of the roughly 2,200 death records in year two.
Those tallies are right in the range we would expect. Roughly 60 percent of Californians 18 and older are registered to vote, according to U.S. Census Bureau estimates. So, we wouldn’t have expected to match more than 60 percent of the records.
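The sanity check in that reasoning is simple arithmetic: a match rate should not exceed the share of adults who are registered. The sketch below uses the article’s rounded figures, so the implied counts are approximate; the constant and function names are hypothetical.

```python
# Share of California adults registered to vote, per the Census Bureau
# estimate cited in the article; treated here as a rough ceiling.
REGISTRATION_CEILING = 0.60


def implied_matches(total_records: int, match_rate: float) -> int:
    """Approximate number of matched death records implied by a match rate."""
    return round(total_records * match_rate)


def within_ceiling(match_rate: float, ceiling: float = REGISTRATION_CEILING) -> bool:
    """True if the match rate does not exceed the registration ceiling.

    A rate well above the ceiling would be a red flag for false matches.
    """
    return match_rate <= ceiling


for label, total, rate in [("Year one", 4200, 0.41), ("Year two", 2200, 0.57)]:
    print(label, implied_matches(total, rate), within_ceiling(rate))
```

That works out to roughly 1,720 matched records in year one and 1,250 in year two, both under the ceiling.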
Also, the year one records showed that immigrants died at highly disproportionate rates. At least some of those immigrants would have been ineligible to vote, so we would expect a larger share of the year one dead to be unregistered, which fits the lower match rate that year.
When we initially calculated the death rate for Republicans and Democrats, it appeared that Republicans died at twice the rate of Democrats. But Jesse and I knew that Republicans, on the whole, skew older, and Covid is a virus that is far more likely to kill older people. That meant the raw comparison was partly measuring age rather than party.
Again, I’ll spare you the details, but if you want to learn how to age-adjust data, this video is a great resource. I probably account for about 1,000 of its views myself. We used a method called direct age standardization.
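Direct age standardization works like this: compute each group’s death rate within each age band, then average those rates using the age mix of one shared standard population, so both groups are compared as if they had the same ages. The sketch below is illustrative only. The two age bands and all the numbers are hypothetical, and using the two groups combined as the standard population is an assumption; the article doesn’t say which standard was used (the 2000 U.S. standard population is another common choice). The function names are also hypothetical.

```python
def crude_rate(deaths: list, population: list, per: int = 100_000) -> float:
    """Total deaths over total population, ignoring age structure."""
    return sum(deaths) / sum(population) * per


def direct_standardized_rate(deaths: list, population: list,
                             standard_pop: list, per: int = 100_000) -> float:
    """Weight each age band's death rate by that band's share of a standard population."""
    if not (len(deaths) == len(population) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age band")
    total_std = sum(standard_pop)
    rate = 0.0
    for d, n, s in zip(deaths, population, standard_pop):
        rate += (d / n) * (s / total_std)  # age-specific rate x standard weight
    return rate * per


# Hypothetical numbers for two age bands: under 65 and 65+.
older_group = {"deaths": [40, 480], "pop": [400_000, 600_000]}
younger_group = {"deaths": [70, 240], "pop": [700_000, 300_000]}
# Assumption: the standard population is both groups combined.
standard = [a + b for a, b in zip(older_group["pop"], younger_group["pop"])]

for name, g in [("older-skewing", older_group), ("younger-skewing", younger_group)]:
    crude = crude_rate(g["deaths"], g["pop"])
    adjusted = direct_standardized_rate(g["deaths"], g["pop"], standard)
    print(f"{name}: crude {crude:.1f}, adjusted {adjusted:.1f} per 100,000")
```

In this toy example the two groups have identical death rates within each age band, so the crude gap (52 versus 31 per 100,000) is entirely an age effect and both adjusted rates come out at 41.5. In the real data, adjustment shrank the gap from about two to one but did not erase it.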
I then had a couple of public health specialists check my work. We fine-tuned the analysis based on their feedback and settled on our age-adjusted death rates.
Republicans, tragically, died at an age-adjusted rate of roughly 51 per 100,000 in the second year of the pandemic. That’s 39 percent higher than the Democratic rate of roughly 37 per 100,000.
Surveys from the second year of the pandemic show Republicans were significantly less likely to get vaccinated than Democrats. That is likely the biggest driving factor in the differing death rates, public health specialists told us.
I’ll leave the final word here to Greg Cox, a Republican and former county supervisor.
“I would hope the Republican Party would begin to get a broader view of what public health is all about,” he told us. “And we should all work together to make sure we have the best public health system we can. It’s about the survival of parties but more importantly the individual. Whether someone is Republican or Democrat should be immaterial.”