In mid-May, when the health department published a long-awaited list of nursing homes with outbreaks of COVID-19, the numbers were immediately contested. Without disclosure or acknowledgement, the department began quietly correcting issues. Days later, it admitted to some problems.
Complexities and shifts “should be expected,” said Gradeck, of the Western Pennsylvania Regional Data Center. “It’s not surprising that the numbers change.”
But if you’re clear about the data’s limitations from the start, he said, you avoid “setting yourself up for a gotcha moment.”
And with data constantly revised, it’s important to provide historical numbers, said Coral Sheldon-Hess, a professor of computer information technology and data analytics at the Community College of Allegheny County.
People analyzing Pennsylvania’s COVID-19 data need to know when to “correct any past numbers, to help make predictions better going forward,” Sheldon-Hess said.
But that hasn’t happened in every case.
Since March, the health department has kept an archive page of coronavirus data, publishing daily tallies. But the archive doesn’t disclose when numbers were later corrected, nor does it explain why the department changed its methodology.
What’s more, the health department said June 8 that with the launch of the new data dashboard, it would no longer be posting updates to the archive page. That wouldn’t be necessary, Wardle said, given that the dashboard contained a “graphical depiction” of when COVID-19 cases and deaths occurred.
A day later, after hearing that the dashboard was difficult for some people to use, the department resumed posting to the archive page.
Make data easy to scrutinize
In 2016, the Wolf administration pledged to make government data available and usable to the public.
“One of our most valuable and underutilized resources in state government is data,” Wolf said at the time.
The initiative centered on OpenDataPA, an online portal for data that’s both free for anyone to use and structured in a way that’s easy for computers to process. Think: Excel spreadsheets or CSV files, not PDF files or tables posted on web pages.
The format of data is important, because it sets the stage for what the public can do with it.
“If I have three hours to work on a dataset and I spend two hours just getting that data, my time to explore and understand the data is limited,” said Jacob Kaplan, a postdoctoral fellow at the University of Pennsylvania, who’s been studying the spread of the coronavirus in prisons.
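To illustrate why format matters, here is a minimal sketch in Python of what machine-readable data allows. It assumes a hypothetical CSV file with columns named `county`, `date` and `new_cases`; the column names and every row below are invented for illustration and are not state figures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample only: these rows and column names are invented for
# illustration and are not real Pennsylvania figures.
SAMPLE_CSV = """county,date,new_cases
Allegheny,2020-06-01,12
Allegheny,2020-06-02,9
Dauphin,2020-06-01,7
Dauphin,2020-06-02,11
"""


def total_cases_by_county(csv_text):
    """Sum the new_cases column for each county in a CSV string."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["county"]] += int(row["new_cases"])
    return dict(totals)


if __name__ == "__main__":
    print(total_cases_by_county(SAMPLE_CSV))
```

With data in this form, a county-by-county total takes a few lines. The same figures locked in a PDF or a web-page table would first have to be copied out or extracted by hand, which is the kind of lost time Kaplan describes.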
In the OpenDataPA portal, the catalog has a listing for data about the coronavirus. But the page doesn’t actually contain data.
Instead, it just links to the health department’s COVID-19 website, where data is structured in a way that makes it cumbersome to work with and difficult to analyze.
If Pennsylvania had made its source data easily accessible, it could have helped quash concerns last month, when the state said its total count of COVID-19 tests included negative antibody tests, then backtracked on the statement a day later.
The situation raised red flags among epidemiologists, as antibody tests show past infections, not current ones, and, if included, would distort the picture of the state’s capacity to detect infections in real time.
But as it stands, Pennsylvania is touting total testing numbers that are impossible for the public to vet. County-level data currently shows only the number of people receiving COVID-19 tests, without disclosing how many times those people are tested.
Those numbers — exactly how many people are being tested more than once — are “reported internally,” Wardle, the spokesperson, said.
Spotlight PA reporter Daniel Simmons-Ritchie contributed to this article.