How complete is DOH's coronavirus dataset?
MANILA, Philippines (UPDATED) – The number of coronavirus cases in the country has shot up to 57,006 as of Monday, July 13.
Along with it comes the data on these cases, which helps policymakers arrive at decisions and data analysts assess the spread of the virus.
More than 5 months after the first case of COVID-19 in the Philippines, however, the dataset regularly released by the Department of Health (DOH) still has missing details for some crucial data points.
Data points like the location of cases and the dates of the onset of illness and the release of test results are important in analyzing the transmission of COVID-19, as well as in guiding the government's response to the virus.
"The dates and locations are vital data, especially [since] the government has pushed for the use of science, statistics, and analytics in policy-making," said Peter Julian Cayton, associate professor at the University of the Philippines (UP) School of Statistics and a member of the UP COVID-19 Pandemic Response Team, in an online chat with Rappler. "Having as complete data as possible is essential for robust decisions to be made."
Cayton also emphasized the importance of accurate data, especially when drawing conclusions from them. "Having small errors might not be noticeable in the national picture, but if errors are concentrated on small areas, decisions may unequally impact livelihoods and families."
Using data as of Sunday, July 12 – with 56,259 confirmed cases – here is how the DOH's COVID-19 dataset fares in these various checks.
Rappler will be updating this page every week, to track the DOH's progress in completing these data points.
For the first 4 rows in the table below, only 54,103 confirmed cases who are non-repatriates are considered.
For the next 2 rows, the 2,156 entries that stand for repatriates, some of whom have been tagged to locations in the Philippines, are considered.
For the last row, all 56,259 confirmed cases are considered.
|% complete||# of cases||Item|
So far, over 15% of cases, among both non-repatriates and repatriates, still have blank location data. Among them, 7 cases were reported as far back as April, and 1,103 as far back as May.
In addition, while the specific regions of around 82% of non-repatriate cases have been identified, only around 6 in 10 cases so far have complete data for region, province, and city/town.
As for repatriates, only around 1 in 5 have location data so far.
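The completeness figures above boil down to counting, within a chosen group of cases, how many records have every required field filled in. A minimal sketch of that check follows. The field names (`region`, `province`, `city`, `repatriate`) and the 4 sample records are made up for illustration; they are not the DOH's actual column names or data.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, str]


def completeness(
    cases: List[Case],
    fields: List[str],
    base: Callable[[Case], bool] = lambda c: True,
) -> Tuple[int, int, float]:
    """Return (complete count, base count, percent complete).

    A case counts as complete only if ALL listed fields are non-blank.
    `base` selects which cases form the denominator.
    """
    pool = [c for c in cases if base(c)]
    filled = sum(
        1 for c in pool if all((c.get(f) or "").strip() for f in fields)
    )
    pct = 100.0 * filled / len(pool) if pool else 0.0
    return filled, len(pool), pct


# Hypothetical toy records, NOT real DOH data.
cases = [
    {"region": "NCR", "province": "Metro Manila", "city": "Quezon City", "repatriate": "no"},
    {"region": "Region VII", "province": "Cebu", "city": "", "repatriate": "no"},
    {"region": "", "province": "", "city": "", "repatriate": "no"},
    {"region": "NCR", "province": "", "city": "", "repatriate": "yes"},
]


def non_repatriate(c: Case) -> bool:
    return c["repatriate"] == "no"


# Region identified vs. full region/province/city detail, non-repatriates only.
region_only = completeness(cases, ["region"], non_repatriate)
full_location = completeness(cases, ["region", "province", "city"], non_repatriate)
```

Changing the `base` filter changes the denominator, which is why the same dataset can yield different percentages depending on whether repatriates are counted.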
Cayton explained that having accurate data on location is important in analyzing the transmission and the current status of the virus in different parts of the country.
"Different islands have different population dynamics, and thus COVID-19 will not affect each territory similarly. Add to that the different levels of development and health infrastructure each local area has. So, location variables are vital for us to get a comprehensive picture in the fight against COVID-19," he said.
For the table below, all 56,259 confirmed cases are considered, except for the last 2 rows where total deaths and total recoveries are the base numbers.
|% complete||# of cases||Item|
So far, in the DOH dataset, only around 3 in 10 cases have data on when the symptoms first manifested in patients. This is the standard date used in analyzing COVID-19's transmission, and is also used in computing the reproduction number or the "transmission potential" of the disease, said Cayton.
However, he noted that COVID-19 "is one of the few diseases that can be passed on by pre-symptomatic and in some rare instances, asymptomatic individuals...so sometimes symptoms may not show, or show later."
Therefore, in cases where the date of onset of symptoms is missing, the date of specimen collection can be used as a proxy, as the DOH does in its official epidemiological chart showing the progress of cases. "In a way, the discovery of the virus in a patient's specimen implies the presence of the virus," Cayton said.
Fortunately, around 90% of cases have dates of specimen collection. (IN CHARTS: COVID-19 cases in the Philippines)
If neither the date of onset nor the date of specimen collection is available, "the [date of] release of test results is what we use as stand-in for both," he continued. If all 3 dates are missing, Cayton said, the date of confirmation or inclusion in the DOH's official count (and therefore in the DOH dataset) is used as a substitute.
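The order of substitutes Cayton describes (onset of symptoms, then specimen collection, then release of test results, then confirmation) can be sketched as a simple fallback lookup. This is only an illustration of that order, not the DOH's or the UP team's actual code. The field names and sample records are hypothetical, and dates are assumed to be stored as ISO strings (YYYY-MM-DD).

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Fallback order as described in the article; field names are hypothetical.
FALLBACK_ORDER = (
    "date_onset",
    "date_specimen",
    "date_result_release",
    "date_confirmed",
)


def reference_date(case: Dict[str, str]) -> Optional[Tuple[str, date]]:
    """Return the first available date and the field it came from.

    Returns None if every date field is blank or missing.
    """
    for field in FALLBACK_ORDER:
        raw = (case.get(field) or "").strip()
        if raw:
            return field, date.fromisoformat(raw)
    return None


# Hypothetical toy records, NOT real DOH data.
with_onset = {"date_onset": "2020-07-01", "date_specimen": "2020-07-03"}
no_onset = {"date_onset": "", "date_specimen": "2020-07-03"}
only_confirmed = {
    "date_onset": "",
    "date_specimen": "",
    "date_result_release": "",
    "date_confirmed": "2020-07-12",
}

picked_onset = reference_date(with_onset)
picked_specimen = reference_date(no_onset)
picked_confirmed = reference_date(only_confirmed)
```

Returning the field name together with the date keeps track of which cases rely on a proxy, so an analyst can see how much of a chart rests on substituted dates.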
As for the dates of actual deaths and recoveries, data is almost complete for fatalities, but only around 40% of recoveries have actual recovery dates so far. Some of the oldest cases with missing actual recovery dates were reported as recovered as far back as March.
Information on deaths and recoveries, provided that they are appropriately dated, shows the "outcomes of our health sector in combatting COVID-19," said Cayton.
Addressing data gaps
In an email to Rappler, the office of DOH Undersecretary Maria Rosario Vergeire explained that the cleaning and validation of case information are ongoing, and that the missing data so far is caused by "problems encountered by the incomplete accomplishment of the Case Investigation Forms when a patient gets tested for COVID-19."
It also attributed missing actual recovery dates of some cases to "the incomplete and inaccurate contact information placed in the Case Investigation Forms."
To address these gaps, the DOH said it has been using the COVID-Kaya application as its "single electronic repository" for all coronavirus cases "with completed Case Investigation Forms, laboratory results and health status."
The department also said it is hiring more case profilers who can validate the missing information of confirmed cases.
"With the use of this [COVID-Kaya] application and the continued work of our case profilers, the DOH hopes to have more complete and accurate data," the department said.
Earlier, on June 11, researchers and fellows from the OCTA Research group flagged gaps in the DOH's COVID-19 dataset, such as significant backlogs and missing location data.
The group warned that failing to resolve these gaps soon "will undermine not just the government's ability to monitor spread of the virus but also hamper its ability to implement appropriate and timely responses to manage the pandemic on the ground."
"Without accurate and accessible DOH data on COVID-19, our national and local government officials as well as other stakeholders will not be able to make decisions crucial to managing the pandemic," the researchers added. – Rappler.com