(upbeat music) – [Bob] WHO defines four dimensions, or domains, of data quality, which you see presented here: completeness; internal consistency, which means, do the data agree with each other? External consistency: do the data reported
by health facilities agree with data from other sources, such as a health facility survey? And finally, a review of the
consistency of the denominators, the population estimates that are used to calculate coverage. So I'm going to review each
of these four domains in turn. You see here a chart
that is showing the trend over the last 12 months in the reporting completeness
for two different datasets. It's possible with a single chart like this to show the trend for multiple datasets. Completeness of reporting, the reporting rate, is the most fundamental
aspect of data quality.
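As a minimal illustration, reporting completeness can be computed as reports received divided by reports expected; the function name and figures below are hypothetical, not the WHO tool's exact calculation.

```python
def reporting_completeness(received: int, expected: int) -> float:
    """Percentage of expected monthly reports actually received."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * received / expected

# Trend over 12 months for one dataset: completeness per month,
# assuming 100 facilities are expected to report each month.
received_by_month = [88, 90, 91, 89, 93, 94, 92, 95, 96, 94, 97, 95]
expected = 100
trend = [reporting_completeness(r, expected) for r in received_by_month]
```

Plotting such a trend for each dataset gives exactly the kind of completeness chart described above.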
And before you attempt
to interpret any trend in the data themselves, you have to look at the trend in the percentage of
completeness of the data. Internal consistency,
what do we mean by that? Well, you could think of
this internal consistency as the apparent accuracy of the data, comparing one aspect of the data with another aspect of the same dataset. And there are several different ways of measuring this internal consistency; a way of measuring is
here referred to as a metric. The first metric is consistency over time, that is, are our data
consistent from month to month? The second metric is
consistency between indicators that are related, and the
commonest example of this is to assess what we refer
to as the DPT dropout rate: are the reported numbers of
first doses of Penta vaccine consistent with the numbers of
third doses of Penta vaccine?
examples of related indicators. And finally, there's a form
of internal consistency, which can only be measured by
visiting the health facility as I've referred to.
And that is the consistency
between clinic registers and reported data, and in that way, we can measure something
called the verification factor. This is an example of the type of chart which we can use to look at
month to month consistency of data. In this case, the number of reported third
doses of Penta vaccine. Each of these lines represents the trend over the 12 month period in the Penta 3 doses for one region. The blue line is region one, and the blue line in
particular has this hump here, which, unlike the data
from the other regions, suggests a certain inconsistency from month to month: it has jumped up in a
way that we don't quite see with the data from the other regions.
However, when we're looking at data at the level of a
region, we can't be sure, is this due to a data quality issue or does it represent an
actual increase in services? But look what happens when we look at these trend lines for individual districts. So here we see the data
from the same country and from region one, but in this case, there's one line for each of
the districts of region one. And there's a certain
amount of instability from month to month in the indicators.
But look at this line
for district number 12, where in the month of June the reported third doses of
Penta have more than doubled and then dropped back down again. This is highly suspicious. And the lesson here is that when you're looking for
month to month inconsistency, it's best to look at the
data that are disaggregated to the level of the district. And then you're more likely to pick up these quite suspicious numbers and have confidence that
it is a data quality issue. In fact, with DHIS 2, it's
possible to drill down, to go down to an even lower level and identify the specific health facility, which has reported this suspicious number. And I think you would agree with me that when we look at these
lines for each of the doses of a vaccine for this
health center number two of the same district of the same region, we see that this jump in
the value for the region is actually due to a single number reported by this health center in June.
And we can also see that this
is almost certainly an error, due to a data quality issue. Not only has the typical
Penta 3 value of less than a hundred jumped to 3,749, but we see that this health
center in the same month has reported normal, consistent values for the other data elements. So here we see an example
of an erroneous value that has been reported
for one health facility. This slide shows you the
table that is generated by the WHO data quality tool to automatically identify
facilities and values like this that are extremely suspicious, and which are almost
certainly due to errors that have been entered into DHIS 2. In this case, it has sorted these values and put at the top of our table the 12 months of data for Kawe Dispensary: Penta vaccine doses given in females under one, dose three.
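One common way a tool can automatically flag such extreme values is a robust modified z-score based on the median and the median absolute deviation (MAD). This Python sketch illustrates the idea; the WHO data quality tool's exact outlier rule may differ, and the numbers are made up to mirror the example.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Uses the median and MAD rather than the mean and standard
    deviation, because those are robust to the very outliers
    we are trying to detect.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Eleven typical months plus one suspicious June value, as in the example.
penta3 = [82, 79, 85, 90, 88, 3749, 84, 87, 80, 86, 91, 83]
flags = flag_outliers(penta3)  # only the 3,749 entry is flagged
```

Sorting facility-months by this score and listing the most extreme first produces a table much like the one the tool generates.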
And it has found that this
number is the most extreme and most suspicious value in all of the immunization datasets, and we'll practice with
this WHO data quality tool and see how it can be used to rapidly and automatically identify such outliers. The next type of internal consistency that we will be reviewing is, as I said, consistency
of related indicators. And here's some other examples
of related indicators. In addition to looking at the dropout rate between Penta 1 and Penta 3. We could also compare the values
of the first doses of Penta to values of the first
doses of OPV vaccine. Both of these are typically
administered on the same visit. So we expect the values
of these two indicators to be roughly the same. And then a final example. The data on confirmed cases
are frequently reported both in the monthly
outpatient department report and reported in the
malaria lab test report that is submitted each month. So we could compare
confirmed outpatient cases to the total number of positive
RDT and microscopy tests that were reported in a given month. The slide here shows how
the WHO data quality tool analyzes the dropout
rate between DPT 1 and DPT 3. It generates this type of chart showing that there's
one particular district, which had a negative dropout rate, which if reported for a full year is usually a sign of a
data quality problem.
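The dropout metric just described can be sketched in Python as follows; the district figures are illustrative, not from the tool.

```python
def dropout_rate(first_doses: int, third_doses: int) -> float:
    """Percent dropout between the first and third dose of a vaccine.

    dropout = (dose1 - dose3) / dose1 * 100. A negative result means
    more third doses than first doses were reported, which over a
    full year usually signals a data quality problem rather than a
    real service pattern.
    """
    if first_doses <= 0:
        raise ValueError("first_doses must be positive")
    return 100.0 * (first_doses - third_doses) / first_doses

# A plausible district: 10% dropout between DPT 1 and DPT 3.
ok_district = dropout_rate(1000, 900)
# A suspicious district: more DPT 3 than DPT 1 reported.
suspicious = dropout_rate(800, 880)  # negative dropout
```

The same comparison applies to other related pairs, such as Penta 1 versus OPV 1, where the expected ratio is close to one.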
External consistency, I'll just quickly review
these last two dimensions, which aren't a major
focus of this workshop. Here's an example of how
the WHO data quality tool has compared the values reported on a house household survey. In this case, this was a
demographic health survey, the values for anti-natal
care first visit coverage reported on the survey
versus what was estimated based upon the data reported
by health facilities.
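A simple way to summarise this external comparison is the ratio of routine (facility-derived) coverage to the survey estimate; this sketch uses hypothetical figures and is not the tool's exact method.

```python
def consistency_ratio(routine_coverage: float, survey_coverage: float) -> float:
    """Ratio of routine (facility-derived) coverage to survey coverage.

    A ratio close to 1.0 suggests the two sources agree; a ratio well
    above or below 1.0 flags an external inconsistency worth
    investigating.
    """
    if survey_coverage <= 0:
        raise ValueError("survey_coverage must be positive")
    return routine_coverage / survey_coverage

# e.g. ANC first visit coverage: 130% from routine data vs 95% from a
# household survey (illustrative values).
ratio = consistency_ratio(130.0, 95.0)
```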
And in this case, there's
a striking difference between what is called the
routine estimated coverage coverage that is calculated
with health facility data and the estimates based
on the household survey. And finally, consistency
of population estimates or denominator estimates. There's a couple of things
that should be assessed about the denominators. One is whether they are
consistent from year to year, and the second is whether
related denominators, such as the number of pregnancies, the number of live births,
the number of infants are consistent between each other. So here's a chart giving
an example of a country, which has not had year to year consistency and the growth rate of the
number of surviving infants the population under one. In fact, that estimated number
under population under one is seem to have dropped from 2015 to 2016, and then it increased
dramatically from 2017 to 2018. So this estimated value is not showing year to year consistency. And similarly, if you think about it, it is a bit unusual that we would find that the estimated number
of surviving infants in 2018 and 2019 was estimated
to be actually greater than the estimated number of live births. That really should not
happen at the national level, and it suggests that we had some
errors in the denominators used to calculate coverage.
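The two denominator checks described here, year-to-year consistency and consistency between related denominators, can be sketched in Python like this; the population estimates are made up to mirror the example.

```python
def yearly_growth(estimates: dict) -> dict:
    """Percent change in a population estimate from each year to the next."""
    years = sorted(estimates)
    return {
        y2: 100.0 * (estimates[y2] - estimates[y1]) / estimates[y1]
        for y1, y2 in zip(years, years[1:])
    }

def denominator_warnings(surviving_infants: dict, live_births: dict) -> list:
    """Related denominators: surviving infants should not exceed live births."""
    return [
        year
        for year in sorted(surviving_infants)
        if year in live_births and surviving_infants[year] > live_births[year]
    ]

# Illustrative estimates of the population under one, showing a drop
# then a dramatic jump, as in the example country.
under_one = {2015: 1_050_000, 2016: 980_000, 2017: 1_000_000, 2018: 1_300_000}
growth = yearly_growth(under_one)  # 2016 is negative, 2018 jumps ~30%

births = {2018: 1_250_000, 2019: 1_260_000}
infants = {2018: 1_300_000, 2019: 1_240_000}
bad_years = denominator_warnings(infants, births)  # flags 2018
```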
(upbeat music).