Garbage in, garbage out: an Update on Problems of Data Availability and Quality for COVID-19 and Older People in Low and Middle-Income Countries

Oct 5, 2020

By Peter Lloyd-Sherlock, Barbara Corso, Shah Ebrahim, Ramon Martinez, Nadia Minicuci, Ilaria Rocco, Lucas Sempe, Patricia Solis.

Back on 25 June, the Global Platform posted its first paper about the availability and quality of data on COVID-19 and older people in low and middle-income countries (LMICs). We also ran a webinar on the same theme. We plan to make age-disaggregated data on COVID-19 and excess mortality in LMICs one of our main focuses of activity over the next few months, and our new site now has a particular section relating to this issue. As far as we know, nobody else is presenting or engaging critically with this sort of data: neither of the major global COVID-19 databases (WHO’s COVID-19 dashboard and Johns Hopkins Coronavirus Resource Centre) currently provides age-specific data. Over the next few months, we will provide updated lists of data sources, along with guidance about how to interpret them, as well as many, many warnings against drawing quick and easy conclusions.

We need good data about how COVID-19 is affecting older adults in LMICs. Although the pattern of the pandemic has not always followed experts’ predictions, one early prediction still holds firm: the majority of deaths resulting from the pandemic are occurring among older adults in LMICs. It would be useful to be more precise about how large a majority this is, but (as we show in this paper), current data do not allow for that. Since older people are the main at-risk group, any general response to the pandemic has to pay special attention to people in this age group, even if only to manage overall pressure on health services. Beyond that, without data on older people, we cannot assess whether they are being fairly treated and whether preventive actions are reducing or even increasing their risk of dying. Amidst widespread concerns about explicit and implicit ageism in COVID-19 policy, there is a need to keep a close eye on what is really going on.

Our previous paper on this issue looked at data on reported cases and deaths of older people attributed to COVID-19. It also touched on the need to develop more comprehensive and reliable indicators, such as excess overall mortality. Lucas Sempe has since published a paper setting out a method for making this calculation and applying it to the case of Peru. Among other things, it shows that official figures up to 12 July captured only around a fifth of total excess mortality and registered deaths, and that under-reporting was higher for people at older ages.

Rates of COVID-19 infection and mortality are key indicators, but they do not cover many other important issues about how the pandemic is affecting older people in LMICs. Covering those issues requires many other forms of data. Liat Ayalon’s webinar talk set out this wider agenda, including a need for robust evidence about social isolation, subjective wellbeing and experiences of discrimination. At the same time, she called for data that permit analysis of patterns within older populations and asked which national and global bodies should be responsible for marshalling and monitoring these data. Given the many issues of importance and the limited capacity to gather and process data in LMICs, a starting point may be to identify the most urgent priorities, critically evaluate the data coverage of these issues and then advocate for improvement.

That will be an issue for a future paper we hope to write. This one takes the same narrow focus on cases and deaths as the previous paper on data. In large part, this is because these are the data that are most widely reported at this point in time. Two months after our previous paper, we ask what can be safely deduced from the available data, drawing attention to pitfalls in analyses that take the numbers at face value. And we show that with the current data, we can deduce very little.

Table 1 provides data for a limited set of LMICs. We have a more comprehensive Excel table for all countries on our site and hope to update it every few weeks. You can find a list of websites where we source these data here. Table 1, compiled in early August 2020, includes countries which provide at least some age-specific data and where there have been at least 100 reported COVID-19 deaths. There are some prominent examples missing from the list: most notably Brazil, whose data reporting on the pandemic has been particularly problematic.

Table 1. Age-disaggregated data for reported cases and mortality due to COVID-19, selected countries (data available as of 12 August 2020).

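| Country, date of data (1) | Total cases (2) | Total deaths (2) | Cases 60+ (3) | Deaths 60+ (3) | Incidence per 100,000, total (4) | Incidence per 100,000, 60+ (4) | Death rate per 100,000, total (4) | Death rate per 100,000, 60+ (4) | Current case fatality %, total (5) | Current case fatality %, 60+ (5) |
|---|---|---|---|---|---|---|---|---|---|---|
| Mexico, 9 August | 518,231 | 54,266 | 94,577 | 26,699 | 406.22 | 661.91 | 42.54 | 186.86 | 10.47 | 28.23 |
| Algeria, 22 June | 11,332 | 852 | 3,484 | 637 | 26.32 | 81.74 | 1.98 | 14.95 | 7.52 | 18.28 |
| Argentina, 11 July | 95,607 | 1,807 | 12,403 | 1,467 | 212.75 | 178.06 | 4.02 | 21.06 | 1.89 | 11.83 |
| Costa Rica, 27 July | 15,605 | 112 | 888 | 79 | 309.16 | 117.28 | 2.22 | 10.43 | 0.72 | 8.90 |
| India, 15 July | 79,519 | 1,842 | 8,510 | 942 | 5.82 | 6.17 | 0.13 | 0.68 | 2.32 | 11.07 |
| Pakistan, 31 May* | 72,460 | | | | 33.46 | | | | | |
| Philippines, 15 July | 58,152 | 1,733 | 8,215 | 1,076 | 53.79 | 88.35 | 1.60 | 11.57 | 2.98 | 13.10 |
| South Africa, 27 June | 137,387 | 2,398 | 15,431 | 1,329 | 234.62 | 310.02 | 4.10 | 26.70 | 1.75 | 8.61 |
| Turkey, 19 July | 220,657 | 5,491 | 24,275 | 3,870 | 264.48 | 222.11 | 6.58 | 35.41 | 2.49 | 15.94 |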

1Source: (Year=2019)

*Website not fully functional since that date.

What can we conclude from these data?

Let’s go column by column…

Column 1 shows that few countries are providing recent updates, and some data are weeks or even months old (31 May for Pakistan). This is a concern, given the fast-developing nature of the pandemic in these countries. It also makes it harder to compare between countries which may be at different points in the pandemic curve.

Column 2 shows the total numbers of reported cases and deaths where age data are included.[1] Column 3 shows the number of reported cases and deaths that occurred among older people. Column 4 presents these data (and those for the total population) as cases per 100,000 people (“incidence”) and deaths per 100,000 people, where the denominator is the whole population of the relevant age group, not just those who tested positive. Column 5 then shows case fatality rates: the percentage of people reported to have tested positive who had died by the time of reporting.
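The indicators in columns 4 and 5 are simple ratios, and Mexico's row in Table 1 can be used to check them. The sketch below is illustrative only: the function names are ours, and because the population figures behind Table 1 are not reproduced in this post, the population is back-calculated from the reported cases and incidence rather than taken from an official estimate.

```python
def case_fatality_pct(deaths: int, cases: int) -> float:
    """Column 5: deaths as a percentage of reported cases."""
    return 100.0 * deaths / cases


def rate_per_100k(count: int, population: float) -> float:
    """Column 4: events per 100,000 people in the population."""
    return 100_000.0 * count / population


def implied_population(count: int, reported_rate_per_100k: float) -> float:
    """Population implied by a count and its reported rate.

    This is an assumption for illustration: the post does not state
    the population figures used to build Table 1.
    """
    return 100_000.0 * count / reported_rate_per_100k


# Mexico, 9 August 2020 (values from Table 1)
total_cases, total_deaths = 518_231, 54_266
cases_60, deaths_60 = 94_577, 26_699

cfr_total = case_fatality_pct(total_deaths, total_cases)
cfr_60 = case_fatality_pct(deaths_60, cases_60)

# Back out the population from cases and the reported incidence (406.22),
# then recompute the death rate to compare with the reported 42.54.
population = implied_population(total_cases, 406.22)
death_rate_total = rate_per_100k(total_deaths, population)

print(f"CFR total: {cfr_total:.2f}%  CFR 60+: {cfr_60:.2f}%")
print(f"Implied population: {population:,.0f}")
print(f"Death rate per 100,000: {death_rate_total:.2f}")
```

The recomputed values should agree with the table to within rounding, which confirms how the columns relate to one another but says nothing about whether the underlying counts are reliable.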

This is an impressive and complex set of data (even if many countries are not included and much is out of date). But what can we take from it? A useful way to make sense of the data is to compare countries. We will do this with Mexico and Argentina.

By 9 August 2020, 26,699 people aged 60+ in Mexico were reported to have died as a result of COVID-19. In Argentina, by 11 July the equivalent number was 1,467. It is striking that older people accounted for 81.2% of all deaths in Argentina, but only 49.2% in Mexico. This is even more surprising since the current case fatality rate for older people in Mexico (28.23%) was much higher than in Argentina (11.83%). Put simply, older people in Mexico who tested positive for COVID-19 were more than twice as likely to have succumbed to the disease (at the time of reporting) as those in Argentina. For the total populations of these countries, current case fatality differed by a factor of more than five (10.47% in Mexico versus 1.89% in Argentina).
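The shares and ratios quoted in this comparison follow directly from Table 1. As a minimal sketch (the dictionary layout and variable names are our own, chosen only for illustration), they can be reproduced like this:

```python
# Values from Table 1 (Mexico, 9 August; Argentina, 11 July)
table = {
    "Mexico": {"deaths": 54_266, "deaths_60": 26_699, "cfr": 10.47, "cfr_60": 28.23},
    "Argentina": {"deaths": 1_807, "deaths_60": 1_467, "cfr": 1.89, "cfr_60": 11.83},
}


def share_60_pct(row: dict) -> float:
    """Percentage of all reported deaths that occurred among people aged 60+."""
    return 100.0 * row["deaths_60"] / row["deaths"]


shares = {country: share_60_pct(row) for country, row in table.items()}

# How many times higher is reported case fatality in Mexico than in Argentina?
cfr_ratio_60 = table["Mexico"]["cfr_60"] / table["Argentina"]["cfr_60"]
cfr_ratio_total = table["Mexico"]["cfr"] / table["Argentina"]["cfr"]

for country, share in shares.items():
    print(f"{country}: {share:.1f}% of reported deaths were among people aged 60+")
print(f"CFR ratio, 60+ (Mexico / Argentina): {cfr_ratio_60:.2f}")
print(f"CFR ratio, total (Mexico / Argentina): {cfr_ratio_total:.2f}")
```

These are ratios of reported figures only, so they inherit every weakness of the underlying counts.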

These strange patterns can be explained in two different ways.

First, is Argentina under-counting COVID-19 deaths to a much greater degree than Mexico? There have been newspaper reports drawing attention to COVID-19 mortality data in both countries. It would not appear likely that the situation in Argentina is substantially worse than for Mexico, where, back in July, the Financial Times reported that: “tens of thousands of deaths may have been missed in the official statistics”. A study of Mexico found 122,765 more people had died than expected between mid-March and 1 August. Of these only around half (67,326) had been attributed to COVID-19. No similar research has been done for Argentina yet, but it is unlikely to have performed even worse than Mexico.
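To make the scale of the gap in the Mexican study concrete, the arithmetic can be sketched as follows. The two input figures are those quoted above; the variable names are ours.

```python
# Figures quoted from the study of Mexico (mid-March to 1 August 2020)
excess_deaths = 122_765        # deaths above the expected number
attributed_to_covid = 67_326   # excess deaths officially attributed to COVID-19

# Excess deaths with no COVID-19 attribution
unattributed = excess_deaths - attributed_to_covid
attributed_share_pct = 100.0 * attributed_to_covid / excess_deaths

print(f"Attributed to COVID-19: {attributed_share_pct:.1f}% of excess deaths")
print(f"Excess deaths not attributed to COVID-19: {unattributed:,}")
```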

Second, is Mexico’s testing system identifying a substantially lower share of people with the virus than Argentina’s? There is much more to support this explanation. Official data show that by September 2020, Mexico was providing 9 tests per 100,000 people each day, compared to 56 in Argentina. In the UK, where the limited availability of testing in relation to need is widely acknowledged, daily testing was 345 per 100,000. In other words, data on cases in both countries say more about the availability of tests than about the real incidence of COVID-19.
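The testing gap is easier to appreciate as a ratio. The short sketch below uses only the three daily testing rates quoted above, expressed relative to Mexico; the names are ours, and the result describes testing volume only, not the share of infections actually detected.

```python
# Daily tests per 100,000 people, September 2020 (figures quoted above)
tests_per_100k_per_day = {"Mexico": 9, "Argentina": 56, "UK": 345}

baseline = tests_per_100k_per_day["Mexico"]
relative_to_mexico = {
    country: rate / baseline for country, rate in tests_per_100k_per_day.items()
}

for country, ratio in relative_to_mexico.items():
    print(f"{country}: {ratio:.1f} times Mexico's daily testing rate")
```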

Putting all of this together, can we draw any conclusions from the data in Table 1?

The short answer is “precious few”. We can get a general sense of the scale of the pandemic in different countries. We can see which countries are publishing data and when they are updated, which is an interesting piece of data in itself. But if I were asked to referee a paper submitted to an academic journal that sought to draw any further conclusions from this sort of data, I would feel compelled to reject it. The Mexican government has what seems to be a state-of-the-art data portal on the pandemic, with large amounts of apparently important information neatly presented in charts and graphics. But if the quality of these data is as poor as claimed by the Financial Times, then they may do more to obscure than clarify the reality of the pandemic.

It is easy to criticise the governments of countries like Mexico about the woeful state of these data. At the same time, we must not ignore how challenging the collection and compilation of good data must be. Diagnosing cause of death is not easy at the best of times. As Aravinda Guntupali mentioned in her webinar presentation, in countries like India cause of death data were extremely incomplete even before the pandemic hit, and concerns about missing COVID-19 deaths persist. Inevitably, these problems are especially pronounced for people at very old ages, both due to the higher frequency of multiple health conditions and a tendency to view these events as simply dying of old age. The picture is further complicated by changes in the definition of a COVID-19 death. On 12 August 2020, the UK restricted this definition to deaths within 28 days of a positive test, which reduced the death count by 5,300. Definitions applied in LMICs are not easily found and may not be comparable. And anyway, shouldn’t we be taking a wider approach than just counting deaths directly attributed to COVID-19, and consider fatalities caused by other conditions which would not have occurred in non-pandemic circumstances?

In the face of these many problems, what hope is there that we can eventually get a clearer view of the pandemic? This will require new methods of data collection and analysis. They are likely to be more labour-intensive and challenging than simply clicking on a government portal. Poring over hospital and health district records, as well as other “downstream” sources, may provide a clearer, albeit imperfect, picture. As Sempe’s paper shows, constructing robust estimates of excess mortality is not for the statistically faint-hearted. The Global Platform will keep a particular eye out for similar studies.

In the meantime, we feel that our simple calculator that estimates potential COVID-19 deaths by age group (the PICHM) offers the best approach for most LMICs until things improve.

Please do share with us any new data sources we haven’t already listed on our site. And please, please bear in mind the many dangers of trying to make any sense of the numbers. Here’s hoping that we will be able to publish future papers that focus more on what the data really do say and less on what they don’t.

[1] Not all reported cases include age data. For example, in Algeria a further 1,973 COVID-19 deaths had been recorded without an age.