How ethnicity recording differs across health data sources and the impact on analysis
With growing concerns about the mortality outcomes for different ethnic groups during the COVID-19 pandemic, researchers worked quickly to understand the impact of the virus using the best data available to them. However, ethnicity information is collected in different ways across the health sector. Researchers do not all have access to the same data sources, which can lead to differences in estimates. Here Rose Drummond provides an update on work being undertaken with Wellcome and the Race Equality Foundation to provide clarity around estimates for England.
There is an urgent need to improve the evidence base around ethnicity and health, as we set out in our blog last year. This is to ensure current estimates of health and ethnicity are as reliable as possible, and because potential differences between subgroups of the population need to be identified and monitored more closely so that services and policies can be better tailored to these groups. This has never been more apparent than during the COVID-19 pandemic, where ethnicity was found to be a risk factor for increased mortality.
Despite this need, problems have been identified with the accuracy and completeness of ethnicity recording. Currently, there is no single source of ethnicity data available to all analysts, and the different sources that are available are not always consistent.
Often, the same individual is recorded under different ethnic groups in different health-related data sets. In addition, some sources have large amounts of missing data, or higher than expected proportions of people coded as ‘Other’. It is important to understand the consistency and completeness of ethnicity data sets, as this informs how the data should be analysed and what the data can be used for.
Lessons learned from the COVID-19 pandemic have highlighted ethnicity data gaps as a key area for the health statistics system to focus on. As part of a wider research collaboration funded by the Wellcome Trust and in partnership with the Race Equality Foundation, the ONS has worked to improve understanding of the quality of ethnicity data in key NHS sources, publishing two pieces of analysis that help assess where there is bias in the data and why that bias might arise.
Where the differences lie
Our quantitative analysis makes person-level comparisons between ethnicity information recorded in key health data sources, such as GP records and Hospital Episode Statistics (HES), and ethnicity information from the 2011 Census. The census is widely regarded as the most robust source of ethnicity data covering the whole population, as we are confident it is self-reported. The analysis shows the following across all data sources:
- The White British category consistently showed the highest level of agreement with the 2011 Census; of those recorded as White British in the health admin data sources, more than 96% reported the same ethnicity in the census.
- Bangladeshi, Pakistani, Chinese and Indian categories also showed high levels of agreement (greater than 80%) with the census for all health admin data sources.
- Across all three health administrative data sources, agreement was lower for all Mixed ethnic groups (less than 67%) and “other” ethnic groups, including Other Asian (less than 60%), Other White (less than 55%), Other Mixed (less than 21%), Other Black (less than 16%) and Any Other ethnic group (less than 15%).
- Overall agreement rates were similar for GP data (based on the subset available for analysis) and hospital data, though there was some variation between ethnic groups.
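To make the agreement measure concrete, here is a minimal sketch of how a rate of this kind could be calculated once health records have been linked to census records at person level. It is an illustration rather than the method used in the published analysis: the function name `agreement_by_group` and the records are made up, and it assumes linkage has already been done so that each pair refers to the same person.

```python
from collections import defaultdict


def agreement_by_group(linked_records):
    """Percentage agreement with the census, by health-source ethnic group.

    linked_records: iterable of (health_ethnicity, census_ethnicity) pairs,
    one pair per linked person (hypothetical input format).
    For each group recorded in the health source, returns the percentage of
    those people who reported the same group in the census.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for health, census in linked_records:
        totals[health] += 1
        if health == census:
            matches[health] += 1
    return {group: 100.0 * matches[group] / totals[group] for group in totals}


# Made-up linked records, for illustration only (not real data).
toy_records = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("Other White", "White British"),
    ("Other White", "Other White"),
    ("Indian", "Indian"),
    ("Any Other ethnic group", "Other Asian"),
]

rates = agreement_by_group(toy_records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0f}% agreement with census")
```

Note that the denominator is the number of people recorded in each group in the health source, so the rate answers the question "of those recorded as this group in the health data, how many reported the same group in the census?".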
The data collection process
Our desk review provides some insight into the process of collecting ethnicity data in healthcare settings and why these differences in ethnicity data might occur.
Staff have a role in collecting data from patients, and as you may expect, human behaviour can vary across settings and influence quality. It was reported that staff inputting responses provided by patients at GP surgeries created opportunities for bias through subjective interpretation of ethnicity. There was also evidence of ethnicity being conflated with country of birth.
We also found that individual hospitals and GP surgeries may use more detailed ethnicity categories for data collection than the standardised categories specified by NHS Digital and used for national-level analyses. This has several potential implications for data quality. For example, variations in how the ethnicity response options are presented across healthcare settings could conceivably affect an individual’s response. Furthermore, mapping bespoke ethnicity categories to the harmonised categories was considered complex, particularly if the categories on the data collection form do not match those in the IT system. If staff are unsure how to map bespoke categories to high-level ethnic groups, data quality could suffer.
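As a rough illustration of the mapping problem, the sketch below maps locally collected categories to harmonised ones and flags anything it cannot map for review instead of guessing. The lookup table, category names and function name are all hypothetical; this is not an official NHS Digital mapping.

```python
# Hypothetical lookup from bespoke local categories to harmonised categories.
# Keys are lower-cased so that matching ignores case and stray spaces.
HYPOTHETICAL_MAPPING = {
    "white british": "White British",
    "polish": "Other White",
    "nigerian": "African",
}


def harmonise(local_values, mapping=HYPOTHETICAL_MAPPING):
    """Map local categories to harmonised ones.

    Returns (harmonised, needs_review): harmonised holds the mapped category,
    or None where no mapping exists; needs_review lists the original values
    that could not be mapped, so they can be checked by a person.
    """
    harmonised = []
    needs_review = []
    for value in local_values:
        key = value.strip().lower()
        if key in mapping:
            harmonised.append(mapping[key])
        else:
            harmonised.append(None)
            needs_review.append(value)
    return harmonised, needs_review


# Made-up form responses, for illustration only.
mapped, unmapped = harmonise(["Polish", " nigerian ", "Cornish"])
print(mapped)
print(unmapped)
```

Keeping unmapped values visible, rather than silently assigning them to an "Other" group, is one way to avoid inflating the "Other" categories during data entry.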
In addition, we found that quality checks focused on completeness; checking accuracy was considered difficult because it requires verification of ethnicity by individual patients.
It was noted that hospitals are a more challenging environment in which to collect ethnicity data, as providing medical care to those who need it must take priority over other tasks such as ethnicity data collection. Perhaps unsurprisingly, this was more pronounced in emergency departments than in other departments.
Next steps
Our analysis so far describes the extent to which ethnicity recording differs between health admin data sources, and explores why that might be. Future work will look at solutions and methods analysts can use to produce more reliable estimates despite the differences between sources. We’ll keep our users updated on our next steps as the work progresses.