Searching for the source of the Nile?

One researcher’s experience of carrying out a UKMED project

Paul Garrud, Principal Research Fellow, University of Nottingham

With my co-researcher, doing a UKMED project was a journey of exploration, not least as this was one of UKMED’s pilot studies. The database was like a new continent that some had sailed round, but where no one had yet ventured into the interior.

We started off with a good question: How well did graduate entrants get on at medical school and did this differ between dedicated four-year graduate entry courses and the established five-year courses that had always taken a few graduates but where the majority came straight from secondary school? UKMED seemed a good opportunity to answer this.

Our first thought was to estimate how many graduates would have data present in UKMED, and, based on UCAS summaries, we thought around 800 a year in graduate entry and another 500 in standard courses. Given that in phase 1 only 2007 and 2008 entrants to medical schools were included, we worked on the basis of 1,600 and 1,000 respectively and a simple power calculation showed we could expect to pick up moderate differences between them… if they existed.
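
A power calculation of this kind can be sketched as follows. The article does not say which test, significance level, or target power was used, so the two-sided comparison of two group means, the 5% significance level, and the 80% power below are all assumptions made for illustration. Only the group sizes of 1,600 and 1,000 come from the text.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardised mean difference (Cohen's d) that a two-sided,
    two-sample comparison can detect, using the normal approximation.

    alpha and power are assumed defaults, not values taken from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


# Group sizes from the text: graduate-entry and standard-course graduates.
d = min_detectable_effect(1600, 1000)
print(f"Minimum detectable effect size: d = {d:.2f}")  # prints d = 0.11
```

Under these assumptions the minimum detectable difference is about 0.11 standard deviations, well below the conventional benchmark of 0.5 for a moderate effect, which is consistent with expecting to pick up moderate differences if they exist.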

UKMED now provides a number of aggregate summaries of the data it holds against many variables, so one can find out in advance whether there is enough data to answer research questions. We applied to do the study, received some further queries and suggestions in response, and then had our application approved reasonably quickly. All eager to start, we had, instead, to wait for a data extract and to complete a number of necessary steps to gain permission to use the ‘safe haven’ (the portal used to access research extracts). These included online training in research data governance and protection principles, and signing a contract covering access, use of the data, reporting, and publication: in retrospect, a considered and valuable introduction to, or reinforcement of, formal information governance.

Preliminary exploration and analysis of the first data extract showed up a number of problems: for instance, some medical schools were missing and a non-medicine course was included. These were rapidly rectified via three more data extracts. More challenging were several limitations in the data itself. Here are some examples for illustration. First, not every student enrols and then progresses normally to complete their course; some drop out, some suspend and then continue again, others appear for (say) year 4 and then disappear again, yet others move medical schools between (largely) preclinical and clinical phases. Second, some self-reported information includes data that is unlikely to be valid – ages of -1 and 106, ethnicity that changes between taking an aptitude test and registering as a medical student. So, there was a lot of data cleaning to be done, and decisions, often pragmatic, to be made about how to deal with aberrant information (fortunately rare).
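
Checks of the kind described above can be sketched as a simple flagging pass. Everything in this example is hypothetical: the field names, the sample records, and the plausible age range of 16 to 70 are illustrative assumptions, not the structure of a UKMED extract or the rules we actually applied.

```python
# Hypothetical records; field names do not reflect the real UKMED schema.
records = [
    {"id": "A1", "age_at_entry": 22, "ethnicity_test": "White", "ethnicity_reg": "White"},
    {"id": "A2", "age_at_entry": -1, "ethnicity_test": "Asian", "ethnicity_reg": "Asian"},
    {"id": "A3", "age_at_entry": 106, "ethnicity_test": "Black", "ethnicity_reg": "Black"},
    {"id": "A4", "age_at_entry": 25, "ethnicity_test": "Asian", "ethnicity_reg": "Mixed"},
]

# Assumed plausible range for age at entry; a pragmatic choice, not a UKMED rule.
MIN_AGE, MAX_AGE = 16, 70


def flag_record(rec):
    """Return a list of data-quality flags for one record."""
    flags = []
    age = rec["age_at_entry"]
    if age is None or not (MIN_AGE <= age <= MAX_AGE):
        flags.append("implausible_age")
    if rec["ethnicity_test"] != rec["ethnicity_reg"]:
        flags.append("ethnicity_mismatch")
    return flags


# Map each record id to its flags so aberrant cases can be reviewed by hand.
flagged = {rec["id"]: flag_record(rec) for rec in records}
for rec_id, flags in flagged.items():
    print(rec_id, flags or "ok")
```

Flagging rather than deleting keeps the decision about each aberrant value explicit, which suits the pragmatic, case-by-case choices described above.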

Some of these issues have since been dealt with through the addition to UKMED of a “standard person” table, which contains person attribute data grouped together from different sources, and through refinements to the data obtained from HESA.

The UKMED data analysts have noted that, unlike for postgraduate trainees, the GMC did not have access to undergraduate data on medical students before the advent of UKMED. Since then, the GMC and MSC have started to produce descriptive statistics on medical students to the benefit of all new projects.

UKMED research will often involve co-ordination between researchers in different universities. This is helped, in part, by common access to the project files area and the use of syntax files that can easily be edited, amended and re-run with the dataset (we used SPSS, but similar approaches can be taken with Stata, R and other statistics packages). However, email, telephone, and face-to-face working sessions were needed to discuss and resolve some of these issues. Perhaps the main issue was how to deal with missing data. We approached that using multiple imputation, comparing the patterns of results obtained over 100 iterations with the patterns in the original data (even though not always statistically reliable).
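
The general idea of comparing imputed results with the original data can be illustrated with a toy sketch. This is not the procedure we ran in SPSS: the scores are invented, the "analysis" is just a mean, and missing values are filled by random draws from the observed values, which is a much simpler technique than model-based multiple imputation. Only the figure of 100 iterations comes from the text.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical scores; None marks a missing value.
scores = [62.0, 71.5, None, 58.0, 66.0, None, 74.5, 69.0, None, 63.5]
observed = [s for s in scores if s is not None]

N_IMPUTATIONS = 100  # number of iterations mentioned in the text


def impute_once(values, donors):
    """Fill each missing value with a random draw from the observed values."""
    return [v if v is not None else random.choice(donors) for v in values]


# Run the same analysis (here, a mean) on each completed dataset.
estimates = [
    statistics.mean(impute_once(scores, observed)) for _ in range(N_IMPUTATIONS)
]

pooled_mean = statistics.mean(estimates)        # pooled point estimate
between_sd = statistics.stdev(estimates)        # spread across imputations
complete_case_mean = statistics.mean(observed)  # result from observed data only

print(f"Observed-only mean: {complete_case_mean:.2f}")
print(f"Pooled imputed mean: {pooled_mean:.2f} (SD across imputations {between_sd:.2f})")
```

If the pooled result and the observed-only result tell the same story, that offers some reassurance that the conclusions do not hinge on how the missing values were handled.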

After that, analysis of the cleaned dataset was much more straightforward. Like any project, it not only started to answer most of the original research questions, but also threw up further questions, which we merrily pursued within the limits of the dataset.

So, around nine months after applying to do the project we had a draft final report submitted. That report was discussed, dissected really, by the research committee and returned with a set of methodological and other queries for us to address. Another month of reanalysis and rewriting, though, and we had a final report that was acceptable. One critical issue that did crop up was concern about the further research questions that had emerged in the course of the project and our attempt to answer them: this wasn’t part of the original proposal and, it was felt, should not be part of any published paper. There was a protracted discussion about this because, as many will recognise, this is often what happens in research.

Nevertheless, it was agreed not to include this additional work; indeed a further (and much more extensive) proposal was subsequently drawn up and approved to investigate these additional research questions using a later and substantially larger data extract (‘A comparison of the properties of BMAT, GAMSAT and UKCAT’, Tiffin). That does all take time though, and with only two application rounds a year, the minimum time from application to published report is likely to be over a year. However, it is worth noting that if an organisation involved in medical education needs quick answers to questions, UKMED can help through standardised reports.

What have I learned about using UKMED? Four main lessons may be worth passing on to others.

  1. Talk to the data managers for UKMED before applying. This will help to get a good grasp of what information is there, how much there is, and what its limitations are likely to be.
  2. Allow two or three times as much of your time as you expect to need to complete your project. Although the data extracts are much better organised and validated in the current phase of UKMED, researchers continue to find data issues they didn’t anticipate.
  3. Make sure you have a co-researcher with good statistical or data science expertise: this isn’t just because the complexity and size of UKMED data require more advanced and sophisticated approaches, but also because having to talk through what you are analysing, and how, with a heavily involved colleague is invaluable.
  4. It’s well worth persisting with the (sometimes time-consuming) safe haven and governance processes, because in the end you have answers to questions that are based on evidence from across the UK, with high statistical power, not just generalisations from a single medical school that may be unwarranted.

Exploring a new continent is a great adventure (and a privilege). Fortunately, starting with a clear mission, I think we did find a plausible source, and it didn’t take five years to do so.

The research referred to in this article is ‘What has been the impact of accelerated graduate-entry medicine courses in terms of educational and sociodemographic profile, success at medical school, completion of Foundation training, and specialty entry?’ It is summarised on the UKMED website.