Data Collection Overview

Data Bundles

The following table shows all the bundles and their relationships. The individual pages describe each data bundle in detail.

Bundle Name	Required Bundles	Goal
OMOP Person	None	Ensure that simple data can be successfully shared.
HIC Person	OMOP Person	Data required to minimally characterise patients, their visits, and merge patient records (data can flow through the full pipeline).
Physiology Basics	HIC Person	Fixed number of different types of physiological measurements to prepare the pipeline.
Pathology Basics	HIC Person	Fixed number of different types of pathology results to prepare the pipeline.
Drugs Basics	HIC Person	Submit data about a small fixed number of common ICU drugs.
Device Basics	HIC Person	Submit data about exposure to specific devices.
Procedure Basics	HIC Person	Submit data about a small number of procedures.
Diagnoses	HIC Person	Submit data about diagnoses.
Locations	HIC Person	Submit data about patients' non-ICU movements through the hospital.

Data Transfer

All tables requested in all data bundles a site is expecting to send should be converted into CSV files compliant with RFC 4180. Header rows must be included. Column ordering should be consistent with the order shown in the wiki pages. All columns must be present, even if all the values in them are missing / NULL. NULL values in SQL should be represented by empty values in the CSV.
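As an illustration of the rules above, the following is a minimal sketch using Python's standard `csv` module. The function names and the example column names are hypothetical and not part of any bundle specification; the real column names and their order come from the individual bundle pages.

```python
import csv
import io


def write_table_csv(fileobj, columns, rows):
    """Write one table as CSV: header row first, fixed column order,
    CRLF line endings, and SQL NULL (Python None) as an empty value."""
    writer = csv.writer(fileobj, dialect="excel", lineterminator="\r\n")
    writer.writerow(columns)  # the header row must always be present
    for row in rows:
        # row.get() returns None for a missing column, so every column is
        # still emitted; csv.writer writes None as an empty field.
        writer.writerow([row.get(col) for col in columns])


def table_to_csv_text(columns, rows):
    """Convenience wrapper returning the CSV as a string."""
    buffer = io.StringIO(newline="")
    write_table_csv(buffer, columns, rows)
    return buffer.getvalue()


# Hypothetical columns, for illustration only.
example = table_to_csv_text(
    ["person_id", "year_of_birth", "note"],
    [{"person_id": 1, "year_of_birth": None, "note": 'said "hi", left'}],
)
```

When writing to a real file rather than a string buffer, open it with `newline=""` so that Python does not translate the line endings written by the `csv` module.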

Each table should be its own CSV file. Each CSV file should be named in the format:

<ISO 8601-1:2019 DATE-TIME: YYYY-MM-DDTHHMMSS>-<NHS TRUST ODS CODE>-CC-<TABLE>-v1.csv

Please note: there is a “T” between date and time.

E.g.:

2022-11-10T115300-RRV-CC-PERSON-v1.csv
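The naming pattern can be generated rather than typed by hand. The sketch below assumes the extract timestamp is available as a Python `datetime`; the function name `build_csv_name` is hypothetical.

```python
from datetime import datetime


def build_csv_name(extracted_at, ods_code, table):
    """Return a CSV file name in the documented format, e.g.
    2022-11-10T115300-RRV-CC-PERSON-v1.csv"""
    # Date with dashes, a literal "T", then the time with no separators.
    stamp = extracted_at.strftime("%Y-%m-%dT%H%M%S")
    return f"{stamp}-{ods_code}-CC-{table}-v1.csv"


name = build_csv_name(datetime(2022, 11, 10, 11, 53, 0), "RRV", "PERSON")
```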

When uploading data to the UCL Data Safe Haven, all CSV files for a single batch should be archived into a single zip file with no password. The zip file should contain all the CSVs at the root, and there should be no directories or other files included. The zip file should be named following the format:

<ISO 8601-1:2019 DATE-TIME: YYYY-MM-DDTHHMMSS>-<NHS TRUST ODS CODE>-CC.zip
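One possible way to build such an archive is with Python's standard `zipfile` module, as sketched below. The function name `build_batch_zip` and its parameters are hypothetical; the sketch only shows how to keep every CSV at the root of the archive, with no directories and no password.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def build_batch_zip(csv_paths, extracted_at, ods_code, out_dir):
    """Archive all CSV files for one batch into a single zip file.

    Each file is stored under its bare file name, so the archive contains
    no directories regardless of where the CSVs live on disk.
    """
    stamp = extracted_at.strftime("%Y-%m-%dT%H%M%S")
    zip_path = Path(out_dir) / f"{stamp}-{ods_code}-CC.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for csv_path in csv_paths:
            csv_path = Path(csv_path)
            # arcname drops the parent directories from the stored name.
            archive.write(csv_path, arcname=csv_path.name)
    return zip_path
```

Listing the archive contents before upload (for example with `zipfile.ZipFile(path).namelist()`) is a quick way to confirm that only the expected CSV files are present.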

Full data extraction

The ingestion process does not handle incremental extracts. When a new extraction is received, the process truncates the database tables (all existing data is overwritten).

This means that each CSV file should contain a complete extract of all HIC data, as appropriate for the data bundle you are generating.

Please note: a previous version of this guidance described the need for two extra columns, last_updated_datetime and deleted_datetime, to handle incremental uploads. They are now irrelevant and must not be included in the CSV files.
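A site could check its extracts for these two retired columns before upload. The sketch below is one way to do that; the function name is hypothetical, and the UTF-8 encoding is an assumption, since this page does not specify a file encoding.

```python
import csv

# The two columns from the earlier guidance that must no longer be sent.
FORBIDDEN_COLUMNS = {"last_updated_datetime", "deleted_datetime"}


def find_forbidden_columns(csv_path):
    """Return the retired column names found in the CSV header, sorted."""
    # Encoding is an assumption; adjust it to match the local extract.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])  # [] if the file is empty
    return sorted(FORBIDDEN_COLUMNS.intersection(header))
```

An empty list means neither column is present in the header.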

NHS Trust ODS Codes

NHS Trust	Code
Cambridge University Hospitals NHS Foundation Trust	RGT
Guy’s and St Thomas’ NHS Foundation Trust	RJ1
Imperial College Healthcare NHS Trust	RYJ
Manchester University NHS Foundation Trust	R0A
Oxford University Hospitals NHS Foundation Trust	RTH
The Royal Marsden NHS Foundation Trust	RPY
University College London Hospitals NHS Foundation Trust	RRV
University Hospitals Bristol and Weston NHS Foundation Trust	RA7

Inclusion criteria

The inclusion criteria below apply to all data bundles:

ADULT patients. Please DO NOT include patients who are under 18 on the day of their ICU admission.
The original 7 sites should include all ADULT patients admitted to critical care areas since 1st February 2014.
All other sites should only include ADULT patients admitted after the date of local approval to submit data.
Patients admitted before 1st February 2014 (original 7 sites) or before the local date of approval (Manchester, Bristol, Marsden) should not be included.
DO NOT send patients who have opted-out via the National data opt-out scheme.
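The age and admission-date criteria above could be checked along the following lines. This is a sketch only: the function names are hypothetical, it works on whole dates, and it does not cover the National data opt-out, which has to be applied separately. Treating the earliest admission date as inclusive is an assumption; sites using a local approval date should confirm how the boundary day is handled.

```python
from datetime import date

# Start date for the original sites, as stated in the criteria above.
ORIGINAL_SITES_START = date(2014, 2, 1)


def age_on(date_of_birth, on_date):
    """Age in completed years on a given date."""
    years = on_date.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred in that year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_adult_on_admission(date_of_birth, icu_admission_date):
    """True if the patient is 18 or over on the day of ICU admission."""
    return age_on(date_of_birth, icu_admission_date) >= 18


def meets_inclusion_dates(date_of_birth, icu_admission_date, earliest_admission):
    """Combine the adult check with the site's earliest admission date.

    Assumption: the earliest_admission boundary is inclusive.
    """
    return (
        is_adult_on_admission(date_of_birth, icu_admission_date)
        and icu_admission_date >= earliest_admission
    )
```

For one of the original sites, `earliest_admission` would be `ORIGINAL_SITES_START`; other sites would pass their own local approval date.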

Deleting Data

Since we currently only support full uploads, each file is considered to contain all active records. To delete previously submitted records the sites need to send a new extract. The process will truncate all tables and re-populate with the new data.

Records may be updated or deleted for any number of reasons including, but not limited to: corrections to the source EHR, patient identity merges, and updates about patients from subsequent encounters.

Opt-outs and Deletions

There are two key types of patient opt-out. Each has a slightly different mechanism for implementation by the sites and central processing. However, these two should cover all patient preferences.

Site local deletion: This type of opt-out occurs when a patient requests that a specific site not share their data with the project. This might cover all of their encounters, or only a specific subset. In either case, the site should generate a new extract excluding the appropriate data. When this extract is processed, the previously submitted data will be overwritten in the later stages of the pipeline and in the final researcher-accessible dataset.
Complete opt-out: Any patient record can be added to the Full Opt-Out Cohort (specified in the HIC Person bundle). The central processing pipeline will then ensure that this patient’s data, after merging with all contributing sites, does not flow into researcher-accessible datasets. This mechanism can support the National data opt-out or a patient requesting at a local site to opt out of the HIC data collection. However, it will only work reliably where the patient’s identity is well recorded and patient matching works. If a site needs to opt out a patient who lacks reliable identifiers (e.g. does not have an NHS Number), and the patient requests that data from another contributing site is also removed, the local site will need to contact all other sites as appropriate to ensure that they opt that patient out in their own records, because those records may not be correctly merged in processing.

Alchemist / Critical Care

Documentation on the data exchange format for HIC Critical Care and the Alchemist ingestion pipeline.