Data Collection Overview
Data Bundles
The following table shows all the bundles and their relationships. The individual pages describe each data bundle in detail.
Bundle Name | Required Bundles | Goal |
---|---|---|
OMOP Person | None | Ensure that simple data can be successfully shared. |
HIC Person | OMOP Person | Data required to minimally characterise patients, their visits, and merge patient records (data can flow through the full pipeline). |
Physiology Basics | HIC Person | Fixed number of different types of physiological measurements to prepare the pipeline. |
Pathology Basics | HIC Person | Fixed number of different types of pathology results to prepare the pipeline. |
Drugs Basics | HIC Person | Submit data about a small fixed number of common ICU drugs. |
Device Basics | HIC Person | Submit data about exposure to specific devices. |
Procedure Basics | HIC Person | Submit data about a small number of procedures. |
Diagnoses | HIC Person | Submit data about diagnoses. |
Locations | HIC Person | Submit data about patients' non-ICU movements through the hospital. |
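
The relationships in the table can be written down as a small lookup, which may help a site check which bundles must already be in place before a given bundle. This is only an illustrative sketch: the dictionary is transcribed from the table above, and the function name `with_prerequisites` is hypothetical rather than part of any project tooling.

```python
# Direct prerequisites, transcribed from the "Required Bundles" column above.
REQUIRED_BUNDLES = {
    "OMOP Person": [],
    "HIC Person": ["OMOP Person"],
    "Physiology Basics": ["HIC Person"],
    "Pathology Basics": ["HIC Person"],
    "Drugs Basics": ["HIC Person"],
    "Device Basics": ["HIC Person"],
    "Procedure Basics": ["HIC Person"],
    "Diagnoses": ["HIC Person"],
    "Locations": ["HIC Person"],
}


def with_prerequisites(bundle):
    """Return the bundle preceded by all of its direct and indirect prerequisites."""
    ordered = []

    def visit(name):
        # Visit prerequisites first so they appear before the bundle itself.
        for required in REQUIRED_BUNDLES[name]:
            visit(required)
        if name not in ordered:
            ordered.append(name)

    visit(bundle)
    return ordered
```

For example, `with_prerequisites("Diagnoses")` lists OMOP Person and HIC Person ahead of Diagnoses.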
Data Transfer
All tables requested in all data bundles a site is expecting to send should be converted into CSV files compliant with RFC 4180. Header rows must be included. Column ordering should be consistent with the order shown in the wiki pages. All columns must be present, even if all the values in them are missing / NULL. NULL values in SQL should be represented by empty values in the CSV.
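
As a minimal sketch of the rules above, the following uses Python's standard `csv` module to write a header row, keep a fixed column order, and emit NULLs as empty values. The function name `write_table_csv` and the column names used in the usage note are hypothetical; the real column names and order come from the individual bundle pages.

```python
import csv


def write_table_csv(stream, columns, rows):
    """Write rows (dicts keyed by column name) as CSV text to an open stream.

    columns: column names in the order required for the table.
    rows:    iterable of dicts; missing keys and None are treated as NULL.
    """
    # RFC 4180 uses CRLF line breaks and quotes fields only when needed.
    writer = csv.writer(stream, lineterminator="\r\n", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(columns)  # header row is always included
    for row in rows:
        # Every column is emitted; csv.writer writes None as an empty field.
        writer.writerow([row.get(column) for column in columns])
```

For instance, `write_table_csv(handle, ["person_id", "year_of_birth"], [{"person_id": 1, "year_of_birth": None}])` writes the header followed by the line `1,`. When writing to a file, open it with `newline=""` so that Python does not alter the line endings.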
Each table should be its own CSV file. Each CSV file should be named in the format:
<ISO 8601-1:2019 DATE-TIME: YYYY-MM-DDTHHMMSS>-<NHS TRUST ODS CODE>-CC-<TABLE>-v1.csv
Please note: there is a “T” between date and time.
E.g.:
2022-11-10T115300-RRV-CC-PERSON-v1.csv
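
The file name can be generated rather than typed by hand, which avoids mistakes in the timestamp layout. The helper below is a sketch only: `csv_file_name` is a hypothetical name, and upper-casing the table name is an assumption based on the example above.

```python
from datetime import datetime


def csv_file_name(extracted_at, ods_code, table):
    """Build a CSV file name in the documented format for one table."""
    # Date and time are separated by "T"; the time part has no colons.
    stamp = extracted_at.strftime("%Y-%m-%dT%H%M%S")
    # Assumption: table names are upper case, as in the PERSON example.
    return f"{stamp}-{ods_code}-CC-{table.upper()}-v1.csv"
```

Calling `csv_file_name(datetime(2022, 11, 10, 11, 53, 0), "RRV", "person")` reproduces the example file name shown above.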
When uploading data to the UCL Data Safe Haven, all CSV files for a single batch should be archived into a single zip file with no password. The zip file should contain all the CSVs at the root, and there should be no directories or other files included. The zip file should be named following the format:
<ISO 8601-1:2019 DATE-TIME: YYYY-MM-DDTHHMMSS>-<NHS TRUST ODS CODE>-CC.zip
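
One possible way to build the batch archive with Python's standard `zipfile` module is sketched below. The function names are hypothetical. Each CSV is stored under its bare file name so that it sits at the root of the archive, and no password is applied.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def zip_file_name(extracted_at, ods_code):
    """Build the batch zip file name in the documented format."""
    return f"{extracted_at.strftime('%Y-%m-%dT%H%M%S')}-{ods_code}-CC.zip"


def build_batch_zip(csv_paths, output_dir, extracted_at, ods_code):
    """Archive the given CSV files into a single zip and return its path."""
    zip_path = Path(output_dir) / zip_file_name(extracted_at, ods_code)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for csv_path in map(Path, csv_paths):
            # arcname drops any source directories, so the file is stored at the root.
            archive.write(csv_path, arcname=csv_path.name)
    return zip_path
```

Passing only the CSV files for the batch keeps other files and directories out of the archive.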
Full data extraction
The ingestion process does not handle incremental extracts. When a new extraction is received, the process truncates the database tables, so all existing data is overwritten.
This means that each CSV file should contain a complete extract of all HIC data, as appropriate for the data bundle you are generating.
Please note: a previous version of this guidance described the need for two extra columns, last_updated_datetime and deleted_datetime, to handle incremental uploads. These columns are no longer used and must not be included in the CSV files.
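
A site could guard against these columns being left in an extract with a check along the following lines. This is a hypothetical pre-upload check, not part of the ingestion process; the function name `find_deprecated_columns` is illustrative.

```python
# Columns from the earlier incremental-upload guidance that must no longer appear.
DEPRECATED_COLUMNS = {"last_updated_datetime", "deleted_datetime"}


def find_deprecated_columns(header):
    """Return any deprecated column names found in a CSV header row."""
    return sorted(DEPRECATED_COLUMNS.intersection(header))
```

An empty result means the header row is free of the two retired columns.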
NHS Trust ODS Codes
NHS Trust | Code |
---|---|
Cambridge University Hospitals NHS Foundation Trust | RGT |
Guy’s and St Thomas’ NHS Foundation Trust | RJ1 |
Imperial College Healthcare NHS Trust | RYJ |
Manchester University NHS Foundation Trust | R0A |
Oxford University Hospitals NHS Foundation Trust | RTH |
The Royal Marsden NHS Foundation Trust | RPY |
University College London Hospitals NHS Foundation Trust | RRV |
University Hospitals Bristol and Weston NHS Foundation Trust | RA7 |
Inclusion criteria
The following inclusion criteria apply to all data bundles:
- ADULT patients. Please DO NOT include patients who are under 18 on the day of their ICU admission.
- The original 7 sites should include all ADULT patients admitted to critical care areas since 1st February 2014.
- All other sites should only include ADULT patients admitted after the date of local approval to submit data.
- Patients admitted before 1st February 2014 (original 7 sites) or before the local date of approval (Manchester, Bristol, Marsden) should not be included.
- DO NOT send patients who have opted-out via the National data opt-out scheme.
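
The criteria above can be sketched as a single eligibility check. This is an illustration under stated assumptions, not the project's own filter: it assumes the site holds a full date of birth locally, it treats an admission on the start date itself as included, and the names `age_on` and `is_eligible` are hypothetical. Sites other than the original ones would pass their local approval date as `site_start_date`.

```python
from datetime import date

# Start date for the original sites, from the criteria above.
ORIGINAL_SITE_START = date(2014, 2, 1)


def age_on(birth_date, on_date):
    """Age in completed years on a given date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_eligible(birth_date, icu_admission_date,
                site_start_date=ORIGINAL_SITE_START, national_opt_out=False):
    """Apply the adult, admission-date, and National data opt-out criteria."""
    if national_opt_out:
        return False
    if age_on(birth_date, icu_admission_date) < 18:
        return False
    # Assumption: admissions on the start date itself are included.
    return icu_admission_date >= site_start_date
```

A patient who turns 18 the day after their ICU admission is excluded, because age is evaluated on the admission date.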
Deleting Data
Since we currently only support full uploads, each file is considered to contain all active records. To delete previously submitted records the sites need to send a new extract. The process will truncate all tables and re-populate with the new data.
Records may be updated or deleted for any number of reasons including, but not limited to: corrections to the source EHR, patient identity merges, and updates about patients from subsequent encounters.
Opt-outs and Deletions
There are two key types of patient opt-out. Each has a slightly different mechanism for implementation by the sites and central processing. However, these two should cover all patient preferences.
- Site local deletion: This type of opt-out occurs when a patient requests that a specific site not share their data with the project. This might cover all of their encounters, or only a specific subset. In either case, the site should generate a new extract excluding the appropriate data. When this extract is processed, the previously submitted data will be overwritten, which removes it from the later stages of the pipeline and from the final researcher-accessible dataset.
- Complete opt-out: Any patient record can be added to the Full Opt-Out Cohort (specified in the HIC Person bundle). The central processing pipeline will then ensure that this patient’s data, after merging with all contributing sites, does not flow into researcher-accessible datasets. This mechanism can support the National data opt-out or a patient requesting at a local site to opt out of the HIC data collection. However, it will only work reliably where the patient’s identity is well recorded and patient matching works. If a site needs to opt out a patient who lacks reliable identifiers (e.g. does not have an NHS Number), and the patient requests that data from another contributing site is also removed, the local site will need to contact all other sites as appropriate to ensure that they opt that patient out in their own records, as those records may not all be correctly merged in processing.