Pseudonymisation
Working notes on how to give each site access to data whilst minimising exposure to direct identifiers and the risk of re-identification via indirect identifiers.
Technical options considered:
- Keep a single centralised schema and use views to present anonymised identifiers.
- Maintain an anonymised copy of the data (this simplifies the management of permissions but takes more space and may introduce synchronisation errors).
- Provide column- and table-level permissions, hiding identifiable data.
These notes will be built up bundle by bundle.
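As an illustration of the first option (a single schema with views), the following is a minimal sketch using Python's built-in `sqlite3` module. The table and view names (`silver_person`, `person_id_map`, `gold_person`) and the sample values are hypothetical; only the column names `person_id`, `person_source_value` and `year_of_birth` come from these notes. SQLite has no permission system, so in a real database server the sites would additionally be granted access to the view only.

```python
import sqlite3

# Hypothetical table and view names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE silver_person (
        person_id INTEGER PRIMARY KEY,
        person_source_value TEXT,      -- likely the local MRN
        year_of_birth INTEGER
    );
    CREATE TABLE person_id_map (
        person_id INTEGER PRIMARY KEY,
        gold_person_id INTEGER UNIQUE NOT NULL
    );
    -- The view exposes the generated id and omits person_source_value.
    CREATE VIEW gold_person AS
        SELECT m.gold_person_id AS person_id, p.year_of_birth
        FROM silver_person AS p
        JOIN person_id_map AS m ON m.person_id = p.person_id;
    """
)
conn.execute("INSERT INTO silver_person VALUES (1001, 'MRN-0001', 1950)")
conn.execute("INSERT INTO person_id_map VALUES (1001, 7)")

cursor = conn.execute("SELECT * FROM gold_person")
gold_columns = [col[0] for col in cursor.description]
gold_rows = cursor.fetchall()
```

The design choice here is that the mapping table lives alongside the Silver data and is never exposed, so the view is the only route from a site to the data.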
OMOP Person bundle
Hide the following items:
person_source_value
: this is very likely to be the local MRN (Medical Record Number).
Replace the following items:
person_id
: in Silver, we will assume that the id is potentially identifying (i.e. it is possible but unlikely that sites chose to use the local hospital number). This value will be replaced by a generated number when transferring data into Gold.
Do not hide the following items (on the basis that these are not direct identifiers):
- Columns relating to the patient’s date of birth.
- Columns relating to gender.
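The `person_id` replacement described above could be sketched as follows. This is one possible approach under stated assumptions, not a decided design: the function name `assign_gold_ids`, the use of random rather than sequential numbers, and the 31-bit id range are all hypothetical choices. The mapping is kept so that repeated transfers from Silver to Gold give the same person the same generated number.

```python
import secrets


def assign_gold_ids(silver_ids, id_map=None):
    """Return a dict mapping each Silver person_id to a generated Gold id.

    Entries already present in id_map are preserved, so a person keeps
    the same Gold id across repeated transfers.
    """
    id_map = dict(id_map or {})
    used = set(id_map.values())
    for silver_id in silver_ids:
        if silver_id in id_map:
            continue
        # Draw random candidates until an unused one is found.
        # The range 1 .. 2**31 - 1 is an assumption, not a requirement.
        while True:
            candidate = secrets.randbelow(2**31 - 1) + 1
            if candidate not in used:
                break
        used.add(candidate)
        id_map[silver_id] = candidate
    return id_map


id_map = assign_gold_ids([1001, 1002, 1003])
id_map_again = assign_gold_ids([1001, 1004], id_map)
```

The mapping itself is as sensitive as the original identifiers and would need to be stored where Gold users cannot read it.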
Data Visibility
Data Category | Fields | Description | Bundle | Silver | Gold (pseudonymised) | Bespoke Release |
---|---|---|---|---|---|---|
Episode Descriptor | person_id | ID unique to the data set | OMOP Person | Yes | Yes | Yes |
Episode Descriptor | person_source_value | ID unique at the patient's site | OMOP Person | Yes | No | No |
Direct identifier | NHS number | Only used for Data Linkage | HIC Person | Yes | No | No |
Identifying | year_of_birth | | OMOP Person | Yes | Yes | Provided as age at admission |
Direct identifier | birth_date_time | | OMOP Person | Yes | Provided as age at admission | Provided as age at admission |
Direct identifier | death_date | | HIC Person | Yes | Revert to 01/01/2020 maintaining $\Delta$ with date of admission | Revert to 01/01/2020 maintaining $\Delta$ with date of admission |
Identifying | Post Code | Hospital, GP, Person | HIC Person | Yes | Max 2 inbound or transformation to deprivation index | Deprivation Index Only |
Date time | visit_start_date | Hospital, ICU, Ward | HIC Person | Yes | Only if directly required by research question | Revert to 01/01/2020 maintaining seasonal cadence |
Date time | visit_end_date | Hospital, ICU, Ward | HIC Person | Yes | Only if directly required by research question | Revert to 01/01/2020 maintaining seasonal cadence |
Date time | Multiple possibilities | Tests, interventions, results | All Bundles | Yes | Only if directly required by research question | Revert to 01/01/2020 maintaining seasonal cadence |
Sensitive | Comorbidities | Sensitive or related diagnosis | Diagnoses | Yes | Only if directly required by research question | No |
Sensitive | Diagnosis | Sensitive or related diagnosis | Diagnoses | Yes | Only if directly required by research question | No |
Sensitive | Drugs | Related to a sensitive or related diagnosis | Drugs Basics | Yes | Only if directly required by research question | No |
Sensitive | Test Results | Related to a sensitive or related diagnosis | Pathology Basics | Yes | Only if directly required by research question | No |
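
Two of the transformations in the table, "Provided as age at admission" and "Revert to 01/01/2020 maintaining $\Delta$ with date of admission", can be sketched as below. This reads the second rule as: the admission date is mapped to 1 January 2020 and every other date keeps its offset in days from admission. That reading is an assumption, as are the function names and the use of plain dates rather than date-times. The "maintaining seasonal cadence" rule is not covered, since these notes do not yet define it.

```python
from datetime import date

ANCHOR = date(2020, 1, 1)  # anchor date taken from the table above


def age_at_admission(birth: date, admission: date) -> int:
    """Age in whole years on the day of admission."""
    had_birthday = (admission.month, admission.day) >= (birth.month, birth.day)
    return admission.year - birth.year - (0 if had_birthday else 1)


def shift_to_anchor(event: date, admission: date) -> date:
    """Move an event so that admission lands on ANCHOR, keeping the day offset."""
    return ANCHOR + (event - admission)


# Made-up example values.
admission = date(2017, 3, 10)
age = age_at_admission(date(1950, 6, 1), admission)
shifted_death = shift_to_anchor(date(2017, 3, 25), admission)
```

In this example the patient has not yet had their 2017 birthday at admission, so `age` is 66, and a death 15 days after admission becomes 16 January 2020.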