E.1.1 De-Identifier

An Application may claim conformance to an Application Level Confidentiality Profile and Options as a de-identifier if it protects and retains all Attributes as specified in the Profile and Options. Protection in this context is defined as the following process:

  1. The application may create one or more instances of the Encrypted Attributes Data Set and copy Attributes to be protected into the (single) item of the Modified Attributes Sequence (0400,0550) of one or more of the Encrypted Attributes Data Set instances.

Notes: 1. A complete reconstruction of the original Data Set may not be possible; however, Attributes (e.g. SOP Instance UID) in the Modified Attributes Sequence of an Encrypted Attributes Data Set may refer back to the original SOP Instance holding the original Data Set.

2. It is not required that the Encrypted Attributes Data Set be created; indeed, there may be circumstances where the Dataset is expected to be archived long enough that any contemporary encryption technology may be inadequate to provide long term protection against unauthorized recovery of identification.

3. Other mechanisms to assist in identity recovery or longitudinal consistency of replaced UIDs or dates and times are deprecated in favor of the Encrypted Attributes Data Set mechanism that is intended for this purpose. For example, if it is desired to include an encrypted hash of the Patient’s Name, it should not be encoded in a separate private attribute implemented for that purpose, but should be included in the Encrypted Attributes Data Set and encoded using the standard mechanism. This allows for compatibility between different implementations and provides security based on the quality and control of the encryption keys. Note also that unencrypted hashes are considerably less secure and should be avoided, since they are vulnerable to trivial dictionary-based attacks.

  2. Each Attribute to be protected shall then either be removed from the dataset, or have its value replaced by a different “replacement value” which does not allow identification of the patient.

Notes: 1. It is the responsibility of the de-identifier to ensure that this process does not negatively affect the integrity of the Information Object Definition, i.e., dummy values may be necessary for Type 1 Attributes that are protected but may not be sent with zero length, and are to be stored or exchanged in encrypted form by applications that may not be aware of the security mechanism.

2. The standard does not mandate the use of any particular dummy value, and indeed it may have some meaning, for example in a data set that may be used for teaching purposes, where the real patient identifying information is encrypted for later retrieval, but a meaningful alternative form of identification is provided. For example, a dummy Patient’s Name (0010,0010) may convey the type of pathology in a teaching case. It is the responsibility of the de-identifier software or human operator to ensure that the dummy values cannot be used to identify the patient.

3. It is the responsibility of the de-identifier to ensure the consistency of dummy values for Attributes such as Study Instance UID (0020,000D) or Frame of Reference UID (0020,0052) if multiple related SOP Instances are protected. Indeed, all Attributes of every entity above the Instance level should remain consistent for all Instances protected, e.g., Patient ID for the Patient entity, Study ID for the Study entity, Series Number for the Series entity.

4. Some profiles do not allow selective protection of parts of a Sequence of Items. If an Attribute to be protected is contained in a Sequence of Items, the complete Sequence of Items may need to be protected.

5. The de-identifier should ensure that no identifying information is burned in to the image pixel data, either because the modality does not generate such burned in identification in the first place, or by removing it through the use of the Clean Pixel Data Option; see Section E.3. If non-pixel data graphics or overlays contain identification, the de-identifier is required to remove them, or clean them if the Clean Graphics Option is supported; see Section E.4. The means by which burned in or graphic identifying information is located and removed is outside the scope of this standard.

  3. Each Attribute specified to be retained shall be retained. At the discretion of the de-identifier, Attributes may be added to the dataset to be protected.

Note: As an example, the Attribute Patient’s Age (0010,1010) might be introduced as a replacement for Patient’s Birth Date (0010,0030) if the patient’s age is of importance, and the profile permits it.

  4. If used, all instances of the Encrypted Attributes Data Set shall be encoded with a DICOM Transfer Syntax, encrypted, and stored in the dataset to be protected as an Item of the Encrypted Attributes Sequence (0400,0500). The encryption shall be done using RSA [RFC 2313] for the key transport of the content-encryption keys. A de-identifier conforming to this security profile may use either AES or Triple-DES for content-encryption. The AES key length may be any length allowed by the RFCs. The Triple-DES key length is 168 bits as defined by ANSI X9.52. Encoding shall be performed according to the specifications for RSA Key Transport and Triple DES Content Encryption in RFC-3370 and for AES Content Encryption in RFC-3565.

Note: 1. Each item of the Encrypted Attributes Sequence (0400,0500) consists of two Attributes, Encrypted Content Transfer Syntax UID (0400,0510) containing the UID of the Transfer Syntax that was used to encode the instance of the Encrypted Attributes Data Set, and Encrypted Content (0400,0520) containing the block of data resulting from the encryption of the Encrypted Attributes Data Set instance.

2. RSA key transport of the content-encryption keys is specified as a requirement in the European Prestandard ENV 13608-2: Health Informatics - Security for healthcare communication – Part 2: Secure data objects.

3. No requirements on the size of the asymmetric key pairs used for RSA key transport are defined in this confidentiality scheme.

  5. Implementations claiming conformance to the Basic Application Level Confidentiality Profile as a de-identifier shall always protect (e.g. encrypt and replace) the SOP Instance UID (0008,0018) Attribute as well as all references to other SOP Instances, whether contained in the main dataset or embedded in an Item of a Sequence of Items, that could potentially be used by unauthorized entities to identify the patient.

Note: In the case of a SOP Instance UID embedded in an item of a sequence, this means that the enclosing Attribute in the top-level data set must be encrypted in its entirety.

  6. The attribute Patient Identity Removed (0012,0062) shall be replaced or added to the dataset with a value of YES, and one or more codes from PS 3.16 CID 7050 De-identification Method corresponding to the profile and options used shall be added to De-identification Method Code Sequence (0012,0064). A text string describing the method used may also be inserted in or added to De-identification Method (0012,0063), but is not required.

  7. If the Dataset being de-identified is being stored within a DICOM File, then the File Meta Information including the 128 byte preamble, if present, shall be replaced with a description of the de-identifying application. Otherwise, there is a risk that identity information may leak through unmodified File Meta Information or preamble. See PS 3.10.
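The removal or replacement of protected Attributes, the consistent replacement of UIDs across related SOP Instances, and the setting of Patient Identity Removed (0012,0062) might be sketched as follows. This is an illustrative sketch only, not a conforming de-identifier. The dataset is modeled as a plain Python dict keyed by (group, element) tuples rather than a DICOM toolkit object. The names UidRemapper and deidentify, and the choice of UUID-derived UIDs under the 2.25 root, are assumptions made for the example. The Encrypted Attributes Sequence, nested Sequences, File Meta Information, and the De-identification Method Code Sequence are omitted.

```python
import uuid

# Tags as (group, element) tuples, taken from the text above.
PATIENT_NAME = (0x0010, 0x0010)
PATIENT_BIRTH_DATE = (0x0010, 0x0030)
SOP_INSTANCE_UID = (0x0008, 0x0018)
STUDY_INSTANCE_UID = (0x0020, 0x000D)
FRAME_OF_REFERENCE_UID = (0x0020, 0x0052)
PATIENT_IDENTITY_REMOVED = (0x0012, 0x0062)

UID_TAGS = {SOP_INSTANCE_UID, STUDY_INSTANCE_UID, FRAME_OF_REFERENCE_UID}


class UidRemapper:
    """Hypothetical helper: every original UID maps to exactly one replacement."""

    def __init__(self):
        self._mapping = {}

    def remap(self, original_uid):
        if original_uid not in self._mapping:
            # Assumption: replacement UIDs are derived from random UUIDs.
            self._mapping[original_uid] = "2.25." + str(uuid.uuid4().int)
        return self._mapping[original_uid]


def deidentify(dataset, remapper, replace, remove):
    """Return a protected copy of `dataset`; the input is left unmodified.

    replace: dict of tag -> replacement (dummy) value
    remove:  set of tags to drop from the dataset
    Which tags belong in each is decided by the profile, not by this sketch.
    """
    result = {}
    for tag, value in dataset.items():
        if tag in remove:
            continue
        if tag in replace:
            result[tag] = replace[tag]
        elif tag in UID_TAGS:
            # The shared remapper keeps related instances consistent.
            result[tag] = remapper.remap(value)
        else:
            result[tag] = value
    result[PATIENT_IDENTITY_REMOVED] = "YES"
    return result
```

Passing the same UidRemapper to every call for a set of related SOP Instances means that two instances of one study receive the same replacement Study Instance UID, while their SOP Instance UIDs remain distinct.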

The Attributes listed in Table E.1-1 for each profile are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to an Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The following action codes are used in the table:

These action codes are applicable to both Sequence and non-Sequence attributes; in the case of Sequences, the action is applicable to the Sequence and all of its contents. Cleaning a sequence (“C” action) may entail either changing values of attributes within that Sequence when the meaning of the Sequence within the context of its use in the IOD is understood, or recursively applying the profile rules to each Dataset in each Item of the Sequence. Keeping a Sequence (“K” action) requires recursively applying the profile rules to each Dataset in each Item of the Sequence (for example, in order to remap any UIDs contained within that sequence).
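The recursive treatment of Sequences described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. A Dataset is modeled as a dict keyed by (group, element) tuples and a Sequence as a list of such dicts (one per Item). The function name apply_actions and the reading of the codes used here ("X" remove, "D" replace with a dummy value, "U" replace a UID consistently, "K" keep) are assumptions made for the example, as is the default of keeping Attributes that have no entry in the action table.

```python
def apply_actions(dataset, actions, remap_uid, dummy="DUMMY"):
    """Return a copy of `dataset` with per-tag actions applied recursively.

    actions:   dict of tag -> action code ("X", "D", "U" or "K")
    remap_uid: callable returning a consistent replacement for a UID
    """
    result = {}
    for tag, value in dataset.items():
        action = actions.get(tag, "K")  # assumption: unlisted tags are kept
        if action == "X":
            # Removal applies to the Attribute and, for a Sequence,
            # to all of its contents.
            continue
        if isinstance(value, list):
            # A retained Sequence: apply the same rules to each Item.
            result[tag] = [
                apply_actions(item, actions, remap_uid, dummy) for item in value
            ]
        elif action == "D":
            result[tag] = dummy
        elif action == "U":
            result[tag] = remap_uid(value)
        else:
            result[tag] = value
    return result
```

With this structure, a UID nested inside a kept Sequence, such as a Referenced SOP Instance UID (0008,1155) within a Referenced Image Sequence (0008,1140), is remapped by the same rule that applies at the top level.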

A requirement for an Option, when implemented, overrides any requirement for the underlying Profile.

Notes: 1. The Attributes listed in Table E.1-1 may not be sufficient to guarantee confidentiality of patient identity. In particular, identifying information may be contained in Private Attributes, new Standard Attributes, Retired Standard Attributes and additional Standard Attributes not present in Standard Composite IODs (as defined in PS 3.3) but used in Standard Extended SOP Classes. Table E.1-1 indicates those Attributes that are used in Standard Composite IODs as well as those Attributes that are Retired. Also included in Table E.1-1 are some Elements that are not normally found in a Dataset, but are used in Commands, Directories and Meta Information Headers, but which could be misused within Private Sequences. Textual Content Items of Structured Reports, textual annotations of Presentation States, Curves and Overlays are specifically addressed. It is the responsibility of the de-identifier to ensure that all identifying information is removed.

2. It should be noted that conformance to an Application Level Confidentiality Profile does not necessarily guarantee confidentiality. For example, if an attacker already has access to the original images, the Pixel Data could be matched, though the probability and impact of such a threat may be deemed to be negligible. If the Encrypted Attributes Sequence is used, it should be understood that any encryption scheme may be vulnerable to attack. Also, an organization’s Security Policy and Key Management policy are recognized to have a much greater impact on the effectiveness of protection.

3. National and local regulations, which may vary, might require that additional attributes be de-identified, though the Profiles and Options have been designed to be sufficient to satisfy known regulations without compromising the usefulness of the de-identified instances for their intended purpose.

4. Table E.1-1 is normative, but it is subject to extension as the DICOM Standard evolves and other similar Attributes are added to IODs. De-identifiers may take this extensibility into account, for example, by considering handling all dates and times on the basis of their Value Representation of DT, DA or TM, rather than just those date and time Attributes listed.

5. The Profiles and Options do not specify whether the design of a de-identifier should be to remove what is known to be a risk of identity leakage, or to retain only what is known to be safe. The former approach may fail when the standard is extended, or when a vendor adds unanticipated standard or private attributes, whilst the latter requires an extensive, if not complete, comparison of each instance with the Information Object Definitions in PS 3.3 to avoid discarding required or useful information. Table E.1-1 defines the minimum actions required for conformance.

6. De-identification of Private SOP Classes is not defined.

7. The “C” (clean) action is specified not only for string VRs, but also for Code Sequences, since the use of private or local codes and non-standard code meanings may potentially cause identity leakage.

8. The Digital Signatures Sequence needs to be removed because it contains the certificate of the signer; theoretically the signature could be verified and the object re-signed by the de-identifier itself with its own certificate, but this is not required by the Standard.

9. In general, there are no CS VR Attributes in this table, since it is usually safe to assume that code strings do not contain identifying information.

10. In general, there are no Code Sequence Attributes in this table, since it is usually safe to assume that coded sequence entries, including private codes, do not contain identifying information. Exceptions are codes for providers and staff.

11. The Clean Pixel Data and Clean Recognizable Visual Features Options are not listed in this table, since they are defined by descriptions of operations on the Pixel Data itself. The Clean Pixel Data option may be applied to the Pixel Data within the Icon Image Sequence, or more likely the Icon Image Sequence may be recreated entirely once the Pixel Data of the main Dataset has been cleaned. The Icon Image Sequence is to be removed when its Pixel Data cannot be cleaned.

12. The Original Attributes Sequence (0400,0561) (which in turn contains the Modified Attributes Sequence (0400,0550)) generally needs to be removed, because it may contain unencrypted copies of other Attributes that may have been modified (e.g., coerced to use local identifiers and names during import of foreign images); an alternative approach would be to selectively modify its contents. This is distinct from the use of the Modified Attributes Sequence (0400,0550) within the Encrypted Attributes Sequence (0400,0500).

13. Table E.1-1 distinguishes Attributes that are in standard Composite IODs defined in PS 3.3 from those that are not; some Attributes are defined in PS 3.3 for other IODs, or have a specific usage other than in the top level Dataset of a Composite IOD, but are (mis-)used by implementers in instances as a Standard Extended SOP Class at other levels than as defined by the Standard. Any such Attributes encountered may be removed without compromising the conformance of the instance with the standard IOD. For example, Verifying Observer Sequence (0040,A073) is only defined in structured report IODs and hence is described in Table E.1-1 as D since it is Type 1C; if encountered in an image instance, it should simply be removed (treated as X).
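The approach suggested in Note 4, handling dates and times on the basis of their Value Representation rather than by a fixed list of Attributes, might be sketched as follows. This is a sketch under stated assumptions: elements are modeled as a dict of (group, element) tuple to (VR, value) pairs, the function name replace_dates_by_vr is hypothetical, and the fixed dummy values are placeholders chosen only because they are validly formatted. The action actually required for a given Attribute is determined by the Profile and Options in use.

```python
DATE_TIME_VRS = {"DA", "DT", "TM"}

# Assumption: fixed placeholder values, one per date/time VR.
DUMMY_BY_VR = {
    "DA": "19000101",        # YYYYMMDD
    "TM": "000000",          # HHMMSS
    "DT": "19000101000000",  # YYYYMMDDHHMMSS
}


def replace_dates_by_vr(elements):
    """Replace every DA, DT or TM value, whatever its tag, with a dummy value."""
    result = {}
    for tag, (vr, value) in elements.items():
        if vr in DATE_TIME_VRS:
            result[tag] = (vr, DUMMY_BY_VR[vr])
        else:
            result[tag] = (vr, value)
    return result
```

Because the test is on the VR and not on the tag, a date Attribute added to an IOD after the de-identifier was written is still caught, provided its VR is known.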

Table E.1-1 Application Level Confidentiality Profile Attributes

Attribute Name Tag Retired (from PS 3.6) In Std. Comp. IOD (from PS 3.3) Basic Profile
Data Element  Private Creator            VR  VM  Meaning
(7053,xx00)   Philips PET Private Group  DS  1   SUV Factor – Multiplying stored pixel values by Rescale Slope then this factor results in SUVbw in g/l
(7053,xx09)   Philips PET Private Group  DS  1   Activity Concentration Factor – Multiplying stored pixel values by Rescale Slope then this factor results in MBq/ml.
(00E1,xx21)   ELSCINT1                   DS  1   DLP
(01E1,xx26)   ELSCINT1                   CS  1   Phantom Type
(01E1,xx50)   ELSCINT1                   DS  1   Acquisition Duration
(01F1,xx01)   ELSCINT1                   CS  1   Acquisition Type
(01F1,xx07)   ELSCINT1                   DS  1   Table Velocity
(01F1,xx26)   ELSCINT1                   DS  1   Pitch
(01F1,xx27)   ELSCINT1                   DS  1   Rotation Time
(0019,xx23)   GEMS_ACQU_01               DS  1   Table Speed [mm/rotation]
(0019,xx24)   GEMS_ACQU_01               DS  1   Mid Scan Time [sec]
(0019,xx27)   GEMS_ACQU_01               DS  1   Rotation Speed (Gantry Period)
(0043,xx27)   GEMS_PARM_01               SH  1   Scan Pitch Ratio in the form "n.nnn:1"
(0045,xx01)   GEMS_HELIOS_01             SS  1   Number of Macro Rows in Detector
(0045,xx02)   GEMS_HELIOS_01             FL  1   Macro width at ISO Center
(0903,xx10)   GEIIS PACS                 US  1   Reject Image Flag
(0903,xx11)   GEIIS PACS                 US  1   Significant Flag
(0903,xx12)   GEIIS PACS                 US  1   Confidential Flag
(2001,xx03)   Philips Imaging DD 001     FL  1   Diffusion B-Factor
(2001,xx04)   Philips Imaging DD 001     CS  1   Diffusion Direction
(0019,xx0C)   SIEMENS MR HEADER          IS  1   B Value
(0019,xx0D)   SIEMENS MR HEADER          CS  1   Diffusion Directionality
(0019,xx0E)   SIEMENS MR HEADER          FD  3   Diffusion Gradient Direction
(0019,xx27)   SIEMENS MR HEADER          FD  6   B Matrix
(0043,xx39)   GEMS_PARM_01               IS  4   1st value is B Value

2. One approach to retaining Private Attributes safely, either when the VR is encoded explicitly or known from a data dictionary (such as may be derived from published DICOM Conformance Statements or previously encountered instances, perhaps by adaptively extending the data dictionary as new explicit VR instances are received), is to retain those Attributes that are numeric only. For example, one might retain US, SS, UL, SL, FL and FD binary values, and IS and DS string values that contain only valid numeric characters. One might assume that other string Value Representations are unsafe in the absence of definite confirmation from the vendor to the contrary; code strings (CS) may be an exception. Bulk binary data in OB Value Representations is particularly unsafe, and may often contain entire proprietary format headers in binary or text or XML form that include the patient’s name and other identifying information.

The safe private attributes that are retained shall be described in the Conformance Statement.
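The numeric-only approach to retaining Private Attributes described above might be sketched as follows. This is a sketch under stated assumptions rather than a definitive rule set. Elements are modeled as a dict of (group, element) tuple to (VR, value) pairs with the VR already known. The function names are hypothetical, and the character sets accepted for IS and DS (digits, sign, decimal point and exponent for DS, plus space padding and the backslash value delimiter) are this example's reading of "valid numeric characters". Private Creator elements, which a real implementation would need to preserve for any retained block, are not handled.

```python
import re

# Binary numeric VRs that are retained without inspecting the value.
BINARY_NUMERIC_VRS = {"US", "SS", "UL", "SL", "FL", "FD"}

# Characters allowed in numeric strings; "\" separates multiple values.
IS_CHARS = re.compile(r"[0-9+\-\\ ]*")
DS_CHARS = re.compile(r"[0-9+\-.eE\\ ]*")


def is_private(tag):
    """Private Attributes have an odd group number."""
    group, _element = tag
    return group % 2 == 1


def is_numeric_only(vr, value):
    """True if the value can only carry numeric content for its VR."""
    if vr in BINARY_NUMERIC_VRS:
        return True
    if vr == "IS":
        return isinstance(value, str) and IS_CHARS.fullmatch(value) is not None
    if vr == "DS":
        return isinstance(value, str) and DS_CHARS.fullmatch(value) is not None
    return False  # other VRs, including OB and free-text strings, are dropped


def retain_numeric_private(elements):
    """Keep standard Attributes as-is and only numeric Private Attributes."""
    return {
        tag: (vr, value)
        for tag, (vr, value) in elements.items()
        if not is_private(tag) or is_numeric_only(vr, value)
    }
```

The check is deliberately conservative: code strings (CS), which the text notes may be an exception, are dropped here along with all other string VRs.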