C.12.1.1.2 Specific Character Set

Specific Character Set (0008,0005) identifies the Character Set that expands or replaces the Basic Graphic Set (ISO 646) for values of Data Elements that have Value Representation of SH, LO, ST, PN, LT or UT. See PS 3.5.

If the Attribute Specific Character Set (0008,0005) is not present or has only a single value, Code Extension techniques are not used. Defined terms for the Attribute Specific Character Set (0008,0005), when single valued, are derived from the International Registration Number as per ISO 2375 (e.g., ISO_IR 100 for Latin alphabet No. 1). See Table C.12-2.

Table C.12-2 DEFINED TERMS FOR SINGLE-BYTE CHARACTER SETS WITHOUT CODE EXTENSIONS

| Character Set Description | Defined Term | ISO registration number | Number of characters | Code element | Character Set |
| Default repertoire | none | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 1 | ISO_IR 100 | ISO-IR 100 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 2 | ISO_IR 101 | ISO-IR 101 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 3 | ISO_IR 109 | ISO-IR 109 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 4 | ISO_IR 110 | ISO-IR 110 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Cyrillic | ISO_IR 144 | ISO-IR 144 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Arabic | ISO_IR 127 | ISO-IR 127 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Greek | ISO_IR 126 | ISO-IR 126 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Hebrew | ISO_IR 138 | ISO-IR 138 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 5 | ISO_IR 148 | ISO-IR 148 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Japanese | ISO_IR 13 | ISO-IR 13 | 94 | G1 | JIS X 0201: Katakana |
| | | ISO-IR 14 | 94 | G0 | JIS X 0201: Romaji |
| Thai | ISO_IR 166 | ISO-IR 166 | 88 | G1 | TIS 620-2533 (1990) |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |

Note: To use the single-byte code table of JIS X 0201, value 1 of the attribute Specific Character Set (0008,0005) should be ISO_IR 13. This designates ISO-IR 13 as the G1 code element, which is invoked in the GR area. In addition, ISO-IR 14 is designated as the G0 code element, which is invoked in the GL area.
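As an illustration of the single-valued case, the following sketch maps some Defined Terms from Table C.12-2 to Python standard-library codec names and decodes a byte string accordingly. The table name `SINGLE_VALUE_CODECS` and the function `decode_single_valued` are hypothetical, and the choice of codec for each term is an assumption of this example rather than something the Standard prescribes. ISO_IR 13 is deliberately left out, because matching JIS X 0201 exactly to a Python codec needs separate care.

```python
# Hypothetical lookup table: DICOM Defined Term -> Python codec name.
# The codec choices are assumptions of this sketch, not part of the Standard.
SINGLE_VALUE_CODECS = {
    "": "ascii",                 # attribute absent or empty: default repertoire
    "ISO_IR 100": "latin_1",     # Latin alphabet No. 1
    "ISO_IR 101": "iso8859_2",   # Latin alphabet No. 2
    "ISO_IR 109": "iso8859_3",   # Latin alphabet No. 3
    "ISO_IR 110": "iso8859_4",   # Latin alphabet No. 4
    "ISO_IR 144": "iso8859_5",   # Cyrillic
    "ISO_IR 127": "iso8859_6",   # Arabic
    "ISO_IR 126": "iso8859_7",   # Greek
    "ISO_IR 138": "iso8859_8",   # Hebrew
    "ISO_IR 148": "iso8859_9",   # Latin alphabet No. 5
    "ISO_IR 166": "tis_620",     # Thai
}


def decode_single_valued(raw: bytes, specific_character_set: str = "") -> str:
    """Decode a text value when (0008,0005) is absent or has a single value."""
    term = specific_character_set.strip()
    if term not in SINGLE_VALUE_CODECS:
        raise ValueError(f"unsupported Specific Character Set: {term!r}")
    return raw.decode(SINGLE_VALUE_CODECS[term])


if __name__ == "__main__":
    # 0xFC is "u with diaeresis" in the G1 set of Latin alphabet No. 1.
    print(decode_single_valued(b"M\xfcller^Hans", "ISO_IR 100"))
```

Because no Code Extension techniques are in play here, one codec covers the whole value and no Escape Sequences need to be interpreted.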

If the attribute Specific Character Set (0008,0005) has more than one value, Code Extension techniques are used and Escape Sequences may be encountered in all character sets. Requirements for the use of Code Extension techniques are specified in PS 3.5. In order to indicate the presence of Code Extension, the Defined Terms for the repertoires have the prefix “ISO 2022”, e.g., ISO 2022 IR 100 for the Latin Alphabet No. 1. See Table C.12-3 and Table C.12-4. Table C.12-3 describes single-byte character sets for value 1 to value n of the attribute Specific Character Set (0008,0005), and Table C.12-4 describes multi-byte character sets for value 2 to value n of the attribute Specific Character Set (0008,0005).

Note: A prefix other than “ISO 2022” may be needed in the future if other Code Extension techniques are adopted.

The same character set shall not be used more than once in Specific Character Set (0008,0005).

Note: For example, the values “ISO 2022 IR 100\ISO 2022 IR 100” or “ISO_IR 100\ISO 2022 IR 100” are redundant and not permitted.
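The rule against repeating a character set can be checked mechanically. The sketch below is one possible approach, not a normative algorithm: it reduces each value to its registration number by stripping the "ISO 2022 IR " or "ISO_IR " prefix and then looks for repeats. The function names `registration_number` and `repeated_character_sets` are hypothetical.

```python
def registration_number(defined_term: str) -> str:
    """Strip the known prefixes so that 'ISO_IR 100' and 'ISO 2022 IR 100' compare equal."""
    term = defined_term.strip()
    for prefix in ("ISO 2022 IR ", "ISO_IR "):
        if term.startswith(prefix):
            return term[len(prefix):]
    return term  # terms without either prefix are compared as written


def repeated_character_sets(value: str) -> list[str]:
    """Return the registration numbers that occur more than once in a
    backslash-separated Specific Character Set value."""
    seen = set()
    repeated = []
    for term in value.split("\\"):
        number = registration_number(term)
        if number in seen and number not in repeated:
            repeated.append(number)
        seen.add(number)
    return repeated


if __name__ == "__main__":
    for candidate in (
        "ISO 2022 IR 100\\ISO 2022 IR 100",
        "ISO_IR 100\\ISO 2022 IR 100",
        "ISO 2022 IR 13\\ISO 2022 IR 87",
    ):
        print(candidate, "->", repeated_character_sets(candidate))
```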

Table C.12-3 DEFINED TERMS FOR SINGLE-BYTE CHARACTER SETS WITH CODE EXTENSIONS

| Character Set Description | Defined Term | Standard for Code Extension | ESC sequence | ISO registration number | Number of characters | Code element | Character Set |
| Default repertoire | ISO 2022 IR 6 | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 1 | ISO 2022 IR 100 | ISO 2022 | ESC 02/13 04/01 | ISO-IR 100 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 2 | ISO 2022 IR 101 | ISO 2022 | ESC 02/13 04/02 | ISO-IR 101 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 3 | ISO 2022 IR 109 | ISO 2022 | ESC 02/13 04/03 | ISO-IR 109 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 4 | ISO 2022 IR 110 | ISO 2022 | ESC 02/13 04/04 | ISO-IR 110 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Cyrillic | ISO 2022 IR 144 | ISO 2022 | ESC 02/13 04/12 | ISO-IR 144 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Arabic | ISO 2022 IR 127 | ISO 2022 | ESC 02/13 04/07 | ISO-IR 127 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Greek | ISO 2022 IR 126 | ISO 2022 | ESC 02/13 04/06 | ISO-IR 126 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Hebrew | ISO 2022 IR 138 | ISO 2022 | ESC 02/13 04/08 | ISO-IR 138 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 5 | ISO 2022 IR 148 | ISO 2022 | ESC 02/13 04/13 | ISO-IR 148 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Japanese | ISO 2022 IR 13 | ISO 2022 | ESC 02/09 04/09 | ISO-IR 13 | 94 | G1 | JIS X 0201: Katakana |
| | | ISO 2022 | ESC 02/08 04/10 | ISO-IR 14 | 94 | G0 | JIS X 0201: Romaji |
| Thai | ISO 2022 IR 166 | ISO 2022 | ESC 02/13 05/04 | ISO-IR 166 | 88 | G1 | TIS 620-2533 (1990) |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |

Note: If the attribute Specific Character Set (0008,0005) has more than one value and value 1 is empty, it is assumed that value 1 is ISO 2022 IR 6.
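The ESC sequence column uses the column/row notation of ISO 2022, in which each pair cc/rr denotes the byte with value cc × 16 + rr and ESC itself is 01/11 (hexadecimal 1B). A minimal sketch of the conversion to raw bytes follows; the function name `escape_sequence_bytes` is hypothetical.

```python
def escape_sequence_bytes(notation: str) -> bytes:
    """Convert e.g. 'ESC 02/13 04/01' into the bytes of the escape sequence."""
    tokens = notation.split()
    if not tokens or tokens[0] != "ESC":
        raise ValueError(f"expected a sequence starting with ESC: {notation!r}")
    out = bytearray([0x1B])  # ESC is column 01, row 11
    for token in tokens[1:]:
        column, row = token.split("/")
        out.append(int(column) * 16 + int(row))
    return bytes(out)


if __name__ == "__main__":
    # Default repertoire (G0) and Latin alphabet No. 1 (G1).
    print(escape_sequence_bytes("ESC 02/08 04/02"))  # ESC ( B
    print(escape_sequence_bytes("ESC 02/13 04/01"))  # ESC - A
```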

Table C.12-4 DEFINED TERMS FOR MULTI-BYTE CHARACTER SETS WITH CODE EXTENSIONS

| Character Set Description | Defined Term | Standard for Code Extension | ESC sequence | ISO registration number | Number of characters | Code element | Character Set |
| Japanese | ISO 2022 IR 87 | ISO 2022 | ESC 02/04 04/02 | ISO-IR 87 | 94^2 | G0 | JIS X 0208: Kanji |
| | ISO 2022 IR 159 | ISO 2022 | ESC 02/04 02/08 04/04 | ISO-IR 159 | 94^2 | G0 | JIS X 0212: Supplementary Kanji set |
| Korean | ISO 2022 IR 149 | ISO 2022 | ESC 02/04 02/09 04/03 | ISO-IR 149 | 94^2 | G1 | KS X 1001: Hangul and Hanja |
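When Code Extension techniques are in use, a reader has to track which set is currently designated to G0 and to G1 as Escape Sequences appear in the value. The sketch below only splits a byte string at the escape sequences listed in Tables C.12-3 and C.12-4 and records the G0/G1 designation in force for each segment; it does not decode the segments, and it ignores the further requirements of PS 3.5. The names `ESCAPE_SEQUENCES` and `split_on_escapes` are hypothetical, only a subset of the sequences is included, and the sample bytes are illustrative.

```python
# Subset of the escape sequences: raw bytes -> (code element, ISO registration number).
ESCAPE_SEQUENCES = {
    b"\x1b(B": ("G0", "ISO-IR 6"),
    b"\x1b(J": ("G0", "ISO-IR 14"),
    b"\x1b)I": ("G1", "ISO-IR 13"),
    b"\x1b-A": ("G1", "ISO-IR 100"),
    b"\x1b$B": ("G0", "ISO-IR 87"),
    b"\x1b$(D": ("G0", "ISO-IR 159"),
    b"\x1b$)C": ("G1", "ISO-IR 149"),
}


def split_on_escapes(raw: bytes, g0: str = "ISO-IR 6", g1=None):
    """Return a list of (g0, g1, segment) tuples, where g0 and g1 are the
    designations in force for that segment. The initial g0/g1 stand for
    whatever value 1 of (0008,0005) designates."""
    segments = []
    start = 0
    i = 0
    while i < len(raw):
        if raw[i] != 0x1B:
            i += 1
            continue
        match = next((seq for seq in ESCAPE_SEQUENCES if raw.startswith(seq, i)), None)
        if match is None:
            raise ValueError(f"unknown escape sequence at offset {i}")
        if i > start:
            segments.append((g0, g1, raw[start:i]))
        element, registration = ESCAPE_SEQUENCES[match]
        if element == "G0":
            g0 = registration
        else:
            g1 = registration
        i += len(match)
        start = i
    if start < len(raw):
        segments.append((g0, g1, raw[start:]))
    return segments


if __name__ == "__main__":
    sample = b"Yamada^Tarou=\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"
    for segment in split_on_escapes(sample):
        print(segment)
```

Keeping G0 and G1 as separate state matters because a G1 designation (for example ISO-IR 149) leaves the G0 set untouched, so both can be active within one segment.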

There are multi-byte character sets that prohibit the use of Code Extension Techniques. The Unicode character set used in ISO 10646, when encoded in UTF-8, and the GB18030 character set, encoded per the rules of GB18030, both prohibit the use of Code Extension Techniques. These character sets may only be specified as value 1 in the Specific Character Set (0008,0005) attribute and there shall only be one value. The minimal length UTF-8 encoding shall always be used for ISO 10646.

Notes: 1. The ISO standards for 10646 now prohibit the use of anything but the minimum length encoding for UTF-8. UTF-8 permits multiple different encodings, but when used to encode Unicode characters in accordance with ISO 10646-1 and 10646-2 (with extensions) only the minimal encodings are legal.

2. The representation for the characters in the DICOM Default Character Repertoire is the same single byte value for the Default Character Repertoire, ISO 10646 in UTF-8, and GB18030. It is also the 7-bit US-ASCII encoding.
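Both notes can be observed with Python's built-in codecs, which this sketch assumes as a stand-in for a conforming decoder. Its strict UTF-8 decoder rejects non-minimal (overlong) encodings, and the "ascii", "utf-8" and "gb18030" codecs produce identical bytes for text in the default repertoire. The helper names `is_strict_utf8` and `same_bytes_in_all` are hypothetical.

```python
def is_strict_utf8(raw: bytes) -> bool:
    """True if raw decodes under Python's strict UTF-8 decoder, which
    rejects overlong forms along with other malformed input."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def same_bytes_in_all(text: str) -> bool:
    """True if text encodes to the same bytes in ASCII, UTF-8 and GB18030."""
    encoded = {text.encode(codec) for codec in ("ascii", "utf-8", "gb18030")}
    return len(encoded) == 1


if __name__ == "__main__":
    print(is_strict_utf8(b"/"))          # minimal one-byte form of U+002F
    print(is_strict_utf8(b"\xc0\xaf"))   # overlong two-byte form of U+002F
    print(same_bytes_in_all("SMITH^JOHN"))
```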

Table C.12-5 DEFINED TERMS FOR MULTI-BYTE CHARACTER SETS WITHOUT CODE EXTENSIONS

| Character Set Description | Defined Term |
| Unicode in UTF-8 | ISO_IR 192 |
| GB18030 | GB18030 |
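The restriction that these two Defined Terms may appear only as the sole value of Specific Character Set (0008,0005) can be expressed as a small check. This is a sketch under the assumption that the attribute value is available as a backslash-separated string; the names `NO_CODE_EXTENSION_CODECS` and `check_no_code_extension_rule` are hypothetical, and the codec names are Python's, not the Standard's.

```python
# Defined Terms from Table C.12-5 mapped to Python codec names (an assumption of this sketch).
NO_CODE_EXTENSION_CODECS = {
    "ISO_IR 192": "utf-8",
    "GB18030": "gb18030",
}


def check_no_code_extension_rule(value: str) -> None:
    """Raise ValueError if ISO_IR 192 or GB18030 is combined with other values."""
    terms = [term.strip() for term in value.split("\\")]
    uses_restricted = any(term in NO_CODE_EXTENSION_CODECS for term in terms)
    if uses_restricted and len(terms) != 1:
        raise ValueError(
            f"{value!r}: ISO_IR 192 and GB18030 are only permitted as a single value"
        )


if __name__ == "__main__":
    check_no_code_extension_rule("ISO_IR 192")                      # accepted
    check_no_code_extension_rule("ISO 2022 IR 13\\ISO 2022 IR 87")  # accepted
    try:
        check_no_code_extension_rule("ISO_IR 192\\ISO 2022 IR 87")
    except ValueError as error:
        print(error)
```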