IDNs UA Micro-Learning Module Objectives:
◉ The "Introducing Internationalized Domain Names (IDNs)" micro-learning
module provides an introduction to IDNs and their significance in the global
Internet ecosystem.
◉ At the end of this module, students should be able to:
− Understand the concept and significance of Internationalized Domain
Names (IDNs) in the domain name system (DNS);
− Identify the different types of top-level domains (TLDs), including
generic TLDs (gTLDs) and country-code TLDs (ccTLDs), and their
role in organizing domain names;
− Explain the process of converting Unicode-based domain names into
an ASCII-compatible format using the Punycode Algorithm (RFC
3492);
− Configure DNS servers to support IDN-encoded domain names,
ensuring accurate resolution and functionality;
− Differentiate between IDNA2003 and IDNA2008 standards and their
impact on handling IDNs; and
− Use network troubleshooting commands such as dig, traceroute,
nslookup, mtr, and ping to effectively diagnose and resolve
IDN-related issues specifically in the context of U-labels.
Introducing the root zone: TLD, gTLD, ccTLD:
◉ Top-level domains (TLDs):
− The level directly below the root in the DNS tree structure.
● Country Code Top-Level Domains (ccTLDs): ccTLDs are
TLDs that are specifically designated for individual countries
or geographic regions.
● Generic Top-Level Domains (gTLDs): gTLDs are TLDs that
are not inherently tied to any specific country or geographic
region.
◉ What is the Root Zone?
− The root zone refers to the highest level in the hierarchical
structure of the Domain Name System (DNS).
− It is the starting point for resolving domain names on the internet.
− In summary, the root zone is the highest level in the DNS
hierarchy and contains the authoritative information about the
TLDs.
What is IDN?
◉ IDN stands for Internationalized Domain Name:
− It enables internet users to register and access domain names in
their native languages.
− Makes the internet more inclusive and accessible to individuals
and communities worldwide.
− Traditionally, domain names were limited to a set of characters
from the ASCII character set.
− With IDNs, domain names can now include characters from a wide
range of scripts, such as Cyrillic, Arabic, Chinese, Japanese,
Korean, and many others.
Why do we need IDNs?
◉ IDNs have several important benefits:
− Linguistic Diversity.
− Cultural and Linguistic Preservation.
− Localized Internet Presence.
− Localized Email Addresses.
− User-Friendly Experience.
Unicode-Based Domain Names:
◉ Unicode-based domain names refer to domain names created from
Unicode characters, allowing for representation in various scripts and
languages.
− A-labels (ASCII labels):
● ASCII-compatible encoding of Unicode-based domain names.
● Example: the A-label representation of the U-label
"экзампл.ком" (in Cyrillic script) would be
"xn--80aniges7g.xn--j1aef" (using ASCII characters).
● A-labels are used for internal processing and storage within
the DNS infrastructure.
− U-labels (Unicode labels):
● U-labels are the human-readable representation of
Unicode-based domain names.
● They consist of Unicode characters encoded in UTF-8 or
UTF-16 format.
● U-labels allow domain names to be registered and displayed
in languages such as Chinese, Arabic, Cyrillic, and others.
● Example: 'ουτοπία.ευ' is a domain name whose labels are
U-labels written in Greek characters.
Punycode Algorithm (RFC 3492):
◉ It is a vital component in the implementation of Internationalized
Domain Names (IDNs).
◉ It provides a standardized method for converting Unicode-based
domain names into an ASCII-compatible form.
◉ The Punycode algorithm enables the seamless integration of IDNs
into the existing DNS.
◉ It allows non-ASCII characters to be represented in a compatible
format within the DNS infrastructure.
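As a small illustration of the conversion, the sketch below uses Python's built-in `punycode` codec on single labels. The helper names `to_a_label` and `to_u_label` are hypothetical, and the sketch deliberately skips the normalization and validity checks that a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label (A-label)


def to_a_label(label: str) -> str:
    """Encode one label with Punycode; pure-ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label: str) -> str:
    """Decode an A-label back to its Unicode form; other labels are returned unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(to_a_label("экзампл"))    # xn--80aniges7g
print(to_a_label("ком"))        # xn--j1aef
print(to_u_label("xn--j1aef"))  # ком
```

Each label of a domain name is converted on its own, so a full name is handled by splitting on the dots and joining the converted labels again.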
Normalization of Domain Name Strings or Labels:
◉ Normalization refers to the process of transforming and standardizing domain
names to a consistent format.
◉ Some key aspects related to the normalization of domain name strings or labels:
− Unicode Normalization: Unicode normalization, specifically Unicode
Normalization Form C (NFC) or Normalization Form D (NFD), is employed
to convert domain names to a standardized Unicode representation.
− Case Normalization: Domain names are case-insensitive, meaning that the
case of letters in a domain name does not affect its resolution.
− IDN Normalization: To ensure interoperability, IDNs undergo
normalization.
● Unicode Normalization Form C (NFC).
● Normalization Form D (NFD).
− Punycode Encoding:
● IDNs encoded in Punycode are first decoded to obtain the original
Unicode representation.
● Normalization is then applied to the Unicode form of the domain name.
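To make the effect of Unicode normalization concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper `normalize_label` is a hypothetical name, and lowercasing with `str.lower()` is a simplification of the case handling that IDNA implementations apply.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def normalize_label(label: str) -> str:
    """Lowercase a label and convert it to Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", label.lower())


# The two spellings look identical on screen but differ as code point sequences.
print(composed == decomposed)                                    # False
# After normalization they compare equal.
print(normalize_label(composed) == normalize_label(decomposed))  # True
# NFD goes the other way and splits the precomposed character in two.
print(len(unicodedata.normalize("NFD", composed)))               # 2
```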
Normalization Requirements of Domain Name Strings or Labels:
◉ Some of the factors that make domain name normalization unique:
− Case Insensitivity.
− ASCII Compatibility.
− Label Separators.
− Unicode Normalization:
● NFC.
− DNS Considerations:
● Label Length: a label can be at most 63 octets long,
measured on its ASCII (A-label) form.
● Domain Name Length: an FQDN can be at most 253
characters long in its dotted text form.
● Path Length: the complete domain name in DNS wire format,
including the length octets, can be at most 255 octets long.
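The length limits above can be checked mechanically once a name is in its ASCII form. The following is a rough sketch; the function name `check_lengths` and the wording of the messages are illustrative only.

```python
MAX_LABEL_OCTETS = 63  # per-label limit
MAX_NAME_CHARS = 253   # limit for the dotted text form of an FQDN


def check_lengths(ascii_name: str) -> list:
    """Return a list of length problems found in an ASCII (A-label) domain name."""
    problems = []
    name = ascii_name.rstrip(".")  # ignore the optional trailing root dot
    if len(name) > MAX_NAME_CHARS:
        problems.append(f"name is {len(name)} characters (limit {MAX_NAME_CHARS})")
    for label in name.split("."):
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_OCTETS:
            problems.append(f"label '{label[:10]}...' is {len(label)} octets "
                            f"(limit {MAX_LABEL_OCTETS})")
    return problems


print(check_lengths("xn--80aniges7g.xn--j1aef"))  # []
print(check_lengths("a" * 64 + ".com"))           # one problem: label too long
```

The check runs on the A-label form because the "xn--" prefix and the Punycode expansion both count toward the 63-octet limit.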
Example: Punycode Algorithm Steps.
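◉ Punycode algorithm using an example domain name in Arabic script:
العربية.com (the word "Arabic" in Arabic script):
− Input: the Unicode input string is العربية.com.
− Prepare the Input: apply NFKC.
− Encoding: each non-ASCII label is encoded separately.
− ASCII Conversion: ASCII-only labels such as "com" are left unchanged.
− Basic Encoding: any ASCII code points in the label are copied to the
output first (this label has none).
− Handling Non-Basic Code Points: each non-ASCII code point is encoded
as a delta using variable-length integers.
− Handling Bias: the bias is adapted after each encoded delta.
− Output: the Punycode representation for العربية.com would be
"xn--mgbcd4a2b0d2b.com".
− Conversion Complete: the resulting Punycode representation,
"xn--mgbcd4a2b0d2b.com", is now ACE, and it can be used within
the existing DNS infrastructure.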
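The steps on this slide can be sketched for a whole domain name as follows. This is a simplified illustration, not a complete IDNA implementation: the function name `domain_to_ace` is hypothetical, preparation is reduced to NFKC plus lowercasing, and the delta and bias arithmetic is delegated to Python's built-in `punycode` codec.

```python
import unicodedata


def domain_to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible encoding (ACE)."""
    # Prepare the input: compatibility normalization and lowercasing.
    prepared = unicodedata.normalize("NFKC", domain).lower()
    ace_labels = []
    for label in prepared.split("."):
        if label.isascii():
            # ASCII-only labels need no conversion.
            ace_labels.append(label)
        else:
            # Non-ASCII labels are Punycode-encoded and given the ACE prefix.
            ace_labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(ace_labels)


# The word "Arabic" in Arabic script, written as escapes so the code points
# are unambiguous: alef, lam, ain, reh, beh, yeh, teh marbuta.
arabic = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"
print(domain_to_ace(arabic + ".com"))
```

Writing the label with explicit escapes avoids a common pitfall: text copied from PDFs often contains Arabic presentation forms, which NFKC maps back to the ordinary letters before encoding.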
Configuring DNS for IDN-Encoded Domain Names:
◉ Configure DNS for IDN-encoded domain names:
− selecting the appropriate encoding,
− mapping- establishing a relationship between the IDN-encoded
domain name and its corresponding ASCII-compatible
representation (Punycode), and
− configuring DNS records and name servers accordingly.
Configuring DNS for Non-ASCII Domain Names: Key Steps
◉ Verify Registrar Support: Ensure that your domain name registrar
supports Internationalized Domain Names (IDNs).
◉ Choose IDN Encoding: Select the appropriate encoding mechanism
for representing non-ASCII characters in your domain name.
− The standard encoding scheme is Punycode, which converts
non-ASCII characters into an ASCII-compatible format.
◉ Encode the Domain Name: Apply the chosen encoding mechanism
(e.g., Punycode) to convert your domain name with non-ASCII
characters into an ASCII-compatible representation.
◉ Configure DNS Records: Set up DNS records for your IDN-encoded
domain name.
◉ Name Server Configuration: Configure the name servers for your
domain to handle IDN-encoded domain names.
◉ Test and Validate: Perform thorough testing and validation of your
DNS configuration to ensure that the IDN-encoded domain name
resolves correctly.
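For the "Encode the Domain Name" and "Configure DNS Records" steps, the sketch below produces a zone-file style address record from a Unicode name. Everything specific here is an assumption for illustration: the helper name `a_record`, the TTL, and the address 192.0.2.1 (a documentation-only address). Python's built-in `idna` codec implements the IDNA2003 rules, so names that rely on IDNA2008-only behavior would need a different library.

```python
def a_record(unicode_domain: str, ipv4: str, ttl: int = 3600) -> str:
    """Build a zone-file style A record whose owner name is the A-label form."""
    # DNS servers store and serve the ASCII-compatible form, never the U-label.
    owner = unicode_domain.encode("idna").decode("ascii") + "."
    return f"{owner} {ttl} IN A {ipv4}"


print(a_record("экзампл.ком", "192.0.2.1"))
# xn--80aniges7g.xn--j1aef. 3600 IN A 192.0.2.1
```

The name server itself needs no special IDN mode in this sketch: it only ever sees the ASCII owner name.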
Internationalized Domain Names in Applications (IDNA) 2003 (1/2):
◉ Key features of IDNA2003 include:
○ Punycode Encoding: IDNA2003 utilizes Punycode encoding to
represent non-ASCII characters in an ASCII-compatible format.
○ Unicode Normalization: IDNA2003 requires Unicode normalization to
ensure consistency and avoid variations in equivalent character
sequences.
○ Mapping Characters: IDNA2003 has specific rules for mapping
characters that are not allowed in domain names.
○ Label Length Limit: IDNA2003 imposes a limit of 63 characters for
each label within an IDN. This limit applies to the ASCII-compatible
(Punycode) form of the label, including the "xn--" prefix.
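Python's standard library ships an `idna` codec that follows the IDNA2003 rules, which makes it a convenient way to observe the features listed above. The wrapper name `idna2003_to_ascii` is hypothetical; the codec name and its behavior are part of the standard library.

```python
def idna2003_to_ascii(domain: str) -> str:
    """Apply IDNA2003 processing (Nameprep mapping plus Punycode) to a domain name."""
    return domain.encode("idna").decode("ascii")


# Mapping: the uppercase "B" is folded to lowercase before encoding.
print(idna2003_to_ascii("Bücher.example"))  # xn--bcher-kva.example

# Label length limit: a 64-character label is rejected.
try:
    idna2003_to_ascii("a" * 64 + ".example")
except UnicodeError as exc:
    print("rejected:", exc)
```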
Internationalized Domain Names in Applications (IDNA) 2003 (2/2):
◉ It has critical limitations:
− Limited Character Set or Restricted Character Repertoire:
IDNA2003 only supports the characters defined in Unicode 3.2.
− Language-specific Rules: IDNA2003 uses language-specific rules for
handling certain characters, which leads to inconsistencies and
conflicts.
− Lack of Script Awareness: IDNA2003 does not have built-in script
awareness.
− Limited Normalization: IDNA2003 uses a normalization process
called Nameprep, which is based on Unicode 3.2.
− Security Vulnerabilities: IDNA2003 introduced security concerns
related to homograph attacks.
− Lack of error handling: IDNA2003 does not provide explicit error
handling mechanisms.
Internationalized Domain Names in Applications (IDNA) 2008:
◉ Key features and Improvements of IDNA2008 include:
− Extended Character Set: character set based on Unicode 5.2.
− Backward Compatibility: While IDNA2008 is not fully backward
compatible with IDNA2003, efforts were made to minimize disruption
during the transition.
− Script-aware Processing: IDNA2008 introduces script awareness.
− Contextual Rules: IDNA2008 introduced contextual rules for certain
characters.
− Enhanced Normalization: IDNA2008 requires U-labels to be in
Normalization Form C (NFC) instead of relying on the NFKC mapping
used by Nameprep.
− Bidi (bidirectional) Support: IDNA2008 addresses bidirectional text
handling.
− Security enhancements: IDNA2008 introduces several security
measures to mitigate homograph attacks.
− Error handling: IDNA2008 provides explicit error handling
mechanisms.
Ensuring Proper Display: How IDNA2008 Manages Bidirectional Text Ordering
◉ To ensure the accurate ordering and representation of mixed-script
domain names, IDNA2008 implements the following steps:
− Directional Formatting Characters (DFCs): IDNA2008 uses
Directional Formatting Characters (DFCs) to control the directionality
of text within a domain name.
− RAL and LAL Labels: In IDNA2008, a domain name is divided into
labels separated by dots. Each label can be either a Right-to-Left (RTL)
label (RAL) or a Left-to-Right (LTR) label (LAL).
− RTL Embedding and LTR Embedding: IDNA2008 introduces the
RTL Embedding (RLE) and LTR Embedding (LRE) DFCs.
− Punctuation Handling: IDNA2008 defines rules for handling
punctuation marks and symbols within bidi text.
− Contextual Rules: IDNA2008 incorporates contextual rules to
determine the correct ordering of characters within a label.
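Whether a label counts as right-to-left can be decided from the Unicode Bidi class of its characters. The sketch below assumes the definition used in RFC 5893, where a label is an RTL label if it contains at least one character of Bidi class R, AL, or AN. The function name `is_rtl_label` is hypothetical, and the sketch only classifies a label; it does not validate it against the full Bidi rule.

```python
import unicodedata

RTL_BIDI_CLASSES = {"R", "AL", "AN"}  # right-to-left, Arabic letter, Arabic number


def is_rtl_label(u_label: str) -> bool:
    """Return True if the U-label contains at least one right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in u_label)


arabic = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"  # the word "Arabic" in Arabic script
print(is_rtl_label(arabic))     # True
print(is_rtl_label("example"))  # False
```

The check works on U-labels, so an A-label has to be decoded to its Unicode form before it is classified.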
Potential Compatibility Issues between IDNA2008 and IDNA2003:
◉ IDNA2003 vs. IDNA2008: the two standards handle and interpret certain
characters differently. Some examples:
− Character Set Differences: IDNA2008 includes an expanded character
set compared to IDNA2003.
− Unicode Normalization: IDNA2008 requires input that is already in NFC,
whereas IDNA2003 applies NFKC through Nameprep.
− Contextual Rules: IDNA2008 introduces refined contextual rules for
character handling, especially in the context of bidi text.
− Error Handling: IDNA2008 provides more explicit error handling
mechanisms compared to IDNA2003.
− Mapping and Compatibility: IDNA2008 introduced improved
mapping and compatibility mechanisms for visually similar characters.
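A well-known case where the two standards disagree is the German sharp s: IDNA2003 maps "ß" to "ss" before encoding, while IDNA2008 treats it as a valid character in its own right. The sketch below shows both results using only the standard library. The built-in `idna` codec gives the IDNA2003 answer; for the IDNA2008 side, plain Punycode encoding of the unmapped label is used as a stand-in, which is an assumption that holds for this label but is not a general IDNA2008 implementation.

```python
label = "straße"

# IDNA2003: Nameprep maps "ß" to "ss", so the result is a plain ASCII label.
idna2003_form = label.encode("idna").decode("ascii")

# IDNA2008 keeps "ß", so the label is Punycode-encoded as written.
idna2008_form = "xn--" + label.encode("punycode").decode("ascii")

print(idna2003_form)  # strasse
print(idna2008_form)  # xn--strae-oqa
```

The same user-visible name therefore resolves to two different DNS names depending on which standard the client software follows.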
IDN Support in FTP, HTTP, and HTTPS: Addressing the Limitations.
◉ Limitations of these protocols when handling IDNs:
− FTP (File Transfer Protocol):
● Limitations on character encoding.
− HTTP (Hypertext Transfer Protocol):
● Limitations on character encoding.
− HTTPS (HTTP Secure):
● Inherits the same limitations as HTTP.
◉ Note:
− While there are limitations in directly supporting IDNs in protocols like
FTP, HTTP, and HTTPS, the use of Punycode encoding allows for the
representation and usage of IDNs in these protocols.
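In practice this means a client converts the host part of a URL to its A-label form before the request is sent. A minimal sketch with Python's `urllib.parse` follows; the function name `url_with_ascii_host` is hypothetical, the built-in `idna` codec follows IDNA2003, and the sketch assumes the URL has a host and carries no user name or password.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_ascii_host(url: str) -> str:
    """Return the URL with its host name converted to the A-label form."""
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii")
    # Keep an explicit port if the original URL had one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(url_with_ascii_host("https://экзампл.ком/path?q=1"))
# https://xn--80aniges7g.xn--j1aef/path?q=1
```

Only the host is converted here; the path and query are left untouched because they follow separate encoding rules.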
Network Troubleshooting Commands for IDNs: dig, traceroute
◉ dig (Domain Information Groper):
− Usage Example: Suppose you want to query information for the U-label
"普遍适用测试.我爱你".
− Convert it to Punycode (A-label) format:
"xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".
dig xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
◉ traceroute:
− Usage Example: you can use traceroute with either U-labels or A-labels
(Punycode):
traceroute 普遍适用测试.我爱你
traceroute xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
Network Troubleshooting Commands for IDNs: nslookup, ping, and mtr:
◉ nslookup (Name Server Lookup):
− Usage Example: for instance, to query information for the U-label
"普遍适用测试.我爱你".
− Convert it to Punycode (A-label) format:
"xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".
nslookup xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
◉ mtr (My Traceroute):
− Usage Example: for instance, to diagnose network issues with the
U-label "普遍适用测试.我爱你".
− Convert it to its Punycode representation:
"xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl".
mtr xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
◉ ping:
ping 普遍适用测试.我爱你
ping xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
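Because not every tool accepts U-labels directly, it can be convenient to convert the name once and reuse the A-label form for all of the commands above. The sketch below only builds the command lines and does not run them. The function name `troubleshooting_commands` is hypothetical, and the conversion uses Python's built-in `idna` codec, which follows IDNA2003.

```python
TOOLS = ["dig", "traceroute", "nslookup", "mtr", "ping"]


def troubleshooting_commands(unicode_domain: str) -> list:
    """Return one argument list per tool, each using the A-label form of the name."""
    ascii_domain = unicode_domain.encode("idna").decode("ascii")
    return [[tool, ascii_domain] for tool in TOOLS]


for argv in troubleshooting_commands("普遍适用测试.我爱你"):
    print(" ".join(argv))
# First line printed: dig xn--tkvs6ms8gqpywye3ma.xn--6qq986b3xl
```

Each argument list can be passed as-is to `subprocess.run` on a system where the tool is installed.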