Where in the World is xn--mgberp4a5d4ar?
The deployment of Internationalized Domain Names (IDNs) reached the root of the DNS infrastructure in recent weeks with the creation of four internationalized Country Code Top Level Domains (ccTLDs). Mentioned in our Cyber Risk Report, these newly deployed IDNs represent Egypt, Saudi Arabia, the United Arab Emirates, and the Russian Federation.
IDNs leverage Unicode to display various non-Latin scripts, such as Arabic or Chinese, within computer applications. An encoding syntax called Punycode bidirectionally transforms the Unicode that is needed to represent these scripts into the subset of the Latin script that is used for domain names. This essentially reduces the scripts of the world into a form suitable for processing by applications that have no understanding of Unicode. This, for example, transforms the newly minted TLD for Saudi Arabia, السعودية, into xn--mgberp4a5d4ar so that it can be processed similarly to any ASCII-based domain name.
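As a rough illustration, the transformation can be reproduced with Python's built-in punycode codec. This is a minimal sketch rather than a full IDNA implementation: the helper names to_unicode_label and to_ascii_label are hypothetical, and the raw codec skips the normalization and validity checks that real IDN processing applies.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def to_unicode_label(ascii_label: str) -> str:
    """Decode one xn-- label to Unicode; return other labels unchanged."""
    if ascii_label.lower().startswith(ACE_PREFIX):
        return ascii_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return ascii_label


def to_ascii_label(unicode_label: str) -> str:
    """Encode one label to its xn-- form; return ASCII labels unchanged."""
    if unicode_label.isascii():
        return unicode_label
    return ACE_PREFIX + unicode_label.encode("punycode").decode("ascii")


# The Saudi Arabia TLD from the text, starting from its ASCII form.
decoded = to_unicode_label("xn--mgberp4a5d4ar")
# Print code points instead of glyphs so the output is terminal-safe.
print([f"U+{ord(ch):04X}" for ch in decoded])
print(to_ascii_label(decoded))
```

Decoding and then re-encoding returns the original ASCII string, which is the bidirectional property described above.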
Punycode has several advantageous characteristics. For example, it encodes the discrete components (labels) of a DNS name individually, making it possible to encode only part of a DNS name. Encoded labels are prefixed with xn--. One such partially encoded DNS name is xn--vckfdb7e3c7hma3m9657c16c.jp which, with one encoded and one unencoded label, represents the Japan Registry Services. This partial encoding has allowed the use of local languages in parts of the world for several years without support for IDNs at the DNS root.
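The label-by-label behavior can be sketched as follows. The functions decode_name and encode_name are hypothetical helpers written for this illustration; they split a name on dots and touch only the labels that need it, again using Python's raw punycode codec without the normalization a production resolver or browser would perform.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def decode_name(name: str) -> str:
    """Decode only the xn-- labels of a DNS name, leaving the rest as-is."""
    labels = []
    for label in name.split("."):
        if label.lower().startswith(ACE_PREFIX):
            label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


def encode_name(name: str) -> str:
    """Encode only the non-ASCII labels of a DNS name."""
    labels = []
    for label in name.split("."):
        if not label.isascii():
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
        labels.append(label)
    return ".".join(labels)


name = "xn--vckfdb7e3c7hma3m9657c16c.jp"
decoded = decode_name(name)
# ascii() escapes the non-Latin characters so the output is terminal-safe.
print(ascii(decoded))
print(encode_name(decoded))
```

Only the first label changes in either direction; the jp label passes through untouched.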
Allowing users to connect with one another or online resources without the constraint or burden of Latin characters is certainly a good thing. However, there are security risks to be understood.
What you see may not be what you get. Unicode can represent most of the world’s scripts, which makes it possible to present characters from different scripts that appear identical to one another. When a computer compares these characters, however, they are as different as ‘A’ and ‘z’. This particular risk is not new: it already existed within the Latin script where, for example, the digit ‘1’ and the lowercase letter ‘l’ can appear identical in some fonts. Discussions on this topic began in 2002 and ultimately presented two visually similar, albeit fake, domains imitating paypal.com and Microsoft.com.
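To make the look-alike problem concrete, the sketch below compares an all-Latin label with one in which a single letter has been swapped for a Cyrillic look-alike, then flags labels that mix scripts. It is a heuristic only: the functions letter_scripts and is_mixed_script are hypothetical, and taking the first word of each character's Unicode name is a crude stand-in for real script data. It also cannot catch a spoof written entirely in one non-Latin script.

```python
import unicodedata


def letter_scripts(label: str) -> set:
    """Approximate the scripts in a label from each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def is_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(letter_scripts(label)) > 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(genuine == spoofed)  # the strings differ despite looking alike
for label in (genuine, spoofed):
    # ascii() makes the substituted character visible in the output.
    print(ascii(label), sorted(letter_scripts(label)), is_mixed_script(label))
```

A check like this is one possible signal for a filter or a browser's display policy, not a complete defense.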
Foreign languages produce foreign URLs. We have been educating those around us to avoid following links that look like hxxp://tvxwoajfwad.info and to use preview tools when faced with a shortened URL. However, is a URL composed in a foreign language somehow more trustworthy? How can I as an American monoglot discern a legitimate URL in Chinese from the Chinese equivalent of hxxp://tvxwoajfwad.info (one of the pseudo-random domains used by Conficker)? I cannot, and so I must place all IDNs into the category of URLs that I do not trust.
New TLDs create registration activity and opportunity. When new TLDs are deployed there is often a rush to register desirable domain names within them. As the deployment of new TLDs continues, this trend is expected to continue as well. It is common for organizations to register domains in multiple TLDs, and this process will become more complex as a result of the disparate scripts involved with the introduction of IDNs. Furthermore, the range of available scripts will present opportunities as phishers look to capitalize on unclaimed international brands.
Any time there are advancements in technology we should take a minute to understand the associated security risks. Internationalized Domain Names are no different. The mindset that we apply when faced with untrusted URLs and odd URLs from trusted sources should also be applied to IDNs.