Cisco Blogs

Where in the World is xn--mgberp4a5d4ar?

May 20, 2010 - 3 Comments

The deployment of Internationalized Domain Names (IDNs) reached the root of the DNS infrastructure in recent weeks with the creation of four internationalized Country Code Top Level Domains (ccTLDs). Mentioned in our Cyber Risk Report, these newly deployed IDNs represent Egypt, Saudi Arabia, the United Arab Emirates, and the Russian Federation.

IDNs leverage Unicode to display various non-Latin scripts, such as Arabic or Chinese, within computer applications. An encoding syntax called Punycode bidirectionally transforms the Unicode that is needed to represent these scripts into the subset of the Latin script that is used for domain names. This essentially reduces the scripts of the world into a form suitable for processing by applications that have no understanding of Unicode. This, for example, transforms the newly minted TLD for Saudi Arabia, ‏السعودية, into xn--mgberp4a5d4ar so that it can be processed similarly to any ASCII-based domain name.
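The round trip described above can be tried with Python's built-in "idna" codec. This is a minimal sketch rather than a reference implementation: the codec follows the older IDNA 2003 rules, and the helper names `to_unicode` and `to_ascii` are made up for illustration.

```python
# The ASCII-compatible form of the Saudi Arabian TLD quoted in the post.
ACE_TLD = "xn--mgberp4a5d4ar"

# The same TLD as Unicode code points (written with escapes so that no
# invisible directional marks sneak into the string).
UNICODE_TLD = "\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629"


def to_unicode(name: str) -> str:
    """Decode xn-- labels in a DNS name to their Unicode form."""
    return name.encode("ascii").decode("idna")


def to_ascii(name: str) -> str:
    """Encode non-ASCII labels in a DNS name to their xn-- form."""
    return name.encode("idna").decode("ascii")


print(to_unicode(ACE_TLD))    # the Arabic-script TLD
print(to_ascii(UNICODE_TLD))  # xn--mgberp4a5d4ar
```

Because the transformation is reversible, an application with no Unicode support can store and compare the ASCII form, while a Unicode-aware application can display the native script.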

Punycode has several advantageous characteristics. For example, it encodes the discrete components (labels) of a DNS name individually, making it possible to encode only part of a DNS name. Encoded name components are prefixed with xn--. One such partially encoded DNS name, with one encoded and one unencoded label, represents the Japan Registry Services. This partial encoding has allowed the use of local languages in parts of the world for several years without support for IDNs at the DNS root.
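To illustrate the per-label behavior, here is a rough sketch that encodes only the labels that need it. The function `encode_name` and the name `bücher.example` are hypothetical, and the sketch deliberately skips the normalization (nameprep) step that a full IDNA implementation applies before encoding.

```python
def encode_name(name: str) -> str:
    """Encode each non-ASCII label of a DNS name, leaving ASCII labels alone.

    Simplification: no case folding or Unicode normalization is applied.
    """
    encoded = []
    for label in name.split("."):
        if label.isascii():
            # Plain ASCII labels pass through unchanged.
            encoded.append(label)
        else:
            # Non-ASCII labels are Punycode-encoded and given the xn-- prefix.
            encoded.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded)


# Hypothetical name with one non-ASCII label and one ASCII label.
print(encode_name("b\u00fccher.example"))  # xn--bcher-kva.example
```

Only the first label changes; the ASCII label is left as it was, which is why partially encoded names could work before IDN TLDs existed at the root.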

Allowing users to connect with one another or online resources without the constraint or burden of Latin characters is certainly a good thing. However, there are security risks to be understood.

What you see may not be what you get. It is possible to represent many of the world’s scripts using Unicode. This makes it possible to present characters from different scripts that appear identical to one another. However, when these characters are compared by a computer they are as different as ‘A’ and ‘z’. This particular risk is not new and already existed within the Latin script where, for example, the digit ‘1’ and the lowercase letter ‘l’ can appear identical in some fonts. Discussions on this topic began in 2002 and ultimately presented two visually similar, albeit fake, domains.
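A short sketch makes the point concrete. It compares the Latin letter "a" with the look-alike Cyrillic letter "а", then flags a label that mixes scripts. The helper `scripts_used` and the label are invented for this example, and guessing a script from the first word of a character's Unicode name is a crude heuristic, not a complete homograph defense.

```python
import unicodedata


def scripts_used(label: str) -> set:
    """Guess the scripts in a label from each letter's Unicode name.

    Heuristic only: takes the first word of the name (e.g. "LATIN").
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


latin_a = "a"          # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A

# The two letters usually render identically, yet they are different code points.
print(latin_a == cyrillic_a)  # False

# Hypothetical label in which the "a" has been swapped for its Cyrillic twin.
spoofed = "ex\u0430mple"
print(len(scripts_used(spoofed)) > 1)  # True: the label mixes scripts
```

A label that mixes scripts is not necessarily malicious, but it is a reasonable signal for closer inspection.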

Foreign languages produce foreign URLs. We have been educating those around us to avoid following links that look like hxxp:// and to use preview tools when faced with a shortened URL. However, is a URL composed in a foreign language somehow more trustworthy? How can I as an American monoglot discern a legitimate URL in Chinese from the Chinese equivalent of hxxp:// (one of the pseudo-random domains used by Conficker)? I cannot, and so I must place all IDNs into the category of URLs that I do not trust.
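Sorting IDNs into an "untrusted" bucket first requires spotting them. One simple way, sketched below under the assumption that only the hostname is being inspected, is to look for any label that carries the xn-- prefix or contains non-ASCII characters. The function name `contains_idn_label` is illustrative.

```python
def contains_idn_label(hostname: str) -> bool:
    """Return True if any label is Punycode-encoded or contains non-ASCII."""
    # Lowercase for a case-insensitive prefix check; drop a trailing root dot.
    for label in hostname.lower().rstrip(".").split("."):
        if label.startswith("xn--") or not label.isascii():
            return True
    return False


print(contains_idn_label("www.example.com"))            # False
print(contains_idn_label("example.xn--mgberp4a5d4ar"))  # True
```

A mail filter or proxy could use a check like this to route IDN links to extra review rather than blocking them outright.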

New TLDs create registration activity and opportunity. When new TLDs are deployed there is often a rush to register desirable domain names within them. As the deployment of new TLDs continues, this trend is expected to continue as well. It is common for organizations to register domains in multiple TLDs, and this process will become more complex because of the disparate scripts involved in the introduction of IDNs. Furthermore, the range of available scripts will present opportunities as phishers look to capitalize on unclaimed international brands.

Any time there are advancements in technology we should take a minute to understand the associated security risks. Internationalized Domain Names are no different. The mindset that we apply when faced with untrusted URLs and odd URLs from trusted sources should also be applied to IDNs.

  1. Fantastic point, Christopher. Thank you!

  2. Good piece Tim. While those who have fluency with a Latin Alphabetic language may view the introduction of TLDs as both an operational and security challenge, those whose language is not written in a Latin alphabet have had to address this challenge since the introduction of the internet. From my optic, this is an opportunity to widen the accessibility of the internet, and no doubt utilities and tools will evolve to help those on both sides of the linguistic challenge to successfully cross the chasm.

  3. The flood of gTLDs is not going to help matters involving homograph attacks. I’m waiting to see someone try to propose a .com homograph for all sorts of ‘fun’.