s

Country flag emojis in Python using the unicodedata module

By Angela C

October 15, 2021 in software html

Reading time: 2 minutes.

Python’s unicodedata module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. 1

unicodedata Functions include lookup to lookup a character by name. If you know the regional indicator symbols for a country code, for example IE for Ireland, then you can use the lookup function to return the corresponding character. Concatenating the characters will result in a flag emoji for the country.

For example:

lookup('REGIONAL INDICATOR SYMBOL LETTER i') + lookup('REGIONAL INDICATOR SYMBOL LETTER e')

‘🇮🇪’


The unicodedata module uses the same names and symbols as defined by Unicode Standard Annex #44, “Unicode Character Database” 2

The Unicode Character Database3 annex provides the core documentation for the Unicode Character Database (UCD). It describes the layout and organization of the Unicode Character Database and how it specifies the formal definitions of the Unicode Character Properties.


Flags / Emojis

I came across this when reading through the Interactive Dashboards and Data Apps with Plotly and Dash book by Elias Dabbas. 4

Ccountry flag emojis are a concatenation of two special letters whose name is “REGIONAL INDICATOR SYMBOL LETTER `.

from unicodedata import lookup

If you have the two-letter code of a country such as ie for Ireland: lookup the REGIONAL INDICATOR SYMBOL LETTER from the unicodedata module for each letter, then concatenate to create the flag symbol.

lookup('REGIONAL INDICATOR SYMBOL LETTER i') + lookup('REGIONAL INDICATOR SYMBOL LETTER e')

lookup('REGIONAL INDICATOR SYMBOL LETTER i')

‘🇮’

lookup('REGIONAL INDICATOR SYMBOL LETTER e')

‘🇪’

lookup('REGIONAL INDICATOR SYMBOL LETTER i') + lookup('REGIONAL INDICATOR SYMBOL LETTER e')

‘🇮🇪’


Each country has a unique 2-letter country code published by the International Organization for Standardization (ISO), to represent countries, dependent territories, and special areas of geographical interest5.

Then there is a unique 2-character code for every country.

Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes.6

The regional indicator symbols are a set of 26 alphabetic Unicode characters (A–Z) intended to be used to encode ISO 3166-1 alpha-2 two-letter country codes in a way that allows optional special treatment. 7 These were defined in October 2010 as part of the Unicode 6.0 support for emoji, as an alternative to encoding separate characters for each country flag.


https://www.unicode.org/Public/UCD/latest/ucd/


  1. https://docs.python.org/3/library/unicodedata.html?highlight=unicodedata ↩︎

  2. https://www.unicode.org/Public/13.0.0/ucd/UnicodeData.txt ↩︎

  3. https://www.unicode.org/reports/tr44/ ↩︎

  4. Dabbas, Elias. Interactive Dashboards and Data Apps with Plotly and Dash: Harness the power of a fully fledged frontend web framework in Python – no JavaScript required. Packt Publishing. Kindle Edition. ↩︎

  5. https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Current_codes ↩︎

  6. https://en.wikipedia.org/wiki/Enclosed_Alphanumeric_Supplement ↩︎

  7. https://en.wikipedia.org/wiki/Regional_indicator_symbol ↩︎