Unicode Encoder Decoder

Convert text to Unicode code points (U+XXXX) and back. Supports BMP and supplementary characters (emojis, rare scripts).

What is Unicode Encoding?

Unicode is a universal character encoding standard that assigns a unique number (called a code point) to every character across all writing systems. Code points are written as U+ followed by four to six hexadecimal digits, for example U+0041 or U+1F30D. The range U+0000 to U+FFFF is the Basic Multilingual Plane (BMP), which covers most common characters. Supplementary characters (like emojis) use code points from U+10000 to U+10FFFF.

For example, the Latin letter A has code point U+0041, the Euro sign € is U+20AC, and the globe emoji 🌍 is U+1F30D.
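The BMP versus supplementary split can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `describe` is hypothetical and is not part of this tool. It relies on the fact that the plane number is the code point shifted right by 16 bits.

```python
def describe(ch: str) -> str:
    """Return the code point of a single character and the plane it lives in."""
    cp = ord(ch)          # numeric code point of the character
    plane = cp >> 16      # plane 0 is the BMP, planes 1 to 16 are supplementary
    kind = "BMP" if plane == 0 else "supplementary"
    return f"U+{cp:04X} (plane {plane}, {kind})"


# The three examples from the text: Latin A, Euro sign, globe emoji.
for ch in ["A", "\u20ac", "\U0001F30D"]:
    print(describe(ch))
```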

Why Use Unicode Encoding?

  • Universal representation of text across all languages and scripts
  • Foundation of modern web technologies (HTML, JSON, JavaScript)
  • Essential for internationalization (i18n) and localization (l10n)
  • Used in escape sequences for strings in programming languages
  • Required for correct text processing, such as counting, slicing, and comparing characters

How to Use This Unicode Encoder/Decoder

  1. Encode text — Type or paste text into the left panel, then click Encode to convert each character to its Unicode code point.
  2. Decode Unicode — Type or paste U+XXXX values into the right panel, then click Decode to convert them back to text.
  3. Toggle options — Use Uppercase, U+ prefix, and Add spaces to control the output format.
  4. Swap & Clear — Click Swap to exchange encode/decode values, Clear All to reset everything.

The encoder converts each character to its full Unicode code point (up to 6 hex digits). The decoder accepts formats like U+0041, 0041, \u0041, and 0x0041.
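The encoding step and its three output options can be approximated as follows. This is a minimal sketch, not this tool's actual implementation: the function name `encode_text` and its parameters are hypothetical, and padding to at least four hex digits is an assumption.

```python
def encode_text(text: str, uppercase: bool = True,
                prefix: bool = True, spaces: bool = True) -> str:
    """Convert each character of text to a hex code point string."""
    parts = []
    # Python iterates strings by code point, so an emoji is a single item here.
    for ch in text:
        hex_digits = format(ord(ch), "04X" if uppercase else "04x")
        parts.append(("U+" if prefix else "") + hex_digits)
    return (" " if spaces else "").join(parts)


print(encode_text("A\u20ac\U0001F30D"))
print(encode_text("\u20ac", uppercase=False, prefix=False))
```

Turning off both the prefix and the spaces makes the output ambiguous to decode, because code points vary between four and six hex digits.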

Common Use Cases

  • Web development — Use Unicode escape sequences (\u0041) in JavaScript strings for non-ASCII characters.
  • Internationalization — Identify code points of characters from different languages for proper encoding support.
  • Debugging text encoding — Diagnose mojibake (garbled text) by examining the underlying code points.
  • Data processing — Convert between Unicode code points and text for CSV, JSON, or database operations.
  • CTF & puzzles — Decode hidden messages encoded as Unicode code point sequences.
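As an illustration of the mojibake use case, the sketch below deliberately decodes UTF-8 bytes as Latin-1 and then inspects the code points of the result. The sample word and the helper name `code_points` are assumptions chosen for the example.

```python
def code_points(text: str) -> list:
    """List the code points of text in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


original = "caf\u00e9"  # "café" with a precomposed e-acute (U+00E9)

# Simulate the bug: bytes written as UTF-8 but read back as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)
print(code_points(garbled))

# Reversing the wrong step recovers the text when no bytes were lost.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

Seeing U+00C3 followed by U+00A9 where a single U+00E9 was expected is a typical sign that UTF-8 bytes were read with a single-byte encoding.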

Frequently Asked Questions

What is the difference between Unicode and UTF-8?

Unicode assigns a unique code point to each character. UTF-8 is an encoding scheme that converts those code points into a sequence of bytes for storage and transmission. For example, the character € (U+20AC) is encoded as the three bytes E2 82 AC in UTF-8.
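The difference between a code point and its UTF-8 bytes can be seen directly in Python. In this small sketch the helper name `utf8_hex` is hypothetical, and `bytes.hex` with a separator argument needs Python 3.8 or later.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()


# One code point each, but one, three, and four bytes in UTF-8.
for ch in ["A", "\u20ac", "\U0001F30D"]:
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```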

What are surrogate pairs?

Surrogate pairs are the UTF-16 mechanism for representing supplementary characters (code points above U+FFFF). Two 16-bit code units, a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF), combine to represent one character. JavaScript strings use UTF-16 internally, so characters like emojis appear as one surrogate pair, that is, two code units. This tool detects surrogate pairs and converts them to the proper code point.

What formats does the decoder accept?

The decoder accepts multiple formats: U+0041, u+0041, \u0041, 0x0041, 0X0041, or plain 0041. Hex digits can be uppercase or lowercase, and values can be separated by spaces, commas, or other whitespace.
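A decoder that accepts these formats could look roughly like the sketch below. It is an assumption about how such parsing might be done, not this tool's code: the names `TOKEN` and `decode_code_points` are hypothetical, and rejecting malformed tokens with an error is a design choice made for the example.

```python
import re

# Optional prefix (U+, \u, or 0x, in any case) followed by 1 to 6 hex digits.
TOKEN = re.compile(r"(?:u\+|\\u|0x)?([0-9a-f]{1,6})", re.IGNORECASE)


def decode_code_points(text: str) -> str:
    """Turn separated code point tokens back into text."""
    chars = []
    # Values are separated by whitespace and/or commas.
    for token in re.split(r"[\s,]+", text.strip()):
        if not token:
            continue
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"not a code point: {token!r}")
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            raise ValueError(f"beyond U+10FFFF: {token!r}")
        chars.append(chr(value))
    return "".join(chars)


print(decode_code_points(r"U+0041, u+0042 \u0043 0x0044 0X0045 0046"))
```

This sketch expects a separator between values and does not merge surrogate pairs given as two separate tokens.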

Why does JavaScript give me two code units for emojis?

In JavaScript, strings are UTF-16 encoded. Emojis like 🌍 (U+1F30D) have code points above U+FFFF, so JavaScript represents them as two surrogate code units: 0xD83C 0xDF0D. This encoder detects surrogates and combines them into the correct code point (U+1F30D).
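The arithmetic behind the pair 0xD83C 0xDF0D follows the standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The sketch below is written in Python for illustration, and the function names are hypothetical.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F30D)
print(f"0x{high:04X} 0x{low:04X}")
print(f"U+{from_surrogates(high, low):04X}")
```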

What is the maximum code point value?

The Unicode standard defines code points from U+0000 to U+10FFFF (1,114,112 possible values). This tool supports the full range, though code points above U+FFFF require surrogate pair handling in JavaScript.
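The count of 1,114,112 is simply 0x10FFFF + 1. A short sketch of the arithmetic and a range check follows; the constant and function names are hypothetical. Note that 2,048 of those values (U+D800 to U+DFFF) are reserved for surrogates and do not represent characters on their own.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_COUNT = 0xDFFF - 0xD800 + 1   # 2,048 reserved surrogate values

total_code_points = MAX_CODE_POINT + 1
non_surrogate_values = total_code_points - SURROGATE_COUNT


def is_valid_code_point(value: int) -> bool:
    """Check that value lies inside the Unicode code space."""
    return 0 <= value <= MAX_CODE_POINT


print(f"{total_code_points:,}")
print(f"{non_surrogate_values:,}")
print(is_valid_code_point(0x110000))
```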

Last updated: 24 Jun 2026