Composing Text in Emacs: Unicode, Emojis, and the Power of C-x 8

One of the things that never stops fascinating me about Emacs is how deep the rabbit hole goes when you start exploring... well... anything!
You come for a text editor.
You stay because it quietly turns into a laboratory and eventually the custom interface for your entire computer experience.
Most people associate emojis with mobile keyboards, chat apps, or
graphical pickers. Emacs takes a completely different approach:
Unicode as a first-class citizen. And at the center of this
philosophy lives a deceptively simple prefix: C-x 8.
Although it's optional, a good way to explore this feature is to enable
which-key with M-x which-key-mode RET, then press C-x 8 and wait
for the options.
If you've ever wondered how to compose, combine, and bend Unicode inside Emacs, this post is for you.
C-x 8
In Emacs, C-x 8 is the entry point to character composition.
It allows you to:
→ Insert Unicode characters by name
→ Compose accented characters manually
→ Combine multiple code points into a single visible glyph
→ Create emoji sequences using Zero Width Joiners
→ Abuse combining characters in ways that would scare typography purists
This is not about convenience; it is about control.
Inserting Unicode by Name
The most straightforward usage is:
C-x 8 RET
This prompts you for a Unicode character name.
Try typing:
LATIN SMALL LETTER A WITH ACUTE
→ á
Or:
GREEK SMALL LETTER LAMBDA
→ λ
Tip: If you have something like vertico-mode or
icomplete-vertical-mode turned ON, you get a preview of your search.
You can also type the hexadecimal code point directly, even if Emacs doesn't show it as a preview, like:
274C
Returns:
→ ❌
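If you want to double-check a name or a code point outside Emacs, here is a minimal sketch using Python's standard unicodedata module. It is not how Emacs implements C-x 8 RET; it only mirrors the two lookups above (by name and by hexadecimal code point), and the variable names are my own.

```python
import unicodedata

# Look a character up by its official Unicode name,
# like answering the C-x 8 RET prompt with a name.
a_acute = unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE")

# Build a character from a hexadecimal code point,
# like typing 274C at the same prompt.
cross = chr(int("274C", 16))

print(a_acute, f"U+{ord(a_acute):04X}")   # á U+00E1
print(cross, unicodedata.name(cross))     # ❌ CROSS MARK
```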
Composing Accents Manually (Base + Combining Marks)
Unicode doesn’t require precomposed characters like á or
ç. Instead, it allows you to build them from base characters +
combining marks.
This is where Emacs shines again:
Using M-x compose-region.
Start by typing a followed by ´, for example. You'll get:
a´
Mark the region (select both characters), then use M-x compose-region RET to get:
á
And even more layered:
c¸´
Becomes:
ḉ
These are not “single characters” in the traditional sense.
They are Unicode sequences rendered as one glyph.
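The "base character + combining marks" idea can also be checked outside Emacs. The sketch below is an assumption-laden illustration, not what compose-region does internally: it uses the true combining marks U+0301 and U+0327 (rather than the spacing ´ and ¸ typed above) and Python's unicodedata.normalize to move between the stacked and the precomposed forms.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"    # COMBINING ACUTE ACCENT
COMBINING_CEDILLA = "\u0327"  # COMBINING CEDILLA

# Base letter + one combining mark: two code points, one visible glyph.
decomposed = "a" + COMBINING_ACUTE
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# Base letter + two combining marks, the "more layered" case.
layered = "c" + COMBINING_CEDILLA + COMBINING_ACUTE
layered_nfc = unicodedata.normalize("NFC", layered)
print(unicodedata.name(layered_nfc))
# LATIN SMALL LETTER C WITH CEDILLA AND ACUTE

# Going back: NFD splits the precomposed character into its parts again.
parts = [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", layered_nfc)]
print(parts)  # ['U+0063', 'U+0327', 'U+0301']
```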
If you delete from right to left, you'll see it decompose, like:
ḉ
ç
c
Combining Characters
Let’s take a simple letter:
Z
Now, using C-x 8 RET, start adding combining marks:
COMBINING OGONEK
COMBINING TILDE
COMBINING DOT BELOW
COMBINING CEDILLA
→ …repeat as many times as your conscience allows
Result:
→ Ẓ̨̧̧̧̧̧̧̧̧̧̧̧̃
This is valid Unicode.
It may render differently depending on font and platform, but Emacs treats it as a perfectly legitimate text sequence. If you're not inside Emacs, try copying it into a buffer and changing fonts.
At this point, hopefully you can see how text stops being "letters" and starts being stacked metadata.
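To see the "stack" as data, here is a small Python sketch (my own illustration, using only the standard unicodedata module) that builds Z plus the four marks named above and prints each code point with its canonical combining class. A class of 0 means "base character"; anything above 0 is a mark that attaches to the preceding base.

```python
import unicodedata

# The same mark names used with C-x 8 RET above.
mark_names = [
    "COMBINING OGONEK",
    "COMBINING TILDE",
    "COMBINING DOT BELOW",
    "COMBINING CEDILLA",
]
marks = [unicodedata.lookup(name) for name in mark_names]

# One base letter followed by four combining marks: five code points.
stacked = "Z" + "".join(marks)

for ch in stacked:
    # code point, canonical combining class, official name
    print(f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.name(ch))

print(len(stacked))  # 5
```

How the five code points are drawn is still up to the font; the sequence itself is the same everywhere.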
Variation Selector-16 (U+FE0F): Forcing Emoji Presentation
Not every emoji-looking thing starts as an emoji.
Many Unicode characters exist in a neutral or text presentation form.
Whether they render as plain text or as colorful emoji depends on an invisible hint called a Variation Selector.
The most important one is:
VARIATION SELECTOR-16 (U+FE0F)
In Emacs:
C-x 8 RET VARIATION SELECTOR-16
VS16 explicitly tells the renderer:
"Treat the previous character as an emoji, not as text."
Let's start with this example:
❤ (HEAVY BLACK HEART)
Adding VARIATION SELECTOR-16, we get:
→ ❤️ (HEAVY BLACK HEART + VARIATION SELECTOR-16)
Same base character. Different semantic intent.
VS16 doesn’t draw anything. It changes meaning.
For another example, insert the digit:
1
That’s just ASCII text.
Now add:
COMBINING ENCLOSING KEYCAP
Result:
→ 1⃣
This might look like a keycap, or it might look broken. That’s because we didn’t force emoji presentation.
Now do it properly:
Sequence:
1
VARIATION SELECTOR-16
COMBINING ENCLOSING KEYCAP
Result:
→ 1️⃣
Important note: that is not one character.
It’s a sequence of three code points:
→ 1 (DIGIT ONE)
→ U+FE0F (VARIATION SELECTOR-16)
→ U+20E3 (COMBINING ENCLOSING KEYCAP)
Without VS16, renderers are free to choose a text glyph. With VS16, you get the emoji version, consistently.
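The same sequences can be spelled out code point by code point. This is a minimal Python sketch of my own (the helper code_points is hypothetical, not an Emacs or Unicode API) that builds the heart and keycap examples and shows that VS16 adds a code point without adding a visible character.

```python
import unicodedata

VS16 = "\ufe0f"    # VARIATION SELECTOR-16
KEYCAP = "\u20e3"  # COMBINING ENCLOSING KEYCAP

text_heart = "\u2764"              # HEAVY BLACK HEART, text presentation
emoji_heart = text_heart + VS16    # same base, emoji presentation requested
keycap_one = "1" + VS16 + KEYCAP   # digit + VS16 + enclosing keycap

def code_points(sequence):
    """Return the code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in sequence]

print(code_points(emoji_heart))  # ['U+2764', 'U+FE0F']
print(code_points(keycap_one))   # ['U+0031', 'U+FE0F', 'U+20E3']
print(unicodedata.name(VS16))    # VARIATION SELECTOR-16

# One glyph on screen, three code points, seven bytes in UTF-8.
print(len(keycap_one), len(keycap_one.encode("utf-8")))  # 3 7
```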
Zero Width Joiner (U+200D): Emoji Glue
If combining marks stack vertically, Zero Width Joiner (ZWJ) connects characters horizontally.
Unicode code point:
U+200D
In Emacs:
C-x 8 RET ZERO WIDTH JOINER
ZWJ tells the renderer:
"These characters should be treated as a single semantic unit."
Here's an example:
❤ HEAVY BLACK HEART
❤️ HEAVY BLACK HEART + VARIATION SELECTOR-16
❤️‍🔥 HEAVY BLACK HEART + VARIATION SELECTOR-16 + ZERO WIDTH JOINER + FIRE
The "flaming heart" is not a single character. It’s a sequence:
- ❤️
- ZWJ
- 🔥
Another example is creating "Families" from "Individuals":
This emoji:
👨‍👩‍👦
is actually the combination of:
- 👨
- ZWJ
- 👩
- ZWJ
- 👦
You can add more members to your "family" and change their appearance. The only limit is your font set: some fonts might not render a specific combination well, but many will.
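Both ZWJ examples above can be assembled and taken apart again in a few lines. The sketch below is my own illustration in Python (the describe helper is hypothetical); it joins the pieces with U+200D and then lists the official name of every code point in the result.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Heart + VS16, glued to FIRE with a ZWJ.
heart_on_fire = "\u2764\ufe0f" + ZWJ + "\U0001F525"

# Three individuals glued into one family.
members = ["\U0001F468", "\U0001F469", "\U0001F466"]  # man, woman, boy
family = ZWJ.join(members)

def describe(sequence):
    """Return the official Unicode name of each code point in a string."""
    return [unicodedata.name(ch) for ch in sequence]

print(describe(heart_on_fire))
# ['HEAVY BLACK HEART', 'VARIATION SELECTOR-16', 'ZERO WIDTH JOINER', 'FIRE']
print(describe(family))
# ['MAN', 'ZERO WIDTH JOINER', 'WOMAN', 'ZERO WIDTH JOINER', 'BOY']

# Splitting on the joiner gives the individuals back.
print(family.split(ZWJ) == members)  # True
```

Whether the result is drawn as one glyph or as separate emojis side by side depends on the font, not on the string.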
Inspecting Unicode with C-x = (what-cursor-position)
Typing Unicode is only half the story.
Understanding what you actually typed is the other half.
Lucky us, Emacs users. Just place point over any character and press:
C-x =
You will get a short description of the character in the echo area, with information like:
→ The character itself
→ Its code point in decimal, octal, and hexadecimal
→ How it is encoded in the file
→ The position of point in the buffer
→ The current column
This alone is already incredibly useful, but wait! There's more!
Deep Inspection: C-u C-x =
Now add the C-u (universal-argument) prefix:
C-u C-x =
This is where things get serious.
For composed characters and emojis, Emacs will show:
→ The full Unicode sequence
→ Each individual code point
→ Combining characters
→ Zero Width Joiners
→ Text properties and composition details
This is essential when dealing with:
→ Accents built from combining marks
→ ZWJ emoji sequences
→ “Why does this look right but behave wrong?”
→ Copy/paste weirdness across platforms
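For text that lives outside Emacs (a log file, a string from an API), a rough stand-in for this kind of inspection can be sketched in Python. This is only an approximation of my own, not a port of what C-u C-x = does: inspect_text is a hypothetical helper that reports, for every code point, its number, name, general category, and UTF-8 bytes.

```python
import unicodedata

def inspect_text(text):
    """Describe every code point in text, one dict per code point."""
    rows = []
    for ch in text:
        rows.append({
            "code_point": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<unnamed>"),
            "category": unicodedata.category(ch),
            "utf8": " ".join(f"#x{b:02X}" for b in ch.encode("utf-8")),
        })
    return rows

# Heart + VS16 + ZWJ + FIRE: one glyph on screen, four rows here.
for row in inspect_text("\u2764\ufe0f\u200d\U0001F525"):
    print(row)
```

The invisible pieces show up as their own rows, with categories Mn (VS16) and Cf (ZWJ), which is usually the answer to "why does this look right but behave wrong?".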
Some examples:
C-x = on a simple accented character (á):
Char: á (225, #o341, #xe1, file ...) point=7495 of 8976 (83%) column=43
C-u C-x = on a simple accented character (á):
position: 7495 of 9102 (82%), column: 43
character: á (displayed as á) (codepoint 225, #o341, #xe1)
charset: unicode (Unicode (ISO10646))
code point in charset: 0xE1
script: latin
syntax: w which means: word
category: .:Base, L:Strong L2R, c:Chinese, j:Japanese, l:Latin, v:Viet
to input: type "C-x 8 RET e1" or "C-x 8 RET LATIN SMALL LETTER A WITH ACUTE"
buffer code: #xC3 #xA1
file code: #xC3 #xA1 (encoded by coding system utf-8-unix)
display: by this font (glyph code):
ftcrhb:-JB-JetBrainsMono Nerd Font-regular-italic-normal-*-14-*-*-*-m-0-iso10646-1 (#xA4)
Character code properties: customize what to show
name: LATIN SMALL LETTER A WITH ACUTE
old-name: LATIN SMALL LETTER A ACUTE
general-category: Ll (Letter, Lowercase)
decomposition: (97 769) ('a' '́')
...
This can be especially useful for our emojis. Let's try it on ❤️‍🔥:
position: 8529 of 10012 (85%), column: 61
character: ❤ (displayed as ❤) (codepoint 10084, #o23544, #x2764)
charset: unicode (Unicode (ISO10646))
code point in charset: 0x2764
script: symbol
syntax: w which means: word
category: .:Base, 5:symbol
to input: type "C-x 8 RET 2764" or "C-x 8 RET HEAVY BLACK HEART"
buffer code: #xE2 #x9D #xA4
file code: #xE2 #x9D #xA4 (encoded by coding system utf-8-unix)
display: composed to form "❤️🔥" (see below)
Composed with the following character(s) "️🔥" using this font:
ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-14-*-*-*-m-0-iso10646-1
by these glyphs:
[0 3 10084 1537 17 0 18 13 4 nil]
with these character(s):
️ (#xfe0f) VARIATION SELECTOR-16
(#x200d) ZERO WIDTH JOINER
🔥 (#x1f525) FIRE
Character code properties: customize what to show
name: HEAVY BLACK HEART
general-category: So (Symbol, Other)
decomposition: (10084) ('❤')
...
Fonts Matter (A Lot)
One important caveat: fonts decide how far you can go.
Some fonts handle combining characters and emoji sequences beautifully. Others collapse under pressure.
If you see:
→ Misaligned accents
→ Overlapping glyphs
→ Missing emoji components
It’s almost always a font issue, not Emacs and not Unicode.
Final Thoughts
Once you internalize C-x 8, Unicode and emojis stop being
“special”. If you explored it with which-key as I recommended at
the beginning of this post, you probably saw a lot more than C-x 8 RET. Emojis even get the special C-x 8 e e sub-menu for
convenience.
And I hope that once you use C-u C-x =, special characters and
invisible characters (or ones your font cannot print) stop being
mysterious.
Unicode is a vast and deeply technical subject, but the goal here wasn't theory. It was to show how Emacs turns it into something you can actually use, inspect, and enjoy.
Thanks!
This is likely my final post for 2025, so it feels like a good moment to say thank you.
Thank you to everyone who read these posts, sent feedback, pointed out mistakes, shared them around, or just quietly followed along. This year was especially fun to write: lots of small discoveries, deep dives into Emacs internals, and many "oh wow, it already does that?" moments.
Happy hacking!