Composing Text in Emacs: Unicode, Emojis, and the Power of C-x 8

Rahul M. Juliato

December 21, 2025

#emacs #unicode #emoji

One of the things that never stops fascinating me about Emacs is how deep the rabbit hole goes when you start exploring... well... anything!

You come for a text editor.

You stay because it quietly turns into a laboratory and eventually the custom interface for your entire computer experience.

Most people associate emojis with mobile keyboards, chat apps, or graphical pickers. Emacs takes a completely different approach: Unicode as a first-class citizen. And at the center of this philosophy lives a deceptively simple prefix: C-x 8.

Although not required, enabling which-key helps you explore this feature: run M-x which-key-mode RET, then type C-x 8 and wait for the options to appear.

If you've ever wondered how to compose, combine, and bend Unicode inside Emacs, this post is for you.

`C-x 8`

In Emacs, C-x 8 is the entry point to character composition.

It allows you to:

→ Insert Unicode characters by name

→ Compose accented characters manually

→ Combine multiple code points into a single visible glyph

→ Create emoji sequences using Zero Width Joiners

→ Abuse combining characters in ways that would scare typography purists

This is not about convenience, this is about control.

Inserting Unicode by Name

The most straightforward usage is:

C-x 8 RET

This prompts you for a Unicode character name.

Try typing:

LATIN SMALL LETTER A WITH ACUTE

→ á

Or:

GREEK SMALL LETTER LAMBDA

→ λ

Tip: If you have something like vertico-mode or icomplete-vertical-mode turned ON, you get a preview of your search.

You can also type the hexadecimal code point directly, even if Emacs doesn't show it as a preview, like:

274C

Returns:

→ ❌
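For readers who want to check the same name and code point lookups outside Emacs, here is a minimal Python sketch using the standard `unicodedata` module. The variable names are illustrative only and nothing here is part of Emacs itself.

```python
import unicodedata

# Name -> character, the same direction as typing a name at the C-x 8 RET prompt.
a_acute = unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE")

# Hex code point -> character, like typing 274C at the prompt.
cross = chr(0x274C)

# Character -> official Unicode name.
cross_name = unicodedata.name(cross)

print(a_acute, f"U+{ord(a_acute):04X}")
print(cross, cross_name)
```

The lookup works in both directions, which is handy when you know what a character looks like but not what it is called.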

Composing Accents Manually (Base + Combining Marks)

Unicode doesn’t require precomposed characters like á or ç. Instead, it allows you to build them from base characters + combining marks.

This is where Emacs shines again:

Using `M-x compose-region`.

Start by typing a followed by ´ (a spacing acute accent), for example. You'll get:

a´

Mark the region (select both characters), then run M-x compose-region RET to get:

á

And even more layered:

c¸´

Becomes:

ḉ

These are not “single characters” in the traditional sense.

They are Unicode sequences rendered as one glyph.

If you delete from right to left, you'll see it decompose step by step:

ḉ
ç
c

Note: After I first published this post, Eli Zaretskii (Emacs maintainer) was kind enough to share his critique of this approach. Here are his remarks (original here):

The section about composing accents is, in my opinion,
misguided.  Emacs already handles this automatically by
default, without any need to invoke `compose-region`.

The reason the command appeared necessary in the earlier
example is that it used the wrong characters. It should have
used **COMBINING CEDILLA** and **COMBINING ACUTE ACCENT**
instead — which is exactly what the following section does.

I recommend staying away from `compose-region` in Emacs. It
does not fully support all display features and options
(such as bidirectional display), and it may cause problems
in certain corner cases.
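The distinction Eli points out, spacing accents versus combining marks, can be checked outside Emacs. The following is a rough Python sketch using the standard `unicodedata` module. The constant names and the `is_combining` helper are hypothetical and only for illustration.

```python
import unicodedata

SPACING_ACUTE = "\u00b4"      # ACUTE ACCENT: a standalone symbol
COMBINING_ACUTE = "\u0301"    # COMBINING ACUTE ACCENT: attaches to the previous character
COMBINING_CEDILLA = "\u0327"  # COMBINING CEDILLA


def is_combining(ch: str) -> bool:
    """Return True when the character is a mark (general category M*)."""
    return unicodedata.category(ch).startswith("M")


# NFC normalization merges a base letter and its combining marks into a
# precomposed character when Unicode defines one.
composed_a = unicodedata.normalize("NFC", "a" + COMBINING_ACUTE)
composed_c = unicodedata.normalize("NFC", "c" + COMBINING_CEDILLA + COMBINING_ACUTE)

# The spacing accent is left untouched: it never merges with the letter before it.
not_composed = unicodedata.normalize("NFC", "a" + SPACING_ACUTE)

print(is_combining(SPACING_ACUTE), is_combining(COMBINING_ACUTE))
print(composed_a, len(composed_a))
print(composed_c, len(composed_c))
print(not_composed, len(not_composed))
```

With the combining marks, the sequence collapses to a single code point under NFC. With the spacing accent, it stays as two separate characters, which is why the earlier example seemed to need `compose-region`.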

Combining Characters

Let’s take a simple letter:

Z

Now, using C-x 8 RET, start adding combining marks:

COMBINING OGONEK
COMBINING TILDE
COMBINING DOT BELOW
COMBINING CEDILLA

→ …repeat as many times as your conscience allows

Result:

→ Ẓ̨̧̧̧̧̧̧̧̧̧̧̧̃

This is valid Unicode.

It may render differently depending on font and platform, but Emacs treats it as a perfectly legitimate text sequence. If you're not inside Emacs, try copying it into a buffer and/or changing fonts.

At this point, hopefully you can see how text stops being "letters" and starts being stacked metadata.
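To make the "stacked metadata" idea concrete, here is a small Python sketch (standard `unicodedata` module, illustrative variable names) that builds a base letter plus the four marks listed above and shows that each mark is a separate code point with its own combining class.

```python
import unicodedata

MARK_NAMES = [
    "COMBINING OGONEK",
    "COMBINING TILDE",
    "COMBINING DOT BELOW",
    "COMBINING CEDILLA",
]

# Base letter followed by one combining mark per name.
stacked = "Z" + "".join(unicodedata.lookup(name) for name in MARK_NAMES)

# The base letter has canonical combining class 0; every mark has a nonzero class.
classes = [unicodedata.combining(ch) for ch in stacked]

print(stacked)
print(len(stacked), classes)
```

What looks like one glyph on screen is five code points in the string.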

Variation Selector-16 (U+FE0F): Forcing Emoji Presentation

Not every emoji-looking thing starts as an emoji.

Many Unicode characters exist in a neutral or text presentation form.

Whether they render as plain text or as colorful emoji depends on an invisible hint called a Variation Selector.

The most important one is:

VARIATION SELECTOR-16 (U+FE0F)

In Emacs:

C-x 8 RET VARIATION SELECTOR-16

VS16 explicitly tells the renderer:

"Treat the previous character as an emoji, not as text."

Let's start with this example:

❤   (HEAVY BLACK HEART)

Adding VARIATION SELECTOR-16, we get:

→ ❤️ (HEAVY BLACK HEART + VARIATION SELECTOR-16)

Same base character. Different semantic intent.

VS16 doesn’t draw anything. It changes meaning.

Another example. Insert the digit:

1

That’s just ASCII text.

Now add:

COMBINING ENCLOSING KEYCAP

Result:

→ 1⃣

This might look like a keycap, or it might look broken. It depends on your font. That’s because we didn’t force emoji presentation.

Now do it properly:

Sequence:

1
VARIATION SELECTOR-16
COMBINING ENCLOSING KEYCAP

Result:

→ 1️⃣

Important note: that is not one character.

It’s a sequence of three code points:

→ 1 (DIGIT ONE)

→ U+FE0F (VARIATION SELECTOR-16)

→ U+20E3 (COMBINING ENCLOSING KEYCAP)

Without VS16, renderers are free to choose a text glyph. With VS16, you get the emoji version, consistently.
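The three code points of the keycap sequence can be listed with a short Python sketch. This is only an illustration using the standard `unicodedata` module, and the variable names are my own.

```python
import unicodedata

# DIGIT ONE + VARIATION SELECTOR-16 + COMBINING ENCLOSING KEYCAP
keycap = "1" + "\ufe0f" + "\u20e3"

# Pair each code point with its official Unicode name.
parts = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in keycap]

for code, name in parts:
    print(code, name)
```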

Zero Width Joiner (U+200D): Emoji Glue

If combining marks alter a single character by adding overlays, the Zero Width Joiner (ZWJ) works at the sequence level, telling the renderer to treat multiple characters as one visual unit — most notably in emoji composition.

Unicode code point:

U+200D

In Emacs:

C-x 8 RET ZERO WIDTH JOINER

ZWJ tells the renderer:

"These characters should be treated as a single semantic unit."

Here's an example:

❤  HEAVY BLACK HEART
❤️  HEAVY BLACK HEART + VARIATION SELECTOR-16
❤️‍🔥  HEAVY BLACK HEART + VARIATION SELECTOR-16 + ZERO WIDTH JOINER + FIRE

The "flaming heart" is not a single character. It’s a sequence:

❤️
ZWJ
🔥

Another example is creating "Families" from "Individuals":

This emoji:

👨‍👩‍👦

is actually the combination of:

👨
ZWJ
👩
ZWJ
👦

You can add more members to your "family" and change their appearance. This is only limited by your "font set": some fonts might not render a specific combination well, but many will.
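Since ZWJ is just another code point sitting between the members, building and taking apart a family is plain string joining and splitting. A minimal Python sketch, assuming only the standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

ZWJ = unicodedata.lookup("ZERO WIDTH JOINER")

# Individual emoji, looked up by their Unicode names.
members = [unicodedata.lookup(name) for name in ("MAN", "WOMAN", "BOY")]

# Glue the individuals together with ZWJ to form the family sequence.
family = ZWJ.join(members)

# Splitting on ZWJ recovers the individuals.
recovered = family.split(ZWJ)

print(family, len(family))
print(recovered)
```

Whether the result renders as one family glyph or as three people side by side depends on the font, as noted above.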

Note: After I first published this post, shipmints suggested that you can also create these sequences via Emacs Lisp (original here), like:

(concat "\N{WOMAN}"
        "\N{ZERO WIDTH JOINER}"
        "\N{HEAVY BLACK HEART}"
        "\N{VARIATION SELECTOR-16}"
        "\N{ZERO WIDTH JOINER}"
        "\N{MAN}"
        " ")

Result:

→ 👩‍❤️‍👨

Inspecting Unicode with `C-x =` (`what-cursor-position`)

Typing Unicode is only half the story.

Understanding what you actually typed is the other half.

Lucky us, Emacs users. Just place point over any character and press:

C-x =

You will get a brief description of the character in the echo area, with information like:

→ The character itself

→ Its code point in decimal, octal, and hexadecimal

→ How it is encoded in the file

→ The position of point in the buffer

→ The current column

This alone is already incredibly useful, but wait! There's more!

Deep Inspection: `C-u C-x =`

Now add the C-u (universal-argument) prefix:

C-u C-x =

This is where things get serious.

For composed characters and emojis, Emacs will show:

→ The full Unicode sequence

→ Each individual code point

→ Combining characters

→ Zero Width Joiners

→ Text properties and composition details

This is essential when dealing with:

→ Accents built from combining marks

→ ZWJ emoji sequences

→ “Why does this look right but behave wrong?”

→ Copy/paste weirdness across platforms

Some examples:

C-x = on a simple accented character (á)

Char: á (225, #o341, #xe1, file ...) point=7495 of 8976 (83%) column=43

C-u C-x = on a simple accented character (á)

             position: 7495 of 9102 (82%), column: 43
            character: á (displayed as á) (codepoint 225, #o341, #xe1)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0xE1
               script: latin
               syntax: w    which means: word
             category: .:Base, L:Strong L2R, c:Chinese, j:Japanese, l:Latin, v:Viet
             to input: type "C-x 8 RET e1" or "C-x 8 RET LATIN SMALL LETTER A WITH ACUTE"
          buffer code: #xC3 #xA1
            file code: #xC3 #xA1 (encoded by coding system utf-8-unix)
              display: by this font (glyph code):
    ftcrhb:-JB-JetBrainsMono Nerd Font-regular-italic-normal-*-14-*-*-*-m-0-iso10646-1 (#xA4)

Character code properties: customize what to show
  name: LATIN SMALL LETTER A WITH ACUTE
  old-name: LATIN SMALL LETTER A ACUTE
  general-category: Ll (Letter, Lowercase)
  decomposition: (97 769) ('a' '́')
...
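Two fields in that output, the file code and the decomposition, are easy to cross-check outside Emacs. Here is a hedged Python sketch (standard `unicodedata` module, illustrative variable names) that reproduces them for á.

```python
import unicodedata

ch = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE

# UTF-8 bytes, formatted the way the "file code" field prints them.
utf8_bytes = ch.encode("utf-8")
file_code = " ".join(f"#x{b:02X}" for b in utf8_bytes)

# Canonical decomposition as decimal code points, like the "decomposition" property.
decomposition = [int(h, 16) for h in unicodedata.decomposition(ch).split()]

print(file_code)
print(decomposition)
```

The two decimal values are the base letter a (97) and COMBINING ACUTE ACCENT (769).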

This is especially useful for emojis. Let's try it on ❤️‍🔥:

             position: 8529 of 10012 (85%), column: 61
            character: ❤ (displayed as ❤) (codepoint 10084, #o23544, #x2764)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x2764
               script: symbol
               syntax: w    which means: word
             category: .:Base, 5:symbol
             to input: type "C-x 8 RET 2764" or "C-x 8 RET HEAVY BLACK HEART"
          buffer code: #xE2 #x9D #xA4
            file code: #xE2 #x9D #xA4 (encoded by coding system utf-8-unix)
              display: composed to form "❤️‍🔥" (see below)

Composed with the following character(s) "️‍🔥" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-14-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 3 10084 1537 17 0 18 13 4 nil]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16
  ‍ (#x200d) ZERO WIDTH JOINER
  🔥 (#x1f525) FIRE

Character code properties: customize what to show
  name: HEAVY BLACK HEART
  general-category: So (Symbol, Other)
  decomposition: (10084) ('❤')
...
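If you need a similar breakdown where Emacs isn't available, a rough equivalent can be sketched in Python. The `describe` function below is a hypothetical helper, not an Emacs or Python built-in. It lists the code point, name, and general category of each element in a sequence.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (hex code point, Unicode name, general category) per code point."""
    rows = []
    for ch in text:
        rows.append(
            (
                f"#x{ord(ch):x}",
                unicodedata.name(ch, "<unnamed>"),
                unicodedata.category(ch),
            )
        )
    return rows


# HEAVY BLACK HEART + VARIATION SELECTOR-16 + ZERO WIDTH JOINER + FIRE
flaming_heart = "\u2764\ufe0f\u200d\U0001F525"
rows = describe(flaming_heart)

for row in rows:
    print(*row)
```

It is far less detailed than `C-u C-x =` (no font or glyph information), but it is enough to spot a stray selector or joiner in pasted text.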

Fonts Matter (A Lot)

One important caveat: fonts decide how far you can go.

Some fonts handle combining characters and emoji sequences beautifully. Others collapse under pressure.

If you see:

→ Misaligned accents

→ Overlapping glyphs

→ Missing emoji components

It’s almost always a font issue, not Emacs and not Unicode.

I'd recommend JetBrainsMono Nerd Font or Maple Mono NF. You can find a lot more fonts on the Nerd Fonts site, or use my custom installer script (post, GitHub).

Final Thoughts

Once you internalize C-x 8, Unicode and emojis stop being “special”. If you explored it with which-key as I recommended at the beginning of this post, you probably saw a lot more than C-x 8 RET. Emojis even get the special C-x 8 e e sub-menu for convenience.

And I hope that once you use C-u C-x =, special characters, blocks, and non-visible characters (those your font choice cannot print) stop being mysterious.

Unicode is a vast and deeply technical subject, but the goal here wasn't theory, it was to show how Emacs turns it into something you can actually use, inspect, and enjoy.

Thanks!

This is likely my final post for 2025, so it feels like a good moment to say thank you.

Thank you to everyone who read these posts, sent feedback, pointed out mistakes, shared them around, or just quietly followed along. This year was especially fun to write: lots of small discoveries, deep dives into Emacs internals, and many "oh wow, it already does that?" moments.

Happy hacking!

Edit

2025-12-28: Added Eli Zaretskii's remarks, shipmints' code suggestion, a typo fix, and font recommendations.
