The Science Behind Emoji-Scaffolded Reading
How visual symbols help children learn to read — and help everyone learn a new language.
The Core Idea
Phonetic places emoji above words in sentences as inline reading scaffolds: visual hints that give the reader instant comprehension of an unfamiliar word without breaking the flow of reading. As the reader masters each word, its scaffold fades away.
This technique isn’t a guess. It draws on decades of cognitive science, developmental psychology, and emerging research in emoji linguistics. Here’s what the evidence says.
1. Children Understand Emoji Before They Can Read
Children are emoji-literate years before they are word-literate.
Gretchen McCulloch (author of Because Internet and Wired’s resident linguist) studied pre-literate children’s emoji use and found that children ages 2–5 routinely send emoji-only text messages to family members. What looks like a random string of pictures is actually digital babbling: children exploring symbolic communication the same way they babble nonsense syllables before learning to speak.
McCulloch’s conclusion: emoji may serve as a useful precursor to reading — a way of acclimating kids to the reality of using symbols to communicate meaning before they can decode written words.
A 2021 eye-tracking study by Liu & Li confirmed this developmental timeline, demonstrating that 30-month-old toddlers can correctly identify emoji representing basic emotions and match them to spoken emotion words. That’s two-and-a-half-year-olds — many of whom can barely speak — already reading symbols.
Key citations
- McCulloch, G. (2019). “Children Are Using Emoji for Digital-Age Language Learning.” Wired.
- Liu, S. & Li, N. (2021). “Going virtual in the early years: 30-month-old toddlers recognize commonly used emojis.” Infant Behavior and Development, 63, 101529.
2. The Baby Sign Language Parallel
This pattern — children comprehending symbols before speech — has a well-studied precedent: baby sign language.
Research has shown that infants can learn to communicate through manual gestures months before their vocal apparatus can produce speech. The reason is simple: babies develop gross motor control (hand gestures) before fine motor control (tongue and lip movements required for speech). Signs give them a symbolic system they can use now, which then scaffolds their transition to spoken language.
The research on baby sign shows that signing babies tend to have larger vocabularies and earlier speech development compared to non-signing peers. The act of mapping meaning to a visual symbol — whether a hand gesture or an emoji — appears to strengthen the cognitive pathways that later support word learning.
Emoji scaffolding works on the same principle. A child who can’t yet read the word “cat” can instantly comprehend 🐱. When 🐱 appears above the written word “cat,” the child has a bridge: a symbol they already understand sitting directly above a symbol they’re learning to decode. Over time, the written word absorbs the meaning, and the emoji becomes unnecessary.
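The fading behavior described above can be sketched in a few lines. This is a minimal illustration, not Phonetic’s actual mechanism: the emoji dictionary and the three-correct-readings mastery threshold are hypothetical values chosen for the example.

```python
# Sketch of a fading emoji scaffold. EMOJI and MASTERY_THRESHOLD are
# illustrative stand-ins, not the app's real lexicon or tuning.

MASTERY_THRESHOLD = 3  # hypothetical: hide the hint after 3 correct readings
EMOJI = {"cat": "🐱", "sun": "☀️", "dog": "🐶"}


class Scaffold:
    def __init__(self):
        self.correct_reads = {}  # word -> times the child read it correctly

    def record_correct_read(self, word):
        self.correct_reads[word] = self.correct_reads.get(word, 0) + 1

    def hint(self, word):
        """Return the emoji hint, or None once the word is mastered."""
        if self.correct_reads.get(word, 0) >= MASTERY_THRESHOLD:
            return None  # the scaffold fades away
        return EMOJI.get(word)


s = Scaffold()
print(s.hint("cat"))  # 🐱 — unfamiliar word still gets a hint
for _ in range(3):
    s.record_correct_read("cat")
print(s.hint("cat"))  # None — mastered, scaffold removed
```

The key design point is that fading is per-word: “cat” can lose its scaffold while “sun” keeps one.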
Key citations
- Acredolo, L. & Goodwyn, S. (2000). “The Longterm Impact of Symbolic Gesturing During Infancy on IQ at Age 8.” International Conference on Infant Studies, Brighton, UK.
- Hall, L. (2025). “Signs of Success: How Baby Sign Language Boosts Early Literacy Skills.” Indiana Institute on Disability & Community, Early Childhood Center.
- Daniels, M. (2001). Dancing with Words: Signing for Hearing Children’s Literacy. Bergin & Garvey.
3. Dual Coding Theory: Two Memory Traces Are Better Than One
The theoretical foundation for why visual-plus-verbal works is Dual Coding Theory (DCT), proposed by Allan Paivio in 1971 and extensively validated over the following five decades.
DCT holds that the brain processes information through two distinct but interconnected systems: a verbal system (words, speech, text) and a nonverbal system (images, spatial information, sensory experience). When information is encoded through both systems simultaneously, it creates two independent memory traces that reinforce each other. Recalling either the image or the word can trigger recall of the other.
This has direct implications for vocabulary learning. When a child sees the word “sun” paired with ☀️, the verbal system encodes the word while the nonverbal system encodes the image. The cross-referencing between systems — what Paivio called referential processing — creates a stronger, more retrievable memory than either channel alone.
Clark & Paivio’s 1991 review established DCT as a general framework for educational psychology, demonstrating that concreteness and imagery play major roles in knowledge representation, comprehension, learning, and memory of school material across educational domains.
Richard Mayer’s subsequent Cognitive Theory of Multimedia Learning built on DCT to show that people learn more effectively from words and relevant pictures together than from words alone — provided the two forms of information are spatially and temporally integrated. Emoji placed directly above the words they represent is a near-ideal implementation of this principle: the visual and verbal cues are co-located, simultaneous, and semantically linked.
Key citations
- Paivio, A. (1971). Imagery and Verbal Processes. New York: Holt, Rinehart & Winston.
- Clark, J.M. & Paivio, A. (1991). “Dual coding theory and education.” Educational Psychology Review, 3(3), 149–170.
- Mayer, R.E. (2009). Multimedia Learning (2nd ed.). Cambridge University Press.
- Sadoski, M. (2005). “A dual coding view of vocabulary learning.” Reading & Writing Quarterly, 21(3), 221–238.
4. The Picture Superiority Effect
One of the most replicated findings in memory research is the picture superiority effect: pictures are remembered better than words. In experiments, participants shown rapid sequences of pictures and of words consistently recall the pictures at higher rates.
This effect has been observed across the lifespan, from young children to older adults with mild cognitive impairment. It extends to associative recognition (remembering which items were paired together), spatial memory, and even foreign language vocabulary acquisition.
Critically for reading instruction, the effect is strongest when pictures and text are presented together rather than separately. A child who sees an image of a dog paired with the word “dog” retains the word better than a child who sees either the image or the word alone. Emoji, as standardized, universally recognized pictographs, function as compact, instantly comprehensible images that can be placed inline with text at scale — something photographs and illustrations cannot easily do.
Key citations
- Paivio, A. & Csapo, K. (1973). “Picture superiority in free recall: Imagery or dual coding?” Cognitive Psychology, 5(2), 176–206.
- Whitehouse, A.J., Maybery, M.T., & Durkin, K. (2006). “The development of the picture-superiority effect.” British Journal of Developmental Psychology, 24(4), 767–773.
- Carpenter, S.K. & Olson, K.M. (2012). “Are pictures good for learning new vocabulary in a foreign language?” Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(1), 92–111.
5. Emoji Annotations Measurably Improve Comprehension
The most direct evidence for emoji-scaffolded reading comes from the Emojinize research (2024), which studied precisely this technique: enriching arbitrary text with inline emoji annotations.
Using a cloze test methodology — where words are hidden from text and must be guessed by the reader — the researchers compared comprehension with and without emoji annotations. The results were striking: emoji annotations produced a 55% increase in correct guesses compared to unannotated text. Most importantly, participants received no training in reading emoji annotations. They could immediately leverage the visual information to improve their text comprehension.
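The cloze methodology itself is simple enough to illustrate in code. The sentence, target words, and scoring below are a toy reconstruction of the general method, not the Emojinize study’s actual materials or data.

```python
# Toy illustration of a cloze test: blank out target words, then score
# a reader's guesses. All inputs here are made up for the example.

def make_cloze(sentence, targets):
    """Blank out each target word, keeping punctuation and position."""
    blanked = []
    for word in sentence.split():
        core = word.strip(".,!?")
        blanked.append(word.replace(core, "____") if core in targets else word)
    return " ".join(blanked)


def score(guesses, answers):
    """Fraction of blanks the reader guessed correctly."""
    correct = sum(g == a for g, a in zip(guesses, answers))
    return correct / len(answers)


print(make_cloze("The cat sat in the warm sun.", {"cat", "sun"}))
# The ____ sat in the warm ____.
print(score(["cat", "sun"], ["cat", "sun"]))  # 1.0
print(score(["dog", "sun"], ["cat", "sun"]))  # 0.5
```

In the study’s design, the comparison is between guess accuracy on blanks with and without an emoji annotation above each hidden word.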
The researchers noted that emoji, unlike custom illustrations, require no special creation process. The Unicode standard contains over 3,600 emoji, and their visual meanings are intuitively understood across cultures. They explicitly identified children’s books and language learning as primary applications, writing that emoji can “act as helpful annotations, facilitating comprehension and even propelling early reading capabilities.”
Key citation
- Beutter et al. (2024). “Emojinize: Enriching Any Text with Emoji Translations.” arXiv:2403.03857v2.
6. Emoji Are Processed Like Words — But Faster to Comprehend
Research from Ruhr-Universität Bochum (2021) examined how the brain processes emoji when they appear within sentences. Using self-paced reading experiments, the researchers found that sentences containing emoji in place of words were comprehended equally well as all-word sentences.
Furthermore, emoji were found to activate complete lexical entries — including the phonological (sound-based) representation of the associated word. When readers see 🐱, their brain doesn’t just process “animal” — it activates the word “cat” including its pronunciation. This means emoji don’t merely convey meaning; they prime the full word, potentially accelerating the transition from visual comprehension to word recognition.
A separate study by Cohn and colleagues (2018), presented at the Annual Conference of the Cognitive Science Society, confirmed that multimodal sentences containing emoji are as comprehensible as all-word sentences, with participants sometimes rating emoji-containing sentences as more enjoyable.
Key citations
- Scheffler, T. et al. (2021). “The processing of emoji-word substitutions: A self-paced reading study.” Computers in Human Behavior, 124, 106898.
- Cohn, N. et al. (2018). “Are emoji a poor substitute for words? Sentence processing with emoji substitutions.” Proceedings of the 40th Annual Conference of the Cognitive Science Society.
7. Historical Precedent: Furigana and Ruby Text
Emoji scaffolding is not a new concept in form — only in medium. The technique of placing a comprehension aid above unfamiliar text has been used for centuries in East Asian writing systems.
Furigana (振り仮名) are small phonetic characters placed above kanji (logographic characters) in Japanese text to indicate pronunciation. They are standard in children’s books, newspapers, manga, and instructional materials. In Chinese, an equivalent system called Zhuyin (in Taiwan) or Pinyin (in mainland China) serves the same purpose: a simpler, known symbol system placed above a more complex, unfamiliar one to scaffold comprehension.
The W3C has even standardized this pattern in HTML as “ruby text” — small annotations placed above base text — recognizing its universal utility as a reading aid.
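The same pattern can be expressed directly in markup. Here is a sketch in Python that emits HTML `<ruby>`/`<rt>` elements placing an emoji hint above each known word — structurally identical to furigana above kanji. The emoji dictionary is an illustrative stand-in for a real lexicon.

```python
# Sketch: render emoji scaffolding as HTML ruby text. Browsers display
# the <rt> annotation (the emoji) above the <ruby> base word.
# The EMOJI map is illustrative, not a real lexicon.

EMOJI = {"cat": "🐱", "sun": "☀️"}


def annotate(sentence):
    """Wrap each known word in <ruby>, with its emoji as the <rt> text."""
    out = []
    for word in sentence.split():
        core = word.strip(".,!?")
        if core in EMOJI:
            ruby = f"<ruby>{core}<rt>{EMOJI[core]}</rt></ruby>"
            out.append(word.replace(core, ruby))
        else:
            out.append(word)
    return " ".join(out)


print(annotate("The cat sat in the sun."))
# The <ruby>cat<rt>🐱</rt></ruby> sat in the <ruby>sun<rt>☀️</rt></ruby>.
```

Because ruby is a standard HTML feature, this rendering works in ordinary web views with no custom layout code.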
Phonetic applies this centuries-old scaffolding pattern to a new symbol system: emoji. Instead of phonetic characters above logographic characters, it places universally understood pictographs above alphabetic words. The structural principle is identical: use what the reader already knows to help them decode what they don’t.
8. Why This Matters for Accessibility
Emoji scaffolding is inherently inclusive. Because emoji are:
- Visual and high-contrast — they function as large, colorful touch targets for children with motor disabilities
- Language-independent — 🐱 means “cat” whether you speak English, Spanish, or Japanese
- Cognitively lightweight — they reduce the decoding burden for readers with dyslexia or other reading difficulties
- Universally familiar — children encounter emoji years before formal reading instruction begins
Preliminary research on emoji and dyslexia supports this potential. A study in Frontiers in Psychology (Szpadel & Gawda, 2021) examined emoticon and emoji comprehension in dyslexic youth and found that nonstandard emoji — particularly object-based emoji that reinforce verbal content — showed promise as comprehension aids for students with reading impairments.
Key citation
- Szpadel, K. & Gawda, A. (2021). “The Role of Emoticons in the Comprehension of Emotional and Non-emotional Messages in Dyslexic Youth.” Frontiers in Psychology, 12, 695921.
9. Learning to Read and Learning a New Language Are the Same Process
A child learning to read English and an adult learning to read Spanish face the same fundamental cognitive challenge: decoding unfamiliar symbols into meaning. The child sees “cat” and doesn’t know what those marks mean. The adult sees “gato” and doesn’t know what those marks mean. Both need a bridge from the unknown to the known.
For the child, 🐱 above “cat” provides that bridge — the emoji IS the meaning. For the adult, 🐱 (and optionally “cat”) above “gato” provides that bridge — the emoji and native word together make the foreign word instantly comprehensible.
This is why the same app, the same mechanic, and the same emoji scaffold can serve both use cases. Dual coding theory predicts that pairing the visual (emoji) with the verbal (target word) will strengthen memory for the new word regardless of whether the learner is 3 years old or 30.
10. Closing the Digital Babbling Loop
Gretchen McCulloch’s research documented a developmental dead end: children ages 2–5 send emoji-only text messages to family members, but parents reply with words the child can’t read. The child can “write” (emoji) but can’t “read” (words). The conversation is one-directional — digital babbling into a void.
A keyboard extension built on emoji scaffolding closes that loop in both directions.
Outbound: A child taps emoji on a custom keyboard. The extension generates the corresponding word underneath each emoji. The message sends as real, readable text. The child has just written a sentence — before they can spell. And they see their emoji become words in real time, reinforcing the emoji-to-word mapping with every message sent.
Inbound: A text from Mom arrives. The extension annotates each word with an emoji above it — the same scaffold used in the app. The child who couldn’t read a text message five minutes ago is now reading one.
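Both directions can be sketched from a single shared mapping. The dictionary and function names below are hypothetical illustrations, not the extension’s actual API; a real keyboard would need a far larger lexicon and per-language mappings.

```python
# Sketch of the two directions a scaffolding keyboard could support.
# WORD_FOR and the function names are hypothetical, for illustration only.

WORD_FOR = {"🐱": "cat", "❤️": "love", "🍕": "pizza"}
EMOJI_FOR = {w: e for e, w in WORD_FOR.items()}  # inverse map: word -> emoji


def outbound(emoji_taps):
    """Child taps emoji; the message sends as real, readable text."""
    return " ".join(WORD_FOR.get(e, e) for e in emoji_taps)


def inbound(text):
    """Annotate an incoming message: pair each word with an emoji hint
    (None when no hint is known), for rendering above the word."""
    return [(EMOJI_FOR.get(w.strip(".,!?")), w) for w in text.split()]


print(outbound(["🐱", "🍕"]))   # cat pizza
print(inbound("I love pizza"))  # [(None, 'I'), ('❤️', 'love'), ('🍕', 'pizza')]
```

One mapping drives both lessons: the outbound direction turns taps into words, and the inbound direction turns words back into the scaffold.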
Every sent message becomes a writing lesson. Every received message becomes a reading lesson. The scaffold is invisible to the parent on the other end — they just see a normal text. But for the child, texting has become literacy practice embedded in the thing they already want to do: talk to their family.
This is the natural endpoint of the research. If emoji is a precursor to reading, and if emoji scaffolding bridges the gap between symbol comprehension and word decoding, then the place that bridge matters most is in the communication channel children are already using.
Summary of Evidence
| Finding | Source | Year |
|---|---|---|
| Children ages 2–5 use emoji before reading as “digital babbling” | McCulloch, Wired | 2019 |
| 30-month-old toddlers recognize and match emoji to emotions | Liu & Li, Infant Behavior and Development | 2021 |
| Baby sign language enhances vocabulary and early literacy | Acredolo & Goodwyn; Daniels | 1988–2001 |
| Dual coding (visual + verbal) strengthens memory and learning | Paivio; Clark & Paivio | 1971–1991 |
| Pictures are remembered better than words (picture superiority effect) | Paivio & Csapo; Whitehouse et al. | 1973–2006 |
| Emoji annotations improve text comprehension by 55% | Beutter et al., Emojinize | 2024 |
| Emoji within sentences activate full word entries including pronunciation | Scheffler et al. | 2021 |
| Multimodal sentences with emoji are equally comprehensible | Cohn et al. | 2018 |
| Object-based emoji may aid reading comprehension in dyslexia | Szpadel & Gawda | 2021 |
What Makes Phonetic Different
No app has combined these findings into a single product. Existing reading apps teach phonics through drills and games. Existing language apps use flashcards, quizzes, and spaced repetition. Existing emoji apps use emoji as replacements for words or as standalone vocabulary tools.
Phonetic is the first app to use emoji as inline reading scaffolds — placed directly above words within complete sentences — for both native-language reading instruction and foreign-language acquisition. It takes a technique that has been validated by cognitive science, proven effective by empirical research, and used for centuries in East Asian typography, and makes it available to every reader on their phone.
And with a keyboard extension, it goes further: turning every text message a child sends into a writing lesson, and every text message they receive into a reading lesson. Reading training wheels — not just in an app, but everywhere words appear.
The science says it works. Now there’s an app for it.
Last updated: February 2026
For questions about the research cited here, contact hello@f.app