The Science Behind Emoji-Scaffolded Reading
How visual symbols help children learn to read — and help everyone learn a new language.
The Core Idea
Phonetic places emoji above words in sentences as inline reading scaffolds: visual hints that give the reader instant comprehension of an unfamiliar word without breaking the flow of reading. As the reader masters each word, its scaffold fades away.
This technique isn’t a guess. It draws on decades of cognitive science, developmental psychology, and emerging research in emoji linguistics. Here’s what the evidence says.
1. Children Understand Emoji Before They Can Read
Children are emoji-literate years before they are word-literate.
Gretchen McCulloch (author of Because Internet and Wired’s resident linguist) studied pre-literate children’s emoji use and found that children ages 2–5 routinely send emoji-only text messages to family members. What looks like a random string of pictures is actually digital babbling: children exploring symbolic communication the same way they babble nonsense syllables before learning to speak.
McCulloch’s conclusion: emoji may serve as a useful precursor to reading — a way of acclimating kids to the reality of using symbols to communicate meaning before they can decode written words.
A 2021 eye-tracking study by Liu & Li confirmed this developmental timeline, demonstrating that 30-month-old toddlers can correctly identify emoji representing basic emotions and match them to spoken emotion words. That’s two-and-a-half-year-olds — many of whom can barely speak — already reading symbols.
Key citations
- McCulloch, G. (2019). “Children Are Using Emoji for Digital-Age Language Learning.” Wired.
- Liu, S. & Li, N. (2021). “Going virtual in the early years: 30-month-old toddlers recognize commonly used emojis.” Infant Behavior and Development, 63, 101529.
2. The Baby Sign Language Parallel
This pattern — children comprehending symbols before speech — has a well-studied precedent: baby sign language.
Research has shown that infants can learn to communicate through manual gestures months before their vocal apparatus can produce speech. The reason is simple: babies develop gross motor control (hand gestures) before fine motor control (tongue and lip movements required for speech). Signs give them a symbolic system they can use now, which then scaffolds their transition to spoken language.
The research on baby sign shows that signing babies tend to have larger vocabularies and earlier speech development compared to non-signing peers. The act of mapping meaning to a visual symbol — whether a hand gesture or an emoji — appears to strengthen the cognitive pathways that later support word learning.
Emoji scaffolding works on the same principle. A child who can’t yet read the word “cat” can instantly comprehend 🐱. When 🐱 appears above the written word “cat,” the child has a bridge: a symbol they already understand sitting directly above a symbol they’re learning to decode. Over time, the written word absorbs the meaning, and the emoji becomes unnecessary.
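The fading behavior described above can be sketched in a few lines. This is a minimal illustration, not Phonetic’s actual mechanism: the emoji dictionary and the three-correct-readings mastery threshold are hypothetical values chosen for the example.

```python
# Sketch of a fading emoji scaffold. EMOJI and MASTERY_THRESHOLD are
# illustrative stand-ins, not the app's real lexicon or tuning.

MASTERY_THRESHOLD = 3  # hypothetical: hide the hint after 3 correct readings
EMOJI = {"cat": "🐱", "sun": "☀️", "dog": "🐶"}


class Scaffold:
    def __init__(self):
        self.correct_reads = {}  # word -> times the child read it correctly

    def record_correct_read(self, word):
        self.correct_reads[word] = self.correct_reads.get(word, 0) + 1

    def hint(self, word):
        """Return the emoji hint, or None once the word is mastered."""
        if self.correct_reads.get(word, 0) >= MASTERY_THRESHOLD:
            return None  # the scaffold fades away
        return EMOJI.get(word)


s = Scaffold()
print(s.hint("cat"))  # 🐱 — unfamiliar word still gets a hint
for _ in range(3):
    s.record_correct_read("cat")
print(s.hint("cat"))  # None — mastered, scaffold removed
```

The key design point is that fading is per-word: “cat” can lose its scaffold while “sun” keeps one.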
Key citations
- Acredolo, L. & Goodwyn, S. (2000). “The Longterm Impact of Symbolic Gesturing During Infancy on IQ at Age 8.” International Conference on Infant Studies, Brighton, UK.
- Hall, L. (2025). “Signs of Success: How Baby Sign Language Boosts Early Literacy Skills.” Indiana Institute on Disability & Community, Early Childhood Center.
- Daniels, M. (2001). Dancing with Words: Signing for Hearing Children’s Literacy. Bergin & Garvey.
3. Dual Coding Theory: Two Memory Traces Are Better Than One
The theoretical foundation for why visual-plus-verbal works is Dual Coding Theory (DCT), proposed by Allan Paivio in 1971 and extensively validated over the following five decades.
DCT holds that the brain processes information through two distinct but interconnected systems: a verbal system (words, speech, text) and a nonverbal system (images, spatial information, sensory experience). When information is encoded through both systems simultaneously, it creates two independent memory traces that reinforce each other. Recalling either the image or the word can trigger recall of the other.
This has direct implications for vocabulary learning. When a child sees the word “sun” paired with ☀️, the verbal system encodes the word while the nonverbal system encodes the image. The cross-referencing between systems — what Paivio called referential processing — creates a stronger, more retrievable memory than either channel alone.
Clark & Paivio’s 1991 review established DCT as a general framework for educational psychology, demonstrating that concreteness and imagery play major roles in knowledge representation, comprehension, learning, and memory of school material across educational domains.
Richard Mayer’s subsequent Cognitive Theory of Multimedia Learning built on DCT to show that people learn more effectively from words and relevant pictures together than from words alone — provided the two forms of information are spatially and temporally integrated. Emoji placed directly above the words they represent is a near-ideal implementation of this principle: the visual and verbal cues are co-located, simultaneous, and semantically linked.
Key citations
- Paivio, A. (1971). Imagery and Verbal Processes. New York: Holt, Rinehart & Winston.
- Clark, J.M. & Paivio, A. (1991). “Dual coding theory and education.” Educational Psychology Review, 3(3), 149–170.
- Mayer, R.E. (2009). Multimedia Learning (2nd ed.). Cambridge University Press.
- Sadoski, M. (2005). “A dual coding view of vocabulary learning.” Reading & Writing Quarterly, 21(3), 221–238.
4. The Picture Superiority Effect
One of the most replicated findings in memory research is the picture superiority effect: pictures are remembered better than words. In experiments, participants shown rapid sequences of pictures and of words consistently recall the pictures at higher rates.
This effect has been observed across the lifespan, from young children to older adults with mild cognitive impairment. It extends to associative recognition (remembering which items were paired together), spatial memory, and even foreign language vocabulary acquisition.
Critically for reading instruction, the effect is strongest when pictures and text are presented together rather than separately. A child who sees an image of a dog paired with the word “dog” retains the word better than a child who sees either the image or the word alone. Emoji, as standardized, universally recognized pictographs, function as compact, instantly comprehensible images that can be placed inline with text at scale — something photographs and illustrations cannot easily do.
Key citations
- Paivio, A. & Csapo, K. (1973). “Picture superiority in free recall: Imagery or dual coding?” Cognitive Psychology, 5(2), 176–206.
- Whitehouse, A.J., Maybery, M.T., & Durkin, K. (2006). “The development of the picture-superiority effect.” British Journal of Developmental Psychology, 24(4), 767–773.
- Carpenter, S.K. & Olson, K.M. (2012). “Are pictures good for learning new vocabulary in a foreign language?” Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(1), 92–111.
5. Emoji Annotations Measurably Improve Comprehension
The most direct evidence for emoji-scaffolded reading comes from the Emojinize research (2024), which studied precisely this technique: enriching arbitrary text with inline emoji annotations.
Using a cloze test methodology — where words are hidden from text and must be guessed by the reader — the researchers compared comprehension with and without emoji annotations. The results were striking: emoji annotations produced a 55% increase in correct guesses compared to unannotated text. Most importantly, participants received no training in reading emoji annotations. They could immediately leverage the visual information to improve their text comprehension.
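The cloze methodology itself is simple enough to illustrate in code. The sentence, target words, and scoring below are a toy reconstruction of the general method, not the Emojinize study’s actual materials or data.

```python
# Toy illustration of a cloze test: blank out target words, then score
# a reader's guesses. All inputs here are made up for the example.

def make_cloze(sentence, targets):
    """Blank out each target word, keeping punctuation and position."""
    blanked = []
    for word in sentence.split():
        core = word.strip(".,!?")
        blanked.append(word.replace(core, "____") if core in targets else word)
    return " ".join(blanked)


def score(guesses, answers):
    """Fraction of blanks the reader guessed correctly."""
    correct = sum(g == a for g, a in zip(guesses, answers))
    return correct / len(answers)


print(make_cloze("The cat sat in the warm sun.", {"cat", "sun"}))
# The ____ sat in the warm ____.
print(score(["cat", "sun"], ["cat", "sun"]))  # 1.0
print(score(["dog", "sun"], ["cat", "sun"]))  # 0.5
```

In the study’s design, the comparison is between guess accuracy on blanks with and without an emoji annotation above each hidden word.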
The researchers noted that emoji, unlike custom illustrations, require no special creation process. The Unicode standard contains over 3,600 emoji, and their visual meanings are intuitively understood across cultures. They explicitly identified children’s books and language learning as primary applications, writing that emoji can “act as helpful annotations, facilitating comprehension and even propelling early reading capabilities.”
Key citation
- Beutter et al. (2024). “Emojinize: Enriching Any Text with Emoji Translations.” arXiv:2403.03857v2.
6. Emoji Are Processed Like Words — But Faster to Comprehend
Research from Ruhr-Universität Bochum (2021) examined how the brain processes emoji when they appear within sentences. Using self-paced reading experiments, the researchers found that sentences containing emoji in place of words were comprehended equally well as all-word sentences.
Furthermore, emoji were found to activate complete lexical entries — including the phonological (sound-based) representation of the associated word. When readers see 🐱, their brain doesn’t just process “animal” — it activates the word “cat” including its pronunciation. This means emoji don’t merely convey meaning; they prime the full word, potentially accelerating the transition from visual comprehension to word recognition.
A separate study by Cohn and colleagues (2018), presented at the Annual Conference of the Cognitive Science Society, confirmed that multimodal sentences containing emoji are as comprehensible as all-word sentences, with participants sometimes rating emoji-containing sentences as more enjoyable.
Key citations
- Scheffler, T. et al. (2021). “The processing of emoji-word substitutions: A self-paced reading study.” Computers in Human Behavior, 124, 106898.
- Cohn, N. et al. (2018). “Are emoji a poor substitute for words? Sentence processing with emoji substitutions.” Proceedings of the 40th Annual Conference of the Cognitive Science Society.
7. Historical Precedent: Furigana and Ruby Text
Emoji scaffolding is not a new concept in form — only in medium. The technique of placing a comprehension aid above unfamiliar text has been used for centuries in East Asian writing systems.
Furigana (振り仮名) are small phonetic characters placed above kanji (logographic characters) in Japanese text to indicate pronunciation. They are standard in children’s books, newspapers, manga, and instructional materials. In Chinese, an equivalent system called Zhuyin (in Taiwan) or Pinyin (in mainland China) serves the same purpose: a simpler, known symbol system placed above a more complex, unfamiliar one to scaffold comprehension.
The W3C has even standardized this pattern in HTML as “ruby text” — small annotations placed above base text — recognizing its universal utility as a reading aid.
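The same pattern can be expressed directly in markup. Here is a sketch in Python that emits HTML `<ruby>`/`<rt>` elements placing an emoji hint above each known word — structurally identical to furigana above kanji. The emoji dictionary is an illustrative stand-in for a real lexicon.

```python
# Sketch: render emoji scaffolding as HTML ruby text. Browsers display
# the <rt> annotation (the emoji) above the <ruby> base word.
# The EMOJI map is illustrative, not a real lexicon.

EMOJI = {"cat": "🐱", "sun": "☀️"}


def annotate(sentence):
    """Wrap each known word in <ruby>, with its emoji as the <rt> text."""
    out = []
    for word in sentence.split():
        core = word.strip(".,!?")
        if core in EMOJI:
            ruby = f"<ruby>{core}<rt>{EMOJI[core]}</rt></ruby>"
            out.append(word.replace(core, ruby))
        else:
            out.append(word)
    return " ".join(out)


print(annotate("The cat sat in the sun."))
# The <ruby>cat<rt>🐱</rt></ruby> sat in the <ruby>sun<rt>☀️</rt></ruby>.
```

Because ruby is a standard HTML feature, this rendering works in ordinary web views with no custom layout code.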
Phonetic applies this centuries-old scaffolding pattern to a new symbol system: emoji. Instead of phonetic characters above logographic characters, it places universally understood pictographs above alphabetic words. The structural principle is identical: use what the reader already knows to help them decode what they don’t.
8. Why This Matters for Accessibility
Emoji scaffolding is inherently inclusive. Because emoji are:
- Visual and high-contrast — they function as large, colorful touch targets for children with motor disabilities
- Language-independent — 🐱 means “cat” whether you speak English, Spanish, or Japanese
- Cognitively lightweight — they reduce the decoding burden for readers with dyslexia or other reading difficulties
- Universally familiar — children encounter emoji years before formal reading instruction begins
Preliminary research on emoji and dyslexia supports this potential. A study in Frontiers in Psychology (Szpadel & Gawda, 2021) examined emoticon and emoji comprehension in dyslexic youth and found that nonstandard emoji — particularly object-based emoji that reinforce verbal content — showed promise as comprehension aids for students with reading impairments.
Key citation
- Szpadel, K. & Gawda, A. (2021). “The Role of Emoticons in the Comprehension of Emotional and Non-emotional Messages in Dyslexic Youth.” Frontiers in Psychology, 12, 695921.
9. Learning to Read and Learning a New Language Are the Same Process
A child learning to read English and an adult learning to read Spanish face the same fundamental cognitive challenge: decoding unfamiliar symbols into meaning. The child sees “cat” and doesn’t know what those marks mean. The adult sees “gato” and doesn’t know what those marks mean. Both need a bridge from the unknown to the known.
For the child, 🐱 above “cat” provides that bridge — the emoji IS the meaning. For the adult, 🐱 (and optionally “cat”) above “gato” provides that bridge — the emoji and native word together make the foreign word instantly comprehensible.
This is why the same app, the same mechanic, and the same emoji scaffold can serve both use cases. Dual coding theory predicts that pairing the visual (emoji) with the verbal (target word) will strengthen memory for the new word regardless of whether the learner is 3 years old or 30.
10. Closing the Digital Babbling Loop
Gretchen McCulloch’s research documented a developmental dead end: children ages 2–5 send emoji-only text messages to family members, but parents reply with words the child can’t read. The child can “write” (emoji) but can’t “read” (words). The conversation is one-directional — digital babbling into a void.
A keyboard extension built on emoji scaffolding closes that loop in both directions.
Outbound: A child taps emoji on a custom keyboard. The extension generates the corresponding word underneath each emoji. The message sends as real, readable text. The child has just written a sentence — before they can spell. And they see their emoji become words in real time, reinforcing the emoji-to-word mapping with every message sent.
Inbound: A text from Mom arrives. The extension annotates each word with an emoji above it — the same scaffold used in the app. The child who couldn’t read a text message five minutes ago is now reading one.
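Both directions can be sketched from a single shared mapping. The dictionary and function names below are hypothetical illustrations, not the extension’s actual API; a real keyboard would need a far larger lexicon and per-language mappings.

```python
# Sketch of the two directions a scaffolding keyboard could support.
# WORD_FOR and the function names are hypothetical, for illustration only.

WORD_FOR = {"🐱": "cat", "❤️": "love", "🍕": "pizza"}
EMOJI_FOR = {w: e for e, w in WORD_FOR.items()}  # inverse map: word -> emoji


def outbound(emoji_taps):
    """Child taps emoji; the message sends as real, readable text."""
    return " ".join(WORD_FOR.get(e, e) for e in emoji_taps)


def inbound(text):
    """Annotate an incoming message: pair each word with an emoji hint
    (None when no hint is known), for rendering above the word."""
    return [(EMOJI_FOR.get(w.strip(".,!?")), w) for w in text.split()]


print(outbound(["🐱", "🍕"]))   # cat pizza
print(inbound("I love pizza"))  # [(None, 'I'), ('❤️', 'love'), ('🍕', 'pizza')]
```

One mapping drives both lessons: the outbound direction turns taps into words, and the inbound direction turns words back into the scaffold.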
Every sent message becomes a writing lesson. Every received message becomes a reading lesson. The scaffold is invisible to the parent on the other end — they just see a normal text. But for the child, texting has become literacy practice embedded in the thing they already want to do: talk to their family.
This is the natural endpoint of the research. If emoji is a precursor to reading, and if emoji scaffolding bridges the gap between symbol comprehension and word decoding, then the place that bridge matters most is in the communication channel children are already using.
Summary of Evidence
| Finding | Source | Year |
|---|---|---|
| Children ages 2–5 use emoji before reading as “digital babbling” | McCulloch, Wired | 2019 |
| 30-month-old toddlers recognize and match emoji to emotions | Liu & Li, Infant Behavior and Development | 2021 |
| Baby sign language enhances vocabulary and early literacy | Acredolo & Goodwyn; Daniels | 1988–2001 |
| Dual coding (visual + verbal) strengthens memory and learning | Paivio; Clark & Paivio | 1971–1991 |
| Pictures are remembered better than words (picture superiority effect) | Paivio & Csapo; Whitehouse et al. | 1973–2006 |
| Emoji annotations improve text comprehension by 55% | Beutter et al., Emojinize | 2024 |
| Emoji within sentences activate full word entries including pronunciation | Scheffler et al. | 2021 |
| Multimodal sentences with emoji are equally comprehensible | Cohn et al. | 2018 |
| Object-based emoji may aid reading comprehension in dyslexia | Szpadel & Gawda | 2021 |
What Makes Phonetic Different
No app has combined these findings into a single product. Existing reading apps teach phonics through drills and games. Existing language apps use flashcards, quizzes, and spaced repetition. Existing emoji apps use emoji as replacements for words or as standalone vocabulary tools.
Phonetic is the first app to use emoji as inline reading scaffolds — placed directly above words within complete sentences — for both native-language reading instruction and foreign-language acquisition. It takes a technique that has been validated by cognitive science, proven effective by empirical research, and used for centuries in East Asian typography, and makes it available to every reader on their phone.
And with a keyboard extension, it goes further: turning every text message a child sends into a writing lesson, and every text message they receive into a reading lesson. Reading training wheels — not just in an app, but everywhere words appear.
The science says it works. Now there’s an app for it.
Last updated: February 2026
For questions about the research cited here, contact hello@f.app