Interactions between voice-activated AI assistants and human speakers and their implications for second-language acquisition

Jae Yung Song , Anne Pycha & Tessa Culleton
Frontiers in Communication, 2022-10-21


Voice-activated artificially intelligent (voice-AI) assistants, such as Alexa, are remarkably effective at processing spoken commands by native speakers. What happens when the command is produced by an L2 speaker? In the current study, we focused on Korean-speaking L2 learners of English, and we asked (a) whether Alexa could recognize intended productions of two vowel contrasts, /i/ vs. /ɪ/ and /æ/ vs. /ε/, that occur in English but not in Korean, and (b) whether L2 talkers would make clear-speech adjustments when Alexa misrecognized their intended productions. L2 talkers (n = 10) and native English (n = 10) controls asked Alexa to spell out words. Targets were words that formed minimal vowel pairs, e.g., beat-bit, pet-pat. Results showed that Alexa achieved a 55% accuracy rate with L2 productions, compared to 98% for native productions. When Alexa misrecognized an intended production (e.g., spelling P-E-T when the speaker intended pat), L2 talkers adjusted their subsequent production attempts by altering the duration, F1 and F2 of individual vowels (except for /ε/), as well as increasing vowel duration difference between contrasting vowels. These results have implications for theories of speech adaptation, and specifically for our understanding of L2 speech modifications oriented to voice-AI devices.
Read Full Paper

Cite this

Berges Institute logo

Join thousands of students who are already learning Spanish with us!

Unlimited, grammar-intensive live classes
FREE for the first 15 days
/month after that
Cancel any time with two clicks