How Users with Non-Native Accents Interact with Smart Speakers


With the skyrocketing popularity of voice user interfaces (VUIs), such as Amazon’s Alexa, Google Home/Assistant, and Apple’s Siri, there is a growing need to understand more deeply how users perceive their control over these interfaces, as this perception influences the role voice assistants will play in users’ households. How non-native English speakers perceive their control of VUIs has not been previously investigated, particularly in instances when the VUI does not understand the user’s command. Furthermore, as a still-developing technology, smart speakers have a range of usability issues that should be addressed, including first-time setup and use. This is especially critical because users interact with these interfaces primarily through their voice. The aim of this research is to uncover and analyze whether and how having a non-native English accent affects the user’s perception of an Amazon Echo device.



Previous research on user accents and VUIs has examined how users perceive the knowledgeability of a system when the VUI’s accent differs from their own (Dahlbäck et al., 2007). The present study aims to understand how having a non-native English accent affects one’s perception of the VUI, including perceived control of the interface, since accented speakers perceive conversations with native speakers differently and are also perceived differently in them (Beinhoff, 2013). Gaining an understanding of non-native English-speaking users’ experiences and perceptions of VUIs may have implications for technology adoption, the role of VUIs in the home, and the future design of VUIs.


Dahlbäck, N., Wang, Q., Nass, C., & Alwin, J. (2007, April). Similarity is more important than expertise: Accent effects in speech interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 1553-1556). ACM.

Beinhoff, B. (2013). Perceiving identity through accent: Attitudes towards non-native speakers and their accents in English (Vol. 35). Peter Lang.



Two groups of 15 participants (30 total), native English speakers and non-native English speakers, were recruited for the study via flyers and email at a large Midwestern research university. Participants were screened as native or non-native English speakers through an online survey that asked for their domestic or international student status: those who self-identified as international students were considered non-native English speakers, while those who identified as domestic students were assumed to be native English speakers. Participants were then invited to a 60-minute study session consisting of three parts: initial setup of an Amazon Echo Dot device; a usability study of 10 tasks using different functionalities; and a System Usability Scale (SUS) survey plus a semi-structured interview. Study sessions were recorded and transcribed for analysis. The total time to complete was recorded for both the initial setup and the usability study. Additionally, the SUS scores of the two participant groups were compared, and themes were identified from the semi-structured interviews.
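Before SUS scores can be compared between groups, each participant’s ten 1 to 5 Likert responses must be converted into a single 0 to 100 score. The sketch below assumes the standard SUS scoring procedure (odd-numbered, positively worded items contribute the response minus 1; even-numbered, negatively worded items contribute 5 minus the response; the sum is multiplied by 2.5); the study text does not state its scoring details. The function names and the response lists in the usage note are hypothetical and are not data from this study.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Convert ten 1-5 Likert responses into a 0-100 SUS score.

    Assumes the standard SUS scoring rule; `responses[0]` is item 1.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be on the 1-5 scale")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: higher is better.
            total += response - 1
        else:
            # Even-numbered items are negatively worded: lower is better.
            total += 5 - response

    # The raw sum ranges from 0 to 40; scaling by 2.5 maps it onto 0-100.
    return total * 2.5


def group_mean_sus(group_responses: list[list[int]]) -> float:
    """Mean SUS score for one participant group (hypothetical helper)."""
    return mean(sus_score(r) for r in group_responses)
```

For example, with purely illustrative inputs, a participant who answers 3 (neutral) on every item scores `sus_score([3] * 10) == 50.0`, and one who strongly agrees with every positive item and strongly disagrees with every negative item scores `sus_score([5, 1] * 5) == 100.0`. Computing `group_mean_sus` separately for the native and non-native groups yields the per-group values that a between-group comparison would then operate on.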