HAI-Lecture13
Conversational and Voice User Interfaces
Introduction
Human-AI Interaction
Lecture 13
§Module progress – interim report
§ Introduction to Conversational UI/UX
§Socio-economic Challenges
This lecture
COMP3074-HAI Lecture 13, Conversational UI
Part 1. Module progress
§Foundations of NLP
§Practical NLP techniques, processes, applications
§Application of these techniques in labs and coursework 1 – due
next week! Straw poll: who is already working on CW1?
§ Learning outcomes: You should now know how to implement a
basic but functional NLP pipeline to drive an interactive NLP-based
system, e.g., a ‘chatbot’
§Theoretical and critical reflection on AI and NLP in the real world
§ Learning outcomes: appreciate the ethical, societal, and social
complexities involved in design and use of NLP applications
Part 1 of the module is complete
Theory, Concepts
§Critical issues in research, design, development and deployment of AI-driven systems, e.g.,
§Ethical issues
§Filter bubbles
§Interpretable AI
Assessed by quizzes (30%)
Practice
§First half of the term
§Natural Language Processing
§Second half of the term
§VUI design
NLP – 1st half of term
§Practical development of NLP
§Python / nltk
§Concludes with CW1
VUI design – 2nd half of term
§Practical design, testing of a VUI prototype
§Voiceflow
§Concludes with CW2
Week | Theory lecture | Practical lecture | Labs | Assessment
9 | Conversational and Voice User Interfaces | Basic VUI design principles | Coursework 1 | Quiz 2 (10%)
10 | Automatic Speech Recognition | Advanced VUI design | Coursework 2 release – Voiceflow | CW1 due (40%)
11 | Discoverability and Response Design | User Testing for VUIs | Voiceflow | -
12 | Progressivity for VUIs | TBC | Voiceflow | Quiz 3 (10%)
13 | TBC | TBC | Coursework 2 | -
14-16 | CHRISTMAS BREAK 🥳 🤩 🥰
17 | CW Q&A | - | - | CW2 due (30%)
Schedule – Pt. 2
§The goal is to learn how to build the voice user interface, the ‘cockpit’ for
our NLP engine, including learning
§Practical VUI design principles and techniques that help
understand, design, build, and evaluate VUIs with real people
§Application of these principles and techniques in labs and
coursework
§Theoretical and critical reflection on VUIs in use in the real world
Part 2 of the module
Part 2. Conversational UI/UX
What are examples of conversational systems you have
used? What worked well, what didn’t? Discuss for 2 mins.
§CUI = Conversational User Interface
§Can refer to text-based ‘chatbots’
§Or voice-based VUIs (Voice User Interfaces)
§Or hybrid versions of GUI + voice (e.g., Siri)
§UX = User Experience
§Broader term encompassing experience of the
interaction, not just design of interfaces and
information
§UX Design is an industry profession (web / app /
interface / game / graphic designers)
§HCI research is a largely academic discipline
Conversational UI/UX
§ Robotics
§ Cobots, social / companion robots, domestic robots
§ Smart speakers / Home
§ “Personal assistants”
§ Shopping, entertainment
§ Mobility / transportation
§ Safety critical
§ Hands-free environments
The many faces of conversational interfaces
The smart speaker market
§ 29% of Brits own a smart speaker (Sept. 2020)
§ 4-fold growth since 2017
§ Second most popular smart home device
(49% have a smart TV)
§ Source: https://mobilemarketingmagazine.com/uk-smart-speaker-ownership-2020-gfk-techuk
§ In the US it’s similar, 32%
https://voicebot.ai/2020/04/28/nearly-90-million-u-s-adults-have-smart-speakers-adoption-now-exceeds-one-third-of-consumers/
§ Unsurprisingly, it’s quite a lot lower in countries
in which English is not the first language
Adoption varies
§ In late 2021, 39% of adults in England owned
and used at home a voice-activated personal
assistant or smart speaker device, according to
a survey commissioned by the UK government
(https://www.gov.uk/government/statistics/participation-survey-october-to-december-2021-report/participation-survey-october-to-december-2021-main-report#background)
§ In the US, ~35% of the population owned smart
speakers in early 2022
§ Some say that the market may be stagnating
(growth is slowing)
2022 smart speaker market
VUI interaction – an example
§ Example: https://www.youtube.com/watch?v=IRmGZSdH2qY
User: Alexa remind me to fix my clock.
Alexa: I put fix my clock on your todo list.
What is happening between the request and the response?
Discuss for 3 mins, share with the class.
Technically, what’s going on?
§ Example: https://www.youtube.com/watch?v=IRmGZSdH2qY
§User: “Alexa remind me to fix my clock”
§ Wake word detection + acoustic model-based transcription + confidence level
§ Keyword recognition + intent matching [TODO list] + variable/slot
§ Dialogue management + TTS vocalization
§ Response generation
§Alexa: “I put fix my clock on your todo list.”
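The stages listed above can be sketched in a few lines of Python. This is a toy illustration only, not how Alexa is implemented: it starts from text that has already been transcribed, and the wake word check, the regular-expression intent patterns, the intent names (`add_todo`, `get_time`), the confidence threshold of 0.5, and the response templates are all assumptions made for this example.

```python
import re
from typing import Optional

WAKE_WORD = "alexa"  # assumption: checked on the transcript, not on raw audio

# Hypothetical intent patterns: a regex per intent, with named groups as slots.
INTENT_PATTERNS = [
    (re.compile(r"^remind me to (?P<item>.+)$"), "add_todo"),
    (re.compile(r"^what(?:'s| is) the time$"), "get_time"),
]

# Hypothetical response templates, filled in with the extracted slots.
RESPONSES = {
    "add_todo": "I put {item} on your todo list.",
    "get_time": "Sorry, I have no clock in this sketch.",
}


def handle(transcript: str, confidence: float = 1.0,
           threshold: float = 0.5) -> Optional[str]:
    """Map one transcribed utterance to a text response (None = stay silent)."""
    text = transcript.lower().strip().rstrip(".!?")

    # 1. Wake word detection: ignore speech not addressed to the assistant.
    if not text.startswith(WAKE_WORD):
        return None

    # 2. Confidence level: ask again if the transcription is too uncertain.
    if confidence < threshold:
        return "Sorry, I didn't catch that."

    # 3. Intent matching and slot extraction on the rest of the utterance.
    command = text[len(WAKE_WORD):].lstrip(" ,")
    for pattern, intent in INTENT_PATTERNS:
        match = pattern.match(command)
        if match:
            # 4. Response generation (a real VUI would pass this text to TTS).
            return RESPONSES[intent].format(**match.groupdict())

    return "Sorry, I don't know how to help with that."


print(handle("Alexa remind me to fix my clock."))
```

In this sketch "remind me to" plays the role of the action/intent and "fix my clock" is the variable/slot. Real systems replace each step with a trained model, but the division of labour is similar.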
Technically, what’s going on?
[Figure labels: action/intent, variable/slot]
§As with other AI-driven technologies, there are a host of problems
§Voice is a profoundly personal trait
§Reflects where you’re from, your mother tongue, dialect and
accent, sex/gender, sexual orientation, your socio-economic
background (e.g., ethnicity and education)
§ Issues around privacy and surveillance
§Pertinent given the often intimate / home environment in which
VUIs are placed
A host of challenges
Part 3. Socio-economic challenges
§Language support
§There are 7,000 or so actively spoken languages
https://www.ethnologue.com/guides/how-many-languages
§ 91 of them have at least 10M speakers
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
§As of late 2022, Amazon Alexa supports
§ 8 languages (English, French, Spanish, German, Hindi,
Italian, Japanese, Portuguese),
§ and dialects in 3 languages (English, French, Spanish)
https://www.globalme.net/blog/language-support-voice-assistants-compared/
VUI challenges…there are many!
§Dialects, accents and pitch
§ Tatman, Rachael. (2017, April). Gender and dialect bias in
YouTube’s automatic captions. In Proceedings of the First
ACL Workshop on Ethics in Natural Language
Processing (pp. 53-59).
§VUIs struggle more with certain dialects and accents
§Worst for people from Scotland
§Accuracy also lower for higher-pitched
voices
§Worse for women (and probably
children) → gender and age bias
VUI challenges…there are many!
§ Racial bias
§ Koenecke et al. (2020). Racial disparities in automated
speech recognition. PNAS, April 7, 2020, 117 (14) 7684-7689; first
published March 23, 2020; https://doi.org/10.1073/pnas.1915768117
§ Examined 5 state-of-the-art ASR systems, developed
by Amazon, Apple, Google, IBM, and Microsoft,
§ to transcribe structured interviews conducted
with 42 white speakers and 73 black speakers
§ Found that all five ASR systems exhibited substantial
racial disparities
§ An average word error rate (WER) of 0.35 for black
speakers compared with 0.19 for white speakers
§ WER is a standard measure of the discrepancy between
machine and human transcription, based on
substitutions, insertions, and deletions
§ The authors trace these disparities to the underlying
acoustic models used by the ASR systems
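WER is commonly computed as (S + D + I) / N, where S, D and I are the numbers of word substitutions, deletions and insertions needed to turn the human (reference) transcript into the machine transcript, and N is the number of words in the reference. A minimal sketch follows, assuming simple lowercasing and whitespace tokenisation; the function name `word_error_rate` is made up for this example, and the study's own text normalisation may well differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, computed row by row.
    # prev[j] = edits needed to turn the first i-1 reference words
    #           into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("remind me to fix my clock", "remind me to fix my sock"))
```

Because insertions count as errors, WER can exceed 1.0 when the machine transcript is much longer than the reference.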
VUI challenges…there are many!
WER by % of audio snippets.
Assuming a WER of >0.5 implies the transcript
is unusable, 23% of transcribed audio
snippets of black speakers are unusable, whereas
only 1.6% of audio snippets of white speakers
result in unusable transcripts.
§Sociophonetics
§Phonetics = study of the production and perception of spoken language
§ Selina Jeanne Sutton et al. 2019. Voice as a Design Material:
Sociophonetic Inspired Design Strategies in Human-Computer Interaction. In Proceedings of the 2019 CHI
Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing Machinery,
New York, NY, USA, Paper 603, 1–14. DOI:https://doi.org/10.1145/3290605.3300833
§Study of the social factors influencing production and perception
of speech, shaping sociocultural identities
§Accent and voice quality influenced by, e.g.,
Geography, sex and gender, age, sexuality, social class
§VUIs’ synthesised speech generally represents a homogeneous,
mainstream accent – lack of diversity in what this voice represents
VUI challenges…there are many!
§ Gendering synthetic speech
§ Why are voice assistants generally given a female voice?
§ Rausch (Amazon) said that in trials users preferred female voices
§ Echoing earlier research that found people find female voices more
agreeable, pleasant (e.g., Mitchell et al., 2011)
http://macdorman.com/kfm/writings/pubs/Mitchell2010DoesSocialDesirabilityBiasFavorHumans.pdf
§ BUT, gendering is embedding gender stereotypes
§ UNESCO report “I’d blush if I could” criticizes ‘the female servant’ /
‘personal assistant’ stereotype https://unesdoc.unesco.org/ark:/48223/pf0000367416.page=1
§ Does it invite sexualized, gendered language? (Woods, 2018)
https://doi.org/10.1080/15295036.2018.1488082
§ Gender neutral voices, e.g., EqualAI’s ‘Q’
VUI challenges…there are many!
§Surveillance and privacy intrusion
VUI challenges…there are many!
§ Alleged mistreatment / exploitation of workers
§ Workers producing linguistic data sets
§ Not Google employees, but underpaid
subcontractors, “routinely pressured to work
unpaid overtime”
§ Case of Pygmalion, a company that creates training
data for Google’s neural networks
§ Annotation work, manual part-of-speech
tagging to allow Assistant to produce high quality
answers https://www.wired.com/2016/11/googles-search-engine-can-now-answer-questions-human-help/
§ Vast amounts of human labour involved in ASR /
VUI challenges…there are many!
Quiz on Friday
CW1 due next week