HAI-Lecture16
Advanced VUI Design
Human-AI Interaction
Lecture 16
§Advanced VUI Design
§Making your VUI engaging, easy to use, successful
§Based on Chapter 5 of ’s book
§Branching
§Disambiguation
§Dialog Management and other advanced topics
This Lecture
COMP3074-HAI Lecture 16, Advanced VUI Design
Part 1. Branching and Disambiguation
§Think, discuss, share (3 mins).
§What are some examples you can think of where branching
based on voice input would be beneficial? Where should it be
§Approaches to branching
§Constrained responses
§Open Speech
§Categorisation of input
§Wildcards and logical expressions
Branching based on voice input
§Designing the prompt to constrain the response, so that the
response is likely to contain a single keyword to progress with, e.g.,
§ “Please tell me the name of the restaurant you’re looking for”
(Not, “where do you want to eat”)
§ “What city are you traveling to?” (Not, “where are you traveling”)
§ “What is your main symptom?”
§ “What song would you like to hear?”
§A good strategy is to use N-best lists of recognition results for the user’s constrained response
§These contain the N most likely matches, from which the correct result can be found
§Then iterate over the list until the user confirms
§E.g., “I heard
Constrained responses
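A minimal Python sketch of iterating over an N-best list until the user confirms a hypothesis. It assumes the ASR engine has already returned the list sorted from most to least likely. `confirm_from_nbest` and the `ask_yes_no` callback are hypothetical names standing in for whatever your VUI platform provides, and the confirmation wording is only illustrative.

```python
from typing import Callable, Optional, Sequence


def confirm_from_nbest(
    nbest: Sequence[str],
    ask_yes_no: Callable[[str], bool],
) -> Optional[str]:
    """Offer each hypothesis in turn (most likely first) until one is confirmed.

    `ask_yes_no` is a hypothetical callback: it plays the prompt and returns
    True if the user answers "yes".
    """
    for hypothesis in nbest:
        if ask_yes_no(f"I heard {hypothesis}. Is that right?"):
            return hypothesis
    # Nothing confirmed: the caller should re-prompt or offer another route.
    return None


# Usage with scripted answers instead of real speech input.
answers = iter([False, True])
city = confirm_from_nbest(["Austin", "Boston", "Houston"], lambda prompt: next(answers))
print(city)  # Boston
```

Keep N small in practice. Every rejected hypothesis costs the user a full conversational turn.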
§To allow for flow when the response doesn’t need to be processed
(i.e., no branching)
§Provide a general reply
§Similar to the ‘generic confirmation’ strategy
§Can be used to obtain information in a ‘natural’, ‘conversational’ way
§But in this case, it’s important that the user has given informed consent for this sensitive information to be shared with their doctor!
Open Speech
VIRTUAL NURSE
Please tell me a little more about the headaches you’ve been having.
Well, they usually start in the evening and last a couple of hours.
VIRTUAL NURSE
Thank you. I will share this with your doctor.
§Categorizing user input
§E.g., emotions
happy (“happy, joyful, great, excited, good”)
sad (“sad, depressed, bad, unhappy, miserable”)
§Works for cases where there is no need to confirm exactly what the user said; just acknowledge the category and move on
Categorisation of input
VIRTUAL COMPANION
How are you feeling?
Well…kind of depressed, to be honest.
VIRTUAL COMPANION
I’m sorry to hear that. Want to tell me more?
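A minimal Python sketch of categorising input by keyword lists. The keywords come from the slide. The acknowledgement texts and the function names `categorise` and `acknowledge` are hypothetical. The sketch checks only which category a word belongs to. It never tries to confirm the exact wording.

```python
import re
from typing import Optional

# Keyword lists taken from the slide; a real VUI would need far more synonyms.
CATEGORIES = {
    "happy": {"happy", "joyful", "great", "excited", "good"},
    "sad": {"sad", "depressed", "bad", "unhappy", "miserable"},
}

# Hypothetical acknowledgements, one per category (None = no category matched).
ACKNOWLEDGEMENTS = {
    "happy": "Great to hear!",
    "sad": "I'm sorry to hear that. Want to tell me more?",
    None: "Thanks for telling me.",
}


def categorise(utterance: str) -> Optional[str]:
    """Return the first category whose keywords appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for category, keywords in CATEGORIES.items():
        if words & keywords:
            return category
    return None


def acknowledge(utterance: str) -> str:
    """Pick the acknowledgement for the detected category and move on."""
    return ACKNOWLEDGEMENTS[categorise(utterance)]


print(acknowledge("Well…kind of depressed, to be honest."))
# I'm sorry to hear that. Want to tell me more?
```

Plain keyword spotting like this would file "Not very good" under happy. The negation problem needs separate handling.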
§ Wildcards allow more flexibility: a word marked with * may appear any number of times (including zero), without each variant having to be specified explicitly
§ My computer is really* slow (“My computer is slow.” “My computer is really slow.” “My computer is really really slow.”)
§ Logical expressions (AND/OR/NOT etc.) link keywords/phrases together,
§ e.g., keyphrases for a tech support VUI
§ Internet is not working
§ Forgot my password
§ Printer won’t print
§ Forgot AND password (“My dad forgot his password again,” “I don’t remember my password…I forgot it.”)
Wildcards and logical expressions
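A minimal Python sketch of both ideas, assuming simple keyword spotting over the transcribed text. The slide's `really*` wildcard becomes a regular expression in which "really " may repeat zero or more times. AND/OR/NOT become set checks over the words in the utterance. `SLOW_COMPUTER`, `tokens` and `keyword_match` are hypothetical names. This is not the matching syntax of any particular VUI platform.

```python
import re
from typing import Iterable, Set

# "really*" from the slide: "really" may occur zero or more times.
SLOW_COMPUTER = re.compile(r"\bmy computer is (?:really )*slow\b", re.IGNORECASE)


def tokens(utterance: str) -> Set[str]:
    """Lower-case word set; curly apostrophes are normalised to straight ones."""
    return set(re.findall(r"[a-z']+", utterance.lower().replace("\u2019", "'")))


def keyword_match(
    utterance: str,
    all_of: Iterable[str] = (),   # AND: every keyword must appear
    any_of: Iterable[str] = (),   # OR: at least one must appear (if any given)
    none_of: Iterable[str] = (),  # NOT: none of these may appear
) -> bool:
    words = tokens(utterance)
    any_of = list(any_of)
    return (
        all(k in words for k in all_of)
        and (not any_of or any(k in words for k in any_of))
        and not any(k in words for k in none_of)
    )


print(bool(SLOW_COMPUTER.search("My computer is really really slow.")))  # True
print(keyword_match("My dad forgot his password again", all_of=["forgot", "password"]))  # True
```

The word-set approach ignores word order. That is why "Forgot AND password" also matches "I don’t remember my password…I forgot it."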
§Different kinds of situations that require disambiguation
§Not enough information
§More than one piece of information, when only one is expected
Disambiguation
§ Not enough information
§ When there is more than one of a
§ E.g., ”What’s the weather in
Springfield?”
§ When information is omitted
§ “I’d like a large, please.”
§ When the intent is unclear:
Disambiguation
TECH SUPPORT ASSISTANT
Hi there, I’m Pat, . How can I help you today?
I need help with the Internet.
TECH SUPPORT ASSISTANT
Internet. Sure, I can help with that. Let me get some more information. I can help you set up your Wifi,
find information online, or fix your Internet connection. Which one do you need help with?
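A minimal Python sketch of disambiguating when there is not enough information, using the "Springfield" case. The gazetteer `CITY_CANDIDATES` is a hypothetical stand-in for a real location service. The function names and response wording are also assumptions. The logic is simple. With zero matches, re-prompt. With one match, proceed. With several, list the options and ask the user to pick one.

```python
from typing import Dict, List

# Hypothetical gazetteer; a real VUI would query a location service.
CITY_CANDIDATES: Dict[str, List[str]] = {
    "springfield": ["Illinois", "Massachusetts", "Missouri"],
    "boise": ["Idaho"],
}


def spoken_list(items: List[str]) -> str:
    """'A or B' for two items, 'A, B, or C' for more."""
    if len(items) <= 2:
        return " or ".join(items)
    return ", ".join(items[:-1]) + ", or " + items[-1]


def weather_turn(city: str) -> str:
    """Return the VUI's next utterance for a weather request about `city`."""
    matches = CITY_CANDIDATES.get(city.lower(), [])
    if not matches:
        return f"Sorry, I don't know {city}. Which city did you mean?"
    if len(matches) == 1:
        # Unambiguous: go straight to the (hypothetical) weather lookup.
        return f"Here's the weather for {city}, {matches[0]}."
    return f"Which {city} did you mean: {spoken_list(matches)}?"


print(weather_turn("Springfield"))
# Which Springfield did you mean: Illinois, Massachusetts, or Missouri?
```

The same pattern covers an unclear intent. Map "Internet" to its candidate intents (set up Wi-Fi, find information, fix the connection) and offer them as a spoken list, as Pat does.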
§More than one piece of information, when
only one is expected
§Case a) the user’s response is ambiguous
§Strategy: ask the user to pick the more
important one, e.g.,
§Case b) the user’s request is ambiguous
§Strategy: ask the user to pick one of the options, e.g.,
Disambiguation
HEALTH ASSISTANT
What is your main symptom?
I have a fever and a cough.
HEALTH ASSISTANT
Which one of those would you say is
bothering you the most right now?
The cough…it’s pretty bad.
HEALTH ASSISTANT
OK, let’s start with the cough. I can also
help with the fever symptom later.
USER
Call HONE
VUI
Sure, mobile or work?
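A minimal Python sketch of case a) from the health assistant, where the user gives more than one answer when only one is expected. `KNOWN_SYMPTOMS` is a hypothetical vocabulary, and the function names are illustrative. The sketch detects multiple matches and asks the user to rank them. It then keeps the remaining symptoms so the VUI can offer to return to them later.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical symptom vocabulary for the health-assistant example.
KNOWN_SYMPTOMS = ["fever", "cough", "headache", "sore throat"]


def find_symptoms(utterance: str) -> List[str]:
    """Return every known symptom mentioned, in vocabulary order."""
    text = utterance.lower()
    return [s for s in KNOWN_SYMPTOMS if re.search(rf"\b{re.escape(s)}\b", text)]


def main_symptom_turn(utterance: str) -> Tuple[str, List[str]]:
    """Return the next prompt plus any symptoms still to be disambiguated."""
    found = find_symptoms(utterance)
    if not found:
        return "Sorry, what is your main symptom?", []
    if len(found) == 1:
        return f"OK, let's talk about the {found[0]}.", []
    # More than one answer when only one was expected: ask the user to rank.
    return "Which one of those would you say is bothering you the most right now?", found


def pick_primary(answer: str, candidates: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Resolve the follow-up; return (primary, symptoms to revisit later)."""
    chosen = [s for s in find_symptoms(answer) if s in candidates]
    if len(chosen) != 1:
        return None  # still ambiguous: re-ask
    return chosen[0], [s for s in candidates if s != chosen[0]]


prompt, candidates = main_symptom_turn("I have a fever and a cough.")
primary, later = pick_primary("The cough…it's pretty bad.", candidates)
print(f"OK, let's start with the {primary}. I can also help with the {later[0]} later.")
# OK, let's start with the cough. I can also help with the fever later.
```

Case b) has the same shape. The candidates are the contact's phone numbers (mobile, work), and the user picks one.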
Part 2. Dialog Management and other
advanced topics
What are some examples of good DM you can think of?
Think, discuss, share (3 mins).
§To make your dialog flow as flexible and effective as possible
§About managing the step-by-step conversational process
§ Includes a lot of the design principles we’ve already covered, like
error handling, confirmation, shortcuts, context capture etc.
§Good DM adapts to the user effectively, for example:
Dialog Management
Hi there, welcome to Pearl’s Pizza. What kind of pizza
can I get you?
Um, I’d like a large pepperoni, please.
[At this point, we have filled in the number of pizzas
(1), and the toppings (pepperoni). Now, we know all
that remains is the address and phone number.]
Hi there, welcome to Pearl’s Pizza. What kind of pizza
can I get you?
Uhh…yeah, I wanna order some pizzas.
Great, that’s what I’m here for. How many would you
Two, please. …
§ (Voiceflow makes this easy)
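A minimal Python sketch of the slot-filling idea behind the Pearl's Pizza example. The sketch fills every slot the user mentions, in whatever order they mention it, and then asks only for the first slot that is still empty. The slot schema, word lists and prompts are hypothetical, and counting "a" as a quantity of 1 is a crude heuristic. Voiceflow implements this differently. The sketch only illustrates the dialog-management logic.

```python
import re
from typing import Dict, Optional

# Hypothetical slot schema and prompts for the Pearl's Pizza example.
SLOT_PROMPTS = {
    "quantity": "How many pizzas would you like?",
    "size": "What size would you like?",
    "topping": "What topping would you like?",
    "address": "What's the delivery address?",
    "phone": "And what's your phone number?",
}
NUMBERS = {"a": 1, "one": 1, "two": 2, "three": 3}  # "a" -> 1 is a crude heuristic
SIZES = {"small", "medium", "large"}
TOPPINGS = {"pepperoni", "mushroom", "margherita"}


def fill_slots(utterance: str, slots: Dict[str, object]) -> None:
    """Fill any slots the utterance mentions; never overwrite filled slots."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        if word in NUMBERS:
            slots.setdefault("quantity", NUMBERS[word])
        elif word in SIZES:
            slots.setdefault("size", word)
        elif word in TOPPINGS:
            slots.setdefault("topping", word)


def next_prompt(slots: Dict[str, object]) -> Optional[str]:
    """Ask only for the first slot that is still empty (None = order complete)."""
    for slot, prompt in SLOT_PROMPTS.items():
        if slot not in slots:
            return prompt
    return None


expert: Dict[str, object] = {}
fill_slots("Um, I’d like a large pepperoni, please.", expert)
print(next_prompt(expert))  # What's the delivery address?
```

The expert user skips straight to the address, while the novice is walked through quantity, size and topping one prompt at a time. Both users go through the same loop.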
§Need to recognize both object (e.g., calendar) and intent, i.e., what
the user wants to do with it (e.g., add, remove, view)
§Show me my calendar
§Add an event to my calendar
§Delete my meeting from my calendar
§ In Voiceflow, Intent and Object are usually combined in the concept of “utterances”
§Utterances allow you to map different ways the user might
express themselves to the same intent
§Objects can also be captured as slots (Entities in Voiceflow)
Capturing Intent and Objects
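A minimal Python sketch of mapping sample utterances to an intent while capturing the object as a slot. The sample utterances, the `{object}` placeholder and all names are hypothetical. The sketch only mirrors the idea of Voiceflow's utterances and entities. It does not use Voiceflow's API.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical sample utterances per intent; {object} marks the object slot.
OBJECTS = ["calendar", "reminders"]
INTENT_UTTERANCES: Dict[str, List[str]] = {
    "view": ["show me my {object}", "what's on my {object}"],
    "add": ["add an event to my {object}", "put something on my {object}"],
    "remove": ["delete my meeting from my {object}", "remove an event from my {object}"],
}
OBJECT_GROUP = "(?P<object>" + "|".join(OBJECTS) + ")"


def compile_utterance(template: str) -> "re.Pattern[str]":
    """Turn a sample utterance into a regex with a named object group."""
    before, _, after = template.partition("{object}")
    return re.compile(re.escape(before) + OBJECT_GROUP + re.escape(after))


PATTERNS = [
    (intent, compile_utterance(t))
    for intent, templates in INTENT_UTTERANCES.items()
    for t in templates
]


def normalise(utterance: str) -> str:
    """Lower-case, straighten apostrophes, drop punctuation, collapse spaces."""
    text = utterance.lower().replace("\u2019", "'")
    return " ".join(re.sub(r"[^a-z']", " ", text).split())


def parse(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (intent, object) for the first sample utterance that fits."""
    text = normalise(utterance)
    for intent, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return intent, match.group("object")
    return None


print(parse("Delete my meeting from my calendar"))  # ('remove', 'calendar')
```

This sketch matches only the phrasings it lists. Real NLU platforms generalise from the sample utterances with statistical models, but they still work better when each intent has many varied samples.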
§ Negative responses are important
§ “not,” “no,” “don’t,” and “neither”
§ Also, “except”, “dislike” etc.
§ Express what users don’t want
§ Getting it wrong means users get the
opposite of what they’re saying!
§ Can be handled by logical expressions (NOT)
Handling negation
How are you feeling today?
Not very good,
(matching on the word “good”) Great to hear!
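A minimal Python sketch contrasting the naive keyword match from this exchange with a simple NOT rule. Under the rule, a positive word does not count if a negator occurs within the few words before it. The word lists, the window size of 3 and the function names are hypothetical. A fixed window is a simplification: "No, it's great" would still be misread. Production systems use richer parsing.

```python
import re
from typing import List

NEGATORS = {"not", "no", "don't", "neither", "never"}
POSITIVE = {"good", "great", "fine", "happy"}


def tokens(utterance: str) -> List[str]:
    """Lower-case words in order; curly apostrophes become straight ones."""
    return re.findall(r"[a-z']+", utterance.lower().replace("\u2019", "'"))


def naive_is_positive(utterance: str) -> bool:
    """Keyword spotting only: the approach that gets "Not very good" wrong."""
    return any(t in POSITIVE for t in tokens(utterance))


def negation_aware_is_positive(utterance: str, window: int = 3) -> bool:
    """Ignore a positive word if a negator appears in the preceding `window` words."""
    words = tokens(utterance)
    for i, word in enumerate(words):
        if word in POSITIVE:
            preceding = words[max(0, i - window):i]
            if not any(w in NEGATORS for w in preceding):
                return True
    return False


print(naive_is_positive("Not very good,"))           # True  (wrong)
print(negation_aware_is_positive("Not very good,"))  # False
```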
§May make (partial) transcription errors visible/audible
§Allows the user to recover from errors, as they’re not left guessing
what went wrong
§ Is a good strategy if the user’s utterance must be recognized
correctly to fulfill the intent
§E.g., order details, delivery address etc.
§But can be distracting
§For example if the exact recognition of the user’s utterance isn’t
required to move on
§What strategy to adopt depends on the context of the VUI app and
where in the process the user is
Should the VUI display/say what it recognized?
§The process of computationally identifying and categorizing opinions
expressed in a piece of text
§E.g., determining whether the writer’s attitude towards a particular topic, product, etc., is positive, negative, or neutral
§Negative/positive word lists available, e.g., http://mpqa.cs.pitt.edu/
§NLP needed! You already have some of the relevant skills
§Building BOW models from stemmed documents
§Computing similarity and distance
§ In VUIs, capturing the sentiment can be useful to generate
appropriate responses to the user on the fly
Sentiment Analysis
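A minimal lexicon-based sentiment sketch in Python, built on a bag-of-words count. The two word lists are tiny hypothetical placeholders. In practice you would load a published lexicon such as the MPQA one linked above (its file format is not shown here). You might also stem both the lexicon and the input with the same stemmer, so inflected forms match.

```python
import re
from collections import Counter

# Tiny illustrative word lists; replace with a real lexicon in practice.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent", "helpful"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful", "slow"}


def sentiment(text: str) -> str:
    """Label text positive/negative/neutral from a bag-of-words lexicon score."""
    bag = Counter(re.findall(r"[a-z]+", text.lower()))  # BOW: word -> count
    positive = sum(n for w, n in bag.items() if w in POSITIVE_WORDS)
    negative = sum(n for w, n in bag.items() if w in NEGATIVE_WORDS)
    score = positive - negative
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The delivery was terrible and so slow"))  # negative
```

A VUI could then choose its next response from the label, for example an apology for "negative". Without negation handling, this scorer has the same blind spot as naive keyword matching.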
§Synthetic (generated) vs. recorded human speech
§TTS is much better than it used to be, with a good range of voices available
§E.g., https://www.cereproc.com/en/support/live_demo
§Speech Synthesis Markup Language (SSML) can be used to
tune prosody (pitch, rate (speed), volume etc.) and more
§Recording voice talent can make your VUI sound more natural
§Pronunciation, prosody adapted to contextual/local nuances
§But costly, not as flexible (limited to the recorded
words/syllables)
§Cumbersome, must record each word for neutral, rising, falling
intonation etc.
TTS vs. recorded speech
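A minimal Python sketch that builds an SSML document to tune prosody. It uses the `<speak>`, `<prosody>` (rate, pitch, volume) and `<break>` elements from the W3C SSML 1.1 specification. Which attributes and values a given TTS engine honours varies by platform. Some platforms also expect a bare `<speak>` without the namespace. `ssml_prosody` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def ssml_prosody(text: str, rate: str = "medium", pitch: str = "medium",
                 volume: str = "medium", pause_ms: int = 0) -> str:
    """Wrap plain text in an SSML <prosody> element, optionally followed by a pause."""
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms else ""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-GB">'
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f"{escape(text)}</prosody>{pause}</speak>"  # escape &, <, > in the text
    )


ssml = ssml_prosody("Fish & chips are on their way.", rate="slow", pitch="+10%", pause_ms=300)
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```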
§ 19 pieces of recorded content required
for this sentence in the San Francisco
Bay Area traffic system
§Next time you’re taking a National Rail train, listen out for the announcement.
You can hear it’s stitched together
from different recorded snippets!
Human recorded speech example
As of 10:18 AM, there’s a slowdown on highway
101 northbound, between in
Belmont and Dore avenue in San Mateo. Traffic is
moving between 25 and 30 miles per hour.
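A minimal Python sketch of stitching recorded snippets into one prompt, as in the traffic sentence above. It uses the standard-library `wave` module and assumes every snippet was recorded with the same channel count, sample width and sample rate. The file names in the usage comment are hypothetical. Choosing the right recording of each word for its position (neutral, rising or falling intonation) is left to the caller.

```python
import wave
from typing import Sequence


def concatenate_snippets(snippet_paths: Sequence[str], out_path: str) -> None:
    """Stitch prerecorded WAV snippets into one prompt, in order.

    Assumes every snippet shares the first snippet's channel count,
    sample width and sample rate; raises ValueError otherwise.
    """
    with wave.open(snippet_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count in the header is fixed up on close
        for path in snippet_paths:
            with wave.open(path, "rb") as snippet:
                if snippet.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} does not match the first snippet's format")
                out.writeframes(snippet.readframes(snippet.getnframes()))


# Hypothetical usage for the traffic prompt (file names are illustrative only):
# concatenate_snippets(["as_of.wav", "ten.wav", "eighteen.wav", "am.wav"], "prompt.wav")
```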
§AKA voice biometric authentication
§We touched on voice ID earlier, in the context of speaker identification to avoid triggering the VUI on someone else’s device
§Another use case is transcribing meeting / multi-party
conversations, voice ID to allocate turns to correct speaker
§But it can in principle also be used for authentication for security
§E.g., instead of typing passwords
§Probably unlikely to see this anytime soon for consumer-grade smart speakers
Speaker verification
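A minimal Python sketch of the verification decision only. It assumes each utterance has already been turned into a fixed-length speaker embedding by some speaker-embedding model, which is not shown and is hypothetical here. Enrolment averages several embeddings into a voiceprint. A new attempt is accepted if its cosine similarity to the voiceprint reaches a threshold. The 0.7 threshold is a placeholder. A real system would tune it on held-out data to trade false accepts against false rejects.

```python
import math
from typing import List, Sequence


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Enrolment: average several utterance embeddings into one voiceprint."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(voiceprint: Sequence[float], attempt: Sequence[float],
                   threshold: float = 0.7) -> bool:
    """Accept the claimed identity if the attempt is close enough to the voiceprint."""
    return cosine_similarity(voiceprint, attempt) >= threshold


print(verify_speaker([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))  # True
```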
§ “being aware of what’s going on around the conversation, as well
as things that have happened in the past”
§ Take advantage of basic context to make your VUI seem smarter and to save users’ time
§For example, you can greet the user according to their local time (“Good morning,” “Good afternoon,” etc.) or when they last logged on (“Welcome back”, “Haven’t seen you in a while”), use their location, etc.
§Track the stages required to complete the transaction
§Can allow for more flexible DM (see the Pizza example earlier)
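A minimal Python sketch of a context-aware greeting based on the user's local time and their last visit. The hour boundaries and the 30-day "haven't seen you in a while" cut-off are hypothetical choices. `user_tz` can be any `tzinfo`. In production that would typically be `zoneinfo.ZoneInfo("Europe/London")` or similar (Python 3.9+). The example uses fixed offsets so it runs without a time-zone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def greeting(now_utc: datetime, user_tz: tzinfo,
             last_seen: Optional[datetime] = None) -> str:
    """Context-aware greeting from the user's local time and last visit."""
    local_hour = now_utc.astimezone(user_tz).hour
    if local_hour < 12:
        part_of_day = "Good morning"
    elif local_hour < 18:
        part_of_day = "Good afternoon"
    else:
        part_of_day = "Good evening"
    if last_seen is None:
        return f"{part_of_day}! Nice to meet you."
    if now_utc - last_seen > timedelta(days=30):  # hypothetical cut-off
        return f"{part_of_day}! Haven't seen you in a while."
    return f"{part_of_day}, welcome back!"


now = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
print(greeting(now, timezone(timedelta(hours=9)), last_seen=now - timedelta(days=2)))
# Good afternoon, welcome back!
```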
§We have talked about hybrid strategies of voice and visual output
§Could also have hybrid strategies of voice and visual input
§E.g., allowing the user to point on a map “What’s the capital of
this state”
§Or pointing on a virtual chess board “move my knight here”
§Or have a VUI embedded in a website, similar to the chat widgets common on many websites
Advanced Multimodal
§Building your own language models, sources for bootstrapping
§Website data, e.g., FAQs, terminology, customer service forms
§Call Center data, e.g., transcriptions of customer support calls
§Data collection, when there isn’t any data available to start from
§ asking people the questions the VUI would ask
§ transcribing what people would say
§Wizard-of-Oz is a common method: a human simulates the VUI so that the user talks to it as if it were real
§Crowdsourcing, e.g., via Amazon Mechanical Turk (AMT), is another option
Bootstrapping Datasets
§VUI/VUX is not a solved problem
§Huge challenges around the gap between how people talk and the capabilities of ASR and NLU to recognize intent and respond appropriately
§There’s a lot designers can do to make VUIs as effective as possible despite the limitations of ASR and NLU
§Dialog Management is the overarching concept
§Disambiguation, error recovery, handling negation, etc.
§Overall, effective DM uses good prompt and response design to help the user provide expected, well-formatted input and progress to successful completion
Conclusions
Reminder: register
your Voiceflow
Voiceflow labs for
CW2 start this week.