COMP3074-HAI Lecture 16, Advanced VUI Design

HAI-Lecture16

Advanced VUI Design

Human-AI Interaction

Lecture 16

§Advanced VUI Design
§Making your VUI engaging, easy to use, successful
§Based on Chapter 5 of ’s book

§Branching
§Disambiguation

§Dialog Management and other advanced topics

This Lecture

Part 1. Branching and Disambiguation

§Think, discuss, share (3 mins).
§What are some examples you can think of where branching based on voice input would be beneficial? Where should it be

§Approaches to branching
§Constrained responses
§Open Speech
§Categorisation of input
§Wildcards and logical expressions

Branching based on voice input

§Design the prompt to constrain the response, so that the response is likely to contain a single keyword to progress with, e.g.,
§ “Please tell me the name of the restaurant you’re looking for” (not “Where do you want to eat?”)
§ “What city are you traveling to?” (not “Where are you traveling?”)
§ “What is your main symptom?”
§ “What song would you like to hear?”

§A good strategy is to use N-best lists built from the user’s constrained response
§An N-best list contains the N most likely matches for what the user said
§Then iterate over the list until the user confirms
§E.g., “I heard . Is that correct?”

Constrained responses
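
The N-best strategy above can be sketched in a few lines of Python. This is a minimal illustration, not a real recogniser API: the function name `confirm_from_n_best`, the `ask_yes_no` callback, and the city names are all assumptions made for the example.

```python
from typing import Callable, Optional


def confirm_from_n_best(
    n_best: list[str],
    ask_yes_no: Callable[[str], bool],
) -> Optional[str]:
    """Walk the N-best list in rank order until the user confirms one entry.

    n_best: recognition hypotheses, most likely first (hypothetical input;
            a real recogniser would supply these).
    ask_yes_no: plays a prompt and returns True if the user says yes.
    """
    for hypothesis in n_best:
        if ask_yes_no(f"I heard {hypothesis}. Is that correct?"):
            return hypothesis
    return None  # nothing confirmed: the caller should re-prompt


# Usage with a scripted "user" who wants Boston (illustrative data only).
hypotheses = ["Austin", "Boston", "Houston"]
prompts_played = []


def scripted_user(prompt: str) -> bool:
    prompts_played.append(prompt)
    return "Boston" in prompt


city = confirm_from_n_best(hypotheses, scripted_user)
print(city)  # Boston
```

Returning `None` when nothing is confirmed leaves the choice of fallback (re-prompt, spell it out, hand over to a human) to the surrounding dialog.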

§To allow for flow when the response doesn’t need to be processed
(i.e., no branching)
§Provide a general reply
§Similar to the ‘generic confirmation’ strategy

§Can be used to obtain information in a ‘natural’, ‘conversational’ way

§But in the example below, it’s important that the user has given informed consent for this sensitive information to be shared with their doctor!

Open Speech

VIRTUAL NURSE
Please tell me a little more about the headaches you’ve been having.
USER
Well, they usually start in the evening and last a couple of hours.
VIRTUAL NURSE
Thank you. I will share this with your doctor.

§Categorizing user input
§E.g., emotions

happy (“happy, joyful, great, excited, good”)
sad (“sad, depressed, bad, unhappy, miserable”)

§Works for cases where there is no need to confirm exactly what the user said, just acknowledge the category and move on

Categorisation of input

VIRTUAL COMPANION
How are you feeling?
USER
Well…kind of depressed, to be honest.
VIRTUAL COMPANION
I’m sorry to hear that. Want to tell me more?
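
A minimal sketch of category matching, assuming simple keyword lookup. The keyword lists and the "sad" reply come from the slide; the "happy" reply, the fallback, and the function names are hypothetical.

```python
import re
from typing import Optional

# Keyword lists taken from the slide; a real VUI would need far larger ones.
CATEGORIES = {
    "happy": {"happy", "joyful", "great", "excited", "good"},
    "sad": {"sad", "depressed", "bad", "unhappy", "miserable"},
}

# Category-level replies. Only the "sad" reply appears on the slide;
# the others are hypothetical.
REPLIES = {
    "happy": "Glad to hear it!",
    "sad": "I'm sorry to hear that. Want to tell me more?",
}
FALLBACK = "Thanks for telling me."


def categorise(utterance: str) -> Optional[str]:
    """Return the first category with a keyword in the utterance, else None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for category, keywords in CATEGORIES.items():
        if words & keywords:
            return category
    return None


def respond(utterance: str) -> str:
    """Acknowledge the category without echoing the user's exact words."""
    return REPLIES.get(categorise(utterance), FALLBACK)


print(respond("Well…kind of depressed, to be honest."))
```

Note that plain keyword matching would put "Not very good" in the happy category; see the Handling negation slide.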

§ Wildcards allow for more flexibility by letting certain words be repeated (or omitted) without having to specify each variant explicitly
§ My computer is really* slow (“My computer is slow.” “My computer is really slow.” “My computer is really really slow.”)
§ Logical expressions (AND/OR/NOT etc.) link keywords/phrases together
§ E.g., keyphrases for a tech support VUI
§ Internet is not working
§ Forgot my password
§ Printer won’t print

§ Forgot AND password (“My dad forgot his password again,” “I don’t remember my password…I forgot it.”)

Wildcards and logical expressions
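
One way to approximate both ideas in Python, offered as a sketch: a regular expression stands in for the `really*` wildcard, and a word-set check stands in for the AND expression. The names `SLOW_COMPUTER`, `words_of`, and `forgot_and_password` are made up for this example; VUI tools have their own syntax for these features.

```python
import re

# "really*" on the slide: the word "really" may occur zero or more times.
SLOW_COMPUTER = re.compile(r"\bmy computer is (?:really )*slow\b", re.IGNORECASE)


def words_of(utterance: str) -> set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def forgot_and_password(utterance: str) -> bool:
    """Logical AND: both keywords must occur, in any order and any position."""
    words = words_of(utterance)
    return "forgot" in words and "password" in words


for text in ("My computer is slow.", "My computer is really really slow."):
    print(bool(SLOW_COMPUTER.search(text)))

print(forgot_and_password("My dad forgot his password again"))
print(forgot_and_password("I don't remember my password…I forgot it."))
```

The AND check deliberately ignores word order and distance, which is what lets it match both example sentences from the slide.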

§Different kinds of situations that require disambiguation
§Not enough information
§More than one piece of information, when only one is expected

Disambiguation

§ Not enough information
§ When there are more than one of a

§ E.g., “What’s the weather in Springfield?”
§ When information is omitted

§ “I’d like a large, please.”
§ When the intent is unclear:

Disambiguation

TECH SUPPORT ASSISTANT
Hi there, I’m Pat, . How can I help you today?
USER
I need help with the Internet.
TECH SUPPORT ASSISTANT
Internet. Sure, I can help with that. Let me get some more information. I can help you set up your Wifi,
find information online, or fix your Internet connection. Which one do you need help with?
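
The "not enough information" case can be sketched as a lookup that either resolves the request or returns a follow-up question listing the options. The `PLACES` table, the prompt wording, and the function name `resolve_place` are illustrative assumptions, not part of any real weather service.

```python
from typing import Optional

# Hypothetical gazetteer: place name -> regions that have a place of that name.
PLACES = {
    "springfield": ["Illinois", "Massachusetts", "Missouri"],
    "nottingham": ["England"],
}


def resolve_place(name: str) -> tuple[Optional[str], Optional[str]]:
    """Return (resolved_place, follow_up_prompt); exactly one of them is None."""
    regions = PLACES.get(name.strip().lower(), [])
    if not regions:
        return None, f"Sorry, I don't know {name}. Which city did you mean?"
    if len(regions) == 1:
        return f"{name}, {regions[0]}", None
    # More than one match: ask the user to pick instead of guessing.
    options = ", ".join(regions[:-1]) + f" or {regions[-1]}"
    return None, f"Which {name} do you mean: the one in {options}?"


place, prompt = resolve_place("Springfield")
print(prompt)
```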

§More than one piece of information, when only one is expected

§Case a) the user’s response is ambiguous
§Strategy: ask the user to pick the more important one, e.g.,
§Case b) the user’s request is ambiguous
§Strategy: ask the user to pick one of the options, e.g.,

Disambiguation

HEALTH ASSISTANT
What is your main symptom?
USER
I have a fever and a cough.
HEALTH ASSISTANT
Which one of those would you say is bothering you the most right now?
USER
The cough…it’s pretty bad.
HEALTH ASSISTANT
OK, let’s start with the cough. I can also help with the fever symptom later.

USER
Call
PHONE VUI
Sure, mobile or work?

Part 2. Dialog Management and other
advanced topics
What are some examples of good DM you can think of?
Think, discuss, share. 3 mins

§To make your dialog flow as flexible and effective as possible
§About managing the step-by-step conversational process
§ Includes a lot of the design principles we’ve already covered, like error handling, confirmation, shortcuts, context capture etc.
§Good DM adapts to the user effectively, for example:

Dialog Management

VUI
Hi there, welcome to Pearl’s Pizza. What kind of pizza can I get you?
USER
Um, I’d like a large pepperoni, please.
[At this point, we have filled in the number of pizzas (1), and the toppings (pepperoni). Now, we know all that remains is the address and phone number.]

VUI
Hi there, welcome to Pearl’s Pizza. What kind of pizza can I get you?
USER
Uhh…yeah, I wanna order some pizzas.
VUI
Great, that’s what I’m here for. How many would you like?
USER
Two, please. …
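
The adaptive behaviour in the two pizza dialogs amounts to slot filling: ask only for what is still missing. A minimal sketch follows; the slot names mirror the bracketed note above, while the prompt wording and the function name `next_prompt` are hypothetical. Extracting slot values from the utterance is assumed to happen elsewhere (e.g., in the NLU).

```python
from typing import Optional

# Slots in the order the dialog would ask for them. Prompt wording is hypothetical.
SLOT_PROMPTS = {
    "quantity": "How many pizzas would you like?",
    "toppings": "What toppings would you like?",
    "address": "What's the delivery address?",
    "phone": "What's your phone number?",
}


def next_prompt(filled: dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot, prompt in SLOT_PROMPTS.items():
        if slot not in filled:
            return prompt
    return None


# "A large pepperoni, please" fills two slots at once, so the dialog skips ahead.
order = {"quantity": "1", "toppings": "pepperoni"}
print(next_prompt(order))  # What's the delivery address?

# "I wanna order some pizzas" fills nothing, so the dialog starts at the top.
print(next_prompt({}))  # How many pizzas would you like?
```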

§ (Voiceflow makes this easy)
§Need to recognize both the object (e.g., calendar) and the intent, i.e., what the user wants to do with it (e.g., add, remove, view)
§Show me my calendar
§Add an event to my calendar
§Delete my meeting from my calendar

§ In Voiceflow, intent and object are usually combined in the concept of “utterances”
§Utterances allow you to map different ways the user might express themselves to the same intent
§Objects can also be captured as slots (Entities in Voiceflow)

Capturing Intent and Objects
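
To make the intent/object split concrete, here is a small keyword-based sketch in Python. It is not how Voiceflow works internally and does not use its API; the verb lists (beyond the verbs on the slide), the `OBJECTS` set, and the function name `parse` are assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical verb lists per intent; verbs beyond those on the slide
# (e.g. "open", "schedule", "cancel") are illustrative additions.
INTENT_VERBS = {
    "view": {"show", "view", "open"},
    "add": {"add", "create", "schedule"},
    "remove": {"delete", "remove", "cancel"},
}
OBJECTS = {"calendar"}


def parse(utterance: str) -> tuple[Optional[str], Optional[str]]:
    """Return (intent, object); either may be None if nothing matched."""
    words = re.findall(r"[a-z]+", utterance.lower())
    word_set = set(words)
    intent = next(
        (name for name, verbs in INTENT_VERBS.items() if verbs & word_set), None
    )
    obj = next((w for w in words if w in OBJECTS), None)
    return intent, obj


for text in (
    "Show me my calendar",
    "Add an event to my calendar",
    "Delete my meeting from my calendar",
):
    print(parse(text))
```

Several verbs mapping to one intent plays the same role as listing several utterances for one intent.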

§ Negative responses are important
§ “not,” “no,” “don’t,” and “neither”
§ Also, “except”, “dislike” etc.
§ Express what users don’t want
§ Getting it wrong means users get the opposite of what they’re asking for!

§ Can be handled by logical expressions (NOT)

Handling negation

VUI
How are you feeling today?
USER
Not very good.
VUI
(matching on the word “good”) Great to hear!
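
A crude way to avoid this failure is to check for a negator shortly before the matched keyword. The sketch below assumes a two-word look-back window, which is an arbitrary choice for illustration; the negator list is the one from the slide, while the positive word list and the function name `feels_good` are hypothetical.

```python
import re

POSITIVE = {"good", "great", "fine", "well"}    # hypothetical keyword list
NEGATORS = {"not", "no", "don't", "neither"}    # negators named on the slide


def feels_good(utterance: str) -> bool:
    """True only if a positive word occurs with no negator in the two words before it."""
    text = utterance.lower().replace("’", "'")  # normalise curly apostrophes
    words = re.findall(r"[a-z']+", text)
    for i, word in enumerate(words):
        if word in POSITIVE:
            window = words[max(0, i - 2):i]     # look back at most two words
            if not NEGATORS & set(window):
                return True
    return False


print(feels_good("Not very good"))      # False
print(feels_good("I'm feeling good"))   # True
```

A fixed window misses longer-range negation ("I would not say that things are good"), so this is a starting point rather than a robust solution.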

§May make (partial) transcription errors visible/audible
§Allows the user to recover from errors, as they’re not left guessing what went wrong
§ Is a good strategy if the user’s utterance must be recognized correctly to fulfill the intent
§E.g., order details, delivery address etc.

§But can be distracting
§For example if the exact recognition of the user’s utterance isn’t required to move on
§What strategy to adopt depends on the context of the VUI app and where in the process the user is

Should the VUI display/say what it recognized?

§The process of computationally identifying and categorizing opinions expressed in a piece of text
§Determines whether the writer’s attitude towards a particular topic, product, etc., is positive, negative, or neutral
§Negative/positive word lists available, e.g., http://mpqa.cs.pitt.edu/
§NLP needed! You already have some of the relevant skills
§Building bag-of-words (BOW) models from stemmed documents
§Computing similarity and distance

§ In VUIs, capturing the sentiment can be useful to generate appropriate responses to the user on the fly

Sentiment Analysis
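
The simplest word-list approach counts positive and negative hits and compares them. The sketch below uses two tiny made-up word lists purely for illustration; a real system would load a published lexicon and would likely add stemming and negation handling.

```python
import re

# Tiny illustrative word lists (assumptions, not taken from any published lexicon).
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "slow"}


def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(sentiment("I love this pizza, it is great"))
print(sentiment("The delivery was slow and the pizza was awful"))
print(sentiment("I ordered a pizza"))
```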

§Synthetically generated vs. recorded human speech
§TTS is much better than it used to be, with a good range of voices available
§E.g., https://www.cereproc.com/en/support/live_demo
§Speech Synthesis Markup Language (SSML) can be used to tune prosody (pitch, rate (speed), volume etc.) and more

§Recording voice talent can make your VUI sound more natural
§Pronunciation, prosody adapted to contextual/local nuances
§But costly, not as flexible (limited to the recorded words/syllables)
§Cumbersome, must record each word for neutral, rising, falling intonation etc.

TTS vs. recorded speech
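
As a rough illustration of SSML prosody control, the Python sketch below builds a `<speak>` document containing one `<prosody>` element. The `prosody` element and its `rate`, `pitch`, and `volume` attributes are part of the SSML standard; the helper name `ssml_prompt` is made up, and which attribute values a particular TTS engine honours is engine-specific, so treat the values here as assumptions.

```python
import xml.etree.ElementTree as ET


def ssml_prompt(
    text: str,
    rate: str = "medium",
    pitch: str = "medium",
    volume: str = "medium",
) -> str:
    """Wrap text in an SSML <prosody> element inside <speak>."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(
        speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume}
    )
    prosody.text = text  # ElementTree escapes &, < and > in the text for us
    return ET.tostring(speak, encoding="unicode")


# A slower, lower, softer delivery for an empathetic reply.
print(ssml_prompt("I'm sorry to hear that.", rate="slow", pitch="low", volume="soft"))
```

Building the markup with an XML library rather than string concatenation avoids broken SSML when the prompt text contains characters such as `&` or `<`.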

§ 19 pieces of recorded content required for this sentence in the San Francisco Bay Area traffic system

§Next time you’re taking a National Rail train, listen out for the announcement. You can hear it’s stitched together from different recorded snippets!

Human recorded speech example

As of 10:18 AM, there’s a slowdown on highway
101 northbound, between in
Belmont and Dore avenue in San Mateo. Traffic is
moving between 25 and 30 miles per hour.

§AKA voice biometric authentication
§Touched on voice ID in the context of speaker identification to avoid triggering the VUI on someone else’s device
§Another use case is transcribing meeting / multi-party conversations, using voice ID to allocate turns to the correct speaker
§But it can in principle also be used for authentication for security
§E.g., instead of typing passwords
§Probably unlikely to see this anytime soon for consumer-grade smart speakers

Speaker verification

§ “being aware of what’s going on around the conversation, as well as things that have happened in the past”
§ Take advantage of basic context to make your VUI seem smarter as well as saving users’ time
§For example, you can greet the user according to the time of day in their time zone (“Good morning,” “Good afternoon,” etc.), or when they last logged on (“Welcome back”, “Haven’t seen you in a while”), use their location, etc.

§Track the stages required to complete the transaction
§Can allow for more flexible DM, see the Pizza example earlier
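
The greeting example can be sketched with the standard library alone. The cut-off hours, the 30-day "long absence" threshold, and the function name `greeting` are assumptions; `now` is expected to be already expressed in the user's local time zone.

```python
from datetime import datetime, timedelta
from typing import Optional


def greeting(now: datetime, last_login: Optional[datetime] = None) -> str:
    """Pick a greeting from the user's local time and how long they have been away."""
    if now.hour < 12:
        salutation = "Good morning"
    elif now.hour < 18:
        salutation = "Good afternoon"
    else:
        salutation = "Good evening"

    if last_login is None:
        return f"{salutation}."  # first visit: no history to refer to
    if now - last_login > timedelta(days=30):  # hypothetical threshold
        return f"{salutation}. Haven't seen you in a while."
    return f"{salutation}. Welcome back."


now = datetime(2024, 3, 4, 9, 30)
print(greeting(now, last_login=datetime(2024, 3, 1, 20, 0)))
```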

§We have talked about hybrid strategies of voice and visual output
§Could also have hybrid strategies of voice and visual input
§E.g., allowing the user to point on a map: “What’s the capital of this state?”
§Or pointing on a virtual chess board: “Move my knight here”
§Or have a VUI embedded in a website, similar to the chat widgets common on many websites

Advanced Multimodal

§Building your own language models, sources for bootstrapping
§Website data, e.g., FAQs, terminology, customer service forms
§Call center data, e.g., transcriptions of customer support calls
§Data collection, when there isn’t any data available to start from
§ Asking people the questions the VUI would ask
§ Transcribing what people would say
§Wizard-of-Oz is a common method, simulating the VUI to get the user to talk to it as if it were real
§Crowdsourcing, e.g., via Amazon Mechanical Turk (AMT), is another option

Bootstrapping Datasets

§VUI/VUX is not a solved problem
§Huge challenges around the gap between how people talk and the capabilities of ASR and NLU to recognize intent and respond appropriately

§There’s a lot designers can do to make a VUI as effective as possible despite the limitations of ASR and NLU
§Dialog Management is the overarching concept
§Disambiguation, error recovery, handling negation, etc.
§Overall, effective DM helps the user to provide expected/well-formatted input through good prompt and response design that enables the user to progress to successful completion

Conclusions

Reminder: register your Voiceflow

Voiceflow labs for CW2 start this week.
