MULTIMODAL CONVERSATIONAL AGENT

Multimodal conversational agent: "Activate distance control." "Did you mean activate adaptive cruise control?"

Brief

We were briefed to create the UI prototype of a vehicle-specific multimodal conversational agent that provides a single, human-like interface to support and onboard users in their everyday use of their vehicles.

Day 1

Research & challenge definition

There are two types of in-vehicle voice assistants: vehicle-specific assistants and third-party assistants. Only vehicle-specific assistants provide vehicle-related features. Currently, third-party assistants are the most commonly used:

62% of smartphone voice assistant users use them in their car.

Level of mental distraction: how does completing a task via voice interaction while driving impact our reaction time?
When using in-car voice-activated systems, high cognitive load, long interaction times, and errors cause mental distraction that can slow users' reaction time. A study ranking the mental distraction of voice-activated systems shows that, after finishing a task, drivers remain impaired for up to 27 seconds.

How to make in-car voice-activated systems an enjoyable and safe experience?

1- Minimise cognitive load
Users need to be highly focused on traffic while driving — safety is the top priority.

2- Adapt to users' level of experience
Learn from the users' behaviours when using the agent to adapt the content to their current levels of experience.

3- Communicate system state
Provide transitory and responsive feedback to be transparent about system and user actions.

4- Provide a multimodal experience
Combine voice, visual and touch interaction so users can pick the modality that best fits their current situation.

5- Display content contextually
Show users features based on their current context of use to help them become experienced users.

Day 2

Use case definition

Chart of scenarios

Set up complex feature

Situation: while driving
Request: user-driven
Level: beginner/expert
------
The user knows about a specific feature and wants to set it up. The setup requires multiple data inputs. While a beginner might follow step-by-step guidance, an expert would utter all the information at once (see the sketch below).

In our scenario, a beginner wants to set up the pre-heating feature step-by-step.
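To illustrate the two paths, here is a minimal slot-filling sketch (hypothetical slots and prompts, not our production logic): the agent asks a beginner for each missing value one by one, while an expert utterance that already fills every slot skips straight to confirmation.

```typescript
// Hypothetical slot-filling sketch for the pre-heating setup dialogue.
type Slot = "departureTime" | "targetTemperature" | "repeat";

interface PreHeatingRequest {
  departureTime?: string;     // e.g. "07:30"
  targetTemperature?: number; // e.g. 21 (°C)
  repeat?: "once" | "weekdays";
}

const prompts: Record<Slot, string> = {
  departureTime: "At what time do you want the car to be warm?",
  targetTemperature: "Which temperature should I set?",
  repeat: "Only once, or every weekday?",
};

// Returns the next prompt for a beginner, or null when every slot is filled
// (an expert utterance that provides everything goes straight to confirmation).
function nextPrompt(request: PreHeatingRequest): string | null {
  const missing = (Object.keys(prompts) as Slot[]).find(
    (slot) => request[slot] === undefined
  );
  return missing ? prompts[missing] : null;
}

// Beginner: "Set up pre-heating" -> agent asks for the departure time first.
console.log(nextPrompt({}));
// Expert: "Pre-heat to 21 degrees at 7:30 every weekday" -> null, ready to confirm.
console.log(nextPrompt({ departureTime: "07:30", targetTemperature: 21, repeat: "weekdays" }));
```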

Activate partially known feature

Situation: while driving
Request: user-driven
Level: experienced
------
The user wants to activate a known feature but can’t remember the exact name. The system gives (visual and voice) suggestions based on the user request.

In our scenario, an experienced user wants to activate Adaptive Cruise Control by saying “Distance control”.
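As a rough illustration of how such suggestions could be derived (a sketch with assumed aliases and scoring, not the actual matching logic), the request can be scored against known feature names and their common aliases; the best match above a threshold is read back as a "Did you mean ...?" suggestion.

```typescript
// Hypothetical keyword-overlap matcher for partially known feature names.
interface Feature {
  name: string;
  aliases: string[];
}

const features: Feature[] = [
  { name: "Adaptive Cruise Control", aliases: ["distance control", "follow distance"] },
  { name: "Lane Keeping Assist", aliases: ["lane assist"] },
];

// Score = share of request words that appear in the feature name or an alias.
function score(request: string, feature: Feature): number {
  const words = request.toLowerCase().split(/\s+/);
  const haystack = [feature.name, ...feature.aliases].join(" ").toLowerCase();
  const hits = words.filter((w) => haystack.includes(w)).length;
  return hits / words.length;
}

function suggest(request: string): string | null {
  const best = features
    .map((f) => ({ f, s: score(request, f) }))
    .sort((a, b) => b.s - a.s)[0];
  return best && best.s >= 0.5 ? `Did you mean activate ${best.f.name}?` : null;
}

console.log(suggest("Distance control")); // "Did you mean activate Adaptive Cruise Control?"
```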

Service notification

Situation: while driving or standing
Request: system-driven
Level: beginner
------
The system shows a notification when a service light appears in the dashboard to support the user in solving the issue by providing multiple options.

In our scenario, a beginner is notified that the windshield washer fluid level is low and wants to add it to their upcoming service appointment.

Installation of removable parts

Situation: while standing
Request: user-driven
Level: beginner
------
The user wants to install a removable part but doesn't know, or can't exactly recall, the required steps and needs step-by-step guidance.

In our scenario, a beginner wants to install a security net (knowing that this is a multi-step process), and wants to start with step one.

Feature-hint notifications

Situation: while standing
Request: system-driven
Level: beginner
------
The system displays a notification about a vehicle feature that the user could have used in a specific situation — in particular, if using this feature eases the interaction with the vehicle.

In our scenario, a beginner is notified about the motion sensor trunk release after opening the trunk five times without using this feature.
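A minimal sketch of how such a hint trigger could work (assumed counter logic, not the production rule): count manual uses of an action that has a quicker alternative, raise the hint once a threshold is reached, and show it only once.

```typescript
// Hypothetical trigger for a one-time feature-hint notification.
interface HintRule {
  featureName: string; // the feature the user has not discovered yet
  threshold: number;   // manual uses before the hint is shown
  manualUses: number;
  hintShown: boolean;
}

const trunkHint: HintRule = {
  featureName: "motion sensor trunk release",
  threshold: 5,
  manualUses: 0,
  hintShown: false,
};

// Called every time the trunk is opened without the motion sensor.
function onManualUse(rule: HintRule): string | null {
  rule.manualUses += 1;
  if (!rule.hintShown && rule.manualUses >= rule.threshold) {
    rule.hintShown = true;
    return `Did you know? You can also use the ${rule.featureName}.`;
  }
  return null;
}

for (let i = 0; i < 6; i++) {
  const hint = onManualUse(trunkHint);
  if (hint) console.log(hint); // fires once, on the fifth manual opening
}
```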

Day 3 to 5

Prototyping

Scenario 1: set up the pre-heating feature step-by-step - MMI home screen, the conversational agent is inactive.

Scenario 1: set up the pre-heating feature step-by-step - the user has triggered the agent, which is now asking for the first piece of information needed to complete the task.

Scenario 1: set up the pre-heating feature step-by-step - the user is answering the agent.

Scenario 1: set up the pre-heating feature step-by-step - the agent is processing the information and the user gets visual feedback on which information they provided.

Scenario 1: set up the pre-heating feature step-by-step - the agent confirms success by voice and asks for the next piece of information, while the screen shows a summary of the previous selection and the next one to be made.

Scenario 1: set up the pre-heating feature step-by-step - at the end of the process, the agent gives a voice recap while the screen summarises all the information provided. The user has 5 seconds to change any settings before the feature is activated.

Following week

In house testing

Picture of the user-test set-up: two people looking at a prototype of a car dashboard.

For this user test, we welcomed 5 participants (2 female, 3 male) between 29 and 62 years old (mean age: 46.6 years). All participants had experience with voice interfaces (Alexa, Siri) and all of them were car owners.

Each session lasted 30 minutes per participant. All testing was conducted in German by one of our in-house researchers. The prototype ran on an iPad Pro 12.9". All voice input was simulated: the interviewer manually clicked a hidden button in the prototype to confirm the end of voice input and trigger the turn-taking, so that the system could start its output.

For voice output, we used pre-recorded audio generated with the Amazon Alexa text-to-speech engine. A simulated vehicle dashboard was shown, and for while-driving scenarios a TV displayed a recorded drive in city traffic.
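Conceptually, this Wizard-of-Oz setup boils down to a hidden trigger that ends the simulated listening state and plays a pre-recorded clip. A simplified sketch of that turn-taking logic (hypothetical names and file paths, browser-style audio playback assumed):

```typescript
// Hypothetical Wizard-of-Oz turn-taking: the interviewer's hidden button
// confirms the end of voice input and starts the pre-recorded agent output.
type AgentState = "idle" | "listening" | "speaking";

let state: AgentState = "listening";

function onHiddenButtonPressed(preRecordedClip: string): void {
  if (state !== "listening") return; // ignore accidental double taps
  state = "speaking";
  const audio = new Audio(preRecordedClip); // pre-rendered Alexa TTS file
  audio.onended = () => { state = "idle"; };
  audio.play();
}

// The interviewer taps the hidden button once the participant stops talking.
onHiddenButtonPressed("clips/preheating_step1_de.mp3");
```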

Main outcome

We found that there is a preference for voice input when driving, but the decision between voice and touch happens spontaneously depending on what is quicker. Most participants didn't change interaction modality mid-scenario.

Participants' interaction behaviour while driving seems to be based on how safe they feel, but we need in-situation testing to confirm this: in the lab it was difficult to differentiate the "while-driving" and "while-standing" states and their respective behaviours, because participants were looking heavily at the MMI. In-situation testing would help us learn more about the driving and standing modes (which it was too early to test).

2 out of 5 participants asked for the release date.

It's important to take the learning curve into consideration: there is a bit of a learning curve to the display, so it might make sense to start new users with simple interactions and requests and introduce more advanced ones as they learn the system.

Users want to use natural language to steer the system.