Answer to Patrick Paroubek
Patrick Paroubek asked:
>To what extent can the performance loss reported by Spoken Language
>Dialogue Systems developers when shifting from lab conditions to real
>world conditions be attributed to insufficient definition of measure
>taking conditions?
I see three types of problems when moving from the lab to the
real world for spoken dialogue systems:
1. The measures
Technicians tend to measure under more optimistic conditions than
practitioners in the real world. In Granada I gave two examples: erroneous
confirmations that were counted as successes (because the database was
accessed), and end-users playing with the system (which the technicians saw
happen much more often than the callers recalled).
2. Service success rates
The second difference is that the technicians (rightly) evaluate their
products against the specifications, whereas the users consider the reaction
of their customers, the callers. The difference here is due to shortcomings
in the specifications.
3. Caller behaviour
We do not know how callers behave in a real-world environment. We have
observed so many differences between caller behaviour in the lab and in the
real production world that (at this point in time) I want to discard any
result obtained in the lab. The major differences were:
- 3% erroneous confirmations instead of 15% in the
lab
- 0% playing instead of 5-10% in the lab
- fewer correction turns in the dialogue (operator fallback
instead) than in the lab (where there was no operator fallback). This can be
attributed to the lab set-up, where operator fallback was too expensive to
deploy, but it remains a troublesome difference;
- the impact of explanations to the callers is much smaller
in the real world than in a lab environment;
- the irritation of the caller in case of a system error is
higher in a lab than in the real world.
Marc Blasband