Answer to Patrick Paroubek

Patrick Paroubek asked:
>To what extent can the performance loss reported by Spoken Language
>Dialogue Systems developers when shifting from lab conditions to real
>world conditions be attributed to insufficient definition of measure
>taking conditions?

I see three types of problems when moving from the lab to the real world for spoken dialogue systems:

1. The measures

Technicians tend to measure under more optimistic conditions than practitioners in the real world. In Granada I gave two examples: erroneous confirmations that were counted as successes (because the database was accessed), and end-users playing with the system (which the technicians saw happen much more often than the callers recalled).

2. Service success rates

The second difference is that the technicians (rightly) evaluate their products against the specifications, whereas the users consider the reaction of their customers: the callers. The difference here is due to shortcomings in the specifications.

3. Caller behaviour

We do not know how callers behave in a real-world environment. We have observed so many differences between caller behaviour in the lab and in the real production world that (at this point in time) I want to discard any result obtained in the lab. The major differences were:

