Re: EAGLESEVAL: A question from George Doddington.
Regarding Doddington's question:
Application or user testing and technical testing serve two completely
different purposes. The user is not interested in how you get there - only
in how useful the thingamajig you are offering him is. For us as developers
and scientists, the technical testing is the testing. Neither can be said to
be better or easier - they are just apples and pears.
In most (technical) NLP testing we tend towards wishful thinking and to some
degree forget the state of the art - we want more from the system than the
technology can provide, because we are mostly researchers. When developing
you need to know where you are at, and there competition is healthy. In
science, however, there are things that have to brew by themselves, that
actually take much more effort, and for which a time versus performance
curve is perhaps not suited. If we are going to improve the performance of
NLP systems we need basic research that is left in peace (not completely -
it NEEDS funding).
In a user test there is no issue of generality - what is useful for one user
can be unusable for the next. A user test cannot be carried out by anyone who
doesn't have a need (or hasn't specified their needs) - nor is that test of
much relevance to another user unless the working environment and the list
of needs are very similar. A comparative test of different systems'
performance is useless for a user unless it is carried out on data that are
relevant for that user. This is because generality in language is a myth -
there are no general solutions in NLP unless they perform equally well on
all text, and that simply isn't the case. This is the reason why no system
could beat Systran in the EU Commission test - Systran was in that case
customized to the EU texts.
I would like to get back to an earlier point - you need to know where you
are at - and we also need to learn from each other. The point of entry for
good communication between the people working in a field is, I believe,
evaluation. There are negative aspects to the dog-runs the US is currently
operating, but there are also good aspects - perhaps we can do them better
than they do (sucker for competition) - or take a completely different
approach, like a panel of advisers evaluating the technology and telling us
where we are going wrong instead of saying you are number 1 - something we
feel awfully proud of but which doesn't say anything at all.
This is just brainstorming as usual - mostly I just want to let you know I
am still alive and nursing my multilingual baby ESTeam BTR on an island
called Syros (new company office :-) in the middle of the Mediterranean. The
baby has been tested in its skills and performs technically quite nicely for
a three year old. I am very proud of it but like all mothers I want it to do
Markou Botsari 15
144 62 Athens
tel +30 1 8085 704