"EAGLES and Current Evaluation Practices"
Workshop
ETI, University of Geneva
September 8-9, 1998
For those of you who were unable to attend the workshop in person but would like to contribute your views and experiences, here is a fuller version of the final programme with links to some of the input that the various participants presented. This is not a report on the workshop itself; that will be made available at a later date.
September 8
1. Welcome, purpose of the workshop
-
This workshop aims to draw together the discussions of the EAGLES evaluation workshop held in Brussels in November 1997, of the LREC conference in Granada in May 1998, and of the electronic discussion list on evaluation, and to make the conclusions concrete in the form of a Handbook of Current Evaluation Practices.
2. Table of Contents of the EAGLES Handbook
3. Summary of the EAGLES Framework (Bente Maegaard)
-
A descriptive summary of the EAGLES framework as outlined in the EAGLES-I final report was given.
4. The new ISO standard (Maghi King)
5. The need for technology evaluation in language and speech engineering
(Patrick Paroubek)
-
Abstract. The ELSE project was presented, and Patrick Paroubek subsequently provided a brief summary of his presentation.
6. User profiling and requirements analysis (Nancy Underwood)
-
Based on Appendix C (Requirements analysis for linguistic engineering evaluation) of the EAGLES-I report, the EAGLES model was described. An annotated .rtf version of the slides is also available. The distinction between functional and non-functional requirements was taken up: in EAGLES, where we have concentrated on text transformation systems, functional requirements were seen as being bound to the top-level specifications on input/output. The question was raised as to whether this definition of functionality can be extended to other types of LE systems.
7. Metrics (Bente Maegaard)
-
Taking the EAGLES-I chapters on measures and methods as a starting point, this presentation discussed the notions of internal validity and external validity. The notion of typing of measures (cf. section 2.5.5) and its relation to typing of methods was also brought up.
8. Formalisation and automation and tools (Louis des Tombe, Steven
Krauwer)
-
Based on work in both the EAGLES (cf. section 2.2) and TEMAA projects, the questions of formalisation and automation were addressed, and a demonstration was given of the parameterisable testbed (PTB).
10. Test materials (Nancy Underwood)
-
In the event, this presentation was not given at the workshop due to lack of time. It is clearly closely related to the question of sharing resources and data, which was discussed on the second day. The chapter on test materials in the handbook is expected to be a somewhat updated version of section 2.6.3 in the EAGLES-I report. However, it may be that this should be integrated into the section on sharing resources. (For those interested, an annotated .rtf version of the slides is available.)
September 9
Case studies:
11. Activities and results of the ARISE project in the field of validation
(Marc Blasband)
-
Abstract. Marc Blasband presented the validation work carried out in the ARISE project.
12. Case studies: Spelling and grammar checkers (Nancy Underwood)
-
The application of the EAGLES/TEMAA framework to a concrete evaluation of Danish spelling checkers was described (a full description of this experiment, as well as an evaluation of Italian spelling checkers, can be found in "An Experimental Application of the TEMAA Evaluation Framework: Spelling Checkers"). In addition, preliminary work on applying the framework to grammar checkers was presented. See also Paggio, P. & N. L. Underwood (1998) Validating the TEMAA LE evaluation methodology: a case study on Danish spelling checkers. Natural Language Engineering, 4(3).
13. Discussion: Sharing resources - tools and data (introduction: Maghi
King)
-
As background to this, see the paper "Language Resources and Evaluation", which Margaret King presented at the Moncton Conference and which is available here as an .rtf file.
14. Current Practices
a. Short presentation of the Survey on Current Practices for the
Handbook (Maghi King)
b. Further contributions
MT Evaluation (contributions from Litton PRC):
Methodology from DARPA MT evaluation
The Machine Translation Functional Proficiency Scale
Parser Evaluation (contributions from John Carroll/Ted Briscoe)
-
Measuring parser accuracy is a difficult problem: a number of approaches have been proposed in the literature, but all suffer from drawbacks. In the paper Carroll, J., E. Briscoe & A. Sanfilippo (1998) "Parser evaluation: a survey and a new proposal", in Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain, 447-454 (available online at ftp://ftp.cogs.susx.ac.uk/pub/users/johnca/lre98-final.ps), we describe and justify a new (dependency-based) technique which overcomes some of the shortcomings of previous proposals. The technique draws on work on subcategorization standards developed within EAGLES by the lexicon/syntax interest group.
-
The proceedings of the workshop 'The Evaluation of Parsing Systems', held at the 1st International Conference on Language Resources and Evaluation, Granada, Spain, May 1998, are available (in printed form only) as a technical report (Cognitive Science Research Paper 489, School of Cognitive and Computing Sciences, University of Sussex, UK); click here for the table of contents.