TEMAA Final Report - Contents
Consortium
1 Introduction
1.1 Aims and contents of the report
1.2 Contributors
2 A model for NLP evaluation
2.1 The ISO 9126 evaluation framework
2.1.1 The EAGLES and TEMAA extensions to ISO 9126
2.2 Towards formalisation and automation
2.2.1 Key concepts in evaluation - a sketch for a formalisation
2.2.2 Parameterisable test bed
2.2.3 Concluding remarks
2.3 Components of the evaluation procedure
3 Case studies
3.1 Spelling checkers
3.1.1 Quality characteristics
3.1.1.1 Functionality
3.1.1.2 Reliability
3.1.1.3 Efficiency
3.1.1.4 Maintainability
3.1.1.5 Portability
3.1.1.6 Usability
3.1.1.7 Customisability
3.1.2 Reportable attributes
3.1.2.1 Functionality
3.1.2.1.1 Recall
3.1.2.1.2 Precision
3.1.2.1.3 Suggestion adequacy
3.1.2.2 Usability attributes
3.1.2.3 Customisability attributes
3.1.3 Evaluation measures for spelling checkers
3.1.3.1 Recall measures
3.1.3.2 Precision measures
3.1.3.3 Suggestion adequacy measures
3.1.3.4 Usability measures
3.1.3.5 Customisability measures
3.1.4 Evaluation methods for spelling checkers
3.1.4.1 Methods for constructing basic word lists
3.1.4.2 Methods for constructing error lists
3.1.4.3 Usability methods
3.1.4.4 Customisability methods
3.2 Grammar checkers
3.2.1 Quality characteristics
3.2.1.1 Functionality
3.2.1.2 Reliability
3.2.1.3 Efficiency
3.2.1.4 Maintainability
3.2.1.5 Usability
3.2.1.6 Customisability
3.2.2 Problem checking attributes
3.2.2.1 Recall
3.2.2.2 Precision
3.2.2.3 Suggestion adequacy
3.2.3 Problem checking measures
3.2.3.1 Recall
3.2.3.2 Precision
3.2.4 Problem checking methods: tools and test materials
3.2.4.1 Tools
3.2.4.2 Methods for the creation of test materials
3.3 Information retrieval
3.4 Commonality in case studies
3.4.1 Spelling checkers and grammar checkers
3.4.2 Information extraction projects
4 Overview of the Parameterisable Test Bed (PTB)
4.1 Introduction
4.2 Global structure of the TEMAA PTB
4.3 The internal structure of the PTB
4.3.1 Maintenance of the test bed
4.3.2 Collecting test data
4.3.3 Defining objects and users
4.3.4 Evaluating object instances
4.4 Specific software programs
4.4.1 PTB
4.4.2 ET: Evaluator's Tool
4.4.3 ER: Evaluation Reporter
4.4.4 ASCC: Automated spelling checker checker
4.4.5 Errgen: a program for error generation
4.5 Concluding remarks
5 Reflections and perspectives
5.1 Reflections on the framework
5.2 User Profiles and Requirements
5.2.1 User profiles as a compositional tree of weighted attribute value specifications
5.2.2 Evaluation and the customer
5.2.3 Profiling, requirements, and reusability
5.3 PTB methodology
5.3.1 Infrastructure and technical issues
5.3.2 Support for method library (re)use
5.3.3 Different kinds of methods
5.3.4 Support for requirements capture
References