EAGLES-II Evaluation Bibliography

This page contains bibliographic references of general interest to those involved in the evaluation of language engineering (LE) products and systems. The bibliography is under development and will be updated periodically.

Ackerman, A., Fowler, P. & Ebenau, R. (1984). Software inspection and the industrial production of software. Software Validation: Proceedings of the Symposium on Software Validation, pp. 13-40.

Ahmad, K., Holmes-Higgin, P., Rogers, M., Höge, M., Le-Hong, K., Huwig, C., Kese, R. & Mayer, R. (1993). User-driven software development: Translator's workbench - an exemplar case study, in M. Smith and G. Salvendy (eds), Proceedings of the Fifth International Conference on Human-Computer Interaction (HCI International '93), Orlando, Florida, August 8-13, Vol. 1, pp. 319-324.

Albisser, D. (1993). Evaluation of MT Systems at the Union Bank of Switzerland. Machine Translation, 8(1,2):2-28.

ALPAC (1966). Language and Machines: Computers in Translation and Linguistics. A Report by the Automatic Language Processing Advisory Committee, Division of Behavioural Sciences, National Academy of Sciences, National Research Council. Publication 1416. National Research Council, Washington, D.C.

Arnold, D., Humphreys, R. L. & Sadler, L. (eds) (1993). Special Issue on Evaluation of MT systems. Machine Translation, 8(1,2).

Arnold, D., Moffat, D., Sadler, L. & Way, A. (1993). Automatic Test Suite Generation. Machine Translation, 8(1,2):29-38.

Arnold, D., Sadler, L. & Humphreys, R. L. (1993). Evaluation: An Assessment. Machine Translation, 8(1,2):1-24.

Athappily, K. & Galbreath, R. (1986). Practical methodology simplifies DSS software evaluation process, Data Management 24(2): 10-28.

Balkan, L., Netter, K., Arnold, D. & Meijer, S. (1994). TSNLP - test suites for natural language processing, Proceedings of the Language Engineering Convention, ELSNET, Paris, pp. 17-22.

Bates, M. (1988). Reports on Evaluations of Natural Language Systems. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia, December 8-9, 1988.

Bates, M. & Ralph, W. (1987). Evaluating Natural Language Interfaces. Presented as a tutorial at the 25th Annual Meeting of the Association for Computational Linguistics, July 6, 1987, Stanford University. BBN Laboratories Inc.

Bates, M. (1988). Draft Corpus for testing Natural Language db query interfaces. Distributed at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia, December 8-9, 1988.

Battelle (1977). The Evaluation and Systems Analysis of the SYSTRAN Machine Translation System, RADC-TR-76-399 Final Technical Report, Battelle Columbus Laboratories, Rome Air Development Center, Air Force Systems Command, Griffiss Air Force Base, New York.

Beerepoot-Sangen, Y. & Leentvaar-Leistra, G. (1991). Consument en produktkwaliteit (Consumer and product quality). Kluwer, Deventer.

Belonogov, G. G., Kuznetsov, B. A. & Krichevskij, V. K. (1986). Evaluer l'efficacité d'un système de recherche documentaire à l'indexation automatique (Evaluating the efficiency of an information retrieval system with automated indexing). Naucno-tehniceskaja informacija-Vsesojuznyj institut naucnoj i tehniceskoj informacii. Serija 2. Informacionnye processy i sistemy, ISSN 0548-0027, SUN, No. 8, pp. 6-13, CNRS-10522B.

Bevan, N. (1980). Human Factors in the Use of EURODICAUTOM and SYSTRAN. Second Report to the Commission of the European Communities CETIL/199/80. Luxembourg, May 1980.

Bevan, N. (1997). Quality in Use: Incorporating Human Factors into the software engineering lifecycle. Proceedings of the Third International Symposium and Forum on Software Engineering Standards, ISESS'97 conference, August 1997.

Bevan, N. & Curson, I. (1997). Methods of Measuring Usability. Proceedings of the Sixth IFIP Conference on Human-Computer Interaction, Sydney, Australia, July 1997.

Billmeier, R. (1982). Zu den linguistischen Grundlagen von SYSTRAN (On the linguistic foundations of SYSTRAN). Multilingua 5(4):83-96, Mouton Publishers.

Boggio, G. & Spachis-Papazois, E. (eds) (1984). Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht.

Boisen, S. & Bates, M. (1992). A practical methodology for the evaluation of spoken language systems, Proceedings of the Third Conference on Applied Natural Language Processing, Trento, pp. 162-169.

Bourbeau, L. (1990). Élaboration et mise au point d'une méthodologie d'évaluation linguistique de systèmes de traduction assistée par ordinateur (Development and refinement of a methodology for the linguistic evaluation of computer-assisted translation systems). Final report (Rapport final). Secrétariat d'État du Canada, Secteur Langues Officielles et Traduction, Direction de la Planification, Gestion et Technologie, Québec.

Box, J. (1979). Konsument en informatie - de rol van vergelijkend warenonderzoek (Consumer and information: the role of comparative product testing). Thesis, Delftse Universitaire Pers, Delft.

Bradford, J. (1982). A metric space defined on English and its relation to error correction. Proceedings of COLING-82, pp. 43-48.

Bruderer, H. E. (1978). Handbuch der maschinellen und maschinenunterstützten Sprachübersetzung (Handbook of machine and machine-aided translation). Verlag Dokumentation Saur KG, München.

Buchmann, B. & Warwick, S. (1985). Machine Translation. Pre-ALPAC History. Post-ALPAC Overview, ISSCO Working Papers Number 50, Fondazione Dalle Molle, Geneva.

Bukowski, J. (1987). Evaluating software test results: A new approach, Proceedings Annual Reliability and Maintainability Symposium, Philadelphia, USA, 27-29 January, pp. 369-375.

Card, S., Moran, T. & Newell, A. (1983). The psychology of human-computer interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.

Cary, R. & Sproles, G. (1978). Evaluating product testing methods: A theoretical framework, Home Economics Research Journal, 7: 66-75.

Caspari, G. (1987). Untersuchungen zu Bewertungskriterien für maschinell erstellte Übersetzungen (Studies on evaluation criteria for machine-produced translations). Unpublished diploma thesis (Diplomarbeit), Universität des Saarlandes.

CETIL (1979). Comments by Mr. Leamy on the B.M.v.D. evaluation report on Systran French-English, CETIL/159/79, Luxembourg.

CETIL (1979). The Development Potential of Systran in the European Commission, CEC Contract TH-17, Cambridge Research Unit, CETIL 153/79, Luxembourg.

CETIL (1979). Systran Evaluation and Comparison. Summary Report of Revisers' Comments on Machine Produced Translations. Working document for the CETIL meeting of 26-27 March 1979, CETIL/139/79, Luxembourg.

Chandler, R. (1989). Grammar problems?, Electric Word, Sept-Oct 1989.

Chinchor, N. (1991). MUC-3 evaluation metrics, Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA, pp. 17-24.

Commission of the European Communities, (1986). Communication to the Council Concerning a Community Plan of Action Relating to the Evaluation of Community Research and Development Activities for the Years 1987 to 1991. Com(86) final, Brussels, 20 November 1986.

Crellin, J., Horn, T. & Preece, J. (1990). Evaluating evaluation: A case study of the use of novel and conventional evaluation techniques in a small company, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 329-335.

Crook, M. N. & Bishop, H. P. (1965). Evaluation of Machine Translation, Final Report. The Institute for Psychological Research, Tufts University, April 1965.

Cude, B. (1980). An objective method of determining the relevancy of product characteristics. ACCI-Proceedings 1980, pp. 111-116.

Cuthbert, J. (1979). Testing for consumers, Proceedings of the First North American Conference of Consumer Product Testing, Ottawa. Consumers' Association of Canada, Ottawa, pp. 9-21.

Dahl, D. A., Hirschman, L. & Ball, C. N. (1988). Black Box Evaluation of PUNDIT. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia, December 8-9, 1988.

Damerau, F. (1980). The transformational question answering system: Description, operating experience and implications, Report RC8287, IBM Thomas J. Watson Research Center, Yorktown Heights, NY.

Deutsch, M. (1982). Software Verification and Validation, Englewood Cliffs, NJ 07632.

EWG (1996). EAGLES Evaluation Group. Final Report. Center for Sprogteknologi, Copenhagen, Denmark.

Ericsson, K. A. & Simon, H. A. (1984). Protocol Analysis: verbal reports as data, MIT Press, Cambridge, MA.

Fagan, M. (1976). Design and code inspections to reduce errors in program development, IBM Systems Journal 15(3).

Falkedal, K. (1994). Evaluation methods for machine translation systems: An historical overview and critical account, ISSCO draft report, University of Geneva, Geneva.

Falkedal, K. (ed.) (1994). Proceedings of the evaluators' forum, Les Rasses, ISSCO, University of Geneva, Geneva.

Fasella, P. (1984). The Evaluation of the European Community's Research and Development Programs, in G. Boggio et al. (eds), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 3-13.

Flank, S., Temin, A., Blejer, H., Kehler, A. & Greenstein, S. (1993). Module-Level Testing for Natural Language Understanding. Machine Translation, 8(1,2):39-48.

Flickinger, D., Nerbonne, J., Sag, I. & Wasow, T. (1987). Toward Evaluation of NLP Systems. Unpublished. Paper presented at the Forum of the Association for Computational Linguistics, 6 July 1987, Stanford University.

Fulford, H. & Höge, M. (1989). Preliminary study of user requirements - methods of investigation, Internal report of the ESPRIT II project 2315 translator's workbench (TWB), University of Surrey, Stuttgart and Guildford.

Fulford, H., Höge, M. & Ahmad, K. (1990). User requirements study, Final report of the ESPRIT II project 2315 translator's workbench (TWB), EC, Stuttgart and Guildford.

Fundingsland, O. T. (1984). Perspectives on Evaluating Federally Sponsored Research and Development in the United States, in G. Boggio et al. (eds), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 105-114.

Færch, C., Haastrup, K. & Phillipson, R. (1984). Learner Language and Language Learning, Nordisk Forlag, Copenhagen.

Geistfield, L. V., Sproles, G. B. & Badenhop, S. B. (1977). The concept and measurement of a hierarchy of product characteristics, Advances in Consumer Research IV: 302-307.

Gershman, A. (1988). Evaluation of Natural Language Processing Systems. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia, December 8-9, 1988.

Gervais, A. (1980). Evaluation du système-pilote de traduction automatique TAUM-AVIATION (Evaluation of the TAUM-AVIATION pilot machine translation system). Final report (Rapport final), Bureau des traductions, Secrétariat d'État, Ottawa, Canada.

Granger, R. H. (1980). When expectation fails: towards a self-correcting inference system. AAAI-80, pp. 301-305.

Granger, R. H. (1983). The NOMAD system: expectation-based detection and correction of errors during understanding of syntactically and semantically ill-formed text. American Journal of Computational Linguistics, 9(3-4):188-196.

Groenenveld, J. (1984). Simple tests manual, Consumentenbond/IOCU, 's-Gravenhage.

Grosjean, F. (1988). Evaluating Natural Language Processing Products. Laboratoire de traitement du langage et de la parole, Université de Neuchâtel.

Grosjean, F. & Dommergues, J-Y. (1988). Evaluation du système de reconnaissance de parole RDP8-A de Systèmes G (Evaluation of the RDP8-A speech recognition system from Systèmes G). Laboratoire de traitement du langage et de la parole, Université de Neuchâtel.

Gruber, T. (1989). The acquisition of strategic knowledge, Academic Press, San Diego.

Guida, G. & Mauri, G. (1984). A Formal Basis for Performance Evaluation of Natural Language Understanding Systems. Computational Linguistics, 10(1):15-30.

Habermann, F. W. A. (1987). Erfahrungen mit maschinellen Übersetzungen im Kernforschungszentrum Karlsruhe (Experience with machine translation at the Karlsruhe Nuclear Research Centre). Talk presented at the annual meeting (Jahrestagung) of the Internationale Vereinigung Sprache und Wirtschaft, 1987.

Habermann, F. W. A. (1986). Provision and Use of Raw Machine Translation. Terminologie et traduction, special issue (numéro spécial) "World Systran Conference", 1:29-43.

Harman, D. (in press). The First Text REtrieval Conference (TREC-1), National Institute of Standards and Technology Special Publication 500-207, NIST, Gaithersburg, MD.

Hausen, H. (1984). Comments on practical constraints of software validation techniques, Proceedings of the Symposium on Software Validation, pp. 323-333.

Hausen, H. & Müllerburg, M. (1982). Kombination von Verfahren für die Software-Prüfung (Combining techniques for software testing), Internationaler Kongress für Datenverarbeitung und Informationstechnologie (IKD), pp. 111-125.

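Hausen, H., Müllerburg, M. & Schmidt, M. (1987). Über das Prüfen, Messen und Bewerten von Software. Methoden und Techniken der analytischen Software-Qualitätssicherung (On testing, measuring and assessing software: methods and techniques of analytical software quality assurance), Informatik-Spektrum 10(3): 123-144.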

Hays, D. G. & Mathias, J. (eds) (1976). FBIS Seminar on Machine Translation. Summary proceedings of a Seminar held at Rosslyn, Virginia, on 8-9 March 1976, organized by MRM Inc. for the U.S. Government Foreign Broadcast Information Service. American Journal of Computational Linguistics, Microfiches 46, 51

Hayward, S., Breuker, J. A. & Wielinga, B. J. (1987). The KADS methodology: Analysis and design for knowledge based systems, ESPRIT P1098 Deliverable Y1, STC Technology Ltd., Alborg.

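Heid, U. (1990). Evaluation und Verbesserung der Sprachrichtung Französisch-Deutsch des maschinellen Übersetzungssystems SYSTRAN (Evaluation and improvement of the French-German language pair of the SYSTRAN machine translation system). IMS report for the period 1 July 1989 to 30 April 1990. Preliminary version.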

Heid, U. (1988). Evaluation der französisch-deutschen SYSTRAN-Übersetzung (Evaluation of French-German SYSTRAN translation). Project outline (Vorhabenskizze), IMS, Stuttgart.

Hendry, D. G. & Green, T. R. G. (1993). Spelling mistakes: how well do correctors perform? In: Adjunct Proceedings of InterCHI'93.

Henisz-Dostert, B., Macdonald, R. R. & Zarechnak, M. (1979). Machine Translation, Trends in Linguistics. Studies and Monographs 11. Mouton Publishers.

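Hildenbrand, E. & Heid, U. (1990). Ansätze zur Ermittlung der linguistischen Leistungsfähigkeit von maschinellen Übersetzungssystemen. Zur Entwicklung von französisch-deutschem Testmaterial für SYSTRAN (Approaches to determining the linguistic performance of machine translation systems: on the development of French-German test material for SYSTRAN). Talk presented at the Linguistisches Kolloquium, Paderborn, September 1990.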

Hobbs, J. R. (1988). A Canonical Corpus. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia, December 8-9, 1988.

Hofmann, U. & Heino, H. (1992). Maschinelles Übersetzen: Vorteile und Grenzen (Machine translation: advantages and limitations), TEKOM Nachrichten der Gesellschaft für technische Kommunikation.

Höge, M., Hohmann, A. & Le-Hong, K. (1993). User-centered software development and evaluation, Poster Sessions. Abridged Proceedings of the fifth International Conference on Human-Computer Interaction, (HCI International '93), August 8 - 13, 1993, Orlando, Florida, p. 166.

Höge, M., Hohmann, A. & Mayer, R. (1992). Evaluation of TWB - operationalization and test results, Final report of the ESPRIT II project 2315 Translator's Workbench (TWB), Fraunhofer Society IAO and Mercedes-Benz AG, Stuttgart.

Höge, M., Hohmann, A., van der Horst, K., Evans, S. & Caeyers, H. (1993). User participation in the TWB II project - the first test cycle, Report of the ESPRIT II project 6005 Translator's Workbench II (TWB II), Mercedes-Benz AG, SITE and CEC Language Services, Stuttgart, Paris, Luxembourg.

Höge, M. & Kroupa, E. (1991). Towards the design of a translator's workstation - organisational background and user implications, in H.-J. Bullinger (ed.), Human Aspects in Computing: Design and Use of Interactive Systems and Information Management, 18B. Proceedings of the Fourth International Conference of Human-Computer Interaction, Stuttgart, Germany, Elsevier, Amsterdam, pp. 1036-1040.

Höge, M., Wiedenmann, O. & Kroupa, E. (1991). Evaluation of the TWB - theoretical framework and practical application, Report of the ESPRIT II project 2315 translator's workbench (TWB), EC, Stuttgart.

Hohmann, A., Le-Hong, K. & van der Horst, K. (1994). User participation in the TWB II project - the second test cycle, Report of the ESPRIT II project 6005 Translator's Workbench II (TWB II), Mercedes-Benz AG and CEC Language Services, Stuttgart and Luxembourg.

Howden, W. (1980). Functional program testing, IEEE Transactions on Software Engineering 6: 162-169.

Humphreys, R. L. (1988). User-Oriented Evaluation of MT Systems. Working Papers in Language Processing: 16, Department of Language and Linguistics, University of Essex, December 1988.

Hutchins, W. J. (1986). Machine Translation, past, present, future. Ellis Horwood Limited, New York.

Ingria, R. J. P. (1989). Grammar Construction and Grammar Evaluation in the BBN Spoken Language System. Presented as a tutorial at the Pre-Glow Working Days in Computational Linguistics, OST, Utrecht State University, April 3rd, 1989. BBN Systems and Technologies Corporation, Cambridge, MA.

IOCU (1977). Comparative Testing Guide, IOCU Testing Committee IOCU, The Hague.

IOCU (1985). Guide to the Principles of Comparative Testing, IOCU Testing Committee IOCU, Penang.

Isabelle, P. & Bourbeau, L. (1988). TAUM-AVIATION: Its Technical Features and Some Experimental Results. In J. Slocum (ed.), Machine translation systems. Cambridge University Press.

ISO (1991). International Standard ISO/IEC 9126. Information technology -- Software product evaluation - Quality characteristics and guidelines for their use, International Organization for Standardization, International Electrotechnical Commission, Geneva.

Jackson, M. (1995). Problems and requirements, Proceedings of the Second IEEE International Symposium on Requirements Engineering, York, England, IEEE Computer Society Press, Los Alamitos, California, pp. 2-9.

Jarke, M., Turner, J. A., Stohr, E. A., Vassiliou, Y., White, N. H. & Michielsen, K. (1985). A Field Evaluation of Natural Language for Data Retrieval. IEEE Transactions on Software Engineering, SE-11(1): 97-113.

Jarke, M., Krause, J. & Vassiliou, Y. (1984). Studies in the Evaluation of a Domain-Independent Natural Language Query System. Cooperative Interactive Information Systems, Springer-Verlag.

JEIDA (1989). A Japanese View of Machine Translation in the Light of the Considerations and Recommendations Reported by ALPAC, USA. Japan Electronic Industry Development Association, Tokyo.

JEIDA (1992). JEIDA methodology and criteria on machine translation evaluation, JEIDA, Tokyo.

Jordan, P., Dorr, B. J. & Benoit, J.W. (1993). A First-Pass Approach for Evaluating Machine Translation Systems. Machine Translation, 8(1,2):49-58.

Jelinek, F. (1988). Evaluation of Grammar Quality. Distributed at the workshop on evaluation of natural language processing systems. Wayne, Philadelphia December 8-9, 1988.

Karat, C. (1990). Cost-benefit analysis of iterative usability testing, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, IFIP, pp. 351-356.

Karlgren, H. (1987). Good Use of Poor Translations. Introduction. Forum Inf. and Docum., 12 (4): 23-29.

Kawada, T., Amano, S. & Sakai, K. (1980). Linguistic error correction of Japanese sentences. COLING-80, pp. 257-261.

Kelly, I. D. K. (ed.) (1989). Progress in Machine Translation. Natural Language and Personal Computers. Papers from the International Conference in Machine Translation held by the Natural Language Translation Specialist Group of the British Computer Society at Cranfield Institute of Technology in February 1984. Sigma Press, Wilmslow, UK.

King, M. (ed.) (1987). Machine Translation Today. Edinburgh Information Technology Series 2, Edinburgh University Press.

King, M. (1989). A Practical Guide to the Evaluation of Machine Translation Systems. ISSCO, Geneva.

King, M. (1990). BABEL-Research: Auditor's Report. ISSCO, Geneva.

King, M. & Falkedal, K. (1990). Using Test Suites in Evaluation of Machine Translation Systems. Proceedings of COLING-90, Helsinki.

King, M. (1990). A Workshop on Evaluation: Background Paper. Proceedings from The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, Linguistic Research Center, University of Texas at Austin, 11-13 June 1990, pp. 255-259.

Kingscott, G. (1989). Applications of Machine Translation. Study for the Commission of the European Communities. Praetorius Limited, for the CEC, Nottingham, UK. September 1989.

Klein, F. (1988). Factors in the Evaluation of MT: A Pragmatic Approach. in Muriel Vasconcellos (ed.) Technology as Translation Strategy, American Translators Association Scholarly Monograph Series II, State University of New York at Binghamton (SUNY), pp. 198-202.

Knowles, F. (1979). Error analysis of Systran output - a suggested criterion for the 'internal' evaluation of translation quality and a possible corrective for system design. In B. M. Snell (ed.), Translating and the Computer, North-Holland Publishing Company, pp. 109-134.

Krause, J. (1980). Natural Language Access to Information Systems: an evaluation study of its acceptance by end users. Information Systems (ISSN 0306-4379), 5(4): 297-318. University of Regensburg.

Krauwer, S. (1993). Evaluation of MT Systems: A Programmatic View. Machine Translation, 8(1,2):59-66.

Kukich, K. (1992). Techniques for Automatically Correcting Words in Text. ACM Computing Surveys, 24(4): 377-438.

Lancaster, F. W., Rapport, R. L. & Penry, J. K. (1972). Evaluating the effectiveness of an on-line, natural language retrieval system. Grad. Sch. Libr. Sci., Vol. 8, No. 5. University of Illinois, Urbana, Illinois, pp. 223-245.

Laurian, A-M. (1984). Machine Translation: what type of post-editing on what type of documents for what type of users. COLING-84: Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, pp. 236-238.

Lawson, V. (1979). Tigers and Polar Bears. The Incorporated Linguist, 18(3).

Lawson, V. (ed.) (1982). Practical Experience of Machine Translation. North-Holland Publishing Company.

Leavitt, A. W., Gates, J. L. & Shannon, S. C. (1971). Machine Translation Quality and Production Process Evaluation. RADC-Technical Report-71-206, October 1971.

Le-Hong, K., Höge, M. & Hohmann, A. (1992). User's point of view of the translator's workbench. Translating and the Computer 14: Quality Standards and the Implementation of Technology in Translation. ASLIB, 10-11 November 1992, pp. 25-31.

Lehrberger, J. & Bourbeau, L. (1988). Machine Translation. Linguistic characteristics of MT systems and general methodology of evaluation. John Benjamins Publishing Company.

Leick, J. M. & Schroen, D. (1978). Quelques résultats statistiques d'une évaluation sommaire du système de traduction automatique (Some statistical results of a brief evaluation of the machine translation system). Systran information document, CETIL, CCE.

Lesmo, L. & Torasso, P. (1984). Interpreting syntactically ill-formed sentences. COLING-84: Proceedings of the Tenth International Conference on Computational Linguistics, Stanford University, pp. 534-539.

Levy, M. (1988). Implementation of a Computer-aided Translation Project at the Federal Government Translation Bureau in Canada, Presentation given at the 29th Annual ATA Conference, Seattle, October 1988.

Levy, M. (1989). Consolidating a Machine Translation Project at the Post-implementation Stage. Presentation given at the 30th Annual ATA Conference, Washington D. C., October 11-15, 1989. Secretary of State Department of Canada.

Lewis, D. D. (1988). Evaluation in Information Retrieval. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Lewis, J., Henry, S. & Mack, R. (1990). Integrated office software benchmarks: A case study, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 337-343.

Loffler-Laurian, A-M. (1983). Pour une typologie des erreurs dans la traduction automatique (Towards a typology of errors in machine translation). Multilingua 2(2):65-78, Mouton Publishers.

Macklovitch, E. (1989). Recent Canadian Experience in Machine Translation. in I. D. K. Kelly (ed.) Progress In Machine Translation. Natural Language and Personal Computers. Sigma Press Wilmslow, UK, pp. 59-67.

Macleod, M. (1996). Performance measurement and ecological validity. In P. Jordan (ed.), Usability Evaluation in Industry. Taylor and Francis, London.

Maegaard, B. (1997). Evaluation of Language Tools. Translating and the Computer, 19, ASLIB, London, 5p.

Manzi, S., King, M. & Douglas, S. (1996). Working towards user-oriented evaluation. Proceedings of the International Conference on Natural Language Processing and Industrial Applications (NLP+IA 96), Moncton, New Brunswick, Canada, pp. 155-160.

Maybury, M. T. (1990). Evaluation Spaces: A Framework for Evaluating Natural Language Generation Systems. AAAI-90 Workshop in Evaluating Natural Language Generation Systems.

Menzel, W. (1987). Automated reasoning about natural language correctness. Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, (EACL-87), University of Copenhagen, Denmark.

Miller, E. (1984). Quality management technology: Practical applications, Software Validation, pp. 255-266.

Miller, E. & Howden, W. (eds) (1981). Tutorial: Software Testing and Validation Techniques, IEEE, London.

Miller, G. A. & Beebe-Center, J. G. (1958). Some Psychological Methods for Evaluating the Quality of Translations. Mechanical Translation, 3:73-80.

Minnis, S. (1993). Constructive Machine Translation Evaluation. Machine Translation, 8(1,2):67-76.

Moll, T. & Ulich, E. (1988). Einige methodische Fragen in der Analyse von Mensch-Computer-Interaktion (Some methodological questions in the analysis of human-computer interaction), Zeitschrift für Arbeitswissenschaft 42(2): 70-76.

MUC-3 (1991). Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA.

Murine, G. & Carpenter, C. (1983). Applying software quality metrics, Proceedings from the ASQC Quality Congress, Transactions, Boston.

Musa, J., Iannino, A. & Okumoto, K. (1987). Software Reliability, Measurement, Prediction, Application, McGraw-Hill Book Co., New York.

Nagao, M., Tsuji, J. & Nakamura, J. (1988). The Japanese government project, in J. Slocum (ed.), Machine translation systems, CUP, Cambridge.

Neal, A. & Simons, R. (1985). Playback: A method for evaluating the usability of software and its documentation, Proceedings of the Anniversary Meeting 1985, User Friendly Computing September 23-27, 1985, Vol. 2, pp. 1051-1075.

Neal, J. G., Feit, E. L. & Montgomery, C. A. (1993). Benchmark Investigation/Identification Project. Machine Translation, 8(1,2):77-84.

Nerbonne, J., Flickinger, D. & Wasow, T. (1988). The HP Labs Natural Language Evaluation Tool. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Nerbonne, J., Netter, K., Diagne, A. K., Klein, J. & Dickmann, L. (1993). A Diagnostic Tool for German Syntax. Machine Translation, 8(1,2):85-108.

Nirenburg, S. (ed.) (1987). Machine Translation. Theoretical and Methodological Issues. Studies in Natural Language Processing, Cambridge University Press.

Norman, D. (1985). Four stages of user's activities, Proceedings of Human-Computer Interaction - INTERACT '84.

Nuebel, R. (1997). End-to-end Evaluation in Verbmobil 1. Proceedings of Machine Translation Summit VI, San Diego, California, 29th October-1st November 1997.

O'Connell, T., O'Mara, F. & White, J. (1994). The ARPA MT evaluation methodologies: Evolution, lessons and further approaches, Proceedings of the First Conference of the Association for Machine Translation in the Americas, Columbia, U.S.A.

Orr, D. B. & Small, V. H. (1967). Comprehensibility of Machine-Aided Translations of Russian Scientific Documents. Mechanical Translation and Computational Linguistics, 10: 1-10.

Osterweil, L. (1984). Integrating the testing, analysis and debugging of programs, in H. Hausen (ed.), Software Validation, Amsterdam, North-Holland, pp. 73-93.

Paggio, P. & Underwood, N. L. (1998). Validating the TEMAA LE evaluation methodology: a case study on Danish spelling checkers. Natural Language Engineering. Cambridge University Press, in press.

Pallett, D. S. (1988). Types of evaluation methodology. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Palmer, M. & Finin, T. (1990). Workshop on the Evaluation of Natural Language Processing Systems. Computational Linguistics, 16(3):175-181.

Pankowicz, Z. L. (1978). Facts of Life in Assessment of Machine Translation, CEC, Luxembourg.

Pankowicz, Z. (1967). Commentary on ALPAC Report ("Language and Machines; Computers in Translation and Linguistics"). Griffiss Air Force Base, Rome Air Development Center, New York.

Pfafflin, S. M. (1965). Evaluation of Machine Translations by Reading Comprehension Tests and Subjective Judgements. Mechanical Translation, 8:2-8.

Pigott, I. M. (1989). Operational Machine Translation System, iesnews, 21, Luxembourg.

Raghavan, V. V., Bollmann, P. & Jung, G. S. (1989). Retrieval System Evaluation Using Recall and Precision: Problems and Answers. Proceedings of the 12th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR89), pp. 59-68.

Rahmstorf, G. & Rabinovitz, R. (1993). Better writing through electricity, PC Magazine May 1993: 147-200.

Read, W., et al. (1988). Evaluating Natural Language Systems: A Sourcebook Approach. COLING 1988: Proceedings of the 12th International Conference on Computational Linguistics, Budapest, pp. 530-534.

Roudaud, B., Puerta, M. C. & Gamrat, O. (1993). A Procedure for the Evaluation and Improvement of an MT System by the End-User. Machine Translation, 8(1,2):109-116.

Roukos, S. (1988). Performance evaluation in speech processing. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Rowe, N. (1982). On some arguable claims in B. Shneiderman's evaluation of natural language interaction with database systems. SIGMOD Record 13(1):92-97.

Rushinek, A. & Rushinek, S. (1985). Accounting and auditing software evaluation with knowledge based expert systems: An empirical multivariate model, Fourth Annual International Conference on Computers and Communications '85, Conference Proceedings, March 20-22, 1985, pp. 250-254.

Russo, J. E. (1988). Information processing from the consumer's perspective, Proceedings of the International Conference on Research in the Consumer Interest, pp. 185-217.

Sager, J. C. (1979). Text quality and cost-efficiency of translation (some tentative suggestions for diversification of the translation effort). Information Paper for CETIL CCE.

Salton, G. & Buckley, C. (1990). An Evaluation of Text Matching Systems for Text Excerpts of Varying Scope, Technical Report no. TR 90-1134, June 1990, Department of Computer Science, Cornell University, Ithaca, N.Y.

Schmied, W.-S. & Winkler, H. (1989). Software-Qualität. Ausgewählte Methoden und Werkzeuge der Softwareprüfung (Software quality: selected methods and tools for software testing), Siemens-Schriftenreihe data praxis, Siemens, München.

Schuster, E. & Finin, T. W. (1985). VP2: the role of user modelling in correcting errors in second language learning. AISB-85, pp. 187-195.

Schuster, E. (1986). The role of native grammars in correcting errors in second language learning. Computational Intelligence 2(2):93-98.

Shinghal, R. (1982). An error correcting contextual algorithm for text recognition. Proceedings of the Fourth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pp. 66-70.

Shiwen, Y. (1993). Automatic Evaluation of Output Quality for Machine Translation Systems. Machine Translation, 8(1,2):117-126.

Silberer, G. (1985). The impact of comparative product testing upon consumers: Selected findings of a research project, Journal of Consumer Policy 8: 1-27.

Sinaiko, W. H. & Klare, G. R. (1971). Further Experiments in Language Translation: Readability of Computer Translations. Institute for Defense Analyses, Arlington, Va. August and December 1971.

Slagle, J. & Wick, M. (1988). A method for evaluating expert system applications, AI Magazine 9.

Slocum, J. (1988). Evaluating Machine Translation Systems: a business viewpoint. Talk presented at the workshop on Evaluation of Natural Language Processing Systems, Wayne, Philadelphia December 8-9, 1988.

Slocum, J., et al. (1985). An Evaluation of METAL: the LRC Machine Translation System. Proceedings of the Second Conference of the European Chapter of the Association for Computational Linguistics, Geneva, pp. 62-69.

Slocum, J. (ed.) (1988). Machine translation systems. Cambridge University Press.

Sneed, H. (1987). Software-testen - state of the art, Software Entwicklungs-Systeme und Werkzeuge, 2. Kolloquium, 8-10 September 1987.

Snell, B. M. (ed.) (1979). Translating and the Computer. North-Holland Publishing Company.

Snow, J. A. (1984). Research and Development: Programs and Priorities in a United States Mission Agency. In G. Boggio et al. (eds), Evaluation of Research and Development. Methodologies for R&D Evaluation in the European Community Member States, the United States of America and Japan. Proceedings of the Seminar held in Brussels, Belgium, October 17-18, 1983. D. Reidel Publishing Company, Dordrecht, pp. 95-114.

Steenkamp, J. B. E. M. (1989). Product quality: an investigation into the concept and how it is perceived by consumers. van Gorcum, Assen/Maastricht.

Sondheimer, N. K. (1981). Evaluation of Natural Language Interfaces to Database Systems: A Panel Discussion. Proceedings of ACL 1981, p. 29.

Sparck Jones, K. & Galliers, J. R. (1996). Evaluating Natural Language Processing Systems. Springer-Verlag.

Sundheim, B. (1991). Overview of the third message understanding evaluation and conference, Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, San Mateo, CA, pp. 3-24.

Sydeserff, H. A., Caley, R. J., Isard, S. D., Jack, M. A., Monaghan, A. I. C. & Verhoeven, J. (1991). Evaluation of speech synthesis techniques in a comprehension task. Eurospeech 91: Proceedings of the Second European Conference on Speech Communication and Technology, Genoa.

Tansley, D. S. W. & Hayball, C. C. (1993). Knowledge Based Systems Analysis and Design: A KADS Developer's Handbook, Prentice Hall, Englewood Cliffs, NJ.

TEMAA (1996). TEMAA Final Report, LRE-62-070, March 1996. Center for Sprogteknologi, Copenhagen, Denmark. Electronic version also available from: http://www.cst.dk/temaa/D16/d16exp.html

Tennant, H. (1979). Experience with the Evaluation of Natural Language Question Answerers. Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, pp. 874-876.

Tennant, H. (1981). What Makes Evaluation Hard? Proceedings of ACL 1981, pp. 37-38.

Thaller, G. (1993). Qualitätsoptimierung der Software-Entwicklung. Das Capability Maturity Model (CMM), Verlag Vieweg, Braunschweig/Wiesbaden.

Thaller, G. (1994). Verifikation und Validation. Software Tests für Studenten und Praktiker, Vieweg, Braunschweig.

Thompson, B. H. (1981). Evaluation of Natural Language Interface to Data Base Systems. Proceedings of ACL 1981, pp. 39-42.

Thompson, H. S. (1989). Evaluation of phoneme lattices: Four methods compared. Proceedings of the Workshop on Speech Input/Output Assessment and Speech Databases, European Speech Communication Association, Brussels.

Thompson, H. S. (1991). Automatic evaluation of translation quality: Outline of methodology and report on pilot experiment. in Kirsten Falkedal (ed.) Proceedings of the Evaluators' Forum, ISSCO, Geneva.

Thompson, H. S. (ed.) (1992). The Strategic Role of Evaluation in Natural Language Processing and Speech Technology. Technical Report, May 1992, University of Edinburgh. Record of a workshop sponsored by DANDI, ELSNET and HCRC.

Thompson, H. (1994). TEMAA: A testbed study of evaluation methodologies: Authoring aids, Proceedings of the Language Engineering Convention, ELSNET, Paris, pp. 147-148.

Thorelli, H. B. (1979). The future for consumer information systems, in W. L. Wilkie (ed.), Advances in Consumer Research, Vol. 6, Association for Consumer Research, Ann Arbor, pp. 227-232.

Toma, P. and LATSEC, Inc. (1976). SYSTRAN '76: A Brief Description of the Status, Applications, Configuration, and Components of the SYSTRAN Machine Translation System. SYS/001/76/5, LATSEC, Inc. La Jolla, California.

Turk, C. (1984). A correction NL mechanism. ECAI-84 pp. 225-226.

Turner, J. A., Jarke, M., Stohr, E. A., Vassiliou, Y. & White, N. H. (1982). Using Restricted Natural Language for Data Retrieval: A Plan for Field Evaluation. Presented at NYU Symposium on User Interfaces, May 1982.

Vainio-Larsson, A. (1990). Evaluating the usability of user interfaces: Research in practice, in D. Diaper, D. Gilmore, G. Cockton and B. Shackel (eds), Human Computer Interaction - INTERACT '90, Elsevier, Amsterdam, pp. 323-328.

Van Slype, G. (1978). Analyse des résultats de l'opération-pilote de pré-traduction automatique anglais-français, de janvier à mars 1978. Bureau Marcel van Dijk, CCE.

Van Slype, G. (1978). Note sur la méthodologie de notre deuxième évaluation de Systran anglais-français, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. (1978). Second Evaluation of the SYSTRAN Automatic Translation System, Draft Report, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. (1979). Critical study of methods for evaluating the quality of machine translation, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. (1979). Evaluation de la qualité de la traduction automatique, Rapport final sur Contract ML 9, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. (1979). First evaluation of the SYSTRAN French-English automatic translation system of the Commission of the European Communities. Draft Report, CCE, Luxembourg.

Van Slype, G. (1979). Première évaluation du système de traduction automatique SYSTRAN anglais - italien de la Commission des Communautés Européennes. Rapport final sur Contract ML 9, Bureau Marcel Van Dijk, Bruxelles and CCE.

Van Slype, G. (1982). Conception d'une méthodologie générale d'évaluation de la traduction automatique. Multilingua 1 (4): 221-237, Mouton Publishers.

Vasconcellos, M. (ed.) (1988). Technology as Translation Strategy. American Translators Association Scholarly Monograph Series, Vol II, State University of New York at Binghamton (SUNY).

Watters, P.A. & Patel, M. (1998). The iterative semantic processing paradigm: A dynamical systems metaphor for machine translation. Technical Report C/TR 98-05, Department of Computing, Macquarie University, Australia. Electronic version also available from: http://www.comp.mq.edu.au/~pwatters/ctr-9805.pdf

Whittaker, S. & Stenton, P. (1989). User studies and the design of Natural Language Systems. Proceedings of the Fourth Conference of the European Chapter of ACL (EACL-89), Manchester, pp. 116-123.

Whittaker, S. & Walker, M. (1989). Comparing two user-oriented database query languages: A field study, Technical report HPL-ISC-89-060, Hewlett Packard Laboratories, Bristol.

Wilks, Y. and LATSEC Inc. (1979). Comparative Translation Quality Analysis. Final Report. Contract F33657-77-C-0695, LATSEC Inc. La Jolla, California.

Wojcik, R. H., Harrison, P. & Bremer, J. (1993). Using bracketed parses to evaluate a grammar checking application. Proceedings of ACL93.

Woods, W. (1973). Progress in NLU -- an application to lunar geology, AFIPS Conference Proceedings 42, pp. 441-450.

Zarechnak, M. (1979). The history of machine translation. In B. Henisz-Dostert et al. (eds), Machine Translation. Trends in Linguistics. Studies and Monographs 11, Mouton Publishers, pp. 1-87.
