A possible metric for Machine Translation?
During the Moncton conference, a few of us were discussing the
evaluation of machine translation systems.
One thing that occurred to me is that, in practice, comparison with
human translation, on which many rather shaky metrics are based, is
usually irrelevant and just tends to confuse the issue. Often, all one
is really interested in is whether the translation output is good enough
for some specific purpose.
However, it is also quite difficult, and perhaps costly, to set up a
valid evaluation based on whether the output allows a certain task to be
accomplished. And when it can be done, it's not exactly maximally
informative, since it only tells you about that specific task and gives
you no real clues about other tasks. (John and Kathi, this is an open
invitation to throw stones!)
So I was wondering whether there might be some feature (a quality
subcharacteristic, in ISO jargon) of MT systems that would serve as a
minimal indicator of quality, be pertinent to most applications of MT,
and be relatively easy to test.
One that springs to mind is preservation of truth values. I'm saying it
that way rather than saying anything grandiose like "fidelity" because I
really mean just whether positives stay positive and negatives stay
negative. If "The candidate assured the meeting that he would not agree
to a rise in taxation" gets translated as "The candidate promised not to
raise taxes", truth values are preserved, even though one could argue
about whether the translation was faithful. And I want to go for the
more modest truth preservation because I am not at all convinced that
one could find a good metric for fidelity as such. (I am prepared to
argue the point if necessary, but not now.)
It might be quite easy to set up a valid metric for truth preservation.
Take a population of readers standardized for a reasonable level of
intelligence, have them read the translation, and then ask them yes/no
questions of the sort "Did he say he would raise taxes?"
This would be reasonably cheap to set up, would not require the readers
to know the source language, and should be reasonably easy to validate
as a metric.
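To make the scoring concrete, here is a minimal sketch of how the reader answers might be turned into a single number. It rests on assumptions that are not part of the proposal above: each question is paired with the yes/no answer that the source text supports (worked out in advance, presumably by someone who can read the source), and the score is simply the proportion of reader answers that agree with it. The names `Question`, `source_answer` and `truth_preservation_score` are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    # Hypothetical field: the answer the SOURCE text supports (True = yes).
    source_answer: bool


def truth_preservation_score(questions, reader_answers):
    """Fraction of reader answers that agree with the source truth values.

    reader_answers holds one list of booleans per reader, in the same
    order as questions.
    """
    agree = 0
    total = 0
    for answers in reader_answers:
        if len(answers) != len(questions):
            raise ValueError("each reader must answer every question")
        for question, answer in zip(questions, answers):
            total += 1
            if answer == question.source_answer:
                agree += 1
    return agree / total if total else 0.0


# Usage with the taxation example (reader answers are invented for illustration).
questions = [
    Question("Did he say he would raise taxes?", source_answer=False),
    Question("Did he say anything about taxation?", source_answer=True),
]
readers = [[False, True], [False, True], [True, True]]
score = truth_preservation_score(questions, readers)
print(round(score, 3))  # → 0.833
```

A plain proportion is the simplest possible choice; one could equally report per-question agreement to see which truth values a given system tends to flip.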
Comments anybody?
Maghi
--
Please note my new e-mail address (old address was king@divsun.unige.ch)
Maghi King | E-mail: Margaret.King@issco.unige.ch
ISSCO, University of Geneva | WWW: http://issco-www.unige.ch/
54 route des Acacias | Tel: +41/22/705 71 14
CH-1227 GENEVA (Switzerland) | Fax: +41/22/300 10 86