Bart Jongejan
I have worked as a software developer at CST since
1997. I have a degree in physics from Utrecht University, but apart from one year
as a gymnasium teacher, I have never worked
as a physicist. Yet I think physics is great,
and I would choose to study physics again if I were given the opportunity. As a branch
of science, physics is populated with people saying extremely weird things in all
seriousness, far surpassing Harry Potter. Have a look at
this link, if it (and you) survive long enough.
Since the second half of the eighties I have mainly worked on developing software
that can handle natural language, which is fun because it always has an element
of unpredictability. The success of a program in Language Technology is almost always
measured in degrees, and it is great to deliver software that works, and works better
than anybody expected. This has happened to me a few times.
You can read my CV (English)
(Danish).
Software Development Projects (current and past):
(Private)
- Bracmat
-
Symbolic Computing.
After I learned programming around 1980, I made several attempts at writing a program for
doing Computer Algebra.
First in Fortran, then in Algol 60, Simula 67 and finally in Basic. I needed such
a program for doing lengthy symbolic computations in General Relativity. The first
working version was written in Basic for the
Amstrad 464, but in 1987 I became the proud owner of an
Acorn Archimedes 310 computer.
With the Archimedes and a real
ANSI-C compiler I was able to program Bracmat as a true 32-bit application
that could utilise all 4 megabytes of memory. I also made a 16-bit version that
ran in 640 KB under DOS. Soon the focus on Computer Algebra shifted to doing
symbolic computing in general. So today I use Bracmat for some language technology
tasks at CST.
Have a look at Bracmat's documentation,
which has been converted to HTML using Bracmat. I have also written a
quick introduction.
You can freely download Bracmat.
But be warned if you consider making changes to the source code: Bracmat's source
code is harder to understand than any other program I've written.
The name
Bracmat stems from a novel by the Norwegian-Danish writer Holberg.
A Journey to the World Under-Ground
From this Land of Atheists, I travell'd
on over a steep Mountain to the City of
Bracmat, which was situated in the Plain at
the Foot of the Mountain. The Inhabi-
tants are Junipers. The first Person I met,
came directly rushing at me, and threw me
backwards. I did not well understand this,
and asking the Reason of it, the Juniper
begg'd my Pardon a thousand Times. Pre-
sently after, another with a Staff he had in
his Hand, gave me a Blow upon the Reins
that almost took away my Senses : But in
the same Moment he made a long Harangue
to me in Excuse of his Imprudence. Sus-
pecting, therefore, this People to be either
totally blind, or very weak-sighted, I took
Care to avoid every one I met. In fact, all
this arose from the exquisite Sense of Sight
which some are here endued with. They
can clearly discern remote Objects, which
are impenetrable to vulgar Eyes; but then
they do not see what is nearer and almost at
hand. These are call'd Makatti ; and they
devote themselves principally to the Studies
of Metaphysicks and Astronomy. They
are of very little Service in the World, by
reason of their too delicate Vision. They
make very pretty minute Philosophers ; but in
solid Matters and Things of daily Use, they
commit innumerable Blunders. However,
the Government makes some Use of them,
and sends them to the Mines for the Disco-
very of Metals. For tho' they see scarce
any Thing upon the Surface of the Earth,
their Sight exerts itself upon any Thing be-
neath it. I concluded from hence, that
there are some who are blind from too great
a Delicacy in the Organs of Vision, and
that they would see better if their Eyes
were worse.
AMEV insurance company
- Name recogniser
-
In my first job as a programmer I developed a program that would take a text string
containing the name of one or more persons, the name of a company, of a foundation
or a club, or a combination of these. This little piece of free text could also
contain one or more persons' initials and titles.
This project gave me a taste of Language Technology. At AMEV we used PL/1 as the
corporate programming language, although the actuaries who lived two floors higher
up used Basic. PL/1 is fine, especially compared to its closest competitor: COBOL.
Did you know that you can define a string with a negative length in PL/1? Excellent
for eating the last characters in a string: just concatenate your string with the
negative-length string. It reminded me of antiparticles in physics. Except that
concatenating a string and an anti-string doesn't release a devastating amount of
energy.
The program grew out of a very fertile cooperation with a systems analyst who supplied
the test data and pointed out cases on which the program could be sharpened. Together
we kept pushing the quality to a higher level, resulting in a bigger reduction
of manual work than foreseen.
Utrecht University, Faculty of Humanities
- Celeste
-
A program for collating two texts word-by-word, jumping over text that has no counterpart
in the other text and able to jump backwards if a text fragment seems to have been
swapped with another.
The program runs under DOS, but in a graphical mode: CGA, EGA, VGA or Hercules.
It is almost my first C program and reflects my being new to DOS. For example, I
had heard rumours of a program on one of the computers at the University's computer centre
that could access the Brown Corpus, and that program was "multitasking", so I thought
multitasking was the right thing. So Celeste multitasks. You can, for example, read the
"help" text while the program is chugging its way through the text in the background.
Or you can move a cursor (it has two cursors!) to a position on the screen and read
the small fragments of text found by using the cursor's x and y coordinates
as offsets in the two documents. The most glorious aspect of the program, though,
is that you can peek at what the program is thinking, because it sprinkles little
stars over the screen at places where a match of the two texts might be considered,
and removes them again where such considerations are discarded. The name of the
program is derived from this attention-heightening effect.
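Word-by-word collation of the kind described for Celeste can be illustrated with a small Python sketch. This is not Celeste's algorithm: it leans on the standard library's difflib.SequenceMatcher, and the function name collate is invented for the illustration. It skips over words that have no counterpart in the other text, but unlike Celeste it does not jump backwards to recognise swapped fragments.

```python
from difflib import SequenceMatcher


def collate(text_a, text_b):
    """Align two texts word by word (illustrative sketch only).

    Returns a list of (tag, words_from_a, words_from_b) triples, where tag
    is one of 'equal', 'delete', 'insert' or 'replace'.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    aligned = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        aligned.append((tag, words_a[a1:a2], words_b[b1:b2]))
    return aligned


if __name__ == "__main__":
    for tag, left, right in collate("the quick brown fox",
                                    "the brown fox jumps"):
        print(tag, left, right)
```

A swapped fragment shows up here as a deletion in one place and an insertion in another, which is exactly the case Celeste handles more gracefully by jumping backwards.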
- Iconclass browser
-
Iconclass is a subject-specific classification
system that at the time existed only as a series of books. I developed the backend software
for accessing the classification tree. It remained in use for many years after I typed
the last semicolon on Friday 30 November 1990, the day before I moved to Denmark.
CRI, Computer Resources International (Denmark)
- SIMPR
-
Structured Information Management: Processing and Retrieval.
I rewrote Fred Karlsson's Constraint Grammar Parser in C++. My version is called
the "academic version", which seems flattering, but isn't. I have read that the
"production version", written sometime later by Pasi Tapanainen, is several times
faster. So either this guy has cut corners somewhere or he has simply outsmarted
me. I'm afraid the latter is the case. But I am quite happy with the fact that my
version is a few times faster than the original version, which was in LISP.
- KAVAS-2
-
Knowledge Acquisition, Visualization and Assessment System.
My introduction to Windows programming. I took care of integrating all project partners'
Windows programs into one application, so that the user had the feeling of interacting
with just one single program with a multiple document interface.
CST, Copenhagen University
- Scarrie
-
Scandinavian Proofreading Tools.
- TransRouter
-
TransRouter is a management tool that helps translation managers decide on
the best approach for carrying out their translation projects. For this project
I developed the repetitiveness checker.
- STAGING
-
Multimodal Communication in a Virtual Farm
For this project I developed, among other things, the communication manager, which
keeps track of the dialogue that is going on between the user and the virtual agent
on the screen, the farmer. Staging supported speech, a touch screen and a data glove
as inputs and showed a farm in a very simple graphical interface. The communication
manager was mainly written in Bracmat, the programming language described above.
- TQPro
-
Translation Quality for Professionals.
Once more I used Bracmat, this time to do a partial parse of a POS-tagged text in
order to find constructs that are notoriously difficult for machine translation
software to handle.
- Lemmatizer
-
CST's lemmatizer is an
example of brute-force Language Technology. You don't provide the lemmatizer with
a list of lemmatization rules. Instead you let the lemmatizer deduce rules from
a full-form word list that maps full forms onto the corresponding lemma form (e.g.
lasts -> last or children -> child).
Originally developed for Danish, the program is also used for a number of other
languages: Greek, English, Swedish, Norwegian, Icelandic and German. Theoretically
the lemmatizer's algorithm is not good for German (and Dutch), though. At least,
that was the situation before the Tvärsök 2 project (see below).
- Tvärsök 2
-
In this project I developed a new training algorithm for the lemmatizer that handles
prefixes, infixes and suffixes alike. For most inflected languages for which plenty of
training data are available, the results are quite good.
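The idea of letting a program deduce lemmatization rules from a full-form word list can be illustrated with a small Python sketch. It is not CST's actual training algorithm: it assumes the simplest possible rule shape (replace one word ending by another) and ignores prefixes and infixes. The function names suffix_rule, train and lemmatize are invented for the illustration.

```python
from collections import Counter, defaultdict


def suffix_rule(full_form, lemma):
    """Return (old_ending, new_ending) after stripping the common prefix."""
    i = 0
    while i < min(len(full_form), len(lemma)) and full_form[i] == lemma[i]:
        i += 1
    return full_form[i:], lemma[i:]


def train(pairs):
    """Collect ending-replacement rules from (full form, lemma) pairs."""
    rules = defaultdict(Counter)
    for full_form, lemma in pairs:
        old, new = suffix_rule(full_form, lemma)
        rules[old][new] += 1  # count how often each replacement was seen
    return rules


def lemmatize(word, rules):
    """Apply the rule with the longest matching ending; else return word."""
    for start in range(len(word) + 1):  # longest ending is tried first
        old = word[start:]
        if old in rules:
            new, _count = rules[old].most_common(1)[0]
            return word[:start] + new
    return word


if __name__ == "__main__":
    rules = train([("lasts", "last"), ("children", "child"), ("went", "go")])
    for w in ("cats", "children", "went", "dog"):
        print(w, "->", lemmatize(w, rules))
```

With only three training pairs the rules are crude (any word ending in "s" loses it), which is why a large full-form word list is needed for useful results.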
- MELFO
-
(Mobil e-Læring for Ordblinde - Mobile e-Learning for dyslexics)
At last a Language Technology project for a PDA-platform. My task was the implementation
of the software for accessing a bilingual term base.
- MELFA
-
(Mobile E Learning For Africa)
An exciting Danish-South African initiative offering mobile solutions for Literacy
Training and Skills Development.
- DK-CLARIN
-
(Common Language Resources and Technology Infrastructure)
I am interested in the foundations of physics. As part of my physics study I looked
at whether or not definitions of distance in cosmology allow for additivity (they
do) and at the direction of time (A-time) and the difference between past and future
(B-time) and the roles these concepts play in physics (almost none). In 1978 I followed
a course about the foundations of Quantum Physics. During the examination following
the course I could not explain Bell's proof of the non-existence of local hidden
variable theories that reproduce the predictions made by Quantum Mechanics. I got
stuck in that part of Bell's reasoning where he introduced counterfactual preparations
of measuring instruments. How can you do that without making assumptions about the
frame of reference from which to derive the coordinates of the counterfactual set-ups?
My work with cosmic distances, from which I had learned that one has to handle
frames of reference in curved space-time with the utmost care, was still very fresh in my
mind.
So I became hooked on the interpretation
of the formalism of quantum mechanics. Since that summer of 1978, I have not
been able to get rid of my doubts regarding Bell's proof. My main point of criticism
is that Bell's reasoning, which has now become mainstream, makes tacit but unwarranted
assumptions about the backdrop of the
Einstein-Podolsky-Rosen experiment (also known as the
Bohm-Aharonov experiment): space-time. As a counterexample, that is, a local hidden
variable theory that reproduces QM, I have developed a very primitive model of a
spin ℏ/2 particle as a space-time metric structure.
It is this model of a particle with spin that spurred my interest in Symbolic Computing
and that made me write Bracmat.
I am still following developments related to the issue raised by Einstein, mostly
by a quick daily check of http://arxiv.org/archive/quant-ph.
If you like, you can read my internet page that is dedicated to the EPR-experiment
and Bell's proof, the
final EPR experiment, which starts with a fictive experiment at a
truly galactic scale.
Links on arXiv.org:
- Space-Time Structure as Hidden
Variable. You can also download
this paper in a two-column format that includes the hi-res figures (stereograms!
Don't cross your eyes, stare: left picture = left eye, right picture = right eye.).
- On Bell's Paradox
Publications
See my Curriculum Vitae.
- Last Modified: 12 November 2008
Bart Jongejan
Center for Sprogteknologi,
Københavns Universitet,
Njalsgade 80,
DK-2300 Copenhagen S,
Denmark
bart at cst dot dk