A Knowledge Base and Question Answering System Based on Loglan and English
by
Sheldon Linker
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree Doctor of Philosophy
Auburn, Alabama
May 9, 2011
Keywords: knowledge base, data base, artificial intelligence
© 2005-2011 by Linker Systems, Inc.
Some material © 1982-1995 by The Loglan Institute, Inc. (used with permission)
Patent pending
Approved by
Cheryl Seals, Chair, Associate Professor, Department of Computer Science and Software
Engineering
David Umphress, Associate Professor, Department of Computer Science and Software
Engineering
Sviatoslav Braynov, Assistant Professor, Computer Science Department, University of
Illinois at Springfield
Abstract
One of the "holy grails" of computational linguistics has been to have a machine
carry out a conversation, and to have some idea of what it is talking about. Loglan's
(Brown, 1960 & 1975) machine grammar (Linker, 1980) was a first attempt to carry out
such a project using a grammar which was unambiguous, yet able to encompass the
whole of human discourse. Writing a logical, speakable language with an SLR-1 (simple
left-to-right parsing, with one look-ahead) grammar, and then reducing that to a
functional form, results in a language which is hard to use for spoken logic and hard to
translate into. A more useful approach is to take the symbols of first-order,
second-order, and higher-order predicate logic, to use the word-classes of Loglan, to
build a functional form from those in combination, and then to work backward from such
a functional form to a speakable language as much like English and Loglan, in priority
order, as possible. Such a language is feasible, speakable, understandable, and useful
(Linker, 2007). The result was the JCB-English language.
The thesis presented herein is that JCB-English can be improved by a number of
means, making the language easier to learn and speak, more concise, and faster to
process. The research and development projects detailed herein are to produce an
improved version of the language, and the language processing system, which can be
effectively used for human and machine discourse, and a demonstration system, which
converses in this language, in such a way as to be useful in business and academia.
Acknowledgements
First and foremost, I'd like to thank my wife, who convinced me that I could do
this.
Just as important, I'd like to honor the memory of Professor James Cooke
Brown, who started this project in the year I was born.
Last, I'd like to thank the informative and encouraging faculties of Thomas Edison
State College, the University of Illinois at Springfield, and Auburn University.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Other Research into the Sapir-Whorf Hypothesis . . . . . . . . . . . . . . . 6
Logical Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Patents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Question Answering Systems and Related Programming . . . . . . . . . . . 11
Specifically Prolog-Based Systems . . . . . . . . . . . . . . . . . . . . . . . 16
Prolog and the Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
The Loglan Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Other Grammar Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
My Thesis, Put Simply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
The Work Done . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
The Research Phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
User-Interface Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Grammatical Improvement over previous JCB-English . . . . . . . . . . . . 27
New Logic Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Lexical Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Better Proof of Correctness and Speed than available in the Previous JCB-English . 31
Additional Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Performance Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Veracity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Vocabulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
The Choice of Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Knowledge Storage Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Server Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Speakability Improvement Comparisons . . . . . . . . . . . . . . . . . . . . 73
The Final Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
The language definition, in YACC . . . . . . . . . . . . . . . . . . . . . . . 76
The language definition, in BNF, with explanation following . . . . . . . . . 80
Explanation of the BNF grammar . . . . . . . . . . . . . . . . . . . . . . . 82
Acceptance Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Tests that Cannot be Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Tests that were Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Completion Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Research and Development . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Appendix I – Institutional Review Board Approval . . . . . . . . . . . . . . . . . 123
Appendix II – A Loglan Primer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Appendix III – Transformation of Loglan Into Functional Form . . . . . . . . . . . 126
The Starting Point: Loglan Grammar . . . . . . . . . . . . . . . . . . . . . 126
Changes Already Made . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Appendix IV – Previous Redesign of the Language . . . . . . . . . . . . . . . . . 137
The Semantics of Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
List of Tables
Summary/Comparison with other systems . . . . . . . . . . . . . . . . . . . . . . . 22
Pretty Little Girls School examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Statements and Results (Conversations) . . . . . . . . . . . . . . . . . . . . . . . . 40
Test cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Loglan Translations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
The Symbols of Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Items Inherited, or not, from Loglan . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
List of Figures
Proof flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
An Overview of Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
List of Abbreviations
BNF Backus-Naur Form
FAQ Frequently Asked Questions
JCB James Cooke Brown
KBMS Knowledge-base management system
PLGS The "Pretty Little Girls School" ambiguity problem
QA Question Answering system
SLR-1 Simple left-to-right parsing with a single look-ahead
SQL Structured (or Standard) Query Language
SVO A Subject-Verb-Object(s) sentence
VO A Verb-Object(s) sentence or fragment
YACC Yet Another Compiler Compiler
Chapter 1 – Introduction
Since the concept of machine intelligence was first made popular in English-
speaking countries by Karel Čapek (1920) in the play Rossum's Universal Robots, people
have been interested in the possibility of conversing with machines. In a seemingly
unrelated development, Sapir and Whorf are said to have developed a psychological
hypothesis that "there may be a linkage between the language one speaks and one's
patterns of thought" (Beeman, 1987). James Cooke Brown (1960 & 1975) invented the
Loglan language as a tool to investigate the Leibnitz conjecture by testing the Sapir-
Whorf hypothesis (see below). Loglan was to be a language which was complete enough
to express every thought-form expressible in every human language, have a sound set
pronounceable by everyone, be adjustable to test parts of the hypothesis, and be totally
unambiguous.
It appears that Dr. Brown's Loglan article in Scientific American in 1960 was a
turning point in linguistics. Dr. Brown, in his article, cites Leibnitz as the instigator of
this line of work:
The central notion underlying Leibnitz's vision may be stated in a question.
Is it true that the "rational power" of the human animal is in any significant
measure determined by the formal properties of the linguistic game it has
been taught to play?
Many years later, Dr. Brown, Michael Urban, and this author started an association aimed
at making Loglan provably unambiguous and at creating an SLR-1 (which stands for
"simple left-to-right parsing with one look-ahead") machine grammar for Loglan
(Linker, 1980). Since then, others have continued this line of research (Brown, 1999),
both at the Loglan Institute, with its Loglan language, and at the "rebel" offshoot, the
Logical Language Group (2007), with its competing language, Lojban, but have gone
only as far as verifying and graphing the grammar of this human-spoken language. The
Loglan Institute and the Logical Language Group were both effectively working on the
same research project, but had differences of opinion. Dr. Brown, the original principal
investigator, had one idea of where the research and design of the language should be
headed, and those who left to form the Logical Language Group had a differing opinion.
This author had (and maintains) yet a third opinion, but did not bring it to public light
during Dr. Brown's lifetime.
What of the Sapir-Whorf hypothesis? This author saw three significant effects
from learning Loglan and its grammar: (1) The ability to learn languages better and more
quickly, (2) the ability to work out a greater variety of problems without benefit of paper,
and (3) the sometimes-unfortunate ability to see dozens of ambiguous meanings in what
others see as having exactly one reasonable meaning. The best example of this sort of
ambiguity is what Dr. Brown called the "Pretty Little Girls School" problem, in which he
points out that the reference could be to a school for girls who are little and pretty, or a
pretty school for little girls, or 24 other meanings, including a school
owned by a very small girl.
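The grouping component of this ambiguity can be illustrated mechanically. The sketch below (in Python, purely illustrative; it is not part of the JCB-English system) enumerates the binary groupings of the four-word phrase. Grouping alone yields five readings; lexical choices then multiply that count toward the 26 meanings Brown describes.

```python
# Enumerate the binary groupings of "pretty little girls school".
# Grouping alone yields 5 readings for a four-word phrase (Catalan(3));
# further lexical choices multiply that count.

def groupings(words):
    """Return all fully parenthesized binary groupings of a word list."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):          # split point
        for left in groupings(words[:i]):
            for right in groupings(words[i:]):
                results.append(f"({left} {right})")
    return results

phrase = ["pretty", "little", "girls", "school"]
for g in groupings(phrase):
    print(g)
# e.g. "(((pretty little) girls) school)" vs. "(pretty (little (girls school)))"
```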
What about communications with computers? This author's research was
designed to use the predicate-calculus basis of Loglan to create the foundation of a
knowledge base integrated with a logic inference engine, one that answers questions
directly from its knowledge base or, when required, makes logical inferences and answers
questions. The previous research design was limited to a "back end", which
communicates in a language of its own. At this point, the JCB-English system can
sustain a conversation with a person, build a representative knowledge base, understand
the interrelationships, and provide answers to questions.
In this system, a full English-language conversation with the computer involves
the following steps:
• For some time to come, the person conversing with the computer will formulate
statements, commands, and questions in English. Example: "I like my cat."
• The user then translates these items into the controlled English described herein, known
as JCB-English. There are many similarities between JCB-English and naturally spoken
English; therefore there is little translation to be done. Example: "i like my cat".
Whenever JCB-English is used herein, it will be shown in the Chalkboard font (as
shown), to distinguish it from English.
• The JCB-English is currently entered into the system via keyboard input, but the
program is built to operate as a service. Operating as a service, the C program accepts
request packets from various front ends, one of which will be a web interface written as
a Java Servlet.
• A parser then translates the language into a data structure, which directly represents the
entered utterances.
• Rules of transformation then restructure the representation of the utterance into a more
basic logical structure, in certain grammatical cases detailed below. Example: "i like my
cat", when spoken by a user named "sheldon", internally becomes "both "SHELDON"
*OWN 1 *CAT// and "sheldon" *LIKE 1 *CAT//"
• Next, certain optimizations may take place to prune the data structure. Example:
Changing "both x and true" to "x". Commands are executed at this point.
• Statements and questions are evaluated by a knowledge-base management system
(KBMS). This results in statements and questions being rejected as false, rejected for
storage because the statement is known to be true, answered, or stored. A series of
provers is used. The Instant prover evaluates a statement to be possibly true or false on
its own. The Fast prover looks for simple facts that directly prove, disprove, or answer
a statement. The Slow prover performs a more complete proof.
• Lastly, the result is translated into JCB-English.
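The transformation and optimization steps above can be sketched as follows. This is a hedged illustration in Python, using nested tuples in place of the system's actual data structure; the operator names BOTH, OWN, LIKE, and TRUE are stand-ins, not the system's internal notation.

```python
# A sketch of the "transform" and "optimize" steps, using nested tuples
# as a stand-in for the real parse structure.  The operator names
# (BOTH, OWN, LIKE, TRUE) are illustrative, not the system's internals.

def expand_my(utterance, speaker):
    """Rewrite 'i like my cat' style possessives: 'my X' asserts both
    that the speaker owns an X and the original statement about it."""
    verb, subj, obj = utterance
    if isinstance(obj, tuple) and obj[0] == 'MY':
        noun = obj[1]
        return ('BOTH',
                ('OWN', speaker, noun),   # the ownership claim
                (verb, subj, noun))       # the original claim
    return utterance

def optimize(node):
    """Prune trivial logic, e.g. ('BOTH', x, 'TRUE') -> x."""
    if isinstance(node, tuple) and node[0] == 'BOTH':
        a, b = optimize(node[1]), optimize(node[2])
        if a == 'TRUE':
            return b
        if b == 'TRUE':
            return a
        return ('BOTH', a, b)
    return node

parsed = ('LIKE', 'sheldon', ('MY', 'cat'))
print(expand_my(parsed, 'sheldon'))
# ('BOTH', ('OWN', 'sheldon', 'cat'), ('LIKE', 'sheldon', 'cat'))
print(optimize(('BOTH', ('OWN', 'sheldon', 'cat'), 'TRUE')))
# ('OWN', 'sheldon', 'cat')
```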
In this author's previous research for this work, a reformulation of the grammar of
Loglan into an even more formal form was made, to prepare for the description and
implementation of a single-user, functional-form conversational knowledge base.
The JCB-English system was produced. As with most prototypical systems,
certain inadequacies in the language, and in the system implementing it, were observed.
This paper serves to correct those inadequacies, and to provide the speed, functionality,
and ease of use that will make JCB-English both a usable tool and a basis for even further
investigation and improvement.
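The Instant, Fast, and Slow provers described above suggest a cascade design: try the cheapest prover first, and fall through on an inconclusive result. A minimal sketch, with toy provers and an assumed depth bound standing in for the real components:

```python
# A sketch of the Instant/Fast/Slow prover cascade: try the cheapest
# prover first and fall through on an inconclusive result.  The three
# toy provers here are placeholders for the real ones.

facts = {('CAT', 'felix'), ('LIKE', 'sheldon', 'felix')}
rules = [  # conclusion <- premises (all variable-free, for simplicity)
    (('MAMMAL', 'felix'), [('CAT', 'felix')]),
]

def instant(goal):
    """Truths decidable from the goal's own form (e.g. tautologies)."""
    return True if goal == 'TRUE' else None       # None = inconclusive

def fast(goal):
    """Direct fact lookup, no inference."""
    return True if goal in facts else None

def slow(goal, depth=8):
    """Backward chaining over rules, with a depth bound so that a
    cyclic rule set cannot loop forever."""
    if depth == 0:
        return None
    if goal in facts:
        return True
    for conclusion, premises in rules:
        if conclusion == goal and all(slow(p, depth - 1) for p in premises):
            return True
    return None

def prove(goal):
    for prover in (instant, fast, slow):
        result = prover(goal)
        if result is not None:
            return result
    return None  # unknown

print(prove(('MAMMAL', 'felix')))   # True, derived by the Slow prover
```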
Chapter 2 – Literature Review
Introduction
In reading about question-answering systems, it seems that most of the work
concentrates on fitting data into an existing slot, as is the case in most database
systems (with manuals and descriptions too numerous to mention), or on finding text
containing the data, deciding which documents are likely to be relevant, and then
delivering a list of such documents, delivering the most likely text, or even trying to
construct an answer based on the likeliest data. It seems that there is almost no
information, though, on the idea of taking an unclassified datum and storing it in a
knowledge base in a self-defining manner. For instance, one can store "Patrick Henry
said 'give me liberty or give me death,'" as plain text, but the likelihood of a system
making enough sense of it to be useful is slim. With a little effort, one can translate it
to something more formal: "Patrick Henry desires that the British give him either liberty
or death", or, in JCB-English, ""patrick henry" desire the event "britain" give
"patrick henry" the event either "patrick henry" free or "patrick henry" dead".
There is much to be said, too, about the artificial formal languages, such as
Loglan and Lojban. Unfortunately, no real-world use has been made of these languages.
On the subject at hand, conversing with the computer in speakable formal
language, there is virtually nothing to be found. As Isaac Newton is oft quoted, "I stand
on the shoulders of giants." I stand there only that I may take a flying leap into new
territory. Below, the literature review proceeds in sections: other research into the
Sapir-Whorf hypothesis, logical languages, patents, question answering systems and
related programming, Prolog and the alternatives, and grammar work.
Other Research into the Sapir-Whorf Hypothesis
One of the best examples of the Sapir-Whorf hypothesis was shown in
experiments by Phillips & Boroditsky (2003). In the experiments, they showed that,
given pictures of various objects, people who spoke (in addition to English) a language in
which the object has a female "gender" saw the object as having female qualities.
Similarly, people who spoke a language in which the same object has a male gender saw
the object as having male qualities. This showed that language and thought are closely
related, and that, in this type of case, the native language does indeed drive certain
thought processes.
The Sapir-Whorf hypothesis is also being investigated as it relates to teaching.
Gao (2008) points out that since culture and language are intermixed, they should both be
part of a foreign language course, as language won't come easily without culture.
Logical Languages
The subject of machine conversation with people has been rampant in fiction.
However, true conversation on a meaningful basis with a computer requires both a formal
language and a machine understanding of context. The idea of a speakable formal
language was started by Prof. James Brown of the University of Florida, and described in
an article in Scientific American in 1960 (Brown, 1960). Later, books on the subject were
published (Brown, 1975 and 1978), and the language became the subject of web-based
publication and update (Jennings, 2006). Rather than this being the culmination of a
research project, things had only begun. Other projects, such as Lojban (LeChevalier,
2006) and Guaspi (Carter,
1991), branched out. Despite grand plans after the first machine grammar was delivered
(Linker, 1980 and 1981), there were many half-attempts (Goertzel, 2005) before a
machine conversationalist running a limited subset of Lojban was finally written (Speer
& Havasi, 2004). This was a great advance, but it still had its limitations, in that full
Lojban was not yet supported and, of course, the common English speaker would not be
able to make use of the system. It is also pleasing to note that the Lojban-speaking
program is happy with its work, as is evidenced by its use of the word "ua". Languages
partway between Lojban and English, such as "Lojban++", have also been proposed
(Goertzel, 2006).
This paper makes heavy use of the interrelatedness of Loglan, thought, and
higher-order logic. An excellent overview of all of these, plus the Sapir-Whorf hypothesis
appears in Lógica y Lenguajes (Logic and Languages; Laboreo, 2005).
Relatively recently, Norbert Fuchs and his colleagues (Fuchs, et al., 1999) at the
University of Zurich have developed a number of schemes, each layered on another, to
bring English to usability in logic. They have taken English, applied a large number of
rules against it, and limited it to a subset they call Attempto Controlled English, or ACE
(Fuchs, et al., 1999). ACE is a limited subset of English, but a very powerful one. The
description of the subset takes the form of a manual, part grammar lesson and part
programming manual. Attempto Controlled English follows English as closely as its
creators could manage; thus there are a very large number of context rules. This is a
very strong point of ACE, in that anyone can read ACE; however, the weak point is its
strong reliance on context rules, and the inherent ability to get into trouble with a
misplaced phrase. Given their language specification, the
authors next set out to develop the specifications for a Reasoner for ACE, or RACE
(Fuchs, et al., 2002). A thesis project was done by Hirtle (Hirtle, 2006) bringing these
components together. Unfortunately, ACE has no usable query language at this time, and
is limited to the description of facts. Hirtle's project and this one have the possibility of
being used together in the future, in that there is a possibility of compiling Attempto into
statements acceptable to this project's language or data-store. If so, this project's query
language might one day be used to extract information from the resulting, combined
knowledge base.
Patents
Some of the patents claiming to involve "universal languages" actually pertain to
internal computer coding, and not spoken languages, such as an intermediate compile-to
language (Goss et al., 1987), or suggested data structures for storing data for later
analysis (Jung, 2005).
There are also patents and patent applications which claim to disclose new
information or teach new methods, but do not, even though they have intriguing
introductions. These include universal language parsing, in which we are told how to
write a compiler (Bralich et al., 1999), a "cognitive processor" of knowledge bases or
which understands information (Stier & Haughton, 2002; Suda, 1994; Suda &
Jeyachandran, 2003), a logical agent (Jowel & Kessock, 2006), answering in English
(Chang, 2006), and even how to build your own fully functional android (Datig, 2002 and
2005).
There are a number of descriptions of question-and-answer methodologies, in
which the user presents a command or query, and the computer responds with a series of
questions of its own to narrow down the exact nature of the initial command or query.
These include pattern-matching to determine the information content and question, much
like a game of Twenty Questions (Schramm, 1987; Yano et al., 2002; Matheson, 2006;
Zhang & Yang, 2002).
Many patents discuss methods of speeding up processing, some more obvious
than others. These include parallelism (Dixon et al., 1992), optimization of the
information store (Kautz & Selman, 1993), and hashing (Miller et al., 2002; Brassell &
Miller, 2003). In a similar vein, several patents (Eldredge et al., 2004; McConnell &
Barklund, 2006; Spiegler & Gelbard, 2002) describe indexing systems that can be applied
to existing textual or other forms of data, for quicker retrieval.
A very common method of using natural language is to apply grammatical rules,
or to pick out sentence fragments or key words, so that the utterance or writing serves as
the basis for formulating a query, SQL or otherwise, which is then used to query a
data base or table (Lamberti et al., 1994; Schwartz, 1993 and 1998; Machihara et al.,
2001; Wyss et al., 2002; Hsu & Boonjing, 2002; Metcalf & Dingus, 2004; Sheu &
Kitazawa, 2004; Nakamura & Tate, 2005; Ejerhed, 2006; Rosser & Sturges, 2006; and
others too numerous to mention) or to perform some command activity (Firman, 1994;
Namba et al., 1996; Salanha et al., 2004; Hogenhout & Noble, 2006; Diederiks & Van De
Sluis, 2001; Ross et al., 2002 and 2005; Fain & Fain, 2002; Beauregard & Armijo-
Tamez, 2006; Dusan & Flanagan, 2002). A similar technique is to verify that the
utterance or writing matches a preformulated query template, and then to retrieve query
elements from the matching template zones (Appelt, 2003; Harrison et al., 2003; Agarwal
& Shahshahani, 2004; Williams & Hill, 2005).
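The template approach described in these patents can be sketched briefly: match the utterance against a preformulated pattern, and drop the captured zones into a parameterized SQL query. The table, template, and data below are invented for illustration.

```python
# A sketch of template-based query formulation: an utterance is matched
# against a preformulated pattern, and the captured zones become the
# parameters of a SQL query.  The table and template are invented.
import re
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE pets (owner TEXT, species TEXT)")
con.executemany("INSERT INTO pets VALUES (?, ?)",
                [('sheldon', 'cat'), ('alice', 'dog'), ('bob', 'cat')])

# template: "how many <species>s are there" -> a COUNT query on pets
TEMPLATE = re.compile(r"how many (\w+?)s are there", re.IGNORECASE)

def answer(utterance):
    m = TEMPLATE.fullmatch(utterance.strip())
    if not m:
        return None  # utterance fits no known template
    (count,) = con.execute(
        "SELECT COUNT(*) FROM pets WHERE species = ?",
        (m.group(1).lower(),)).fetchone()
    return count

print(answer("How many cats are there"))   # 2
```

An utterance that matches no template is simply rejected, which is the weakness the surrounding text notes: coverage is only as broad as the template set.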
Many systems retrieve information, pulling either entire documents or facts from
the documents. This searching can take place based on noun, keyword, or phrase
matching (Fujisawa et al., 1991, 1995, and 1996; Haszto et al., 2001; Ho & Tong, 2002;
Brown et al., 2003; Fung et al., 2004; Ejerhed, 2006; Tsourikov, 2002), actual grammar-
fragment matching (Kupiec, 1996 and 1997; Ford, 2003), statistical likelihoods of having
meaningful data (Ahamed, 1998), or a plurality of these techniques (Weber, 2002;
Scheneburg et al., 2002; Brody, 2004; Bennett, 2005). An adjunct method to these,
reading documents, placing search tags in them, and coming back for matching tags later
(Kasravi & Varadarajan, 2006; Pustejovsky & Ingria, 2001) is also described.
There are also a number of inventions dealing with presentation, allowing the
computer to present an improved user interface (including drawing faces [Guo et al.,
2003]) to humanize the conversation. This includes "human-like responses" (Armstrong,
1998; Hagen & Stefanik, 2005), emulation of an understood system, similar to the Eliza
program of the 1970s (Klipstein, 2001), text to speech (Epstein, 2002; Kobayashi et al.,
2004; Wang, 2006), and keeping the dialog on track (Coffman, 2003). As an alternative
approach, research has been done towards using more natural input. There are an ever-
increasing number of products that perform speech recognition (Gould et al., 1999;
Strong, 2001 & 2004; Romero, 2002; Bangalore et al., 2003; Wang et al., 2004). The use
of speech recognition would be nice in a follow-on to this project, but is not required.
Some systems (Moser et al., 2001; Sukehiro et al., 2004), rather than storing,
understanding, or retrieving information, simply translate it to another language.
One system (Hawkinson & Anderson, 2004) uses a tree-structured set of classes,
in which numeric class IDs exist in ranges, so that, for instance, Dog might have an index
in the Mammal range, and Mammal might have an index in the Animal range (as would
Dog). However, full predicate logic is impossible in this sort of configuration. Here is
a simple example: Consider Charles, Prince of Wales. He is a member of both the class
Men and the class Royalty. Any simple indexing scheme using numbers will not do:
Prince Charles cannot be assigned a numeric index which falls both in the
Men range and in the Royalty range. The current project, in contrast, uses logical
reasoning statements to derive such knowledge. For instance (shown in the English
equivalent), "A prince is the son of a king. A son is a male child. A king is royalty."
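A small sketch makes the limitation concrete. Assuming disjoint numeric ranges for the Men and Royalty classes (the ranges themselves are invented), no single index can encode membership in both:

```python
# A sketch of why single numeric class IDs fail under multiple class
# membership.  Each class owns a numeric interval, nested intervals
# encode is-a, and membership is an interval test -- which works only
# while every member has exactly one parent chain.  Ranges are invented.

RANGES = {
    'Men':     (0, 500),
    'Royalty': (2000, 3000),   # a different branch of the class tree
}

def is_member(index, cls):
    low, high = RANGES[cls]
    return low <= index < high

# Prince Charles would need an index inside both disjoint ranges:
candidates = [i for i in range(3000)
              if is_member(i, 'Men') and is_member(i, 'Royalty')]
print(candidates)   # [] -- no such index exists
```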
One system (Tunstall-Pedoe, 2006) uses "objects", which are for the most part
nouns or nominative phrases, but which may also be verbs or verb-like phrases. Objects
can be grouped to form facts, and facts are queryable. This sort of system can find facts
and negative facts quickly, but cannot reason out more complex problems.
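A minimal sketch of such a fact store, with invented facts: lookup by pattern is fast, but a fact is found only if it was stored verbatim, so nothing is derived.

```python
# A sketch of the object-grouping fact store described above: "objects"
# grouped into fact tuples, queryable by pattern.  Lookup is fast, but
# there is no inference -- a fact is found only if stored verbatim.

FACTS = {
    ('paris', 'is-capital-of', 'france'),
    ('france', 'is-in', 'europe'),
}

def query(pattern):
    """Match a fact pattern; None acts as a wildcard."""
    return [f for f in FACTS
            if all(p is None or p == v for p, v in zip(pattern, f))]

print(query(('paris', 'is-capital-of', None)))
# [('paris', 'is-capital-of', 'france')]
print(query(('paris', 'is-in', 'europe')))
# [] -- true in the world, but not derivable without chained reasoning
```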
Virtually all patent work dealing with question answering systems in the last few
years deals with delivering pages, paragraphs, or sentences from a library of purportedly
factual documents, rather than formulating answers from the facts themselves.
Question Answering Systems and Related Programming
In their review, Andrenucci and Sneiders (2005) point out a list of approaches
being used in question answering systems:
• Natural Language Processing maps user questions into a formal world
model and ensures the most reliable answers.
• Information Retrieval powered QA, together with NLP, focus on fact
extraction from large amounts of text.
• Template-based QA matches user questions to a number of templates
that cover often queried parts of the knowledge domain.
Despite these trends, this project will be using near-natural language data and queries.
The Pegasus language processor (Knöll & Mezini, 2006) uses natural language
(currently English and German), and interprets what is said as a series of imperatives. It
translates those imperatives to Java. The grammar parsing, of course, is dependent on the
input language. Within Pegasus, the basic unit is called an "idea". All in all, the result
looks much like COBOL.
The idea that there is a need for a special information retrieval language, perhaps
based on Loglan (Ohlman, 1961) or a logical subset of English (Cooper, 1964), is not
new. Indeed, these were proposed over 40 years ago.
There have been a number of discussions on the idea of retrieving documents, or
fractions thereof, based on input questions (Cardie et al., 2000; Radev et al., 2001; Brill et
al., 2002; Ramakrishnan et al., 2003; Sekine & Grishman, 2003). Some take the
additional step of allowing the user to narrow down the responses (Small, 2003), or by
template (Srihari, Rohini & Li, 2000) or pattern (Roussinov & Robles, 2004) matching,
or even reformulating the question to use other such systems as agents (Agichtein et al.,
2001 and 2004). Although there are many more articles in this category, the same data
tends to repeat, so no further mention of such articles will be made herein.
In the QA1 system, first-order predicate calculus was used to deal with list-
controlled data (Green & Raphael, 1968). List-controlled data is likely a good idea, since
the true structure and interrelatedness of the data (as relevant) will not be known until the
query is issued. However, it seems unlikely that first-order logic would be sufficient for
any but the most basic questions. QA1 has the advantage that it can quickly categorize
data, and thus search quickly, but the disadvantage of needing to categorize data at input
time. The language itself is very Lisp-like. A very similar system, MRPPS (Minker,
1977) uses a more traditional descriptive technique. In a later paper (1978), Minker
describes, in great detail, how one might go about writing a theorem prover into an
analysis engine. Since Prolog is available, this project will use Prolog for the time being,
but replacing Prolog with Minker's method would yield far greater control in tuning
performance. There are also methods whereby some rules can be quickly excluded in
such a prover (Joshi, 1977). Yet another system of this nature was created by Furbach et
al. (2008), but with the typical limitation of first-order logic, there is still much to be
desired.
Some systems (Reiter, 1977; Waltz, 1978; Kang, 2002) attempt to retrieve data
from a relational data-base system using natural language queries.
Similarly, but perhaps more cleverly, one system, QuASM, gathers data from
tabular information found on the web (Pinto, 2002). Such a technique could be used to
drop data into a strict-grammar system, in that a crawler could gather the tables, and a
template could be entered manually for the data, thus effectively adding all of that table's
data to the knowledge base. The technique is not a part of the current project, but could
be used to enhance future versions.
It has been pointed out (Lita & Carbonell, 2004) that entering data into a
question-answering system can be a time-consuming enterprise, and that the data will
thus be limited. A method of gaining large amounts of reliable data is proposed. While
this technique is not being incorporated into the current project, automatic data gathering
could be a useful, later addition. A similar technique, based on similarities
(Ramakrishnan, 2004) is also possible.
One innovative system, Cogex (Moldovan, 2003), transforms front-end natural-
language queries into a logical first-order predicate form. It does the same with back-end
documents. It then uses a theorem prover to find substantial matches, and then returns the
source sentence.
The writing of theorem provers has been proposed a number of times, in a number
of different ways. However, Prolog is cited as being useful directly as a theorem prover
(Loveland, 1986), albeit with a little work. One important source of Prolog techniques, as
well as a good list of pitfalls to avoid (such as ways a logic specification can infinite-
loop) appears in Prolog Programming for Artificial Intelligence (Bratko, 2001).
If a system is going to answer questions, then it needs to be trustworthy, or at least
report the trustworthiness level of its data, as Chen et al. (2006) point out. This is true
whether questions are answered by an automatic system, or by a person. It is for this sort
of reason that this project will involve rating the veracity of the information and its
answers.
Obviously, if the question-answering system can parse English (or some other
source language), then it will be fairly precise. Hao et al. (2006) show how to reasonably
parse a certain subset of English questions, giving reasonable answers to the most
frequently asked types of questions. This shows, more by omission than anything else, that a precise
language syntax is required for truly precise answers. That is the reason that the current
project first attempted the use of Loglan, and has since switched to Loglan-like English.
For the purposes of database retrieval, very limited subsets of English have been
proposed as a query language (Bernardi et al., 2007).
Some of the question-answering systems are really just FAQ systems, saving pairs
of questions and their answers, and either answering directly, or returning a saved
answer when the new question and the saved question are close enough (Wang et al.,
2006).
Burek et al. (2005) describe breaking a sentence down into components, based on
linked phrases. They give as an example, "What researchers work in the ALPHA project
financed by the Argentine government?" They show that this can be broken down into a
sentence describing the ALPHA project, and a question about the researchers on the
project. Of course, such a method can be applied recursively. This is the type of technique
which will be used when linking words like "my" appear in statements or questions in
this project.
Beierle (1993) describes a language, Lilog, which is similar to Loglan, first order
logic, and the limited English presented below. An example of some Anglicized Lilog is
"forall X: BUILDING(exists Y: TOWER part-of(X,Y) impl
CASTLE(X))". The same sentence in Loglan would look similar, but would be far
simpler to say and write. For instance, "forall X:" is "Ba go" in Loglan. In the limited
English presented below, the concept presented above, "for all X, if X is a building, and
there is some tower Y, which is a part of X, then X is a castle" would be expressed as
"for all x: if both x building and x contains at least 1 tower then x castle".
One system, Chanel (Kuhn & Di Mori, 1995), attempts to learn semantics and
grammar on the fly. Herein, though, a fixed grammar will be used.
Specifically Prolog-Based Systems
Baral and Tari (2006) present a project in which grammatical parsing is used to
formulate the data and the question into Prolog, and Prolog is then used to formulate the
answer to the question. This is an example of an application-specific question-answering
system, but what is still desired is a general-purpose question-answering system.
Additionally, if the data is to be large, the use of Prolog or its equivalent should be a last
resort.
In the work of Marchiori (2004), a project somewhat similar to the current project
is discussed. Like this one, an English-like syntax is developed, and Prolog is used to
perform the logic. However, the grammar presented is at once too loose ("JOHN IS
'tal like a tower'.") and too specific ("VERB represents
'http://www.w3.org/2003/m2#verb'."), so that general discourse would be
almost impossible. The same problem was found in attempts to use Loglan grammars.
Also, using Prolog for the first cut means that all responses are too slow. Thus, in this
project, a more extensive yet fixed grammar will be used, but with a self-extending
vocabulary. Additionally, there will be a fast (but limited) prover as the logic engine,
with Prolog as a back-up.
Similarly, Greetha & Subramanian (1990) describe a limited English sentence
structure that is not only understood in Prolog, but parsed in Prolog. An example given in
the work is "John opened the door with a key". Using Greetha & Subramanian's method,
the Prolog structure which is first developed is "sentence(agent(np(propernoun(
john))), vp(verb(opened), object(np(det(the), noun(door))),
instrument(pp(prep(with), np(art(a), noun(key))))))", which is then
simplified to "sentence(agent(np(john)), verb(opened), object(np(
the door)), instrument(pp(with a key)))". In the project described below,
the same original sentence is presented as 'before now open "john" 1 door/ 1 key',
because each predication (verb, for most practical purposes) carries positional arguments.
In this example, the translated phrase would be presented to Prolog in a form equivalent
to "time(T<0), open(john, qtty(1,door), qtty(1,key))". In Greetha &
Subramanian's work, the introduced function abbreviations "np", "vp", "det", "pp",
"prep", and "art" stand for Noun Phrase, Verb Phrase, DETerminant, Prepositional
Phrase, PREPosition, and ARTicle, respectively.
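The translation step used in this project can be sketched as follows. This is a minimal illustration in Python (the actual system is written in C), and the function name and argument representation here are invented for the example; only the output format follows the dissertation's 'before now open "john" 1 door / 1 key' example.

```python
# Hypothetical sketch: render a JCB-style predication, whose verb carries
# positional arguments, as a Prolog-like goal string.  A (quantity, noun)
# pair becomes a qtty/2 term; a past tense adds a time constraint.

def to_prolog_term(verb, args, tense=None):
    rendered = []
    for a in args:
        if isinstance(a, tuple):              # (quantity, noun) -> qtty(Q,N)
            rendered.append("qtty(%d,%s)" % a)
        else:
            rendered.append(str(a))
    goal = "%s(%s)" % (verb, ", ".join(rendered))
    if tense == "before now":                 # past tense -> time(T<0)
        return "time(T<0), " + goal
    return goal

# 'before now open "john" 1 door / 1 key'
print(to_prolog_term("open", ["john", (1, "door"), (1, "key")],
                     tense="before now"))
# -> time(T<0), open(john, qtty(1,door), qtty(1,key))
```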
The LogAnswer system (Furbach, 2008) parses the question for meaning and
formulates its own search plan using a Prolog program, and then does Google-like work,
in that it searches documents. Rather than formulating answers, it retrieves sentences or
passages, rating each for "Qualität" (quality).
Prolog and the Alternatives
Prolog is an obvious and popular choice for logic programming. Prolog is very
different from most other languages, in that it is almost entirely declarative rather than
procedural. This makes Prolog difficult to use, even for most experienced programmers.
A Prolog manual does not give an explanation of how one might go about actually using
Prolog for a project such as this. However, the book Prolog Programming for Artificial
Intelligence (Bratko, 2001) does just this. Some key points from Bratko's book apropos
to this project are:
• If a Prolog program or knowledge base defines a rule in terms of itself, directly or in a
loop, the program may enter an infinite loop. A trivial example of self-reference
is "a:-a." A trivial example of a loop is "a:-b. b:-a." Thus, in a project of this
nature, the program should guarantee that no such loop is passed to Prolog. This may
or may not prove practical in the given time constraints. If impractical, it should be a
future goal. (§2.6.1)
• Because of the way in which Prolog recurses, the ordering of the clauses passed to it
in the knowledge base can produce recursive, depth-first goals, which may prove
unsolvable. If a program and query are passed to Prolog, and Prolog infinite-loops
for this reason, the result will be a stack overflow message of some sort. In these
cases, the driving program (this project) can rearrange the Prolog program such that
recursion will be breadth-first. (§2.6.2)
• Exclusive paths (paths which, if followed, preclude other paths from being tried)
can be used to speed program execution, using the "!" operator. (§5.1.1)
• If exclusive paths are used, then criteria following the exclusive paths may be omitted,
much the same way that in C, one can change "if (x>0) y(); else if
(x<=0) z();" to "if (x>0) y(); else z();". (§5.1.2)
• Prolog uses a closed-world system. Anything that cannot be proven is
considered false. Thus, its use is limited. (§5.4) More on this below.
• Rather than having to generate a new program if the knowledge base is changed, the
Prolog predicates "assert" and "retract" can add and remove facts and rules.
Additionally, and for the purposes of optimization, "asserta" can add facts and rules
at the beginning of the consideration list. (§7.4)
• Prolog, in some implementations, has the ability to define a parser. It remains to be
seen whether the parsing ability is robust enough for the purposes of this project. If so,
some or all of the parser might be written in Prolog, rather than Java. (§21)
• Programs are often written in Prolog for rapid prototyping, and then rewritten in other
languages to execute quickly, once the methods or rules are locked in. (§23.1)
• A meta-level executive (in which one Prolog program controls the execution of
another) can be written almost trivially in Prolog. Use of this type of
facility allows various sorts of tracing, explanations of the methods and/or facts used
in a proof or determination, and the direction or limitation of depth of exploration.
(§23.2.1)
• For full theorem proving, rather than just determination of found or not found, Prolog
may have to be supplied with a transform function, giving it the explicit rules of
double negation, elimination, distribution, sub-expressions, and De Morgan's laws.
(§23.6)
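The loop check recommended in the first point above can be sketched as a cycle search over the rule-dependency graph. The sketch below is illustrative only: the (head, body-predicates) pair representation is invented for the example, and the real driving program would operate on its own parsed rule objects.

```python
# Sketch (assumed representation): each rule maps a head predicate to the
# predicates in its body, e.g. "a :- b." becomes ("a", ["b"]).  A cycle in
# this dependency graph ("a :- a." or "a :- b. b :- a.") can send Prolog
# into infinite recursion, so the rules are screened before being passed on.

def has_rule_cycle(rules):
    graph = {}
    for head, body in rules:
        graph.setdefault(head, set()).update(body)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):                      # iterative depth-first search
        stack = [(node, iter(graph.get(node, ())))]
        color[node] = GRAY
        while stack:
            n, it = stack[-1]
            advanced = False
            for m in it:
                if color.get(m, WHITE) == GRAY:
                    return True           # back edge: a cycle exists
                if color.get(m, WHITE) == WHITE:
                    color[m] = GRAY
                    stack.append((m, iter(graph.get(m, ()))))
                    advanced = True
                    break
            if not advanced:
                color[n] = BLACK          # fully explored, no cycle via n
                stack.pop()
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

print(has_rule_cycle([("a", ["a"])]))                 # True  (a :- a.)
print(has_rule_cycle([("a", ["b"]), ("b", ["a"])]))   # True  (a :- b. b :- a.)
print(has_rule_cycle([("a", ["b"]), ("b", ["c"])]))   # False
```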
Although Prolog is well-known, it suffers from the major drawback of being a
closed-world system, in which Yes is reported as "yes" but both Maybe and No are
reported as "no". This makes Prolog unsuitable as the full-fledged engine behind a
conversational knowledge base. Prolog is, however, optimal for a first-cut system, and
the debugging thereof, because it is known to work. A better alternative (Boley &
Sintek, 1995) is RelFun. RelFun solves the problem of the need for tri-state logic
("yes", "no", and "unproven") neatly:
Queries to RelFun differ only as follows: they return the truth-value "true"
instead of printing the answer "yes"; they signal failure by yielding the
truth-value "unknown" instead of printing "no". When we stay in the
relational realm of RelFun this makes not much of a difference since
"true" can be mapped to "yes" and "unknown" can be mapped to "no".
However, when proceeding to RelFun's functional realm, queries will be
able to return the third truth-value "false": this is to be mapped to those of
Prolog's "no" answers for which the closed-world assumption is justified.
In general, however, RelFun does not make the closed-world assumption,
and in the absence of explicit negative information modestly yields
"unknown" instead of "omnisciently" answering "no".
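The distinction RelFun draws can be illustrated with a small sketch. The fact-store representation below is invented for illustration and is not RelFun's actual mechanism; it only shows how explicit negative information separates "false" from "unknown".

```python
# Sketch (invented representation): an open-world wrapper over a simple
# fact store.  Under the closed-world assumption, anything unproven is
# "no"; RelFun-style open-world answering instead returns "unknown"
# unless explicit negative information justifies "false".

def closed_world_query(goal, facts):
    return "yes" if goal in facts else "no"

def open_world_query(goal, facts, negative_facts):
    if goal in facts:
        return "true"
    if goal in negative_facts:
        return "false"          # explicit negative information exists
    return "unknown"            # no "omniscient" no

facts = {("bird", "tweety")}
negs = {("bird", "rover")}

print(closed_world_query(("bird", "polly"), facts))       # no
print(open_world_query(("bird", "polly"), facts, negs))   # unknown
print(open_world_query(("bird", "rover"), facts, negs))   # false
print(open_world_query(("bird", "tweety"), facts, negs))  # true
```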
Other alternatives exist, too. Alternatives such as CP (for Conceptual
Programming) provide open-world facilities, but in a completely different manner. For
example (as shown by Hartley, 1986):
<>
<- [STATE: [PERSON: John] -
     (POSS) -> [BOOK: *b]],
<- [EVENT: [GIVE] -
     (AGT) -> [PERSON: John]
     (OBJ) -> [BOOK: *b]
     (RCPT) -> [PERSON: Mary]],
<- [STATE: [PERSON: Mary] -
     (POSS) -> [BOOK: *b]].
Another such alternative is OWL, the Web Ontology Language, which can be accessed from
Prolog (Matzner & Hitzler, 2006). Even more so than Prolog, OWL's differences from
the rest of the procedural and declarative languages make it difficult to use without a lot
of OWL experience.
Yet another example is the Lisp-like PowerLoom (Chalupsky, 2005). Although
PowerLoom differs significantly from Prolog, PowerLoom actually has a simpler syntax,
and a program written for Prolog could be quickly converted to PowerLoom. PowerLoom
has the advantage of running on a variety of platforms including Macintosh OS X. Given
these data, an extension of the program by porting from Prolog to PowerLoom must be
considered for a later phase (or the current project, if time allows).
The Loglan Grammar
The Loglan grammar deserves a full citation in this literature review because it
represents, to a large extent, two of the steps leading to this proposal: it was the
culmination of the project that led to this one, it was the basis for the first cut at this
proposal (in which Loglan was going to be the language in use), and it served as the
basis for the planning of the English subset in this proposal. The grammar (Brown, 1960; Linker,
1980; Prothero et al., 1994) encapsulates the whole of human language capabilities, in a
very small space. Rather than taking the space inline, the grammar and its derivations
appear in the appendices which follow. The 1994 Loglan Machine Grammar is ©1982-1994,
and is used herein with the express written permission of the publisher.
Other Grammar Work
Loglan and JCB-English (as defined below) both have a very limited set of
prepositions. In the future a great number of prepositions could be added to JCB-English.
Under the current design, descriptions of placement can be made, but not easily. In her
paper (2009), Lockwood describes a great number of ways in which language handling of
prepositions can work.
Work on the logic of tenses began thousands of years ago with Diodorus Cronus
(Galton, 2008), and has been formalized more recently by a number of researchers,
beginning with Prior in 1957. Such temporal logic is included here.
The tenses of possibility, such as "will", "may", "can", "must", and the like, are
sometimes known as "modes".
Table 1 – Summary/Comparison with other systems (major examples only; not a complete list)

Languages:
  JCB: Easy to learn; speakable; fully functional for logic definition and theorem
  proof. Now has a Machine Speaker.
  Loglan, Lojban: Hard to learn; speakable; fully functional for logic definition and
  theorem proof.
  Attempto: Very contextual, so very easy to violate the rules. Has a very limited
  Machine Speaker, RACE.
  Prolog: Hard for most people to learn and use; not speakable; fully functional for
  logic definition and theorem proof.
  SQL: Hard for most people to learn and use; not speakable; very fast for retrieval
  and association, but cannot apply logic.

Q&A systems:
  JCB: Uses unambiguous parsing. Has statements and questions. Answers questions
  with distinct answers. Is not domain-specific.
  Search engines, such as Google: Retrieve documents based on the words given.
  Questions are used only to pick words from.
  Template matchers: Retrieve answer templates based on certain linguistic "hits".
  Attempt to fill in the template from data.
  Structure-based systems: These systems use a number of grammar rules, but since
  English (and German) grammar is fluid, they take their best guess (highest
  grammatical point score) or statistically good guesses (from past satisfaction
  values) on matches.
  Q&A boards, such as Yahoo Answers: These systems rely on users to answer questions.
Chapter 3 ? My Thesis, Put Simply
It is possible and feasible to produce a language suitable for a briefly-trained
layman to use to enter knowledge into a knowledge base, and to retrieve knowledge from
that knowledge base. Further, it is possible and feasible to produce a language processor
matched to that language.
The motivation for this work is simple: to help realize the long-sought
conversational computer, and at the same time to produce a system that surpasses the
capabilities of simple search engines or data bases.
Chapter 4 ? The Work Done
The Research Phases
In the first phase of research into the design of JCB-English, a design course was
followed that didn't work out well, and that path was abandoned.
In the second phase of research into the design and implementation of JCB-
English, the design goals were met, and the outcome was successful. That design and
implementation led to this author's previous paper.
Once the author's Master's thesis was completed, further usage tests were
performed, in the way of usability and speed research. This research and these tests
indicated that a number of improvements could and should be made, and are listed below.
The research, design, and implementation goals formed the basis for the present effort.
Completeness
As originally designed, the JCB-English system had a plan calling for a "fast
prover" and a "slow prover". When an utterance is received, the fast prover
runs. If the fast prover returns True, False, or Answerable (with a proof text), then the
result is returned to the user right away. The fast prover operates by checking direct
implications. For each fact in the knowledge base, if that fact can (through direct
matching, and not logic manipulation other than decomposition) prove or disprove a
statement, or answer a question or query, then a result is in hand, and execution stops.
The slow prover was an uncompleted plan. It developed a Prolog program and query to
carry out the required logic, but never went so far as to deliver them to RelFun. RelFun is
much like Prolog, but rather than True and Unprovable, adds a False response. A
complete description can be found at http://relfun.org. Here, the "Slow Prover" was
completed, calling RelFun-like code for proof work. Originally, the Slow Prover was
built to translate the knowledge and question or candidate new knowledge into the Prolog
language for submission to RelFun for external processing. However, there were
problems in doing so. This author and the main author of RelFun worked together
telephonically to devise a solution. Some parts of the JCB language could not be handled
in RelFun, such as "There exists" clauses. The conclusion was that some of the
techniques used in RelFun and some of the techniques present in JCB-English would
have to be combined, resulting in the current Slow Prover. The Slow Prover, despite its
name, can operate fairly quickly. JCB-English has an Instant Prover component, used to
see if a statement is on its face true or false, which operates at O(1). The Fast Prover
operates at O(n), and is fairly incomplete. The Slow Prover can operate as slowly as
O(n!), but typically operates near O(n²). The Slow Prover can invoke the Medium
Prover to handle ∀ and ∃ statement evaluations within the broader investigation of the
statement in concert with the knowledge base.
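The tiered arrangement of provers described above can be sketched as follows. This is a schematic illustration in Python (the real system is written in C, with much richer statement objects); the tuple representation of statements and the prover names' interfaces are invented for the example.

```python
# Sketch (invented API): the escalation strategy described above.  Each
# prover returns "true", "false", or None, where None means "escalate
# to the next, slower prover".

def instant_prover(statement):
    # O(1): is the statement true or false on its face?
    if statement == ("const", True):
        return "true"
    if statement == ("const", False):
        return "false"
    return None

def fast_prover(statement, knowledge):
    # O(n): direct matching against stored facts, with no logical
    # manipulation other than recognizing an explicitly negated fact.
    if statement in knowledge:
        return "true"
    if ("not", statement) in knowledge:
        return "false"
    return None

def prove(statement, knowledge, slow_prover):
    result = instant_prover(statement)
    if result is None:
        result = fast_prover(statement, knowledge)
    if result is None:
        result = slow_prover(statement, knowledge)   # expensive fallback
    return result

kb = {("cat", "tom"), ("not", ("dog", "tom"))}
never_decides = lambda s, k: "unknown"               # stand-in Slow Prover

print(prove(("const", True), kb, never_decides))     # true   (Instant)
print(prove(("cat", "tom"), kb, never_decides))      # true   (Fast)
print(prove(("dog", "tom"), kb, never_decides))      # false  (Fast)
print(prove(("dog", "rex"), kb, never_decides))      # unknown (Slow)
```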
Speed
As Glöckner (2008) points out, speed is a major issue in question answering
systems, especially in systems that use parsing and/or proofs to do their work. Two
possible methods of increasing speed have been identified.
One speed improvement method was to add a "Medium Prover". The medium
prover is a step between use of RelFun-like code to execute full proofs (a slow process)
and the quick check provided by the "Fast Prover" as described above. The Medium
Prover evaluates ∀ and ∃, and the negation of these items, by enumerating all known items
into test sentences, and then calling the Fast Prover for each iteration.
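The Medium Prover's enumeration strategy can be sketched as follows. The representation is invented for illustration: quantified statements are given as a quantifier name plus a template function, and the Fast Prover shown here is a minimal direct-match stand-in.

```python
# Sketch (invented representation): evaluate a quantified statement by
# instantiating it with every known individual and calling a Fast-Prover-
# style direct matcher on each ground instance.

def fast_prover(statement, knowledge):
    if statement in knowledge:
        return "true"
    if ("not", statement) in knowledge:
        return "false"
    return None

def medium_prover(quantifier, template, individuals, knowledge):
    """template maps an individual to a ground statement."""
    results = [fast_prover(template(x), knowledge) for x in individuals]
    if quantifier == "exists":
        if "true" in results:
            return "true"                 # a witness was found
        if all(r == "false" for r in results):
            return "false"
    elif quantifier == "forall":
        if "false" in results:
            return "false"                # a counterexample was found
        if all(r == "true" for r in results):
            return "true"
    return None        # not decidable by enumeration; escalate to the Slow Prover

kb = {("bird", "tweety"), ("not", ("bird", "rover"))}
names = ["tweety", "rover"]
print(medium_prover("exists", lambda x: ("bird", x), names, kb))  # true
print(medium_prover("forall", lambda x: ("bird", x), names, kb))  # false
```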
Although it was planned that the Slow Prover should run on a second server while
the Medium and Fast Provers run, the Instant and Fast Provers that now run first can
complete in less time than it takes to move data to a secondary server, so the use of a
secondary server does not increase speed.
User-interface improvement
In the original testbed project, knowledge and input were both read from disk
files. The output elicited from the input was delivered to the Java console window, and
then knowledge was written to a new disk file for inspection. The second version,
completed after the previous paper, was a web interface, in which each user operates in an
independent "world". The goal at this stage was to have a usable web and service
interface, in which any number of users could log into the system and use it at the same
time, each sharing knowledge, but controlled by trust levels. Knowledge is read during
start-up, and rewritten to disk on a regular schedule, and once again on shut-down.
In the final system, there are three interfaces. The service program, running on a
server, can accept input from a single user, as if service-request packets were arriving,
and answer them one at a time. This allows for debugger-based testing. The service can,
of course, act as a true service, fielding packets and responding to them. The front-end
program appears as a web-page by responding with a web page to HTTP GET and POST
requests, acting as a broker for the service program. User state is maintained purely in
HTML, and the server program need maintain extremely little state information. The
knowledge base is currently stored as an array of objects in contiguous memory, and so
can be read and written very quickly. In this author's tests, read and write time were
unnoticeable.
Grammatical improvement over previous JCB-English
There is a very common form of speech in which we list a string of facts. For
instance, let's say we want to give facts A, B, C, and D. In English, the three main ways
of doing this are as four sentences, four paragraphs, or as a single-sentence list. It doesn't
matter which we use. In JCB-English, there were also three ways: as four transmissions,
four sub-utterances separated by the word "execute", or as a list of facts in a single
utterance, either as "both both both A and B and C and D" or "both A and both B
and both C and D". Either way is cumbersome. There is a difference between using
"execute" and "both", in that the "execute" method will accept A as true (in which
case the statement will be ignored), false (in which case the statement will be rejected), or
plausible (in which case the statement will be retained as knowledge), and then evaluate
B, C, and D in turn in the same way. The use of "both" means that the four putative facts
are evaluated as a single compound statement. If the statement is plausible, then each of
the four facts will be added to the knowledge base separately. If, as a whole, the
combined statement is false, then all four sub-facts will be rejected together. In order to
make the "both" form simpler to use, "also" is introduced, which acts through the
introduction of an additional production into the grammar.
This allows the four facts to be written as "A also B also C also D", and has the
same meaning as if "both" and "and" had been used. It is grammatically unambiguous
because it occurs at the outermost level of the grammar only, and thus cannot bind too
soon.
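The effect of the new production can be sketched with two small helpers. These functions are invented for illustration (the real parser works over a formal grammar, not string splitting); they only show the intended equivalence between the "also" form and the nested "both … and" form.

```python
# Sketch (invented helpers): "A also B also C also D" stands for
# "both both both A and B and C and D".  Because "also" occurs only at
# the outermost level of the grammar, it cannot bind too soon.

def split_also(utterance):
    return [part.strip() for part in utterance.split(" also ")]

def as_both_and(parts):
    # Left-fold the conjuncts into the older nested "both ... and" form.
    expr = parts[0]
    for p in parts[1:]:
        expr = "both %s and %s" % (expr, p)
    return expr

parts = split_also("A also B also C also D")
print(parts)               # ['A', 'B', 'C', 'D']
print(as_both_and(parts))  # both both both A and B and C and D
```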
In his original research on speakable unambiguous languages, James Brown
brought forth the "Pretty Little Girls' School" example, in which the phrase, known in the
Loglan and Lojban communities as "PLGS", has many meanings. The various meanings arise
from English's ambiguity in binding adjectives to other adjectives or nouns. For instance,
one meaning of "pretty little girls' school" has "pretty" modifying "little", meaning "little
in a pretty sort of way", and that construct modifying "girls", so that we mean "girls who
are little in a pretty sort of way", and finally having all of that modify "school", so that we
get "school for girls who are little in a pretty sort of way". In Loglan, where "pretty" is
"bilti", "little" is "cmalo", "girl" is "nirli", and "school" is "ckela", this first meaning is
translated as "bilti cmalo nirli ckela". In the previous version of JCB English, this
would translate as "adjective adjective adjective pretty modifies little modifies
girl modifies school". This is cumbersome.
Loglan provides for other orders of adjectival effects by providing "ge" and "gu".
In Loglan, adjectives normally associate from left to right. But, "ge" and "gu" form
parenthetical markers to limit or rearrange this association. Loglan allows for a missing
"gu" when it would occur after the predicate. For instance, "bilti ge cmalo nirli ckela
gu" and "bilti ge cmalo nirli ckela" are equivalent.
In order to make use of more manageable simple adjectives, an additional form
has been added, allowing the "adjective predicate" form, in addition to the previous
"adjective predication affects predication" form.
In any string of two or more predications, a predication to the left of another
modifies it as an adjective, binding right-to-left. When left-to-right associations are
desired, or arguments are required for adjectival phrases, the older, more verbose form
must be used.
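The right-to-left binding of the new simple-adjective form can be reconstructed as a right fold over the word string. The function below is an illustration only (the tuple output mirrors the functional forms in Table 2; the real parser produces its own predication objects).

```python
# Sketch: with the new "adjective predicate" form, a string of simple
# predications associates right-to-left, each word modifying everything
# to its right.  A right fold rebuilds the nested functional form.

from functools import reduce

def bind_right_to_left(words):
    # ["pretty", "little", "girl", "school"]
    #   -> ('pretty', ('little', ('girl', 'school')))
    return reduce(lambda right, left: (left, right), reversed(words))

print(bind_right_to_left(["pretty", "little", "girl", "school"]))
# -> ('pretty', ('little', ('girl', 'school')))
```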
Below are three of Loglan's 26 examples of "Pretty Little Girls' School" usage:
Table 2 – Pretty Little Girls' School Examples

Example 1
  Standard English: Pretty little girls' school
  Functional form: (((pretty little) girl) school)
  Previous JCB-English: adjective adjective adjective pretty affects little affects
  girls affects school
  New JCB-English: adjective adjective pretty little affects girl affects school

Example 2
  Standard English: Pretty little girls' school
  Functional form: (pretty (little (girl school)))
  Previous JCB-English: adjective pretty affects adjective adjective little affects
  girls affects school
  New JCB-English: pretty little girl school

Example 3
  Standard English: Pretty little girls' school
  Functional form: ((pretty (little girls)) school)
  Previous JCB-English: adjective pretty affects adjective little affects girl
  affects school
  New JCB-English: adjective pretty affects little girl affects school
New logic areas
For ease of writing the language, everything in older JCB-English was in Verb-
Subject-Object format, even though English is in Subject-Verb-Object format. In order to
simplify sentence writing, and have JCB-English look more like standard English,
subjects appear at the beginning of a sentence. However, when a predication is used as an
argument or adjective, or appears following a tense, it will continue to be in Verb-
Subject-Object argument form. In the first two cases, this is because English uses similar
formations. In the last case, this is to avoid ambiguity. Thus, "I like potatoes" could be
expressed as "i like the class potato" or "i *like the class *potato//".
Lexical improvement
In its first operational version, JCB English accepted facts (in particular) and the
basic concepts of the universe (in general) in the same manner: as statements. For a
complete language, such as English, this is often cumbersome. For instance, for the
concepts of big (or large) or little (or small), we might have to state the following (shown
in English, for clarity):
For all X, Y, and Z, all of the following is true: If X is bigger than Y, then
X is larger than Y. If X is smaller than Y, then X is littler than Y. If X is
bigger than Y, then Y is not bigger than X. If X is smaller than Y, then Y
is not smaller than X. If X is bigger than Y, then Y is smaller than X. If X
is smaller than Y, then Y is bigger than X. If X is bigger than Y, and Y is
bigger than Z, then X is bigger than Z. If X is smaller than Y, and Y is
smaller than Z, then X is smaller than Z.
For the concept of membership and exclusion, we might have to state items like these
(again shown in English):
For all X, all of the following is true: If X is a cat, then X is not a dog and
X is not a rabbit. If X is a dog, then X is not a cat and X is not a rabbit. If
X is a rabbit, then X is not a dog and X is not a cat. If X is a dog then X is
an animal. If X is a cat then X is an animal. If X is a rabbit then X is an
animal.
These are straightforward, but cumbersome. They would take time to load as part of the
knowledge base, and would take time to use as a part of the knowledge base during
evaluation. To avoid this verbosity and time, and to thwart other problems, the following
additions have been made to JCB-English:
Besides storing a knowledge base, JCB-English stores an additional set of
knowledge dictionary-like items describing the language and the basic concepts of the
universe, apart from facts about the world. This has been implemented as a series of
dictionary-defining commands. These commands allow the definition of chaining rules,
such as a>b>c and a=b=c, synonyms, antonyms, exclusive membership sets, and strict
dictionary items. (See below for a complete description of all grammar items, including
commands.) The defining rules shown above in English are greatly simplified using the
new dictionary-defining commands:
>bigger execute synonyms bigger larger execute antonyms bigger
smaller execute antonyms bigger littler execute set animal cat
execute set animal dog execute set animal rabbit
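The behavior of these dictionary-defining commands can be sketched with a miniature store. The class below is invented for illustration (the real system implements these as commands in the JCB-English grammar); it only shows how synonyms, antonyms, and exclusive membership sets replace the verbose axioms above.

```python
# Sketch (invented API): a miniature version of the dictionary-defining
# commands.  Synonyms share a canonical word, antonyms pair canonical
# words, and members of an exclusive set (cat/dog/rabbit under "animal")
# mutually exclude one another while implying the parent term.

class Dictionary:
    def __init__(self):
        self.synonym = {}     # word -> canonical word
        self.antonym = {}     # canonical word -> canonical opposite
        self.sets = {}        # parent -> set of exclusive members

    def canonical(self, w):
        return self.synonym.get(w, w)

    def synonyms(self, a, b):                 # "synonyms bigger larger"
        self.synonym[b] = self.canonical(a)

    def antonyms(self, a, b):                 # "antonyms bigger smaller"
        a, b = self.canonical(a), self.canonical(b)
        self.antonym[a] = b
        self.antonym[b] = a

    def define_set(self, parent, member):     # "set animal cat"
        self.sets.setdefault(parent, set()).add(member)

    def excludes(self, x, y):
        # Two distinct members of the same exclusive set exclude each other.
        return any(x in m and y in m and x != y for m in self.sets.values())

    def implies_parent(self, member, parent):
        return member in self.sets.get(parent, set())

d = Dictionary()
d.synonyms("bigger", "larger")
d.antonyms("bigger", "smaller")
for animal in ("cat", "dog", "rabbit"):
    d.define_set("animal", animal)

print(d.canonical("larger"))                  # bigger
print(d.antonym[d.canonical("larger")])       # smaller
print(d.excludes("cat", "dog"))               # True
print(d.implies_parent("rabbit", "animal"))   # True
```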
Better Proof of Correctness and Better Speed than available in the Previous JCB-English
The compiler originally used for this project was written by this researcher purely
in Java and tested using a test plan. The compiler was supplied with inputs, and behaved
in a manner matching the test plan. For this phase of research, the modern equivalent of
YACC, Bison (Free, 2010), was used to check the grammar for ambiguity (as it is
required that the grammar be unambiguous), and to check whether the grammar is
actually in the SLR-1 (simple left-to-right parsing with a single look-ahead) class of
languages. (Bison supports general language parsing, so SLR-1 is no longer required; but
whether or not a language is SLR-1 is a good measure of its simplicity.) The new
compiler was written in C, as was everything but the front-end Java servlet.
A novel technique was used to get the speed required for the system to be usable.
An advantage that C has over Java is that C programs can allocate millions of objects at a
time. In the tests run in this project, it was not unusual for a simple test run to generate
hundreds or even thousands of computational objects during a proof. In C, an object can
contain an array without needing a separate array object. If the objects themselves are
stored in arrays, and allocated a million at a time, then the C-based prover will perform a
memory allocation once per million objects used, as opposed to the Java program's
2,000,000 allocations (for the base class, plus the enclosed array) per million objects
used. The objects used in the JCB-English server have another difference from the
standard C++ and Java objects. Objects here are polymorphic. For instance, in the
previous version, Dyadic-Predication(Exclusively, ConstantPredication(True), Constant-
Predication(True)) is simplified by creating a new predication, Constant-
Predication(False), and by back-tracking to any predication that linked to the original,
and changing that link to bear the object number of the newly created object. Using
polymorphic objects, the object which started this process as a Predication-Predication is
changed into a Constant-Predication in place, saving even low-level allocation, some of
the garbage collection, and saving the need to back-track.
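The in-place simplification described above can be sketched as follows. The node structure and field names here are invented for illustration (the real objects are C structures); the point is that mutating the node's own fields makes the folded constant visible through every existing link, with no reallocation and no back-tracking to parents.

```python
# Sketch (invented structure): constant-folding a dyadic predication.
# Instead of allocating a new Constant node and re-pointing every parent,
# the node's own fields are rewritten in place.

class Node:
    def __init__(self, kind, op=None, left=None, right=None, value=None):
        self.kind, self.op = kind, op
        self.left, self.right, self.value = left, right, value

    def simplify_in_place(self):
        # Exclusively(True, True) is False: fold it without reallocating.
        if (self.kind == "dyadic" and self.op == "exclusively"
                and self.left.kind == "const" and self.right.kind == "const"):
            self.value = self.left.value != self.right.value  # exclusive-or
            self.kind, self.op, self.left, self.right = "const", None, None, None

xor = Node("dyadic", op="exclusively",
           left=Node("const", value=True), right=Node("const", value=True))
parent_link = xor            # any number of parents may hold this reference
xor.simplify_in_place()
print(parent_link.kind, parent_link.value)   # const False
```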
Additional Languages
An investigation was made into having JCB-English both take input in, and
produce output in, Spanish ("español-JCB"). One way of making a multilingual parser in
YACC is to have the lexer emit one or more tokens to the parser defining the language to
use (for instance ENGLISH or SPANISH). The lexer could also switch key-word tables.
However, a problem arises in making an attempt to input an unambiguous Spanish-like
language.
One major problem in accepting such a Spanish-like language is the manner in
which Spanish handles negation. In English, "I like tomatoes" is the opposite of both "I
like no tomatoes" and "I don't like tomatoes", and the same as the grammatically horrible
"I don't like no tomatoes". In Spanish and related languages, negatives apply throughout,
so "Me gustan tomates" and "No me gustan ningunos tomates" are opposites. Using only
a single negative strikes a Spanish-speaker as malformed and ambiguous.
Another problem in accepting a Spanish-like language as unambiguous is
Spanish's lack of prefix and postfix operators. In English, one can say "apples or
tomatoes" and be clear. Similarly, "manzanas o tomates" in Spanish. However, the
English "apples or tomatoes and bananas or oranges" is not clear, because we have no
rule to determine the "and" and "or" order. To make this clear in English, we need either
prefix operation ("both either apples or tomatoes and either bananas or oranges") or the
unwieldy postfix operators ("apples or tomatoes, either, and bananas or oranges, either,
both"). No such operators exist in Spanish.
Yet another problem in accepting an unambiguous Spanish-like language is the
problem of word order. In English, it is possible to rearrange words to suit, and a
grammar can be formed that is English-like enough for an English speaker to recognize
the words in the grammar for what they are intended to be, for instance, noun-like
predications, verb-like predications, or adjective- or adverb-like predications. Additional
descriptive phrases can be used. For instance "red apple" or "adjective red affects
apple". In Spanish, the adjectives and adverbs come after the main word, as do the
arguments. Any additional words that might naturally be introduced in Spanish to mark
the roles of the words would not actually disambiguate the sentence.
For these reasons, it was decided that although a Spanish vocabulary could be
used for Spanish input, a Spanish-like grammar cannot. Thus, no Spanish input facility
has been provided. However, JCB-English provides both unambiguous JCB-English
output and ambiguous English output. Ambiguous Spanish output has been added. This
required the addition of one extra command, "spanish", followed by the Spanish word
and the English word. For instance, "spanish gato cat" defines the translation.
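A minimal sketch of how such a translation table might be kept follows. The names and the fixed-size array are hypothetical simplifications; the server's actual dictionary storage is certainly more elaborate:

```c
#include <string.h>

/* Hypothetical in-memory table populated by the "spanish" command:
   "spanish gato cat" records gato as the Spanish rendering of cat. */
#define MAX_PAIRS 100
static char spanish_word[MAX_PAIRS][32];
static char english_word[MAX_PAIRS][32];
static int n_pairs = 0;

void cmd_spanish(const char *es, const char *en) {
    strncpy(spanish_word[n_pairs], es, 31); spanish_word[n_pairs][31] = '\0';
    strncpy(english_word[n_pairs], en, 31); english_word[n_pairs][31] = '\0';
    n_pairs++;
}

/* On output, substitute the Spanish word when one is known;
   otherwise fall back to the English word unchanged. */
const char *to_spanish(const char *en) {
    for (int i = 0; i < n_pairs; i++)
        if (strcmp(english_word[i], en) == 0)
            return spanish_word[i];
    return en;
}
```

Because only output is translated, no Spanish grammar is needed: the unambiguous structure is generated first, and word substitution happens last.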
Evaluation
Evaluation is at several levels: speakability of the language, unit test, and actual
usability of the system. Speakability and unit test have been combined to some extent, in
that unit test contains a wide variety of concepts.
- Speakability of the language: Does the speakability of the language actually improve?
As can be seen in examples above and below, fewer words are required to say the
same thing, and there is now more flexibility in the language.
- Unit test: One very workable method of testing a piece of software is to check each
sentence of the manual or description, and see that the feature is present and correct.
Another is the aerospace method: checking that every new or changed instruction
believed to be reachable is actually used. Some instructions are present for exceptional
cases, but are not believed to be actually reachable. Both methods have been used.
- Actual usability of the system: Can the system be used to store data, accept new data,
reject data, answer questions, and report that questions cannot be answered? Can
normal people do this?
- Timing: The original JCB-English system took a noticeable fraction of a second to
read or write a knowledge base, and could take noticeable time to form a proof. The
Slow Prover was not even implemented. The current system reads, writes, and proves
so quickly as to be completely unnoticeable when the fast prover is used, and in a
reasonable amount of time when the slow prover is used.
Performance Benchmarks
As a last step, tests were run on various types of statements and queries. In each
case not involving chaining logic, JCB-English responded in less than a tenth of a second.
Chapter 5 - Design Considerations
Veracity
In any system which takes in data, and gives results, one must avoid the Garbage
In, Garbage Out syndrome. Data base systems typically do this by having trusted users,
untrusted users, and excluded users. Trusted users are allowed to write data into the
system, update it, and remove it. Untrusted users can only query the system. For a
multi-user system of this type, those categories are insufficient. Many times, one
hears of one person's reality versus another's. If nothing else, Einstein proposed that the
truth, under certain circumstances, can be relative (in that no complete agreement can
ever be made on the relative coordinates of two events, not even on their relative times of
occurrence (Einstein, 1916)). So, unlike a standard data base system, in which each fact is
accepted universally or not, here each fact is either accepted universally, or is pertinent
only to a given user and to those users who accept the given user's viewpoint.
In JCB-English, the operator speaks with absolute authority. Any user may state an
opinion. When proofs are made, only the operator, the user, and other users with
sufficient current trust are considered. This may be best shown by example.
The operator says that his cat is large. In so speaking, the operator states absolute,
inarguable fact. Bob, a regular person, says that his dog is small. Although Bob is given a
great deal of trust, Bob's normal statement is accepted as Bob's belief. Bill, a regular,
untrusted user, later says that the operator's cat is small, and that Bob's dog is big.
In the case of Bill's data entry, the statement about the operator's cat being small
should be rejected, because the fact that the operator's cat is large is absolute. However,
as a regular user, and from Bill's perspective, Bob's dog is indeed big. Thus, if Bill asks
for information on these animals, he should be told that the operator's cat is large (as the
operator made known as absolute truth), and that Bob's dog is big (as is true from Bill's
perspective).
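The trust rules above might be sketched as follows. The record layout and function names are hypothetical, and the real system's trust handling is more involved:

```c
#include <string.h>

/* Hypothetical fact record: every stored fact remembers its speaker and
   the speaker's trust level at entry; the operator's trust is 1.0. */
typedef struct {
    const char *speaker;
    double trust;            /* 1.0 = operator-level, absolute */
    const char *fact;
} Fact;

/* A fact takes part in a proof only if it is absolute, was stated by the
   asking user (their own frame of reference), or meets the current
   consideration threshold. */
int considered(const Fact *f, const char *asking_user, double threshold) {
    if (f->trust >= 1.0) return 1;                      /* absolute truth */
    if (strcmp(f->speaker, asking_user) == 0) return 1; /* own viewpoint */
    return f->trust >= threshold;
}
```

Under this sketch, Bill's own statements are always visible to Bill, while the operator's statements are visible to everyone, which reproduces the cat-and-dog example above.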
Limitations
Development of the type of system being described here is an open-ended affair.
There will always be room for increased functionality and increases in the ability to make
logical transformations and solutions. Thus, in implementing a pilot program to
demonstrate the concept's viability, certain limitations must be placed on the program.
These limitations may include:
- There is currently no shift of pronouns, other than from "you", "i", and "me" to name
form. The context rules behind words like "he", "she", "it", "them", "they", and the
like are difficult to unravel.
- Each utterance (a linguistic term meaning one or more sentences, spoken or entered at
once, which can be statements, questions, imperatives, informal languages, and
sentence fragments) must be either a collection of statements, or contain exactly one
predication containing exactly one question-word. Mixed statements and questions,
complex questions, and imperatives are not accepted.
Comparisons
The knowledge base and logic engine accept as input four types of utterance:
- A statement of fact, which may be composed of one or more predications and/or
logical connectives,
- a yes/no question,
- a fill-in-the-blank question, and
- direct system directives (commands).
The knowledge base is composed of stored fact-type utterances.
- In answering a yes/no question, the system attempts to prove or disprove the question's
predication.
- In the case of a fill-in-the-blank question (using a word, such as "when" or "which"),
the logic engine searches the knowledge base for a match (a proof with wild-cards), so
that the question can be answered.
- If a new statement is to be added to the knowledge base, and the same knowledge is
already present, the new statement is discarded as redundant.
- If a contradiction to the new statement is found (i.e., the new statement can be
disproved), then the new statement is not accepted. Note that some users will speak
globally, where others speak only for their own frame of reference.
- If a new statement is received, and is plausible (neither provable nor disprovable), then
the system accepts the statement into the knowledge base (at the current level of trust).
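The acceptance rules above reduce to a three-way classification of each new statement. A minimal sketch, with the provers stubbed out behind a single result type (all names are illustrative, not the server's):

```c
/* Hedged sketch of the acceptance rules: a ProofResult stands in for the
   combined verdict of the real provers (Immediate, Fast, Slow). */
typedef enum { PROVED, DISPROVED, UNDECIDED } ProofResult;
typedef enum { REDUNDANT, REJECTED, ACCEPTED } Disposition;

Disposition classify(ProofResult r) {
    switch (r) {
    case PROVED:    return REDUNDANT;  /* already known: discard */
    case DISPROVED: return REJECTED;   /* contradicts the knowledge base */
    default:        return ACCEPTED;   /* plausible: store at current trust */
    }
}
```

The interesting design point is that redundancy and contradiction are both detected by the same proof machinery; only plausible statements ever reach storage.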
Here, the rules for what constitutes a match are explained (with the aid of a chart
on the following page). The system uses several types of logic. There is an Immediate
Prover, which checks whether a statement is true on its face (such as "x=x") or false on
its face (such as "x≠x"). There is a Fast Prover, involving only direct matches, and a
Slow Prover, which makes use of inference. Proofs are tried in order: Immediate, Fast,
and then Slow Provers. If two statements exactly match in form and content, they are
equal. (For instance, "I am Sheldon" matches "I am Sheldon" in form and content.
Similarly, "I am who" matches, because "who" matches everyone.) If two statements
match in form, but do not match in content, the two statements are unequal. (For instance,
"I am Sheldon" does not match "I am Bob".) If one statement matches another in
content, but not in form, they may still match, by virtue of one of the statements implying
the other. For instance, "I have a cat" matches the statement "I have a happy cat": the
statement "I have a cat" is a generalization of "I have a happy cat", because the latter
implies the former. The second statement refines this knowledge. Thus, the
following table of conversations: (Note that these conversations are in English for
purposes of clarity of explanation, and not in the language used in a following section.)
Figure 1 - Proof Flow
Table 3 - Statements and Results (Conversations)

Conversation 1:
I have a cat. -> (accepted as new information)
I have a happy cat. -> (accepted as new information; "I have a cat" may be discarded as redundant)
Do I have a happy cat? -> Yes

Conversation 2:
I have a happy cat. -> (accepted as new information)
I have a cat. -> (ignored as redundant information)
Do I have a happy cat? -> Yes

Conversation 3:
I have a cat. -> (accepted as new information)
Some cat is happy. -> (accepted as new information)
Do I have a happy cat? -> (Unknown, because we have not identified it as the same cat.)

Conversation 4:
I have a cat. -> (accepted as new information)
I have no cat. -> (rejected as contradictory to known information)
Vocabulary
In the existing open-ended mode, the vocabulary was the list of words defined in
the grammar proper, plus the words which are recognized in context as predication
words. A word recognized in context as a predication word is one which appears in a
position guaranteeing that it is a verb, or any word with an asterisk immediately
before it. In the strict vocabulary mode, only words which are defined as predications and
words defined in the grammar are allowed.
Input to the program is (a) persistent information, including dictionary items,
grammar nodes, and user accounts, and (b) user input packets, delivered directly or
through the Java servlet front end.
Chapter 6 - Implementation
The Choice of Language
The prototype system was written in Java, and generates Prolog. The Prolog
component had not fully come into use, but was to be a part of this effort, with RelFun to
be used to run the Prolog code. The current version, designed for greater speed and
efficiency, was written in C, without the overhead of a generalized garbage
collector. In a production setting, this is a project that will process a large amount of data,
with the need to respond in a very short amount of time. Objects are used, but they are
polymorphic, in that they can change class after creation.
Knowledge Storage Design
The internal storage uses the rules of grammar as the basic storage structure.
Many grammar productions are stored as an object representing that grammar rule. Many
other grammar productions are transformed into some other grammar rule's storage class.
For instance, ¬(¬A ∧ ¬B) can and will be stored as A ∨ B. Favoring speed over simplicity,
the prover not only deals with ∧ and ∨ directly, but also with ¬, →, and ↔. For external
storage, it was found that storing the data as an array of objects, plus a separate symbol
table was the fastest (but not the most conservative of disk space).
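The storage-time transformation described above is a De Morgan rewrite. It can be sketched with a simplified node type; the server's actual object classes and rewrite set differ:

```c
#include <stdlib.h>

/* Illustrative formula node: LEAF holds a name; NOT uses only l;
   AND/OR use l and r. */
typedef enum { LEAF, NOT, AND, OR } Kind;
typedef struct Node {
    Kind kind;
    struct Node *l, *r;
    char name;            /* meaningful for LEAF only */
} Node;

/* Rewrite Not(And(Not(A), Not(B))) into Or(A, B), so the stored form
   favors the connectives the prover handles most directly. */
Node *normalize(Node *n) {
    if (n->kind == NOT && n->l->kind == AND &&
        n->l->l->kind == NOT && n->l->r->kind == NOT) {
        Node *res = malloc(sizeof(Node));
        res->kind = OR;
        res->l = n->l->l->l;   /* A */
        res->r = n->l->r->l;   /* B */
        return res;
    }
    return n;               /* no rule applies: store as-is */
}
```

A real normalizer would apply a whole family of such rules recursively; this shows the shape of a single rule.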
A novel type of object-oriented programming was used in this project to boost
speed. The objects used have several properties different from those in normal
object-oriented systems:
- To reduce computational time, objects are allocated a million at a time, rather than
one at a time.
- Most objects are kept the same size as others, even if this involves considerable
padding, so that they can be kept in arrays without needing an array of pointers.
- It is possible to call a member method on a null object. In such a case, the method (in
the base class of the pointer's defined type) has a "self" which is itself null. This
allows calling a method without having to check before each use whether the object
ID is null or valued. Instead, the member routine can contain a single check for the
null case.
- Certain specialty objects, all immutable singletons, are never allocated, but have a
special object ID which identifies them by class as well as by ID. For instance, the
"Anything" object is an Argument object with no property fields. No "Anything"
objects are ever allocated, but the "Anything" object ID can be used as an object ID,
and its methods can come into play.
- Certain immutable objects that have property fields, such as "Name arguments" and
"Literal text arguments", have only a single instance per unique value. For instance,
the equivalent of 'new NameA("fred")' and 'new NameA("ethel")' would return
different object IDs, but two uses of 'new NameA("fred")' would not. Likewise, the
"clone" method on these immutables returns the ID of the original object.
Because no current programming language has these desired features, C was used, and
these object behaviors were implemented at the application level.
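A few of these behaviors can be sketched in C as follows. The pool size, the object layout, and the linear-scan interning below are illustrative simplifications, not the server's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of three of the behaviors above, using integer object IDs.
   ID 0 plays the role of the null object; a method called on it does a
   single internal null check instead of every caller checking. */
#define POOL_BLOCK 1000000   /* objects allocated a million at a time */

typedef struct { char name[16]; } NameObj;

static NameObj *pool = 0;
static int pool_size = 0, pool_used = 1;   /* ID 0 reserved as null */

static int alloc_obj(void) {
    if (pool_used >= pool_size) {          /* grow by a whole block */
        pool_size += POOL_BLOCK;
        pool = realloc(pool, pool_size * sizeof(NameObj));
    }
    return pool_used++;
}

/* Interning: two requests for the same name return the same ID. */
int new_name(const char *s) {
    for (int id = 1; id < pool_used; id++)
        if (strcmp(pool[id].name, s) == 0) return id;
    int id = alloc_obj();
    strncpy(pool[id].name, s, 15);
    pool[id].name[15] = '\0';
    return id;
}

/* A "member method" that is safe to call on the null ID. */
const char *name_text(int id) {
    if (id == 0) return "";   /* single null check, inside the method */
    return pool[id].name;
}
```

Keeping objects in a contiguous pool, addressed by ID, is what makes the array-of-objects external storage format (mentioned above) a near-direct dump of memory.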
Figure 2 - An Overview of Processing
Inputs
The program needs to read the knowledge base, and then receive a number of
utterances. The knowledge base will serve to establish the vocabulary, chaining rules,
user list, and the sum of gleaned knowledge (empty at first). Input packets, delivered as
HTTP POST messages and/or direct socket connections, will each contain one
utterance, which may contain items separated by the keyword "execute".
Processing
When the program initializes, it must first read the knowledge base. If no
vocabulary is present during initialization, then the system has no vocabulary of
predicates to start, and any word may be used as a predication. If no knowledge is
present during initialization, then the system starts devoid of knowledge, and everything
is initially plausible.
The system then starts accepting service requests. There will be a pure service
socket, and requests will also be accepted via HTTP transactions. The HTTP page (a
Java Servlet) will reformat the users' requests to a form acceptable to the pure service
socket. When a message (page or packet) is received, the system will process the utterance. The
system performs I/O so quickly that it can afford to rewrite after every change.
Each utterance must be checked for a number of things: The utterance must be
syntactically and lexically correct, and within the vocabulary if vocabulary is constrained.
There can be at most one question word in each utterance. An utterance may also be a
command.
Once the preliminary checking is done, the utterance can be evaluated. If the top
level is a "both" or "also" compound, then the sub-utterances can be separated, and
considered separately. This type of separation is applied recursively. Each is added to the
knowledge base one at a time. This involves several steps: The sentence or question is
refactored and optimized to contain as little logic and implication as possible.
(For instance, "X=X" is always true, and thus should not be added to the data base; as
another example, "¬¬X" is the same as "X".) The sentence is then evaluated against the
knowledge base for veracity. If the sentence can be shown to be false, then the entire
utterance must be discarded as false. If the sentence can be shown to be true, then the
sentence must be discarded as redundant. If the sentence passes, then each of the
sentence's and-separated components must be added to the knowledge base. If there is a
question present, then it must be answered.
The logic applied above is able to handle adjectives. The subject of adjectives
does not usually come up in discussions of logic. So, if the knowledge base held "X is a
cat", and the new sentence is "X is a male cat", then it checks out as true. However, it
should also be noted that "X is a cat" would be replaced if the sentence is accepted. If the
knowledge base held "X is a male cat" and the sentence is "X is a cat", then the new
sentence is redundant. If the new sentence is "Y is a dog", then the sentence is plausible,
and the system must assume the sentence is true (in the viewpoint of the speaker). Given
"X is a cat" as the knowledge base, then these fill-in-the-blank questions would match:
"? is a cat", "X ? a cat", "X is ? cat", "X is a ?", and "X is ?".
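A simplified sketch of such wild-card matching follows, restricted to one token per blank (the real matcher also handles blanks that span phrases, as in "X is ?"); the token representation is illustrative only:

```c
#include <string.h>

/* A question token "?" (standing in for any question word) matches any
   single fact token; every other token must match exactly, and the
   token counts must agree. */
int tokens_match(const char **fact, int nf, const char **q, int nq) {
    if (nf != nq) return 0;
    for (int i = 0; i < nf; i++)
        if (strcmp(q[i], "?") != 0 && strcmp(q[i], fact[i]) != 0)
            return 0;
    return 1;
}
```

When a match succeeds, the fact token standing opposite the blank is the answer to report.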
Outputs
Output from the program occurs at the end of each transmission, responding with
acceptance, rejection, or the answer to any given question. Outputs to questions will be in
the language shown below. The service socket responds to its requestor. In the case of
an HTTP web request, the server socket will return the result to the Servlet, which will
then wrap the result in a web page and send it to the user.
After modification, the knowledge base is rewritten.
Performance
Performance for the fast-prover is rather quick. Time required is linear with the
size of the knowledge base and the size of the input (O(n)). The Slow Prover's
performance can be as bad as O(n!), but in practice is closer to O(n^2) when chaining
words (such as "big", meaning "bigger than") are not used. Server transaction times for
these tests, on a 2.5GHz processor, with both native and Java debuggers running, were
observed to be 10 to 67ms in such non-chaining cases.
Chapter 7 - Tests
Tests consist of three types.
- Black box testing exercises each element of the grammar, with an expected result.
- White box testing checks each line of code believed to be executable for proper
execution. Exception lines not believed to be executable are not tested.
- Speakability tests involve doing translations, and looking for areas hard to understand
or too wordy.
Server Test Results
The tests shown below are the results of execution. Each shows the input
presented directly to the JCB-English server. Where appropriate, the server's outputs,
storage, and/or test notes are shown.
Table 4 - Test Cases
Test 1
Input: either "streak" say 'meow' or "streak" say 'woof'
Output: This information is being saved
Notes: The statement is already in simplest form, compared against an empty
knowledge base. It should be stored as is.
Test 2
Input: (packet has bad credentials)
Output: Account/Password doesn't match any user
Notes: Each packet bears credentials, which means that the system doesn't have to
maintain any user states.
Test 3
Input: facts
Output: # Fact
7
either "STREAK" *SAY 'meow' / or "STREAK" *SAY 'woof' /
Notes: This command dumps memory. In some instances, the entire dump will be
shown. In other instances, only the significant facts are shown. Although the
fact was entered without special start ("*") or end ("/") markers, it is displayed
with these markers. Knowledge is retained in pure form, without regard to its
entry form.
Test 4
Input: (packet header calls for output in English) my facts
Output: # Fact
7
either "STREAK" SAY 'meow' or "STREAK" SAY 'woof'
Notes: The packet called for output in English, rather than JCB-English, so the
disambiguating language markers are not present. This test also involved hard-
stopping the server without warning, and restarting it. Because the knowledge
base is maintained clean, and rewritten in a fraction of a second, maintaining the
knowledge base intact should not be a problem.
Test 5
Input: "streak" say either 'meow' or 'woof'
Output: This was already known
Notes: The input, after passing through the optimizer, becomes exactly the statement
about Streak we already have in the knowledge base, and is thus known.
Test 6
Input: for all x y sub 2: true
Output: This was already known
Notes: This is ∀x,y₂: True, which optimizes to True.
Test 7
Input: be my cat
Output: Parse error: syntax error, unexpected WORD_N, detected at just before
line 1, column 1
Test 8
Input: neither me be you nor i be your 1 cat execute
facts
Output: This information is being saved
# Fact
19
neither "OPERATOR" *BE "J C B" / nor both "OPERATOR" *BE
1 *CAT / / and "J C B" *OWN 1 *CAT / /
Notes: Here, "me", "you", and "i" have been mapped to proper names, "operator" for
the speaker, and "j c b" for the JCB-English system. "your" has been mapped
to a separate statement of ownership. In subsequent tests, storage will be shown
directly, even though it will be generated with facts commands behind the
scenes.
Test 9
Input: forget 19
Output: (none)
Memory: (gone)
Test 10
Input: forget "streak" say either 'meow' or 'woof'
Output: (none)
Saved: (gone)
Notes: Again, note that the predication here and the one in memory did not match
exactly in form, but had the same meaning, and thus matched.
Test 11
Input: certainly be 'dozen' the sum of 5 and 7 egg
Saved: certainly *BE 'dozen' 12 *EGG//
Test 12
Input: on good authority be "streak" the difference between the product of
3 and 3 and the quotient of 16 and 2 cat
Saved: on good authority *BE "STREAK" 1 *CAT//
Test 13
Input: i believe the statement cute x is implied by cat x execute
i guess if dog x then friend x
Saved: i believe if *CAT x/ then *CUTE x/
i guess if *DOG x/ then *FRIEND x/
Test 14
Input: according to "descartes" if and only if *exist i then think i
Saved: according to "DESCARTES" if and only if *EXIST "OPERATOR"/
then *THINK "OPERATOR"/
Notes: Here, "exist" needs an asterisk to prevent it from being considered a keyword.
Test 15
Input: likely exclusively dog "streak" or cat "streak" execute
not i cat
Saved: likely exclusively *DOG "STREAK"/ or *CAT "STREAK"/
not "OPERATOR" *CAT/
Test 16
Input: adjective jointly black and brown affects "streak" cat
Saved: adjective jointly *BLACK/ and *BROWN/ affects "STREAK" *CAT/
Test 17
Input: who cat
Output: Answer: "STREAK"
Test 18
Input: blank "streak"
Output: Answer: "STREAK" *CAT /
Test 19
Input: "nemo" blank
Output: There is insufficient information to say
Test 20
Input: adjective go anything emphasis affects "auburn" school also "streak"
be the item 1 cat with property i love
Saved: adjective *GO anything emphasis/ affects "AUBURN" *SCHOOL/
"STREAK" *BE 1 *CAT//
"OPERATOR" *LOVE 1 *CAT//
Notes: The statement about Auburn is that Auburn is a school, modified by go's second
argument. Since "go" is conventionally go(mover,destination,source), Auburn
is being modified as being a destination school, or the "go to" place. The
property "i love" expands to a separate statement as a way of applying the
property to the "cat" argument.
Test 21
Input: most of the class human/ gullible also all of the class politician/ liar
Saved: most of the class *HUMAN/ *GULLIBLE/
all of the class *POLITICIAN/ *LIAR/
Test 22
Input: i desire the event the class force/ *with you
Saved: "OPERATOR" *DESIRE all of the event all of the class *FORCE/
*WITH "J C B"//
Test 23
Input: i like the number 3.1415926 also all the class cat/ nice also 1000000
dog/ nice execute
how many of the class cat/ nice execute
how many dog/ nice
Output: This information is being saved
Answer: all of the class *CAT /
Answer: 1000000 *DOG /
Saved: "OPERATOR" *LIKE the number 3.14159/ ...
Test 24
Input: i like both 1 aardvark and 1 bear also i like neither 1 cat nor 1 dog
also i like the item 1 elephant is implied by 1 fox also i like if 1
gazelle then 1 hamster also i like if and only if 1 iguana then 1 jaguar
also i like exclusively 1 kangaroo or 1 llama
Saved: neither "OPERATOR" *LIKE 1 *CAT/ nor "OPERATOR" *LIKE 1
*DOG//
if "OPERATOR" *LIKE 1 *FOX/ then "OPERATOR" *LIKE 1
*ELEPHANT//
if "OPERATOR" *LIKE 1 *GAZELLE/ then "OPERATOR" *LIKE 1
*HAMSTER//
if and only if "OPERATOR" *LIKE 1 *IGUANA/ then "OPERATOR"
*LIKE 1 *JAGUAR//
exclusively "OPERATOR" *LIKE 1 *KANGAROO/ or "OPERATOR"
*LIKE 1 *LLAMA//
"OPERATOR" *LIKE 1 *AARDVARK//
"OPERATOR" *LIKE 1 *BEAR//
Test 25
Input: at noon go i my 1 home also on tuesday go i 1 school
Saved: during 2010-08-10 through 2010-08-10 23:59:59 *GO "OPERATOR"
1 *SCHOOL//
at 2010-08-13 12:0 both *GO "OPERATOR" 1 *HOME/ and *OWN
"OPERATOR" 1 *HOME//
Test 26
Input: after september work "rachael" 1 job also on or after next week go i
1 school execute
when work "rachael" 1 job
Output: This information is being saved
Answer: after 2010-08-31 23:59:59 true
Saved: after 2010-08-31 23:59:59 *WORK "RACHAEL" 1 *JOB//
on or after 2010-08-15 *GO "OPERATOR" 1 *SCHOOL//
Test 27
Input: potentially burn the class wood execute
when burn the class wood
Output: This information is being saved
Answer: potentially true
Saved: potentially *BURN all of the class *WOOD//
Notes: The answer is "potentially true" because in the JCB-English language, a tense
("potentially") must be applied to a predication ("true").
Test 28
Input: during last month through this year magic a frog
Saved: during 2010-07-01 through 2010-12-31 23:59:59 *MAGIC 1
*FROG//
Test 29
Input: beginning after today at "auburn u" whatever also before now
located 3000000 meters from "auburn u" be "irvine" also on or
before yesterday up to a meter from "auburn u" be an egg
Saved: at "AUBURN U" beginning after 2010-08-18 23:59:59 *WHATEVER/
located 3000 meters from "AUBURN U" before 2010-08-18
03:40:04 *BE "IRVINE"/
up to 1 meters from "AUBURN U" on or before 2010-08-17
23:59:59 *BE 1 *EGG//
Test 30
Input: ending before tomorrow located 10 to 100 meters from "auburn u"
verbOne also during 11:00 am through 2:00:00 pm at least twenty
meters from "auburn u" verbTwo also at 6:00:03 near "auburn u"
verbThree
Saved: located 10 to 100 meters from "AUBURN U" ending before 2010-
08-19 *VERBONE/
at least 20 meters from "AUBURN U" during 2010-08-18 11:0
through 2010-08-18 14:00 *VERBTWO/
near "AUBURN U" at 2010-08-18 06:00:03 *VERBTHREE/
Test 31
Input: at midnight far from "auburn u" whatever execute
where whatever
Output: Answer: far from "AUBURN U" true
Saved: far from "AUBURN U" at 2010-08-19 *WHATEVER/
Test 32
Input: during 15 april whatever execute
tense whatever
Output: during 2010-04-14 through 2010-04-14 23:59:59 *WHATEVER /
Test 33
Input: at 1 year 2 days from january 1 2010 verbOne also at 1 year before
january 1 2010 verbTwo also at may 1 2000 bce verbThree
Saved: during 200-05-01 bce through 200-04-30 0:0:01 bce
*VERBTHREE/
at 2009-01-01 *VERBTWO/
at 2011-01-03 23:59:59 *VERBONE/
Notes: Note that facts are not necessarily stored in the order received.
Test 34
Input: both beginning 1997-04-01 noon cat "streak" and ending noon january
1 2004 big dog "fido" also "whitehouse" white house
Saved: beginning 1997-04-01 12:00 *CAT "STREAK"/
ending 2004-01-01 12:00 adjective *BIG/ affects "FIDO" *DOG/
adjective *WHITE/ affects "WHITEHOUSE" *HOUSE/
Test 35
Input: "streak" cat execute
is "streak" cat execute
is "arrow" cat
Output: This information is being saved
Yes
There is insufficient information to say
Test 36
Input: some of the class politician/ honest also little of the class student/
stupid also x blue quantity y house
Saved: some of the class *POLITICIAN/ *HONEST/
little of the class *STUDENT/ *STUPID/
subject x *BLUE quantity y *HOUSE//
Test 37
Input: there exists x such that false
Output: That would contradict information already held
Notes: The optimizer evaluates this, and the prover does not run.
Test 38
Input: there exists at least 1 x such that x happy
Saved: there exists at least 1 x such that x *HAPPY/
Test 39
Input: there exists 1 x such that x be "luna" also there exists up to 27 x
such that x go "luna" also there does not exist x such that both x go
"disneyland" and x sad
Saved: there exists 1 x such that x *BE "LUNA"/
there exists up to 27 x such that x *GO "LUNA"/
for all x: not both x *GO "DISNEYLAND"/ and x *SAD/
Notes: As of this test, the entire question and statement grammar-space has been tested.
Tests which follow cover command grammar and program logic.
Test 40
Input: user "abe" password 'aaa' execute
trust "abe" 1 execute
user "bob" password 'bbb' execute
trust "bob" 0.5 execute
user "sales" password 'sss' execute
trust "sales" 0 execute
drop "sales "
Notes: Effects were verified by inspection of the user table's storage.
Test 41
Input: (Present credentials as "abe") cat "streak"
(Present credentials as "bob") "fido" dog
(Presentation of "sales" credentials was rejected)
(Present credentials as "operator") consider facts execute
who cat execute
who dog execute
consider opinion execute
who cat execute
who dog
Output: Answer: "STREAK"
There is insufficient information to say
Answer: "STREAK"
Answer: "FIDO"
Notes: With only certain facts in play, we can't take Bob's statement into consideration,
because his trust level is only 50%. We can take Abe's statement into
consideration, because his trust level is 100%.
Test 42
Input: consider opinion at 0.4 execute
who dog execute
consider opinion at 0.6 execute
who dog
Output: Answer: "FIDO"
There is insufficient information to say
Notes: At a consideration level of 0.4, Bob's 0.5 trust level is useful. At a
consideration level of 0.6, Bob's trust rating falls under the line.
Test 43
Input: forget "bob" facts
Saved: subject "STREAK" *CAT/
Test 44
Input: password 'ooo'
(present new credentials)
Test 45
Input: "streak" cat also for all x: if x cat then x happy execute
who happy
Output: Answer: "STREAK"
Notes: This is the first test showing inference, rather than data retrieval.
Test 46
Input: after noon go i
Output: This information is being saved
Input: after 13:00 go i
Output: This information is being saved
Input: after 11:00 go i
Output: This was already known
Notes: This and the nearby following tests exercise otherwise untested areas of
the Fast Prover.
Test 47
Input: after x seconds before noon go i
Output: This information is being saved
Input: after y seconds before noon go i
Output: This information is being saved
Input: after x seconds before noon go i
Output: This was already known
Test 48
Input: ending before monday go i
Output: This information is being saved
Input: ending before sunday go i
Output: This information is being saved
Input: ending before tuesday go i
Output: This was already known
Test 49
Input: during this month happy i
Output: This information is being saved
Input: during this week happy i
Output: This was already known
Test 50
Input: located 1000 to 1500 meters from "comfort inn" be "intuit"
Output: This information is being saved
Input: located 1500 to 1000 meters from "comfort inn" be "intuit"
Output: This was already known
Input: located 1000 to 1500 meters from x be "intuit"
Output: This information is being saved
Input: according to "ted" located 1000 to 1500 meters from x be "intuit"
Output: This information is being saved
Test 51
Input: potentially go i
Output: This information is being saved
Input: go i
Output: This information is being saved
Test 52
Input: i like the sum of 2 and x cat execute
i like the product of 2 and x cat
Notes: The Fast Prover correctly recognized the difference.
Test 53
Input: i like either a cat or a dog execute
either i like 1 cat or i like 1 dog
Output: This information is being saved
This was already known
Notes: The dyadic "or" was evaluated properly, using code that recognizes any
operator. "And" is a special case, though, in that the facts for "and" are stored
separately.
Test 54
Input: not "streak" dog execute
not "streak" fish execute
i like the number 3.1415 execute
i like the number 2.7818 execute
there exists x such that x happy execute
there exists x such that x happy
Notes: Each new fact was accepted. The repeated fact was called out as such.
Test 55
Input: there exists at least 3 x such that x go
Output: This information is being saved
Input: there exists up to 6 x such that x go
Output: This information is being saved
Input: there exists at least 4 x such that x go
Output: This information is being saved
Input: there exists at least 2 x such that x go
Output: This was already known
Input: there exists 5 x such that x weird
Output: This information is being saved
Input: there exists 6 x such that x weird
Output: That would contradict information already held
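Test 55's behavior follows if each quantified count is kept as an interval: "at least 3" is [3, ∞), "up to 6" is [0, 6], "there exists 5" is [5, 5]. A new bound is already known when the known interval lies inside it, and contradictory when the two intervals are disjoint. A sketch of that comparison (names and representation are assumptions):

```python
def entailed(known_lo, known_hi, new_lo, new_hi):
    """Compare a new count bound against a known one (sketch).
    Entailed when the known interval lies inside the new one;
    contradictory when the intervals do not overlap."""
    if new_lo <= known_lo and known_hi <= new_hi:
        return "already known"
    if known_hi < new_lo or new_hi < known_lo:
        return "contradiction"
    return "saved"

INF = float("inf")
# "at least 3" = [3, inf) makes "at least 2" = [2, inf) already known.
assert entailed(3, INF, 2, INF) == "already known"
# "exists 5" = [5, 5] and "exists 6" = [6, 6] are disjoint: contradiction.
assert entailed(5, 5, 6, 6) == "contradiction"
```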
Test 56
Input: "x y z " like the number .3/
Saved: "X Y Z" *LIKE the number 0.3/
Test 57
Input: at 1900-5-5 go i also at 1904-5-5 go i also during september go i also
during january go i also at last century go i
Saved: during 1900-05-05 through 1900-05-05 23:59:59 *GO "OPERATOR"/
during 1904-05-05 through 1904-05-05 23:59:59 *GO "OPERATOR"/
during 2010-09-01 through 2010-09-30 23:59:59 *GO "OPERATOR"/
during 2011-01-01 through 2011-01-31 23:59:59 *GO "OPERATOR"/
during 1900-01-01 through 1999-12-31 23:59:59 *GO "OPERATOR"/
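Test 57 shows a bare date such as "at 1900-5-5" being widened to the full-day interval the system stores. A sketch of that expansion, with the output format taken from the saved lines above (the function name is an assumption):

```python
from datetime import datetime

def day_interval(y, m, d):
    """Expand a bare date into the full-day interval form shown in
    the saved output above (sketch)."""
    start = datetime(y, m, d)
    end = datetime(y, m, d, 23, 59, 59)
    return f"during {start:%Y-%m-%d} through {end:%Y-%m-%d %H:%M:%S}"

assert day_interval(1900, 5, 5) == "during 1900-05-05 through 1900-05-05 23:59:59"
```

Month and century references expand analogously, as the "september", "january", and "last century" lines show.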
Test 58
Input: there exists x y z such that x go y z
Output: This information is being saved
Input: there exists z y y x such that x go y z
Output: This was already known
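Test 58 shows that the names of existentially bound variables do not matter: the two statements are alpha-equivalent. One way to detect this is to rename bound variables in order of first use before comparing facts; the following is a sketch under that assumption (names and representation are invented):

```python
def canonical(variables, atoms):
    """Rename bound variables to a canonical order of first use,
    so alpha-equivalent facts compare equal (sketch)."""
    mapping, out = {}, []
    for atom in atoms:
        renamed = []
        for term in atom:
            if term in variables:
                mapping.setdefault(term, f"v{len(mapping)}")
                renamed.append(mapping[term])
            else:
                renamed.append(term)
        out.append(tuple(renamed))
    return tuple(out)

# Both quantifier prefixes bind the same atom, so they canonicalize alike.
a = canonical({"x", "y", "z"}, [("x", "go", "y", "z")])
b = canonical({"z", "y", "x"}, [("x", "go", "y", "z")])
assert a == b
```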
Test 59
Input: located 0 to 100 meters from b like i c also i like the sum of 0 and d
aardvark also i like the sum of e and 0 baboon also i like the
difference between f and 0 cat also i like the product of 0 and g dog
also i like the product of 1 and h elephant also i like the product of j
and 0 fox also i like the quotient of 0 and k gazelle also i like the
quotient of L and 1 iguana
Saved: "OPERATOR" *LIKE quantity d *AARDVARK//
up to 100 meters from b *LIKE "OPERATOR" c/
"OPERATOR" *LIKE quantity e *BABOON//
"OPERATOR" *LIKE quantity f *CAT//
"OPERATOR" *LIKE 0 *DOG//
"OPERATOR" *LIKE quantity h *ELEPHANT//
"OPERATOR" *LIKE 0 *FOX//
"OPERATOR" *LIKE 0 *GAZELLE//
"OPERATOR" *LIKE quantity L *IGUANA//
Notes: These are optimizer tests.
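The saved facts above reflect the standard arithmetic identities: x + 0 = x, x - 0 = x, x * 0 = 0, x * 1 = x, 0 / x = 0, and x / 1 = x. A sketch of an optimizer pass applying them (this is an illustration of the identities the output suggests, not the dissertation's code; the names are assumptions):

```python
def simplify(op, a, b):
    """Apply arithmetic identities to a binary expression (sketch).
    Unsimplifiable expressions are returned unchanged as a tuple."""
    if op == "sum":
        if a == 0: return b
        if b == 0: return a
    elif op == "difference" and b == 0:
        return a
    elif op == "product":
        if a == 0 or b == 0: return 0
        if a == 1: return b
        if b == 1: return a
    elif op == "quotient":
        if a == 0: return 0
        if b == 1: return a
    return (op, a, b)

assert simplify("sum", 0, "d") == "d"       # sum of 0 and d -> quantity d
assert simplify("product", "j", 0) == 0     # product of j and 0 -> 0
assert simplify("quotient", "L", 1) == "L"  # quotient of L and 1 -> quantity L
```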
Test 60
Input: i like twelve both bagel and donut
Saved: "OPERATOR" *LIKE 12 *BAGEL//
"OPERATOR" *LIKE 12 *DONUT//
Test 61
Input: not not i like 1 aardvark also not neither i like 1 baboon nor i like 1 cat also not
either i like 1 dog or i like 1 elephant also not if and only if i like 1 fox then i
like 1 gazelle also not exclusively i like 1 hare or i like 1 iguana
Saved: "OPERATOR" *LIKE 1 *AARDVARK//
either "OPERATOR" *LIKE 1 *BABOON/ or "OPERATOR" *LIKE 1
*CAT//
neither "OPERATOR" *LIKE 1 *DOG/ nor "OPERATOR" *LIKE 1
*ELEPHANT//
exclusively "OPERATOR" *LIKE 1 *FOX/ or "OPERATOR" *LIKE 1
*GAZELLE//
if and only if "OPERATOR" *LIKE 1 *HARE/ then "OPERATOR"
*LIKE 1 *IGUANA//
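Each saved fact in Test 61 results from pushing the leading "not" into the connective: not not P becomes P, "neither/nor" and "either/or" swap, and "if and only if" swaps with "exclusively or". A sketch of those rewrites on a tuple representation (names and representation are assumptions):

```python
# Connective duals used when eliminating a leading "not" (sketch of
# the rewrites the saved output suggests).
DUAL = {"nor": "or", "or": "nor", "iff": "xor", "xor": "iff"}

def push_not(expr):
    """Eliminate a leading 'not' by flipping the connective."""
    if expr[0] == "not":
        inner = expr[1]
        if inner[0] == "not":        # not not P  ->  P
            return inner[1]
        if inner[0] in DUAL:         # not (A op B)  ->  A dual(op) B
            return (DUAL[inner[0]], inner[1], inner[2])
    return expr

# "not neither A nor B" is stored as "either A or B":
assert push_not(("not", ("nor", "A", "B"))) == ("or", "A", "B")
# "not if and only if A then B" is stored as "exclusively A or B":
assert push_not(("not", ("iff", "A", "B"))) == ("xor", "A", "B")
```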
Test 62
Input: x be x
Output: This was already known
Test 63
Input: There exists at least 0 x such that x be a unicorn
Output: This was already known
Test 64
Input: (Output in ambiguous English) x sub 1 be '''< &' execute
facts
Output: This information is being saved
# | Fact |
1 | x1 *BE ''< &' / |