Andrew Todd and William Todd, "Notes towards a Semantic Simulation of a Fragment of Child Language": An Example of Prior Art in Statistical Semantics and Artificial Intelligence Systems.

===============================================================

Abstract: A description of a system of formal statistical semantics based on the common occurrence of words in sense-percepts (such as sentences). This system tolerates inconsistent data, assigning reduced probabilities in that case. The system therefore exhibits robustness and, given sufficient data, does not generate absurd results. First published in 1987.

===============================================================

This article is copyrighted by the Ohio State University. The University has granted me, Andrew D. Todd, one of the authors, permission to "make any use whatsoever of the text and ideas... that [I] wish". Out of this grant of authority, I hereby subgrant a General Public Licence, according to the customary terms of the Free Software Foundation. Specifically, the article may be reproduced and distributed freely by all persons whatsoever, provided royalties are not levied for this specific article, and provided the circumstances of its authorship, original publication, and copyright ownership are not misrepresented. The Ohio State University retains the right to make such other uses and grants of permission as it may see fit. Anyone discontented by the terms herein contained is cordially invited to apply to the Ohio State University for such additional permissions as he may require.

===============================================================

Modern Introduction by Andrew Todd (2000)

The paper which follows was written in about 1985-86 and published in May, 1987, by which time I was fully occupied in the first year and core courses of the graduate Anthropology program at Oregon. As the junior author, I worked up the algorithm which was the meat of the paper, and the senior author provided inspiration, background, and context, as well as writing or redrafting most of the actual prose. In short, it was the standard variety of professor/student collaboration, intended in large part for the training of the junior author. In this particular case, it was intended to give me something to do in the year before a graduate school slot would open up.

For the algorithm, I used a correlational method similar to that employed by certain later researchers who, I am informed, may have recently patented the method. The main difference between their approach and mine was that I envisioned the correlation machine sitting on top of a syntax machine and processing the syntax machine's output. The use of correlation did not represent any particular act of invention or imagination. It was simply one of the standard intellectual tools of a recent engineering school graduate. It was "immediately obvious to one familiar with the state of the art," as the patent lawyers say.

The senior author, my father, William L. Todd (Prof. of Philosophy, retired, U. of Cincinnati), was one of Arthur W. Burks' graduate students at Michigan in the late 1950's, and I understand that his dissertation involved the same basic kind of approach to linguistics that these later researchers have recently discovered, that is, the basic Chomskyan idea of generative systems without most of Noam Chomsky's presuppositions, conducted in the light of computers. This dissertation was eventually published as Analytical Solipsism (Martinus Nijhoff, The Hague, 1968).
A further publication of my father's was Language Acquisition: A Speculative Model (1974). This book was written with one of his own graduate students at Cincinnati, Lisbeth (Lisa) Retchin, who afterwards gained employment at California State University (Chico). It was privately printed by my father under the imprint of "Ehling Croyden Press," a name which later had to be changed to avoid trademark conflict. My father was editor and publisher of the Journal of Philosophical Linguistics from 1969 to 1973, a journal he founded to publish papers in this new field which the regular philosophy journals refused on the grounds that they were linguistics, and which the regular linguistics journals refused on the grounds that they were philosophy. My father was able to close JPL down when the regular journals reconsidered their position and started accepting such papers. The authors published in JPL included George Lakoff and Charles Caton, who have since become recognized as founding fathers of the field.

My father's advisor at Michigan, Prof. Burks, was one of the members of the original ENIAC team, and his graduate students tended to have special access to computers. They held Top Secret security clearances, worked on military projects, and had the opportunity to think about what computers might be good for, years before most other scholars did. In my father's case, this meant working at the Naval Operations Research Office, where they had him programming an early video terminal in connection with nuclear war gaming. He came away from the experience with a strong belief in the possibilities of computer simulation, which has persistently informed all his writings.

Of course, the ideas about what computers were good for ran anything up to forty years in advance of economic feasibility. That is a common pattern. Very little of what is done with computers today was not envisioned in considerable detail, circa 1960. Most of these projects were done as thought experiments for want of computer time, and were eventually put aside when they had been carried as far as they could be carried without actual computer access. This was especially the case as the kind of computer which could carry out these algorithms was necessarily a very powerful one. To turn a thought experiment into a working computer system would very likely have involved tens of billions of dollars.

The statistical approach to artificial intelligence did not come out of nowhere: it emerged from a living tradition linking the birth of computers to the birth of modern linguistics. By contrast, a great many recent "inventors" seem to have been profoundly uninterested in the field because there was no money to be made in it. Their insistent claims of complete originality are ultimately reflective of their superficial, commercial grasp of the subject matter. The approach ultimately embodied in this paper was original, but compromised by reliance on Classified (Top Secret) material, in the 1950's; original and publishable in the 1960's; suitable for the training of an advanced graduate student in the 1970's; suitable for the training of a very advanced undergraduate student in the 1980's; and, depending on one's viewpoint, either suitable for the training of a high-school student, or for commercial exploitation, in the 1990's. It is at long last economically possible to carry out the theoretical ideas of the 1950's and 1960's as practical tools.
We should be very careful not to allow recent "inventors" to claim patents on techniques which are actually very old.

The following version of the paper has been partly retyped from the published version, and partly reconstructed from intermediate drafts. I must express my gratitude to the Ohio State University for permission to republish this article, and to Prof. Peter W. Culicover, Professor and Chair, Department of Linguistics, for his good offices in arranging this permission. I would also like to thank Ms. Debra Riffon, Morgantown, WV, for her extraordinarily accurate typing.

Andrew D. Todd
1249 Pineview Dr., Apt 4
Morgantown, WV 26505
U46A8@WVNVM.WVNET.EDU
September 29, 2000

=================================================================

Citation of Original: Andrew Todd and William Todd, "Notes towards a Semantic Simulation of a Fragment of Child Language," pp. 49-58, _Ohio State University Working Papers in Linguistics_, No. 35, "A Festschrift for Ilse Lehiste," ed. Brian D. Joseph and Arnold M. Zwicky, May 1987 (The Ohio State University, Department of Linguistics). I understand that copies are on deposit in both the Ohio State University Library and the Ohio State University Archive. No patents were filed in connection with this work, and the novel substance of the paper is therefore in the public domain.

=================================================================

(_start_p.49_)

Notes toward a Semantic Simulation of a Fragment of Child Language

Andrew Todd and William Todd
University of Oregon and University of Cincinnati

Scenario

A boy of three, out with his mother, sees a strange dog some thirty yards away. He likes dogs and wishes to approach and pet it. He is also afraid that it will bite him, and, to a lesser degree, that it will jump up and lap his face. At this point, his mother says to him, 'That dog is old.'

Since the sentence is a simple one, it can easily be parsed, and there are many parsing programs that will handle it quickly. The problem we wish to address is a semantic one. Once the child has resolved the sentence into its components, how will he interpret them? That is, how will he process them, and what difference will that processing make to his beliefs, intentions, and behavior? While these questions are extremely difficult, we will suggest some ways in which a computer simulation of this aspect of the boy's functioning might be approached. We will then engage in some speculations about the reality to be simulated.

Before proceeding to the semantics, there are some important phonetic assumptions that must be made. The mother's utterance will make no difference unless it is uttered within a certain range of tones of voice. Moreover, there may be some tones that would effectively forbid the boy to approach the dog, or which give him permission to do so, regardless of the words that are uttered. In these cases there will be no semantic processing. We hope to interest Ilse Lehiste, who is far more competent in this area than ourselves, in answering questions of this sort. Let us here assume that the sentence is uttered in such a way that the child listens to it and takes it seriously, but still feels free to decide how to deal with the dog.

It must now be recognized that the boy already has a great many beliefs about the world in general, and about dogs in particular. The instant he sees the dog, he will begin to apply as many of these beliefs as possible to the present case.
Our simulation will therefore assume an existing database and a method of generating predictions about the dog. The importance of "That dog is old", as received and parsed, is that it will alter these previously existing beliefs in ways to be discussed. If one felt compelled to ask what the sentence means (in a philosophical way), or what its semantic content is, one would be asking for a generalization about the ways in which it affects the existing beliefs of individuals. Such questions are not particularly useful.

A simulation of the child must contain a parser which is capable of isolating the subject, no great problem in the case of such a simple sentence. Once "that dog" is returned from the subject search, the general problem would be to find out what it refers to on this occasion. We here hypothesize that the child's problem is much simpler than this might seem. He cares only about the question he already has in mind, whether to approach the dog. He is not interested, at such a moment, in storing general information which may, or may not, be useful later on. He thus assumes that "that dog" refers to the object of his current interest, the dog, and will make only a minimal check. In order (_start_p.50_) to do this he must have a database in which "dog" is associated with some of the observable features of a dog. If a certain proportion of these features are not observed, the whole sentence is thrown out as being of no current interest.

The most important of the boy's beliefs about the dog probably do not concern such things as its color and size. They are expectations concerning the behavior of the dog when approached in various ways. One way of putting it is that there is a procedure which the child expects the dog to follow. It would seem that very young children can have rather elaborate expectations about the behavior of persons and animals. Most important from our point of view, these expectations can be modified by verbal input.

There are, at this point, two ways of looking at the situation. One can think of the child as expecting the dog to follow a program with many sub-routines, each of which concerns the behavior of the dog in some hypothetical situation. On the other hand, one can think of each sub-routine merely as representing a dispositional property on the part of the dog. For example, "bad-tempered" means, more or less, that the dog will bite in a certain range of circumstances, growl in others, and so on.

In one sense, it makes little difference whether we speak of a dispositional property or a program. On another level, however, it makes a great deal of difference. If we stick to properties, the program that the child follows can simply chain them together, allowing that the links in the chain are only statistical, and much less than foolproof. When verbal input, such as "That dog is old", comes along, it can be allowed to affect the chains, that is, the data. Alternatively, if we have sub-routines instead of dispositional properties, we are likely to have fewer of them. One sub-routine is altered in certain ways to make it represent a new and different dispositional property. For example, an extremely bad-tempered dog follows the same basic program as a bad-tempered one, except that it takes less provocation to make him growl and bite. It might seem, then, that it is more economical to choose a few sub-routines which, with seemingly minor modifications, will represent a large number of dispositional properties.
If, on the other hand, each dispositional property is taken as independent, the master program that the child follows will not "know" about the connections between those properties (and the programs corresponding to them). There is, however, one great difficulty in the program approach. It is extremely difficult to set up a general program to modify sub-programs. It may be virtually impossible to get the degree of generality to handle economically the changes the child would have to bring about in the sub-routines when he gets verbal input, as in our example. It is much easier to effect alterations in chains. It will be more economical, in the long run, to ignore or "lose" a certain body of information (the relative degree of similarity or overlap between dispositional properties), but, at the same time, avoid the pitfalls of writing programs to alter other programs.

Let us take the following set of items as a fragment of our database.

    [OLD][~YOUNG]
    [YOUNG][ACTIVE]
    [ACTIVE][MAKE NOISE]
    /DOG\[ACTIVE][JUMP UP]

(_start_p.51_)

    /DOG\[ACTIVE][BITE]
    /DOG\[BARK][MAKE NOISE]
    /DOG\[JUMP UP][LICK]
    [FRIENDLY][~BITE]

The input from the mother (root item) will be in subject-predicate form, and the subject, here DOG, may well refer to a particular dog. However, the words appearing in database terms refer only to general properties, and the item itself is merely the record of one or more observed co-occurrences of those properties. The order of the words in a database item (but not a root item) will thus make no difference. We also assume that the child makes no distinction between the general and particular uses of DOG. Nothing in the procedures to come will depend on it, and we are suggesting that the most rudimentary and fastest-acting system best fits the needs of the child at a certain stage. One could certainly hypothesize that there is another (perhaps later) database containing information in subject-predicate form, but we will look first to the minimal model.

Even this database does contain a feature which does some of the work of predication. Anything enclosed in /\'s is a non-exchangeable matching word which must appear in the string under consideration if this particular item is to be used. The chaining algorithm uses these items to generate transformations of the original input. It works along the following lines (entries from the database are enclosed in {}'s):

    [DOG][OLD]                (root)
    {[OLD][~YOUNG]}           [DOG][~YOUNG]
    {[YOUNG][ACTIVE]}         [DOG][~ACTIVE]
    {[ACTIVE][JUMP UP]}       [DOG][~JUMP UP]
    {/DOG\[JUMP UP][LICK]}    [DOG][~LICK]

We have in effect allowed the inference from {[YOUNG][ACTIVE]} to {[~YOUNG][~ACTIVE]}. While this sort of inference can cause problems, we have here in mind a context so limited that allowing it will do more good than harm in terms of efficiency. Since there are many transformations which can be made, we have to specify an objective. Let "%" be defined to be the logical equivalent of "plus or minus". Then define the objective as being of the form [%A][%B]... or [A][%B][%C]... For example, [DOG][JUMP UP] or [DOG][~JUMP UP], the two answers that the child is interested in, are of the form [DOG][%JUMP UP]. We will later suggest an algorithm capable of selecting an appropriate path to the end result.

The child is likely to receive information that conflicts with his previous beliefs. His mother's input will create the root [DOG][OLD], but he may have {[DOG][YOUNG]} or {[DOG][~OLD]} in his database, thus believing, in effect, that all dogs in his environment are young.
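Before taking up such conflicts, it is worth noting that the chaining illustrated above can be rendered as a short program. The following is a minimal sketch in Python; the representation (signed words in a list, with the /\ matching words kept in a separate field) and the name chain_step are illustrative choices of ours, not notation from the text.

    # Minimal sketch (illustrative names only) of the symbolic chaining
    # described above.  An item is a list of (sign, word) pairs; the /\
    # matching words of a database item are kept in a separate tuple and
    # must be present in the derived item for the entry to be usable.

    def chain_step(derived, db_item):
        """Apply one two-word database item to the derived root item:
        replace the word it shares with the derived item by the other
        word, carrying the sign across (this is the allowed inference
        from {[YOUNG][ACTIVE]} to {[~YOUNG][~ACTIVE]})."""
        words, match = db_item
        have = {name: sign for sign, name in derived}   # word -> sign
        if any(m not in have for m in match):
            return None            # matching word absent; item unusable
        for i, (sign, name) in enumerate(words):
            if name in have:
                other_sign, other_name = words[1 - i]
                new_sign = have[name] * sign * other_sign
                out = [(s, n) for (s, n) in derived if n != name]
                out.append((new_sign, other_name))
                return out
        return None

    # Re-deriving the chain in the text:
    root = [(1, "DOG"), (1, "OLD")]
    db = [
        (((1, "OLD"),     (-1, "YOUNG")),   ()),        # [OLD][~YOUNG]
        (((1, "YOUNG"),   (1, "ACTIVE")),   ()),        # [YOUNG][ACTIVE]
        (((1, "ACTIVE"),  (1, "JUMP UP")),  ("DOG",)),  # /DOG\[ACTIVE][JUMP UP]
        (((1, "JUMP UP"), (1, "LICK")),     ("DOG",)),  # /DOG\[JUMP UP][LICK]
    ]
    item = root
    for entry in db:
        item = chain_step(item, entry)
    print(item)    # [(1, 'DOG'), (-1, 'LICK')], i.e. [DOG][~LICK]

The sketch simply re-derives the chain given earlier, ending with [DOG][~LICK].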
Faced with such a conflict, he would have to choose between the new information and the old. If we build our model in that way, the child being represented must be either excessively susceptible to new information or immune to it altogether. In fact, when the mother says that the dog is old, that should induce a slight increase in the child's acceptance (_start_p.52_) of the dog. It should not produce a response as if the child had been on intimate terms with the dog since birth. What we want is an increment statement which, when repeated, produces a belief of increasing strength. The simplest way to achieve this is to give the proposition, not a truth value, but something like a probability, which, being continuous, can have an infinitely fine variation of values.

Let us therefore introduce a statistical measure of association, "&", which has a range of -1 to 1 inclusive. The coefficient -1, when attached to a word, represents the situation where the property is believed (with practical but not absolute certainty) not to be present, and 1 that where the property is similarly believed to be present. The value 0 implies no belief either way. If we use this "&" in place of the "%", it will have certain useful properties. Double negations will cancel, and a chain of reasoning built on a series of dubious assumptions will reflect the cumulative uncertainty of the whole. The calculated value of "&" will have a sign which is, in a sense, a result. It will also have a magnitude which is the reliability of that result. Our new database looks like this:

    [(1)OLD][(-1)YOUNG]
    [(1)YOUNG][(.9)ACTIVE]
    [(1)ACTIVE][(.9)MAKE NOISE]
    [(1)ACTIVE][(.9)JUMP UP]
    [(1)ACTIVE][(.1)BITE]
    /(.2)DOG\[(.9)BARK][(.9)MAKE NOISE]
    /(.2)DOG\[(.9)WAG TAIL][(.9)FRIENDLY]
    [(1)FRIENDLY][(-.95)BITE]

The non-substitutable matching word (in /\'s) now has an associated factor which must be used in computing "&" if the item is used under conditions where the matching word does not appear, e.g. using {/(x)A\[(y)B][(z)C]},

    [(j)A][(m)D][(n)B] yields [(j)A][(m)D][(n*y*z)C],

but

    [(m)D][(n)B] becomes [(m)D][(n*x*y*z)C]

Note: For purposes of computation we can add to an item any substitutable word with a coefficient of 0 or any non-substitutable word with a coefficient of 1.

We now have a derivation like this:

    [(1)DOG][(.9)OLD]                  (root)
    {[(1)OLD][(-1)YOUNG]}              [(1)DOG][(-.9)YOUNG]
    {[(1)YOUNG][(.9)ACTIVE]}           [(1)DOG][(-.81)ACTIVE]
    {[(1)ACTIVE][(.9)JUMP UP]}         [(1)DOG][(-.729)JUMP UP]

When we use the database, coefficients are always multiplied together, and, within that application, have no separate importance. However, when the mother (or anyone) speaks to the child, the coefficients in the root item have a different significance. In [(1)DOG][(.9)OLD] we assign 1 to DOG since the child assumes its presence and has his attention centered on it. The other coefficient is a measure of the confidence the child has in this particular speaker before he consults his own database. The result of the derivation, (_start_p.53_) [(1)DOG][(-.729)JUMP UP], does not, in itself, imply an approach to the dog, but would be one component in a larger model that might represent desires as well as beliefs. Having reconciled them, it would produce output which represents intentional actions. It is worth noting, however, that the model which produces the best output may not be one which preserves the ordinary distinction between desire and belief.

Let us now turn to the database itself and ask how it might be formed. There must, in the beginning, be categories.
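First, though, the multiplication of coefficients along a chain can be set down concretely. The sketch below is again a minimal Python illustration with a representation and helper name (propagate) of our own choosing; it reproduces the derivation just given, including the rule that the matching-word factor x applies only when the matching word is absent from the derived item.

    # Sketch (illustrative only) of coefficient propagation along a chain.
    # A database item is (match, (y, B), (z, C)), where `match` is None or
    # (x, word), representing the /(x)WORD\ matching-word convention.

    def propagate(derived, item):
        """Link from word B to word C, multiplying coefficients.  `derived`
        maps word -> coefficient; returns the coefficient assigned to C."""
        match, (y, b), (z, c) = item
        n = derived[b]                 # coefficient of B in the derived item
        coeff = n * y * z
        if match is not None:
            x, word = match
            if word not in derived:    # matching word absent: its factor applies
                coeff *= x
        return coeff

    derived = {"DOG": 1.0, "OLD": 0.9}              # root: [(1)DOG][(.9)OLD]
    for item in [
        (None, (1.0, "OLD"),    (-1.0, "YOUNG")),   # [(1)OLD][(-1)YOUNG]
        (None, (1.0, "YOUNG"),  (0.9, "ACTIVE")),   # [(1)YOUNG][(.9)ACTIVE]
        (None, (1.0, "ACTIVE"), (0.9, "JUMP UP")),  # [(1)ACTIVE][(.9)JUMP UP]
    ]:
        _, (_, b), (_, c) = item
        derived[c] = propagate(derived, item)
        del derived[b]
    print(derived)    # {'DOG': 1.0, 'JUMP UP': -0.729} (up to rounding)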
To return to categories: a child is more likely to recognize and remember a cat than an object which comprises, say, the lower 60% of the cat and three square feet of the surface on which it is standing. Philosophically, there is nothing wrong with the latter sort of object, but it is unsuited for our model because, if it were a category, it would give rise to a less useful database than the sort the child seems to have. There will be a word associated with each category, and the general principle is that, whenever the child has a sufficiently striking experience, a new item is created. If he notices only an active squirrel, SQUIRREL and ACTIVE will both have positive coefficients. If he notices a young man with a hat, and notices that he has no coat, then YOUNG, MAN, and HAT will have positive coefficients and COAT a negative one.

The magnitude of the coefficients will depend on the extent to which the child is "struck" by each feature, or by combinations of them. This allows for the representation of non-rational factors. The child may, for example, be intensely affected by an object or aspect of an object because he fears it, and this may predispose him to expect its re-occurrence. Another possibility is that the child may not be impressed by an experienced combination at a given conscious or unconscious level, yet repetition may still have its effect. Thus, on the tenth occurrence of the combination, he may "feel" that the two factors, which are then co-present, will always co-occur. In that case both coefficients will be higher than they would otherwise be. A completely developed model would have to have some mechanism for measuring these factors and deciding what sort of environment and prior condition of the child would give rise to an input which is striking to one degree or another. It may ultimately be found that it is better to simulate a whole environment with a number of persons in it, as opposed to constructing a model for the child alone. For the present, we would envision a series of models representing a single individual, beginning with extremely simplistic ones, but which would gradually grow more sophisticated. The algorithms used to set coefficients would mirror that development.

This process of database development will, in the course of time, produce items which have the same words but different coefficients. In reconciling those differences we must remember that it is not a matter of averaging them. If we have both {[(1) DOG][(.75) OLD]} and {[(1) DOG][(.65) OLD]}, we must remember that the second item provides additional confirmation for the first, and vice versa, so that the reconciled coefficient for OLD ought to be higher than in either previous instance. We will therefore need an algorithm for so handling items in the database, and for reconciling them with new information, as, for example, that which comes from the mother.

We can think of this process as one of "churning the database", and it is stimulated, not only by new input, but by many other occurrences. Since each new item must be "bounced off" and reconciled with each relevant old item, there is a natural conservatism, which favors a considerable body of old information (_start_p.54_) (subject to qualifications to be made later) over new information. Churning is suspended each time there is a need for action, and thus for definite coefficients. When that happens, the relevant database item most easily reached is used, thus importing a random element into the resulting beliefs and actions.
As a churning algorithm, we suggest the following:

    Let a && b = a + b + c(a,b),
    where c(a,b) = -a*abs(b) if a*b > 0, else c(a,b) = 0.

Then, taking the item from the database to be {/(x)A\[(y)B][(z)C]...} and linking from B to C:

    [(j)A][(m)D][(n)B][(p)C]... becomes [(j)A][(m)D][(n)B][(r)C]...
    where r = (p && ((x && abs(j))*n*y*z)),

but, if A does not appear in the derived root item (j is 0),

    [(m)D][(n)B][(p)C]... becomes [(m)D][(n)B][(r)C]...
    where now r = (p && (x*n*y*z)).

In either case, the coefficient z in the database is replaced by w:

    w = z + e*((y*r/n) - z), where e = abs(p && (-r)).

The fact that some of the algorithms required for these tasks in the model may be complex does not imply a claim that the child does complex calculations in his head. These and other algorithms are arrived at by setting forth plausible cases, plotting them, and then finding a formula that fits the curve. The result might be taken to describe a neural electro-chemical process within the brain. In all models of this sort there are many algorithms used in the computation, which can be progressively modified and adjusted to produce results more nearly corresponding to the observed reality being modeled. The battle is largely won if the model is flexible in enough ways so that the results can be skewed in the desired direction by changes of algorithm.

A critical question in this sort of model construction is: How long should items be held in the database? We have argued elsewhere (Todd, Thompson and Todd 1984: Part 6) that human reasoning is more likely to suffer from too much information than from too little. The child needs a system that works fast. It is better to supply the need for action with conclusions, even if a significant percentage are false, than to have action delayed or stultified by too much processing. We also suggested there that certain phenomena of aphasia can be understood best on the assumption of a periodic wipe-out of most of the database while, at the same time, new conclusions are constantly being generated. It is often better to generate a conclusion anew than to store it indefinitely, particularly since the coefficients need periodic revision in any case. This kind of periodic wipe-out will lose connections which would have been "confirmed" if the timing of the wipe-out cycle had been different. But, again, minimal information loss is to be tolerated in the interests of speed and simplicity. At least, that is the hypothesis about the child embodied in our model. We will again leave open the exact procedure for deleting items from the database on this ground.

In scientific investigation, some concepts, such as that of density, have turned out to be inordinately productive. At the opposite extreme are concepts such as Nelson Goodman's "emeruby", denoting an object that abruptly changes from an emerald to a ruby at time t (Goodman 1965: 102-3). If t is taken as the present, any evidence which confirms the belief that an object is an emerald equally confirms the belief that it is an emeruby (and hence will change color, etc. immediately). An emeruby is, of course, an extreme case. (_start_p.55_) There are indefinitely many other concepts which are, to one degree or another, unsuited for scientific or everyday reasoning. Goodman has shown that there is no logical or inductively justified way of ruling such properties out of court. We may not like them or use them, but science itself gives us no reason for rejecting them.
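Returning to the churning algorithm for a moment, the combination rule "&&" and the coefficient update can be written out directly. The following is a minimal Python sketch; the function names combine and churn_link are illustrative, the variable names follow the text, and n is assumed nonzero. The last lines show how the rule reconciles the {[(1) DOG][(.75) OLD]} and {[(1) DOG][(.65) OLD]} items of the previous section, giving a value higher than either alone, as required, and re-derive the last link of the earlier chain.

    # Sketch (illustrative only) of the combination rule "&&" and the
    # churning update given above.  Database item {/(x)A\[(y)B][(z)C]};
    # derived root item coefficients: j (for A), n (for B), p (for C,
    # taken as 0 if C is absent).  For an item with no matching word,
    # take x = 1, following the Note in the text.

    def combine(a, b):
        """a && b = a + b + c(a,b), c(a,b) = -a*abs(b) if a*b > 0 else 0."""
        c = -a * abs(b) if a * b > 0 else 0.0
        return a + b + c

    def churn_link(j, n, p, x, y, z):
        """Link from B to C; return (r, w): the new coefficient of C in
        the derived root item and the revised database coefficient for C.
        n is assumed nonzero."""
        if j != 0:                               # matching word A present
            r = combine(p, combine(x, abs(j)) * n * y * z)
        else:                                    # A absent from the root item
            r = combine(p, x * n * y * z)
        e = abs(combine(p, -r))
        w = z + e * ((y * r / n) - z)
        return r, w

    # Two items confirming one another: reconciled value exceeds either.
    print(combine(0.75, 0.65))                   # 0.9125

    # Last link of the earlier derivation: j=1 (DOG), n=-.81 (ACTIVE),
    # p=0 (JUMP UP not yet present), x=1, y=1, z=.9.
    print(churn_link(1, -0.81, 0, 1, 1, 0.9))    # approximately (-0.729, 0.9)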
A consequence of Goodman's point is that the child, "looking over his concepts", has no way of knowing which may be, to some degree, like that of an emeruby. His only real guide will be the input he gets from others. Thus, a tally must be kept of the frequency with which each word denoting a category in his database is spoken to him by others. In addition, then, to the Up-Dating Algorithm and the Churning Algorithm, there will have to be a Lack-of-Frequency Algorithm, which systematically lowers the coefficients of infrequently heard words wherever they appear in the database. If we now, at the periodic wipe-out phase, eliminate, roughly speaking, all items whose products of coefficients lie closer to 0 than a given threshold, the database will be skewed in favor of the concepts used by the larger society.

We have seen that working with the database changes the database. We must therefore have a means of restoring the database to the state that it would have been in if we had not done the last x transformations. The simplest way to do this is to treat a change or changes as a series of wholesale insertions and deletions of items, the series being stored in a stack which exists for that purpose. These are all reversible, so, to go back up the tree structure of possible transformations towards the starting point, we merely take entries from the stack, insert the deletions, and delete the insertions.

Suppose now that we want to use two or more external roots. We will have a separate external root database in which these are put, and it will be temporarily appended to the main database. We will then start transforming one item with the use of the others. If all the items in the external root database are used, then the derived result can be said to have been derived from them. It is, of course, possible that one of the items in the external root database may be totally unrelated to the subject at hand, and, in that case, it cannot be incorporated in a chain leading to the desired result.

Let us consider each possible state of the database and derived root item as a node in a branching structure, with the branches being different possible transformations of the database and derived root item in the state associated with the node from which the branch issues. The branching structure would look rather like this:

                     {1}
                   /     \
                  a       b
                 /         \
               {2}         {3}
               / \         / \
              a   b       a   b
             /     \     /     \
           {4}    {5}  {6}    {7}

where '{1}', '{2}', '{3}', '{4}', '{5}', '{6}', and '{7}' are possible states of the database and derived root item, and 'a' and 'b' are possible transformations of the same.

Let us next consider searching all possible combinations of items or, (_start_p.56_) rather, some reasonable subset of them. Now this is where "&" comes into its own. Consider a quantity called "&*" and let "&*" be the product of all the "&"s of the derived root item. At this point, "&*" obviously pertains to the whole derived root item, rather than to a part of it. If "&*" falls below a certain threshold, then we branch back and try a different branch from the previous node. If all the branches from that node are untenable, then we branch back still farther, and so on. To ensure that the first items, comprising the external root database, are used, we have the rule that possible branches are considered in the order that they appear in the database.
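The branch-and-backtrack search over "&*" can be sketched as a small recursive procedure. The version below is a minimal Python illustration: the names search, star, and is_target are ours, the branches are left abstract, and, for simplicity, a node is represented only by the derived root item and is copied at each step rather than undone through the change stack described above; a fuller version would carry the database state as well.

    # Sketch (illustrative only) of the backtracking search.  "&*" is the
    # product of the coefficients of the derived root item; a branch whose
    # "&*" magnitude falls below the threshold is abandoned and the previous
    # node is re-expanded on another branch.  (We take the magnitude, since
    # the sign carries the result and the magnitude its reliability.)

    from math import prod

    def star(derived):
        """The quantity '&*': product of all '&' coefficients in the item."""
        return prod(derived.values())

    def search(derived, branches, threshold, is_target, depth=0, max_depth=6):
        """Depth-first search; `branches` is an ordered list of callables,
        each mapping a derived root item to a transformed copy (or None)."""
        if is_target(derived):
            return derived
        if depth >= max_depth:
            return None
        for branch in branches:              # considered in database order
            child = branch(dict(derived))
            if child is None or abs(star(child)) < threshold:
                continue                     # untenable: branch back
            found = search(child, branches, threshold, is_target,
                           depth + 1, max_depth)
            if found is not None:
                return found
        return None                          # all branches exhausted

Copying the node trades memory for simplicity; the change-stack scheme in the text achieves the same reversal by re-applying recorded insertions and deletions in reverse order.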
We now have a scheme which searches for what, speaking somewhat loosely, amounts to the statistically significant implications of the original state of the database, together with the external items, with special stress being placed on the implications of the latter. But this is not yet what we want. We want to know, not only whether the derivation is reasonable, but whether it is relevant.

As stated before, we have a target item of a form similar to the items in the database except that it does not have "&" coefficients. It may, however, have weights, which we shall call "@!", taking 0 to mean that the word does not appear in the target and 1 to mean that it is fully present. The object is to determine which of its words should get preference in being matched with words in the derived root item. Further, we have some statistic, which we shall call "&#", for measuring closeness of fit between the target and the derived root item. One possible formula would be the following:

    &# = sum of &#(j) for all possible words

(where &#(j) is a measure of fit between the occurrence of a word in the derived root item and the occurrence of that same word in the target). &#(j) is computed as follows:

    if @! >= abs(&) then &#(j) = 2*abs(&) - @!
    otherwise           &#(j) = 1.5*@! - .5*abs(&)

This formula was obtained by taking four cases of abs(&) and @!, intuitively selecting appropriate values of &#(j) for them, and then contriving a function to fit them. Here are the four cases plotted on a graph. It should be noted that the linking together of word computations is effected by addition; therefore the identity element is 0. With symbolic values, the graph is:

                    abs(&)
                 0          1
               ________________
             1 | N          Y |
               |              |
        @!     |              |
               |              |
             0 | NE        N- |
               ----------------

where N is no, Y is yes, NE is no effect, and N- is no, only less emphatic than N. With numbers, bearing in mind that NE must be 0,

(_start_p.57_)

                    abs(&)
                 0          1
               ________________
             1 | -1         1 |
               |              |
        @!     |              |
               |              |
             0 | 0        -.5 |
               ----------------

The result is a system that finds what are, in effect, statistical inferences about the relevance of the root item, based on new information. It should be noted that these are not definitive, as the number of items which can be derived is not finite, and therefore we search only a small subset of the possible range of combinations.

We would like to treat briefly the means whereby the algorithm described above would be implemented in hardware in what might be called a realistic case, by which we mean a case involving much larger amounts of data. This may serve to give some insight into the sort of processes going on inside a child's brain. Let us consider that the child is at a node called {A} in the algorithm above, and let us consider that {A} has what might be called daughter nodes: {B}, reached by branch b, {C}, reached by branch c, {D}, reached by branch d, and so on. It should be understood that the limits of speed in going through the algorithm are not posed by the total amount of computation to be done but by the number of things which must be done in sequence. If many different parts of the job do not depend on each other for inputs, then they may be done at the same time by different equipment. That said, let us assume that there are processors available for each of the branches b, c, and d. First, each of them must receive a copy of the information making up node {A}, that is, a complete copy of the database, a complete copy of the change stack, and the derived root item.
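Returning for a moment to the fit measure "&#" defined above, it can be set down as a short function. The sketch is again a minimal Python illustration with names of our own choosing; the final lines reproduce the four corner cases plotted in the graph. Words absent from both the target and the derived root item contribute 0 (the identity element noted above), so summing over the union of words is equivalent to summing over "all possible words".

    # Sketch (illustrative only) of the fit measure "&#" between the target
    # item (weights "@!") and the derived root item (coefficients "&").

    def fit_word(weight, coeff):
        """&#(j) for one word: weight is '@!' in [0,1], coeff is '&' in [-1,1]."""
        a = abs(coeff)
        if weight >= a:
            return 2 * a - weight
        return 1.5 * weight - 0.5 * a

    def fit(target, derived):
        """'&#': the sum of &#(j) over all words appearing in either item."""
        words = set(target) | set(derived)
        return sum(fit_word(target.get(w, 0.0), derived.get(w, 0.0))
                   for w in words)

    # The four corner cases plotted in the text:
    print(fit_word(1, 0))    # -1   (N)
    print(fit_word(1, 1))    #  1   (Y)
    print(fit_word(0, 0))    #  0   (NE)
    print(fit_word(0, 1))    # -0.5 (N-)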
To return to the parallel implementation: while a complete copy of the database, the change stack, and the derived root item may seem an impossible amount of material to transfer to each processor, it can all be sent at the same time if the data path is broad enough. There is of course no reason why this should not be the case, as it only means that the data path, or what would be called the bus in a computer, must be on about the same scale, the same order of complexity, as the storage medium. Let the processors execute the branches on their copies of the node {A} and generate "&#" for the daughter nodes. The results determine whether the search will continue through that node or not. If that node is not a good candidate for continued development, then its processor will be released to a common pool of unemployed processors. If, on the other hand, it is worthy of development, the paths leading to its own daughter nodes will be allocated processors from the pool, if they are available. If not enough are available, the available processor or processors will work through the paths in sequence as required. This approach is standard practice, different only in scale and not in kind from the facilities available on most large mainframe and supermini computers. It will be noted that we use an underlying mechanism which is very simple in itself, in that there is no attempt to predict which branches are worthy of development.

In conclusion, it should be remarked that the suggested model would (_start_p.58_) occupy a position in two different series of models. While it intentionally ignores many distinctions to be found in natural language, the result is a high degree of simplicity and speed of operation. There are, of course, many kinds of simplicity, some of which conflict with others, but we have chosen the kinds we think appropriate at this stage of language acquisition. One series of models then represents different stages of acquisition, terminating with the full adult competence. Our larger speculation is that, starting with a model such as that outlined here, subsequent ones can be fitted with additional features without there ever being a need for a radical re-design. The other series of models, starting from our outline, represents improved attempts to simulate a given level of linguistic competence. We have suggested that a great deal can be done by improving the algorithms. However, the important thing is to work towards an actual computer model whose input and output can be compared with that of the child. Then, even if the results do not tally, we would be in a position to work towards a radically improved model.

References

Goodman, Nelson. (1965). Fact, Fiction, and Forecast, 2nd Ed. New York: Bobbs-Merrill.

Todd, William, George Thompson, and Andrew Todd. (1984). A Logic for Computers. Cincinnati: Ehling Clifton Books.