Andrew D. Todd

A Progress Report on the Early English Ecological Population System Simulation Model

Description

The Early English Ecological Population System Simulation Model is a computer simulation of the population growth and distribution of England in the sixteenth to eighteenth centuries. It is comprehensively feedback-driven, with extensively modeled economic and ecological factors incorporated. At the present state of progress, after the production of about 1200 lines of debugged code, the population section is substantially complete, but the economic and ecological sections are in an early state of development.

Prior Art

There are two significant precedents in modeling the early English population: Wrigley and Schofield's Population History of England, and David Levine's Reproducing Families (1987). Wrigley and Schofield offer exhaustive computation, and Levine offers sophistication in incorporating relevant factors. What is required of a new model is that it synthesize these two approaches, and ultimately, that it become a kind of magnet, accreting to itself practically all credible suggestions which may be raised about the demographics, economics, and ecology of the period.

Population modeling as such is not central to Wrigley and Schofield's Population History of England (1981). It is a kind of coda or envoi to a work of factual reconstruction. Wrigley and Schofield presided over an extensive program of gathering parish records of births, deaths, and marriages. The first portion of their book consists primarily of collating these records and correcting the totals for various systematic omissions (e.g., the presence of dissenters not recognized by the church). Their most notable theoretical contribution was the "back projection." They started with a reliable census for the year 1871, containing a detailed age distribution, and then set out to construct a comparable population table for a date five years previous. Using mortality tables, they prorated the known number of deaths over the different age groups. By repeating the process, they were able to construct a series of "pseudo-census" totals running back to the earliest date for which reliable parish records were available, in the middle of the sixteenth century. The back projection process generated a check measurement, in that if someone died in year X at the age of Y years, he must have been born in the year X-Y. Wrigley and Schofield were therefore able to incorporate a feedback loop in their back projection mechanism, whereby the process "fitted" its reconstructions to the available data (Wrigley & Schofield, pp. 715-38).

The back projection is a practical example of Wrigley and Schofield's basic modeling method. In their conclusion, the authors throw out a series of proposals for dynamic models, shown as flow charts (ibid., pp. 457-480). There is no indication that these models were ever actually implemented or reduced to practice, and the diagrams reproduced have the characteristic vagueness of ideas whose creator has not yet worked out a specific data representation. There is an old programmer's maxim to the effect that "show me your data structures, and I can reconstruct your code, but not the other way around." These models are probably best understood as hopeful intentions. They characteristically relate population to grain prices, urbanization, and manufacturing, apparently as simple indexes rather than anything more detailed.
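To make the mechanics of the back projection concrete, the following is a minimal sketch of a single five-year step, written in C to match the rest of this report. The array layout, the function name, and the proration rule are assumptions of my own, and Wrigley and Schofield's fitting loop is omitted.

/* A minimal sketch of one step of back projection: given the age
   distribution in year X and the recorded deaths over the preceding
   five years, estimate the age distribution in year X - 5.
   The names and the proration rule are hypothetical. */

#define MAX_AGE 100

/* Fraction of recorded deaths assigned to each age group, taken from
   a mortality table; the entries must sum to 1.0. */
static double death_share[MAX_AGE + 1];

/* population[a] holds the number of persons aged a in year X. */
void back_project_five_years(double population[MAX_AGE + 1],
                             double recorded_deaths)
{
    double earlier[MAX_AGE + 1];
    int a;

    for (a = 0; a <= MAX_AGE; a++)
        earlier[a] = 0.0;

    /* Anyone aged a in year X was aged a - 5 in year X - 5,
       provided he had already been born. */
    for (a = 5; a <= MAX_AGE; a++)
        earlier[a - 5] = population[a];

    /* Add back the people who died in the interval, prorated over
       the age groups by the mortality table.  This is a crude rule;
       the check that a person dying in year X at age Y was born in
       year X - Y, which Wrigley and Schofield used to fit their
       reconstruction, is not implemented here. */
    for (a = 5; a <= MAX_AGE; a++)
        earlier[a - 5] += recorded_deaths * death_share[a];

    for (a = 0; a <= MAX_AGE; a++)
        population[a] = earlier[a];
}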
By contrast with Wrigley and Schofield, David Levine, in his Reproducing Families (1987), offers a highly worked-out "thought experiment" approach to population modeling. The thought experiment uses approximate calculations, counterbalanced by a sophisticated understanding of what to calculate, and the calculations tend to throw off powerful theses as output. There are no exact calculations to speak of, but there is an exemplary level of consideration for all the possible factors which might affect population. For example, Levine works out approximately how many children a typical family might have been expected to have, given a typical marriage age, makes allowances for child mortality and the average number of years in a generation, and obtains from this a back-of-the-envelope figure for annual population increase (Levine, 1987, pp. 79-80). Thus far, his modeling is crude by Wrigley and Schofield's standards, but Levine has one advantage. His is only a thought experiment, and he has not entangled himself in the institutional baggage of "big science." He can treat his calculations cavalierly, as the products of a few hours of "number-play," rather than as an expensively sacrosanct, grant-funded computer printout. Thus Levine throws out other sets of calculations with casual abandon, and links them together to see what they might mean. He provides extended discussions of emigration (Levine, 1987, pp. 82-86), child labor (ibid., pp. 111-14, 119-20, 137-39), and urban mortality (ibid., pp. 80-82). He makes a distinction between proletarian and peasant demographic models (ibid., p. 90), and shows that the proletarian assumptions lead to a much earlier marriage, and a consequently greater rate of reproduction. Levine's back-of-the-envelope calculations present a credible explanation for the population explosion of the eighteenth century.

The demographic history of England is in large measure the product of the non-programming historian. Few historians have the kind of casual numeracy of, say, a working engineer; the exceptions are likely to be renegade technical people. Exact modeling has therefore tended to involve recruiting professional programmers, statisticians, etc., and obtaining grants to pay for them, because the kinds of calculations historians want done may not be regarded as sufficiently novel to be publishable work within the mathematics and computer science communities. Historical programming is presently caught in a hopeless dilemma between the expectations of historians and the imperatives of programming itself. As the editors of History and Computing remarked in a policy statement, "Even more unlikely to find a place will be papers that require the reproduction of pages of program listings..." ("Editorial: 'taking stock,'" vol. 7, no. 2, 1995, p. iii). Yet it is accepted that authors will reproduce page upon page of computer-generated tables. This amounts to an attitude toward the computer that is at once servile and insolent: computer outputs are either above criticism, or they are worthless gibberish. Such an attitude is inimical to the development of a tradition of close criticism. A prose description of an algorithm is a translation, bearing the usual burden of the problems of translation. Models can only advance effectually when someone else builds a new model out of my old subroutines, replacing a few whose assumptions he has found dubious. On the basis of published source code, authors can cite prior art in an orderly way.
They can say that subroutine "foobar" is the subroutine listed in such-and-such a range of lines of such-and-such a figure of such-and-such a publication, and a hundred lines of new code may well represent a significant advance in modeling technique. If program listings are not part of the published record, it becomes practically impossible for a later author to describe his point of departure in a precise way. Such program listings as exist may well become inaccessible. The end result is that anyone wanting to extend or critique a work of historical computing must first replicate it, and this replication is in practice an insuperable obstacle to carrying computerized historical simulations forward. People can find new bodies of texts to tabulate, but no one will ever extend them.

Programming is brutally heavy work. The computer does not tolerate any cognitive dissonance in programs, and programs necessarily have to be written to an extreme degree of precision and exactitude. One of the most infuriating errors I made, and eventually corrected, in the course of writing the program described below involved an identifier which I had inadvertently written with inconsistent capitalization, e.g., something like "Foobar_List_Total" instead of "Foobar_list_Total." This error was especially infuriating because the two variants are so much alike that the difference is not readily perceptible to the human eye; for a long time, I simply could not see what the compiler was objecting to. I am not an amateur-- I am a skilled programmer of many years' standing, and this sort of difficulty is simply the small change of programming. Despite the best efforts of editors, even the output of the most prestigious publishing houses is full of typographical errors of this kind. In prose, they simply do not matter, but in computer code, they are critical. Historical computer programming can only prosper if arrangements are made to systematically conserve and reuse the results of this infuriating labor.

The Population History of England was produced by four authors of record, two programmers (one of them an author of record), nine keypunchers, one supervisor of keypunchers, and innumerable local historians collecting primary source material (Wrigley & Schofield, 1981, pp. iii, xii-xv). One of its appendices contains an ambiguous prose description of the "back projection" technique, written by one of the programmers. From a technical standpoint, the Population History exhibited grave flaws. For example, the mortality table (Wrigley & Schofield, tab. A-14.5, p. 714) was ambiguously documented, and I was obliged to subject it to spreadsheet analysis for internal consistency before I could use it with any confidence. The English language is ambiguous by its nature, and it was inevitable, given the terms of reference, that the Population History should possess a certain degree of "unreplicability." The work represents an evolutionary dead end, in the sense that it could only be carried forward by assembling ever-larger teams, funded by ever-increasing government grants, and there is no real prospect of such grants being forthcoming.

The immediate object in building a population model is to combine Levine's sophistication about the sources of population change and the details of economic livelihood with Wrigley and Schofield's solid quantitative simulation.
To this end, I have built a "population table model" in the methodological tradition of Wrigley and Schofield, which I propose to enrich by adding decision-making algorithms to determine such questions as who marries whom. The population table approach to simulation is a middle-ground approach, situated between the simple approach of treating a population as a single entity and the more exhaustive approach of trying to represent the population in full detail. Ultimately, the most powerful model is a "full population Monte Carlo" model, in which each individual and each family would be separately represented, including such details as the height and weight of each individual and the detailed contents of each family larder. Simulating aggregates involves an inevitable unclarity. People are born, die, and marry as individuals, and as individuals their actions can be directly represented. Similarly, they do not belong to monolithic classes, but have linkages to other people, which may or may not approximate coherent social groups. The classic European definition of nobility rests on "quarterings," that is, the showing of the requisite number of noble ancestors. For England the situation is more complicated, but any reasonable definition of class must take account of genealogy. A simulation of individuals can easily be adapted to do any desired genealogical reckoning; an example of this kind of mechanism can be seen in the anthropologist James Boster's GENIAL program (Boster, 1986). The full population Monte Carlo approach is still computationally formidable, however. It would involve keeping detailed records on tens or hundreds of millions of fictitious individuals, even for a small country like early modern England, and the necessary bulk of data might amount to tens of gigabytes. Eventually, however, this brute-force approach will become practicable, and it will have the curious advantage of offering a more accurate simulation with a simpler program.

Programming philosophy

The model is designed for readability and maintainability rather than for efficiency of computer use. I have attempted to make all modules as small as possible. Access to the major demographic tables is via sets of functions which constitute a de facto object-oriented method. I have defined two recurrent operations: that of adding to (or subtracting from) a table entry, and that of reducing it by a specified fraction. There are also functions to get and set a table entry, and these functions are under no circumstances to be bypassed. The reason for this approach is that the C programming language, despite its many virtues, supports only rudimentary arrays; one must "roll one's own." Sophisticated programmers will notice that in the marriage section I have used what amounts to a kind of bubble sort.

Likewise, I propose, in a section not yet implemented, to initialize the model by simply allowing it to run "out of gear" for a long period of simulated time. Specifically, I use an "Adam and Eve" start. That is, I initialize the population at a very low level, consistent with hunting and gathering rather than agriculture; for England, this might be a more or less Iron Age population of about 100,000 persons. The model, however, still assumes agriculture. Since the population does not adapt to the abundance of land and natural resources by changing its mode of subsistence, it exists in a framework of abundance, and the population "is fruitful and multiplies." I let the model run until it achieves the starting population, and then reset the date to the starting date. This method gives me the freedom to make all kinds of wild and arbitrary assumptions about the initial population distribution, in the confidence that such assumptions will be progressively replaced by generated information before the model starts running "for effect."
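The following is a minimal sketch of the access discipline and the "Adam and Eve" warm-up just described. The table dimensions, the function names, the flattened array layout, and the stand-in yearly step are assumptions made for the sake of illustration; the model's actual tables and yearly cycle are considerably more elaborate.

/* Hypothetical, flattened stand-in for the population table and its
   access functions.  All access is through these functions; the
   underlying array is never touched directly. */

#define MAX_AGE   100
#define SEXES       2
#define STATUSES    3    /* single, married, widowed */
#define SUBPOPS     3    /* gentry, peasants, urban commoners */
#define TABLE_SIZE ((MAX_AGE + 1) * SEXES * STATUSES * SUBPOPS)

static double pop_table[TABLE_SIZE];

static int pop_index(int age, int sex, int status, int subpop)
{
    return ((age * SEXES + sex) * STATUSES + status) * SUBPOPS + subpop;
}

double pop_get(int age, int sex, int status, int subpop)
{
    return pop_table[pop_index(age, sex, status, subpop)];
}

void pop_set(int age, int sex, int status, int subpop, double persons)
{
    pop_table[pop_index(age, sex, status, subpop)] = persons;
}

/* The two recurrent operations: add to (or subtract from) an entry,
   and reduce an entry by a specified fraction. */
void pop_add(int age, int sex, int status, int subpop, double persons)
{
    pop_table[pop_index(age, sex, status, subpop)] += persons;
}

void pop_reduce(int age, int sex, int status, int subpop, double fraction)
{
    pop_table[pop_index(age, sex, status, subpop)] *= (1.0 - fraction);
}

double pop_total(void)
{
    int i;
    double total = 0.0;
    for (i = 0; i < TABLE_SIZE; i++)
        total += pop_table[i];
    return total;
}

/* Stand-in for the real yearly cycle (aging, births, deaths,
   widowhoods, marriages); here it merely grows every entry slightly
   so that the warm-up loop terminates. */
static void step_one_year(void)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        pop_table[i] *= 1.005;
}

/* "Adam and Eve" warm-up: seed a small population and run the model
   "out of gear" until the target total is reached.  The caller then
   resets the simulated date to the starting date. */
void warm_up(double target_population)
{
    pop_set(20, 0, 0, 1, 50000.0);   /* 50,000 single peasant men aged 20   */
    pop_set(20, 1, 0, 1, 50000.0);   /* 50,000 single peasant women aged 20 */
    while (pop_total() < target_population)
        step_one_year();
}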
These methods might not, of course, have been justified at the time when the Cambridge Group did their original work, but the advance of computers, and the drastic fall in their prices over the intervening years, make such a brute-force solution attractive. The model is written in the C programming language, specifically using the inexpensive and readily available MIX Power C compiler. Except for the two or three functions providing the most immediate access to the demographic tables, usage is well within the scope of the ANSI C standard, and the model should therefore be highly portable.

Population

The population section of the model consists of two interrelated tables: a table of population and a table of marriages. The population table is classified by age, sex, marital status, and sub-population. The marriage table is classified by age of wife, age of husband, and sub-population. Each year, the tables are shifted downwards to reflect a year's aging, and then the births, deaths, widowhoods, and marriages are calculated. These are done in a number of "slices," to approximate the reality that all these events take place simultaneously. Each slice involves computing births, deaths, widowhoods, and marriages, in that order.

Computing births is one of the simplest sections of the model. We need only determine the number of women of various ages and statuses, their innate fertility, and their level of sexual participation. In the seventeenth century, with both bona fide promiscuity and birth control of any sort rare, this last item amounts to little more than counting the number of married women. In practice, innate fertility can be treated as more or less determined by age. Wrigley and Schofield's table 7.25 (p. 254) indicates that the rates for different periods differed much less than the rates for different maternal ages. Fertility is of course theoretically dependent on nutritional status, but the evidence seems to be that this dependence did not operate much in the sixteenth and seventeenth centuries. Most probably, the restrictions on marriage superseded the effects of nutrition on fertility: to marry, one had to produce property and tenancy rights amounting to an assured food supply.

Computing deaths involves the use of a mortality table, and is again a fairly simple matter. The table provided by Wrigley & Schofield (tab. A-14.5, p. 714) gives specific mortality figures for different age groups, and overall life expectancies, as functions of a general mortality parameter. I convert overall mortality into life expectancy and interpolate between the appropriate columns of their table. My figure for overall mortality is arrived at by adding a basal mortality, approximating the modern figure; a measure of the nutritional deficit, converted into mortality; and an additional component reflecting the mortality due to crowding. The nutritional deficit is derived by comparing the food supply with the recommendations in a standard medical reference work (Merck). Three components are used: calories, protein, and vitamin C, chosen as the minimum number of proxy components to represent different qualities of diet. It is assumed that access to vitamin A, etc., will vary in approximate proportion to vitamin C, and that protein availability is a workable measure of access to iron, calcium, etc. In case of calorific insufficiency, the body converts protein to energy, and this is reflected in the calculations. The three components are reduced to a single percentage, which is converted into mortality by using a guess of how long it takes to starve to death. The figures for crowding mortality are likewise guesses, pure and simple.
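The arithmetic just described might be sketched roughly as follows. The constants here are placeholders of my own, not the figures actually used, and the interpolation against Wrigley and Schofield's table A-14.5 is reduced to a comment.

/* Rough sketch of the overall mortality calculation.  All constants
   are placeholders; the working model takes its figures from tables. */

#define BASAL_MORTALITY  0.010   /* deaths per person-year, approximating  */
                                 /* a modern population                     */
#define YEARS_TO_STARVE  0.25    /* guess: a wholly unfed population would  */
                                 /* die off in about three months           */

/* nutrition_fulfilled: 1.0 means the diet fully meets the Merck
   allowances for calories, protein, and vitamin C; 0.0 means no food
   at all.  crowding: an index of urban crowding, 0.0 and upward. */
double overall_mortality(double nutrition_fulfilled, double crowding)
{
    double deficit = 1.0 - nutrition_fulfilled;
    double starvation_mortality = deficit / YEARS_TO_STARVE;
    double crowding_mortality = 0.005 * crowding;   /* a pure guess */

    return BASAL_MORTALITY + starvation_mortality + crowding_mortality;
}

/* The result is then converted to a life expectancy and used to
   interpolate between the columns of Wrigley and Schofield's table
   A-14.5 to obtain age-specific rates (not shown here). */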
Widowhood is somewhat more complicated. For each group of married persons who die, we use the marriages table to get a distribution of the ages of the spouses of similar persons, and allocate widowhoods in proportion to the number of spouses of the various ages.

Next, marriages are performed, up to a specified quota. There is a marital desirability function: each individual has a marital desirability, based on his or her age, sex, marital status (single or widowed), and, of course, the sub-population to which he or she belongs. The potential parties are married off within each sub-population, starting with the most desirable, until the available quota of marriages is exhausted. In this initial version of the model, I have used only three sub-populations (viz., gentry, peasants, and urban commoners), but the model could easily be modified to support substantially more.

Economy

I propose that the economic model will be a simple input-output table (see the appendix), with appropriate goal-seeking algorithms. These algorithms will make decisions about the allocation of resources, based on average conditions and requirements. Once this is done, and the various inputs allocated, I will repeat the calculations, this time with random variables, to allow for the uncertainties of weather, etc. The input-output model will require a considerable number of coefficients. I have undertaken a preliminary search for these in the secondary literature, and while I do not as yet have anything like a complete set, the results have been promising enough that I have no real concerns about ultimately locating all the necessary figures. Coefficient collection has been running comfortably in advance of model-building.

One major prerequisite for simulating the agricultural economy is a set of data for crop yields in terms of the various inputs, viz., seed, land, and labor. Campbell (1983) provides crop yields per seed and per acre for Norfolk at the end of the thirteenth century and in the early fifteenth century. Turner (1986) offers a collection of crop yield per acre data, circa 1800, covering a range of enclosure conditions and regions for three crops-- wheat, oats, and barley. Clark (1991), following the methods of Mark Overton, gives a table of labor costs for harvesting, as well as grain yields per acre reconstructed from probate inventories, and does so for a quite long period, ca. 1500-1900. Taken as a whole, reasonable "guesstimates" are available for seed yield, acre yield, and labor yield, at least for the major grain crops. It should be possible to interpolate these together into yield functions. Thus, we can obtain labor supply from the population table, and seed and land available from the economic side of the simulation, and use them to obtain a quantity of grain produced.
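A yield function of the kind contemplated might, for instance, treat seed, land, and labor as limiting factors. The coefficients below are placeholders, not figures taken from Campbell, Turner, or Clark, and the functional form is offered only as a sketch.

/* Sketch of a grain yield function with seed, land, and labor as
   limiting factors.  The coefficients are placeholders, to be
   replaced by figures interpolated from the sources discussed above. */

#define YIELD_PER_SEED     4.0    /* bushels harvested per bushel sown  */
#define YIELD_PER_ACRE    10.0    /* bushels per acre sown              */
#define YIELD_PER_WORKER 200.0    /* bushels per full-time farm laborer */

static double min3(double a, double b, double c)
{
    double m = a;
    if (b < m) m = b;
    if (c < m) m = c;
    return m;
}

/* Returns bushels of grain produced, limited by whichever input runs
   out first. */
double grain_produced(double seed_bushels, double acres, double laborers)
{
    return min3(seed_bushels * YIELD_PER_SEED,
                acres * YIELD_PER_ACRE,
                laborers * YIELD_PER_WORKER);
}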
Coda

Historical programming does not attract the best programmers, but for all that, it is one of the most difficult and complex of programming fields. Each word in a historical prose discussion can easily translate into fifty lines of code. I set out to model the English population, and after about twice as much work as I had foreseen (and budgeted for), I found that I had merely constructed a set of sub-foundations for the work of constructing a model.

Bibliography

James Boster, "GENIAL," 9/8/86 (Dept. of Anthropology, University of Pittsburgh, Pittsburgh, PA 15260), bundled with A.D.A. PROLOG, version 1.91P, PC-SIG Disk No. 417 v4 (PC-SIG, 1030D E. Duane Avenue, Sunnyvale, CA 94086, 1987).

Bruce M. S. Campbell, "Arable Productivity in Medieval England," The Journal of Economic History, vol. XLIII, no. 2 (June 1983), pp. 379-404.

Gregory Clark, "Yields per Acre in English Agriculture, 1250-1860: Evidence from Labour Inputs," Economic History Review, vol. XLIV, no. 3 (1991), pp. 445-460.

Michael Turner, "English Open Fields and Enclosures: Retardation or Productivity Improvements?" Journal of Economic History, vol. 46, no. 3 (September 1986), pp. 669-692.

David Levine, Reproducing Families: The Political Economy of English Population History, Cambridge: Cambridge University Press, 1987.

The Merck Manual of Diagnosis and Therapy, 14th edition, ed. Robert Berkow, M.D., et al., Rahway, N.J.: Merck Sharp & Dohme Research Laboratories (Merck & Co., Inc.), 1982. Especially table 76-2, "Recommended Daily Dietary Allowances," pp. 876-77, and fig. 190-2, "Estimated Caloric Expenditure under Basal Conditions," p. 1838.

E. A. Wrigley and R. S. Schofield, with contributions by Ronald Lee and Jim Oeppen, The Population History of England, 1541-1871, Cambridge, Mass.: Harvard University Press, 1981.

Appendix-- First Construct for Economic Model

I use the following notational conventions:

a + b -> c + d is borrowed from chemical reaction notation. It means that a and b are used to make c and d.

a = { b | c | d } indicates that a is divided up among b, c, and d.

1. First, the indigenous factors of production are portioned out:

Land = { Arable_Land | Pasture_Land | Waste_Land }
Arable_Land = { Grain_Arable_Land | Potato_Arable_Land }
Pasture_Land = { Cattle_Pasture_Land | Sheep_Pasture_Land | Extracted_Surplus_Pasture_Land }
Labor = { Farm_Labor | Fishing_Labor | Craft_Labor }
Farm_Labor = { Potato_Farm_Labor | Grain_Farm_Labor }

2. Then the agricultural production is computed:

Potato_Farm_Labor + Potato_Arable_Land -> Potato_Food
Grain_Farm_Labor + Grain_Arable_Land -> Grain_Food_Produced
Grain_Food_Produced + Grain_Food_Imported -> Grain_Food
Grain_Food = { Peasant_Grain_Food | Sheep_Grain_Food | Cattle_Grain_Food | Extracted_Surplus_Grain_Food }
Sheep_Pasture_Land + Sheep_Grain_Food -> Mutton_Food + Wool_Material
Cattle_Pasture_Land + Cattle_Grain_Food -> Beef_Food + Hides_Material
Fishing_Labor + Shipping_Capital -> Fish_Food
Mutton_Food + Beef_Food + Fish_Food -> Meat_Food
Meat_Food = { Peasant_Meat_Food | Extracted_Surplus_Meat_Food }
Peasant_Meat_Food + Peasant_Grain_Food + Potato_Food -> Peasant_Food

3. Then the non-agricultural production is computed:
Imported_Materials + Wool_Material + Hides_Material + Craft_Labor -> Manufactures
Manufactures = { Capital_Goods | Peasant_Necessaries | Extracted_Surplus_Goods | Export_Goods }
Export_Goods -> Foreign_Credits
Foreign_Credits = { Purchase_Grain_Imports | Purchase_Material_Imports | Purchase_Peasant_Necessaries | Purchase_Luxuries }
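By way of illustration, one line of the construct above, the manufactures "reaction" and its division into shares, might eventually be rendered along the following lines. The limiting-factor form and the share coefficients are placeholders only, not decisions already taken in the model.

/* Sketch of one step of the economic construct: the manufactures
   "reaction" and its division into shares. */

#define CRAFT_OUTPUT_PER_WORKER 1.0   /* units of manufactures per craft laborer */

struct manufactures_split {
    double capital_goods;
    double peasant_necessaries;
    double extracted_surplus_goods;
    double export_goods;
};

/* Imported_Materials + Wool_Material + Hides_Material + Craft_Labor
   -> Manufactures, limited by whichever is scarcer, materials or labor. */
double make_manufactures(double imported_materials, double wool_material,
                         double hides_material, double craft_labor)
{
    double materials = imported_materials + wool_material + hides_material;
    double by_labor = craft_labor * CRAFT_OUTPUT_PER_WORKER;
    return (materials < by_labor) ? materials : by_labor;
}

/* Manufactures = { Capital_Goods | Peasant_Necessaries |
   Extracted_Surplus_Goods | Export_Goods }, divided here in fixed
   shares that sum to 1.0. */
struct manufactures_split split_manufactures(double manufactures)
{
    struct manufactures_split s;
    s.capital_goods           = 0.20 * manufactures;
    s.peasant_necessaries     = 0.40 * manufactures;
    s.extracted_surplus_goods = 0.25 * manufactures;
    s.export_goods            = 0.15 * manufactures;
    return s;
}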