Andrew D. Todd

A Progress Report on the Early English Ecological Population System Simulation Model

Description

The Early English Ecological Population System Simulation Model is a computer simulation of the population growth and distribution of England in the sixteenth to eighteenth centuries. It is comprehensively feedback-driven, with extensively modeled economic and ecological factors incorporated. At the present state of progress, after the production of about 1200 lines of debugged code, the population section is substantially complete, but the economic and ecological sections are in an early state of development.

Prior Art

There are two significant precedents in modeling the early English population: Wrigley and Schofield's Population History of England, and David Levine's Reproducing Families (1987). Wrigley and Schofield offer exhaustive computation, and Levine offers sophistication in incorporating relevant factors. What is required of a new model is that it synthesize these two approaches, and ultimately, that it become a kind of magnet, accreting to itself practically all credible suggestions which may be raised about the demographics, economics, and ecology of the period.

Population modeling as such is not central to Wrigley and Schofield's Population History of England (1981). It is a kind of coda or envoi to a work of factual reconstruction. Wrigley and Schofield presided over an extensive program of gathering parish records of births, deaths, and marriages. The first portion of their book consists primarily of collating these records and correcting the totals for various systematic omissions (e.g., the presence of dissenters not recognized by the church). Their most notable theoretical contribution was the "back projection." They started with a reliable census for the year 1871, containing a detailed age distribution, and then set out to construct a comparable population table for a date five years previous. Using mortality tables, they prorated the known number of deaths over the different age groups. By repeating the process, they were able to construct a series of "pseudo-census" totals running back to the earliest date for which reliable parish records were available, in the middle of the sixteenth century. The back projection process generated a check measurement, in that if someone died in year X at the age of Y years, he must have been born in the year X-Y. Wrigley and Schofield were therefore able to incorporate a feedback loop in their back projection mechanism, whereby the process "fitted" its reconstructions to the available data (Wrigley & Schofield, pp. 715-38).

The back projection is a practical example of Wrigley and Schofield's basic modeling method. In their conclusion, the authors throw out a series of proposals for dynamic models, shown as flow charts (ibid., pp. 457-480). There is no indication that these models were ever actually implemented or reduced to practice, and the diagrams reproduced have the characteristic vagueness of ideas whose creator has not yet worked out a specific data representation. There is an old programmer's maxim to the effect that "show me your data structures, and I can reconstruct your code, but not the other way around." These models are probably best understood as hopeful intentions. They characteristically relate population to grain prices, urbanization, and manufacturing, apparently as simple indexes rather than anything more detailed.
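To make the mechanics of the back projection concrete, the following is a minimal sketch of a single five-year step, written in C to match the rest of this report. The array layout, the function name, and the proration rule are assumptions of my own, and Wrigley and Schofield's fitting loop is omitted.

/* A minimal sketch of one step of back projection: given the age
   distribution in year X and the recorded deaths over the preceding
   five years, estimate the age distribution in year X - 5.
   The names and the proration rule are hypothetical. */

#define MAX_AGE 100

/* Fraction of recorded deaths assigned to each age group, taken from
   a mortality table; the entries must sum to 1.0. */
static double death_share[MAX_AGE + 1];

/* population[a] holds the number of persons aged a in year X. */
void back_project_five_years(double population[MAX_AGE + 1],
                             double recorded_deaths)
{
    double earlier[MAX_AGE + 1];
    int a;

    for (a = 0; a <= MAX_AGE; a++)
        earlier[a] = 0.0;

    /* Anyone aged a in year X was aged a - 5 in year X - 5,
       provided he had already been born. */
    for (a = 5; a <= MAX_AGE; a++)
        earlier[a - 5] = population[a];

    /* Add back the people who died in the interval, prorated over
       the age groups by the mortality table.  This is a crude rule;
       the check that a person dying in year X at age Y was born in
       year X - Y, which Wrigley and Schofield used to fit their
       reconstruction, is not implemented here. */
    for (a = 5; a <= MAX_AGE; a++)
        earlier[a - 5] += recorded_deaths * death_share[a];

    for (a = 0; a <= MAX_AGE; a++)
        population[a] = earlier[a];
}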
By contrast with Wrigley and Schofield, David Levine, in his Reproducing Families (1987), offers a highly worked-out "thought experiment" approach to population modeling. The thought experiment uses approximate calculations, counterbalanced by a sophisticated understanding of what to calculate, and the calculations tend to throw off powerful theses as output. There are no exact calculations to speak of, but there is an exemplary level of consideration for all the possible factors which might affect population. For example, Levine works out approximately how many children a typical family might have been expected to have, given a typical marriage age, makes allowances for child mortality and the average number of years in a generation, and obtains from this a back-of-the-envelope figure for annual population increase (Levine, 1987, pp. 79-80). Thus far, his modeling is crude by Wrigley and Schofield's standards, but Levine has one advantage. His is only a thought experiment, and he has not entangled himself in the institutional baggage of "big science." He can treat his calculations cavalierly, as the products of a few hours of "number-play," rather than as an expensively sacrosanct, grant-funded computer printout. Thus Levine throws out other sets of calculations with casual abandon, and links them together to see what they might mean. He provides extended discussions of emigration (Levine, 1987, pp. 82-86), child labor (ibid., pp. 111-14, 119-20, 137-39), and urban mortality (ibid., pp. 80-82). He makes a distinction between proletarian and peasant demographic models (ibid., p. 90), and shows that the proletarian assumptions lead to a much earlier marriage, and a consequently greater rate of reproduction. Levine's back-of-the-envelope calculations present a credible explanation for the population explosion of the eighteenth century.

The demographic history of England is in large measure the product of the non-programming historian. Few historians have the kind of casual numeracy of, say, a working engineer; the exceptions are likely to be renegade technical people. Exact modeling has therefore tended to involve recruiting professional programmers, statisticians, etc., and obtaining grants to pay for them, because the kinds of calculations historians want done may not be regarded as sufficiently novel to be publishable work within the mathematics and computer science communities. Historical programming is presently caught in a hopeless dilemma between the expectations of historians and the imperatives of programming itself. As the editors of History and Computing remarked in a policy statement, "Even more unlikely to find a place will be papers that require the reproduction of pages of program listings..." ("Editorial: 'taking stock,'" vol. 7, no. 2, 1995, p. iii). Yet it is accepted that authors will reproduce page upon page of computer-generated tables. This amounts to an attitude toward the computer that is at once servile and insolent: computer outputs are either above criticism, or they are worthless gibberish. Such an attitude is inimical to the development of a tradition of close criticism. A prose description of an algorithm is a translation, bearing the usual burden of the problems of translation. Models can only advance effectually when someone else builds a new model out of my old subroutines, replacing a few whose assumptions he has found dubious. On the basis of published source code, authors can cite prior art in an orderly way.
They can say that subroutine "foobar" is the subroutine listed in such-and-such a range of lines of such-and-such a figure of such-and-such a publication, and a hundred lines of new code may well represent a significant advance in modeling technique. If program listings are not part of the published record, it becomes practically impossible for a later author to describe his point of departure in a precise way. Such program listings as exist may well become inaccessible. The end result is that anyone wanting to extend or critique a work of historical computing must first replicate it, and this replication is in practice an insuperable obstacle to carrying computerized historical simulations forward. People can find new bodies of texts to tabulate, but no one will ever extend them.

Programming is brutally heavy work. The computer does not tolerate any cognitive dissonance in programs, and programs necessarily have to be written to an extreme degree of precision and exactitude. One of the most infuriating errors I made, and eventually corrected, in the course of writing the program described below involved an identifier which I had inadvertently written with inconsistent capitalization, e.g., something like "Foobar_List_Total" instead of "Foobar_list_Total." This error was especially infuriating because the two variants are so much alike that the difference is not readily perceptible to the human eye; for a long time, I simply could not see what the compiler was objecting to. I am not an amateur-- I am a skilled programmer of many years' standing, and this sort of difficulty is simply the small change of programming. Despite the best efforts of editors, even the output of the most prestigious publishing houses is full of typographical errors of this kind. In prose, they simply do not matter, but in computer code, they are critical. Historical computer programming can only prosper if arrangements are made to systematically conserve and reuse the results of this infuriating labor.

The Population History of England was produced by four authors of record, two programmers (one of them an author of record), nine keypunchers, one supervisor of keypunchers, and innumerable local historians collecting primary source material (Wrigley & Schofield, 1981, pp. iii, xii-xv). One of its appendices contains an ambiguous prose description of the "back projection" technique, written by one of the programmers. From a technical standpoint, the Population History exhibited grave flaws. For example, the mortality table (Wrigley & Schofield, tab. A-14.5, p. 714) was ambiguously documented, and I was obliged to subject it to spreadsheet analysis for internal consistency before I could use it with any confidence. The English language is ambiguous by its nature, and it was inevitable, given the terms of reference, that the Population History should possess a certain degree of "unreplicability." The work represents an evolutionary dead end, in the sense that it could only be carried forward by assembling ever-larger teams, funded by ever-increasing government grants, and there is no real prospect of such grants being forthcoming.

The immediate object in building a population model is to combine Levine's sophistication about the sources of population change and the details of economic livelihood with Wrigley and Schofield's solid quantitative simulation.
To this end, I have built a "population table model" in the methodological tradition of Wrigley and Schofield, which I propose to enrich by adding decision-making algorithms to determine such questions as who marries whom. The population table approach to simulation is a middle-ground approach, situated between the simple approach of treating a population as a single entity and the more exhaustive approach of trying to represent the population in full detail. Ultimately, the most powerful model is a "full population Monte Carlo" model, in which each individual and each family would be separately represented, including such details as the height and weight of each individual and the detailed contents of each family larder. Simulating aggregates involves an inevitable unclarity. People are born, die, and marry as individuals, and as individuals their actions can be directly represented. Similarly, they do not belong to monolithic classes, but have linkages to other people, which may or may not approximate coherent social groups. The classic European definition of nobility rests on "quarterings," that is, the showing of the requisite number of noble ancestors. For England the situation is more complicated, but any reasonable definition of class must take account of genealogy. A simulation of individuals can easily be adapted to do any desired genealogical reckoning; an example of this kind of mechanism can be seen in the anthropologist James Boster's GENIAL program (Boster, 1986). The full population Monte Carlo approach is still computationally formidable, however. It would involve keeping detailed records on tens or hundreds of millions of fictitious individuals, even for a small country like early modern England, and the necessary bulk of data might amount to tens of gigabytes. Eventually, however, this brute-force approach will become practicable, and it will have the curious advantage of offering a more accurate simulation with a simpler program.

Programming philosophy

The model is designed for readability and maintainability rather than for efficiency of computer use. I have attempted to make all modules as small as possible. Access to the major demographic tables is via sets of functions which constitute a de facto object-oriented method. I have defined two recurrent operations: that of adding to (or subtracting from) a table entry, and that of reducing it by a specified fraction. There are also functions to get and set a table entry, and these functions are under no circumstances to be bypassed. The reason for this approach is that the C programming language, despite its many virtues, supports only rudimentary arrays; one must "roll one's own." Sophisticated programmers will notice that in the marriage section I have used what amounts to a kind of bubble sort.

Likewise, I propose, in a section not yet implemented, to initialize the model by simply allowing it to run "out of gear" for a long period of simulated time. Specifically, I use an "Adam and Eve" start. That is, I initialize the population at a very low level, consistent with hunting and gathering rather than agriculture; for England, this might be a more or less Iron Age population of about 100,000 persons. The model, however, still assumes agriculture. Since the population does not adapt to the abundance of land and natural resources by changing its mode of subsistence, it exists in a framework of abundance, and the population "is fruitful and multiplies." I let the model run until it achieves the starting population, and then reset the date to the starting date. This method gives me the freedom to make all kinds of wild and arbitrary assumptions about the initial population distribution, in the confidence that such assumptions will be progressively replaced by generated information before the model starts running "for effect."
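The following is a minimal sketch of the access discipline and the "Adam and Eve" warm-up just described. The table dimensions, the function names, the flattened array layout, and the stand-in yearly step are assumptions made for the sake of illustration; the model's actual tables and yearly cycle are considerably more elaborate.

/* Hypothetical, flattened stand-in for the population table and its
   access functions.  All access is through these functions; the
   underlying array is never touched directly. */

#define MAX_AGE   100
#define SEXES       2
#define STATUSES    3    /* single, married, widowed */
#define SUBPOPS     3    /* gentry, peasants, urban commoners */
#define TABLE_SIZE ((MAX_AGE + 1) * SEXES * STATUSES * SUBPOPS)

static double pop_table[TABLE_SIZE];

static int pop_index(int age, int sex, int status, int subpop)
{
    return ((age * SEXES + sex) * STATUSES + status) * SUBPOPS + subpop;
}

double pop_get(int age, int sex, int status, int subpop)
{
    return pop_table[pop_index(age, sex, status, subpop)];
}

void pop_set(int age, int sex, int status, int subpop, double persons)
{
    pop_table[pop_index(age, sex, status, subpop)] = persons;
}

/* The two recurrent operations: add to (or subtract from) an entry,
   and reduce an entry by a specified fraction. */
void pop_add(int age, int sex, int status, int subpop, double persons)
{
    pop_table[pop_index(age, sex, status, subpop)] += persons;
}

void pop_reduce(int age, int sex, int status, int subpop, double fraction)
{
    pop_table[pop_index(age, sex, status, subpop)] *= (1.0 - fraction);
}

double pop_total(void)
{
    int i;
    double total = 0.0;
    for (i = 0; i < TABLE_SIZE; i++)
        total += pop_table[i];
    return total;
}

/* Stand-in for the real yearly cycle (aging, births, deaths,
   widowhoods, marriages); here it merely grows every entry slightly
   so that the warm-up loop terminates. */
static void step_one_year(void)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        pop_table[i] *= 1.005;
}

/* "Adam and Eve" warm-up: seed a small population and run the model
   "out of gear" until the target total is reached.  The caller then
   resets the simulated date to the starting date. */
void warm_up(double target_population)
{
    pop_set(20, 0, 0, 1, 50000.0);   /* 50,000 single peasant men aged 20   */
    pop_set(20, 1, 0, 1, 50000.0);   /* 50,000 single peasant women aged 20 */
    while (pop_total() < target_population)
        step_one_year();
}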
These methods might not, of course, have been justified at the time when the Cambridge Group did their original work, but the advance of computers, and the drastic fall in their prices over the intervening years, make such a brute-force solution attractive. The model is written in the C programming language, specifically using the inexpensive and readily available MIX Power C compiler. Except for the two or three functions providing the most immediate access to the demographic tables, usage is well within the scope of the ANSI C standard, and the model should therefore be highly portable.

Population

The population section of the model consists of two interrelated tables: a table of population and a table of marriages. The population table is classified by age, sex, marital status, and sub-population. The marriage table is classified by age of wife, age of husband, and sub-population. Each year, the tables are shifted downwards to reflect a year's aging, and then the births, deaths, widowhoods, and marriages are calculated. These are done in a number of "slices," to approximate the reality that all these events take place simultaneously. Each slice involves computing births, deaths, widowhoods, and marriages, in that order.

Computing births is one of the simplest sections of the model. We need only determine the number of women of various ages and statuses, their innate fertility, and their level of sexual participation. In the seventeenth century, with both bona fide promiscuity and birth control of any sort rare, this last item amounts to little more than counting the number of married women. In practice, innate fertility can be treated as more or less determined by age. Wrigley and Schofield's table 7.25 (p. 254) indicates that the rates for different periods differed much less than the rates for different maternal ages. Fertility is of course theoretically dependent on nutritional status, but the evidence seems to be that this dependence did not operate much in the sixteenth and seventeenth centuries. Most probably, the restrictions on marriage superseded the effects of nutrition on fertility: to marry, one had to produce property and tenancy rights amounting to an assured food supply.

Computing deaths involves the use of a mortality table, and is again a fairly simple matter. The table provided by Wrigley & Schofield (tab. A-14.5, p. 714) gives specific mortality figures for different age groups, and overall life expectancies, as functions of a general mortality parameter. I convert overall mortality into life expectancy and interpolate between the appropriate columns of their table. My figure for overall mortality is arrived at by adding a basal mortality, approximating the modern figure; a measure of the nutritional deficit, converted into mortality; and an additional component reflecting the mortality due to crowding. The nutritional deficit is derived by comparing the food supply with the recommendations in a standard medical reference work (Merck). Three components are used: calories, protein, and vitamin C, chosen as the minimum number of proxy components to represent different qualities of diet. It is assumed that access to vitamin A, etc., will vary in approximate proportion to vitamin C, and that protein availability is a workable measure of access to iron, calcium, etc. In case of calorific insufficiency, the body converts protein to energy, and this is reflected in the calculations. The three components are reduced to a single percentage, which is converted into mortality by using a guess of how long it takes to starve to death. The figures for crowding mortality are likewise guesses, pure and simple.
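The arithmetic just described might be sketched roughly as follows. The constants here are placeholders of my own, not the figures actually used, and the interpolation against Wrigley and Schofield's table A-14.5 is reduced to a comment.

/* Rough sketch of the overall mortality calculation.  All constants
   are placeholders; the working model takes its figures from tables. */

#define BASAL_MORTALITY  0.010   /* deaths per person-year, approximating  */
                                 /* a modern population                     */
#define YEARS_TO_STARVE  0.25    /* guess: a wholly unfed population would  */
                                 /* die off in about three months           */

/* nutrition_fulfilled: 1.0 means the diet fully meets the Merck
   allowances for calories, protein, and vitamin C; 0.0 means no food
   at all.  crowding: an index of urban crowding, 0.0 and upward. */
double overall_mortality(double nutrition_fulfilled, double crowding)
{
    double deficit = 1.0 - nutrition_fulfilled;
    double starvation_mortality = deficit / YEARS_TO_STARVE;
    double crowding_mortality = 0.005 * crowding;   /* a pure guess */

    return BASAL_MORTALITY + starvation_mortality + crowding_mortality;
}

/* The result is then converted to a life expectancy and used to
   interpolate between the columns of Wrigley and Schofield's table
   A-14.5 to obtain age-specific rates (not shown here). */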
Widowhood is somewhat more complicated. For each group of married persons who die, we use the marriages table to get a distribution of the ages of the spouses of similar persons, and allocate widowhoods in proportion to the number of spouses of the various ages.

Next, marriages are performed, up to a specified quota. There is a marital desirability function: each individual has a marital desirability, based on his or her age, sex, marital status (single or widowed), and, of course, the sub-population to which he or she belongs. The potential parties are married off within each sub-population, starting with the most desirable, until the available quota of marriages is exhausted. In this initial version of the model, I have used only three sub-populations (viz., gentry, peasants, and urban commoners), but the model could easily be modified to support substantially more.

Economy

I propose that the economic model will be a simple input-output table (see the appendix), with appropriate goal-seeking algorithms. These algorithms will make decisions about the allocation of resources, based on average conditions and requirements. Once this is done, and the various inputs allocated, I will repeat the calculations, this time with random variables, to allow for the uncertainties of weather, etc. The input-output model will require a considerable number of coefficients. I have undertaken a preliminary search for these in the secondary literature, and while I do not as yet have anything like a complete set, the results have been promising enough that I have no real concerns about ultimately locating all the necessary figures. Coefficient collection has been running comfortably in advance of model-building.

One major prerequisite for simulating the agricultural economy is a set of data for crop yields in terms of the various inputs, viz., seed, land, and labor. Campbell (1983) provides crop yields per seed and per acre for Norfolk at the end of the thirteenth century and in the early fifteenth century. Turner (1986) offers a collection of crop yield per acre data, circa 1800, covering a range of enclosure conditions and regions for three crops-- wheat, oats, and barley. Clark (1991), following the methods of Mark Overton, gives a table of labor costs for harvesting, as well as grain yields per acre reconstructed from probate inventories, and does so for a quite long period, ca. 1500-1900. Taken as a whole, reasonable "guesstimates" are available for seed yield, acre yield, and labor yield, at least for the major grain crops. It should be possible to interpolate these together into yield functions. Thus, we can obtain labor supply from the population table, and seed and land available from the economic side of the simulation, and use them to obtain a quantity of grain produced.
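A yield function of the kind contemplated might, for instance, treat seed, land, and labor as limiting factors. The coefficients below are placeholders, not figures taken from Campbell, Turner, or Clark, and the functional form is offered only as a sketch.

/* Sketch of a grain yield function with seed, land, and labor as
   limiting factors.  The coefficients are placeholders, to be
   replaced by figures interpolated from the sources discussed above. */

#define YIELD_PER_SEED     4.0    /* bushels harvested per bushel sown  */
#define YIELD_PER_ACRE    10.0    /* bushels per acre sown              */
#define YIELD_PER_WORKER 200.0    /* bushels per full-time farm laborer */

static double min3(double a, double b, double c)
{
    double m = a;
    if (b < m) m = b;
    if (c < m) m = c;
    return m;
}

/* Returns bushels of grain produced, limited by whichever input runs
   out first. */
double grain_produced(double seed_bushels, double acres, double laborers)
{
    return min3(seed_bushels * YIELD_PER_SEED,
                acres * YIELD_PER_ACRE,
                laborers * YIELD_PER_WORKER);
}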
Coda

Historical programming does not attract the best programmers, but for all that, it is one of the most difficult and complex of programming fields. Each word in a historical prose discussion can easily translate into fifty lines of code. I set out to model the English population, and after about twice as much work as I had foreseen (and budgeted for), I found that I had merely constructed a set of sub-foundations for the work of constructing a model.

Bibliography

James Boster, "GENIAL," 9/8/86 (Dept. of Anthropology, University of Pittsburgh, Pittsburgh, PA 15260), bundled with A.D.A. PROLOG, version 1.91P, PC-SIG Disk No. 417 v4 (PC-SIG, 1030D E. Duane Avenue, Sunnyvale, CA 94086, 1987).

Bruce M. S. Campbell, "Arable Productivity in Medieval England," The Journal of Economic History, vol. XLIII, no. 2 (June 1983), pp. 379-404.

Gregory Clark, "Yields per Acre in English Agriculture, 1250-1860: Evidence from Labour Inputs," Economic History Review, vol. XLIV, no. 3 (1991), pp. 445-460.

Michael Turner, "English Open Fields and Enclosures: Retardation or Productivity Improvements?" Journal of Economic History, vol. 46, no. 3 (September 1986), pp. 669-692.

David Levine, Reproducing Families: The Political Economy of English Population History, Cambridge: Cambridge University Press, 1987.

The Merck Manual of Diagnosis and Therapy, 14th edition, ed. Robert Berkow, M.D., et al., Rahway, N.J.: Merck Sharp & Dohme Research Laboratories (Merck & Co., Inc.), 1982. Especially table 76-2, "Recommended Daily Dietary Allowances," pp. 876-77, and fig. 190-2, "Estimated Caloric Expenditure under Basal Conditions," p. 1838.

E. A. Wrigley and R. S. Schofield, with contributions by Ronald Lee and Jim Oeppen, The Population History of England, 1541-1871, Cambridge, Mass.: Harvard University Press, 1981.

Appendix-- First Construct for Economic Model

I use the following notational conventions:

a + b -> c + d is borrowed from chemical reaction notation. It means that a and b are used to make c and d.

a = { b | c | d } indicates that a is divided up among b, c, and d.

1. First, the indigenous factors of production are portioned out:

Land = { Arable_Land | Pasture_Land | Waste_Land }
Arable_Land = { Grain_Arable_Land | Potato_Arable_Land }
Pasture_Land = { Cattle_Pasture_Land | Sheep_Pasture_Land | Extracted_Surplus_Pasture_Land }
Labor = { Farm_Labor | Fishing_Labor | Craft_Labor }
Farm_Labor = { Potato_Farm_Labor | Grain_Farm_Labor }

2. Then the agricultural production is computed:

Potato_Farm_Labor + Potato_Arable_Land -> Potato_Food
Grain_Farm_Labor + Grain_Arable_Land -> Grain_Food_Produced
Grain_Food_Produced + Grain_Food_Imported -> Grain_Food
Grain_Food = { Peasant_Grain_Food | Sheep_Grain_Food | Cattle_Grain_Food | Extracted_Surplus_Grain_Food }
Sheep_Pasture_Land + Sheep_Grain_Food -> Mutton_Food + Wool_Material
Cattle_Pasture_Land + Cattle_Grain_Food -> Beef_Food + Hides_Material
Fishing_Labor + Shipping_Capital -> Fish_Food
Mutton_Food + Beef_Food + Fish_Food -> Meat_Food
Meat_Food = { Peasant_Meat_Food | Extracted_Surplus_Meat_Food }
Peasant_Meat_Food + Peasant_Grain_Food + Potato_Food -> Peasant_Food

3. Then the non-agricultural production is computed:
Imported_Materials + Wool_Material + Hides_Material + Craft_Labor -> Manufactures
Manufactures = { Capital_Goods | Peasant_Necessaries | Extracted_Surplus_Goods | Export_Goods }
Export_Goods -> Foreign_Credits
Foreign_Credits = { Purchase_Grain_Imports | Purchase_Material_Imports | Purchase_Peasant_Necessaries | Purchase_Luxuries }
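By way of illustration, one line of the construct above, the manufactures "reaction" and its division into shares, might eventually be rendered along the following lines. The limiting-factor form and the share coefficients are placeholders only, not decisions already taken in the model.

/* Sketch of one step of the economic construct: the manufactures
   "reaction" and its division into shares. */

#define CRAFT_OUTPUT_PER_WORKER 1.0   /* units of manufactures per craft laborer */

struct manufactures_split {
    double capital_goods;
    double peasant_necessaries;
    double extracted_surplus_goods;
    double export_goods;
};

/* Imported_Materials + Wool_Material + Hides_Material + Craft_Labor
   -> Manufactures, limited by whichever is scarcer, materials or labor. */
double make_manufactures(double imported_materials, double wool_material,
                         double hides_material, double craft_labor)
{
    double materials = imported_materials + wool_material + hides_material;
    double by_labor = craft_labor * CRAFT_OUTPUT_PER_WORKER;
    return (materials < by_labor) ? materials : by_labor;
}

/* Manufactures = { Capital_Goods | Peasant_Necessaries |
   Extracted_Surplus_Goods | Export_Goods }, divided here in fixed
   shares that sum to 1.0. */
struct manufactures_split split_manufactures(double manufactures)
{
    struct manufactures_split s;
    s.capital_goods           = 0.20 * manufactures;
    s.peasant_necessaries     = 0.40 * manufactures;
    s.extracted_surplus_goods = 0.25 * manufactures;
    s.export_goods            = 0.15 * manufactures;
    return s;
}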