My Comments on:

Sharon Howard, "Digital history and the archives: loss or gain?",



  http://hnn.us/blogs/comments/36392.html#comment (now) https://historynewsnetwork.org/blog/36392



HNN Cliopatria [pseudonym], Mar 11, 2007

Andrew D. Todd

 a_d_todd@rowboats-sd-ca.com 

http://rowboats-sd-ca.com/




(My Responses)
(03/11/2007 11:36 AM)
RE: http://hnn.us/blogs/comments/36392.html#comment

New Technology Relevant to Online Archives.

There is a device called a "Field Camera," an electronic camera which works on the same principle as insect vision, and which records something approximating a hologram. That means that you no longer have to worry about focus when you are taking pictures-- you can fix the focus after the fact, in an editing program, the same as exposure. This is important when one is taking a picture of an object, such as a bound book, which cannot easily be made to lie flat. A field camera offers good prospects of rapidly producing images of printed books good enough for optical character recognition (OCR) to work reliably. Field Cameras, by virtue of their design, can be made very small and cheap, and yet have optical performance comparable to the finest conventional cameras. A ten-dollar field camera, only slightly bigger than a postage stamp, might have performance comparable to a specialized conventional camera weighing a thousand pounds and costing fifty thousand dollars. Think of the sorts of cameras which Google is building and using for the Google Print project, and imagine every little kid having one. With such a camera, it would be no great difficulty to copy a book in several minutes. At present, Field Cameras are still at the level of research projects, but they could be in mass production in a couple of years. The Field Camera is attractive to chipmakers because it allows them to increase their share of the value-added in a camera, at the expense of the lensmakers.

Field cameras will very probably be useful for speeding up the actual photography of documents. To take advantage of this, you might have to change your way of thinking about how you republish documents. One print habit that you need to get away from is the idea that the first published version has to be a perfect version. In electronic media, it is permissible for the first version to be massively flawed, so long as it is labeled accordingly, and so long as you have a regular system for cataloging improvements. There should also be a regular system whereby readers can report probable flaws. The twentieth or thirtieth version might eventually reach the standard of traditional published "papers" projects. There is an old engineer's saying that "the perfect is the enemy of the good," and I think this applies to putting archives online.

The field camera has applications not only to archives, but also to libraries. Google Print, like University Microfilms before it, tends to require special lending privileges from the libraries it collaborates with. They need to have a few thousand volumes checked out at once, including things like serials which normally do not circulate, all of them away on visits to their special cameras. That means Google Print has to have official sanction, and lives in the slow world of lawyers, lobbyists, and whatnot. Imagine a totally acephalous network of independent copyists, like the scriptorium monks of the Middle Ages, deciding to do the same job, only to do it right, and thumbing their collective noses at all laws.

There is a category of books which are so rare that the big national libraries have the positive advantage of owning the only surviving copy. These are the kinds of books which the national library will have put on its website in any case. However, there is a vast middle ground of books which are more common. Such a title will be owned by hundreds of private individuals and by small obscure libraries with no pretension to serious scholarship (e.g., secondary school libraries), which have generally inherited the belongings of such private individuals. These individuals and libraries are conventionally out of the game because they don't have enough distinctive material to be worth traveling to, securing an introduction, etc. For this category, the great strength of the big libraries is that they have large collections of related materials. Ideas can be pursued along the shelf. However, if the resources of the very small libraries could somehow be pooled, the result would be a world-class library. Very well, I think such a pooling will happen on the internet. Once books are reduced to files, their bulk is trivial compared to that of music and movies.

The tendency in internet file trading seems to be towards using "offshore banking islands," islands in the Caribbean or the South Pacific which make their livings by systematically flouting the tax and regulatory laws of larger countries. Such micronations are very likely going to be willing to support and even subsidize file trading as a source of "noise" to cloak more lucrative activities such as banking, offshore "mailbox" corporations, and internet gambling. The local copyright law will be for a moderate term, say ten years. The National Library, run out of the local high school (*), will file away and redistribute copies of any book or record which anyone sends them, on a no-questions-asked basis, and it will use all the newest privacy technologies to thwart law enforcement in the large countries. The only way to do anything about this kind of activity is to send in the Marines, and somehow I don't think very many Americans would be enthused about American troops dying to protect the profits of Disney.
(*) The most advanced educational institution a small population can support. 

The result of all this is that  the established great libraries will lose much of their pre-eminence.

==========================
[In response to Maarja Krusten's question about detecting forgeries:]
(03/11/2007 02:13 PM)

I don't know specifically about Adobe's product, but people have been talking around the issue for a good twenty years. Stewart Brand wrote a seminal article, "Digital Retouching: the End of Photography as Evidence of Anything," on the subject in Whole Earth Review, back in 1985. There are things you can do, like looking for discontinuities of grain, but the forger can fake those too. Basically, the situation is much like literary forgery. If someone is willing to do enough work, you can only catch him through his own mistaken assumptions, as the state of knowledge changes over the years. Particularly, if the forger has managed to get into the archives, like John Payne Collier, the situation can be-- messy. There's a good section on the Victorian forgers-- people like Collier and T. J. Wise-- in Richard D. Altick's _The Scholar Adventurers_ (1960). The detective story writer Michael Innes has an interesting disquisition on the subject in his novel _The Long Farewell_. All the same commentaries apply to photography. You do need to worry about what an art historian would call provenance, and a police detective would call "chain of custody."

In a typical legal case, it might come down to saying that the street punk convicted of Grievous Bodily Harm on the strength of a picture taken by a surveillance camera is simply not worth the kind of work necessary to create a forgery-- to anyone.

http://www.seanet.com/~rod/digiphot.html
http://www.seanet.com/~rod/notes_1.html
(see note 42)

========================================
(03/11/2007 09:04 PM)

You seem to be assuming that a digitized source is something you find on a website, and that materials have or have not been digitized without any action on your part. However, I think we're getting towards the point where, unless you are talking about an enormously rich collection such as the National Archives, your travel money buys more in the form of paying a work-study student to scan things for you than it does in airline tickets to physically get you there, not to mention hotel bills. Besides, it keeps the money in the family. I gather that a lot of small archives have effectively switched over from encouraging people to visit to setting up a regular and profitable scan-for-hire arrangement. That way, they build up their digital collection at the same time, at no cost to themselves. My guess is that, with a good enough camera, the break-even point might be as much as a couple of thousand pages, depending on where you had to go. What it comes down to is that a camera takes a picture in a hundredth to a thousandth of a second. If you have the right kind of set-up, it can take pictures as fast as you can turn the pages. If you were even going to look at a page, there is no real additional cost in making a copy, and if you are making a copy anyway, there is no need to read the page first. The archive can almost certainly rig up a better and more efficient camera set-up with apparatus in situ than you can do with something you have to carry around in your pocket.
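
The break-even guess above is simple arithmetic, and it can be sketched as follows. This is only an illustration: the function name, the $1,000 trip cost, and the $0.50 per-page fee are hypothetical figures chosen for the example, not prices quoted by any archive.

```python
import math


def break_even_pages(trip_cost: float, fee_per_page: float, setup_fee: float = 0.0) -> int:
    """Largest whole number of pages for which paying the archive to scan
    costs no more than travelling there yourself.

    All three inputs are assumptions supplied by the reader:
    trip_cost    -- airfare plus hotel bills for a visit
    fee_per_page -- what the archive charges per scanned page
    setup_fee    -- any flat charge the archive adds per order
    """
    if fee_per_page <= 0:
        raise ValueError("fee_per_page must be positive")
    budget = trip_cost - setup_fee
    if budget <= 0:
        return 0  # the flat charge alone already costs as much as the trip
    return math.floor(budget / fee_per_page)


if __name__ == "__main__":
    # Hypothetical figures: a $1,000 trip against a $0.50 per-page scanning fee.
    print(break_even_pages(1000.0, 0.50))  # prints 2000
```

Below that page count, hiring the scanning is the cheaper option; above it, the trip starts to pay for itself, at least on cost alone.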

Of course, there are these hyper-political places where they really want to use the documents as a lever to supervise your research and writing, but that is something different. My field is History of Technology, and the people involved (engineers) take digitization much more for granted. The circumstances under which they fail to digitize archival papers are approximately the circumstances under which they would deliberately withhold access anyway. I pulled down a collection of about 15 megabytes of transcripts of oral history interviews, recorded over twenty years (~7500 pages), from one repository, and wrote about two chapters out of it. This is a somewhat different proposition than Googling for stuff.

A related point I should add: digitization does not necessarily mean transcribing things, and making them machine searchable. That is much more expensive, naturally. This collection I worked from _had_ been converted to byte characters. I'm inclined to think that some of the interviewees, at least the industrial executives among them, probably provided the money to pay for the transcribing. However, a lot of the other sets of data I encountered were collections of photographs (that is, bitmaps) of pages.
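
The difference between transcribed text and page bitmaps shows up in the file sizes. The sketch below works out the bytes per page of the oral history collection mentioned above (about 15 megabytes, about 7500 pages) and compares it with an uncompressed page image. The page size (8.5 by 11 inches), the 300 dpi resolution, and the one-bit-per-pixel depth are assumptions made for illustration, not properties of any collection described here.

```python
def bytes_per_page(total_bytes: int, pages: int) -> float:
    """Average storage used per page of a collection."""
    return total_bytes / pages


def bilevel_bitmap_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Size of an uncompressed black-and-white (1 bit per pixel) page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8  # eight pixels are packed into each byte


if __name__ == "__main__":
    # Figures from the text, taking one megabyte as 1,000,000 bytes.
    text_page = bytes_per_page(15_000_000, 7500)
    # Assumed scan settings: letter-size page, 300 dpi, 1 bit per pixel.
    image_page = bilevel_bitmap_bytes(8.5, 11, 300)
    print(text_page)   # 2000.0 bytes, roughly one typed page of characters
    print(image_page)  # 1051875 bytes before any compression
```

At about two thousand bytes per page, the collection can only be character data; under the assumed settings, a photographed page is several hundred times larger before compression.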
===================================================
(03/13/2007 03:22 PM)
[in response to Megan McShea]

I think you are taking an unduly despondent view of the situation about preservation of old formats. In the technical community, some of us have been looking at related issues, especially in connection with the politics of copy protection. What it comes down to is that you don't necessarily need the right kind of projector, or whatever. You can use off-the-shelf cameras in clever ways, and use computers to synthesize their output into usable form.

Just to take an example, certain experimental evidence communicated to me by the electrical engineer Ed Nisley (Dr. Dobb's Journal columnist, old IBM'er) indicates that it may be possible to use an electronic camera or flatbed scanner to read old phonograph records. The work is in a very preliminary state, and neither of us had time to pursue it, so we passed it on to a certain Computer Science department which has developed an interest in applied optics, where they have energetic undergraduates looking for research projects. It might prove useful for records which are in an extremely poor state of preservation, and very brittle. Once you have your apparatus in place, set up and automated to the extent which is customary in hard science experiments, it would probably be possible to process records about as fast as you could fetch them from the stacks, take them out of their packaging, etc.

=======================================
(03/14/2007 11:48 AM)


Well, to my way of thinking, describing things _is_ the deluxe treatment, because it implies a) human labor and b) skilled human labor at that. I don't know about you, but I cannot read, or even skim, at ten to twenty pages a minute. I used to be able to photocopy at that rate, even with an old-fashioned photocopier, and from periodicals bound up to telephone-book size (trade magazines in an engineering school library). I went to the trouble of figuring out how to set the copier up to copy two pages in a single exposure, and went into a rapid routine, like working on an assembly line. In fact, given that the copier was programmed to return to standard settings after twenty seconds or so, I had no alternative but to set a rapid pace, bing, bing, bing.

That, unfortunately, was before a smashed wrist ten years ago, surgery, and persistent arthritis thereafter (sigh!). However, I now have my little electronic camera, and tripod, which more or less puts me back to status quo ante.

At any rate, I found that my reading notes tended to lag behind my photocopying. The supply of materials in situ is of course practically boundless. In each engineering field, there is an American flagship learned society professional journal; a research journal from the same organization; and a proprietary magazine (i.e., people wearing their businessman hats instead of their learned professional hats). Likewise, there are British equivalents (*), and whenever there is a significant technological change, new societies, magazines, and journals spring up around the argument about whether to adopt this change. The same goes for jurisdictional disputes. There are something on the order of two million magazine pages potentially relevant to any given research problem, which are readily available in almost any decent university library. I had to go to archival sources to get at the early formative period, when the entire profession could be gathered in a single room, before they were so numerous that they had to publish magazines to keep in touch. Having done so, I had to read something like half of the specialist archive's flagship collection, the collection of oral histories, in order to have enough material to work with. The finding guide was not of very much use, because it was focused around different concepts than the ones I was interested in. The guide was interested in the ways in which people were famous, whereas I was interested in the ways in which they were typical, or representative. For that matter, the interviewers were mostly interested in famousness as well, and the things I was able to use were the things which slipped in despite their efforts to keep the interviews "on track." It was much the same kind of writing/research problem as Le Roy Ladurie's _Montaillou_. I suspect that in a traditional reading-room archive, I might have made myself seriously unpopular by requesting unreasonable numbers of documents.
As it was, the collection was on a web-server. Effectively, the archive had made the whole collection into a book and republished it.

(*) Also French, German, and Russian, of course, but there is the language barrier.

When you are talking about audio-visual materials, which are designed to be fed into a machine, the case is even clearer, of course. With a bit of judicious tweaking, you can arrange so that the machine operates unattended for long periods of time, or that a single operator superintends a whole bank of such machines.

================================================================
Ah, the citation of that book should be: Emmanuel Le Roy Ladurie, trans. Barbara Bray, _Montaillou: The Promised Land of Error_, Vintage Books, New York, 1979.

What he does is to take the official transcript of a Holy Inquisition proceeding, and "decode" it into a description of daily life in the Middle Ages.






