(My Responses)
(03/11/2007 11:36 AM)
RE:
http://hnn.us/blogs/comments/36392.html#comment
New Technology Relevant to Online Archives.
There is a device called a "Field Camera," an electronic
camera which
works on the same principle as insect vision, and which
records
something approximating a hologram. That means that you no
longer have
to worry about focus when you are taking pictures-- you can fix
the
focus after the fact, in an editing program, the same as exposure.
This
is important when one is taking a picture of an object, such as a
bound
book, which cannot easily be made to lie flat. A field camera
offers
good prospects of rapidly producing images of printed books good
enough
for optical
character recognition (OCR) to work reliably. Field Cameras, by
virtue
of their design,
can be made very small and cheap, and yet have optical
performance
comparable to the finest conventional cameras. A ten-dollar field
camera, only slightly bigger than a postage stamp, might have
performance comparable to a specialized conventional camera
weighing a
thousand pounds and costing fifty thousand dollars. Think of the
sorts
of cameras which Google is building and using for the Google Print
project, and imagine every little kid having one. With such a
camera,
it would be no great difficulty to copy a book in several minutes.
At present, Field Cameras are still at the level of research
projects,
but they could be in mass production in a couple of years.
The Field
Camera is attractive to chipmakers because it allows them to
increase
their share of the value-added in a camera, at the expense
of the
lensmakers.
Field cameras will very probably be useful for speeding up the
actual
photography of documents. To take advantage of this, you might
have to
change your way of thinking about how you republish
documents. One
print habit that you need to get away from is the idea that the
first
published version has to be a perfect version. In electronic
media, it
is permissible for the first version to be massively flawed,
so long
as it is labeled accordingly, and so long as you have a regular
system
for cataloging improvements. There should be a regular
system,
whereby readers can report probable flaws. The twentieth or
thirtieth
version might eventually reach the standard of traditional
published
"papers" projects. There is an old engineer's saying that
"the perfect
is the enemy of the good," and I think this applies to
putting
archives online.
The field camera has applications not only to archives, but also
to
libraries. Google Print, like University Microfilms before
it, tends
to require
special lending privileges from the libraries it
collaborates with.
They need to have a few thousand volumes checked out at once,
including
things like serials which normally do not circulate, all of them
away
on visits to
their special cameras. That
means Google Print has to have official sanction, and lives
in the
slow world of lawyers, lobbyists, and whatnot. Imagine a totally
acephalous network of independent copyists, like the scriptorium
monks
of the Middle Ages, deciding to do the same
job, only to do it right, and thumbing their collective noses at
all
laws.
There is a category of books which are so rare that the big
national
libraries have the positive advantage of owning the only
surviving
copy. These are the kinds of books which the national
library will
have put on its website in any case. However, there is a
vast middle
ground of books which are more common. Such a title will be owned
by
hundreds of private individuals and by small obscure
libraries with
no pretension to serious scholarship (e.g., secondary school
libraries),
which have generally inherited
the belongings of such private individuals. These
individuals and
libraries are conventionally out of the game because they don't
have
enough distinctive material to be worth traveling to,
securing an
introduction, etc. For this category, the great strength of the
big
libraries is that they have large collections of related
materials.
Ideas can be pursued along the shelf. However, if the
resources of the
very small libraries could somehow be pooled, the result
would be a
world-class library. Very well, I think such a pooling will happen
on
the internet. Once books are reduced to files, their bulk is
trivial
compared to that of music and movies.
The tendency in internet file trading seems to be towards using
"offshore
banking islands," islands in the Caribbean or the South
Pacific which
make their livings by systematically flouting the tax and
regulatory
laws of larger countries. Such micronations are very likely
going to
be willing to support and even subsidize file trading as a
source of
"noise" to cloak more lucrative activities such as banking,
offshore
"mailbox" corporations, and internet gambling. The local
copyright law
will be for a moderate term, say ten years. The National
Library, run
out of the local high school (*), will file away and redistribute
copies of
any book or record which anyone sends them, on a
no-questions-asked
basis, and it will use all the newest privacy
technologies to thwart
law enforcement in the large countries. The only way to do
anything
about this kind of activity is to send in the Marines, and somehow
I
don't think very many Americans would be enthused about
American
troops dying to protect the profits of Disney.
(*) The most advanced educational institution a small population
can
support.
The result of all this is that the established great
libraries will
lose much of their pre-eminence.
==========================
[In response to Maarja Krusten's question about detecting
forgeries:]
(03/11/2007 02:13 PM)
I don't know specifically about Adobe's product, but
people have
been talking around the issue for a good twenty years.
Stewart Brand
wrote a seminal article "Digital Retouching: the End of
Photography as
Evidence of Anything," on the subject in Whole Earth
Review, back in
1985. There are things you can do, like looking for
discontinuities of
grain, but the forger can fake those too. Basically, the
situation is
much like literary forgery. If someone is willing to do
enough work,
you can only catch him through his own mistaken assumptions, as
the
state of knowledge changes over the years.
Particularly, if the
forger has managed to get into the archives, like John Payne
Collier,
the situation can be-- messy. There's a good section on the
Victorian
forgers-- people like Collier and T. J. Wise-- in Richard D.
Altick's
_The Scholar Adventurers_ (1960). The detective story writer
Michael
Innes has an interesting disquisition on the subject
in his novel
_The Long Farewell_. All the same commentaries apply to
photography.
You do need to worry about what an art historian would call
provenance, and a police detective would call "chain of
custody."
In a typical legal case, it might come down to saying that the
street
punk convicted of Grievous Bodily Harm on the strength of a
picture
taken by a surveillance camera is simply not worth the
kind of work
necessary to create a forgery-- to anyone.
http://www.seanet.com/~rod/digiphot.html
http://www.seanet.com/~rod/notes_1.html
(see note 42)
========================================
(03/11/2007 09:04 PM)
You seem to be assuming that a digitized source is something you
find
on a website, and that materials have or have
not been digitized
without any action on your part. However, I think
we're getting
towards the point where, unless you are talking about
an enormously
rich collection such as the National Archives, your
travel money buys
more in the form of paying a work-study student to scan
things for you
than it does in airline tickets to physically get you there,
not to
mention hotel bills. Besides, it keeps the money in the
family. I
gather that a lot of small archives have effectively
switched over
from encouraging people to visit to setting up a regular and
profitable scan-for-hire arrangement. That way, they build
up their
digital collection at the same time, at no cost to
themselves. My
guess is that, with a good enough camera, the break-even point
might be
as much as a couple of thousand pages, depending on where
you had to
go. What it comes down to is that a camera takes a picture in a
hundredth to a thousandth of a second. If you have the right kind
of
set-up, it can take pictures as fast as you can turn
the pages. If
you were even going to look at a page, there is no real additional
cost
in making a copy, and if you are making a copy anyway,
there is no
need to read the page first. The archive can almost
certainly rig up a
better and more efficient camera set-up with apparatus in situ
than you
can do with something you have to carry around in your
pocket.
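
The break-even guess above can be put as simple arithmetic. This is a rough sketch only: the dollar figures are hypothetical placeholders chosen for illustration, not data from any archive, and the function name is made up for the example.

```python
def break_even_pages(travel_cost, scan_cost_per_page):
    """Number of pages at which paying for scans costs as much as the trip."""
    return travel_cost / scan_cost_per_page

# Hypothetical figures, for illustration only (not from any real archive):
travel_cost = 1000.00        # airfare plus hotel, in dollars (assumed)
scan_cost_per_page = 0.50    # work-study labor per page, in dollars (assumed)

pages = break_even_pages(travel_cost, scan_cost_per_page)
print(f"Below about {pages:.0f} pages, paying for scans is cheaper than the trip")
```

A more distant archive raises the travel cost, and a faster camera set-up lowers the cost per page; either one pushes the break-even point higher.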
Of course, there are these hyper-political places where they
really
want to use the documents as a lever to supervise your research
and
writing, but that is something different. My field is
History of
Technology, and the people involved (engineers) take
digitization much
more for granted. The circumstances under which they fail to
digitize
archival papers are approximately the circumstances under
which they
would deliberately withhold access anyway. I pulled down a
collection
of about 15 megabytes of transcripts of oral history interviews,
recorded over twenty years (~7500 pages), from one
repository, and
wrote about two chapters out of it. This is a somewhat
different
proposition than Googling for stuff.
A related point I should add: digitization does not necessarily
mean
transcribing things, and making them machine searchable.
That is much
more expensive, naturally. This collection I worked from
_had_ been
converted to byte characters. I'm inclined to think that some of
the
interviewees, at least the industrial executives among them,
probably
provided the money to pay for the transcribing.
However, a lot of the
other sets of data I encountered were collections of photographs
(that
is, bitmaps) of pages.
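
As a rough check on the figures quoted above (about 15 megabytes for roughly 7500 pages), the size per page can be worked out directly. This is a sketch only: it assumes "megabyte" means one million bytes, and the function name is made up for the example.

```python
def bytes_per_page(total_bytes, page_count):
    """Average number of bytes used to store one page."""
    return total_bytes / page_count

# Figures from the text: ~15 megabytes, ~7500 pages.
# Assumption: 1 megabyte = 1,000,000 bytes.
total_bytes = 15 * 1_000_000
page_count = 7500

print(f"About {bytes_per_page(total_bytes, page_count):.0f} bytes per page")
```

That comes to a couple of thousand characters per page, which is what one would expect of typed text stored as characters rather than as photographs of pages.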
===================================================
(03/13/2007 03:22 PM)
[in response to Megan McShea]
I think you are taking an unduly despondent view of the
situation
about preservation of old formats. In the technical
community, some of
us have been looking at related issues, especially in connection
with
the politics of copy protection. What it comes down
to is that you
don't necessarily need the right kind of projector, or
whatever. You
can use off-the-shelf cameras in clever ways, and use
computers to
synthesize their output into usable form.
Just to take an example, certain experimental evidence
communicated to
me by the electrical engineer Ed Nisley (Dr. Dobb's Journal
columnist,
old IBM'er) indicates that it may be possible to
use an electronic
camera or flatbed scanner to read old phonograph
records. The work is
in a very preliminary state, and neither of us had time to
pursue it,
so we passed it on to a certain Computer Science
department which has
developed an interest in applied optics, where they
have energetic
undergraduates looking for research projects. It might prove
useful for
records which are in an extremely poor state of preservation,
and very
brittle. Once you have your apparatus in place, set up and
automated to
the extent which is customary in hard science experiments, it
would
probably be possible to process records about as fast as
you could
fetch them from the stacks, take them out of
their packaging, etc.
=======================================
(03/14/2007 11:48 AM)
Well, to my way of thinking, describing things _is_
the deluxe
treatment, because it implies a) human labor and
b) skilled human
labor at that. I don't know about you, but I cannot
read, or even
skim, at ten to twenty pages a minute. I used to be able to
photocopy
at that rate, even with an old-fashioned photocopier, and
from
periodicals bound up to telephone-book size (trade magazines in an
engineering school library). I went to the trouble of figuring out
how
to set the copier up to copy two pages in a single exposure,
and went
into a rapid routine, like working on an assembly
line. In fact, given
that the copier was programmed to return to standard settings after
twenty seconds or so, I had no alternative but to set
a rapid pace,
bing, bing, bing.
That, unfortunately, was before a smashed wrist ten years
ago,
surgery, and persistent arthritis thereafter (sigh!).
However, I now
have my little electronic camera, and tripod, which more or
less puts
me back to status quo ante.
At any rate, I found that my reading notes tended to
lag behind my
photocopying. The supply of materials in situ is of
course
practically boundless. In each engineering field, there is an
American
flagship learned society professional journal; a research journal
from
the same organization; and a proprietary magazine (i.e., people
wearing
their businessman hats instead of their learned professional
hats).
Likewise, there are British equivalents (*), and whenever there is
a
significant technological change, new societies, magazines, and
journals spring up around the argument about whether
to adopt this
change. The same goes for jurisdictional disputes. There are
something
on the order of two million magazine pages potentially relevant
to any
given research problem, which are readily available in
almost any
decent university library. I had to go to archival sources
to get at
the early formative period, when the entire profession
could be
gathered in a single room, before they were so numerous that
they had
to publish magazines to keep in touch. Having
done so, I had to read
something like half of the specialist archive's flagship
collection,
the collection of oral histories, in order to have enough
material to
work with. The finding guide was not of
very much use, because it
was focused around different concepts than the ones
I was interested
in. The guide was interested in the ways in which
people were famous,
whereas I was interested in the ways in which they were typical,
or
representative. For that matter, the interviewers were mostly
interested in famousness as well, and the things I was
able to use
were the things which slipped in despite their efforts
to keep the
interviews "on track." It was much the same kind of
writing/research
problem as Le Roy Ladurie's _Montaillou_. I
suspect that in a
traditional reading-room archive, I might have made myself
seriously
unpopular by requesting unreasonable numbers of documents. As it
was,
the collection was on a web-server. Effectively, the archive
had made
the whole collection into a book and republished it.
(*) Also French, German, and Russian, of course, but there is the
language barrier.
When you are talking about audio-visual materials, which are
designed
to be fed into a machine, the case is even clearer, of course.
With a
bit of judicious tweaking, you can arrange so that the machine
operates
unattended for long periods of time, or that a single
operator
superintends a whole bank of such machines.
================================================================
Ah, the citation of that book should be:
Emmanuel Le Roy Ladurie,
trans. Barbara Bray, _Montaillou: The Promised
Land of Error_,
Vintage Books, New York, 1979
What he does is to take the official transcript of a Holy
Inquisition
proceeding, and "decode" it into a description of daily life in
the
Middle Ages.