Ryan Taylor When web sites disappear
Disappearing web sites


By Ryan Taylor

One problem faced by information collectors of all kinds is how to be sure material published only on the World Wide Web will still be available ten or twenty years from now.

If you collect books, it's easy. They can go on a shelf (or in a pile under a chair, if that's all the room you have) and will stay there until needed. On-line documents may disappear tomorrow. I was shocked to learn that websites have an average life of only six weeks.

The obvious solution for a lot of people is simply to make paper copies of anything interesting they find on the web. Then they have a file to put it in, and it's safe. It's also expensive to be constantly printing, and your printer may be like mine, way too slow.

It's also impossible to manage. Many web-published documents are simply too large to print. For example, Marj Kohli of the University of Waterloo has a website full of indexes which she has compiled (available at http://ist.uwaterloo.ca/~marj/). Printing those out would be too costly, and anyway, Marj's amazing capability to produce new ones means that the indexes are changing all the time. No one can keep returning to ever-morphing websites to reprint them every week or two.

What might be a problem for us at home is magnified many times for libraries. The British Library, national library of the United Kingdom, is trying to find a way to collect web information, but is finding it hard going.

They are starting with a pilot project to select and obtain permission to download about a hundred websites, so they can determine the costs and volume which will be involved. They must also try to discover which topics they should concentrate on. There are technological and legal questions in the process, aside from the practical problems of how to store the material, catalogue it and then make it accessible to users.

How often will websites have to be captured? The British Library thought about monthly, but some sites may need to be monitored daily. It may be possible to program a computer to call up the site and download it automatically, but a certain amount of human participation will be needed. In all likelihood, the computer could even be programmed to make judgements about whether the website has changed significantly since it was last checked, or whether the information on it is worth keeping.
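For the technically inclined, here is a rough idea of what "programming a computer" to do this might look like. It is only a minimal sketch in Python, not a description of what the British Library actually runs. The function names (`fetch_page`, `fingerprint`, `changed_significantly`) and the ten per cent cut-off are my own assumptions for illustration. The program downloads a page, and then compares the new copy against the last saved copy line by line to decide whether enough has changed to be worth capturing again.

```python
import difflib
import hashlib
import urllib.request


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its text, decoding leniently."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest: a cheap way to spot any change at all."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_significantly(old: str, new: str, threshold: float = 0.10) -> bool:
    """Return True when more than `threshold` of the lines differ.

    The 10% default is an arbitrary, illustrative cut-off (an assumption).
    """
    if fingerprint(old) == fingerprint(new):
        return False  # identical snapshots, nothing to capture
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines())
    return (1.0 - matcher.ratio()) > threshold
```

In use, one would keep the last copy of each site on disk, call `fetch_page` on a schedule (monthly or daily, as the site demands), and store a new copy only when `changed_significantly` says so. Whether the information is worth keeping in the first place is the part that still needs a human.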

The head of the British Library, Lynne Brindley, says the institution's starting point is that it is the legal repository for all traditionally published books, films and recordings. Publishers are required to deposit copies in the library as a permanent record. This is also true in Canada, where the repository is the National Library in Ottawa.

"The British Library has the nation's memory," she says. "It happens to be in printed form, but the function is the same in the electronic age." The difficulty is determining how to take on this tremendous challenge.

I confess that our library has been known to make print copies of web indexes to add to our collection, always with the compiler's permission. We also make print copies of occasional, very ephemeral lists, such as passenger lists from major air crashes. We know that eventually we will be asked if we have them, so we collect them now in the short time they are available.

I suspect all genealogists do this. It is not possible to do it on a large scale, and the question of what we should be doing next is hard to solve. Perhaps the large genealogical conferences of the next two or three years will include discussions of this topic. Here's hoping the techno-wizards have some answers in the works. I hate to think of all the data that's disappearing into the ether.

Column copyright © 2002 Ryan Taylor

Sponsored by Interlink Bookshop and Genealogical Services and hosted by Islandnet.com