This article first appeared in SIGGNL 26, pages 5 to 10 (February/March 2001).
Bedfordshire Parish Registers - Preparing a Master Index
Under successive County Archivists, the baptism, marriage and burial registers for all 128 Bedfordshire parishes up to 1812 became the first complete county-wide set of parish registers to be transcribed and published: the first volume appeared in 1931, and the last of 80 volumes in 1990. In 1998-9, all the transcriptions were made available on microfiche through a joint venture between the Bedfordshire Family History Society (Beds FHS) and the Bedfordshire and Luton Archives and Records Service (BLARS).
Each parish, or volume of several parishes, had its own index, but it was thought valuable to have a single Bedfordshire surname index that would quickly lead researchers to the parishes and pages of interest. In February 2000 I agreed to coordinate the task, which would involve designing and building a database index of some 150,000 entries, using volunteer members of Beds FHS for data entry.
ORIGINAL MATERIAL
Inspection of the indexes, mostly duplicated typescripts, soon revealed that they varied not only in size and legibility but also in style and consistency. Examples of the different forms of original entry encountered included basic entries, name variants (some spelt phonetically), and name variants with cross references. From this material it was necessary to generate a consistent style of entry, taking into account all names and variants, so that we could produce a flexible, searchable database with the option of preparing a compact alphabetical listing with adequate cross referencing; the exact form of publication (microfiche, CD or Internet) would be decided at a later stage. The original basic entries posed few problems beyond the representation of page ranges, but dealing with inconsistently presented name variants and cross references proved quite challenging.
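The article does not say how page ranges were written in the typescripts or stored in the database. Purely as an illustration, and assuming references were keyed as comma-separated pages and hyphenated ranges (e.g. "3, 7-9"), a consistent style could be enforced by expanding each reference to a set of pages and writing it back in one canonical form. The function names below are hypothetical.

```python
def parse_pages(text: str) -> set[int]:
    """Expand an assumed page-reference string such as '3, 7-9, 12' into page numbers."""
    pages: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if end < start:
                raise ValueError(f"bad range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return pages


def format_pages(pages: set[int]) -> str:
    """Collapse page numbers into the canonical 'a, b-c' form, merging adjacent pages."""
    runs: list[str] = []
    ordered = sorted(pages)
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next page follows on directly.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        runs.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(runs)


def normalise_pages(text: str) -> str:
    return format_pages(parse_pages(text))


print(normalise_pages("7-9, 3, 8, 10"))  # 3, 7-10
```

Storing a canonical form means two typists who key the same reference differently ("8-10, 7" and "7-10") end up with identical entries.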
DATA PREPARATION
In order to avoid delay, draft data entry guidelines were quickly prepared, taking into account the known variations and providing examples to follow, and photocopies were made of the indexes to all 80 volumes in anticipation of the rush of volunteers! An appeal made through the Beds FHS Journal provided a pool of about 50 volunteers, some as far away as the USA, New Zealand and Australia: some were experienced keyboarders, others almost first-time PC users, all willing to undertake data entry, together with those without PCs who were willing to act as checkers.
It was decided that the master database should ultimately be created in Microsoft Access. However, since the process needed to be kept as simple as possible and only a limited number of potential volunteers might have this software on their home machines, initial data entry was carried out using a spreadsheet package such as Excel or Lotus 1-2-3 (or, where necessary, Microsoft Word tables). The resulting files were to be standardized and edited in Excel before the data were finally imported into Access. The origin of each entry had to be defined in terms of parish and corresponding printed volume, so each parish was allocated a 3-character code, used alongside the volume number: e.g. 41 HCO for Houghton Conquest (Vol 41) and 35 BDM for Bedford St.Mary (Vol 35). Each new line entry in a batch from several parishes thus carried a volume number and parish code alongside the surname and its page references.
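A minimal sketch of how the volume-plus-parish source code might be validated when the spreadsheet data are imported, assuming a code is always a volume number from 1 to 80 followed by the 3-letter parish code. The `Source` and `IndexEntry` record types and the `parse_source` function are hypothetical, only the two parish codes quoted above are included, and the surname and pages in the usage line are invented for illustration.

```python
import re
from dataclasses import dataclass

# Only the two parish codes quoted in the article; the full table of 128 is not reproduced.
PARISHES = {"HCO": "Houghton Conquest", "BDM": "Bedford St.Mary"}

# Assumed layout: volume number, whitespace, three capital letters, e.g. "41 HCO".
SOURCE_RE = re.compile(r"(\d{1,2})\s+([A-Z]{3})")


@dataclass(frozen=True)
class Source:
    volume: int
    parish_code: str

    @property
    def parish_name(self) -> str:
        return PARISHES.get(self.parish_code, "unknown parish")


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical shape of one line entry as keyed by a volunteer."""
    surname: str
    source: Source
    pages: str  # kept as typed, e.g. "12, 15-17"


def parse_source(text: str) -> Source:
    m = SOURCE_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a volume/parish code: {text!r}")
    volume = int(m.group(1))
    if not 1 <= volume <= 80:  # the printed series ran to 80 volumes
        raise ValueError(f"volume out of range 1-80: {volume}")
    return Source(volume, m.group(2))


entry = IndexEntry("Smith", parse_source("41 HCO"), "12")
print(entry.source.parish_name)  # Houghton Conquest
```

Rejecting malformed codes at import time would catch typing slips before they are cumulated with other parishes.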
The data entry process revealed yet more style variations, which demanded editorial decisions that had to be fed back into revised guidelines, e.g. for the expansion and management of cross references. Volunteers were asked to exercise their own judgement on other matters, such as condensing surnames by using brackets to indicate optional characters, e.g. Thom(p)son, Cook(e); where possible, bracketing within the first three characters, which would affect sort order (e.g. T(h)omkins), was to be avoided. Commonly occurring surnames (e.g. Smith) had sometimes not been fully indexed in the original transcript, and notes had to be included to guide the user.
Most volunteers coped well and soon produced adequate files which required minimal correction after checking. Some forwarded the first few pages of their first parish for scrutiny and correction, to make sure that they were dealing correctly with the typescripts. In retrospect, this approach could have been suggested more widely, to prevent the few who misinterpreted (or did not read!) the guidelines from producing large files of limited value without reference to the project coordinator. This highlights the difficulties inherent in 'managing' a large, dispersed force of willing but understandably inexperienced volunteers; we were all learning as we went along.
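The bracket convention can be sketched as follows: each bracketed group is treated as optional, so the condensed form expands to every spelling it stands for. Comparing the first three letters of those spellings shows why bracketing early in a name was discouraged: the variants of Thom(p)son all begin "THO", whereas those of T(h)omkins begin "THO" and "TOM" and so would file in different places. The helper names are hypothetical, and nested brackets are assumed not to occur.

```python
import itertools
import re

# One bracketed group marks optional characters, e.g. the "(p)" in "Thom(p)son".
OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_variants(condensed: str) -> list[str]:
    """Expand every bracketed group both ways: Cook(e) -> Cook, Cooke."""
    # re.split with one capturing group alternates literal, optional, literal, ...
    parts = OPTIONAL.split(condensed)
    literals, optionals = parts[0::2], parts[1::2]
    variants = set()
    for choice in itertools.product((False, True), repeat=len(optionals)):
        pieces = [literals[0]]
        for include, opt, lit in zip(choice, optionals, literals[1:]):
            pieces.append(opt if include else "")
            pieces.append(lit)
        variants.add("".join(pieces))
    return sorted(variants)


def sort_prefixes(condensed: str, n: int = 3) -> set[str]:
    """First n letters of each variant; more than one prefix means the variants file apart."""
    return {v[:n].upper() for v in expand_variants(condensed)}


print(expand_variants("Thom(p)son"))  # ['Thompson', 'Thomson']
print(sort_prefixes("T(h)omkins"))    # two prefixes: THO and TOM
```

A check like `len(sort_prefixes(name)) > 1` could flag condensed entries that break the three-character rule during editing.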
CHECKING AND CORRECTION
When returning files to the coordinator on disk or by e-mail, inputters were asked to keep entries in the order in which they were entered, rather than sorting them alphabetically, to aid the process of checking their work against the original typescript. Files for each volume of one or more parishes, however delivered, were standardized in Microsoft Excel and printed before being passed to a checker.
Checkers who had not previously been involved in data entry were provided with the same guidelines as the inputters, and made handwritten corrections on the printout, which was then returned so that the Excel files could be corrected, either by the originator or by the coordinator or his assistant. Any areas of doubt were resolved by reference to the complete transcript. The corrected Excel data for each volume were then sorted field by field and inspected for any outstanding errors of format or content. Batches of checked and corrected volumes were then progressively cumulated into eight and ultimately four Excel files, which were sorted and inspected again for duplicate entries and for variations across parishes resulting from the different approaches taken in each original index or from decisions made independently at the data entry stage. Entries were progressively rationalized as files were compared and internal editorial standards were set. Finally, the edited data were imported from Excel into an Access database, a process which allowed further checking of data integrity.
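The cumulation step can be sketched as merging the checked batches, tidying stray whitespace and case, sorting field by field and reporting any row that occurs more than once. This assumes a hypothetical four-field row (surname, volume, parish code, pages); it catches only exact duplicates after tidying, so the cross-parish variations described above would still need editorial judgement.

```python
from collections import Counter

# Hypothetical row layout: (surname, volume, parish_code, pages).
Row = tuple[str, int, str, str]


def normalise(row: Row) -> Row:
    """Tidy stray whitespace, and upper-case the parish code, before rows are compared."""
    surname, volume, parish, pages = row
    return (" ".join(surname.split()), volume, parish.strip().upper(), " ".join(pages.split()))


def cumulate(*batches: list[Row]) -> tuple[list[Row], list[Row]]:
    """Merge checked batches; return (sorted distinct rows, rows seen more than once)."""
    counts = Counter(normalise(row) for batch in batches for row in batch)
    merged = sorted(counts)  # tuples sort field by field: surname, volume, parish, pages
    duplicates = sorted(row for row, n in counts.items() if n > 1)
    return merged, duplicates
```

For example, a row keyed as `("Smith ", 41, "hco", "3")` in one batch and `("Smith", 41, "HCO", "3")` in another would be merged and reported once as a duplicate.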
THE DATABASE
Microsoft Access is a versatile database package, but compromises still had to be made to achieve the desired outcome. The major issue was accommodating all the possible variant surname spellings, ensuring that the database offered the user high recall even at the expense of relevance, whether the data are searched on CD or via the Internet. This was resolved by developing a Soundex-style algorithm which generates one or two phonetic letter codes for each surname entry and its variants, processes the user's query in the same way, and matches the resulting codes. A surname search (either directly in Access or via a Visual Basic front end yet to be developed) provides a list of candidate names and variants with their parish of origin, from which the searcher can select those of interest and see the full entries as a structured report for printing or on-screen viewing; the corresponding page of the microfiche transcript for each parish can then be consulted for the full entry. The database can also be used to generate a full alphabetical listing with cross references, suitable for microfiche publication or for simple alphabetical browsing on CD or the Internet (see example below).
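The article does not publish the project's own Soundex-style algorithm, so the sketch below substitutes the standard American Soundex code as a stand-in. It illustrates the matching idea described above: each surname entry is coded once per spelling (Thompson and Thomson give different codes, so that entry carries two), the query is coded the same way, and any entry sharing the query's code is returned. The index contents and function names are illustrative assumptions.

```python
# Standard American Soundex digit groups; vowels, H, W and Y carry no digit.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Classic four-character Soundex code, e.g. Robert -> R163."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "HW":  # H and W do not separate letters with the same digit
            prev = code
    return result.ljust(4, "0")


def entry_codes(spellings: list[str]) -> set[str]:
    """Codes for a surname and its variants; one entry may yield more than one code."""
    return {soundex(s) for s in spellings}


def search(query: str, index: dict[str, set[str]]) -> list[str]:
    """Return every headword sharing the query's code: high recall, modest precision."""
    q = soundex(query)
    return sorted(head for head, codes in index.items() if q in codes)


INDEX = {
    "Thom(p)son": entry_codes(["Thompson", "Thomson"]),  # {"T512", "T525"}
    "Cook(e)": entry_codes(["Cook", "Cooke"]),            # {"C200"}
    "Tomkins": entry_codes(["Tomkins", "Thomkins"]),      # {"T525"}
}

print(search("Tomson", INDEX))  # ['Thom(p)son', 'Tomkins']
```

A query for Tomson also returns Tomkins: this is the "high recall even at the expense of relevance" trade-off, leaving the searcher to discard unwanted candidates from the list.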
PROGRESS AND PROSPECTS
We are on target to meet our original timescale and have now completed the data entry and correction phase. The cumulation and editing phase is now under way but is proving to require more effort than originally envisaged. The database design is being finalized and we hope to beta-test it within Beds FHS, using a large sample of data, before Easter 2001. Using feedback from this test, we hope to have a full trial version available on CD during the second quarter of 2001, for testing through the public libraries and BLARS; if all goes well, we anticipate offering the database for sale on CD during the last quarter of 2001.
SAMPLE PAGE: Bedfordshire Parish Registers Index
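As a rough sketch of how the cross-referenced alphabetical listing mentioned under THE DATABASE might be generated, the code below files each headword with its references and adds a "see" line for each variant spelling, sorting everything into a single A-Z sequence. The "see" wording, the reference format and the sample data are all assumptions made for illustration, not the layout of the actual index.

```python
def listing(entries: dict[str, tuple[list[str], list[str]]]) -> list[str]:
    """entries maps headword -> (variant spellings, references such as '41 HCO 12')."""
    rows: list[tuple[str, str]] = []
    for head, (variants, refs) in entries.items():
        rows.append((head, f"{head}: {'; '.join(sorted(refs))}"))
        for variant in variants:
            if variant != head:
                rows.append((variant, f"{variant} see {head}"))
    # File headwords and cross references together in one case-insensitive A-Z run.
    return [text for _, text in sorted(rows, key=lambda r: (r[0].upper(), r[1]))]


# Made-up sample data, not taken from the index.
SAMPLE = {
    "Thompson": (["Thomson", "Tompson"], ["41 HCO 12", "35 BDM 3-4"]),
    "Cooke": (["Cook"], ["35 BDM 7"]),
}

for line in listing(SAMPLE):
    print(line)
```

Because the cross references are generated from the same variant lists used for searching, the printed listing and the searchable database stay consistent with each other.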
Whilst we have tried to stay faithful to the original indexes, editorial decisions and rationalization of the data mean that the final product cannot correspond wholly to the individual printed indexes from which it was compiled. Again, although every care has been taken in preparing the data, and we were even able to correct some errors in the original indexes, a caveat will have to be issued warning users of possible errors and inconsistencies that may have been inadvertently introduced. However, the existence of a centrally maintained database means that any errors reported by users, whether in the index data or in the transcripts, can for the first time be recorded permanently.
This project has demonstrated the willingness of our members to get involved in something which would simply not be feasible without adequate volunteer support, and we hope that the final outcome, which builds on the work of many during the 20th century, will prove effective and beneficial to family historians in the 21st.
Gerry Allen lives at 183 Putnoe Street, Putnoe, Bedfordshire,
England.
Page updated 20 November 2004