FLHIG
 

Family and Local History
Indexing Group


Selected articles from the Newsletter


This article first appeared in SIGGNL 26 pages 5 to 10 (February/March 2001)

     
   

Bedfordshire Parish Registers - Preparing a Master Index
by Gerry Allen

Under successive County Archivists, the baptism, marriage and burial registers for all 128 Bedfordshire parishes up to 1812 became the first complete county-wide set of parish registers to be transcribed and published: the first volume appeared in 1931 and the last of 80 volumes in 1990. In 1998-9, all the transcriptions were made available on microfiche through a joint venture between the Bedfordshire Family History Society (Beds FHS) and the Bedfordshire and Luton Archives and Records Service (BLARS).

Each parish, or volume of several parishes, had its own index, but it was thought valuable to give researchers a single Bedfordshire surname index which would quickly lead them to the parishes and pages of interest. In February 2000 I agreed to coordinate the task, which would involve designing and building a database index of some 150,000 entries, using volunteer members of Beds FHS for data entry.

ORIGINAL MATERIAL   Inspection of the indexes, mostly duplicated typescripts, soon revealed that they varied not only in size and legibility but also in style and consistency. The different forms of original entry encountered included:

basic entries
Wilson 1-4, 6-11, 14-5, 18-21, 32-5, 41-2, 44-8, 50-1, 55, 58-9, 62-3, 65-6, 68
Allen 1, 2, 6-9, 11-12, 41-43, 44
Thomas A1, 5-8, 21, 34, 56; B2, 5, 7-11, 23-4, 47; C34

name variants (some spelt phonetically)
Tom(p)kins 33-5, 56, 85-6, 89
Pearson, Pierson, Person 11-2, 21, 52, 67, 91
or
Pearson (Pierson, Person) A11-2, 21, 52, 67, 91; C15
Peck(e), Pack 8, 62, 82
Hawkins, -in(e)s, -ings, Howkins 15-7, 67, 88
Langlye/Laulye B26
Nicholl, Nickols/olles 2-7, 10, 41
Bedcote (-cot(t)) A45, 78
Robinson alias Cook 97
Smith als Jones 123
Birdsey(-zey, Burdsey) A2, 34
Adkins (At-) A9, 45; B6

name variants and cross references
Norton {Orton 7} 7, 58, 109
Billington (Mill- {B12, 67}) A17, 21; B11
Pack see Peck
Lea see Leigh, Lee
Adkins (cf Atkins) 12, 56
Gamkin see Champkin
Thomson see Thompson
Gell see also Gale 3, 15, 19, 74
Langley 24 see Langlye/Laulye
Mi(t)chell 23, 67, 78-9 see Maskell
Howkins {see also Allkins} 15-7, 67, 88
Pierson, Person see Pearson
Cook(e) see Robinson alias Cook

From this material it was necessary to generate a consistent style of entry, taking into account all names and variants so that we could produce a flexible, searchable database with the option of preparing a compact alphabetical listing with adequate cross referencing; the exact form of publication - microfiche, CD or Internet - would be decided at a later stage. The original basic entries posed few problems beyond the representation of page ranges but dealing with inconsistently presented name variants and cross references proved quite challenging.
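The representation of page ranges can be illustrated with a minimal Python sketch. It assumes (this is an inference from the examples above, not a rule stated by the project) that an abbreviated end page such as 14-5 stands for 14-15, and that a letter prefix such as A or B applies to every page up to the next semicolon. The function name expand_pages is hypothetical.

```python
import re

def expand_pages(ref: str) -> list[str]:
    """Expand a page list such as 'A1, 5-8; B2, 23-4' into single pages."""
    pages = []
    for section in ref.split(";"):
        prefix = ""  # assumed: a letter prefix carries over within a section
        for item in section.split(","):
            item = item.strip()
            if not item:
                continue
            m = re.fullmatch(r"([A-Z]?)(\d+)(?:-(\d+))?", item)
            if m is None:
                raise ValueError(f"unrecognised page reference: {item!r}")
            letter, start, end = m.groups()
            if letter:
                prefix = letter
            first = int(start)
            last = first
            if end is not None:
                # Assumed: '14-5' abbreviates 14-15; '169-170' is written in full.
                if len(end) < len(start):
                    end = start[: len(start) - len(end)] + end
                last = int(end)
            if last < first:
                raise ValueError(f"descending range: {item!r}")
            pages.extend(f"{prefix}{n}" for n in range(first, last + 1))
    return pages

print(expand_pages("A1, 5-8; B2, 23-4; C34"))
```

Storing the condensed string as typed keeps the database faithful to the printed index; an expansion of this kind would only be needed if individual pages had to be searched or counted.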

DATA PREPARATION   In order to avoid delay, draft data entry guidelines were quickly prepared, taking into account the known variations and providing examples to follow, and photocopies of the indexes to all 80 volumes were made in anticipation of the rush of volunteers! An appeal made through the Beds FHS Journal provided a pool of about 50 volunteers, some as far away as the USA, New Zealand and Australia. Some were experienced keyboarders, others almost first-time PC users, all willing to undertake data entry; those without PCs were willing to act as checkers.

It was decided that the master database should ultimately be created in Microsoft Access but, since the process needed to be kept as simple as possible and only a limited number of potential volunteers might have this software available on their home machines, initial data entry was achieved using a spreadsheet package such as Excel or Lotus 123 (or, where necessary, through Microsoft Word tables). The resultant files were to be standardized and edited in Excel before ultimately importing the data into Access.

The origin of each entry had to be defined in terms of parish and corresponding printed volume; each parish was thus allocated a 3-character code, which was used alongside the volume number, e.g. 41 HCO for Houghton Conquest (Vol 41); 35 BDM for Bedford St. Mary (Vol 35). A typical batch of new line entries from several parishes would thus be:

Name         Page or cross reference          Parish / volume code
Allen        1, 23, 45, 67                    41 HCO
Allyn        see Allen
Bright       A2, 45, 67-9                     43 CHA
Bright       B4, 67-9, 123                    43 TOD
Bright       see also Blight (A68)            43 CHA
Legget(t)    see also Levit(t)
Cook         C23, 46, 78-9                    41 HCO
Cook         see also Robinson alias Cook
Clark(e)     24, 56, 63                       53A LUT
Clark(e)     45, 78, 93                       53B LUT
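As an illustration of how such a parish / volume code might be handled programmatically, here is a small Python sketch. The names Source and parse_code are hypothetical, and reading the letter in codes such as 53A as a part of a multi-part volume is an assumption drawn from the examples, not something the project guidelines are known to state.

```python
import re
from typing import NamedTuple

class Source(NamedTuple):
    volume: int   # printed volume number, e.g. 41
    part: str     # '' or a letter such as 'A' (assumed to mark a volume part)
    parish: str   # 3-character parish code, e.g. 'HCO'

CODE = re.compile(r"(\d+)([A-Z]?) ([A-Z]{3})")

def parse_code(code: str) -> Source:
    """Split a code such as '53A LUT' into volume, part and parish."""
    m = CODE.fullmatch(code.strip())
    if m is None:
        raise ValueError(f"not a parish/volume code: {code!r}")
    return Source(int(m.group(1)), m.group(2), m.group(3))

# Sorting on the parsed value orders codes by volume number, then part.
print(sorted(["53B LUT", "41 HCO", "53A LUT"], key=parse_code))
```

Parsing the volume as a number means that volume 9 would sort before volume 41, which a plain text sort would not guarantee unless every volume number were zero-padded.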

The data entry process revealed yet further style variations. These demanded editorial decisions, for example on the expansion and management of cross references, which had to be fed back into revised guidelines. Volunteers were asked to exercise their own judgement about other matters, such as the condensation of surnames by the use of brackets to indicate optional characters, e.g. Thom(p)son, Cook(e); where possible, bracketing within the first three characters, which would affect sort order (e.g. T(h)omkins), was to be avoided. Commonly occurring surnames (e.g. Smith) had sometimes not been fully indexed in the original transcript and notes had to be included to guide the user.
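The bracket convention can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, and it handles optional letters such as Thom(p)son but not the hyphenated suffix forms such as Bedcote (-cot(t)) found in some original indexes.

```python
import re

def expand_optional(name: str) -> list[str]:
    """Return every spelling implied by bracketed optional letters."""
    m = re.search(r"\(([^()]*)\)", name)  # first bracket with no bracket inside it
    if m is None:
        return [name]
    before, inside, after = name[: m.start()], m.group(1), name[m.end():]
    without = expand_optional(before + after)
    with_letters = expand_optional(before + inside + after)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(without + with_letters))

def affects_sort_order(name: str, leading: int = 3) -> bool:
    """True if a bracket opens within the first `leading` characters."""
    pos = name.find("(")
    return 0 <= pos < leading

print(expand_optional("Thom(p)son"))
print(affects_sort_order("T(h)omkins"))
```

A check like affects_sort_order could be run over incoming files to flag headings for the coordinator, leaving the decision on how to re-enter them to an editor.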

Most volunteers coped well and soon produced adequate files which required a minimum of correction after checking. Some forwarded the first few pages of their first parish for scrutiny and correction to make sure that they were dealing correctly with the typescripts. In retrospect, this approach could have been more widely suggested; it would have prevented those few who misinterpreted (or did not read!) the guidelines from producing large files of limited value without reference to the project coordinator. This serves to highlight the difficulties inherent in 'managing' a large, dispersed force of willing but understandably inexperienced volunteers; we were all learning as we went along.

CHECKING AND CORRECTION   To aid the process of checking their work against the original typescript, inputters were asked to keep entries in entry order, rather than sorting them alphabetically, when returning files to the coordinator on disk or via e-mail. Files for each volume of one or more parishes, however delivered, were standardized in Microsoft Excel and printed before presentation to a checker.

Checkers who had not previously been involved in data entry were provided with the same guidelines as the inputters. They made handwritten corrections on the printed output, which was then returned so that the Excel files could be corrected, either by the originator or by the coordinator or his assistant. Any areas of doubt were resolved by reference to the complete transcript.

Corrected Excel data for each volume were then sorted field by field and inspected for any outstanding errors of format or content. Batches of checked and corrected volumes were progressively cumulated into eight and ultimately four Excel files, then sorted and inspected again for duplicate entries and for variations across parishes resulting from the different approach taken in each original index or from decisions made independently at the data entry stage. Entries were progressively rationalized as files were compared and internal editorial standards were set. Finally, the edited data were imported from the Excel spreadsheets into an Access database, a process which allowed further checking of data integrity.
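The cumulation step was carried out in Excel; purely as an illustration, the same checks can be sketched in Python. The row layout (name, pages or cross reference, parish / volume code) follows the batch example earlier in the article, the sample rows are invented for the demonstration, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# One index line: (surname heading, pages or cross reference, parish/volume code)
Row = tuple[str, str, str]

def cumulate(*batches: list[Row]) -> list[Row]:
    """Merge checked batches into one list sorted by name, then source."""
    merged = [row for batch in batches for row in batch]
    return sorted(merged, key=lambda r: (r[0].lower(), r[2], r[1]))

def duplicate_rows(rows: list[Row]) -> list[Row]:
    """Rows that occur more than once and so need inspection."""
    return [row for row, n in Counter(rows).items() if n > 1]

def inconsistent_headings(rows: list[Row]) -> dict[str, list[str]]:
    """Headings that differ only in bracketing, e.g. Clark(e) and Clarke."""
    groups = defaultdict(set)
    for name, _, _ in rows:
        groups[re.sub(r"[()]", "", name).lower()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}

# Invented sample rows, for demonstration only.
batch_a = [("Bright", "A2, 45, 67-9", "43 CHA"), ("Allen", "1, 23", "41 HCO")]
batch_b = [("Allen", "1, 23", "41 HCO"), ("Clarke", "7", "35 BDM"),
           ("Clark(e)", "24, 56", "53A LUT")]

merged = cumulate(batch_a, batch_b)
print(duplicate_rows(merged))
print(inconsistent_headings(merged))
```

Checks of this kind only flag candidates; deciding whether Clark(e) and Clarke should share a heading remains an editorial judgement.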

THE DATABASE   Microsoft Access is a versatile database package but compromises still had to be made to achieve the desired outcome. The major issue was the accommodation of all the possible variant surname spellings, ensuring that the database offered the user high recall, even at the expense of relevance, whether searching the data on CD or via the Internet. This was resolved by developing a Soundex-style algorithm which generates one or two phonetic letter codes for each surname entry and its variants, processes the user query in the same way and matches the resultant codes. A surname search (either directly in Access or via a Visual Basic front end yet to be developed) provides a list of candidate names and variants with their parish of origin, from which the searcher can select those of interest and see the full entries as a structured report for printing or viewing on screen; the corresponding pages of the microfiche transcripts for those parishes can then be consulted for the full entry. The database can also be used to generate a full alphabetical listing with cross references suitable for microfiche publication or simple alphabetical inspection on CD or the Internet (see example below).
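The project's own algorithm is not reproduced in this article. As a stand-in, the following Python sketch uses the standard American Soundex code to show the general idea of coding every spelling of a heading and matching a query by code. The functions build_index and search, and the idea of storing a list of spellings per heading, are assumptions made for the illustration.

```python
# Standard American Soundex letter groups (not the project's own algorithm).
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}

def soundex(name: str) -> str:
    """Return the four-character Soundex code for a surname."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]

def build_index(entries: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each phonetic code to the headings whose spellings produce it."""
    index: dict[str, set[str]] = {}
    for heading, spellings in entries.items():
        for spelling in spellings:
            index.setdefault(soundex(spelling), set()).add(heading)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return candidate headings whose code matches that of the query."""
    return sorted(index.get(soundex(query), set()))

entries = {
    "Pearson, Pierson, Person": ["Pearson", "Pierson", "Person"],
    "Thom(p)son": ["Thomson", "Thompson"],
    "Peck(e), Pack": ["Peck", "Pecke", "Pack"],
}
index = build_index(entries)
print(search(index, "Peirson"))
print(search(index, "Tomson"))
```

Under standard Soundex, Thomson and Thompson receive different codes (T525 and T512), so a heading such as Thom(p)son ends up with two codes. Coding each variant separately, rather than the condensed heading alone, is what lets a query in either spelling reach the same entry.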

PROGRESS AND PROSPECTS   We are on target to meet our original time scale and have now completed the data entry and correction phase. The cumulation and editing phase is now underway but is proving to be an activity which requires more effort than originally envisaged. The database design is being finalized and we hope to beta-test this within Beds FHS, using a large sample of data, before Easter, 2001. Using feedback from this test, during the second quarter of 2001, we hope to have a full trial version available on CD for testing through the public libraries and BLARS; if all is successful, we anticipate offering the database on sale as a CD during the last quarter of 2001.

SAMPLE PAGE  
Bedfordshire Parish Registers Index

Name                     Vol   Parish   Pages
Abbonny                  see Albaney
Abbot(t)                 see also Ibbot
                         see also Ebbott
                         see also Philips alias Abbott(t)
                         01    BDC      C6, 8, 20, 32, 34
                         01    BDJ      B10, 25
                         01    ELS      A21, 22, 50, 80, 81, 93
                         30    BIG      38, 55
                         43    WOO      23, 34, 56, 78, 89-90
                         53A   LUT      31, 34, 36, 38-43, 59, 63, 65, 67, 69, 74
                         53B   LUT      22, 36, 48, 59, 66, 70-2, 74, 86, 94, 99, 125
                         53C   LUT      31, 34, 36, 38-43, 59, 63, 65, 67, 69, 74
                         58A   BDL      4, 5, 8, 14, 18-9, 21-3, 25-7, 30-1, 34, 39, 42, 33, 46-50, 53, 60, 62, 146, 161, 163
                         58B   BDL      3, 10, 13-4, 16-7, 19, 21, 25-6, 28-9, 31-2, 34-6, 39, 48-9, 51, 53, 60, 62-3, 73-4, 83, 87, 135, 144, 150
                         58C   BDL      1, 3-5, 9, 10, 12-3, 15-6, 19, 20, 30, 54, 61, 64, 71, 79, 81
                         74    ESO      11-2, 14, 16-7, 48-9, 52, 54, 68-9, 169-170, 172, 178, 186-7, 193, 195, 203, 211, 229, 231, 239, 263, 339
Abbot(t) alias Philips   58B   BDL      63, 88
                         01    BDC      C6
                         01    BDJ      B49

16 October 2000 page 3 of 1194

Whilst we have tried to stay faithful to the original indexes, editorial decisions and rationalization of the data will mean that the final product cannot wholly correspond to the individual printed indexes from which it was compiled. Every care has been taken in the preparation of the data, and we were even able to correct some errors already present in the original indexes. Nevertheless, a caveat will have to be issued to warn users that errors and inconsistencies may have been inadvertently introduced. The existence of a centrally maintained database will, however, mean that any errors fed back by users, whether relating to the index data or to the transcripts, can for the first time be recorded permanently.

This project has demonstrated the willingness of our members to get involved in something which would just not be feasible without adequate volunteer support. We hope that the final outcome of the project, which has built on the work of many during the 20th century, will prove to be effective and beneficial to family historians in the 21st.

Gerry Allen lives at 183 Putnoe Street, Putnoe, Bedfordshire, England.


         
Page updated
20 November 2004
   
