Family History South Australia

Genealogy database design and its genetic basis

Barry Leadbeater

When designing any database, relational or not, the most basic requirement is that it accurately models the intended real-world situation. One test of this is that the database imposes no restrictions on the data beyond those that exist in real life. So, if a genealogy database allows the entry of no more than, say, 8 children per couple, or 3 marriages per person, then the database design is flawed, because no such restrictions exist in life. Nevertheless, most genealogy database programs impose this sort of restriction. They usually model a person and their family as something like

            person = spouse                     (1)
                   |
      ___________________ _ _ _ _ _ _____
     |             |                     |
  child 1       child 2               child 8

This definition of a generalised family is inadequate because the number of children in actual families varies, as does the number of spouses a person has. The result is problems in both building and using the database.

GEDCOM data uses this model in the FAM records, but, because of its freeform structure, does not suffer from any limitation to the number of children per family.

The fundamental database definition

It is preferable to base the model on the simplest possible definition of a person and their family - a definition which applies to everyone without requiring adaptation. This is

            father = mother                     (2)
                   |
                   | sexual reproduction
                   |
                person

To implement model (2) efficiently, a two-table relational database is required. One table contains all persons. The other contains all parent pairs. This second table can, of course, also contain childless partnerships, which are handled without modification. In this way it may do double duty as the marriage event table, but its basic purpose is to contain all couplings resulting in children. The two tables are linked together, each person referring to the record of his or her parents, so that families are generated from the individuals. All family relationships are automatically represented.
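
As a minimal sketch of the two-table design, assuming SQLite through Python's standard sqlite3 module, the tables might look as follows. The table and column names (persons, parents, parents_id, father_id, mother_id) and the sample people are illustrative assumptions, not taken from any particular program.

```python
import sqlite3

# Illustrative schema only: all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parents (
    id        INTEGER PRIMARY KEY,
    father_id INTEGER,              -- persons.id of the father, NULL if unknown
    mother_id INTEGER               -- persons.id of the mother, NULL if unknown
);
CREATE TABLE persons (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    parents_id INTEGER REFERENCES parents(id)   -- NULL if no parents record
);
""")

# Invented sample data: John has children with two different partners.
conn.executemany(
    "INSERT INTO persons (id, name, parents_id) VALUES (?, ?, ?)",
    [(1, "John", None), (2, "Mary", None), (3, "Sue", None)],
)
conn.executemany(
    "INSERT INTO parents (id, father_id, mother_id) VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 3)],
)
# Children simply point at their parents record, so there is no slot limit.
conn.executemany(
    "INSERT INTO persons (id, name, parents_id) VALUES (?, ?, ?)",
    [(10 + n, f"Child {n}", 1) for n in range(1, 10)] + [(20, "Half-sibling", 2)],
)

def children_of(conn, parents_id):
    """Names of all children of one parent pair, in id order."""
    rows = conn.execute(
        "SELECT name FROM persons WHERE parents_id = ? ORDER BY id", (parents_id,)
    )
    return [row[0] for row in rows]

def partnerships_of(conn, person_id):
    """Ids of every parents record in which the person appears."""
    rows = conn.execute(
        "SELECT id FROM parents WHERE father_id = ? OR mother_id = ? ORDER BY id",
        (person_id, person_id),
    )
    return [row[0] for row in rows]
```

Because the link runs from child to parent pair, nine children or two partnerships need no extra columns, only extra rows.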

GEDCOM data incorporates this model. The INDI records contain every individual person. The FAM records contain all marriages. But they also unnecessarily contain references to the children of each marriage. The result is a considerable amount of redundant reference data with the potential for conflicts and inconsistencies. That is, the GEDCOM model is fundamentally flawed and therefore should not be used as the basis for a genealogy database design. However, for the foreseeable future it will still be necessary to provide a means of converting data to and from the GEDCOM format, to enable export to, and import from, other GEDCOM-enabled databases.
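
The kind of conflict that redundant references allow can be illustrated with a small check. The sketch below assumes the usual GEDCOM tags FAMC (in an INDI record, pointing to the family the person is a child of) and CHIL (in a FAM record, pointing to a child). The sample data is invented, and the parser is a deliberately simplified one written for this example, not a complete GEDCOM reader.

```python
# Invented sample: Tom claims family F1 as his parents (FAMC),
# but F1 does not list him as a child (CHIL).
SAMPLE = """
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Ann /Smith/
1 FAMC @F1@
0 @I4@ INDI
1 NAME Tom /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""

def child_links(gedcom_text):
    """Collect (child, family) pairs as stated by INDI and by FAM records."""
    from_indi, from_fam = set(), set()
    current_id = current_type = None
    for raw in gedcom_text.strip().splitlines():
        parts = raw.strip().split(maxsplit=2)
        if not parts:
            continue
        if parts[0] == "0":
            # Record header such as "0 @I1@ INDI"; anything else resets state.
            if len(parts) == 3 and parts[1].startswith("@"):
                current_id, current_type = parts[1], parts[2]
            else:
                current_id = current_type = None
        elif parts[0] == "1" and len(parts) == 3:
            tag, value = parts[1], parts[2]
            if current_type == "INDI" and tag == "FAMC":
                from_indi.add((current_id, value))
            elif current_type == "FAM" and tag == "CHIL":
                from_fam.add((value, current_id))
    return from_indi, from_fam

def find_conflicts(gedcom_text):
    """Child-family links stated in only one of the two directions."""
    from_indi, from_fam = child_links(gedcom_text)
    return from_indi ^ from_fam

conflicts = find_conflicts(SAMPLE)
```

In a design where only the child refers to its parents record, this class of inconsistency cannot arise, because each link is stored once.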

A good model should be able to cope with all the modern methods of producing children, including artificial insemination, implantation of the fertilised egg in the genetic mother or in a surrogate, and even cloning. Multiple spouses, whether simultaneous or sequential, should be no problem.

Coping with clones

Model (2) easily copes with all these situations, with the one possible exception of cloning. In this case, the one and only parent provides both sets of chromosomes (and therefore genes) to the cloned child. Cloning of people is analogous to propagating plants from cuttings. It might seem that we can only use our model if we say that the one parent is half father and half mother: the father half of the parent provides one set of chromosomes and the mother half the other set. Used this way, model (2) appears crude, since the simplest model of cloning seems to be the one-to-one relationship

                parent                         (3)
                   |
                   | asexual reproduction
                   |
                person

However, genetically, the clone is a sibling rather than a child - equivalent to an identical twin. So, in fact, model (2) applies without modification and results in a family group like

            father = mother                    (4)
                   |
      ____________________________
     |             |              |
  child 1       child 2  →  child 2 clone
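
Family group (4) can be sketched in a few lines, assuming each person record carries a reference to a parents record. The Person class, the add_clone helper and the optional clone_of field (which records the arrow in the diagram) are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    id: int
    name: str
    parents_id: Optional[int] = None   # reference to a parents record
    clone_of: Optional[int] = None     # hypothetical: id of the person cloned

def add_clone(persons, original_id, new_id, name):
    """Add a clone as a sibling: it shares the original's parents record."""
    original = persons[original_id]
    clone = Person(new_id, name, parents_id=original.parents_id,
                   clone_of=original_id)
    persons[new_id] = clone
    return clone

def siblings(persons, person_id):
    """Ids of all other persons that refer to the same parents record."""
    parents_id = persons[person_id].parents_id
    if parents_id is None:
        return []
    return sorted(p.id for p in persons.values()
                  if p.parents_id == parents_id and p.id != person_id)

people = {
    1: Person(1, "child 1", parents_id=1),
    2: Person(2, "child 2", parents_id=1),
}
clone = add_clone(people, original_id=2, new_id=3, name="child 2 clone")
```

The clone needs no special parent structure: it is stored exactly like any other child of the same father and mother.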

Coping with unknown parents

Model (2) easily copes with one or two unknown parents. A person with both parents unknown always has a record in the persons table but there may not be any parents record. However, if there are known siblings, they each need to refer to the one parents record even though the parents' names are unknown. This parents record serves to link the individual persons as siblings.
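
A minimal sketch of this case, again assuming SQLite through Python's sqlite3 module and hypothetical table and column names, uses a parents record whose father and mother are both NULL. The sample names are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE parents (id INTEGER PRIMARY KEY, father_id INTEGER, mother_id INTEGER);
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL, parents_id INTEGER);

-- A parents record with both parents unknown: it exists only to link siblings.
INSERT INTO parents (id, father_id, mother_id) VALUES (1, NULL, NULL);

INSERT INTO persons (id, name, parents_id) VALUES
    (1, 'Alice', 1),
    (2, 'Bert', 1),
    (3, 'Foundling', NULL);   -- no known parents or siblings: no parents record
""")

def siblings_of(db, person_id):
    """Names of the other persons sharing this person's parents record."""
    rows = db.execute(
        """
        SELECT s.name
        FROM persons AS p
        JOIN persons AS s ON s.parents_id = p.parents_id AND s.id <> p.id
        WHERE p.id = ?
        ORDER BY s.id
        """,
        (person_id,),
    )
    return [row[0] for row in rows]
```

Since a NULL parents_id never equals another NULL in SQL, persons without a parents record are not mistaken for siblings of one another.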

Expanding the design to a family history database

Events

Family historians would wish to expand the previously described genealogy database into a fully relational family history database. This requires, as a minimum, the addition of an events table containing the event type and the place and time at which the event occurred. There should be a sub-table of event types and another of places to ensure the database is well normalized. The marriage event type is the most important, as it will probably replace the parent pairs table described previously.
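
One possible shape for the events table and its two sub-tables is sketched below, assuming SQLite through Python's sqlite3 module. All table and column names are assumptions, as is the way an event is linked to persons (person_id plus an optional partner_id), and the dates and place are invented sample data.

```python
import sqlite3

history = sqlite3.connect(":memory:")
history.executescript("""
CREATE TABLE event_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE places      (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE events (
    id         INTEGER PRIMARY KEY,
    type_id    INTEGER NOT NULL REFERENCES event_types(id),
    place_id   INTEGER REFERENCES places(id),
    event_date TEXT,               -- ISO 8601 text such as '1850-03-02'
    person_id  INTEGER NOT NULL,   -- assumed link to the persons table
    partner_id INTEGER             -- assumed: second person, for marriages
);
INSERT INTO event_types (id, name) VALUES (1, 'birth'), (2, 'marriage');
INSERT INTO places (id, name) VALUES (1, 'Adelaide');
INSERT INTO events (type_id, place_id, event_date, person_id, partner_id) VALUES
    (1, 1, '1825-06-01', 1, NULL),
    (2, 1, '1850-03-02', 1, 2);
""")

def events_of_type(db, type_name):
    """All events of one type as (date, place name, person, partner) tuples."""
    rows = db.execute(
        """
        SELECT e.event_date, p.name, e.person_id, e.partner_id
        FROM events AS e
        JOIN event_types AS t ON t.id = e.type_id
        LEFT JOIN places AS p ON p.id = e.place_id
        WHERE t.name = ?
        ORDER BY e.event_date
        """,
        (type_name,),
    )
    return rows.fetchall()
```

Storing each event type and place name once, and referring to it by id, is what keeps the events table normalized. A marriage row with both person_id and partner_id filled in carries the same information as a row of the parent pairs table.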

Sources

In the events table, a field for the sources of the information is highly desirable, together with a sub-table of sources that preferably includes an estimate of the confidence level of the information given in each source. Some family history database designers prefer to model the source material rather than the facts. The sources table is their fundamental database table. This simplifies the design for coping with multiple sources of the same data, especially when they conflict. However, this design clearly models the paper trail, not the actual family history. Of course, a combination of the two databases, where the "facts" database is derived from the "sources" database, may be ideal.
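
Deriving a "facts" view from a "sources" view might be sketched as follows. The Source and Claim classes, the 0.0 to 1.0 confidence scale and the rule of keeping the claim from the most trusted source are all assumptions made for illustration, and the sample records are invented. Other selection rules are equally possible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    title: str
    confidence: float   # assumed scale: 0.0 (unreliable) to 1.0 (certain)

@dataclass(frozen=True)
class Claim:
    person: str
    event_type: str
    value: str
    source: Source

def derive_facts(claims):
    """Keep, per person and event type, the claim from the most trusted source.

    On equal confidence the earlier claim in the list is kept.
    """
    best = {}
    for claim in claims:
        key = (claim.person, claim.event_type)
        if key not in best or claim.source.confidence > best[key].source.confidence:
            best[key] = claim
    return {key: claim.value for key, claim in best.items()}

# Invented sample: two sources disagree about one birth date.
register = Source("parish register", 0.9)
letter = Source("family letter", 0.4)
claims = [
    Claim("Ann Smith", "birth", "1851-04-21", letter),
    Claim("Ann Smith", "birth", "1851-04-12", register),
    Claim("Ann Smith", "marriage", "1875-10-02", letter),
]
facts = derive_facts(claims)
```

The conflicting claims all remain in the sources data. Only the derived facts commit to one value, so the derivation can be rerun when a new source appears or a confidence estimate changes.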

Revised 18 April 2008
Copyright © 2004-2008 B Leadbeater, Australia. All rights reserved.