Genealogy database design and its genetic basis
Barry Leadbeater
            person = spouse                        (1)
                   |
     ___________________ _ _ _ _ _ _____
     |             |                   |
  child 1       child 2             child 8
This definition of a generalised family is inadequate because the number of children in actual families varies. The number of spouses a person has is also variable. The result is problems both in building the database and in using it.
GEDCOM data uses this model in the FAM records, but, because of its freeform structure, does not suffer from any limitation to the number of children per family.
The fundamental database definition
It is preferable to base the model on the simplest possible definition of a person and their family - a definition which applies to everyone without requiring adaptation. This is

         father = mother        (2)
                |
                | sexual reproduction
                |
              person
To implement model (2) efficiently, a two-table relational database is required. One table contains all persons. The other contains all parent pairs. This second table can, of course, also contain childless partnerships which are handled without modification. In this way it may do double duty as the marriage event table but its basic purpose is to contain all couplings resulting in children. The two tables are linked together to generate families from the individuals. All family relationships are automatically represented.
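The two-table layout described above can be sketched as follows. This is only an illustration, assuming SQLite through Python's standard sqlite3 module; the table names (person, parent_pair), the column names, and the sample people are all hypothetical and are not taken from any existing genealogy program.

```python
import sqlite3

# Hypothetical schema: one row per person, one row per parent pair.
# Each person holds a single link to the pair that produced them.
SCHEMA = """
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    parents_id INTEGER REFERENCES parent_pair(id)  -- NULL when parents are unknown
);
CREATE TABLE parent_pair (
    id        INTEGER PRIMARY KEY,
    father_id INTEGER REFERENCES person(id),
    mother_id INTEGER REFERENCES person(id)
);
"""

def build_demo():
    """Create an in-memory database holding a small invented family."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
        (1, "Adam", None), (2, "Beth", None),
        (3, "Carl", 1), (4, "Dora", 1), (5, "Eve", None),
    ])
    conn.executemany("INSERT INTO parent_pair VALUES (?, ?, ?)", [
        (1, 1, 2),  # Adam = Beth, parents of Carl and Dora
        (2, 3, 5),  # Carl = Eve, a childless partnership
    ])
    return conn

def children_of(conn, pair_id):
    """Children are derived by a query; the pair row stores no child list."""
    rows = conn.execute(
        "SELECT name FROM person WHERE parents_id = ? ORDER BY id", (pair_id,))
    return [name for (name,) in rows]

def parents_of(conn, person_id):
    """Return (father name, mother name), or None if no pair is recorded."""
    return conn.execute("""
        SELECT f.name, m.name
        FROM person AS p
        JOIN parent_pair AS pp ON pp.id = p.parents_id
        LEFT JOIN person AS f ON f.id = pp.father_id
        LEFT JOIN person AS m ON m.id = pp.mother_id
        WHERE p.id = ?""", (person_id,)).fetchone()
```

In this sketch a family of any size needs no extra columns, and a person with several partners simply appears in several parent_pair rows.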
GEDCOM data incorporates this model. The INDI records contain every individual person. The FAM records contain all marriages. But they also unnecessarily contain references to the children of each marriage. The result is a considerable amount of redundant reference data with the potential for conflicts and inconsistencies. That is, the GEDCOM model is fundamentally flawed and therefore should not be used as the basis for a genealogy database design. However, for the foreseeable future it will still be necessary to provide a means of converting data to and from the GEDCOM format, to enable export to, and import from, other GEDCOM-enabled databases.
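One way to picture the export side of such a conversion: because each person carries a single link to their parent pair, the child references that GEDCOM expects in a FAM record can be generated at export time rather than stored. The sketch below is a hedged illustration only. The function name, the plain-dictionary inputs, and the @I…@ / @F…@ identifier scheme are assumptions made for the example; FAM, HUSB, WIFE and CHIL are standard GEDCOM tags, but a real export would need many more details.

```python
def fam_records(persons, pairs):
    """Build GEDCOM-style FAM lines from the two-table data.

    persons: {person_id: parent_pair_id or None}   (hypothetical layout)
    pairs:   {pair_id: (father_id, mother_id)}     (either id may be None)
    """
    lines = []
    for pair_id in sorted(pairs):
        father_id, mother_id = pairs[pair_id]
        lines.append(f"0 @F{pair_id}@ FAM")
        if father_id is not None:
            lines.append(f"1 HUSB @I{father_id}@")
        if mother_id is not None:
            lines.append(f"1 WIFE @I{mother_id}@")
        # Child references are derived here from each person's single
        # parent-pair link, so they cannot disagree with the person table.
        for person_id in sorted(persons):
            if persons[person_id] == pair_id:
                lines.append(f"1 CHIL @I{person_id}@")
    return lines
```

Import would run the other way: each CHIL reference in a FAM record becomes the parent-pair link of the referenced person, and the list itself is then discarded.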
A good model should be able to cope with all the modern methods of producing children, including artificial insemination, artificial implantation of the fertilised egg in the genetic mother or in a surrogate, and even cloning. Multiple spouses, at the same time or sequentially, should pose no problem.
Coping with clones
Model (2) easily copes with all these situations with the one possible exception of cloning. In this case, the one and only parent provides both sets of chromosomes (and therefore genes) to the cloned child. Cloning of people is analogous to propagating plants from cuttings. It might seem that we can only use our model if we say that the one parent is half father and half mother. The father half of the parent provides one set of chromosomes and the mother half, the other set. The model appears to be crude, since the simplest model of cloning seems to be the one-to-one relationship

              parent            (3)
                |
                | asexual reproduction
                |
              person
However, genetically, the clone is a sibling rather than a child - equivalent to an identical twin. So, in fact, model (2) applies without modification and results in a family group like

          father = mother                    (4)
                 |
    ____________________________
    |            |             |
 child 1      child 2   →   child 2 clone
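
The family group in (4) can be sketched in code as follows. This is a minimal illustration under stated assumptions: the Person class, the clone_of field (standing in for the arrow in the diagram), and the helper functions are hypothetical names invented for the example. The point it shows is that a clone is recorded by copying the source person's parent-pair link, so it appears as a sibling with no change to the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Person:
    id: int
    name: str
    parents_id: Optional[int] = None  # link to the parent pair, as in model (2)
    clone_of: Optional[int] = None    # hypothetical annotation: id of the source person

def add_clone(people: Dict[int, Person], source_id: int,
              new_id: int, name: str) -> Person:
    """Record a clone as a child of the same parent pair as its source."""
    source = people[source_id]
    clone = Person(new_id, name, parents_id=source.parents_id, clone_of=source_id)
    people[new_id] = clone
    return clone

def siblings(people: Dict[int, Person], person_id: int) -> List[int]:
    """Ids of everyone else who shares this person's parent pair."""
    me = people[person_id]
    if me.parents_id is None:
        return []
    return sorted(p.id for p in people.values()
                  if p.parents_id == me.parents_id and p.id != person_id)

# Usage: the family group of diagram (4), with parent pair 10.
people = {
    1: Person(1, "child 1", parents_id=10),
    2: Person(2, "child 2", parents_id=10),
}
add_clone(people, source_id=2, new_id=3, name="child 2 clone")
```

After the call, the clone is listed among the siblings of child 2 and has the same parent pair, while clone_of preserves the fact that it was produced from child 2.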