GEDCOM


GEDCOM 5.5 Standard + Errata Sheet (2 January 1996)
Filename extension: .ged
Developed by: LDS FHD
Type of format: Genealogy data exchange
Standard(s): De facto[1]

GEDCOM, an acronym for GEnealogical Data COMmunication, is a de facto[2] specification for exchanging genealogical data between different genealogy programs. GEDCOM was developed by The Church of Jesus Christ of Latter-day Saints as an aid to genealogical research.[3]

A GEDCOM file is plain text (usually encoded in either ANSEL or ASCII) containing genealogical information about individuals, and metadata linking these records together. Most genealogy software supports importing from and/or exporting to GEDCOM format.[4] However, some genealogy programs use proprietary extensions to the GEDCOM format that are not always recognized by other programs, for example the GEDCOM 5.5 EL (Extended Locations) specification.[5][6][7]

GEDCOM model

GEDCOM uses a lineage-linked data model. This data model is based on the nuclear family and the individual. This contrasts with evidence-based models, where data are structured to reflect the supporting evidence. In the GEDCOM lineage-linked data model, all data are structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals.

GEDCOM file structure

A GEDCOM file consists of a header section, records, and a trailer section. Within these sections, records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous records, including notes. Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0; lines nested within a record carry positive integer level numbers.
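The line layout described above (a level number, an optional @XREF@ identifier, a tag, and an optional value) can be illustrated with a minimal Python sketch. The names LINE_RE, GedcomLine and parse_line are hypothetical and chosen for illustration only; the pattern is a simplification and does not implement the full grammar of the specification.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: level, optional @XREF@ identifier, tag, optional value.
# This is an illustrative assumption, not the grammar from the GEDCOM standard.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its level, identifier, tag and value."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")
```

With this sketch, a top-level line such as "0 @I1@ INDI" yields level 0 with the identifier @I1@, while a nested line such as "1 NAME Bob /Cox/" yields level 1 with no identifier.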

Although it is theoretically possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator[8] that can be used to validate the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, one can use "The Windows GEDCOM Validator"[9] or the older, unmaintained Gedcheck[10] from the LDS.
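To give a sense of what a structural check involves, the following Python sketch tests only three basic properties: every line starts with a level number, a level never exceeds the previous level by more than one, and the file starts with HEAD and ends with TRLR. The function name check_structure is hypothetical, and the sketch is far simpler than the validators mentioned above; it does not check tags or values against any version of the specification.

```python
def check_structure(lines):
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    top_level_tags = []
    previous_level = -1  # forces the first line to be at level 0
    for number, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # tolerate blank lines
        if not (parts[0].isascii() and parts[0].isdigit()):
            problems.append(f"line {number}: no level number")
            continue
        level = int(parts[0])
        if level > previous_level + 1:
            problems.append(
                f"line {number}: level {level} follows level {previous_level}"
            )
        previous_level = level
        if level == 0:
            # Skip the optional @XREF@ identifier to reach the record tag.
            has_xref = len(parts) > 1 and parts[1].startswith("@")
            rest = parts[2:] if has_xref else parts[1:]
            top_level_tags.append(rest[0] if rest else "")
    if not top_level_tags or top_level_tags[0] != "HEAD":
        problems.append("file does not start with a HEAD record")
    if not top_level_tags or top_level_tags[-1] != "TRLR":
        problems.append("file does not end with a TRLR record")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue found in one pass.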

During 2001, the GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program.[11] Findings showed that a number of problems existed and that the most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear.[12] In 2005, the Genealogical Software Report Card was evaluated by Bill Mumford (who participated in the original GEDCOM TestBook Project)[13] and included testing against the GEDCOM 5.5 standard using the Gedcheck program.[14]

Example

sample.ged
0 HEAD 
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
1 DATE 11 FEB 2006
1 FILE test
1 GEDC 
2 VERS 5.5
1 CHAR MACINTOSH
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN 
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN 
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN 
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR 
1 CHIL @I3@
0 TRLR

The above is a sample GEDCOM file. The first field on each line is the level number, which indicates the nesting depth of that line.

The header (HEAD) includes the source program and version (Reunion, V8.0), the GEDCOM version (5.5), and the character encoding (MACINTOSH). The encoding value is invalid: according to the GEDCOM 5.5 specification, the valid choices are ANSEL, UNICODE, or ASCII.
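A check for the header's character encoding might be sketched in Python as follows. The set of accepted values is taken from the statement above about GEDCOM 5.5; the names VALID_CHARSETS_5_5 and header_charset are hypothetical.

```python
# Values listed above as valid for GEDCOM 5.5.
VALID_CHARSETS_5_5 = {"ANSEL", "UNICODE", "ASCII"}


def header_charset(lines):
    """Return the value of the level-1 CHAR line inside HEAD, or None."""
    in_head = False
    for raw in lines:
        parts = raw.split(None, 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            # Every level-0 line starts a new record; only HEAD is of interest.
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else ""
    return None


header = ["0 HEAD", "1 GEDC", "2 VERS 5.5", "1 CHAR MACINTOSH", "0 TRLR"]
charset = header_charset(header)
is_valid = charset in VALID_CHARSETS_5_5  # False for the sample header
```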

The individual records (INDI) define Bob Cox (ID 1, written @I1@), Joann Para (ID 2), and Bobby Jo Cox (ID 3).

The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
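How a program might follow these identifiers from the family record to the individuals can be sketched in Python. This is an illustration under simplifying assumptions: only level-1 lines are kept, and the helper names (read_records, values, display_name, describe_family) are hypothetical. The SAMPLE text is a shortened copy of the example file.

```python
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Bob /Cox/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Joann /Para/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
"""


def read_records(lines):
    """Group level-1 lines under their top-level record, keyed by identifier."""
    records = {}
    current = None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level = int(parts[0])
        if level == 0:
            if parts[1].startswith("@") and len(parts) == 3:
                current = {"type": parts[2], "fields": []}
                records[parts[1]] = current
            else:
                current = None  # HEAD, TRLR: records without an identifier
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) == 3 else ""
            current["fields"].append((parts[1], value))
    return records


def values(record, tag):
    return [value for field, value in record["fields"] if field == tag]


def display_name(records, xref):
    names = values(records[xref], "NAME")
    # Slashes delimit the surname in a NAME value; drop them for display.
    return " ".join(names[0].replace("/", " ").split()) if names else xref


def describe_family(records, family_id):
    family = records[family_id]

    def person(tag):
        refs = values(family, tag)
        return display_name(records, refs[0]) if refs else None

    return {
        "husband": person("HUSB"),
        "wife": person("WIFE"),
        "children": [display_name(records, ref) for ref in values(family, "CHIL")],
    }


family = describe_family(read_records(SAMPLE.splitlines()), "@F1@")
```

Here family holds the husband's and wife's names and a list of the children's names, all resolved through the @I…@ identifiers in the FAM record.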

Versions

The current version of the specification is GEDCOM 5.5, which was released on 2 January 1996. A subsequent draft GEDCOM 5.5.1[15] specification was issued in 1999, introducing nine new tags, including WWW, EMAIL and FACT, and adding UTF-8 as an approved character encoding. This draft has not been formally approved, but its provisions have been adopted in part by a number of genealogy programs[16][17][18] and are used by FamilySearch.org.[19] While PAF 5.2 supports GEDCOM 5.5, it uses UTF-8 as its internal character set, a feature that was introduced in the GEDCOM 5.5.1 draft, and can output a UTF-8 GEDCOM.[20][21]

On January 23, 2002, a draft (beta) version of GEDCOM 6.0 was released for developers to study only; it was not a complete specification, and developers were advised not to begin implementing it in their software. For example, descriptions of the meaning and expected contents of tags were not included.[22] GEDCOM 6.0 was to be the first version to store data in XML format, and was to change the preferred character set from ANSEL to Unicode.

Today, lineage-linked GEDCOM is still the deliberate de facto common denominator[23]. Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have yet to support the feature of multilingual Unicode text (instead of the ANSEL character set) introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese, and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.[24]

Release history

GEDCOM Version | Release Date | Notes
1.0[25] | 1984[26] | -
2.0[27] | Dec 1985[28] | PAF 2.0
2.1 | Feb 1987[29] | GEDCOM for PAF 2.1
2.3 Draft | 7 August 1985[30] | with PAF2.0 GEDCOM implementation conventions
2.4 Draft | 13 December 1985[31] | with PAF2.0 GEDCOM implementation conventions
3.0 Standard[32] | 9 October 1987[33] | PAF 2.0 and 2.1 implementation of 3.0
4.0 Standard | August 1989 | PAF 2.1 - 2.31
4.1 Draft[34] | - | -
4.2 Draft[35] | 25 January 1990[36] | -
5.0 Draft[37] | 31 December 1991[38] | Lineage-linked structures were introduced.[39]
5.1 Draft | 18 September 1992[40] | -
5.2 Draft | 22 January 1992[41] | -
5.3 Draft | 4 November 1993[42] | Unicode standard (ISO/IEC 10646) was introduced as an additional character set.
5.4 Draft | 21 August 1995[43] | -
5.5 Standard | 11 December 1995[44] | PAF 3, 4 and 5
5.5 Standard + Errata Sheet | 2 January 1996[45][46] | PAF 3, 4 and 5
GEDCOM (Future Direction) Draft[47] | 1 May 1998[48][49] | "it used an entirely new data model"[50]
5.5.1 Draft[51][52] | 2 October 1999[53] | Used by FamilySearch.org.[54] UTF-8 added as an approved character encoding.
5.6 Private Draft | -[55] | "Jed Allen sent those two files to a few people only for sort of 'private comments'"[56]
6.0 XML Draft | 28 December 2001[57] | Was not a complete specification and recommended not to begin to implement in their software.
Limitations

One weakness of GEDCOM is that events cannot be shared among multiple individuals except for a few predefined family events such as marriage (MARR).[citation needed] This means that if an event were associated with multiple individuals, the data must be duplicated in each record. For example, if multiple people were listed on a census, the census data (CENS) would need to be duplicated for each of the individuals referenced there. Most genealogy programs would require the user to enter the data multiple times through their user interface.[citation needed]

The way GEDCOM handles places is also considered a weakness of the specification.[citation needed] Places are encoded as strings on the events. An example would look like the following:

1 BIRT
2 PLAC New York City, , New York, USA

Additional references to New York City are represented by additional strings, so changes (for example, to add the county or change spelling) require changing every occurrence throughout the file. This also leads to duplication of information if geo-coding or other subrecords are added to the place.
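Because each place is a plain string repeated on every event, correcting a place means rewriting every PLAC line that carries it. A minimal Python sketch of such a bulk rename follows; the function name rename_place is hypothetical, and the sketch matches only exact place strings.

```python
def rename_place(lines, old, new):
    """Rewrite every PLAC line whose value equals `old`.

    Returns the updated lines and the number of lines that were changed.
    """
    updated = []
    changed = 0
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] == "PLAC" and parts[2].strip() == old:
            # Keep the original level number; replace only the place value.
            updated.append(f"{parts[0]} PLAC {new}")
            changed += 1
        else:
            updated.append(line)
    return updated, changed
```

The returned count makes the duplication visible: a place used on many events produces just as many edits, whereas a model with shared place records would need only one.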

Sometimes the GEDCOM specification has been made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.[citation needed]

GEDCOM does not allow representation of many types of close interpersonal relationships, such as same-sex marriages, domestic partnerships, cohabitation, polyamory, and polygamy, most of which are increasingly recognized in modern Western societies as well as diverse cultures and historical contexts.[citation needed]

Myths

A common myth about GEDCOM is that it does not support multimedia[citation needed] such as photos, which are often attached to people. GEDCOM does support linking multimedia items as references to files on the file system. An example would look like this:

0 @I1@ INDI
1 NAME John /Doe/
1 OBJE @O1@
0 @O1@ OBJE
1 FILE johndoe.jpg
2 TITL John Doe at age 50

If this were encoded in a GEDCOM file named "doe.ged" then the file "johndoe.jpg" would need to be in the same directory as the file "doe.ged." The disadvantage with this media encoding comes when you want to transfer this GEDCOM file AND the attached multimedia because you would have to manage the transfer of the files separately. A possible solution is to use a data compression program to archive the GEDCOM and the media into the same file.
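The archiving approach mentioned above can be sketched with Python's standard zipfile module. The function name bundle is hypothetical, and the sketch assumes that media paths are relative to the GEDCOM file's directory and that the file can be read as UTF-8 (undecodable bytes are replaced, so it is not suitable for recovering non-ASCII file names from ANSEL files).

```python
import zipfile
from pathlib import Path


def bundle(gedcom_path, archive_path):
    """Write the GEDCOM file and the media files it references into one ZIP.

    Returns the list of media names found in FILE lines outside the header.
    """
    gedcom_path = Path(gedcom_path)
    media = []
    in_head = False
    text = gedcom_path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            in_head = parts[1] == "HEAD"
        elif parts[1] == "FILE" and len(parts) == 3 and not in_head:
            # The header's own FILE line names the GEDCOM file, not media.
            media.append(parts[2].strip())
    media = list(dict.fromkeys(media))  # drop duplicates, keep order
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(gedcom_path, gedcom_path.name)
        for name in media:
            source = gedcom_path.parent / name
            if source.is_file():  # silently skip media that is missing
                archive.write(source, name)
    return media
```

Storing each media file under the same relative name it has in the FILE line means the links still resolve once the archive is unpacked into a single directory.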

Another myth is that GEDCOM does not support multiple opinions[citation needed] or conflicting data[citation needed]. Sometimes in genealogy, it is desirable to record conflicting information. An example would be a birth certificate listing a birth date as 10 January 1800 and a death certificate listing the birth date as 11 January 1800. Such discrepancies are common, whether from transcription errors or because the relative who gave the information misremembered. Good practice in genealogical research requires that the researcher record both birth dates along with the source citation information. GEDCOM does not prevent you from recording two birth (BIRT) events for a single person. When an event appears more than once, the preferred instance should be listed first within the individual's record. This example encoded in GEDCOM might look like this:

0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2

There is room for improvement in the area of source citations which could be used to make this clearer. The quality of assessment field (QUAY) for example could be extended to use something besides just a numerical level 0-3.[citation needed]
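The "preferred instance first" convention from the birth-date example can be sketched in Python. The function name birth_dates is hypothetical, and the sketch assumes it is given the lines of a single individual's record.

```python
def birth_dates(lines):
    """Return the DATE of each BIRT event in file order (None if absent)."""
    dates = []
    in_birth = False
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level, tag = int(parts[0]), parts[1]
        if level <= 1:
            # Any level-0 or level-1 line ends the previous event.
            in_birth = level == 1 and tag == "BIRT"
            if in_birth:
                dates.append(None)
        elif in_birth and level == 2 and tag == "DATE" and len(parts) == 3:
            dates[-1] = parts[2]
    return dates


record = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BIRT",
    "2 DATE 10 JAN 1800",
    "2 SOUR @S1@",
    "1 BIRT",
    "2 DATE 11 JAN 1800",
    "2 SOUR @S2@",
]
all_dates = birth_dates(record)
preferred = all_dates[0] if all_dates else None  # first listed is preferred
```

A reader that keeps every event, and only treats the first as the default for display, preserves the conflicting evidence instead of discarding it.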

Another myth about GEDCOM is that it does not support internationalization of names[citation needed]. In the same way that you can have multiple events on a person, GEDCOM allows you to have multiple names for a person.[citation needed] If the GEDCOM file is encoded in the Unicode or UTF-8 character sets, the name may also be recorded in Cyrillic, Hebrew, or any other alphabet.[citation needed] GEDCOM's NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.[citation needed]

Alternatives to GEDCOM

Commsoft, the authors of the Roots[58] series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED[59]),[60] which included events as first class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences.[citation needed] With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.[61]

  • Event-Oriented GEDCOM specification - Draft Release 1.0 (12 September 1994) (Microsoft Word in a ZIP file)
  • GEDC - An XML-Based Standard for Genealogy - originated as an outgrowth of the GEDCOM 6.0 Beta specification and Gentech's Genealogical Data Model (GDM).
  • FamilyML based on XML and allows the exchange of genealogical data. Also based on GedML and GEDCOM, but in a more human-readable format.
  • GDMUML - Genealogical Data Models in the Unified Modeling Language
  • GedML: Genealogical Data in XML - combines the GEDCOM data model with the XML standard.
  • Gendatam - genealogical data model
  • The GENTECH Genealogical Data Model
    • gdmxml, a RELAX NG Schema to validate XML documents with genealogical information according to the GENTECH Genealogical Data Model.
  • GeniML - Genealogical Information Markup Language (pronounced 'jeenie em el') is a data model and XML vocabulary for recording and exchanging genealogical data.
  • GenXML is a file format for exchange of data between genealogy programs.
  • GREnDL - Genealogical Record Exchange and Description Language - XML specification[62]
  • GEDCOM 5.5 XML - "attempts to be a 100 percent one-to-one translation of GEDCOM 5.5 into XML; it even includes the superfluous (and empty) <TRLR/> element." - By Chad Albers - neomantic.com

References

  1. ^ Subject: GEDCOM and Formal Standards Organizations Date: Wed, 24 Jan 1996 11:53:52 -0700 From: Bill Harten - Organization: Brigham Young University "why wasn't GEDCOM developed through a formal standards organization?" ... "Thus GEDCOM was born as a deliberate, de facto standard, to be followed only by those who felt it was in their best interest to do so"
  2. ^ Subject: GEDCOM and Formal Standards Organizations Date: Wed, 24 Jan 1996 11:53:52 -0700 From: Bill Harten - Organization: Brigham Young University "why wasn't GEDCOM developed through a formal standards organization?" ... "Thus GEDCOM was born as a deliberate, de facto standard, to be followed only by those who felt it was in their best interest to do so"
  3. ^ Subject: rep: T Jenkins - open letter to GEDCOM-L - "The goal was to try and provide a standard to allow developers to provide a vehicle for their users to share genealogical conclusions and supporting evidence with others." From: "Jed R. Allen" Brigham Young University - Date: 29 Sep 1995 17:40:04 -0600 - GEDCOM-L Archives -- September 1995, week 5 (#7)
  4. ^ "Genealogical Software Report Card". March 2005. http://www.mumford.ca/reportcard/index.htm. 
  5. ^ GEDCOM 5.5 EL(Extended Locations) specification
  6. ^ Ability to save information against places - "Support for parts of the GEDCOM 5.5EL proposal" - FHUG Wish List
  7. ^ 0000688: Support for Gedcom 5.5EL - Gramps Bugtracker
  8. ^ View of phpgedview's GEDCOM validator source code
  9. ^ The Windows GEDCOM Validator
  10. ^ Gedcheck - "uses a grammar file for the specific version of GEDCOM you want to check against." The Church of Jesus Christ of Latter-day Saints
  11. ^ "GEDCOM TestBook Project". 2001. https://www.ngsgenealogy.org/ngsgentech/projects/TestBook2001/index.cfm. 
  12. ^ [GEDCOM and the GenTech Testbook Project] Genealogical Computing 7/1/2001 - Archive Summer 2001 Vol. 21.1 - Ancestry.com
  13. ^ The Genealogical Software Report Card 2000 S W Mumford Last updated March 2005
  14. ^ Reviews from the NGS Newsmagazine and its Predecessors. - Test Result are in the PDF's
  15. ^ GEDCOM 5.5.1 draft
  16. ^ GED-GEN is based on GEDCOM version 5.5.1 (draft), dated 2 October 1999. The following record types are parsed: header, individual, family, notes, source, and repository. However not all elements within these records are processed. - Specifications - GED-GEN Introduction
  17. ^ 0000688: Support for Gedcom 5.5EL(0008068) romjerome (developer) 2009-01-25 06:13 - "Note : GRAMPS 3.0.x supports a part of GEDCOM 5.5.1 on export, which is not supported by most programs" - Gramps Bugtracker
  18. ^ "MyBlood supports the GEDCOM 5.5 and 5.5.1 file format." - MyBlood Support - Forum, FAQ, Know Problems
  19. ^ FamilySearchtoGEDCOMMapping.doc - FamilySearch XML to GEDCOM Mapping - .."The GEDCOM v5.5.1 (http://www.phpgedview.net/ged551-5.pdf) specification was used as the target for the GEDCOM side."
  20. ^ Personal Ancestral File 5.2 and PAF Companion 5.4 - Software Version Changes Release 5.0.1.4, 22 December 2000 - "10.GEDCOM improvements: Table:Destination:PAF 5 GEDCOM Version:5.5 Character Set:UTF-8
  21. ^ Personal Ancestral File 5.1 - "Also noted in a second test was the use of four tags from a later draft version of the Gedcom specification, FONE (phonetic name), ROMN (romanized name), EMAIL (e-mail), and _UID" Jan/Feb 2002 NGS Newsmagazine
  22. ^ GEDCOM 6.0 specification
  23. ^ Subject: GEDCOM and Formal Standards Organizations Date: Wed, 24 Jan 1996 11:53:52 -0700 From: Bill Harten - Organization: Brigham Young University "why wasn't GEDCOM developed through a formal standards organization?" ... "Thus GEDCOM was born as a deliberate, de facto standard, to be followed only by those who felt it was in their best interest to do so"
  24. ^ Personal Ancestral File 5.2 and PAF Companion 5.4 - Software Version Changes Release 5.0.1.4, 22 December 2000 - "10.GEDCOM improvements: Table:Destination:PAF 5 GEDCOM Version:5.5 Character Set:UTF-8"
  25. ^ pafuser : Beitrag: Re: [pafuser PAF 5.01 und GEDCOM] By Eckhard Henkel - Beitrag #103 von 1494 - Yahoo Groups
  26. ^ Subject:description of InterGED theory From:Gary Steiner - "The first GEDCOM standard, version 1.0, was released to the genealogical software development community in 1984." - GEDCOM-L Archives -- July 1994, week 4 (#14)
  27. ^ pafuser : Beitrag: Re: [pafuser PAF 5.01 und GEDCOM] By Eckhard Henkel - Beitrag #103 von 1494 - Yahoo Groups
  28. ^ Subject:Timeline of GEDCOM versions and PAF By George Archer - GEDCOM-L Archives -- November 2000, week 3 (#12)
  29. ^ Subject:Timeline of GEDCOM versions and PAF By George Archer - GEDCOM-L Archives -- November 2000, week 3 (#12)
  30. ^ Subject:Re: GEDCOM standards help pleaseFrom:Graham Starkey - "DRAFT VERSION 2.3 - 7 August 1985 with PAF2.0 GEDCOM implementation conventions" - GEDCOM-L Archives -- June 2000, week 4 (#1)
  31. ^ Subject:Re: GEDCOM standards help pleaseFrom:Graham Starkey - "DRAFT VERSION 2.4 - 13 December 1985 with PAF2.0 GEDCOM implementation conventions" - GEDCOM-L Archives -- June 2000, week 4 (#1)
  32. ^ pafuser : Beitrag: Re: [pafuser PAF 5.01 und GEDCOM] By Eckhard Henkel - Beitrag #103 von 1494 - Yahoo Groups
  33. ^ RootsWeb: ROOTS-L Re: Large Charts (fairly long):Date:Tue, 11 Jul 89 15:14:31 CDT From: Marty Hoag <NU021172@N...> Subject:Re: Printing trees with PAF? From soc.roots ... * GEDCOM release 3.0, 9 Oct 1987, 131 pages (!)
  34. ^ This GEDCOM format uses the 4.1 GEDCOM specification
  35. ^ File Structures for PAF and GEDCOM - Date: 1996/01/04 - soc.genealogy.computing | Google Groups:
  36. ^ Subject:4.x specsFrom:Rafal Prinke -"while this document has the date January 25, 1990. So maybe it is GEDCOM 4.2 ?" - GEDCOM-L Archives -- May 1994, week 1 (#19)
  37. ^ pafuser : Beitrag: Re: [pafuser PAF 5.01 und GEDCOM] By Eckhard Henkel - Beitrag #103 von 1494 - Yahoo Groups
  38. ^ Subject:Re: GEDCOM standards help pleaseFrom:Graham Starkey - "DRAFT Release 5.0 - 31 December 1991 150kb" - GEDCOM-L Archives -- June 2000, week 4 (#1)
  39. ^ Subject: GEDCOM (Future Direction) Announced by Family History From: "Jed R. Allen" Date: Fri, 1 May 1998 18:08:24 -0600
  40. ^ Subject:Timeline of GEDCOM versions and PAF By George Archer - GEDCOM-L Archives -- November 2000, week 3 (#12)
  41. ^ Subject:Re: GEDCOM standards help pleaseFrom:Graham Starkey - "DRAFT Release 5.2- 22 January 1992 120kb" - GEDCOM-L Archives -- June 2000, week 4 (#1)
  42. ^ GEDCOM 5.3 draft - 4 November 1993
  43. ^ THE GEDCOM STANDARD - DRAFT Release 5.4 - 21 August 1995
  44. ^ Subject:Timeline of GEDCOM versions and PAF By George Archer - "5.5 11 Dec 1995 (Title Page for 5.5)"- GEDCOM-L Archives -- November 2000, week 3 (#12)
  45. ^ GEDCOM 5.5 Standard (Executable file in Envoy format)
  46. ^ Re: Looking for GEDCOM versions 4 & 5.xx "Brian C. Madsen" - "A GEDCOM 5.5 Errata Sheet dated 10 January 1996 supposedly contains corrections to pages 23, 24, 25, 26, 29, 29, 29, 33, 34, 39, 57, 79, and 85."
  47. ^ Subject: GEDCOM (Future Direction) Announced by Family History From: "Jed R. Allen" Date: Fri, 1 May 1998 18:08:24 -0600
  48. ^ "According to the Family History Department's Jed Allen who sent out the announcement message, the GEDCOM (FD)"
  49. ^ Comments on the GEDCOM Future Directions document Michael H. Kay, 17 May 1998
  50. ^ Subject:GEDCOM Future Directions - From:John Nairn - Date:Mon, 11 May 1998 13:38:45 -0600 - GEDCOM-L Archives -- May 1998, week 2 (#3)
  51. ^ GEDCOM 5.51 data model in UML format - Software Renovation Corporation
  52. ^ Modifications in Gedcom Version 5.5.1 compared to Gedcom 5.5
  53. ^ GEDCOM 5.5.1 draft
  54. ^ FamilySearchtoGEDCOMMapping.doc - FamilySearch XML to GEDCOM Mapping - .."The GEDCOM v5.5.1 (http://www.phpgedview.net/ged551-5.pdf) specification was used as the target for the GEDCOM side."
  55. ^ Subject:Re: GEDCOM History From:STEFANO BOSCOLO - Date:Tue, 20 Feb 2001 19:54:06 +0100 - GEDCOM-L Archives -- February 2001, week 3 (#1)
  56. ^ Subject: Re: GEDCOM History From:"Rafal T. Prinke" - Date:Tue, 20 Feb 2001 22:14:55 +0100 - GEDCOM-L Archives -- February 2001, week 3 (#4)
  57. ^ Draft Specification for GEDCOM XML 6.0
  58. ^ CommSoft to Return?Dick Eastman Online 3/14/2001 - Archive - Ancestry.com
  59. ^ RootsWeb: TMG-L [TMG InterGED/Event GEDCOM] Date: Fri, 15 Feb 2002 13:33:18 -0700
  60. ^ "Commsoft : The Roots Story". http://sonic.net/~commsoft/rstory.html. Retrieved on 2008-11-20. 
  61. ^ "TMGL Archives : Event-oriented". 2000-06-29. http://archiver.rootsweb.com/th/read/TMG/2000-06/0962255126. Retrieved on 2008-11-20. 
  62. ^ The Windows GEDCOM Validator:History By Tim Forsythe "I personally feel the GEDCOM format has way too many limitations (See the XML specification for GREnDL 1.1 for an alternate approach - and yes I wrote it" rumblefische.com
