Using Large Language Models Like ChatGPT to Create Genealogy Summaries and Abstracts from Historical Documents

I’m a huge fan of creating full transcriptions of complex or lengthy genealogical documents, such as probate records, deeds, and military pension files. These documents often contain valuable insights buried in the legal language that we can miss if we only read or skim the material. However, once the transcription is complete, taking the additional step of creating genealogy summaries or abstracts from those lengthy transcriptions produces research notes that are more usable and discoverable—making it easier to find important information and incorporate it into our research and writing.

Large language models (LLMs) like ChatGPT can assist in converting our genealogy transcriptions into summaries, making it easier to extract essential information. LLMs can quickly read through a transcribed genealogical document and highlight key points, saving us considerable time. However, to get the best results, it’s important to follow some best practices when learning to summarize historical documents with AI.

Getting Started

Paste your transcription into the text box directly after your prompt. ChatGPT and similar tools limit how much text they can process at once, so if your document is too long, you may need to experiment with breaking it into parts.
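If you are comfortable with a little scripting, you can automate the splitting step. The Python sketch below is one illustrative approach, not a feature of ChatGPT. The function name `split_transcription` and the 8,000-character default are hypothetical placeholders. Real limits vary by tool and are usually measured in tokens rather than characters, so adjust the number to whatever works in practice. The function breaks the text at blank lines so that paragraphs stay together where possible, and it cuts an overly long paragraph into fixed-size pieces only as a last resort.

```python
def split_transcription(text: str, max_chars: int = 8000) -> list[str]:
    """Split a transcription into chunks of at most max_chars characters,
    breaking at blank-line paragraph boundaries where possible.
    (Hypothetical helper; max_chars is an assumed placeholder limit.)"""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Add the paragraph to the current chunk if it still fits; otherwise start a new chunk.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for number, part in enumerate(split_transcription(sample, max_chars=40), start=1):
        print(f"Part {number}:\n{part}\n")
```

You could then paste each part in turn and tell the AI which part it is (for example, "part 2 of 3"). This makes it less likely to treat a partial document as complete.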

When entering your request, be specific. Yes, you can get reasonable results simply by asking for a summary.

Example:

“Write a paragraph summarizing the key information from the following text:”


ChatGPT Output:

[Screenshot: a basic one-paragraph summary of a historical deed created by ChatGPT.]

However, providing details increases the likelihood of a result that meets your needs. Tell the AI what type of document you’re asking it to read, the time period and place (if known), and the format you want for the summary. This helps the model understand the nuances of the text as well as what you’re looking for.

Example:

“Create a genealogical abstract from the following transcription from an 1850s Chatham County, Georgia, deed book. Consider text enclosed in square brackets to be a possible alternate interpretation of the word or phrase that it follows. Please include dates, all names and alternate names, and other pertinent information.”


ChatGPT Output:

[Screenshot: a detailed summary of a deed created by ChatGPT from a transcription of the original.]
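If you summarize many documents, it may help to keep the details from the example prompt above in a reusable template. The Python sketch below is a hypothetical helper: `build_summary_prompt` and its parameter names are illustrative and not part of ChatGPT or any other tool. It simply assembles prompt text that you would paste into the chat yourself, filling in the document type, time period, place, and desired format.

```python
def build_summary_prompt(
    transcription: str,
    doc_type: str,
    period: str = "unknown",
    place: str = "unknown",
    output_format: str = "a genealogical abstract",
) -> str:
    """Assemble a detailed summary request to paste into an AI chat tool.
    (Hypothetical helper; the wording mirrors the example prompt.)"""
    instructions = [
        f"Create {output_format} from the transcription below.",
        f"Document type: {doc_type}",
        f"Time period: {period}",
        f"Place: {place}",
        # Tells the AI how to read bracketed alternate readings in the transcription.
        "Consider text enclosed in square brackets to be a possible alternate "
        "interpretation of the word or phrase that it follows.",
        "Include dates, all names and alternate names, and other pertinent information.",
    ]
    return "\n".join(instructions) + "\n\nTranscription:\n" + transcription.strip()


if __name__ == "__main__":
    print(build_summary_prompt(
        "[paste the deed transcription here]",
        doc_type="deed book entry",
        period="1850s",
        place="Chatham County, Georgia",
    ))
```

Changing `output_format` (for example, to "a one-paragraph summary") lets you reuse the same details while asking for a different kind of result.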

Best Practices

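  • Quality of input is essential. Ensure the transcriptions are as accurate as possible, because the AI can only work from the text you give it, and transcription errors will carry over into the summary.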
  • Have a conversation with the AI. If you would like something done differently after you see the initial response (perhaps you want more narrative and fewer bullet points, or information you wanted was left out), let the AI know and ask it to produce a new version. Don’t be afraid to experiment.
  • Always review and edit the genealogy summaries produced. Compare them with the original documents to ensure accuracy and completeness.

Potential Pitfalls

LLMs can sometimes generate information that isn’t present in the document, a phenomenon known as “hallucination.” To minimize this, be as specific as possible in your instructions and consider breaking long documents into smaller sections. Historical language and terminology can also pose challenges. Providing definitions of archaic terms found in the document (for example, that “relict” means widow) can help improve accuracy.
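One way to supply those definitions is to place a short glossary between your instructions and the transcription. The Python sketch below is a hypothetical helper that builds that text for you to paste. The name `prompt_with_glossary` is illustrative and not part of any tool. The sample terms are common in older records, but you should define whichever terms actually appear in your document.

```python
def prompt_with_glossary(instructions: str, glossary: dict[str, str], transcription: str) -> str:
    """Combine instructions, term definitions, and the transcription into one prompt.
    (Hypothetical helper for building text to paste into an AI chat tool.)"""
    parts = [instructions.strip()]
    if glossary:
        # One "term: meaning" line per archaic term, in the order given.
        definitions = "\n".join(f"- {term}: {meaning}" for term, meaning in glossary.items())
        parts.append("Definitions of archaic terms used in this document:\n" + definitions)
    parts.append("Transcription:\n" + transcription.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    sample_glossary = {
        "relict": "widow",
        "et ux.": "and wife",
        "inst.": "of the current month (instant)",
    }
    print(prompt_with_glossary(
        "Create a genealogical abstract from the following probate record.",
        sample_glossary,
        "[paste the transcription here]",
    ))
```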


By following best practices and being mindful of potential pitfalls, you can make your AI-generated genealogy summaries more accurate and useful. Using AI to summarize transcribed genealogical documents is a good way to dip your toes into the waters of AI and LLMs while making your work more productive and rewarding. Use the time savings to focus on deeper analysis and insights!