A LangChain-based pipeline for one-shot synthetic text generation using generative pre-trained transformers in palliative care research

dc.contributor.author: Ronan, Isabel [en]
dc.contributor.author: Crowley, Patrice [en]
dc.contributor.author: Rombouts, Eva [en]
dc.contributor.author: Cornally, Nicola [en]
dc.contributor.author: Saab, Mohamad M. [en]
dc.contributor.author: Murphy, David [en]
dc.contributor.author: Tabirca, Sabin [en]
dc.contributor.funder: Science Foundation Ireland [en]
dc.contributor.funder: Research Ireland [en]
dc.contributor.funder: University College Cork [en]
dc.date.accessioned: 2025-10-20T13:57:59Z
dc.date.available: 2025-10-20T13:57:59Z
dc.date.issued: 2025-10-15 [en]
dc.description.abstract: Objective: As the world’s population ages, nursing homes are of increasing importance. In order to care for a growing number of older adults, intelligent technologies are needed. Artificial Intelligence can be utilised to enhance palliative care in nursing homes. However, the data needed to train artificially intelligent agents is lacking within this sensitive domain due to privacy issues. Therefore, it is difficult for researchers to develop technological solutions. With the advent of large language models, such as ChatGPT, new text generation methods are made possible using limited data. In this pilot study, we investigate the use of large language models to generate synthetic data. Methods: We investigate the feasibility of using GPT-3.5 and GPT-4o models along with one-shot prompting to produce synthetic nurse notes which faithfully describe nursing home residents with met or unmet palliative care needs. We used LangChain to create a repeatable pipeline which can be adapted to different use-cases. We also compare the performance of both models using a set of qualitative and quantitative evaluations to determine which set of notes is more suitable for subsequent research. Results: GPT-3.5 performed slightly better than GPT-4o in our qualitative healthcare professional analysis. Quantitative analysis revealed appropriately heterogeneous results across contextual similarity, lexical overlap, sentiment, and readability scores. Conclusion: Our work is the first investigation of such a generation method in the nursing home palliative care domain. Further refinement and validation of such data are needed in order to ensure the safe use of our approach. [en]
dc.description.version: Accepted Version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.articleid: 104936 [en]
dc.identifier.citation: Ronan, I., Crowley, P., Rombouts, E., Cornally, N., Saab, M. M., Murphy, D. and Tabirca, S. (2025) 'A LangChain-based pipeline for one-shot synthetic text generation using generative pre-trained transformers in palliative care research', Journal of Biomedical Informatics, 171, 104936 (12pp). https://doi.org/10.1016/j.jbi.2025.104936 [en]
dc.identifier.doi: 10.1016/j.jbi.2025.104936 [en]
dc.identifier.endpage: 12 [en]
dc.identifier.issn: 1532-0464 [en]
dc.identifier.journaltitle: Journal of Biomedical Informatics [en]
dc.identifier.startpage: 1 [en]
dc.identifier.uri: https://hdl.handle.net/10468/18068
dc.identifier.volume: 171 [en]
dc.language.iso: en [en]
dc.publisher: Elsevier Inc. [en]
dc.relation.ispartof: Journal of Biomedical Informatics [en]
dc.relation.project: info:eu-repo/grantAgreement/SFI/Centres for Research Training (CRT) Programme/18/CRT/6222/IE/SFI Centre for Research Training in Advanced Networks for Sustainable Societies/ [en]
dc.rights: © 2025, Elsevier Inc. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. [en]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: Synthetic [en]
dc.subject: Palliative [en]
dc.subject: Dataset [en]
dc.subject: GenAI [en]
dc.subject: LLM [en]
dc.subject: Prompts [en]
dc.subject: LangChain [en]
dc.title: A LangChain-based pipeline for one-shot synthetic text generation using generative pre-trained transformers in palliative care research [en]
dc.type: Article (peer-reviewed) [en]
oaire.citation.volume: 171 [en]
Files
Original bundle
Name: generatedNotes_manuscript[53].pdf
Size: 2.59 MB
Format: Adobe Portable Document Format
Description: Accepted Version