Large language model applications for evaluation: Opportunities and ethical implications
Resource type
Journal Article
Authors/contributors
- Head, Cari Beth (Author)
- Jasper, Paul (Author)
- McConnachie, Matthew (Author)
- Raftree, Linda (Author)
- Higdon, Grace (Author)
Title
Large language model applications for evaluation: Opportunities and ethical implications
Abstract
Large language models (LLMs) are a type of generative artificial intelligence (AI) designed to produce text-based content. LLMs use deep learning techniques and massive data sets to understand, summarize, generate, and predict new text. LLMs caught the public eye in early 2023 when ChatGPT (the first consumer-facing LLM) was released. LLM technologies are driven by recent advances in deep-learning AI techniques, where language models are trained on extremely large text data from the internet and then reused for downstream tasks with limited fine-tuning required. They offer exciting opportunities for evaluators to automate and accelerate time-consuming tasks involving text analytics and text generation. We estimate that over two-thirds of evaluation tasks will be affected by LLMs in the next 5 years. Use-case examples include summarizing text data, extracting key information from text, analyzing and classifying text content, writing text, and translation. Despite the advances, the technologies pose significant challenges and risks. Because LLM technologies are generally trained on text from the internet, they tend to perpetuate biases (racism, sexism, ethnocentrism, and more) and the exclusion of non-majority languages. Current tools like ChatGPT have not been specifically developed for monitoring, evaluation, research, and learning (MERL) purposes, possibly limiting their accuracy and usefulness for evaluation. In addition, technical limitations and challenges with bias can lead to real-world harm. To overcome these technical challenges and ethical risks, the evaluation community will need to work collaboratively with the data science community to co-develop tools and processes and to ensure the application of quality and ethical standards.
Publication
New Directions for Evaluation
Volume
2023
Issue
178-179
Pages
33-46
Date
2023
Language
en
DOI
10.1002/ev.20556
ISSN
1534-875X
Short Title
Large language model applications for evaluation
Accessed
11/12/2023, 09:47
Library Catalogue
Wiley Online Library
Rights
© 2023 American Evaluation Association and Wiley Periodicals LLC.
Citation
Head, C. B., Jasper, P., McConnachie, M., Raftree, L., & Higdon, G. (2023). Large language model applications for evaluation: Opportunities and ethical implications. New Directions for Evaluation, 2023(178–179), 33–46. https://doi.org/10.1002/ev.20556