The evaluator’s guide to generative AI

Resource type
Report
Authors/contributors
Perry, C.; Moses, M.; Pinter, M.
Title
The evaluator’s guide to generative AI
Abstract
AI’s role in evaluation: so long and thanks for all the synthesis

How can generative AI enhance evaluation and analysis? We share three real-world case studies to explore what works, what doesn’t, and how to harness AI responsibly.

Generative AI is everywhere, but is it really the answer to Life, the Universe and Everything? At Itad, we see AI as a powerful assistant, not a replacement for human judgment. Our new learning brief, The evaluator’s guide to generative AI, shares lessons from three real-world evaluations where we piloted AI tools to support rigorous analysis.

Real-world AI use to support our work

In our evaluation of the Wellcome Trust’s Climate Impacts Awards, we used Microsoft 365 Copilot to extract and synthesise data from hundreds of grant proposals. This freed up our team to focus on interpretation and recommendations. For the MacArthur Foundation’s Big Bet on Nigeria programme, we used CoLoop AI to transcribe and translate interviews, speeding up data processing; however, we found it needed careful handling to support nuanced data analysis. And in our work with a partner that supports organisations throughout Africa to generate evidence and data, we combined Copilot, ChatGPT and MaxQDA’s AI tools to analyse nearly 450 grantee reports and interviews. Each case offered valuable insights into where AI adds value and where human expertise remains essential.

Lessons for rigorous AI use

Across these projects, we identified seven key lessons for using AI in evaluation (lessons 2-4 and 7 are sketched in code after this list):

1. AI is great at extraction; humans are great at sensemaking. AI tools excel at structuring large volumes of data and surfacing patterns, but interpretation, trade-offs and conclusions must come from people. In the Wellcome case, Copilot helped us quickly synthesise proposal data, yet our team’s judgment was still vital to make sense of the findings.

2. Break the work into smaller steps. Complex tasks can overwhelm AI. Breaking analysis into clear stages, such as defining a framework, extracting evidence, classifying responses and synthesising findings, improves accuracy and makes verification easier.

3. Design prompts around your framework. The quality of AI output depends on how you ask the question. Structured prompts aligned to evaluation frameworks, with clear definitions and examples, help AI stay on topic and produce more consistent results.

4. Always verify the results. Even when AI seems accurate, it is essential to check. Build in spot-checks and scoring to assess completeness, relevance and accuracy. In the Wellcome case, manual review of 15% of proposals confirmed high fidelity; in other cases, verification helped us identify and correct oversimplifications.

5. Limit the context for nuanced analysis. AI performs better when given focused tasks. In the MacArthur evaluation, CoLoop struggled with broad thematic analysis; we found that narrowing the scope to one theme or subset at a time led to more meaningful insights.

6. Choose the right tool for the job. Different tools have different strengths, and matching the right tool to the task is key. In our experience, ChatGPT was best for contextual analysis, Copilot worked well for secure document review, and MaxQDA’s Tailwind supported synthesis of coded data.

7. Document your process. To build confidence in the use of AI, a clear audit trail is essential. Keeping records of prompts, outputs and verification steps helped us build trust with clients, support internal learning and ensure transparency.
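To make lessons 2, 3, 4 and 7 concrete, the sketch below shows one way a staged, prompt-driven workflow with built-in verification and an audit trail could look in Python. It is a minimal illustration under stated assumptions, not Itad’s actual tooling: call_model is a stub standing in for whichever model API is used, and the framework criteria are invented for illustration (only the 15% sample rate is borrowed from the Wellcome case).

    # Hypothetical sketch of a staged AI-assisted analysis workflow
    # (lessons 2-4 and 7). All names are illustrative, not Itad's tooling.
    import json
    import random
    from datetime import datetime, timezone

    # Lesson 3: structure prompts around an explicit evaluation framework,
    # with clear definitions, so outputs stay on topic and comparable.
    FRAMEWORK = {
        "relevance": "Does the document address the programme's stated priorities?",
        "feasibility": "Is the plan achievable with the stated budget and timeline?",
        "equity": "Does the work consider marginalised or underserved groups?",
    }

    PROMPT_TEMPLATE = """You are assisting with an evaluation.
    For the document below, extract evidence for ONE criterion only.

    Criterion: {name} - {definition}

    Return JSON: {{"criterion": "...", "evidence": ["..."], "classification": "strong|partial|absent"}}

    Document:
    {text}
    """

    AUDIT_LOG = []  # Lesson 7: record every prompt, output and check.

    def call_model(prompt: str) -> str:
        """Placeholder for a real model call (e.g. via an API client).
        Stubbed with a canned response so the sketch runs end to end."""
        return json.dumps(
            {"criterion": "stub", "evidence": [], "classification": "absent"}
        )

    def extract(doc_id: str, text: str, criterion: str) -> dict:
        """Lesson 2: one small step per call - one document, one criterion."""
        prompt = PROMPT_TEMPLATE.format(
            name=criterion, definition=FRAMEWORK[criterion], text=text
        )
        raw = call_model(prompt)
        AUDIT_LOG.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "doc": doc_id, "criterion": criterion,
            "prompt": prompt, "output": raw,
        })
        return json.loads(raw)

    def spot_check(results: dict, sample_rate: float = 0.15) -> list:
        """Lesson 4: flag a random sample (15% here, as in the Wellcome
        case) for manual review against the source documents."""
        keys = list(results)
        k = max(1, round(len(keys) * sample_rate))
        return random.sample(keys, k)

    if __name__ == "__main__":
        docs = {"doc-001": "Example document text...", "doc-002": "Another one..."}
        results = {
            (doc_id, crit): extract(doc_id, text, crit)
            for doc_id, text in docs.items()
            for crit in FRAMEWORK
        }
        print("Flagged for manual review:", spot_check(results))
        print("Audit log entries:", len(AUDIT_LOG))

Keeping each call to one document and one criterion also reflects lesson 5: small, focused tasks are easier for the model to handle and for a human to verify.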
Looking ahead: scaling AI in evaluation

The next challenge is scaling AI use without losing rigour. This means using AI to reduce manual effort, such as document triage or evidence extraction, while keeping humans in the loop for interpretation and decision-making. At Itad, we’re building bespoke platforms that embed AI into our workflows. For example, one new tool we’re developing for a philanthropic partner organisation will help surface insights from five years of grantee reports. We’re also investing in an internal platform to make our best AI practices available across all projects, laying the groundwork for new service areas including foresight and scenario planning. As we explore these frontiers, we’re guided by a simple truth: AI can help us ask better questions and work more efficiently, but it’s still up to us to make sense of the answers.
Report Type
Learning Brief
Institution
Itad
Place
Brighton, UK
Date
2026-03
Language
en
Library Catalogue
Zotero
Citation
Perry, C., Moses, M., & Pinter, M. (2026). The evaluator’s guide to generative AI [Learning Brief]. Itad. https://www.itad.com/knowledge-product/the-evaluators-guide-to-generative-ai/