From Throughput to Learning: How to Get Compound Returns From GenAI

This MIT Sloan Management Review article explores how organizations can generate compounding business value from generative AI through broader integration and strategic alignment.

Frequently Asked Questions

What does “compounding value” from generative AI actually mean?

How should we structure our AI workflows to move beyond basic productivity gains?

To move beyond simple time savings, you need to design AI workflows around **three connected operations**: verification, evaluation, and learning capture. When all three are present and linked, you start to see compounding benefits.

Verification: “Does this meet our standard?”
This is the basic quality gate.
- Binary check: correct/incorrect, usable/not usable.
- Compares AI output to existing criteria (brand rules, compliance requirements, technical specs).
- Prevents “confident nonsense” from slipping through, but on its own it doesn’t create learning.

Evaluation: “What does this output reveal?”
Here, domain experts look beyond pass/fail.
- They ask: What worked? What failed? What was interestingly wrong — wrong in a way that teaches us something about the problem?
- Evaluation can actually create new standards that didn’t exist before.
- This step is where tacit expertise starts to become explicit.
- Human bandwidth for evaluation (not AI access) becomes the real bottleneck, especially as volume, variety, and velocity of outputs increase.

Learning capture: “How do we make this insight persist?”
Without capture, everything you learned in the session evaporates.
- Turn expert judgments into reusable assets: updated prompts, checklists, templates, decision logs.
- Think of it as “version control for organizational judgment.”
- Design for retrievability, not perfection: prompt libraries, annotated model logs, simple evaluation notes.

These three steps reinforce each other:

Better verification → cleaner signals for evaluation.
Better evaluation → richer material for capture.
Better capture → smarter criteria and prompts for the next round of verification.
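
One way to picture how the three operations connect is a small record that each AI output carries through the cycle. The Python sketch below is illustrative only: the `Interaction` class, the phrase-matching check in `verify`, and the plain list standing in for a prompt library are all assumptions made for this example, not something the article prescribes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interaction:
    """One AI output moving through verify -> evaluate -> capture."""
    output: str
    verified: Optional[bool] = None  # binary quality gate, unset until checked
    evaluation_notes: list[str] = field(default_factory=list)
    captured_assets: list[str] = field(default_factory=list)


def verify(item: Interaction, criteria: list[str]) -> Interaction:
    # Hypothetical check: pass only if every required phrase appears.
    text = item.output.lower()
    item.verified = all(c.lower() in text for c in criteria)
    return item


def evaluate(item: Interaction, notes: list[str]) -> Interaction:
    # Expert observations: what worked, what failed, what was interestingly wrong.
    item.evaluation_notes.extend(notes)
    return item


def capture(item: Interaction, library: list[str]) -> list[str]:
    # Persist each new note as a reusable rule; skip ones already stored.
    for note in item.evaluation_notes:
        if note not in library:
            library.append(note)
            item.captured_assets.append(note)
    return library


library: list[str] = []
draft = Interaction(output="Our customers are builders. The product helps them ship.")
verify(draft, criteria=["customers"])
evaluate(draft, ["Lead with customer identity, not product features"])
capture(draft, library)
```

After one pass, `library` holds the strategist's new rule, which is the kind of asset that could inform the criteria and prompts used in the next round.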

The article also highlights five practical moves leaders can make:

- Preserve evaluation expertise: Don’t let deep domain expertise atrophy just because AI can “do the work.” You need experts as evaluators, not just producers.
- Build verification mechanisms: Use minimally viable checks (multi-judge reviews, consistency checks) to keep costs reasonable while still filtering out bad outputs.
- Institute evaluation practices: Make “What worked? What failed? What was interestingly wrong?” a standard part of AI-augmented workflows.
- Create capture systems: Use lightweight tools like decision journals, prompt repositories, and evaluation logs to store and share learning.
- Measure the cycle, not just the output: Track how many interactions are verified, evaluated, and captured, and how quickly those learnings change practice.

When you embed this cycle into everyday work, you’re not just doing the same tasks faster; you’re **reimagining how your organization learns from its own behavior and from AI’s behavior over time**.

Where should we focus first to get meaningful returns from generative AI?

A practical starting point is to focus on **domains where your people already have deep expertise** and where you can realistically run the full cycle of verify → evaluate → capture. The article suggests several priorities:

1. Start where expertise is strong, not weak.
AI can compress implementation, but it cannot compress the formation of expertise. When you deploy AI into areas where your teams already have months or years of judgment, they can:

- Recognize when an output is “not perfect but usable.”
- Interrogate what the output reveals about assumptions and blind spots.
- Turn “interesting failures” into new standards and prompts.

If you start in areas with shallow expertise, people tend only to accept or reject outputs; they don’t have the depth to evaluate and learn from them.

2. Build minimally viable verification first.
Don’t wait for perfect evaluation frameworks. Begin with simple, credible checks:

- Multi-reviewer systems that flag disagreement.
- Consistency checks across different formulations of the same problem.
- Basic automated tests where possible (for example, in code or financial models).
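
The first two checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the pass/fail verdict format, and the exact-match comparison of answers are hypothetical choices for the example, and a real workflow would likely need a looser notion of "same answer."

```python
from collections import Counter


def flag_disagreement(verdicts: dict[str, bool]) -> bool:
    """Return True when reviewers do not all give the same pass/fail verdict."""
    return len(set(verdicts.values())) > 1


def consistency_rate(answers: list[str]) -> float:
    """Share of answers that match the most common answer across rephrasings."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


# Three reviewers judge the same AI output; one dissents.
verdicts = {"reviewer_a": True, "reviewer_b": True, "reviewer_c": False}
needs_expert_review = flag_disagreement(verdicts)

# The same question asked with three different phrasings.
answers = ["12%", "12%", "15%"]
rate = consistency_rate(answers)
```

Here the dissenting reviewer sets `needs_expert_review` to `True`, and two of the three rephrased answers agree. Either signal is a cheap cue to route the output to a domain expert.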

This gives you enough confidence to move into deeper evaluation without overengineering the process.

3. Make evaluation and capture part of normal work.
Instead of treating AI reviews as side projects, embed them into existing workflows. For example:

Marketing: After AI drafts a campaign brief, a strategist verifies brand fit, evaluates what’s new or missing, and updates shared prompt templates with any new rules (for example, “We lead with customer identity, not product features”).
Finance: After AI generates a scenario model, an analyst stress-tests it against historical data, notes where it was off, and logs those insights in an annotated model repository.

4. Measure learning, not just efficiency.
Most organizations track AI by tools adopted, hours saved, or tasks completed. The article frames these as **consumption metrics**. To see compounding value, add questions like:

- How many AI interactions were both verified and evaluated?
- How often did captured learning change how we worked the following week?
- Are we seeing new standards, prompts, or playbooks emerge from AI use?
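
As a rough illustration, the first two questions can be turned into ratios over a simple interaction log. The sketch below is an assumption-laden example rather than a measurement method from the article: the `LoggedInteraction` fields and the metric names are hypothetical, and someone would still have to record each flag by hand or through tooling.

```python
from dataclasses import dataclass


@dataclass
class LoggedInteraction:
    verified: bool
    evaluated: bool
    captured: bool
    changed_practice: bool  # did the captured learning alter later work?


def cycle_metrics(log: list[LoggedInteraction]) -> dict[str, float]:
    """Compute learning-cycle ratios instead of consumption counts."""
    total = len(log)
    if total == 0:
        return {"verified_and_evaluated": 0.0, "captured": 0.0, "capture_to_change": 0.0}
    both = sum(1 for i in log if i.verified and i.evaluated)
    captured = sum(1 for i in log if i.captured)
    changed = sum(1 for i in log if i.captured and i.changed_practice)
    return {
        "verified_and_evaluated": both / total,
        "captured": captured / total,
        # Of the learnings that were captured, how many changed practice?
        "capture_to_change": changed / captured if captured else 0.0,
    }


week_log = [
    LoggedInteraction(True, True, True, True),
    LoggedInteraction(True, True, False, False),
    LoggedInteraction(True, False, False, False),
    LoggedInteraction(False, False, False, False),
]
metrics = cycle_metrics(week_log)
```

For this sample week, half of the interactions were both verified and evaluated, a quarter produced a captured learning, and that one captured learning did change practice.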

Research cited in the article shows that organizations combining strong organizational learning with AI-specific learning are **up to 80% more effective at managing uncertainty**. That’s a useful lens for prioritization: start where you can both **use AI and learn with AI**, then scale those patterns across functions.
