LLM Reality Check: What Large Language Models Can and Can’t Do in Procurement

Why This Matters

Generative AI and large language models (LLMs) are being marketed as transformative for procurement: auto-writing RFPs, negotiating contracts, even running sourcing events. But practitioners know the gap between demos and reality can be wide. This article reviews where LLMs are proving useful and where caution is still warranted, drawing on published material from Deloitte, BCG, and Gartner, along with industry case studies.

Where LLMs Deliver Value Today

1. Drafting and Summarizing Documents

Icertis NegotiateAI integrates GenAI into Microsoft Word to redline draft contracts against playbooks, accelerating negotiation cycles and reducing legal review time (Icertis).
LLMs can also summarize long contracts or supplier reports, reducing the hours reviewers spend reading them in full.

2. Search and Q&A Across Procurement Data

Zip uses conversational AI to guide intake requests, letting employees describe needs in plain language and routing them automatically through compliant workflows (internal playbook).
When embedded in procurement platforms, LLMs enable natural-language search, for example: “Show me all contracts expiring in Q3 with auto-renewal clauses.”
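
As a rough illustration of what sits behind such a query, the platform has to translate the question into a structured filter over contract records. The sketch below assumes a hypothetical `Contract` record with `expires_on` and `auto_renews` fields and treats Q3 as the calendar quarter (July 1 to September 30). The field names, sample data, and helper function are illustrative and not taken from any vendor's product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Contract:
    # Hypothetical record shape; real CLM schemas will differ.
    contract_id: str
    supplier: str
    expires_on: date
    auto_renews: bool


def expiring_in_q3_with_auto_renewal(contracts, year):
    """Return contracts that auto-renew and expire in calendar Q3 of `year`."""
    q3_start, q3_end = date(year, 7, 1), date(year, 9, 30)
    return [
        c for c in contracts
        if c.auto_renews and q3_start <= c.expires_on <= q3_end
    ]


# Made-up sample data for illustration only.
SAMPLE_CONTRACTS = [
    Contract("C-001", "Supplier A", date(2025, 8, 15), auto_renews=False),
    Contract("C-002", "Supplier B", date(2025, 9, 30), auto_renews=True),
    Contract("C-003", "Supplier C", date(2025, 11, 1), auto_renews=True),
]

matches = expiring_in_q3_with_auto_renewal(SAMPLE_CONTRACTS, 2025)
print([c.contract_id for c in matches])
```

One reason to keep the filter deterministic is that the LLM then only has to map the question to parameters (quarter, renewal flag), and the result set can be checked against the system of record. Organizations on a fiscal calendar would need to adjust the quarter boundaries.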

3. Training and Enablement

Virtual assistants powered by LLMs guide employees through tasks (e.g., supplier onboarding) in plain language.
Coupa Navi (unveiled at Coupa Inspire 2025) showcases agentic AI for guided decision-making across sourcing and payments, designed to simplify user adoption (ProcurementMag.com).

4. Specialized Procurement Use Cases

AllCaps.ai applies LLMs to contract renewals. By extracting 30+ attributes from CLM systems and drafting AI-backed negotiation playbooks, it reportedly enables 10–15% savings and ~80% faster cycle times compared with business process outsourcers (BPOs) (internal playbook).
Globality (“Glo”) uses AI agents to interactively gather requirements and draft RFPs, shortening sourcing cycle times.

Where LLMs Still Struggle

1. Accuracy and Hallucinations

LLMs can generate plausible-sounding but incorrect answers. Gartner warns that unchecked use in compliance-sensitive workflows can expose firms to risk.

2. Confidentiality and Data Security

Feeding sensitive supplier or contract data into public LLMs risks leakage. Many CPOs restrict usage until enterprise-grade controls are in place.
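
One common mitigation is to mask obviously sensitive values before any text leaves the organization. The following is a minimal sketch, assuming that email addresses and currency amounts are the fields to hide; the patterns and the `redact` helper are hypothetical. Pattern-based masking of this kind will miss many sensitive details (supplier names, pricing terms written in words), so it supplements enterprise-grade controls rather than replacing them.

```python
import re

# Illustrative patterns only; a real deployment would need a much broader set.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AMOUNT = re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d+)?")


def redact(text):
    """Replace email addresses and currency amounts with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return AMOUNT.sub("[AMOUNT]", text)


sample = "Contact jane.doe@acme.example about the $1,250,000.00 renewal."
print(redact(sample))
```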

3. Context and Domain Depth

General-purpose models are not trained specifically on procurement data. For example, an LLM may miss nuances in escalation clauses or bid evaluation criteria.

4. Integration Complexity

Without orchestration layers, LLMs remain siloed from ERP, CLM, and sourcing platforms, which limits their impact.

Proof Points

Deloitte (2024 CPO Survey): 92% of CPOs were planning or assessing GenAI, but only 37% had piloted or deployed it, a 55-point gap between interest and adoption.
BCG (Apr 2025): GenAI can reduce manual procurement work by ~30%, but requires targeted deployment and human oversight to avoid errors.
Gartner (2024): Advises that procurement teams treat LLMs as assistants, not decision-makers, due to risks of hallucinations and compliance exposure.

Action Plan for CPOs

Start with Low-Risk Use Cases: Drafting, summarization, and knowledge search.
Secure the Environment: Use enterprise-grade deployments with data governance.
Layer Human Review: Require human sign-off for outputs used in sourcing, contracting, or compliance.
Measure Impact: Track cycle time saved on document prep and user adoption of AI-powered search.
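
Two of the steps above, human sign-off and impact measurement, can be expressed as simple rules. The sketch below is one possible way to do so, under stated assumptions: the category names in `REVIEW_REQUIRED` are taken from the wording of the action plan, and the function names and sample hours are hypothetical.

```python
from statistics import mean

# Categories where AI output must be approved by a person before use
# (assumed from the action plan; adjust to local policy).
REVIEW_REQUIRED = {"sourcing", "contracting", "compliance"}


def can_release(use_case, human_approved):
    """Allow release of an AI output; gated categories need human sign-off."""
    if use_case.lower() in REVIEW_REQUIRED:
        return human_approved
    return True


def cycle_time_saved_pct(baseline_hours, assisted_hours):
    """Percent reduction in average document-prep time, AI-assisted vs. baseline."""
    baseline = mean(baseline_hours)
    assisted = mean(assisted_hours)
    return round(100 * (baseline - assisted) / baseline, 1)


print(can_release("contracting", human_approved=False))
print(cycle_time_saved_pct([10, 10], [7, 7]))
```

Comparing averages over a matched set of documents (same type, similar length) keeps the cycle-time figure from being skewed by a change in workload mix.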

Takeaway

LLMs are valuable assistants, not autonomous agents. In procurement today, they shine in drafting, summarizing, and answering questions, but they cannot replace human judgment in negotiations, compliance, or strategy. Leaders who deploy LLMs pragmatically, focusing on narrow, high-value, low-risk workflows such as renewals, RFP drafting, and guided intake, will capture real gains without falling for hype.