The Three Layers Between Your AI Agent and Your ERP

What an “agentic-ready” architecture actually looks like.

McKinsey recently called ERP “the ugly stepchild” of the AI conversation.

That landed hard, because it’s exactly what I’ve been watching happen. Companies are pouring investment into AI models, agent frameworks, and orchestration platforms — while the enterprise systems that hold the data those agents need to reason against are treated as legacy baggage. An afterthought. Something to abstract away rather than fix.

The result is predictable: agents that perform brilliantly in the lab and stall the moment they touch production.

In the first article of this series, I laid out the problem in three layers — APIs that can’t keep pace with agentic reasoning loops, business event streams that don’t exist, and master data that’s fundamentally ungoverned. The response was clear: the diagnosis resonated. Now the question is what “fixed” actually looks like.

This is the architecture piece. It’s written for the CTO or enterprise architect who’s been handed the mandate to make agentic AI work — and is discovering that the real work starts three layers below the agent.

Layer 1: Rebuild the integration surface

The core mismatch is architectural; no incremental fix closes it. Legacy enterprise systems were designed for human-speed, request-response interactions. An operator submits a transaction. The system processes it. The operator reads the result and decides what to do next. The cycle time is measured in minutes, sometimes hours.

Agents operate at a fundamentally different tempo. An agent executing a procurement workflow — detect a stock-out risk, evaluate alternative vendors, check contract terms, generate a purchase requisition — needs to complete that entire loop in seconds, not minutes. Each step is a system interaction. Each interaction needs to return fast enough that the agent can reason about the result and decide the next action before the context window goes stale.

Three architectural shifts make this possible.

First: move from synchronous request-response to event-driven integration. The traditional pattern is tightly coupled, point-to-point API calls — the agent calls SAP, waits for the response, calls Maximo, waits again. Every call is a serial bottleneck. The shift is to an event-driven architecture built on a message broker — Kafka, Google Pub/Sub, or equivalent. Systems publish events as they occur. Agents subscribe to the events they need. A quality deviation in the MES publishes an event. The maintenance agent, the inventory agent, and the production planning agent all receive it simultaneously and can reason in parallel rather than in sequence.
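
Here is a minimal sketch of that bridge pattern, assuming Google Pub/Sub. The project, topic, and subscription names are hypothetical; the shape is what matters: the bridge publishes once, and every agent consumes through its own subscription.

import json
from google.cloud import pubsub_v1

PROJECT_ID = "mfg-prod"  # hypothetical project
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, "quality-deviation-events")

# The event bridge publishes the MES state change the moment it happens.
event = {
    "event": "quality_deviation_detected",
    "plant": "MFG-04",
    "work_center": "WC-210",
    "severity": "high",
}
publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))

# Each agent owns a subscription, so the maintenance, inventory, and
# production planning agents all receive the same event in parallel.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, "maintenance-agent-sub")

def on_event(message):
    payload = json.loads(message.data)
    # ... agent reasoning over the payload goes here ...
    message.ack()

streaming_pull = subscriber.subscribe(sub_path, callback=on_event)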

This isn’t theoretical. It’s how modern manufacturing platforms are being rewired. The key insight is that the ERP doesn’t need to become real-time itself — you build an event bridge that captures state changes as they happen and publishes them to a stream that agents can consume at their own tempo.

Second: build a typed integration layer between agents and production systems. No agent should call a legacy API directly. Every system interaction should pass through a versioned interface that normalises data formats, handles authentication, implements retry logic, and returns structured errors. When SAP’s OData endpoint returns an unexpected schema — and it will, because 75% of production APIs drift from their documentation — the integration layer catches it, logs it, and returns a structured failure to the agent instead of letting a malformed response propagate into the reasoning loop.
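
A stripped-down sketch of one read operation in such a layer, with a hypothetical endpoint, field set, and retry budget. The shape is the point: one versioned function per interaction, the schema checked on every response, and a structured result either way.

from dataclasses import dataclass
import requests

@dataclass
class IntegrationResult:
    ok: bool
    data: dict | None = None
    error: str | None = None  # structured failure, never a raw exception

REQUIRED_FIELDS = {"Material", "Plant", "UnrestrictedStock"}  # expected schema, v1

def get_inventory_v1(material: str, plant: str) -> IntegrationResult:
    """Versioned inventory read. Agents call this, never the SAP endpoint directly."""
    url = f"https://sap.example.internal/odata/Inventory('{material}')"  # hypothetical
    last_error = "no attempts made"
    for _ in range(3):  # bounded retry
        try:
            resp = requests.get(url, params={"plant": plant}, timeout=5)
            resp.raise_for_status()
            payload = resp.json()
            missing = REQUIRED_FIELDS - payload.keys()
            if missing:  # schema drift: log it, fail loudly and structurally
                return IntegrationResult(ok=False, error=f"schema_drift: missing {sorted(missing)}")
            return IntegrationResult(ok=True, data=payload)
        except (requests.RequestException, ValueError) as exc:
            last_error = str(exc)
    return IntegrationResult(ok=False, error=f"upstream_failed: {last_error}")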

This is where I’ve seen the most expensive mistakes. Organisations skip the integration layer because it feels like overhead. Then an agent makes a procurement decision based on a malformed inventory response, and the cost of the “overhead” looks trivial compared to the cost of the wrong purchase order multiplied across every production line.

Third: create an API inventory before you build anything. Map every production system the agent needs to read from or write to. Document the actual API characteristics — not what the vendor documentation says, but what the production endpoint actually does. REST, GraphQL, batch export, SOAP, direct database? What’s the authentication mechanism? What are the real rate limits? What’s the actual response latency under production load?

Most manufacturing environments I’ve worked in have somewhere between 8 and 15 core systems that an agent would need to interact with across a single workflow. The integration complexity isn’t in any one system — it’s in the mesh of connections between them. Map it before you build.
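
The inventory itself doesn’t need tooling to start; one structured record per production endpoint is enough. Something like this, with every value hypothetical:

API_INVENTORY = [
    {
        "system": "SAP ECC",
        "endpoint": "/sap/opu/odata/sap/MM_INVENTORY_SRV",  # hypothetical path
        "style": "OData (REST)",
        "auth": "OAuth 2.0 client credentials",
        "documented_rate_limit": "100 req/min",
        "observed_rate_limit": "40 req/min under production load",
        "p95_latency_ms": 1800,
        "agent_needs": ["read: stock levels", "write: purchase requisitions"],
    },
    # ... one record each for Maximo, the MES, and the rest of the mesh
]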

Layer 2: Build business event streams — not more logs

This is the layer where I see the widest gap between what enterprises have and what agentic AI needs.

Let me make the distinction concrete with a manufacturing example.

What you have today (application logs):

2026-03-15 14:32:07 INFO PurchaseOrderService.create() called
2026-03-15 14:32:07 DEBUG Payload: {vendor_id: "V-4421", material: "BRG-6205", qty: 24}
2026-03-15 14:32:08 INFO SAP RFC call BAPI_PO_CREATE successful, PO# 4500018823
2026-03-15 14:32:08 INFO Response: 200 OK

An engineer can use this to debug a failed transaction. An agent can extract almost nothing useful from it.

What you need (structured business events):

{
  "event": "purchase_order_created",
  "timestamp": "2026-03-15T14:32:08Z",
  "domain": "procurement",

  "intent": {
    "trigger": "reorder_point_breach",
    "triggered_by": "inventory_monitor_agent",
    "material": "BRG-6205-2RS",
    "plant": "MFG-04",
    "current_stock": 3,
    "reorder_point": 25
  },

  "context": {
    "vendor": "V-4421",
    "vendor_name": "SKF Distribution Centre",
    "vendor_lead_time_avg_days": 12,
    "vendor_on_time_delivery_pct": 94.2,
    "alternative_vendors_evaluated": 2,
    "unit_price_usd": 18.40,
    "contract_reference": "SA-2024-0891"
  },

  "outcome": {
    "po_number": "4500018823",
    "quantity_ordered": 24,
    "expected_delivery": "2026-03-27",
    "approval_status": "auto_approved",
    "approval_rule": "below_threshold_5000_usd"
  }
}

The difference isn’t just format — it’s purpose.

The application log records that something happened. The business event records what happened, why it happened, what context informed the decision, and what the outcome was. An agent reading the second record can reason: “The last three POs to this vendor were triggered by reorder point breaches at the same plant. The vendor’s on-time delivery has dropped from 97% to 94% over six months. Should I flag this for sourcing review?”

That reasoning is impossible against application logs, no matter how sophisticated the agent’s model is.

The two-tier architecture. The practical pattern we’ve implemented is a two-tier approach. Tier 1 is an operational event store — the hot layer where recent business events live for real-time agent access. In a GCP environment, this might be Firestore or Cloud SQL. Events are structured, indexed by entity and action type, and queryable within milliseconds. Tier 2 is an analytical event store — the deep layer where events stream for historical pattern analysis, anomaly detection, and training data for agent behaviour. BigQuery, Snowflake, or equivalent. This is where an agent goes to answer questions like “what’s the seasonal pattern for this part’s consumption?” or “how did our vendor mix change after the last supply disruption?”
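
A compressed sketch of the write path, assuming the GCP services named above. The collection and table names are hypothetical, and a production version would stream tier 2 through the message broker rather than dual-writing:

from google.cloud import bigquery, firestore

hot_store = firestore.Client()
analytics = bigquery.Client()
BQ_TABLE = "mfg-prod.events.business_events"  # hypothetical table

def record_business_event(event: dict) -> None:
    # Tier 1: the hot layer, queryable by agents within milliseconds.
    hot_store.collection("business_events").add(event)
    # Tier 2: the deep layer for seasonal patterns and vendor-mix questions.
    errors = analytics.insert_rows_json(BQ_TABLE, [event])
    if errors:
        raise RuntimeError(f"analytical tier insert failed: {errors}")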

The critical design decision is the event schema. A single, unified event collection with a consistent structure across all domains — procurement, maintenance, production, quality, inventory — gives agents the ability to reason across boundaries. Separate event stores per domain force the agent to query multiple sources and stitch context together, which is exactly the kind of cross-system complexity that causes agent failures at scale.

What to capture. Every business event needs three sections: intent (what triggered this action and why), context (what information was available at decision time), and outcome (what happened as a result). This isn’t metadata — it’s the reasoning substrate. Without intent, the agent knows what happened but not why. Without context, it can’t evaluate whether the decision was good given what was known. Without outcome, it can’t learn from patterns.
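
This is easy to enforce mechanically. A minimal gate at publish time, assuming the unified envelope described above:

REQUIRED_SECTIONS = ("intent", "context", "outcome")

def validate_event(event: dict) -> list[str]:
    """Reject any event missing its reasoning substrate before it is published."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if not event.get(s)]
    for field in ("event", "timestamp", "domain"):
        if not event.get(field):
            problems.append(f"missing field: {field}")
    return problems  # an empty list means the event may enter the store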

Start capturing business events now, even before you deploy agents. The historical depth matters. An agent that launches with six months of structured business events is fundamentally more capable than one that launches with a blank slate.

Layer 3: Govern data at the point of entry — not after the pollution

This is my home ground. I’ve spent 15 years building data governance systems for asset-intensive industries, and the pattern is always the same: organisations try to clean their data after it’s already inside the ERP. They run deduplication projects. They hire contractors to scrub the parts catalogue. They implement periodic data quality audits.

It doesn’t work. Or more precisely — it works once, expensively, and then the data degrades again within months because nothing changed at the point where data enters the system.

The manufacturing ERP is a perfect illustration of why.

A maintenance technician needs a part. They search the system, don’t find an obvious match — because the part exists under a description they didn’t think to search for — and create a new record. That new record is now a duplicate. It gets attached to a purchase order. The purchase order creates a goods receipt. The goods receipt updates inventory. The inventory count feeds the MRP run. The MRP run generates planned orders for a part that already has sufficient stock under a different record.

One ungoverned entry. Six downstream consequences. All before anyone notices.

Now multiply that by the scale of a manufacturing operation. We’ve processed approximately 5 million master data records over our 15 years in this space. The pattern is remarkably consistent: 15-25% duplication rates in mature ERP environments, with some catalogues exceeding 40%. Each duplicate is a decision error waiting to happen — for a human, and now, at much greater speed and scale, for an agent.

Prevention-first governance. The architectural principle is straightforward: govern data before it enters the ERP, not after the pollution has spread. This means building a governance layer that sits between the requestor and the system of record. When someone creates a new material master, the governance layer checks for duplicates against normalised descriptions, manufacturer part numbers, and classification hierarchies. It standardises the description format. It validates the unit of measure against the material group norm. It ensures cross-references to vendors and equipment are consistent.

Only after the record passes governance does it enter the ERP.
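
Here is the skeleton of that gate, heavily simplified. The normalisation rules, abbreviation dictionary, and unit-of-measure norms are hypothetical placeholders for what is, in practice, years of accumulated domain knowledge:

import re

ABBREVIATIONS = {"BRG": "BEARING"}  # hypothetical dictionary entries
UOM_BY_MATERIAL_GROUP = {"BEARINGS": "EA"}  # hypothetical group norms

def normalise(description: str) -> str:
    # Collapse case, punctuation, and spacing, then expand known abbreviations,
    # so "Brg 6205-2RS" and "BEARING 6205 2RS" compare as the same part.
    text = re.sub(r"[\s\-_,./]+", " ", description.upper()).strip()
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())

def govern_material_request(request: dict, catalogue: list[dict]) -> dict:
    key = normalise(request["description"])
    duplicates = [
        m for m in catalogue
        if normalise(m["description"]) == key
        or (request.get("mfr_part_number")
            and m.get("mfr_part_number") == request.get("mfr_part_number"))
    ]
    if duplicates:
        return {"status": "rejected_duplicate", "matches": duplicates}
    expected_uom = UOM_BY_MATERIAL_GROUP.get(request.get("material_group"))
    if expected_uom and request.get("uom") != expected_uom:
        return {"status": "rejected_uom", "expected_uom": expected_uom}
    # Only a record that passes every check earns its way into the ERP.
    return {"status": "approved", "record": {**request, "description": key}}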

This inverts the traditional model. Instead of periodic cleanup — which is expensive, disruptive, and temporary — you build a system where clean data is the default state. The ERP starts clean and stays clean because the entry point is governed.

Why this matters specifically for agents. An agent operating against a governed parts catalogue can make procurement decisions with confidence. It knows that when it queries for a bearing, it gets one result — the correct, normalised, fully cross-referenced record. It can evaluate vendor alternatives because the vendor-material relationships are accurate. It can assess inventory positions because there’s one record per part, not four.

An agent operating against an ungoverned catalogue is making decisions against data that a human would spend 20 minutes cross-referencing before trusting. The agent doesn’t have that instinct. It trusts what it reads.

The classification challenge. Manufacturing master data carries a specific complexity that general-purpose data quality tools don’t address: technical classification. A bearing isn’t just a bearing — it has a bore diameter, an outer diameter, a width, a seal type, a cage material, a load rating, a speed rating. These attributes determine interchangeability, procurement grouping, and maintenance specifications. Getting the description right isn’t enough. The attributes need to be normalised against an industry taxonomy — and that taxonomy needs to be enforced at the point of entry.

We’ve built classification systems that use a combination of domain-specific dictionaries and AI-assisted extraction to govern this at scale. The AI doesn’t make the final call — it proposes a classification and a set of normalised attributes based on the input description and manufacturer data. A governed workflow validates the proposal before it enters the system. The result is data that’s both clean and technically precise — not just deduplicated, but properly classified.

This is where automation in data governance gets nuanced. The AI component consistently looks better on paper than in production. Automated classification might achieve 75-80% accuracy on first pass. That sounds good until you realise that the 20-25% it gets wrong are exactly the edge cases — unusual parts, ambiguous descriptions, cross-category materials — where a wrong classification creates the most damage downstream. The value isn’t in replacing human judgment. It’s in building a governed pipeline where AI handles the routine cases and routes the exceptions to domain experts who can resolve them correctly.
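
In pipeline terms, the routing rule is the heart of it. A sketch, with a hypothetical threshold that in practice gets tuned per category:

CONFIDENCE_THRESHOLD = 0.90  # hypothetical cut-off, tuned per category

def route_proposal(proposal: dict) -> str:
    """The AI proposes a classification; the workflow decides who validates it."""
    routine = (
        proposal["confidence"] >= CONFIDENCE_THRESHOLD
        and not proposal.get("cross_category", False)
        and not proposal.get("ambiguous_description", False)
    )
    # Routine cases flow through lightweight validation; the edge cases,
    # where a wrong classification does the most damage, go to a domain expert.
    return "auto_validation_queue" if routine else "expert_review_queue"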

The architecture as a whole

These three layers aren’t independent — they’re a stack. Event-driven integration (Layer 1) generates the business events that feed the event store (Layer 2). The event store captures actions taken against governed master data (Layer 3). An agent operating across all three layers has real-time system access, historical business context, and trustworthy data to reason against.

Remove any one layer and the system degrades:

Without Layer 1, your agent can’t interact with your systems at the speed it needs to reason.

Without Layer 2, your agent can observe current state but can’t understand patterns, history, or the reasoning behind past decisions.

Without Layer 3, your agent is reasoning against data that even your experienced human operators don’t fully trust.

The good news: these layers can be built incrementally. Start with Layer 3 — govern the data — because clean data improves every system that touches it, agents or not. Add Layer 2 — start capturing business events — because historical depth compounds over time and you want the data accumulating before your agents are ready. Build Layer 1 — the integration surface — last, because it’s the most system-specific and benefits from the stability of the layers below it.

The uncomfortable truth from the research is this: for every $1 spent developing an AI model, you need $3 on the surrounding infrastructure and change management. Most enterprises have the ratio inverted. The model gets the investment. The plumbing gets the leftovers.

Flip the ratio. Fix the plumbing. The agents will thank you.

Next in this series: what a fast, engineering-led readiness engagement looks like in practice — and why the traditional SI model isn't built for this moment.

About the Author

Raghu Vishwanath

Raghu Vishwanath is Managing Partner at Bluemind Solutions, a product engineering firm specialising in MRO master data governance. He writes about software engineering, AI, and building platforms that last.