The MSP Automation Stack: The Seven Layers and How They Connect

Most MSPs cannot draw their own automation architecture. They can name the tools — PSA, RMM, documentation system, scripting platform — but the picture of how those tools hand off to each other lives in the heads of two or three engineers and nowhere on paper. When those engineers leave, the architecture leaves with them.

A reference architecture fixes that. The MSP automation stack described below has seven layers. Each layer has a defined responsibility, defined inputs, and defined outputs. The handoffs between layers are where most MSP stacks break, not because the individual layers are weak, but because the seams between them are undefined.

This article lays out the seven layers, what belongs in each, and the specific points where the handoffs typically fail. Use it to audit your own stack and to plan the next investment.

Layer 1: Identity and Tenant Context

The bottom of the stack is identity. Every action in every other layer eventually reduces to a question of “who is this for and what tenant are they in.” If identity is wrong, everything above is wrong.

The identity layer answers four questions for any incoming signal:

Which client tenant does this belong to?
Which user inside that tenant initiated or is affected?
What entitlements does that user have?
What policies apply to this tenant for the kind of work being requested?

For MSPs the identity layer typically spans Microsoft Entra ID across many tenants, Google Workspace for some, and the MSP’s own internal directory. The hard problem is mapping identity across tenant boundaries: knowing that a ticket from a user at Acme’s email domain maps to the Acme tenant, the Acme contract, the Acme runbook set, and the Acme escalation policy.

This is the layer that is most often hand-waved as “we’ll figure that out at the PSA level.” That is a mistake. Identity needs to be resolved before the PSA, not derived from it. The right pattern is a tenant-context resolver that runs on every incoming signal and stamps the signal with verified identity before it reaches anything else.
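As a minimal sketch of what such a resolver could look like, the Python below maps a sender address to a tenant context and stamps the incoming signal before anything else sees it. The names (TenantContext, DOMAIN_TO_TENANT, resolve_tenant, stamp_signal), the acme.example domain, and the identifier values are all hypothetical, and a real resolver would verify identity against the directory rather than trusting a static domain table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    contract_id: str
    runbook_set: str
    escalation_policy: str


# Hypothetical lookup table. In practice this would come from a verified directory.
DOMAIN_TO_TENANT = {
    "acme.example": TenantContext(
        tenant_id="acme",
        contract_id="contract-acme-01",
        runbook_set="runbooks/acme",
        escalation_policy="escalation/acme-standard",
    ),
}


def resolve_tenant(sender: str) -> Optional[TenantContext]:
    """Return the tenant context for a sender address, or None if unresolved."""
    local, sep, domain = sender.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return None  # not a usable address
    return DOMAIN_TO_TENANT.get(domain)


def stamp_signal(signal: dict) -> dict:
    """Return a copy of the signal with the resolved tenant context attached."""
    context = resolve_tenant(signal.get("sender", ""))
    stamped = dict(signal)
    stamped["tenant_context"] = context
    stamped["identity_resolved"] = context is not None
    return stamped
```

The design choice worth noting is that an unresolved signal is flagged rather than guessed at, so downstream layers can route it to a human instead of filing it under the wrong client.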

Layer 2: PSA (System of Record)

The PSA — ConnectWise, Autotask, HaloPSA — sits at layer 2. It is the system of record for tickets, contracts, time, billing, customer master data, and assignment.

The PSA’s job in an automation architecture is narrower than most MSPs treat it. The PSA holds the truth. It does not need to be the place where intelligence lives. Workflow rules in the PSA are useful for deterministic routing, status management, and notification. They are not the right home for AI decisions, complex remediation logic, or cross-system orchestration.

A clean architecture pulls intelligence out of the PSA and pushes it into a dedicated workflow and agent layer (layers 5 and 6 below) while keeping the PSA as the durable record of what happened.

For a deeper look at how to make AI work consistently across the major PSAs, see AI automation across your PSA.

Layer 3: RMM and Telemetry

Layer 3 is where signals about the world come from. RMM tools (Datto RMM, NinjaOne, Kaseya VSA), monitoring platforms, security tools, and observability systems all produce a stream of events that can trigger automation.

The architectural question at this layer is normalization. Five different monitoring tools produce alerts in five different shapes. The downstream automation cannot reason about five formats. Something needs to translate every incoming alert into a normalized event with consistent fields: tenant, asset, severity, category, source, raw payload.

The mature pattern is an event bus or normalization service that sits between the telemetry tools and the rest of the stack. Without it, every script and every workflow rule has to know about every variant of every alert. With it, downstream layers see a clean, consistent stream.
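One way to picture the normalization step is a set of per-source adapters that all emit the same event shape. The sketch below is illustrative only: the two input formats ("tool_a" and "tool_b") and their field names are invented for the example and do not correspond to any real product's alert payload.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class NormalizedEvent:
    tenant: str
    asset: str
    severity: str
    category: str
    source: str
    raw: Dict[str, Any]  # original payload, kept for troubleshooting


def from_tool_a(alert: Dict[str, Any]) -> NormalizedEvent:
    # Hypothetical shape: numeric priority, flat fields.
    severity = {1: "low", 2: "medium", 3: "high"}.get(alert["priority"], "unknown")
    return NormalizedEvent(
        tenant=alert["org"],
        asset=alert["device_name"],
        severity=severity,
        category=alert["type"],
        source="tool_a",
        raw=alert,
    )


def from_tool_b(alert: Dict[str, Any]) -> NormalizedEvent:
    # Hypothetical shape: nested customer object, upper-case level string.
    return NormalizedEvent(
        tenant=alert["customer"]["id"],
        asset=alert["host"],
        severity=alert["level"].lower(),
        category=alert["check"],
        source="tool_b",
        raw=alert,
    )


ADAPTERS: Dict[str, Callable[[Dict[str, Any]], NormalizedEvent]] = {
    "tool_a": from_tool_a,
    "tool_b": from_tool_b,
}


def normalize(source: str, alert: Dict[str, Any]) -> NormalizedEvent:
    """Translate a source-specific alert into the shared event shape."""
    adapter = ADAPTERS.get(source)
    if adapter is None:
        raise ValueError(f"no adapter registered for source {source!r}")
    return adapter(alert)
```

Adding a sixth monitoring tool then means writing one adapter, not touching every workflow that consumes alerts.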

The intersection of RMM and AI deserves its own look — see our piece on RMM and AI for MSPs for how this layer evolves when intelligence enters the picture.

Layer 4: Documentation and Knowledge

Layer 4 holds the institutional memory: client procedures, configurations, network diagrams, passwords, vendor contacts, runbooks. The dominant systems are IT Glue, Hudu, Confluence, and SharePoint.

Documentation is the layer that determines whether the layers above can do their work. An agent in layer 6 making a decision about a client ticket needs to ground that decision in client-specific context — what is their license entitlement, what is the approval path for this kind of change, what are the known quirks of their environment.

The architectural requirement at layer 4 is machine-readability. Documentation written for humans (long prose, embedded screenshots, implicit context) does not serve automation. Documentation structured for retrieval (consistent fields, tagged content, linked entities) does.
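To make "structured for retrieval" concrete, here is a small sketch of a documentation record with consistent fields, tags, and linked entities, plus a filter an agent or workflow could call. The DocRecord fields and the doc_type values are assumptions chosen for illustration; they are not the schema of IT Glue, Hudu, or any other product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocRecord:
    tenant: str
    doc_type: str  # hypothetical categories, e.g. "approval_path", "known_quirk"
    title: str
    body: str
    tags: List[str] = field(default_factory=list)
    linked_assets: List[str] = field(default_factory=list)


def find_docs(
    records: List[DocRecord],
    tenant: str,
    doc_type: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[DocRecord]:
    """Return the records for one tenant, optionally narrowed by type and tag."""
    return [
        r
        for r in records
        if r.tenant == tenant
        and (doc_type is None or r.doc_type == doc_type)
        and (tag is None or tag in r.tags)
    ]
```

A prose page can answer the same question, but only after someone reads it. A record like this can be fetched by tenant and type in a single call.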

Most MSPs sit at the wrong end of this spectrum. Migrating documentation from human-shaped to machine-shaped is one of the highest-leverage investments an MSP can make in its automation maturity. See documentation APIs for IT Glue, Hudu, and AI for how to think about this transition.

Layer 5: Workflow Engine

Layer 5 is where deterministic logic lives. This is the orchestration layer — the place where “if this signal arrives from RMM, look up the asset in IT Glue, check the ticket history in the PSA, and decide whether to escalate” gets composed.

The workflow engine is not the AI. It is the conductor. It calls APIs, reads from documentation, writes to the PSA, invokes scripts, and waits for results. It handles retries, timeouts, parallel branches, and error paths.
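The retry handling mentioned above can be sketched in a few lines. This is a simplified illustration, not how any particular workflow product implements it: run_step is a hypothetical helper, the backoff schedule is an arbitrary choice, and timeouts and parallel branches are left out to keep the example short.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_step(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one workflow step, retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:  # a real engine would retry only transient errors
            last_error = exc
            if attempt < attempts:
                # Wait base_delay, then 2x, then 4x, and so on.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error
```

Passing sleep in as a parameter is only there so the behavior can be tested without waiting. The point is that retry policy lives in one place instead of being re-implemented in every script.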

In a traditional MSP stack this layer is implicit — it lives across PSA workflow rules, RMM scripts, scheduled tasks, and Power Automate flows. In a mature automation architecture it is explicit, with a single named system (or a small number of integrated ones) responsible for orchestration.

The benefit of making this layer explicit is observability. When the workflow engine is one place, you can see what is running, what failed, and what is queued. When it is implicit across a dozen tools, troubleshooting takes hours or days.

Layer 6: AI / Agent Layer

Layer 6 is where intelligence and autonomy live. This is the agent runtime, the LLM calls, the classification models, the policy engine, and the planning logic.

The defining property of this layer is that it makes decisions rather than executing recipes. Given a ticket, it decides what category, what priority, what action to take. Given a signal, it decides whether to remediate, escalate, or ignore. Given a question, it decides what context to retrieve and what answer to compose.

The agent layer calls down into the workflow engine (layer 5) to actually execute its decisions. It reads from documentation (layer 4) to ground its reasoning. It reads from the PSA (layer 2) for context. It writes back to the PSA when actions complete.

The architectural mistake here is letting the agent layer act around the workflow engine instead of through it. An agent that bypasses the orchestration layer and calls APIs directly is harder to observe, harder to govern, and harder to roll back when something goes wrong. Keep the agent as a decision-maker and the workflow engine as the executor.
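The split between deciding and executing can be shown with a toy example. Everything here is hypothetical: decide stands in for a model call with a keyword check, and WorkflowEngine is a bare-bones stand-in for a real orchestration platform. The sketch only illustrates the shape of the boundary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Decision:
    action: str
    target: str
    reason: str


class WorkflowEngine:
    """Executes only registered actions and records everything it runs."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}
        self.history: List[Tuple[Decision, str]] = []

    def register(self, action: str, handler: Callable[[str], str]) -> None:
        self._handlers[action] = handler

    def execute(self, decision: Decision) -> str:
        handler = self._handlers.get(decision.action)
        if handler is None:
            raise ValueError(f"action not registered: {decision.action!r}")
        result = handler(decision.target)
        self.history.append((decision, result))
        return result


def decide(ticket: dict) -> Decision:
    """Stand-in for the agent: returns a decision and never calls an API itself."""
    if "disk" in ticket["summary"].lower():
        return Decision("clear_temp_files", ticket["asset"], "summary mentions disk")
    return Decision("escalate", ticket["asset"], "no matching remediation")
```

Because the agent only returns a Decision, every action it triggers passes through execute, which is where observability and rollback hooks can live.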

Layer 7: Governance and Audit

The top of the stack is governance. This is the layer that watches everything below it: every decision, every action, every escalation, every exception.

Governance includes:

An immutable audit log of every action taken by automation
A policy engine that defines what each agent or workflow is allowed to do
An evaluation framework that samples decisions for quality
A monitoring layer that alerts on drift, errors, or out-of-bounds behavior
A reporting surface that exposes all of this to humans
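Two of the items in this list, the audit log and the policy engine, can be sketched briefly. The example below uses a hash chain so that edits to earlier entries are detectable. That makes the log tamper-evident rather than truly immutable, and real immutability would need write-once storage. The POLICY table, actor names, and action names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Set

# Hypothetical policy: which actions each automation actor may take.
POLICY: Dict[str, Set[str]] = {
    "triage-agent": {"classify", "escalate"},
}


def is_allowed(actor: str, action: str) -> bool:
    return action in POLICY.get(actor, set())


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry's hash covers the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, actor: str, action: str, tenant: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"actor": actor, "action": action, "tenant": tenant, "prev_hash": prev_hash}
        entry = dict(body, hash=_digest(body))
        self._entries.append(entry)
        return dict(entry)  # hand back a copy so callers cannot edit the log

    def verify(self) -> bool:
        """Recompute the chain and report whether any entry was altered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

A workflow would call is_allowed before acting and append afterwards, so the log answers both "what happened" and "was it permitted".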

The temptation is to treat governance as an afterthought — something to add once the rest of the stack is working. That is exactly backward. Governance has to be designed in from the start because retrofitting it onto a running automation system is significantly harder than building it alongside.

Without layer 7, you do not actually know what your automation is doing. You assume. Assumptions about production systems are how outages happen.

The Handoff Map (Where Most Stacks Break)

The seven layers define responsibilities. The handoffs between them are where the failure modes live. Here are the seams that break most often.

Handoff	Common Failure	Symptom
Identity to PSA	Tenant resolution fails for ambiguous user	Ticket lands in wrong client
PSA to Workflow Engine	Workflow rules duplicate logic that lives in the engine	Two systems do the same thing inconsistently
RMM to Workflow Engine	Alerts arrive in unnormalized formats	Downstream logic breaks on edge cases
Documentation to Agent Layer	Stale or unstructured docs poison agent context	Agents make confident wrong decisions
Workflow Engine to Action APIs	Credentials, rate limits, retry logic scattered	Silent failures and data drift
Agent Layer to Workflow Engine	Agents bypass the engine and call APIs directly	No observability, no rollback
Everything to Governance	Audit log is partial or non-queryable	Cannot answer “what did the system do”

Audit your own stack against this table. The handoffs where you cannot describe the contract precisely are the handoffs where your incidents will originate.

The way to harden these seams is the same in every case: define the contract, log the traffic, monitor the SLOs, and own the failure modes. None of this is exotic. All of it is operational discipline applied consistently.
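"Define the contract" can start as something very small: a list of required fields and types checked at the seam. The sketch below is one simple way to do that under assumed field names. RMM_TO_ENGINE_CONTRACT and contract_violations are hypothetical, and a production system might use a schema library instead.

```python
from typing import Any, Dict, List

# Hypothetical contract for events crossing from telemetry into the workflow engine.
RMM_TO_ENGINE_CONTRACT: Dict[str, type] = {
    "tenant": str,
    "asset": str,
    "severity": str,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload conforms."""
    problems: List[str] = []
    for name, expected_type in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Logging the returned violations at each handoff gives you the traffic record and the failure signal from the same check.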

For a working sense of how the integration layer specifically fits together — APIs, connectors, and the seams between them — our integrations overview is the right starting point.

FAQ

Do we need all seven layers to start?

No. Every MSP already has layers 1, 2, 3, and 4 (identity, PSA, RMM, and documentation), even if they are not deliberately architected. The question is what to add and in what order. For most MSPs the next investment is making the workflow engine explicit (layer 5). After that, the agent layer (layer 6) and the governance layer (layer 7) should be added together, so that no agent runs without an audit trail.

Can we use our PSA as the workflow engine?

Up to a point. PSA workflow rules are a fine layer-5 capability for simple, deterministic flows that live entirely inside the PSA. Once you need cross-system orchestration, retries, or branching logic, the PSA becomes the wrong tool. Most MSPs eventually outgrow the PSA as their workflow engine and move to a dedicated platform.

Where do automation tools like Power Automate or Make fit?

They fit at layer 5 — workflow engine. They are reasonable choices for that layer in many MSP stacks. The architectural question is not which tool you use but whether the workflow logic is consolidated in one place or scattered across many.

How does this stack handle multi-tenant security?

Layer 1 (identity and tenant context) is the foundation. Every action in every other layer carries the verified tenant context with it. The agent layer and workflow engine enforce tenant boundaries at execution time. The audit log records which tenant every action touched. Tenant isolation is a property of the architecture, not a feature you turn on.
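As a rough illustration of enforcing tenant boundaries at execution time, the sketch below refuses any action whose target asset is not owned by the tenant in the caller's context, and records the outcome either way. ASSET_OWNERS, execute_for_tenant, and the asset names are hypothetical, and a real system would look up ownership in the documentation or RMM layer rather than a hard-coded table.

```python
from typing import Dict, List


class TenantBoundaryError(Exception):
    """Raised when an action targets an asset outside the caller's tenant."""


# Hypothetical inventory mapping each asset to the tenant that owns it.
ASSET_OWNERS: Dict[str, str] = {"srv01": "acme", "fw-02": "globex"}


def execute_for_tenant(tenant: str, asset: str, action: str, audit: List[dict]) -> str:
    """Run an action only if the asset belongs to the given tenant; log both outcomes."""
    owner = ASSET_OWNERS.get(asset)
    allowed = owner == tenant
    audit.append(
        {
            "tenant": tenant,
            "asset": asset,
            "action": action,
            "outcome": "allowed" if allowed else "denied",
        }
    )
    if not allowed:
        raise TenantBoundaryError(f"{asset!r} is not owned by tenant {tenant!r}")
    return f"{action} on {asset}"
```

The check runs on every call, which is what makes isolation a property of the execution path and not a setting.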

What is the biggest mistake MSPs make when planning this stack?

Buying a tool before defining what layer it belongs to. The tool then ends up being used for three layers it was not designed for, and the seams between layers become spaghetti. Define the architecture first, even on a napkin, and choose tools that fit one layer well.

If you want help mapping your current stack against this reference architecture and identifying the next investment that moves your automation maturity forward, our team does this kind of work routinely. Reach out via our contact page and explore how Mizo’s integration platform connects the layers you already have into a stack that scales.