
Most service-level agreements on Adobe Commerce maintenance retainers are aspirational fiction. The agency promises a 30-minute response time on critical incidents 24/7, and the client signs because the number sounds good. The agency knows there is no on-call rotation that actually delivers 30 minutes, but the contract is signed and the work begins. The first time production goes down at 2am Eastern on a Saturday, the response is 90 minutes, the client is angry, and the relationship begins to erode slowly.
The honest version is that SLAs are operational commitments, not marketing claims. Real SLAs cost real money to deliver, and the cost should be visible in the retainer pricing. If you are paying mid-market retainer rates and getting enterprise SLA promises, somebody is lying, and the lie will surface when you need the SLA the most.
This piece is the comparison Bemeir’s Adobe Commerce maintenance practice uses when scoping retainers with new clients. It is built from the actual cost of running on-call rotations and from the response-time data we have measured on real incidents over the last several years.
The tiers that actually exist
There are three SLA tiers that map to recognizable retainer pricing brackets. Each one has different response-time guarantees, different coverage windows, and different cost structures.
The basic tier is business-hours coverage, with a 4-hour critical-incident response during business days and best-effort response outside business hours. This tier fits stores doing $5 million to $20 million in annual GMV, where downtime cost is real but not catastrophic. Retainer pricing for this tier typically lands between $4,000 and $12,000 per month.
The mid-market tier covers extended business hours and adds 24/7 coverage with a 1-hour critical-incident response. This fits stores doing $20 million to $100 million in annual GMV, where weekend and holiday downtime would be operationally painful. Retainer pricing lands between $10,000 and $30,000 per month.
The enterprise tier is 24/7 with a 15 to 30 minute critical-incident response, dedicated on-call engineers, and a defined rotation that the client can audit. This fits stores doing $100 million or more in annual GMV, or stores where downtime cost is large enough to justify the investment. Retainer pricing starts around $25,000 and runs to $100,000 per month or more, depending on the additional scope.
The pattern is that response time falls roughly in inverse proportion to retainer cost: cutting the response time in half roughly doubles the operational cost of delivering it.
What “response time” actually means
Response time SLAs are misunderstood by both clients and agencies. The number on the contract is the time from when the agency is notified of an incident to when an engineer acknowledges and begins working. It is not the time to resolution, and it is not the time to first communication with the client.
A 1-hour response SLA means an engineer is looking at the problem within an hour. It does not mean the problem is fixed within an hour. Resolution time depends on the nature of the incident and can range from minutes to days. The contract should therefore also specify a resolution-time target for each severity tier, with realistic ranges rather than fictional minimums.
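To make the distinction between the two clocks concrete, here is a minimal Python sketch that records the three timestamps an incident produces and derives response time and resolution time from them. The `Incident` class and its field names are illustrative assumptions, not any particular ticketing tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    notified_at: datetime                   # agency receives the alert or ticket
    acknowledged_at: Optional[datetime]     # an engineer acknowledges and starts work
    resolved_at: Optional[datetime] = None  # service restored

    def response_time(self) -> Optional[timedelta]:
        """The clock an SLA response number measures: notification to acknowledgment."""
        if self.acknowledged_at is None:
            return None
        return self.acknowledged_at - self.notified_at

    def resolution_time(self) -> Optional[timedelta]:
        """A separate clock: notification to resolution."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.notified_at


# Hypothetical Saturday 2am incident.
incident = Incident(
    notified_at=datetime(2024, 3, 9, 2, 0),
    acknowledged_at=datetime(2024, 3, 9, 2, 40),
    resolved_at=datetime(2024, 3, 9, 5, 15),
)
print(incident.response_time())    # 0:40:00
print(incident.resolution_time())  # 3:15:00
```

In this hypothetical incident, a 40-minute acknowledgment satisfies a 1-hour response SLA even though resolution took more than three hours, which is why the contract needs both numbers.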
A useful framing is to define four severity levels and target both response and resolution for each.
| Severity | Definition | Mid-market response | Mid-market resolution | Enterprise response | Enterprise resolution |
|---|---|---|---|---|---|
| P1 critical | Site down or payment broken | 1 hour 24/7 | 4 hours target | 15 min 24/7 | 2 hours target |
| P2 high | Major feature broken | 2 hours business hours | 8 hours target | 1 hour 24/7 | 4 hours target |
| P3 moderate | Minor feature broken | 4 hours business hours | 24 hours target | 2 hours business hours | 12 hours target |
| P4 low | Cosmetic or non-urgent | Next business day | 5 business days | Next business day | 3 business days |
These targets are deliverable. We have hit them consistently across mid-market and enterprise retainers over the last several years. The discipline lies in the staffing and tooling that support them.
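The response columns of the table can be encoded as data so that breaches are computed rather than argued about. The sketch below is a minimal illustration under stated assumptions: business hours are taken to be Monday to Friday, 09:00 to 17:00, in a single time zone with no holiday calendar, and P4 and the resolution targets are left out because the table does not say which clock they run on. All names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumed business hours: Monday-Friday, 09:00-17:00, no holiday calendar.
BUSINESS_OPEN = time(9, 0)
BUSINESS_CLOSE = time(17, 0)

# (tier, severity) -> (response target, clock) taken from the table; P4 omitted.
RESPONSE_TARGETS = {
    ("mid-market", "P1"): (timedelta(hours=1), "24/7"),
    ("mid-market", "P2"): (timedelta(hours=2), "business"),
    ("mid-market", "P3"): (timedelta(hours=4), "business"),
    ("enterprise", "P1"): (timedelta(minutes=15), "24/7"),
    ("enterprise", "P2"): (timedelta(hours=1), "24/7"),
    ("enterprise", "P3"): (timedelta(hours=2), "business"),
}


def business_elapsed(start: datetime, end: datetime) -> timedelta:
    """Portion of the interval [start, end] that falls inside business hours."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            window_start = max(start, datetime.combine(day, BUSINESS_OPEN))
            window_end = min(end, datetime.combine(day, BUSINESS_CLOSE))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total


def response_breached(tier: str, severity: str,
                      notified: datetime, acknowledged: datetime) -> bool:
    """True if acknowledgment came later than the table's response target."""
    target, clock = RESPONSE_TARGETS[(tier, severity)]
    if clock == "24/7":
        elapsed = acknowledged - notified
    else:
        elapsed = business_elapsed(notified, acknowledged)
    return elapsed > target


# P2 reported Friday 16:30, acknowledged Monday 10:00.
friday = datetime(2024, 3, 8, 16, 30)
monday = datetime(2024, 3, 11, 10, 0)
print(business_elapsed(friday, monday))                        # 1:30:00
print(response_breached("mid-market", "P2", friday, monday))   # False
print(response_breached("enterprise", "P2", friday, monday))   # True
```

The same Friday-to-Monday gap is compliant on the mid-market business-hours clock and a clear breach on the enterprise 24/7 clock, which is exactly the difference the two tiers are priced to reflect.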
What the agency needs to actually deliver
Each tier requires specific infrastructure on the agency side. Clients should ask about this infrastructure as part of evaluating an SLA promise.
The basic tier needs a documented intake process for incidents during business hours and a defined business-day work schedule. Most reasonable agencies can deliver this with their normal team.
The mid-market tier needs a documented on-call rotation that covers nights, weekends, and holidays: at minimum two engineers in rotation, with documented escalation paths. The rotation carries real, sleep-disrupting commitments, so the engineers need to be compensated for it, either through stipends or compensatory time off. Without compensation, the rotation erodes within months as engineers burn out or quit. The agency should be able to name the engineers in rotation, the rotation frequency, and the compensation structure.
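A rotation the agency can name and explain might look like the following minimal sketch: a weekly schedule that pairs a primary with a secondary, plus an escalation rule that moves the page up a tier when it goes unacknowledged. The engineer names, the weekly cadence, and the 15-minute escalation window are hypothetical placeholders, not a recommended standard.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_rotation(engineers: List[str], start: date,
                    weeks: int) -> List[Tuple[date, str, str]]:
    """(week start, primary, secondary) for each week; secondary is the next engineer."""
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary and secondary coverage")
    schedule = []
    for week in range(weeks):
        primary = engineers[week % len(engineers)]
        secondary = engineers[(week + 1) % len(engineers)]
        schedule.append((start + timedelta(weeks=week), primary, secondary))
    return schedule


def escalation_target(minutes_unacknowledged: int, primary: str, secondary: str,
                      third_tier: str, window: int = 15) -> str:
    """Who should be paged once a page has gone unacknowledged for N minutes."""
    if minutes_unacknowledged < window:
        return primary
    if minutes_unacknowledged < 2 * window:
        return secondary
    return third_tier


# Hypothetical three-person rotation starting Monday 1 January 2024.
for week_start, primary, secondary in weekly_rotation(["alice", "bala", "chen"],
                                                      date(2024, 1, 1), 4):
    print(week_start, primary, secondary)

print(escalation_target(20, "alice", "bala", "engineering lead"))  # bala
```

The structure also shows why rotation size matters: with n engineers, each person is primary one week in n and secondary one week in n. A two-person rotation therefore leaves both engineers on call every single week, while a five-person rotation has each engineer off call three weeks in five.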
The enterprise tier needs the same on-call infrastructure plus dedicated resources for the account. Dedicated means the engineers’ primary responsibility is the account, not a side load. The rotation may also be larger, with three to five engineers to spread the on-call burden. The compensation structure is more elaborate, and the agency should have a runbook library that covers the top 20 incident types specific to the client’s store.
The Atlassian incident management handbook and the Google SRE book are both useful reference points for what mature on-call practice looks like, regardless of platform. Most agencies that take SLAs seriously have read one or both.
Uptime SLAs and the math behind them
The uptime number on the contract is usually 99.9 percent. Sometimes 99.95 percent. Occasionally 99.99 percent for enterprise retainers. The numbers translate to allowable downtime as follows.
99.9 percent allows 8.76 hours of downtime per year. 99.95 percent allows 4.38 hours. 99.99 percent allows 52.6 minutes. Stores that operate at 99.9 percent are within the band that most well-run Adobe Commerce stores actually achieve. Stores at 99.99 percent are doing exceptional work, usually with multi-region failover and significant infrastructure investment beyond the retainer.
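The downtime figures follow directly from the number of hours in a year. A minimal Python sketch, assuming a 365-day year of 8,760 hours and an average month of 730 hours:

```python
HOURS_PER_YEAR = 365 * 24              # 8,760 hours; leap years ignored
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours in an average month


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime permitted by an uptime target over a period, in hours."""
    return period_hours * (1 - uptime_percent / 100)


for target in (99.9, 99.95, 99.99):
    yearly = allowed_downtime_hours(target)
    monthly_min = allowed_downtime_hours(target, HOURS_PER_MONTH) * 60
    print(f"{target}%: {yearly:.2f} h/year ({yearly * 60:.1f} min), "
          f"{monthly_min:.1f} min/month")
```

It reproduces the 8.76 hours, 4.38 hours, and roughly 52.6 minutes above. The monthly figure is worth computing because retainers are reviewed month by month: 99.9 percent leaves only about 44 minutes of downtime in an average month.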
The honest conversation about uptime is that it is a function of the hosting infrastructure, not just the agency. If the store is on shared hosting with no failover, 99.9 percent is the realistic ceiling regardless of how good the agency is. If the store is on Adobe Commerce Cloud with the recommended high-availability configuration, 99.95 percent becomes achievable. If the store invests in multi-region active-active architecture, 99.99 percent is on the table but expensive.
The Adobe Commerce Cloud SLA documentation is the reference for what Adobe commits to on the platform side. The agency’s SLA layers on top of that and cannot exceed it. Any agency that promises uptime numbers higher than the hosting platform supports is making promises that depend on luck.
Bemeir’s pattern with mid-market clients is to commit to 99.9 percent on a single-region Adobe Commerce Cloud configuration, with the option to upgrade to 99.95 percent on a high-availability configuration that the client pays for separately. Above 99.95 percent, the conversation shifts to multi-region architecture, which is a separate engineering project.
What the SLA does not cover
A clear SLA also specifies what is out of scope, which prevents arguments later. The standard exclusions are: scheduled maintenance windows that the client has been notified of, third-party service outages outside the agency’s control (payment processors, search providers, ERP integrations), force majeure events, and downtime caused by client-initiated changes outside the agency’s process.
The third-party exclusion is the one that comes up most often. When Stripe is down or NetSuite is unreachable, the storefront may appear partially broken to customers even though Adobe Commerce itself is healthy. The agency cannot fix Stripe. What the agency should do is monitor the third-party service, communicate clearly when there is an outage, and have documented degraded-mode behavior so the storefront fails gracefully rather than producing scary error pages.
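Degraded-mode behavior is often implemented with a circuit breaker: after repeated failures, the storefront stops calling the broken dependency for a cool-off period and serves a fallback instead of an error page. Adobe Commerce itself is PHP, so treat the Python below as a language-neutral sketch of the pattern, not Adobe Commerce or Magento code; the thresholds, the `fetch_erp_stock` helper, and the stock cache are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-off period and serves a fallback."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: skip the dependency entirely
            # Half-open: allow one trial call; a single failure reopens the circuit.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # success closes the circuit
        return result


def fetch_erp_stock(sku: str) -> int:
    # Hypothetical ERP call; here it simulates an outage.
    raise ConnectionError("ERP unreachable")


last_known_stock = {"SKU-123": 14}  # hypothetical cache of the last good ERP response

breaker = CircuitBreaker(failure_threshold=2, reset_after_s=60)
for _ in range(3):
    stock = breaker.call(lambda: fetch_erp_stock("SKU-123"),
                         lambda: last_known_stock["SKU-123"])
    print(stock)
```

Every call shows the cached stock level instead of an error, and the third call never reaches the ERP at all, so a failing integration is not retried on every storefront request while the outage lasts.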
AWS and Stripe are examples of the third-party services that most stores depend on, and both publish live status, through the AWS Service Health Dashboard and the Stripe status page respectively. Subscribe to those status feeds and surface the data in your incident response tooling. The store’s perceived availability is the composite of every service in its critical path.
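For services in series, where the storefront needs every one of them to be up, composite availability is the product of the individual availabilities if their failures are independent. That independence is an assumption, and the availability figures below are hypothetical, not published SLAs for any real provider.

```python
from math import prod
from typing import Mapping

HOURS_PER_YEAR = 365 * 24


def composite_availability(critical_path: Mapping[str, float]) -> float:
    """Availability of a chain in which every service must be up.

    Assumes failures are independent of one another.
    """
    return prod(critical_path.values())


critical_path = {  # hypothetical figures for illustration only
    "hosting": 0.9995,
    "payment processor": 0.9999,
    "search provider": 0.999,
    "ERP integration": 0.999,
}

overall = composite_availability(critical_path)
print(f"composite availability: {overall:.2%}")
print(f"expected downtime: {(1 - overall) * HOURS_PER_YEAR:.1f} h/year")
```

With these illustrative numbers the composite comes out at roughly 99.74 percent, below every individual component. The same arithmetic is why an agency's uptime commitment cannot exceed the hosting platform's: the hosting figure is only one factor in the product.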
Penalty clauses that mean something
The penalty clause for SLA breaches is the part that separates real SLAs from fictional ones. A clause that says “the agency will use best efforts to meet the SLA” is not a real SLA. A clause that says “if the SLA is missed, the client receives a 10 percent service credit on the following month’s retainer, with the credit doubling for each consecutive month of breach” is a real SLA.
The penalty should be material enough to motivate the agency to deliver and small enough that it does not threaten the engagement. Most working SLAs land on service credits of 5 to 15 percent of the monthly retainer for each missed SLA event, capped at 50 percent of the monthly retainer in any single month.
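The two credit structures above can be combined into one calculation. The sketch assumes a per-event rate of 10 percent (inside the 5 to 15 percent range), doubles it for each consecutive month of breach as in the example clause, and applies the 50 percent monthly cap. How a real contract counts events and consecutive months depends on its own wording, and the function name and defaults here are hypothetical.

```python
def monthly_service_credit(retainer: float, missed_events: int,
                           credit_rate: float = 0.10,
                           consecutive_breach_months: int = 1,
                           cap: float = 0.50) -> float:
    """Service credit in dollars for one month of SLA misses.

    credit_rate: fraction of the retainer credited per missed SLA event.
    consecutive_breach_months: 1 for the first month of breach; the rate
        doubles for each additional consecutive month.
    cap: maximum fraction of the retainer credited in a single month.
    """
    if consecutive_breach_months < 1:
        raise ValueError("consecutive_breach_months starts at 1")
    if missed_events <= 0:
        return 0.0
    multiplier = 2 ** (consecutive_breach_months - 1)
    fraction = min(missed_events * credit_rate * multiplier, cap)
    return retainer * fraction


# Hypothetical $20,000 retainer with one missed SLA event per month.
for month in range(1, 5):
    credit = monthly_service_credit(20_000, missed_events=1,
                                    consecutive_breach_months=month)
    print(f"month {month} of breach: ${credit:,.0f}")
```

On that hypothetical retainer, the credit runs $2,000, $4,000, and $8,000 over three consecutive months of breach, then hits the $10,000 cap in the fourth.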
Bemeir’s standard contract includes graduated service credits for SLA misses and an escalation clause that requires a written incident postmortem for every P1 incident. The postmortem discipline is more useful than the credit itself, because it forces both sides to look at what happened and what changes to prevent recurrence. The credits are the consequence. The postmortems are the improvement loop.
What to ask before signing
If you are evaluating an Adobe Commerce maintenance retainer, the SLA section deserves its own conversation. Five questions to ask before signing.
Who specifically is on the on-call rotation for this account? Name them. The agency should be able to answer this in 30 seconds. If they cannot, the rotation does not exist in the way the contract implies.
What is the documented escalation path if the on-call engineer does not respond within the SLA window? There should be a second tier and a third tier, with named individuals or roles for each.
What are the resolution targets for each severity tier, not just the response targets? Resolution is what actually matters to the business. Response is the precondition for resolution.
What is the penalty for SLA breach, and has the agency ever paid one? An agency that has never paid a service credit either has perfect operational discipline or has a contract loose enough that no breach is provable. The honest answer usually includes one or two paid credits in the last 24 months.
What is the runbook library for this store, and can we see a redacted example? Agencies that maintain runbooks have actual operational practice. Agencies that wing it can produce a story but not a runbook.
If you are running parallel maintenance evaluations across Shopify Plus, Shopware, or BigCommerce partners, the same questions translate. The SLAs differ in platform-specific detail. The discipline of having documented rotations, escalation paths, and resolution targets is universal. The agencies that take SLAs seriously have these answers. The agencies that do not will tell you with their hesitation. Either way, you will know what you are buying before you sign the contract.




