
Most Adobe Commerce stores have a backup of some kind. Very few have actually tested a restore. Even fewer have a disaster recovery plan that specifies what gets restored, in what order, to what infrastructure, within what time. The gap between “we have backups” and “we can recover from a major incident” is wider than merchants usually realize, and it shows up only at the worst possible time. A maintenance retainer that does not address this gap is leaving the merchant exposed to a failure mode that will eventually happen.
This piece walks through the disaster recovery framework Bemeir’s maintenance team builds into every retainer. The categories below cover what every retainer should include, what should be tested regularly, and how the recovery objectives drive the cost-vs-risk trade-offs.
What needs to be backed up
Adobe Commerce stores have more state than the merchant typically realizes. A complete backup plan covers:
Database. The primary data store. Customers, products, orders, prices, inventory, customer segments, sales rules, content, configuration. Everything that lives in MySQL or Aurora.
Media files. Product images, category images, CMS media. Usually stored on S3 for cloud-native deployments, sometimes on local disk for older configurations. Either way, backed up separately from the database.
Application code. The custom modules, themes, third-party extensions, and configuration files. This is essentially the code repository plus any environment-specific deployment artifacts.
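Configuration that is not in code. Some Adobe Commerce configuration lives in the database (store configuration in the core_config_data table, cron schedules, indexer state). Some lives in environment files outside the repository, above all app/etc/env.php, which holds the database connection settings and the encryption key, along with API keys and other credentials. Both need to be backed up; without the original encryption key, values the application stored encrypted in the database cannot be decrypted after a restore.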
Logs. Application logs, access logs, error logs. Less critical for operational restore but important for post-incident analysis.
Cron schedules. What jobs run, when, with what configuration. Often documented only in the production system.
Integration credentials. API keys, webhook secrets, OAuth tokens for every integration. Often the hardest thing to reconstruct after a complete environment loss.
A backup that covers only the database is incomplete. A backup that covers database, media, and code is the floor; a complete backup covers all seven categories above.
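As a rough illustration, coverage can be expressed as a set comparison between the seven categories above and whatever the current backup jobs actually cover. This is a minimal sketch. The category keys and the three-tier rating are hypothetical names chosen for the example, not part of any Adobe Commerce tooling.

```python
# Hypothetical category keys mirroring the seven backup categories.
REQUIRED_CATEGORIES = frozenset({
    "database",
    "media",
    "application_code",
    "config_not_in_code",
    "logs",
    "cron_schedules",
    "integration_credentials",
})

# The "floor": database, media, and code together.
FLOOR_CATEGORIES = frozenset({"database", "media", "application_code"})


def assess_coverage(covered: set[str]) -> tuple[str, set[str]]:
    """Rate a backup plan as 'complete', 'floor', or 'incomplete'.

    Returns the rating plus the set of categories that are not covered.
    """
    unknown = covered - REQUIRED_CATEGORIES
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    missing = set(REQUIRED_CATEGORIES - covered)
    if not missing:
        return "complete", missing
    if FLOOR_CATEGORIES <= covered:
        return "floor", missing
    return "incomplete", missing


# A database-only plan is rated incomplete, with six categories missing.
rating, missing = assess_coverage({"database"})
print(rating, sorted(missing))
```

In practice the `covered` set would come from an inventory of configured backup jobs rather than being typed in by hand.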
Backup cadence
Different data has different change rates and different recovery requirements. The right cadence:
| Data category | Recommended cadence | Retention |
|---|---|---|
| Database (full) | Daily | 30 days |
| Database (incremental or binlog) | Continuous or every 15 minutes | 7 days |
| Media files | Daily incremental, weekly full | 30 days full, longer for archived media |
| Application code | Per deploy via Git, point-in-time tags | Indefinite |
| Configuration not in code | Daily | 30 days |
| Logs | Continuous streaming to long-term storage | 90+ days for compliance |
| Cron configuration and integration credentials | Per change, plus monthly verification | Indefinite, versioned |
The cadence is calibrated to the recovery point objective (RPO): how much data the merchant can tolerate losing in the worst case. A 15-minute binlog cadence means worst-case data loss is 15 minutes. A daily full backup on its own means worst-case loss is 24 hours.
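The cadence only delivers the stated RPO if the jobs actually succeed. At any given moment the effective worst-case loss is the age of the newest usable recovery point, so a simple monitor can compare that age against the cadence table. A minimal sketch, assuming the last successful run per category is already collected from the backup tooling; the category keys and the grace parameter are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Maximum age of the newest successful backup, taken from the cadence table.
# Keys are hypothetical names for rows of that table.
MAX_AGE = {
    "database_full": timedelta(days=1),
    "database_binlog": timedelta(minutes=15),
    "media_incremental": timedelta(days=1),
    "config_not_in_code": timedelta(days=1),
}


def stale_backups(
    last_success: dict[str, datetime],
    now: datetime,
    grace: timedelta = timedelta(0),
) -> list[str]:
    """Return categories whose newest successful backup is older than allowed.

    `grace` can absorb normal job duration; it defaults to zero here.
    A category with no recorded success at all counts as stale.
    """
    stale = []
    for category, max_age in MAX_AGE.items():
        last = last_success.get(category)
        if last is None or now - last > max_age + grace:
            stale.append(category)
    return sorted(stale)
```

The binlog row is usually the first to trip: if binlog shipping stopped three hours ago, the real RPO is three hours, whatever the design document says.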
Recovery objectives that drive the design
Two numbers define the disaster recovery posture:
Recovery Time Objective (RTO). How long can the store be down before recovery is complete? An RTO of 4 hours means the disaster recovery design needs to complete a full recovery in 4 hours. An RTO of 24 hours allows a different design with lower ongoing cost.
Recovery Point Objective (RPO). How much data loss is acceptable? An RPO of 15 minutes means the backup cadence needs to be at most 15 minutes. An RPO of 24 hours allows daily backups.
The merchant defines these numbers based on revenue per hour, customer impact, and operational tolerance. The maintenance team designs the backup and recovery infrastructure to meet those numbers. Cost scales with how aggressive the numbers are: a 1-hour RTO requires hot-standby infrastructure that is meaningfully more expensive than the 24-hour RTO version.
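Once the merchant has set the two numbers, each restore test can be scored against them. This is a hypothetical structure, not an existing tool: it records the objectives and the measured results and reports which objective, if any, the test missed.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # longest acceptable time until recovery is complete
    rpo: timedelta  # largest acceptable window of lost data


@dataclass(frozen=True)
class RestoreTestResult:
    restore_duration: timedelta    # measured end-to-end time in the restore test
    recovery_point_gap: timedelta  # longest observed gap between usable recovery points


def evaluate(objectives: RecoveryObjectives, result: RestoreTestResult) -> list[str]:
    """Return findings; an empty list means both objectives were met."""
    findings = []
    if result.restore_duration > objectives.rto:
        findings.append(
            f"RTO missed: restore took {result.restore_duration}, objective is {objectives.rto}"
        )
    if result.recovery_point_gap > objectives.rpo:
        findings.append(
            f"RPO missed: recovery points {result.recovery_point_gap} apart, objective is {objectives.rpo}"
        )
    return findings


# A 6-hour measured restore against a 4-hour RTO produces one finding.
print(evaluate(
    RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    RestoreTestResult(restore_duration=timedelta(hours=6), recovery_point_gap=timedelta(minutes=15)),
))
```

Keeping the objectives and the measurements in the same record makes the quarterly write-up a comparison rather than a judgment call.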
What needs to be tested
A backup that has not been tested is not a backup. The testing regimen:
Restore test, quarterly minimum. Take the most recent backup and restore it to a non-production environment. Verify the database loads, the application starts, key flows work, and the data is intact. Document any issues found.
Failover test, annual minimum. If the disaster recovery design includes a standby environment, fail over to it. Run the production workload from the standby for a defined window, then fail back. This proves the failover process works.
Documentation walkthrough, semi-annual. A team member who was not involved in writing the disaster recovery documentation runs through it step by step. Any gap or ambiguity becomes a documentation fix. This catches the cases where the playbook assumes knowledge that only the original author has.
Tabletop exercise, annual. A scenario-driven discussion: “the database is corrupted, what do we do?” The team walks through their response. The exercise surfaces gaps in process, communication, and decision-making that real incidents would expose at high cost.
The tests are not theater. They are the only way to know whether the recovery posture actually works.
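Part of "verify the data is intact" can be automated. One approach, sketched here under stated assumptions: record row counts for key tables at backup time, then compare them against the restored database. The table names are standard Adobe Commerce core tables. The function names are hypothetical, and any DB-API driver (a MySQL client, for example) would stand in for the sqlite3 connection used in the demo.

```python
import sqlite3
from typing import Optional

# Standard Adobe Commerce core tables; extension-specific tables belong here
# too, since those are the ones most often missing from partial backups.
KEY_TABLES = ("sales_order", "customer_entity", "catalog_product_entity")


def count_rows(conn, table: str) -> Optional[int]:
    """Return the row count for `table`, or None if the query fails."""
    cur = conn.cursor()
    try:
        # Identifiers cannot be bound as parameters; `table` must come from a
        # trusted list such as KEY_TABLES, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    except Exception:  # error classes differ per driver; treat any failure as missing
        return None
    finally:
        cur.close()


def capture_counts(conn, tables=KEY_TABLES) -> dict[str, int]:
    """Run against the source at backup time to record expected counts."""
    counts = {}
    for table in tables:
        n = count_rows(conn, table)
        if n is not None:
            counts[table] = n
    return counts


def verify_restore(expected: dict[str, int], restored_conn) -> dict[str, tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for every mismatch.

    `actual` is None when the table is absent or unreadable in the restore.
    """
    mismatches = {}
    for table, want in expected.items():
        got = count_rows(restored_conn, table)
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


if __name__ == "__main__":
    # In-memory database standing in for the restored copy.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_order (entity_id INTEGER)")
    conn.executemany("INSERT INTO sales_order VALUES (?)", [(1,), (2,), (3,)])
    conn.execute("CREATE TABLE customer_entity (entity_id INTEGER)")
    expected = {"sales_order": 3, "customer_entity": 2, "catalog_product_entity": 10}
    print(verify_restore(expected, conn))
```

Row counts catch missing tables and truncated imports; they do not prove the data is correct, so the manual check of key flows still matters.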
Common failure modes the testing surfaces
When merchants run a real restore test for the first time, the issues that consistently surface:
Backup is incomplete. The database backup was running but the media backup was not. Or the database backup was running but covered only the core table set, missing extension-specific tables. Or the backup job was still scheduled, but its storage credentials had expired three months earlier and every run since had failed without anyone noticing.
Restore takes longer than expected. The recovery design assumed a 2-hour database restore. The actual test takes 6 hours because the backup is on slow storage, or the restore process is single-threaded, or the destination environment is undersized.
Application does not start cleanly. Configuration mismatches, missing environment variables, integration credentials that need re-issuing, or schema versions that do not match between the backup and the application code.
Data is inconsistent. The database backup and the media backup were not taken at the same point in time. Catalog records reference image files that were deleted after the database backup ran but before the media backup ran, so the restored database points at files the restored media set no longer contains. Or customer records reference catalog entries that no longer exist.
Documentation is wrong. The playbook references infrastructure that has changed, tools that have been renamed, or processes that have evolved. Following the documentation literally produces errors.
Every one of these issues is fixable when surfaced by a test. None of them are fixable in the middle of a real incident.
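The media inconsistency can be detected mechanically by comparing the image paths the restored database references with the files the restored media backup contains. A minimal sketch, assuming the references have been exported from the product media gallery table (catalog_product_entity_media_gallery in a default schema, whose value column stores paths such as /m/b/example.jpg relative to pub/media/catalog/product). The function names and the example path are hypothetical.

```python
from pathlib import Path


def list_media(media_root: Path) -> set[str]:
    """Return every file under media_root as a POSIX path relative to it."""
    return {
        p.relative_to(media_root).as_posix()
        for p in media_root.rglob("*")
        if p.is_file()
    }


def missing_media(referenced: set[str], present: set[str]) -> set[str]:
    """Return referenced paths that have no matching file in the media backup.

    Database values carry a leading slash and file listings do not,
    so the slash is stripped before matching.
    """
    return {ref for ref in referenced if ref.lstrip("/") not in present}
```

Run it against the restored copies rather than production: the point is to prove that the two backups line up with each other.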
What disaster recovery looks like on AWS
For Adobe Commerce stores hosted on AWS, the standard architectural patterns:
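Multi-AZ deployment. Application and database span at least two availability zones. An AZ failure is handled automatically: auto-scaling replaces lost application capacity and the database fails over to its standby in another zone, usually with only a brief interruption.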
Cross-region replication for media. S3 cross-region replication mirrors the media bucket to a second region. Regional failure of the primary region does not lose media.
Cross-region database snapshots. Daily snapshots replicated to a second region. Manual restore in the second region is possible if the primary region is unavailable.
Code in Git, deployable from anywhere. The application code lives in a Git repository that can be deployed to any compatible environment. The deployment process is documented and runnable by multiple team members.
Documentation in a runbook. The recovery procedure is documented step by step, including specific commands, configuration values, and decision points.
Adobe Commerce Cloud handles most of this infrastructure layer automatically. Self-hosted AWS deployments require the maintenance team to implement and verify each piece. The patterns themselves are well-understood; the operational rigor to maintain them is what varies.
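Cross-region replication that silently stops is a common gap, so the restore tests can be backed by a daily check that a recent snapshot exists in the secondary region. A sketch under assumptions: the snapshot list would be gathered per region, for example with boto3's RDS describe_db_snapshots (or describe_db_cluster_snapshots for Aurora), but that collection step is left out so the check runs without AWS access. The Snapshot type and the 26-hour threshold (a daily copy plus a margin for copy time) are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    identifier: str
    region: str
    created: datetime


def latest_in_region(snapshots: Iterable[Snapshot], region: str) -> Optional[Snapshot]:
    """Return the newest snapshot stored in `region`, or None if there is none."""
    in_region = [s for s in snapshots if s.region == region]
    return max(in_region, key=lambda s: s.created, default=None)


def dr_snapshot_ok(
    snapshots: Iterable[Snapshot],
    secondary_region: str,
    now: datetime,
    max_age: timedelta = timedelta(hours=26),
) -> bool:
    """True if the secondary region holds a snapshot younger than max_age."""
    latest = latest_in_region(snapshots, secondary_region)
    return latest is not None and now - latest.created <= max_age
```

A failing check is a signal to investigate the copy job before the next restore test, not a substitute for that test.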
The retainer’s role
A maintenance retainer that takes disaster recovery seriously delivers:
Backup configuration audit, initial. Verify what is actually backed up, how often, and where. The audit often surfaces gaps between what is actually in place and what the merchant believes is in place.
Backup test, quarterly. Documented restore test from the most recent backup. Findings written up; remediations tracked.
Recovery documentation, maintained. The runbook is updated when the infrastructure changes. Versioned, reviewable, and findable.
Failover test, annual. Full test of the failover capability if one exists. Documented results.
Incident response readiness. The team knows what to do if a real incident happens. The contact list is current, the escalation path is defined, and the decision authority is clear.
A retainer that does not deliver these is not actually delivering disaster recovery; it is delivering the option to deliver disaster recovery, which is not the same thing.
What this is worth
Disaster recovery is insurance work. It costs money when nothing is wrong and pays off when something is. The math is similar to other insurance: low probability of catastrophic event multiplied by high cost of unmitigated event equals meaningful expected value.
For Adobe Commerce stores doing $20M+ in annual revenue, an unrecovered major incident typically costs in the high six figures to low seven figures: revenue lost during downtime, data lost permanently, brand damage from an extended outage, and regulatory exposure if personally identifiable information is involved. The disaster recovery investment that prevents this is a few thousand dollars a month of retainer scope plus modest infrastructure cost.
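The insurance argument can be made concrete with a break-even calculation. This is an illustrative worked example, not a quote. Assume a monthly retainer scope of K = 3,000 USD (one reading of "a few thousand dollars"), an unrecovered incident cost of C = 1,000,000 USD (the low end of "low seven figures"), infrastructure cost left out, and disaster recovery work that fully prevents the loss. With p as the annual probability of a major incident, the expected annual loss without disaster recovery is pC, and the spend breaks even when that equals the annual retainer cost:

$$
p^{*} C = 12K \quad\Longrightarrow\quad p^{*} = \frac{12K}{C} = \frac{12 \times 3{,}000}{1{,}000{,}000} = 0.036
$$

Under these assumptions the retainer scope pays for itself if the annual chance of an unrecovered major incident is above roughly 3.6%. A higher incident cost lowers that threshold proportionally.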
Bemeir’s perspective is that disaster recovery belongs in every Adobe Commerce maintenance retainer because the cost of not having it is enormous and the cost of having it is modest. The merchants who skip it usually do so because nothing has gone wrong yet; the merchants who have lived through a real incident usually pay for it without negotiation. The AWS disaster recovery whitepapers and the Adobe Commerce backup documentation describe the technical patterns; the operational discipline to maintain them is what the retainer actually provides. The work is well-understood; the only variable is whether the team does it before the incident or after.





