Adobe Commerce Uptime Monitoring: What Your Maintenance Team Should Be Watching

“Is the site up” is not a useful monitoring question for Adobe Commerce. The site is almost always up in some sense; the meaningful question is which functions are working, which are degraded, and which are silently failing. The merchants who pay retainer fees and still suffer slow-burning incidents are usually paying for “is the site up” monitoring and not getting the layered observability that catches problems before customers do. Your maintenance team should be watching specific signals continuously, not just responding when something visibly breaks.

This piece walks through the monitoring stack Bemeir’s Adobe Commerce maintenance team runs for clients, organized by what each signal predicts and how it should drive operational response. The categories are practical, not theoretical; each one corresponds to a real failure mode that has cost merchants real money.

Layer 1: External availability

The outermost layer is whether the site responds at all from the public internet. The right tools here are external synthetic monitors that hit your site from multiple geographic locations on a frequency you specify.

What to monitor. Homepage status code and response time, category page status code and response time, product detail page status, checkout page status, key API endpoints if applicable. Five to ten URLs covering the main page types.

Frequency and locations. Every 60 seconds from at least 4 geographic regions. The geography matters because regional CDN or DNS issues can cause partial outages that single-region monitoring will miss.

Failure criteria. Treat any 5xx response or any response time above 10 seconds as a failed check, and page when two or more regions fail simultaneously. Single-region failures often reflect network glitches and should not page; multi-region failures usually reflect real outages.
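The paging rule can be sketched as a small decision function. This is a minimal illustration, not a feature of any particular monitoring tool: the `RegionResult` type, the function names, and the region labels are all assumptions made for the example, and the 10-second and two-region values are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class RegionResult:
    """One synthetic check result from one region (hypothetical shape)."""
    region: str
    status_code: int
    response_seconds: float


def check_failed(result, max_seconds=10.0):
    """A check fails on any 5xx response or a response slower than max_seconds."""
    return result.status_code >= 500 or result.response_seconds > max_seconds


def should_page(results, min_failed_regions=2):
    """Return (page?, failed regions) for one round of checks against one URL."""
    failed = [r.region for r in results if check_failed(r)]
    return len(failed) >= min_failed_regions, failed


round_results = [
    RegionResult("us-east", 503, 0.4),
    RegionResult("eu-west", 200, 0.8),
    RegionResult("ap-south", 200, 12.5),
    RegionResult("sa-east", 200, 0.9),
]
page, failed_regions = should_page(round_results)
print(page, failed_regions)
```

A single failed region is still worth logging; the function returns the list of failed regions either way so that it can be recorded without paging anyone.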

Tools. Pingdom, UptimeRobot, Datadog Synthetic, or built-in CloudWatch synthetic monitors on AWS. The tool choice is less important than having one configured with the right scope.

Layer 2: Application health

Beyond “does it respond,” is the application actually functional? This layer catches problems where the server returns 200s for failing flows.

What to monitor. Cart add success rate, checkout completion rate, login success rate, search returning results, admin login. These are user journeys, not URLs, and each requires a small script that simulates the action and verifies success.

Frequency. Every 5-15 minutes depending on traffic. High-traffic stores get frequent checks; lower-traffic stores can ease back.

Failure criteria. Any expected step returning unexpected output, error messages appearing in the response, or a flow timing out. Stale data in the response (e.g., the test product never appearing in the cart) is also a failure.
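As a rough sketch of these criteria, the function below evaluates one step of a journey script after the HTTP call has been made. Everything in it is an assumption for illustration: the function name, the 15-second timeout, the error marker strings, and the test SKU are hypothetical and would need to match your own storefront.

```python
def evaluate_step(status_code, body, expected_text, elapsed_seconds,
                  timeout_seconds=15.0,
                  error_markers=("an error occurred", "exception")):
    """Return a list of failure reasons for one journey step (empty means pass)."""
    failures = []
    if elapsed_seconds > timeout_seconds:
        failures.append("timeout")
    if status_code != 200:
        failures.append(f"unexpected status {status_code}")
    lowered = body.lower()
    # Hypothetical error strings; a 200 response can still carry an error page.
    if any(marker in lowered for marker in error_markers):
        failures.append("error message in response")
    # Stale or missing data: the thing the step was supposed to produce is absent.
    if expected_text.lower() not in lowered:
        failures.append("expected content missing")
    return failures


# A 200 response whose cart never shows the test product still fails the step.
print(evaluate_step(200, "<div>Your cart is empty</div>", "TEST-SKU-001", 1.2))
```

Returning reasons rather than a bare pass/fail keeps the alert actionable: the on-call engineer sees which criterion tripped without rerunning the script.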

Tools. Datadog Synthetic Monitoring, Checkly, or custom scripts running on AWS Lambda or similar. The infrastructure is less important than the discipline of building the scripts and keeping them updated.

Layer 3: Server and infrastructure health

The traditional infrastructure metrics that catch problems before they cause user-visible failure.

What to monitor. CPU utilization, memory utilization, disk space, network throughput, database query rate, database connections, Redis memory and eviction rate, OpenSearch cluster status. Standard observability stack output.

Thresholds. CPU sustained above 80% for 5+ minutes pages someone. Memory above 90% pages. Disk above 85% pages. Database connections approaching the configured maximum pages. Redis evictions above zero is a red flag that should be investigated even if not paging.
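The word "sustained" matters: a single spike should not page. One way to express that, assuming metric samples arrive as time-ordered `(timestamp_seconds, value)` pairs (the function name and sample format are assumptions for this sketch), is:

```python
def sustained_breach(samples, threshold=80.0, min_seconds=300):
    """True if values stay above threshold for a continuous run of min_seconds.

    samples: time-ordered (timestamp_seconds, value) pairs.
    """
    run_start = None
    for timestamp, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = timestamp  # start of the current breach run
            if timestamp - run_start >= min_seconds:
                return True
        else:
            run_start = None  # any sample at or below threshold resets the run
    return False


spike = [(0, 95.0), (60, 40.0), (120, 95.0), (180, 42.0)]
steady = [(t, 85.0) for t in range(0, 360, 60)]
print(sustained_breach(spike), sustained_breach(steady))
```

Most observability tools offer an equivalent "for N minutes" condition on alerts; the sketch only makes the logic explicit.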

Tools. CloudWatch (for AWS), New Relic, Datadog, or Adobe Commerce’s built-in observability on Adobe Commerce Cloud. The choice depends on the broader observability stack.

The infrastructure metric merchants most often miss. Cron job execution. Magento depends on cron for indexing, queue processing, and scheduled tasks. Failing cron jobs are silent until something downstream breaks. Monitor cron job completion status and time-since-last-success for each job; alert on any job that has not completed successfully in 24 hours.
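A time-since-last-success check is simple once the last successful finish time per job is available. The sketch below assumes that data has already been collected into a dictionary; how it is collected is left out, and the function name and job codes are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def stale_cron_jobs(last_success, now, max_age=timedelta(hours=24)):
    """Return job codes with no successful run within max_age.

    last_success maps job code -> datetime of last success, or None if never.
    """
    stale = []
    for job_code, finished_at in last_success.items():
        if finished_at is None or now - finished_at > max_age:
            stale.append(job_code)
    return sorted(stale)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
jobs = {
    "indexer_reindex_all_invalid": now - timedelta(minutes=5),
    "sales_clean_quotes": now - timedelta(hours=30),
    "newsletter_send_all": None,  # never succeeded
}
print(stale_cron_jobs(jobs, now))
```

Jobs that legitimately run less often than daily need their own `max_age`, otherwise the check produces exactly the kind of ignorable alert it is meant to prevent.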

Layer 4: Adobe Commerce-specific health

The metrics that are specific to Adobe Commerce and not covered by generic infrastructure monitoring.

Indexer status. Adobe Commerce uses indexers for catalog search, prices, stock, and similar data. An indexer that falls behind degrades search results and stock display. Monitor indexer status and alert on any indexer that stays in the “invalid” (reindex required) or “working” (processing) state for more than 30 minutes.

Message queue depth. Adobe Commerce uses message queues for async operations. A growing queue depth means messages are being produced faster than consumers are draining them, eventually leading to user-visible delays. Monitor queue depths and alert above defined thresholds.
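A fixed threshold catches a queue that is already too deep; a trend check catches one that is heading there. The sketch below combines both, assuming depth is sampled at a regular interval and kept as a list per queue. The function name, the queue names, the 1000-message threshold, and the five-sample window are all assumptions for illustration.

```python
def queue_alerts(depth_history, max_depth=1000, growth_window=5):
    """Flag queues that are too deep or whose depth rose on every recent sample.

    depth_history maps queue name -> list of depth samples, oldest first.
    """
    alerts = {}
    for queue, depths in depth_history.items():
        reasons = []
        if depths and depths[-1] > max_depth:
            reasons.append("depth above threshold")
        recent = depths[-growth_window:]
        # Strictly increasing across the whole window: consumers are not keeping up.
        if len(recent) == growth_window and all(
            later > earlier for earlier, later in zip(recent, recent[1:])
        ):
            reasons.append("depth growing steadily")
        if reasons:
            alerts[queue] = reasons
    return alerts


history = {
    "orders.async": [10, 40, 90, 200, 450],
    "catalog.export": [5, 3, 0, 2, 1],
    "media.resize": [1200],
}
print(queue_alerts(history))
```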

Cache hit rates. Full page cache hit rate, configuration cache hit rate, block cache hit rate. Sudden drops indicate cache invalidation issues, often introduced by deploys.
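Hit rate is hits divided by total lookups, and a "sudden drop" is a comparison between two adjacent measurement windows. A minimal sketch, assuming hit and miss counters are available per window (the function names and the 15-point drop threshold are assumptions, not platform defaults):

```python
def hit_rate(hits, misses):
    """Fraction of lookups served from cache, or None when there was no traffic."""
    total = hits + misses
    return hits / total if total else None


def sudden_drop(previous_rate, current_rate, max_drop=0.15):
    """True when the hit rate fell by more than max_drop between two windows."""
    if previous_rate is None or current_rate is None:
        return False  # not enough data to compare
    return previous_rate - current_rate > max_drop


before_deploy = hit_rate(hits=900, misses=100)
after_deploy = hit_rate(hits=600, misses=400)
print(before_deploy, after_deploy, sudden_drop(before_deploy, after_deploy))
```

Comparing the window just before a deploy with the window just after it is the most direct way to tie a drop to the deploy that caused it.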

Patch level. The current Magento and Adobe Commerce patch level vs. the latest available patches. Monitor weekly and report on any growing gap. The Adobe Security Bulletins drive this; subscribe to the feed.

Custom code health. Application-level error rate from custom modules. Sentry, Rollbar, or Adobe Commerce’s logging pipeline. Alert on error rate spikes and on new error types appearing.

Layer 5: Business-meaningful metrics

The metrics that translate technical state into business impact.

Conversion rate. Hour-over-hour and day-over-day. A drop of 20% or more in conversion rate while traffic holds roughly steady usually signals a real issue, often before technical monitoring catches it.
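The comparison can be sketched as follows, assuming order and session counts are available for a baseline period and the current period. The function names and the 25% traffic tolerance used to decide whether the two periods are comparable are assumptions for this example; the 20% drop is the figure from the text.

```python
def conversion_rate(orders, sessions):
    """Orders per session, or None when there were no sessions."""
    return orders / sessions if sessions else None


def conversion_status(baseline, current, min_drop=0.20, traffic_tolerance=0.25):
    """Compare two (orders, sessions) periods: 'alert', 'ok', or 'inconclusive'."""
    base_rate = conversion_rate(*baseline)
    current_rate = conversion_rate(*current)
    if not base_rate or current_rate is None:
        return "inconclusive"  # no baseline conversions or no current traffic
    traffic_change = abs(current[1] - baseline[1]) / baseline[1]
    if traffic_change > traffic_tolerance:
        return "inconclusive"  # traffic moved too much for a like-for-like check
    relative_drop = (base_rate - current_rate) / base_rate
    return "alert" if relative_drop >= min_drop else "ok"


# Same hour yesterday vs. this hour: 3.0% down to 2.1% on equal traffic.
print(conversion_status(baseline=(30, 1000), current=(21, 1000)))
```

The "inconclusive" result is deliberate: when a campaign doubles traffic, a lower conversion rate may be expected, and a human should look rather than a pager fire.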

Cart abandonment patterns. Cart adds that did not progress to checkout, checkouts that did not complete. Pattern changes flag flow regressions.

Search results returning zero matches. Search is a primary discovery mechanism; an unexpected spike in zero-result searches usually indicates an indexer problem or a catalog issue.

Payment method failure rates. Per-payment-method success rates. A payment processor having issues is invisible to standard monitoring until customers cannot complete checkout.
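Per-method rates can be computed from a stream of payment attempts. In this sketch each attempt is a `(method, succeeded)` pair; the function name, the method labels, the 10% failure threshold, and the minimum of 20 attempts are all assumptions chosen for illustration.

```python
from collections import defaultdict


def failing_methods(attempts, max_failure_rate=0.10, min_attempts=20):
    """Return {method: failure rate} for methods failing more often than allowed.

    attempts: iterable of (method, succeeded) pairs from one time window.
    """
    totals = defaultdict(lambda: [0, 0])  # method -> [failures, attempts]
    for method, succeeded in attempts:
        totals[method][1] += 1
        if not succeeded:
            totals[method][0] += 1
    flagged = {}
    for method, (failures, count) in totals.items():
        # Skip low-volume methods, where a few failures would dominate the rate.
        if count >= min_attempts and failures / count > max_failure_rate:
            flagged[method] = failures / count
    return flagged


window = (
    [("card", True)] * 45 + [("card", False)] * 5
    + [("wallet", True)] * 12 + [("wallet", False)] * 8
    + [("invoice", False)] * 3
)
print(failing_methods(window))
```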

Average order value and order count. Daily comparison to expected baseline. Significant deviation flags either marketing changes or technical issues affecting the funnel.

How the layers should drive response

A useful operational rhythm:

| Layer | Response cadence | Owner | Escalation |
| --- | --- | --- | --- |
| External availability | Immediate (paging) | Maintenance team on-call | Within 15 min if not resolved |
| Application health | Immediate (paging) for hard failures, daily review for soft | Maintenance team | Within 30 min for outages |
| Server/infrastructure | Threshold-based paging | Hosting team or maintenance team | Within 30 min for paging events |
| Adobe Commerce-specific | Daily review for non-critical, immediate for indexer/queue spikes | Maintenance team | Within 2 hours for blockers |
| Business metrics | Daily review with merchant team | Joint review | Investigation triggered by anomaly |

The merchants who try to page on every layer end up with alert fatigue and miss the important signals. The merchants who only page on layer 1 miss everything that happens before customers complain.

What the maintenance team’s monitoring deliverable looks like

In a healthy Bemeir maintenance retainer engagement, the merchant should see these monitoring artifacts:

A dashboard. Single view with the key metrics across all five layers. Glanceable, with the right granularity. The merchant should be able to look at it for 30 seconds and know whether the site is healthy.

A weekly status report. Written summary: what happened, what was the response, what trends are visible, what is the patch level, what is in flight. Two pages, not twenty.

A monthly incident review. Every incident from the prior month with root cause, time to detect, time to mitigate, and what changed in response. Even non-customer-impacting incidents go through this review because they teach you about resilience.

A quarterly resilience review. Where would the site be most fragile if a specific dependency failed? Database, Redis, Varnish, image CDN, search? The review identifies the top three resilience gaps and prioritizes investment.

The signals to add when the merchant is ready

Once the standard monitoring is in place, mature engagements add three categories of advanced signal:

Real-user monitoring (RUM). Page-load performance from real visitor browsers, segmented by device, geography, and customer state. This catches the long-tail performance issues synthetic monitoring misses.

Distributed tracing. Request-level tracing through the Adobe Commerce request lifecycle and out to dependencies. Useful when isolating slow database queries or third-party service degradation.

Anomaly detection. Machine-learning-driven detection of unusual patterns, not just threshold breaches. Useful for catching subtle issues like a custom module starting to log new error types without crossing an absolute threshold.

These signals are not necessary on day one, but they are valuable additions as the engagement matures.

What good monitoring is not

Three patterns that look like monitoring but produce no value:

Dashboards nobody looks at. A wall of graphs that nobody reviews is theater, not monitoring. The dashboard should be reviewed at known cadences (daily or by exception) and should change behavior when it shows problems.

Alerts that nobody responds to. An alert that fires and is ignored teaches the team that alerts can be ignored. Either silence the alert because it is not actionable, or commit to responding when it fires.

Monitoring that does not survive deploys. Some monitoring scripts break silently when the site changes. A test that always passes because the assertion no longer applies is worse than no test. Regularly audit that monitoring is detecting actual failures.

Bemeir’s maintenance team treats monitoring as core retainer work, not as something added after launch. The merchants who get this discipline have fewer incidents, shorter time-to-detect when incidents happen, and a clearer view of where the platform is heading over time. The Adobe Commerce operations documentation provides the platform-level guidance; the operational rigor above is what bridges the platform documentation to actual production calm. The work is well-understood; the cost of skipping it is paid in customer-impacting incidents that better monitoring would have caught two days earlier.
