
Zero-trust architecture for eCommerce requires: continuous identity verification (never implicit trust), micro-segmentation (isolate critical systems), least-privilege access (minimize user permissions), API security (token-based, not API keys), real-time monitoring (detect anomalies), and incident response workflows. Implementation spans 12 – 18 months for brownfield platforms and includes identity provider setup, API gateway hardening, and network isolation on AWS/GCP/Azure infrastructure.
Zero-Trust Architecture for eCommerce: A Practitioner’s Checklist
“Zero-trust” is the security buzzword that actually matters. Unlike most security terminology, zero-trust isn’t about compliance theater; it’s about architecture that survives real attacks.
The idea: Never trust anyone or anything, even inside your network. Verify every access request. Log everything. Assume breach.
For eCommerce, zero-trust means: A customer logging in, an employee accessing the database, a payment processor calling your API – all are verified in real time, every time. No exceptions. No “trusted” internal networks where attackers can move laterally once they get in.
Bemeir has implemented zero-trust on 15+ eCommerce platforms, from Shopify Plus to custom Magento builds to serverless architectures on AWS. Here’s the real checklist – what your CTO needs to build, what your team needs to operate, and where to start.
Phase 1: Assessment & Strategy (Weeks 1 – 3)
Current State Inventory
- [ ] Map all systems and services (list every application, database, API, third-party integration)
- [ ] Identify data flows (where does customer data move? Where does it sit at rest?)
- [ ] List all identities (employees, contractors, API keys, service accounts, automation users)
- [ ] Document access patterns (who logs in to what? How often? From where?)
- [ ] Identify trust boundaries currently in place (what systems are “trusted”? Which aren’t?)
- [ ] List external integrations (payment processors, CDN, email, marketing, analytics)
- [ ] Audit VPN usage (who uses it? Why? For how long?)
- [ ] Document current logging & monitoring (what events are logged? Who reviews logs?)
- [ ] Identify compliance requirements (GDPR, SOC 2, PCI-DSS, industry-specific?)
Why It Matters: You can’t build zero-trust without understanding your current architecture. Bemeir once started a zero-trust project at a platform that had “forgotten” about a legacy SSH jump box that anyone with a password could access. A single hole like that defeats zero-trust.
Critical Asset Identification
- [ ] Define “crown jewels” (customer payment data, customer PII, your source code, admin dashboards)
- [ ] Prioritize systems by risk (payment processing = high risk; blog = low risk)
- [ ] Map which assets interact (does payment service need access to customer database? Really?)
- [ ] Identify attack surface (which systems are exposed to internet? Which are internal-only?)
- [ ] Document data sensitivity (public data vs. confidential vs. secret)
- [ ] List compliance-sensitive assets (systems that need audit logs, encryption, etc.)
Organizational & Process Audit
- [ ] Document current onboarding process (when new employee joins, how do they get access?)
- [ ] List offboarding procedures (when someone leaves, do all their credentials actually get revoked?)
- [ ] Check for shared accounts (do multiple people share one database password? SSH key? API key?)
- [ ] Document change management (who can deploy code? What’s the approval process?)
- [ ] Identify incident response procedures (if hacked at 2am, who does what?)
- [ ] List compliance/audit obligations (annual security audit? Monthly? Real-time monitoring?)
- [ ] Assess team capability (do you have someone who understands mTLS? OAuth2? Network segmentation?)
Honest Reality: Most eCommerce companies score poorly here. Shared database passwords, no change log, vague incident response. This is normal; zero-trust fixes it.
Business Impact Planning
- [ ] Identify “can’t fail” systems (payment processing, order fulfillment, customer login)
- [ ] Estimate downtime tolerance (can payment API be down for 1 minute? 1 hour?)
- [ ] Calculate cost of breach (estimated loss of customer trust + liability + remediation)
- [ ] Plan for false positives (zero-trust monitoring creates alerts; who investigates them?)
- [ ] Estimate user impact (will employees notice higher friction during auth? How much is acceptable?)
- [ ] Define success metrics (faster incident detection? Fewer breaches? Compliance pass?)
Phase 2: Identity & Access Architecture (Weeks 4 – 8)
Identity Provider Setup
- [ ] Choose identity provider (Okta, Microsoft Entra ID (formerly Azure AD), Auth0, custom OAuth2 provider on AWS?)
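- [ ] Implement multi-factor authentication (MFA for all human users; service accounts can’t complete MFA, so give them short-lived credentials via OAuth2 client credentials or workload identity instead)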
- [ ] Set up identity federation (can contractors/partners authenticate without your directory?)
- [ ] Plan for passwordless authentication (FIDO2, phone sign-in, biometric)
- [ ] Configure session management (how long before session expires? Can users have multiple sessions?)
- [ ] Implement device posture checking (can users only log in from company devices? Or any device?)
- [ ] Set up real-time revocation (if someone is fired, how fast can you revoke their session?)
- [ ] Plan for audit logging (every authentication event logged? Include success and failures?)
Bemeir Recommendation: Okta or Auth0 for most mid-market eCommerce. They’re managed, handle MFA well, and integrate with everything. If you’re AWS-only, consider Cognito + IAM.
Access Control Architecture
- [ ] Implement role-based access control (RBAC) for humans (employees, contractors, vendors)
- [ ] Define roles (Developer, DevOps, Product Manager, Customer Support, Finance, etc.)
- [ ] Map permissions per role (Developer can deploy to staging but not production)
- [ ] Implement attribute-based access control (ABAC) for fine-grained rules (access allowed only if: user is in “payments” team AND request is from office network AND time is 9am-6pm)
- [ ] Set up just-in-time (JIT) access (temporary elevated permissions for emergency fixes, auto-revoke after 1 hour)
- [ ] Plan for access reviews (quarterly, manually verify: does this person still need this role?)
- [ ] Document approval workflow (how do people request access? Who approves? What’s the SLA?)
Common Mistake: Setting up RBAC but never reviewing it. Bemeir clients typically find that 30% of access permissions are outdated (person was promoted, role changed, but permissions never updated).
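A minimal sketch of how the RBAC, ABAC, and JIT items above fit together. The role names, permission strings, and the `User` shape are hypothetical; in practice these come from your identity provider rather than from application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical role-to-permission map (illustrative only).
ROLE_PERMISSIONS = {
    "developer": {"deploy:staging"},
    "devops": {"deploy:staging", "deploy:production"},
    "support": {"orders:read"},
}


@dataclass
class User:
    name: str
    roles: set
    team: str
    # Just-in-time grants: permission -> expiry time.
    jit_grants: dict = field(default_factory=dict)


def grant_jit(user, permission, now, ttl=timedelta(hours=1)):
    """Grant a temporary permission that lapses on its own after `ttl`."""
    user.jit_grants[permission] = now + ttl


def has_permission(user, permission, now):
    """RBAC check, plus any unexpired just-in-time grant."""
    for role in user.roles:
        if permission in ROLE_PERMISSIONS.get(role, set()):
            return True
    expiry = user.jit_grants.get(permission)
    return expiry is not None and now < expiry


def abac_allows_payments(user, from_office_network, now):
    """ABAC rule from the checklist: payments team AND office network AND 9am-6pm."""
    return user.team == "payments" and from_office_network and 9 <= now.hour < 18


dev = User("dana", {"developer"}, team="storefront")
noon = datetime(2024, 1, 15, 12, 0)
print(has_permission(dev, "deploy:production", noon))  # developer role alone
grant_jit(dev, "deploy:production", noon)              # emergency fix, 1-hour grant
print(has_permission(dev, "deploy:production", noon + timedelta(minutes=30)))
print(has_permission(dev, "deploy:production", noon + timedelta(hours=2)))
```

Because the JIT grant carries its own expiry, nobody has to remember to revoke it. Quarterly reviews then only need to cover the standing role assignments.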
Service-to-Service Authentication
- [ ] Stop using hardcoded API keys (replace with OAuth2 client credentials flow or mTLS)
- [ ] Implement OAuth2 / OIDC for all service-to-service communication
- [ ] Set up mutual TLS (mTLS) for critical service boundaries (payment → order service, for example)
- [ ] Plan for token rotation (tokens expire after 1 hour; automatically refresh)
- [ ] Implement token binding (token is bound to a specific service identity, such as the caller’s mTLS certificate, so a stolen token can’t be reused elsewhere)
- [ ] Monitor token usage (if a service starts requesting tokens at 10× normal rate, alert)
- [ ] Plan for service discovery (if services are containerized, how do they find each other securely?)
Architecture Pattern: This is where zero-trust gets real. Old way: “Service A runs on 10.0.1.x IP range; Service B trusts all requests from that range.” Zero-trust way: “Service A requests a token from identity provider; presents token to Service B; Service B validates token every time.”
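The pattern above can be sketched in a few lines. This is a simplified stand-in, not real OAuth2: a shared HMAC key plays the role of the identity provider's signing key, and the token format is made up for illustration. A production setup would typically use signed JWTs issued by the identity provider. The point it illustrates is that Service B checks signature, audience, and expiry on every request instead of trusting the caller's IP range.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a shared key stands in for the identity provider's key material.
SIGNING_KEY = b"demo-key-do-not-use-in-production"


def issue_token(service, audience, ttl_seconds=3600, now=None):
    """Identity provider side: sign a short-lived claim set."""
    now = time.time() if now is None else now
    claims = {"sub": service, "aud": audience, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def validate_token(token, expected_audience, now=None):
    """Service B side: return the caller's identity, or None if the token is rejected."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # not in "<payload>.<signature>" form
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["aud"] != expected_audience or now >= claims["exp"]:
        return None
    return claims["sub"]


token = issue_token("service-a", "service-b")
print(validate_token(token, "service-b"))  # caller identity when the token is valid
print(validate_token(token, "service-c"))  # None: token was issued for a different audience
```

The audience check is what keeps a token minted for one service from being replayed against another. The one-hour default expiry matches the rotation interval in the checklist.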
External Integration Security
- [ ] Audit all third-party integrations (payment processors, shipping APIs, CRM, analytics)
- [ ] Check: which integrations have persistent API keys? (Replace with OAuth2 + scoped access)
- [ ] Plan for least-privilege API scopes (payment processor only needs to read invoices, not modify customers)
- [ ] Set up API key rotation (rotate keys every 90 days minimum)
- [ ] Implement webhook validation (verify webhooks come from actual third-party, not attacker)
- [ ] Monitor third-party API usage (unusual spike in API calls? Alert)
- [ ] Plan for vendor security (have you asked: does your payment processor have SOC 2? Do they scan for vulnerabilities?)
| Integration | Current Model | Zero-Trust Model |
|---|---|---|
| Payment Processor | Persistent API key in code | OAuth2 + token (1-hour expiry) |
| Shipping API | API key in environment variable | OAuth2 + scoped permission (“read shipment”, not “delete shipment”) |
| Email Service | API key in config file | OAuth2 + webhook validation |
| Analytics | Beacon pixel + API key | OIDC + scoped access |
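For the webhook validation item in the checklist above, a common approach is an HMAC signature over the raw request body. The sketch below assumes the provider sends a hex-encoded HMAC-SHA256 signature computed with a shared secret. Header names, encodings, and timestamp schemes differ between providers, so check your vendor's documentation before relying on this shape.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """What the sender is assumed to compute: hex HMAC-SHA256 of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = b"whsec_demo"  # hypothetical shared secret
body = b'{"event":"order.paid","id":42}'
signature = sign_payload(secret, body)
print(is_valid_webhook(secret, body, signature))            # True
print(is_valid_webhook(secret, body + b"x", signature))     # False: body was altered
```

Verify against the raw bytes as received, before any JSON parsing. Re-serializing the payload can change whitespace or key order and break the signature.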
Phase 3: Network & Infrastructure Hardening (Weeks 9 – 13)
Micro-Segmentation
- [ ] Map network boundaries (separate payment service from blog? Yes)
- [ ] Implement network policies (security groups, NACLs, Calico policies if Kubernetes)
- [ ] Segment by sensitivity (crown jewel services are most restricted)
- [ ] Plan for east-west traffic rules (traffic between internal services; most should be blocked by default)
- [ ] Implement API gateway (single entry point for all external requests; validate every request)
- [ ] Set up web application firewall (WAF) rules (block SQL injection, XSS, DDoS patterns)
- [ ] Plan for service-to-service mTLS (payment service can only call order service if authenticated)
- [ ] Monitor unusual traffic patterns (spike in traffic between normally isolated systems? Alert)
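Practical Example:
Old way (Implicit Trust): any service inside the network can reach any other, so a compromised blog server can open a connection straight to the payment service.
Zero-trust way: east-west traffic is denied by default; only explicitly allowed, authenticated calls (payment service → order service, for example) get through.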
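To make the contrast concrete, here is a small sketch of the two decision rules. The service names and the allow-list are hypothetical. In a real deployment the rule lives in security groups, Kubernetes network policies, or a service mesh, not in application code.

```python
import ipaddress

# Old way: one "trusted" internal range (the 10.0.1.x example).
TRUSTED_RANGE = ipaddress.ip_network("10.0.1.0/24")

# Zero-trust way: explicit (caller, callee) allow-list; everything else is denied.
ALLOWED_FLOWS = {
    ("api-gateway", "order-service"),
    ("payment-service", "order-service"),
    ("api-gateway", "blog"),
}


def old_way_allows(source_ip: str) -> bool:
    """Implicit trust: any request from the internal range is accepted."""
    return ipaddress.ip_address(source_ip) in TRUSTED_RANGE


def zero_trust_allows(caller: str, callee: str, authenticated: bool) -> bool:
    """Default deny: the caller must be authenticated AND the flow must be listed."""
    return authenticated and (caller, callee) in ALLOWED_FLOWS


print(old_way_allows("10.0.1.57"))                                   # True, whoever it is
print(zero_trust_allows("payment-service", "order-service", True))   # True
print(zero_trust_allows("blog", "payment-service", True))            # False: flow not listed
```

Under the old rule, a compromised host in the trusted range inherits access to everything. Under the allow-list, it gets only the flows written down for its own identity.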
Infrastructure Access Control
- [ ] Implement bastion hosts / jump boxes with MFA (only way to SSH to servers)
- [ ] Disable direct SSH to production servers (all access goes through bastion, is logged)
- [ ] Use IAM roles instead of long-lived credentials (AWS EC2 instance gets temporary role credentials; no access keys stored on the instance)
- [ ] Enable AWS Systems Manager Session Manager (access instances without SSH keys; all sessions logged)
- [ ] Remove standing VPN access (VPN is implicit trust; replace with specific, temporary access per task)
- [ ] Plan for database access (no direct database access from laptops; go through application only)
- [ ] Implement database activity monitoring (log all queries; alert on unusual access patterns)
Why This Matters: A developer’s laptop gets stolen. On the old model, attacker has SSH keys, database passwords, API keys – everything. Zero-trust model: attacker has nothing useful; every action requires real-time authentication from the identity provider.
Container & Infrastructure Hardening
- [ ] Implement image scanning (scan container images for vulnerabilities before deployment)
- [ ] Sign container images (only run images built and signed by trusted pipeline)
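- [ ] Enforce Pod Security Standards (Kubernetes: restrict what containers can do; PodSecurityPolicy was removed in Kubernetes 1.25, so use Pod Security Admission or a policy engine)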
- [ ] Implement network policies in Kubernetes (pod-to-pod traffic rules)
- [ ] Monitor container runtime behavior (detect anomalies: unexpected process, network conn, file writes)
- [ ] Use least-privilege container security context (container runs as non-root, read-only filesystem)
- [ ] Implement admission controllers (Kubernetes: block deployments that don’t meet security standards)
- [ ] Plan for supply chain security (dependencies, open-source libraries – are they from trusted sources?)
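The admission-control and least-privilege items above boil down to "reject the deployment unless it meets the baseline." The sketch below shows that check against a pod spec represented as a plain dict. `runAsNonRoot` and `readOnlyRootFilesystem` are real Kubernetes container `securityContext` fields. The registry prefix is a hypothetical stand-in for "built by our trusted pipeline", and a real admission controller would also verify image signatures and pod-level settings.

```python
def admission_violations(pod_spec: dict, trusted_registry: str = "registry.example.com/") -> list:
    """Return a list of reasons to reject the pod; an empty list means admit."""
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext", {})
        if not security.get("runAsNonRoot", False):
            violations.append(f"{name}: must set runAsNonRoot")
        if not security.get("readOnlyRootFilesystem", False):
            violations.append(f"{name}: must set readOnlyRootFilesystem")
        if not container.get("image", "").startswith(trusted_registry):
            violations.append(f"{name}: image not from trusted registry")
    return violations


hardened = {"containers": [{
    "name": "web",
    "image": "registry.example.com/shop/web:1.4",
    "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
}]}
risky = {"containers": [{"name": "debug", "image": "docker.io/library/busybox"}]}

print(admission_violations(hardened))  # []
print(admission_violations(risky))     # three violations for the "debug" container
```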
Phase 4: Detection, Monitoring & Response (Weeks 14 – 18)
Logging & Data Collection
- [ ] Enable audit logging for all systems (identity provider, databases, APIs, infrastructure)
- [ ] Aggregate logs to central location (ELK, Splunk, CloudWatch, Datadog)
- [ ] Log all authentication events (success and failure; include context: IP, device, location, time)
- [ ] Log all data access (who read customer records? When? From where?)
- [ ] Log all privileged actions (who deployed code? Who changed firewall rules? Etc.)
- [ ] Log API calls (which service called which? What data was accessed?)
- [ ] Enable AWS CloudTrail (if AWS-based) for all API calls
- [ ] Plan for log retention (how long to keep logs? Compliance requirement? 90 days? 1 year?)
- [ ] Ensure logs are immutable (once written, can’t be modified or deleted; prevents attackers from covering tracks)
Volume Reality: A mid-market eCommerce platform generates 10 – 100GB of logs per day. You need:
– Storage: S3, Glacier for long-term
– Analytics: Splunk or Datadog to query and analyze
– Retention policy: Keep 90 days hot, 1 year archived, destroy after 7 years (or comply with legal hold)
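On the immutability item in the logging checklist: true immutability usually comes from write-once storage (S3 Object Lock, for example), but a hash chain is a cheap way to make tampering detectable. The sketch below is illustrative, not a production log format. Each entry stores the hash of the previous one, so editing or deleting an earlier record breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, reordered, or removed entry fails."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"user": "dana", "action": "login", "ok": True})
append_entry(audit_log, {"user": "dana", "action": "read_customer", "id": 42})
print(verify_chain(audit_log))            # True
audit_log[0]["event"]["user"] = "nobody"  # attacker rewrites history
print(verify_chain(audit_log))            # False
```

A chain like this only detects tampering after the fact, and an attacker who can rewrite the whole log can rebuild it. That is why the latest hash should be anchored somewhere the attacker cannot reach.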
Anomaly Detection & Alerting
- [ ] Set up SIEM (Security Information & Event Management) rules:
  - [ ] Alert: Failed login attempts (10+ in 5 minutes = potential attack)
  - [ ] Alert: Unusual geographic location (employee in NYC yesterday, Tokyo today = suspicious)
  - [ ] Alert: Privilege escalation (regular user suddenly requests admin role = review)
  - [ ] Alert: Data exfiltration pattern (service requesting 10,000× normal amount of data = investigate)
  - [ ] Alert: Off-hours access (critical system accessed at 3am = verify)
- [ ] Implement behavior analytics (what’s normal? What’s anomalous? ML helps detect novel attacks)
- [ ] Set up alerting thresholds (don’t alert on every anomaly; prioritize high-risk ones)
- [ ] Plan for alert fatigue (too many alerts = people ignore them; tune carefully)
- [ ] Document response procedures (alert fires; what do you do next?)
Tuning: Bemeir typically runs new SIEM rules in “dry run” mode for 2 weeks first. Expect a flood of false positives at the start; once the rules are tuned, the alerts that remain are actionable.
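The failed-login rule (10+ failures in 5 minutes) is a sliding-window count. In practice you would express it in your SIEM's rule language. The sketch below only shows the logic, with the threshold and window as tunable parameters, and assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class FailedLoginDetector:
    """Sliding-window counter: alert when one account fails too often."""

    def __init__(self, threshold=10, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # account -> recent failure timestamps

    def record_failure(self, account: str, timestamp: float) -> bool:
        """Record a failed login; return True if the alert threshold is reached."""
        window = self._failures[account]
        window.append(timestamp)
        # Drop failures that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


detector = FailedLoginDetector()
alerts = [detector.record_failure("alice", second * 10.0) for second in range(10)]
print(alerts)  # nine False values, then True on the tenth failure
```

During the dry-run period, log the would-be alerts instead of paging anyone, then adjust `threshold` and `window_seconds` against what you actually see.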
Incident Response Procedures
- [ ] Document incident response plan (who to call? What’s the first step?)
- [ ] Define severity levels (Severity 1 = customer data may be exposed; Severity 2 = system down but no data risk)
- [ ] Create incident response runbooks (if X event occurs, steps 1 – 10 are: …)
- [ ] Plan for containment (how do you stop an active attack? Revoke tokens? Isolate systems?)
- [ ] Define forensics process (if breached, who collects evidence? Chain of custody?)
- [ ] Plan for communication (who tells customers? Legal? Insurance? Regulators?)
- [ ] Set up war room procedures (incident happens → team joins Slack channel → someone leads; clear roles)
- [ ] Document post-incident review (after incident, analyze: what happened? What failed? What do we fix?)
Critical: Have this plan before an incident. Bemeir recommends tabletop exercises quarterly (“If we were breached tomorrow, who would…?”).
Continuous Security Testing
- [ ] Implement vulnerability scanning (automated: scan systems weekly for known vulnerabilities)
- [ ] Perform penetration testing (hired hackers, annual or bi-annual, attempt to break in; report findings)
- [ ] Set up DAST (Dynamic Application Security Testing) (automated: test running web app for common vulns: SQLi, XSS, CSRF)
- [ ] Implement SAST (Static Application Security Testing) (code analysis tool runs on every code commit; flags security issues)
- [ ] Plan for supply chain security scans (check dependencies for known vulns: npm audit, Snyk, etc.)
- [ ] Document remediation timelines (critical vuln found; how fast can you patch? 24 hours? 1 week?)
| Testing Type | Frequency | Purpose | Cost |
|---|---|---|---|
| Vulnerability Scanning | Weekly automated | Find known vulns | $5 – 20K/year (tool) |
| SAST (code analysis) | Per commit | Catch security bugs in code | $20 – 50K/year |
| DAST (runtime testing) | Daily automated | Test running app | $15 – 40K/year |
| Penetration Testing | Annual/Bi-annual | Real-world attack sim | $30 – 80K per engagement |
Phase 5: Governance & Continuous Improvement (Ongoing)
Access Reviews & Recertification
- [ ] Quarterly access review (all employees: do they still need their current permissions?)
- [ ] Manager certification (manager must certify: “Yes, this person still needs this access”)
- [ ] Contractor audit (contractors should have temporary access that is revoked when the project ends)
- [ ] Service account review (API keys, service credentials – are they still needed? Rotate quarterly)
- [ ] Privileged access review (who has admin? Do they really need it? Keep it to a small share of staff; the KPI target in this checklist is <5%)
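Access reviews go faster when reviewers start from a shortlist rather than the full permission matrix. One simple heuristic, sketched below, is to flag grants that have not been used recently. The record shape and the 90-day idle threshold are assumptions chosen to line up with a quarterly review cadence; usage data would come from your identity provider or audit logs.

```python
from datetime import date, timedelta


def stale_grants(grants, today, max_idle_days=90):
    """Return grants never used, or not used within `max_idle_days`."""
    cutoff = today - timedelta(days=max_idle_days)
    return [g for g in grants if g["last_used"] is None or g["last_used"] < cutoff]


grants = [
    {"user": "dana", "permission": "deploy:production", "last_used": date(2024, 1, 5)},
    {"user": "lee", "permission": "db:admin", "last_used": date(2024, 5, 20)},
    {"user": "sam", "permission": "orders:export", "last_used": None},
]

for grant in stale_grants(grants, today=date(2024, 6, 1)):
    print(grant["user"], grant["permission"])  # candidates for the manager to certify or revoke
```

A flag is a prompt for the manager's decision, not an automatic revocation: some legitimate access (break-glass admin, year-end finance tasks) is rarely used by design.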
Policy & Documentation
- [ ] Document zero-trust architecture (diagram, decision log, rationale)
- [ ] Create security policy (what’s acceptable? What’s prohibited?)
- [ ] Build change management policy (how are changes approved? Who can deploy?)
- [ ] Establish incident response policy (who’s involved? What’s the communication plan?)
- [ ] Document compliance requirements (GDPR, SOC 2, PCI – what specifically do we need to do?)
- [ ] Train team on policies (everyone understands expectations)
Metrics & KPIs
- [ ] Track mean time to detect (MTTD) a breach (goal: <1 hour for critical incidents)
- [ ] Track mean time to respond (MTTR) (goal: <4 hours)
- [ ] Monitor authentication latency (goal: <100ms even during peak load)
- [ ] Measure false positive rate (goal: <10% of alerts are false)
- [ ] Track access request SLA (goal: access granted within 24 hours)
- [ ] Monitor privileged access usage (goal: <5% of staff have admin; document why)
- [ ] Calculate security improvement (fewer incidents? Faster detection? Report to board)
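A worked example of the arithmetic behind these KPIs, using made-up incident data. It assumes MTTD is measured from when an incident began to when it was detected, and MTTR from detection to response. Definitions vary between teams, so write yours down before reporting numbers to the board.

```python
from datetime import datetime
from statistics import mean


def mean_hours(intervals):
    """Average duration in hours over (start, end) datetime pairs."""
    return mean((end - start).total_seconds() for start, end in intervals) / 3600


# Hypothetical incidents: when each began, was detected, and was responded to.
incidents = [
    {"occurred": datetime(2024, 3, 1, 10, 0), "detected": datetime(2024, 3, 1, 10, 30),
     "responded": datetime(2024, 3, 1, 12, 30)},
    {"occurred": datetime(2024, 4, 9, 9, 0), "detected": datetime(2024, 4, 9, 10, 30),
     "responded": datetime(2024, 4, 9, 13, 30)},
]

mttd = mean_hours([(i["occurred"], i["detected"]) for i in incidents])
mttr = mean_hours([(i["detected"], i["responded"]) for i in incidents])
false_positive_rate = 8 / 100  # e.g. 8 of 100 alerts turned out to be false

print(f"MTTD: {mttd:.1f} h (goal < 1 h)")   # (0.5 + 1.5) / 2 = 1.0 h, just misses the goal
print(f"MTTR: {mttr:.1f} h (goal < 4 h)")   # (2.0 + 3.0) / 2 = 2.5 h
print(f"False positives: {false_positive_rate:.0%} (goal < 10%)")
```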
Common Implementation Blockers & Solutions
| Blocker | Root Cause | Solution |
|---|---|---|
| MFA breaks integrations | Third-party tools don’t support MFA | Use service accounts with OAuth2 instead of human user MFA |
| Latency increases | Token validation adds overhead | Cache the identity provider’s signing keys and validate signed tokens locally on each request, instead of calling the provider every time |
| Team resists change | “This is too much friction” | Gradual rollout; start with non-critical systems; measure friction |
| Legacy system can’t do OAuth2 | Old application doesn’t support modern auth | Build OAuth2 adapter/shim; eventually sunset legacy system |
| Too many false positives | SIEM rules too strict | Tune rules; run in dry-run mode first; educate on what’s “normal” |
| Cost too high | Identity provider, SIEM, security tools expensive | Start with open-source (Keycloak, ELK); graduate to managed as you scale |





