
Zero-trust architecture – Checklist

Zero-trust architecture for eCommerce requires: continuous identity verification (never implicit trust), micro-segmentation (isolate critical systems), least-privilege access (minimize user permissions), API security (token-based, not API keys), real-time monitoring (detect anomalies), and incident response workflows. Implementation spans 12 – 18 months for brownfield platforms and includes identity provider setup, API gateway hardening, and network isolation on AWS/GCP/Azure infrastructure.

Zero-Trust Architecture for eCommerce: A Practitioner’s Checklist

“Zero-trust” is the security buzzword that actually matters. Unlike most security terminology, zero-trust isn’t about compliance theater; it’s about architecture that survives real attacks.

The idea: Never trust anyone or anything, even inside your network. Verify every access request. Log everything. Assume breach.

For eCommerce, zero-trust means: A customer logging in, an employee accessing the database, a payment processor calling your API – all are verified in real time, every time. No exceptions. No “trusted” internal networks where hackers can move laterally once they get in.

Bemeir has implemented zero-trust on 15+ eCommerce platforms, from Shopify Plus to custom Magento builds to serverless architectures on AWS. Here’s the real checklist – what your CTO needs to build, what your team needs to operate, and where to start.

Phase 1: Assessment & Strategy (Weeks 1 – 3)

Current State Inventory

[ ] Map all systems and services (list every application, database, API, third-party integration)
[ ] Identify data flows (where does customer data move? Where does it sit at rest?)
[ ] List all identities (employees, contractors, API keys, service accounts, automation users)
[ ] Document access patterns (who logs in to what? How often? From where?)
[ ] Identify trust boundaries currently in place (what systems are “trusted”? Which aren’t?)
[ ] List external integrations (payment processors, CDN, email, marketing, analytics)
[ ] Audit VPN usage (who uses it? Why? For how long?)
[ ] Document current logging & monitoring (what events are logged? Who reviews logs?)
[ ] Identify compliance requirements (GDPR, SOC 2, PCI-DSS, industry-specific?)

Why It Matters: You can’t build zero-trust without understanding your current architecture. Bemeir once started a zero-trust project at a platform that had “forgotten” about a legacy SSH jump box that anyone with a password could access. That one hole defeats zero-trust.
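One way to catch a host like that is to diff what actually answers on the network against what the inventory says should exist. A minimal sketch, assuming you already have discovered endpoints from a scanner or your cloud provider's APIs plus an inventory export; the `Endpoint` type and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical record for one listening service."""
    host: str  # IP address or DNS name
    port: int


def _sort_key(e: Endpoint) -> tuple[str, int]:
    return (e.host, e.port)


def find_unmanaged(discovered: set[Endpoint], inventory: set[Endpoint]) -> list[Endpoint]:
    """Endpoints answering on the network that nobody has inventoried."""
    return sorted(discovered - inventory, key=_sort_key)


def find_missing(discovered: set[Endpoint], inventory: set[Endpoint]) -> list[Endpoint]:
    """Inventoried endpoints that were not seen: decommissioned, moved, or recorded wrongly."""
    return sorted(inventory - discovered, key=_sort_key)
```

Anything `find_unmanaged` returns, such as an unlisted host listening on port 22, needs an owner (or a shutdown) before the rest of the checklist can be trusted.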

Critical Asset Identification

[ ] Define “crown jewels” (customer payment data, customer PII, your source code, admin dashboards)
[ ] Prioritize systems by risk (payment processing = high risk; blog = low risk)
[ ] Map which assets interact (does payment service need access to customer database? Really?)
[ ] Identify attack surface (which systems are exposed to internet? Which are internal-only?)
[ ] Document data sensitivity (public data vs. confidential vs. secret)
[ ] List compliance-sensitive assets (systems that need audit logs, encryption, etc.)

Organizational & Process Audit

[ ] Document current onboarding process (when a new employee joins, how do they get access?)
[ ] List offboarding procedures (when someone leaves, do all their credentials actually get revoked?)
[ ] Check for shared accounts (do multiple people share one database password? SSH key? API key?)
[ ] Document change management (who can deploy code? What’s the approval process?)
[ ] Identify incident response procedures (if you’re hacked at 2am, who does what?)
[ ] List compliance/audit obligations (annual security audit? Monthly? Real-time monitoring?)
[ ] Assess team capability (do you have someone who understands mTLS? OAuth2? Network segmentation?)

Honest Reality: Most eCommerce companies score poorly here. Shared database passwords, no change log, vague incident response. This is normal; zero-trust fixes it.

Business Impact Planning

[ ] Identify “can’t fail” systems (payment processing, order fulfillment, customer login)
[ ] Estimate downtime tolerance (can payment API be down for 1 minute? 1 hour?)
[ ] Estimate the cost of a breach (lost customer trust + liability + remediation)
[ ] Plan for false positives (zero-trust monitoring creates alerts; who investigates them?)
[ ] Estimate user impact (will employees notice higher friction during auth? How much is acceptable?)
[ ] Define success metrics (faster incident detection? Fewer breaches? Compliance pass?)

Phase 2: Identity & Access Architecture (Weeks 4 – 8)

Identity Provider Setup

[ ] Choose identity provider (Okta, Azure AD, Auth0, custom OAuth2 provider on AWS?)
[ ] Implement multi-factor authentication (MFA for all human users; service accounts can’t complete MFA, so authenticate them with OAuth2 client credentials or workload identity)
[ ] Set up identity federation (can contractors/partners authenticate without your directory?)
[ ] Plan for passwordless authentication (FIDO2, phone sign-in, biometric)
[ ] Configure session management (how long before session expires? Can users have multiple sessions?)
[ ] Implement device posture checking (can users only log in from company devices? Or any device?)
[ ] Set up real-time revocation (if someone is fired, how fast can you revoke their session?)
[ ] Plan for audit logging (is every authentication event logged, including both successes and failures?)

Bemeir Recommendation: Okta or Auth0 for most mid-market eCommerce. They’re managed, handle MFA well, and integrate with most common SaaS tools and application frameworks. If you’re AWS-only, consider Cognito + IAM.
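Managed providers handle MFA for you, but it helps to know what a time-based one-time password check actually does. The sketch below follows RFC 4226 (HOTP) and RFC 6238 (TOTP) with HMAC-SHA1 and 30-second steps, using only the Python standard library. The one-step drift window is an assumption, and a production verifier would also record the last accepted step so a code can't be replayed; treat this as illustration, not a replacement for your identity provider.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with counter = floor(unix_time / step)."""
    return hotp(key, int(at // step), digits)


def verify_totp(b32_secret: str, submitted: str, at: float | None = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current code or one from +/- `window` steps (assumed drift tolerance)."""
    key = base64.b32decode(b32_secret, casefold=True)
    now = time.time() if at is None else at
    # Constant-time comparison so response timing doesn't leak matching digits.
    return any(
        hmac.compare_digest(totp(key, now + drift * step, step), submitted)
        for drift in range(-window, window + 1)
    )
```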

Access Control Architecture

[ ] Implement role-based access control (RBAC) for humans (employees, contractors, vendors)
[ ] Define roles (Developer, DevOps, Product Manager, Customer Support, Finance, etc.)
[ ] Map permissions per role (Developer can deploy to staging but not production)
[ ] Implement attribute-based access control (ABAC) for fine-grained rules (access allowed only if: user is in “payments” team AND request is from office network AND time is 9am-6pm)
[ ] Set up just-in-time (JIT) access (temporary elevated permissions for emergency fixes, auto-revoke after 1 hour)
[ ] Plan for access reviews (quarterly, manually verify: does this person still need this role?)
[ ] Document approval workflow (how do people request access? Who approves? What’s the SLA?)
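The layering above (an RBAC role check, a time-limited JIT grant as an alternative, and the ABAC rule of payments team AND office network AND 9am-6pm) can be sketched as one policy-decision function. This is a minimal sketch under stated assumptions: the role names, the office CIDR, treating business hours as UTC, and the `AccessRequest` shape are all hypothetical. Real deployments usually express such rules in a policy engine such as Open Policy Agent or in the identity provider's own rule language.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone

# Hypothetical RBAC table: role -> permissions that role carries.
ROLE_PERMISSIONS = {
    "developer": {"deploy:staging", "read:logs"},
    "devops": {"deploy:staging", "deploy:production", "read:logs"},
    "payments-analyst": {"read:payments"},
}
# Hypothetical office range (a documentation prefix) and business hours, assumed UTC.
OFFICE_NETWORK = ipaddress.ip_network("203.0.113.0/24")
BUSINESS_HOURS = (time(9, 0), time(18, 0))


@dataclass
class JitGrant:
    permission: str
    expires_at: datetime


@dataclass
class AccessRequest:
    roles: set[str]
    team: str
    source_ip: str
    at: datetime
    jit_grants: list[JitGrant] = field(default_factory=list)


def grant_jit(permission: str, now: datetime, ttl: timedelta = timedelta(hours=1)) -> JitGrant:
    """Temporary elevation that lapses on its own; the 1-hour default mirrors the checklist."""
    return JitGrant(permission, now + ttl)


def is_allowed(req: AccessRequest, permission: str) -> bool:
    # RBAC: one of the user's roles carries the permission...
    via_role = any(permission in ROLE_PERMISSIONS.get(r, set()) for r in req.roles)
    # ...or an unexpired just-in-time grant does.
    via_jit = any(g.permission == permission and req.at < g.expires_at for g in req.jit_grants)
    if not (via_role or via_jit):
        return False
    # ABAC: payment data also requires team, network, and time-of-day attributes to match.
    if permission == "read:payments":
        start, end = BUSINESS_HOURS
        in_hours = start <= req.at.astimezone(timezone.utc).time() < end
        in_office = ipaddress.ip_address(req.source_ip) in OFFICE_NETWORK
        return req.team == "payments" and in_office and in_hours
    return True
```

Because every decision is recomputed from the request's attributes, an expired JIT grant simply stops matching; nobody has to remember to revoke it.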

Common Mistake: Setting up RBAC but never reviewing it. Bemeir clients typically find that 30% of access permissions are outdated (person was promoted, role changed, but permissions never updated).

Service-to-Service Authentication

[ ] Stop using hardcoded API keys (replace with OAuth2 client credentials flow or mTLS)
[ ] Implement OAuth2 / OIDC for all service-to-service communication
[ ] Set up mutual TLS (mTLS) for critical service boundaries (payment → order service, for example)
[ ] Plan for token rotation (tokens expire after 1 hour; automatically refresh)
[ ] Implement token binding (token is bound to a specific service identity; can’t be reused elsewhere)
[ ] Monitor token usage (if a service starts requesting tokens at 10× normal rate, alert)
[ ] Plan for service discovery (if services are containerized, how do they find each other securely?)
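For the token-rotation item, the calling service typically caches its client-credentials access token and fetches a new one shortly before it expires, instead of holding a long-lived key. A minimal sketch, assuming a hypothetical `fetch_token` callable that performs the OAuth2 client-credentials request (RFC 6749, section 4.4) and returns the standard `access_token` and `expires_in` fields; the 60-second refresh margin is also an assumption.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class ClientCredentialsTokenCache:
    """Caches one access token and replaces it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], dict], refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_token       # hypothetical: POSTs grant_type=client_credentials to the IdP
        self._margin = refresh_margin   # refresh this many seconds before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        # The lock makes concurrent callers wait for one fetch instead of each fetching.
        with self._lock:
            if self._token is None or self._clock() >= self._expires_at - self._margin:
                resp = self._fetch()
                self._token = resp["access_token"]
                self._expires_at = self._clock() + float(resp["expires_in"])
            return self._token
```

With 1-hour tokens, a leaked token is useful to an attacker for at most an hour, and nothing long-lived sits in code or config.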

Architecture Pattern: This is where zero-trust gets real. Old way: “Service A runs on 10.0.1.x IP range; Service B trusts all requests from that range.” Zero-trust way: “Service A requests a token from identity provider; presents token to Service B; Service B validates token every time.”
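Service B's side of that exchange is the per-request check: verify the token's signature, expiry, and intended audience before doing any work. The sketch below uses only the standard library and assumes a JWT signed with a shared HMAC key (HS256) purely to stay self-contained. Identity providers more commonly sign with RS256 or ES256, and in production you would verify with a maintained JWT library against the provider's published keys. The key, audience, and helper names are hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


class InvalidToken(Exception):
    pass


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_hs256(claims: dict, key: bytes) -> str:
    """Hypothetical test helper: mint an HS256 JWT (the identity provider's job in practice)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def validate_hs256(token: str, key: bytes, audience: str, now: float | None = None) -> dict:
    """Run on every request: algorithm, signature, expiry, audience."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise InvalidToken("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never let the token pick its own algorithm
        raise InvalidToken("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise InvalidToken("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= claims.get("exp", 0):  # a missing exp is treated as expired
        raise InvalidToken("expired")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise InvalidToken("wrong audience")
    return claims
```

The audience check is what stops a token issued for one service from being replayed against another.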

External Integration Security

[ ] Audit all third-party integrations (payment processors, shipping APIs, CRM, analytics)
[ ] Check: which integrations have persistent API keys? (Replace with OAuth2 + scoped access)
[ ] Plan for least-privilege API scopes (payment processor only needs to read invoices, not modify customers)
[ ] Set up rotation for any API keys that remain (rotate them at least every 90 days)
[ ] Implement webhook validation (verify webhooks come from actual third-party, not attacker)
[ ] Monitor third-party API usage (unusual spike in API calls? Alert)
[ ] Plan for vendor security (have you asked: does your payment processor have SOC 2? Do they scan for vulnerabilities?)
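Webhook validation usually means checking an HMAC signature that the provider computes over the raw request body with a secret you share, plus a timestamp so old deliveries can't be replayed. Providers differ in header names and in exactly what gets signed, so check your processor's documentation. The sketch below assumes a hypothetical scheme where the signature is hex HMAC-SHA256 over `"{timestamp}.{body}"`, with an assumed 5-minute tolerance.

```python
from __future__ import annotations

import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # assumed replay window


def compute_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    """Hypothetical signing scheme: hex HMAC-SHA256 over b"{timestamp}.{body}"."""
    return hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, timestamp: str, signature: str,
                   now: float | None = None) -> bool:
    """Reject stale or forged webhooks. `body` must be the raw bytes, not re-serialized JSON."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False  # too old (or from the future): possible replay
    expected = compute_signature(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```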

Integration	Current Model	Zero-Trust Model
Payment Processor	Persistent API key in code	OAuth2 + token (1-hour expiry)
Shipping API	API key in environment variable	OAuth2 + scoped permission (“read shipment”, not “delete shipment”)
Email Service	API key in config file	OAuth2 + webhook validation
Analytics	Beacon pixel + API key	OIDC + scoped access
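The “read shipment”, not “delete shipment” idea in the table comes down to comparing a token's granted scopes with what each endpoint needs. A minimal sketch, assuming scopes arrive as a space-delimited `scope` string (the format OAuth2 uses for scope values) and a hypothetical map from endpoint to required scope.

```python
from __future__ import annotations

# Hypothetical map of (HTTP method, route template) -> scope required to call it.
REQUIRED_SCOPES = {
    ("GET", "/shipments/{id}"): "shipments:read",
    ("POST", "/shipments"): "shipments:write",
    ("DELETE", "/shipments/{id}"): "shipments:delete",
}


def authorize(method: str, route: str, token_claims: dict) -> bool:
    required = REQUIRED_SCOPES.get((method, route))
    if required is None:
        return False  # default deny: endpoints nobody mapped are never reachable
    granted = set(token_claims.get("scope", "").split())
    return required in granted
```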

Phase 3: Network & Infrastructure Hardening (Weeks 9 – 13)

Micro-Segmentation

[ ] Map network boundaries (separate payment service from blog? Yes)
[ ] Implement network policies (security groups, NACLs, Calico policies if Kubernetes)
[ ] Segment by sensitivity (crown jewel services are most restricted)
[ ] Plan for east-west traffic rules (traffic between internal services; most should be blocked by default)
[ ] Implement API gateway (single entry point for all external requests; validate every request)
[ ] Set up web application firewall (WAF) rules (block SQL injection, XSS, DDoS patterns)
[ ] Plan for service-to-service mTLS (payment service can only call order service if authenticated)
[ ] Monitor unusual traffic patterns (spike in traffic between normally isolated systems? Alert)
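Default-deny east-west rules plus the mTLS requirement amount to an explicit allowlist of which authenticated service may call which destination on which port, with everything else dropped. In practice this lives in security groups, NACLs, or Kubernetes NetworkPolicy objects; the sketch below only models the decision logic, and the service names, ports, and `mtls_verified` flag are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Flow(NamedTuple):
    source: str  # caller identity, e.g. taken from its verified mTLS certificate
    dest: str
    port: int


# Hypothetical explicit allowlist; any flow not listed is denied.
ALLOWED_FLOWS = {
    Flow("storefront", "order-service", 8443),
    Flow("payment-service", "order-service", 8443),
    Flow("payment-service", "payments-db", 5432),
}


def evaluate(flow: Flow, mtls_verified: bool) -> str:
    if not mtls_verified:
        return "deny: peer identity not authenticated"
    if flow in ALLOWED_FLOWS:
        return "allow"
    return "deny: not in allowlist"
```

Note that nothing grants the blog a route to the payments database, so compromising the blog doesn't open one.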

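Practical Example:

Old way (Implicit Trust): every service inside the network can reach every other service, so an attacker who compromises the blog server can connect straight to payment systems.

Zero-trust way: east-west traffic is blocked by default; the payment service can call the order service only after authenticating over mTLS, and the blog has no network path to payment systems at all.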

Infrastructure Access Control

[ ] Implement bastion hosts / jump boxes with MFA (only way to SSH to servers)
[ ] Disable direct SSH to production servers (all access goes through bastion, is logged)
[ ] Use IAM roles instead of long-lived credentials (AWS EC2 instance has temporary role credentials; no SSH key needed)
[ ] Enable AWS Systems Manager Session Manager (access instances without SSH keys; all sessions logged)
[ ] Remove standing VPN access (VPN is implicit trust; replace with specific, temporary access per task)
[ ] Plan for database access (no direct database access from laptops; go through application only)
[ ] Implement database activity monitoring (log all queries; alert on unusual access patterns)

Why This Matters: A developer’s laptop gets stolen. Under the old model, the attacker has SSH keys, database passwords, API keys – everything. Under zero-trust, the attacker has nothing useful; every action requires real-time authentication from the identity provider.

Container & Infrastructure Hardening

[ ] Implement image scanning (scan container images for vulnerabilities before deployment)
[ ] Sign container images (only run images built and signed by trusted pipeline)
[ ] Enforce Pod Security Standards (Kubernetes: restrict what containers can do; PodSecurityPolicy was removed in v1.25, so use Pod Security Admission or a policy engine)
[ ] Implement network policies in Kubernetes (pod-to-pod traffic rules)
[ ] Monitor container runtime behavior (detect anomalies: unexpected processes, network connections, file writes)
[ ] Use least-privilege container security context (container runs as non-root, read-only filesystem)
[ ] Implement admission controllers (Kubernetes: block deployments that don’t meet security standards)
[ ] Plan for supply chain security (dependencies, open-source libraries – are they from trusted sources?)
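An admission controller is, at its core, a function that inspects a pod spec and rejects it if it breaks policy. Real clusters usually get this from Pod Security Admission or a policy engine such as Kyverno or OPA Gatekeeper; the sketch below is a hypothetical illustration of the checks in the list above, applied to a Pod manifest already parsed into a Python dict. The registry prefix is an assumption and only stands in for real image-signature verification (for example, Sigstore cosign).

```python
from __future__ import annotations

TRUSTED_REGISTRY = "registry.example.com/"  # hypothetical registry fed by the signing pipeline


def violations(pod: dict) -> list[str]:
    """Return policy violations for a Pod manifest (as a dict); an empty list means admit."""
    problems = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level default.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"{name}: must set runAsNonRoot")
        if not ctx.get("readOnlyRootFilesystem", False):
            problems.append(f"{name}: root filesystem must be read-only")
        if ctx.get("privileged", False):
            problems.append(f"{name}: privileged containers are not allowed")
        # Kubernetes allows privilege escalation unless it is explicitly disabled.
        if ctx.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if not c.get("image", "").startswith(TRUSTED_REGISTRY):
            problems.append(f"{name}: image must come from {TRUSTED_REGISTRY}")
    return problems
```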

Phase 4: Detection, Monitoring & Response (Weeks 14 – 18)

Logging & Data Collection

[ ] Enable audit logging for all systems (identity provider, databases, APIs, infrastructure)
[ ] Aggregate logs to a central location (ELK, Splunk, CloudWatch, Datadog)
[ ] Log all authentication events (success and failure; include context: IP, device, location, time)
[ ] Log all data access (who read customer records? When? From where?)
[ ] Log all privileged actions (who deployed code? Who changed firewall rules? Etc.)
[ ] Log API calls (which service called which? What data was accessed?)
[ ] Enable AWS CloudTrail (if AWS-based) for all API calls
[ ] Plan for log retention (how long to keep logs? Compliance requirement? 90 days? 1 year?)
[ ] Ensure logs are immutable (once written, can’t be modified or deleted; prevents attackers from covering tracks)
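Immutability is normally enforced by the storage layer (for example, S3 Object Lock or a write-once logging service), but you can also make tampering detectable by chaining each log entry to the hash of the one before it. A minimal sketch of that hash-chaining technique; the entry format and function names are hypothetical, and it complements write-once storage rather than replacing it.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _entry_hash(prev, event)})


def verify_chain(log: list[dict]) -> int | None:
    """Return the index of the first tampered entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None
```

Editing any earlier entry breaks every hash after it. Deleting entries from the end is not caught by the chain alone, which is why the latest hash is worth copying periodically to a separate system.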

Volume Reality: A mid-market eCommerce platform generates 10 – 100 GB of logs per day. You need:
– Storage: S3, with Glacier for long-term archives
– Analytics: Splunk or Datadog to query and analyze
– Retention policy: keep 90 days hot and 1 year archived; destroy after 7 years unless a legal hold requires longer retention

Anomaly Detection & Alerting

[ ] Set up SIEM (Security Information & Event Management) rules:
  [ ] Alert: Failed login attempts (10+ in 5 minutes = potential attack)
  [ ] Alert: Unusual geographic location (employee in NYC yesterday, Tokyo today = suspicious)
  [ ] Alert: Privilege escalation (regular user suddenly requests admin role = review)
  [ ] Alert: Data exfiltration pattern (service requesting 10,000× normal amount of data = investigate)
  [ ] Alert: Off-hours access (critical system accessed at 3am = verify)
[ ] Implement behavior analytics (what’s normal? What’s anomalous? ML helps detect novel attacks)
[ ] Set up alerting thresholds (don’t alert on every anomaly; prioritize high-risk ones)
[ ] Plan for alert fatigue (too many alerts = people ignore them; tune carefully)
[ ] Document response procedures (alert fires; what do you do next?)
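Two of the rules above, failed-login bursts and unusual geographic location, can be sketched as follows. The 10-failures-in-5-minutes threshold comes from the checklist; the roughly 900 km/h "impossible travel" speed and the event shapes are assumptions. In a real SIEM these would be saved searches or correlation rules rather than application code.

```python
from __future__ import annotations

import math
from collections import defaultdict, deque

FAILED_LOGIN_LIMIT = 10
WINDOW_SECONDS = 300
MAX_TRAVEL_KMH = 900  # assumption: roughly airliner cruising speed


class FailedLoginRule:
    """Alert when one account sees FAILED_LOGIN_LIMIT failures within WINDOW_SECONDS."""

    def __init__(self) -> None:
        self._events: dict[str, deque] = defaultdict(deque)

    def observe(self, user: str, ts: float) -> bool:
        q = self._events[user]
        q.append(ts)
        while q and ts - q[0] > WINDOW_SECONDS:  # slide the window forward
            q.popleft()
        return len(q) >= FAILED_LOGIN_LIMIT


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def impossible_travel(prev: tuple[float, float, float], cur: tuple[float, float, float]) -> bool:
    """Each login is (unix_ts, lat, lon); flag if the implied speed exceeds MAX_TRAVEL_KMH."""
    (t1, la1, lo1), (t2, la2, lo2) = prev, cur
    hours = max(abs(t2 - t1) / 3600, 1e-9)  # avoid dividing by zero for simultaneous logins
    return haversine_km(la1, lo1, la2, lo2) / hours > MAX_TRAVEL_KMH
```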

Tuning: Bemeir typically runs new SIEM rules in “dry run” mode for 2 weeks first. Expect many false positives at the start; once the rules are tuned, the alerts that remain are actionable.

Incident Response Procedures

[ ] Document incident response plan (who to call? What’s the first step?)
[ ] Define severity levels (Severity 1 = customer data may be exposed; Severity 2 = system down but no data risk)
[ ] Create incident response runbooks (if X event occurs, steps 1 – 10 are: …)
[ ] Plan for containment (how do you stop an active attack? Revoke tokens? Isolate systems?)
[ ] Define forensics process (if breached, who collects evidence? Chain of custody?)
[ ] Plan for communication (who tells customers? Legal? Insurance? Regulators?)
[ ] Set up war room procedures (incident happens → team joins Slack channel → someone leads; clear roles)
[ ] Document post-incident review (after incident, analyze: what happened? What failed? What do we fix?)

Critical: Have this plan before an incident. Bemeir recommends tabletop exercises quarterly (“If we were breached tomorrow, who would…?”).

Continuous Security Testing

[ ] Implement vulnerability scanning (automated: scan systems weekly for known vulnerabilities)
[ ] Perform penetration testing (hired hackers, annual or bi-annual, attempt to break in; report findings)
[ ] Set up DAST, or Dynamic Application Security Testing (automated: test the running web app for common vulns such as SQLi, XSS, CSRF)
[ ] Implement SAST, or Static Application Security Testing (a code analysis tool runs on every commit and flags security issues)
[ ] Plan for supply chain security scans (check dependencies for known vulns: npm audit, Snyk, etc.)
[ ] Document remediation timelines (critical vuln found; how fast can you patch? 24 hours? 1 week?)

Testing Type	Frequency	Purpose	Cost
Vulnerability Scanning	Weekly automated	Find known vulns	$5 – 20K/year (tool)
SAST (code analysis)	Per commit	Catch security bugs in code	$20 – 50K/year
DAST (runtime testing)	Daily automated	Test running app	$15 – 40K/year
Penetration Testing	Annual/Bi-annual	Real-world attack sim	$30 – 80K per engagement

Phase 5: Governance & Continuous Improvement (Ongoing)

Access Reviews & Recertification

[ ] Quarterly access review (all employees: do they still need their current permissions?)
[ ] Manager certification (manager must certify: “Yes, this person still needs this access”)
[ ] Contractor audit (contractors should have temporary access that is revoked when the project ends)
[ ] Service account review (API keys, service credentials – are they still needed? Rotate quarterly)
[ ] Privileged access review (who has admin? Do they really need it? Limit to 10% of staff)
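Much of the quarterly review can be pre-computed: grants unused for a set period, grants never used at all, and contractor grants past the contract end date go to the manager for re-certification or removal. A minimal sketch, assuming you can export grants with a last-used date (many identity providers and cloud IAM services report this); the 90-day threshold, the `Grant` fields, and the function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Grant:
    principal: str
    permission: str
    last_used: date | None            # None = never used
    contract_end: date | None = None  # set for contractors only


def review_queue(grants: list[Grant], today: date, unused_days: int = 90) -> list[tuple[Grant, str]]:
    """Return (grant, reason) pairs that need a manager decision."""
    cutoff = today - timedelta(days=unused_days)
    flagged = []
    for g in grants:
        if g.contract_end is not None and g.contract_end < today:
            flagged.append((g, "contract ended: revoke"))
        elif g.last_used is None:
            flagged.append((g, "never used: confirm or revoke"))
        elif g.last_used < cutoff:
            flagged.append((g, f"unused for {(today - g.last_used).days} days: re-certify"))
    return flagged
```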

Policy & Documentation

[ ] Document zero-trust architecture (diagram, decision log, rationale)
[ ] Create security policy (what’s acceptable? What’s prohibited?)
[ ] Build change management policy (how are changes approved? Who can deploy?)
[ ] Establish incident response policy (who’s involved? What’s the communication plan?)
[ ] Document compliance requirements (GDPR, SOC 2, PCI – what specifically do we need to do?)
[ ] Train team on policies (everyone understands expectations)

Metrics & KPIs

[ ] Track mean time to detect (MTTD) a breach (goal: <1 hour for critical incidents)
[ ] Track mean time to respond (MTTR) (goal: <4 hours)
[ ] Monitor authentication latency (goal: <100ms even during peak load)
[ ] Measure false positive rate (goal: <10% of alerts are false)
[ ] Track access request SLA (goal: access granted within 24 hours)
[ ] Monitor privileged access usage (goal: <5% of staff have admin; document why)
[ ] Calculate security improvement (fewer incidents? Faster detection? Report to board)
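MTTD and MTTR are plain averages once each incident records when it started, when it was detected, and when it was resolved. A minimal sketch with hypothetical field names; teams define MTTR differently (some measure from the start of the incident, others from detection), so pick one definition and report it consistently. Here MTTR runs from detection to resolution.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # best estimate of when the attack or fault began
    detected: datetime
    resolved: datetime


def _mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    # Measured from detection to resolution (one common definition).
    return _mean([i.resolved - i.detected for i in incidents])


def false_positive_rate(alerts_total: int, alerts_false: int) -> float:
    return alerts_false / alerts_total if alerts_total else 0.0
```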

Common Implementation Blockers & Solutions

Blocker	Root Cause	Solution
MFA breaks integrations	Third-party tools don’t support MFA	Use service accounts with OAuth2 instead of human user MFA
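Latency increases	Token validation adds overhead	Cache the identity provider’s signing keys and validate tokens locally on every request (no network round trip); keep tokens short-lived rather than skipping per-request checks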
Team resists change	“This is too much friction”	Gradual rollout; start with non-critical systems; measure friction
Legacy system can’t do OAuth2	Old application doesn’t support modern auth	Build OAuth2 adapter/shim; eventually sunset legacy system
Too many false positives	SIEM rules too strict	Tune rules; run in dry-run mode first; educate on what’s “normal”
Cost too high	Identity provider, SIEM, security tools expensive	Start with open-source (Keycloak, ELK); graduate to managed as you scale
