
The most common mistake in Adobe Commerce agency selection is choosing based on conversations alone. The second most common is choosing based on case studies. The most reliable method we have seen in practice is the structured bake-off: a paid, time-boxed exercise where two or three finalist agencies do a small piece of real work, side by side, on a representative slice of your codebase or a sanitized clone. You learn more in a two-week bake-off than you do in two months of meetings.
The bake-off works because it shifts evaluation from selling to doing. Anyone can write a proposal. Far fewer agencies can take an ambiguous backlog, scope it, build it, deploy it, and present the work to a technical audience inside 10 working days. The bake-off forces them to demonstrate, in the open, what their actual delivery operation looks like.
Bemeir has been on both sides of bake-offs and we recommend them whenever the engagement size justifies the investment. The mechanics matter, though. Run them wrong and you waste your time and the agency’s. Run them right and the result is so clear that the decision makes itself.
When a bake-off is the right call
A bake-off is appropriate when three conditions are met. First, the contract value is at least 150,000 dollars per year, which justifies the time investment from both sides. Second, you have at least two finalists you already believe could do the work, which means you have done your earlier diligence properly. Third, you have a representative slice of work you can scope cleanly in 10 to 15 business days. If any of these conditions fails, run a smaller diligence exercise instead.
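The three conditions can be written down as a simple pre-flight check. The sketch below is illustrative only: the `Engagement` fields and the `bake_off_blockers` helper are hypothetical names, and the thresholds are the ones stated in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_ANNUAL_CONTRACT_USD = 150_000  # threshold stated in the text
MIN_FINALISTS = 2
MAX_SCOPE_BUSINESS_DAYS = 15       # upper end of the 10 to 15 day window


@dataclass
class Engagement:
    annual_contract_usd: int
    viable_finalists: int
    # Business days needed for the carved-out slice; None if no clean slice exists.
    scoped_slice_business_days: Optional[int]


def bake_off_blockers(e: Engagement) -> List[str]:
    """Return the reasons a bake-off is not justified (empty list means go ahead)."""
    blockers = []
    if e.annual_contract_usd < MIN_ANNUAL_CONTRACT_USD:
        blockers.append("contract value below 150,000 dollars per year")
    if e.viable_finalists < MIN_FINALISTS:
        blockers.append("fewer than two viable finalists")
    if (e.scoped_slice_business_days is None
            or e.scoped_slice_business_days > MAX_SCOPE_BUSINESS_DAYS):
        blockers.append("no representative slice that fits in 10 to 15 business days")
    return blockers
```

An empty list means all three conditions hold. Any entry in the list is a reason to run a smaller diligence exercise instead.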
A bake-off is the wrong call when the engagement is short, when only one finalist is viable, or when the work cannot be cleanly carved out. Forcing a bake-off in those cases produces friction without insight. Save the discipline for the decisions that deserve it.
Designing the brief
The brief is the single most important document in the bake-off. A good brief is specific enough that both agencies are building the same thing, and open enough that you can see the difference in how each one approaches the problem. Get the balance wrong in either direction and the comparison becomes apples to oranges.
A working brief covers seven elements. The business context for why the work matters. The technical scope, with explicit in-scope and out-of-scope lists. The acceptance criteria, written as testable statements. The technical constraints, including platform version, hosting environment, and integration requirements. The deliverable format, including code, documentation, and presentation expectations. The timeline and budget envelope. The evaluation criteria you will use to score the work.
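One way to keep a brief honest is to treat the seven elements as required fields and refuse to send it out while any are blank. This is a minimal sketch, assuming the brief is tracked as plain text per element; the `BakeOffBrief` class and its field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class BakeOffBrief:
    business_context: str = ""       # why the work matters
    technical_scope: str = ""        # explicit in-scope and out-of-scope lists
    acceptance_criteria: str = ""    # written as testable statements
    technical_constraints: str = ""  # platform version, hosting, integrations
    deliverable_format: str = ""     # code, documentation, presentation
    timeline_and_budget: str = ""
    evaluation_criteria: str = ""


def missing_elements(brief: BakeOffBrief) -> List[str]:
    """List the elements that are still empty or whitespace-only."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]
```

Calling `missing_elements` on a draft gives you a checklist of what still needs writing before kickoff.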
For an Adobe Commerce bake-off, a typical brief might ask the agencies to build a small feature against a sanitized clone of your codebase. Something like adding a custom product attribute, a new checkout step, or a B2B-specific UI element. The work should take about 50 hours of senior developer effort to do well. Less than that and the agencies cannot show their depth. More than that and the cost gets meaningful.
The Adobe Commerce DevDocs module structure overview is a reasonable reference point for setting scope constraints. The work should call for a module structure that is recognizable to anyone who has built on Magento for more than a year.
How to pay for it
Pay both agencies. The bake-off only works if both sides are commercially incentivized to take it seriously. The going rate is between 5,000 and 15,000 dollars for a 10-day exercise, depending on scope. Agencies that refuse a paid bake-off are either too booked to take the work, too proud to compete openly, or too thin on senior talent to spare the bandwidth. Any of those is a useful signal.
Be transparent with both agencies about the structure. They should know they are competing. They should know the budget envelope is identical for both. They should know the evaluation criteria in advance. The point is to compete on craft, not on guessing what you want to see. Bemeir has run bake-offs both transparently and opaquely and the transparent ones produce better work from everyone involved.
The schedule that works
A 10-business-day bake-off has roughly this shape.
Day 1 is kickoff. Both agencies join a 90-minute call with you, the lead developer, and any product or design counterpart. You walk through the brief, answer questions, and provide access to the sanitized clone and the build environment. After the kickoff, the agencies go heads-down.
Days 2 through 4 are scoping and planning. The agencies should be reading the codebase, scoping the work in detail, and asking written questions. Treat the volume and quality of their questions as a signal in itself. Strong teams ask sharp questions. Weak teams build silently and produce surprises.
Days 5 through 8 are core build. Code is being written, pull requests are being opened on the bake-off repository, and a designated agency lead is pushing the work forward. You should be able to watch the commit history evolve in real time.
Day 9 is internal review and documentation. The agencies should be producing the supporting artifacts: a deployment guide, a brief architecture decision record, and a list of follow-up work they would do in a real engagement.
Day 10 is presentation. Each agency presents the work in a 60-minute session, walks through their approach and trade-offs, and answers questions from your team. Record the session if possible, so you can rewatch it alongside the artifacts.
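To put the 10-day shape on a calendar, a small helper can map each phase to real dates. This is a sketch under stated assumptions: it skips weekends only (public holidays are not handled), and the function names and the example start date are illustrative.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Phases as described above: (first business day, last business day, label)
PHASES = [
    (1, 1, "Kickoff"),
    (2, 4, "Scoping and planning"),
    (5, 8, "Core build"),
    (9, 9, "Internal review and documentation"),
    (10, 10, "Presentation"),
]


def business_days(start: date, count: int) -> List[date]:
    """Return `count` consecutive weekdays beginning on or after `start`."""
    days = []
    current = start
    while len(days) < count:
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            days.append(current)
        current += timedelta(days=1)
    return days


def build_schedule(start: date) -> List[Tuple[str, date, date]]:
    """Map each phase to its first and last calendar date."""
    days = business_days(start, 10)
    return [(label, days[first - 1], days[last - 1]) for first, last, label in PHASES]


if __name__ == "__main__":
    # Example start date only; pick your own kickoff Monday.
    for label, begin, end in build_schedule(date(2025, 3, 3)):
        print(f"{begin} to {end}: {label}")
```

Sharing the resulting dates with every finalist at kickoff keeps the timeline identical for everyone.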
What to score and how
Score on six dimensions, with weights that reflect what matters to you. The table below is the format we recommend, with weights that work for a typical mid-market engagement.
| Dimension | Weight | What you are looking for |
|---|---|---|
| Code quality | 25% | Idiomatic Magento, clean module structure, sensible tests |
| Architectural judgment | 20% | The trade-offs they chose and why |
| Communication and questions | 15% | The volume and quality of their questions during the build |
| Documentation produced | 15% | Did they actually leave artifacts a successor could use |
| Presentation and reasoning | 15% | Could they defend their choices coherently |
| Adherence to brief | 10% | Did they build what was asked, or what they wanted to build |
The 10 percent on adherence is intentional. You do not want to reward agencies that build something fancier than the brief while skipping the actual ask. Discipline counts.
Score independently. Have three evaluators on your side score privately, then compare the results. If two evaluators rank the same finalist first and the third disagrees, talk through the disagreement. Calibration is what makes the bake-off rigorous rather than performative.
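The scoring arithmetic is simple enough to sketch. The example below assumes each evaluator scores every dimension on a 1 to 5 scale. That scale, the dictionary keys, and the function names are illustrative assumptions; only the weights come from the table above.

```python
from statistics import mean

# Weights from the table above; the dictionary keys are illustrative names.
WEIGHTS = {
    "code_quality": 0.25,
    "architectural_judgment": 0.20,
    "communication": 0.15,
    "documentation": 0.15,
    "presentation": 0.15,
    "adherence": 0.10,
}


def weighted_score(scores):
    """Combine per-dimension scores (assumed 1 to 5) into one weighted total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


def summarize(evaluations):
    """evaluations[evaluator][agency] holds that evaluator's private scores.

    Returns the average weighted total per agency, each evaluator's top pick,
    and a flag that is True when the evaluators do not all agree on the winner.
    """
    totals = {
        evaluator: {agency: weighted_score(s) for agency, s in by_agency.items()}
        for evaluator, by_agency in evaluations.items()
    }
    agencies = sorted({a for by_agency in totals.values() for a in by_agency})
    averages = {a: mean(t[a] for t in totals.values()) for a in agencies}
    top_picks = {ev: max(t, key=t.get) for ev, t in totals.items()}
    needs_discussion = len(set(top_picks.values())) > 1
    return averages, top_picks, needs_discussion
```

The `needs_discussion` flag is the point of the exercise. An average can hide a split panel, so surface the disagreement before you read the totals.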
What the bake-off reveals that nothing else does
Three things show up in a bake-off that you cannot see any other way. The first is how the agency handles ambiguity. Real briefs always have gaps. Watching how each team fills the gaps, what they ask, what they assume, and what they document, tells you exactly what working with them at scale will feel like. The second is how the lead developer thinks. The proposal-stage senior architect may or may not be the person doing the actual work. The bake-off forces the actual person to be visible, and you learn whether you would want them on your team. The third is how their tooling and process actually function. Pull requests show you the merge process. Documentation shows you the writing culture. The deploy guide shows you whether they think about operations or just about features.
Bemeir’s experience is that the difference between the strongest and weakest finalist in a bake-off is much larger than the difference between the strongest and weakest finalist in a proposal review. The bake-off compresses six months of working relationship into two weeks. That is what makes it expensive and that is what makes it worth running.
After the bake-off
Whichever agency loses should still get a real debrief. They invested real work for you and they deserve a 30-minute call where you tell them how the evaluation came out and what would have changed your mind. This is partly courtesy and partly self-interest: you may want to work with them in a year or two, and the agency world is small enough that reputation travels.
The winning agency should start their real engagement with a head start. You already have a sanitized clone of your codebase, you already have a working build environment, and you already have a documented small feature with the test infrastructure in place. The bake-off has effectively been week one of the real engagement.
If you are running this exercise across multiple platforms, run a parallel bake-off against the relevant teams for each. Compare Bemeir’s Adobe Commerce practice against other agencies on Magento, run Hyvä finalists side by side, run Shopify Plus candidates head to head if Shopify is in the mix. The same brief structure works across all of them. Adjust the technical specifics, keep the evaluation discipline constant.
A well-run bake-off is one of the most defensible procurement choices a CTO can make. It compresses uncertainty, surfaces trade-offs, and produces evidence that will hold up to scrutiny from the CFO and the board. Six weeks of meetings cannot do that. The investment is worth it on any engagement where the stakes are large enough to justify the rigor, and most multi-year Adobe Commerce contracts are exactly that.




