You can’t read code.
That’s fine. Neither can most non-technical founders who’ve built successful tech companies.
But here’s where most non-technical founders go wrong: they nod at code reviews, say “looks good,” and hope their developer knows what they’re doing.
This is dangerous.
You cannot judge the code. But you MUST judge the process.
The difference matters. “It works” is not the same as “it’s well-made.” A feature can work perfectly today and become an expensive disaster in six months. The code itself won’t tell you which—but the process will.
This framework gives you six observable signals that indicate engineering quality without reading a single line of code. Four come directly from Google’s DORA research (which studied thousands of teams over six years); two are qualitative signals that complement the data. Each can be assessed through a simple conversation.
TL;DR:
- You can’t evaluate code quality directly—but you can evaluate engineering practices
- Six signals indicate delivery performance and risk: rollback capability, testing discipline, deployment frequency, lead time, trade-off transparency, and failure handling
- Four of these (rollback/recovery, deployment frequency, lead time, change failure rate) are DORA metrics; two (trade-off transparency, failure handling) are qualitative add-ons
- Ask these questions monthly to catch problems before they become expensive
Why Process Beats Code Review
Here’s what the research tells us:
Google’s DORA program spent six years studying thousands of software teams to understand what separates elite performers from the rest. Their conclusion: process metrics predict outcomes better than technical metrics.
Microsoft’s research on code reviews found that the presence of subject-matter experts and participation levels were better predictors of code quality than the review content itself.
The pattern is consistent: rigorous engineering processes drive quality, not individual code inspection.
As a non-technical founder, you’re not qualified to judge whether code is good. But you’re absolutely qualified to judge whether your team:
- Can recover quickly when things break
- Tests their work before shipping
- Ships frequently in small batches
- Delivers changes quickly from idea to production
- Makes deliberate trade-offs they can explain
- Plans for failure scenarios
These six signals map to proven research. Let’s break them down.
Signal #1: The Rollback Test
The question: “If we deploy this and it breaks, how do we revert?”
Good answer: “One-click rollback to the previous version, takes under 5 minutes.”
Red flag: “We’d push a fix.” (This means they have no rollback plan.)
Why This Matters
Amazon’s Builders Library, which documents their deployment practices, states that they “fully prepare for rollback before every deployment.” They consider any version that can’t be rolled back safely as not ready for production.
The DORA research program found that recovery time—how quickly you can restore service after a failure—is one of four key metrics that separate elite engineering teams from average ones. Elite teams recover in under an hour; low performers take days.
What You’re Really Testing
This question reveals:
- Does your team plan for failure? Good engineers assume things will break
- Is deployment reversible? Or is every release a one-way door?
- How long would an outage last? Minutes vs. hours vs. days?
The Follow-Up Questions
If they say “one-click rollback,” ask:
- “When did you last test the rollback process?”
- “How long does rollback actually take?”
- “What data would we lose if we rolled back?”
If they say “we’d push a fix,” dig deeper:
- “How long would that take at 2 AM on Saturday?”
- “What happens to customers during that time?”
- “Have we ever had to do this?”
The benchmark: Elite teams recover from failures in under an hour. If your answer is “we’d figure it out,” you’re exposed.
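For the curious, here’s a minimal sketch of what “one-click rollback” can look like under the hood, written in Python and assuming a hypothetical setup where every deployment is kept as a timestamped release directory and a symlink points at the live one. The paths, names, and service are invented for illustration; your team’s tooling will differ, but the shape should be this boring.

```python
# Minimal sketch of a "one-click" rollback. Assumes a hypothetical layout where
# each deploy lives in /srv/app/releases/<timestamp> and /srv/app/current is a
# symlink to whichever release is live. Paths and service name are illustrative.
import subprocess
from pathlib import Path

RELEASES = Path("/srv/app/releases")   # one directory per deployed version
CURRENT = Path("/srv/app/current")     # symlink to the live version

def rollback() -> None:
    live = CURRENT.resolve()
    # Releases are assumed to be named by timestamp, so sorting by name is chronological.
    older = sorted(p for p in RELEASES.iterdir() if p.is_dir() and p.name < live.name)
    if not older:
        raise SystemExit("No earlier release to roll back to.")
    target = older[-1]                       # the newest release that predates the live one
    tmp = CURRENT.with_name("current.tmp")
    tmp.unlink(missing_ok=True)
    tmp.symlink_to(target)
    tmp.replace(CURRENT)                     # atomic swap: traffic now serves the previous version
    subprocess.run(["systemctl", "restart", "app"], check=True)  # hypothetical service name
    print(f"Rolled back {live.name} -> {target.name}")

if __name__ == "__main__":
    rollback()
```

The point isn’t this exact script. It’s that reverting is a rehearsed, routine action, not an improvised fix at 2 AM on a Saturday.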
Signal #2: The Testing Signal
The question: “Does this change include tests?”
You can’t read the test code. But you can count.
Good sign: Tests exist, even if you can’t evaluate their quality.
Warning sign: A change with 500 lines of code and 0 tests.
Why This Matters
The research here is nuanced but important.
A landmark study published at ICSE found that “coverage is not strongly correlated with test suite effectiveness when the number of test cases is controlled for.”
Translation: more test coverage doesn’t automatically mean better quality.
But the absence of tests is a reliable signal. Research on code review effectiveness shows that combining code review with testing catches significantly more defects than either practice alone.
What You’re Actually Looking For
You’re not evaluating test quality. You’re checking for the existence of testing discipline.
Zero tests on a significant code change means one of three things:
- The developer doesn’t write tests (process problem)
- The change was rushed (schedule problem)
- The code is untestable (architecture problem)
All three are red flags.
The Practical Application
Ask your developer: “For the last feature we shipped, can you show me the tests?”
Good answer: Shows you a list of test files, explains what they cover
Concerning answer: “We tested it manually” or “It works, we checked”
Warning sign: “Tests slow us down” or “We’ll add them later”
The data contradicts the “tests slow us down” argument. The upfront investment pays off: teams with testing discipline spend less time debugging and more time building.
What “Good” Looks Like
You don’t need 100% test coverage. Research from Apache systems found that test-related factors have “a limited connection to post-release defects” when controlling for other metrics.
What you need:
- Tests exist for critical business logic
- Tests run automatically before deployment
- Test failures block deployment
This is binary. Either these things happen or they don’t.
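If you’re wondering what “tests exist for critical business logic” looks like in practice, here’s a minimal sketch in Python using pytest. The pricing function and its rules are invented for illustration; the point is that the business rules your revenue depends on are pinned down by automated checks.

```python
# Minimal sketch of tests for critical business logic, using pytest.
# The pricing rules and the order_total function are invented for illustration.
from typing import Optional

import pytest

def order_total(subtotal: float, coupon: Optional[str] = None) -> float:
    """Apply a hypothetical 10%-off coupon, then add a flat $5 shipping fee."""
    if subtotal < 0:
        raise ValueError("subtotal cannot be negative")
    if coupon == "SAVE10":
        subtotal *= 0.9
    return round(subtotal + 5.00, 2)

def test_coupon_applies_discount():
    assert order_total(100.00, coupon="SAVE10") == 95.00

def test_unknown_coupon_is_ignored():
    assert order_total(100.00, coupon="BOGUS") == 105.00

def test_negative_subtotal_is_rejected():
    with pytest.raises(ValueError):
        order_total(-1.00)
```

In a healthy setup, a CI pipeline runs checks like these on every change and refuses to deploy if any of them fail; that’s the “test failures block deployment” part.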
Signal #3: Deployment Frequency
The question: “How often do we deploy to production?”
Elite benchmark: Multiple times per day
High performer: At least weekly
Warning sign: Less than monthly
Why This Matters
Google’s DORA research has studied this for years across thousands of engineering teams. Their findings are clear: deployment frequency is a key indicator of software delivery performance.
Why? Because frequent deployment means:
- Smaller changes (less risk per deployment)
- Faster feedback loops (problems found sooner)
- Better automation (manual deployments don’t scale)
- More confidence (teams that fear deployment have problems)
The DORA data shows elite teams deploy on-demand—multiple times per day—while low performers deploy monthly or less. Elite teams are twice as likely to meet or exceed their organizational performance targets.
What Infrequent Deployment Reveals
If your team deploys monthly or less:
- Changes are batched - More changes per deployment = more risk per deployment
- Deployment is manual - Someone has to “do the deployment” rather than it happening automatically
- Deployment is scary - The team holds their breath and hopes nothing breaks
- Feedback is slow - A bug introduced today won’t be found for weeks
The Conversation
Ask: “How many times did we deploy last month?”
Good answer: Specific number, ideally weekly or more
Concerning: “A couple times” or “when we have features ready”
Red flag: “We do big releases quarterly”
If deployment is infrequent, ask why:
- “What would it take to deploy weekly?”
- “What’s blocking more frequent releases?”
- “Is deployment manual or automated?”
Industry Context
DORA defines Change Failure Rate as the percentage of deployments causing problems. Elite teams keep this under 15%; the industry average is significantly higher.
If your team deploys rarely AND has high failure rates when they do deploy, you have compound problems: big batches of risky changes going out without adequate safety nets.
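The arithmetic behind change failure rate is simple enough to live in a spreadsheet, but here’s a tiny Python sketch with an invented deployment log, just to make the definition concrete:

```python
# Change failure rate = deployments that caused problems / total deployments.
# The deployment log below is invented for illustration.
deployments = [
    {"date": "2024-05-02", "caused_problem": False},
    {"date": "2024-05-09", "caused_problem": True},   # needed a hotfix
    {"date": "2024-05-16", "caused_problem": False},
    {"date": "2024-05-23", "caused_problem": False},
]

failures = sum(d["caused_problem"] for d in deployments)
rate = failures / len(deployments)
print(f"Change failure rate: {rate:.0%}")  # 1 of 4 -> 25%, above the elite <15% bar
```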
Signal #4: Lead Time
The question: “Once code is written, how long until it’s live for customers?”
Elite benchmark: Less than one day
High performer: Less than one week
Warning sign: More than one month
Why This Matters
Lead time for changes is the fourth DORA metric. Technically, DORA measures from code commit to production deployment—but for a non-technical founder, the practical question is simpler: once your developer says “it’s done,” how long until customers see it?
This matters for two reasons:
- Fast feedback: Shorter lead times mean you learn faster whether something works
- Competitive advantage: Teams that ship in days can respond to market changes; teams that ship in months cannot
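Measuring this precisely is nothing more than subtracting two timestamps: when the change was finished and when it reached customers. A minimal sketch, with invented dates:

```python
# Lead time for a change = time from code commit to production deployment.
# The timestamps below are invented for illustration.
from datetime import datetime

committed = datetime(2024, 6, 3, 14, 30)   # developer finished the change
deployed = datetime(2024, 6, 6, 9, 15)     # change went live for customers

lead_time = deployed - committed
print(f"Lead time: {lead_time}")  # "2 days, 18:45:00": inside the "less than one week" band
```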
What Long Lead Times Reveal
If your team takes weeks to ship a small change:
- Too much process: Approvals, reviews, and handoffs are creating bottlenecks
- Manual steps: Deployments require human intervention at multiple points
- Large batches: Changes are being bundled together instead of shipped incrementally
- Environment problems: Getting code from development to production is difficult
The Conversation
Ask: “If we wanted to add a small feature—say, changing button text—how long until customers see it?”
Good answer: “A day or two, including review and testing.”
Concerning: “A week, maybe two.”
Warning sign: “We’d bundle it into the next release, so probably a month.”
The feature itself might be trivial, but the answer reveals your entire delivery pipeline.
Signal #5: Trade-Off Transparency
The question: “What did we compromise to ship this?”
Every feature has a cost. Senior engineers know this and can articulate it. Junior engineers (or engineers trying to hide problems) pretend everything is perfect.
Good answer: A specific list of trade-offs made and why
Warning sign: “Nothing, it’s done right” (this is almost never true)
Why This Matters
Every piece of software carries technical debt—shortcuts taken to ship faster, optimizations deferred, documentation skipped. This isn’t inherently bad; it’s a conscious business decision to trade speed now for work later.
The problem is invisible debt. When engineers don’t acknowledge trade-offs, debt accumulates silently until it becomes a crisis.
The question isn’t whether your team makes trade-offs—they do. The question is whether they’re aware of them and transparent about them.
What Honest Answers Sound Like
“We shipped the feature, but:
- The database queries aren’t optimized—fine for 100 users, will need work at 10,000
- Error messages are generic—users won’t know exactly what went wrong
- We hardcoded some values that should be configurable later
- The mobile experience is functional but not polished”
This is healthy engineering. Trade-offs were made consciously and documented.
What Concerning Answers Sound Like
- “It’s production-ready” (with no caveats)
- “We built it right the first time”
- “There’s no technical debt”
No software is debt-free. If your developer claims zero debt, they’re either not aware of it (concerning) or not telling you (more concerning).
How to Use This
After a feature ships, ask: “Walk me through the trade-offs you made.”
Listen for:
- Specificity - Vague answers suggest they haven’t thought about it
- Business context - Good engineers connect technical choices to business impact
- Future awareness - What will need attention later?
This isn’t about catching them in problems. It’s about ensuring they’re thinking about the full picture.
Signal #6: The Failure Demo
The question: “Show me what happens when [X] fails.”
Don’t ask if it works. Assume it works. Ask what happens when it doesn’t.
Good answer: Demonstrates graceful error handling, clear user messages, logging
Red flag: “I’m not sure” or the app crashes
Why This Matters
Systems fail. APIs go down. Users enter unexpected data. Networks disconnect. The question isn’t whether these things will happen—they will. The question is whether your software handles them gracefully or crashes spectacularly.
The Scenarios to Test
Pick scenarios relevant to your business:
- “What happens if the payment API is down?”
- “What if a user enters a 10,000-character message?”
- “What if the database is slow?”
- “What if someone submits the form twice quickly?”
What Good Failure Handling Looks Like
User sees a helpful message - Not “Error 500” but “We couldn’t process your payment. Please try again or contact support.”
The system doesn’t crash - One failure shouldn’t take down everything
Errors are logged - Someone can investigate what happened
Data isn’t corrupted - A failed transaction shouldn’t leave things in a broken state
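Here’s a minimal sketch of the difference in code, written in Python with the `requests` library around a hypothetical payment API. The endpoint, payload, and messages are invented; the pattern (catch the failure, log the details, tell the user something helpful) is what you’re looking for.

```python
# Minimal sketch of graceful failure handling around a hypothetical payment API.
# The endpoint, payload, and messages are invented for illustration.
import logging

import requests

logger = logging.getLogger("payments")

def charge_customer(order_id: str, amount_cents: int) -> dict:
    try:
        response = requests.post(
            "https://api.example-payments.com/v1/charges",   # hypothetical endpoint
            json={"order_id": order_id, "amount": amount_cents},
            timeout=5,                                        # don't hang forever if the API is slow
        )
        response.raise_for_status()
        return {"ok": True}
    except requests.RequestException as exc:
        # Log the technical details so someone can investigate what happened...
        logger.error("Payment failed for order %s: %s", order_id, exc)
        # ...but show the user something they can act on, not "Error 500".
        return {
            "ok": False,
            "message": "We couldn't process your payment. Please try again or contact support.",
        }
```

Notice what doesn’t happen: the app doesn’t crash, and the failure isn’t silent.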
The Conversation
Pick one realistic failure scenario and ask: “Can you show me what users would see?”
If they can demo it confidently, good. If they say “let me check” or the demo reveals generic error messages and crashes, you’ve found work to prioritize.
Why This Signal Matters
Your goal isn’t to eliminate all failures—that’s impossible. Your goal is to ensure failures are handled gracefully and visibly. The difference between a minor incident and a customer crisis is often just error handling.
Putting It Together: Monthly Check-Ins
Don’t ask all six questions at once. Instead, rotate through them:
Week 1: Rollback Test
- “If our last deployment broke something, how would we revert?”
Week 2: Testing Signal
- “For the feature we just shipped, can you show me the tests?”
Week 3: Deployment Frequency + Lead Time + Change Failure Rate
- “How many times did we deploy this month?”
- “Once the code was ready, how long until it was live?”
- “How many of those deployments caused problems we had to fix?” (This is the change failure rate)
Week 4: Trade-Off Transparency + Failure Demo
- “What compromises did we make on [recent feature]?”
- “What happens if [relevant failure scenario]?”
Building a Dashboard
Track these over time:
| Metric | This Month | Last Month | Trend |
|---|---|---|---|
| Deployments | 12 | 8 | ↑ Good |
| Change failure rate | 1/12 (8%) | 2/8 (25%) | ↓ Good |
| Recovery time | 5 min | 15 min | ↓ Good |
| Lead time (avg) | 3 days | 5 days | ↓ Good |
| Tests shipped with each feature | Yes | Yes | → Stable |
You don’t need sophisticated tools. A spreadsheet works fine. The point is visibility over time.
When These Signals Reveal Problems
Scenario A: Your Team Scores Well
Great. You have engineering discipline in place.
Still worth doing:
- Continue monthly check-ins to maintain standards
- Use our website scanner to verify externally visible issues
- Document the processes so they survive team changes
Scenario B: One or Two Signals Are Weak
Common. Most teams have gaps.
What to do:
- Don’t panic or assign blame
- Prioritize: Rollback capability and basic testing first
- Work with your developer on a plan to address gaps
- Revisit in 30 days
Scenario C: Multiple Signals Are Red Flags
You may have a systemic problem.
Indicators of systemic issues:
- No rollback capability AND no tests AND manual deployments
- Defensive answers to basic questions
- “You don’t need to worry about that”
What to do:
- Get an independent technical assessment
- Consider whether this is a skills gap or a priorities gap
- Book a consultation for objective evaluation
The Real Pattern
These six signals share common traits:
✅ They’re observable without reading code
✅ They’re backed by research and industry best practices
✅ They correlate with delivery performance, reliability, and risk
✅ Any competent developer should be able to discuss them openly
You’re not evaluating the code. You’re evaluating whether your team follows practices that lead to good code.
The DORA research is clear: process metrics predict performance better than technical metrics. Teams with good practices build good software. Teams without them don’t—regardless of how talented the individuals are.
What to Do Next
Start here:
- Pick Signal #1 (Rollback Test) for your next conversation
- Ask the question without judgment
- Note the answer and follow up if needed
If you want objective data:
- Scan your website for externally visible issues
- Assess your vendor dependency risk
- Compare what you find to what your developer tells you
If you need help:
- Book a 30-minute consultation
- I’ll help you interpret the signals and build an action plan
- No jargon, no sales pitch—just honest assessment
Sources & Further Reading
- DORA Metrics: The Four Keys - Google’s DevOps Research and Assessment (six years of research across thousands of teams)
- Ensuring Rollback Safety During Deployments - Amazon Builders Library
- Characteristics of Useful Code Reviews - Microsoft Research
- Coverage and Test Suite Effectiveness - ICSE 2014 (ACM)
- Code Review Effectiveness: An Empirical Study - IET Software, Wiley
- Test-Related Factors and Software Quality - Empirical Software Engineering, Springer


