Component Integration Testing

We ended What Is Your Test’s Value Proposition? with a proposal:

What if instead of focusing on passing tests… we focused on making sure the tests actually expressed the needs of the customer.

This proposal sat inside a valid but hypothetical scenario: new legislation forced you to apply zone-specific discounts when processing payments towards invoices. We updated our unit tests; they passed and the build stayed green. The contract, however, was broken: invoice processing was incorrect for two weeks before customers told us something was wrong.

This article is the second part of that story. Instead of writing a test that validates that the code we wrote behaves as expected, we propose a different type of test, neither a unit test nor a full-blown end-to-end test: the component-integration test.

A component here is a logical module. The most important one for our purposes is the HTTP controller (or your ecosystem’s equivalent). That’s the module that sits at the boundary between the outside world and your code, and it’s what the test targets. We want to check how the component responds to the user and which side effects it produces. From a tooling perspective, we automate the interaction via HTTP just like a real user would, which lets us verify two important aspects of what the endpoint does:

The contract: for HTTP, that’s the status code and the request/response body and headers.
The side effect: what the system does when the user uses the contract (an invoice balance updated, a PaymentProcessed record persisted, a receipt sent, etc.).

The test we ended the previous article on didn’t really prove either. It only proved that the controller called the methods we told it to call. What good does that do? What does that do for its value proposition? The component-integration test proves the endpoint works, but more importantly it validates that your system adheres to its contract.

This article focuses on HTTP services. The same shape applies to other entry points: a CLI tool (the contract is stdout/stderr/exit code), a message-driven worker (the contract is the broker payload), a scheduled job (the contract is the run’s exit status and any side effects it produced). We’ll sketch the worker variant later on; the others are out of scope here.

The test we left off with

Previously, we had a unit test of InvoiceController.applyPayment with InvoiceService mocked. Here is our Java code from before:

What can we do with this style of test? We can refactor every internal in this controller and, as long as applyPaymentTowards is called, the test stays green. We could even break the controller intentionally: return the wrong status code, deserialize the wrong field, forget the receipt entirely. The test would still pass. It is coupled to implementation detail and does not express the detail that matters: the contract.
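A minimal sketch of that failure mode, assuming a hand-rolled recording double in place of a Mockito mock so the example needs only the JDK. Only InvoiceController, InvoiceService, and applyPaymentTowards come from the article; the method signatures, the status code handling, and the invoice id are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: signatures and values below are assumptions.
class MockStyleTestSketch {

    interface InvoiceService {
        void applyPaymentTowards(String invoiceId, BigDecimal amount);
    }

    // Hand-rolled stand-in for a mock: it only records how it was called.
    static class RecordingInvoiceService implements InvoiceService {
        final List<String> calls = new ArrayList<>();

        @Override
        public void applyPaymentTowards(String invoiceId, BigDecimal amount) {
            calls.add(invoiceId + ":" + amount);
        }
    }

    // A deliberately broken controller: wrong status code, no receipt sent.
    static class InvoiceController {
        private final InvoiceService service;

        InvoiceController(InvoiceService service) {
            this.service = service;
        }

        int applyPayment(String invoiceId, BigDecimal amount) {
            service.applyPaymentTowards(invoiceId, amount);
            return 500; // the contract says 201, but nothing checks it
        }
    }

    // Interaction-only check: true as long as the service method was called.
    static boolean interactionTestPasses() {
        RecordingInvoiceService service = new RecordingInvoiceService();
        new InvoiceController(service).applyPayment("inv-1", new BigDecimal("25.26"));
        return service.calls.equals(List.of("inv-1:25.26"));
    }

    public static void main(String[] args) {
        System.out.println("interaction test passes: " + interactionTestPasses());
    }
}
```

The check stays green even though the controller returns the wrong status code, because nothing in it looks at what a client would actually receive.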

Improving the value proposition

Instead of mocking the controller’s collaborators, focus on what we are actually trying to test. Two things: the specification, and the feedback loop that validates it. What is our primary feedback loop? It should mirror how our user uses the app. So let’s automate that loop by defining how a real user would use the application. That is our specification.

For a web app, the test does four things:

Stands up the smallest real version of your app: actual routes, middleware, validation, serialization.
Sends a real HTTP request through the framework’s test client.
Asserts on the response: status, body, headers.
Asserts on the side effects: what’s in the service’s datastore (did we persist the data the spec says we should?), any downstream impact (did we publish that event correctly?).

Optimizing automated feedback loops

If we focus too much on mocks, with every dependency stubbed, the test validates only that the controller calls the methods we told it to. Sometimes you even inherit code where the tests mock the component being tested. The mocks define their own truth, and the test verifies adherence to that fiction, not adherence to the customer’s contract or to whether things actually work.

Focus too much on the real world, and your feedback loop becomes a system-regression end-to-end test. Those tests call third-party APIs, real email gateways, real payment rails. They’re slow, flaky, and reliant on production-adjacent infrastructure. Your team stops running them, or worse, trusts them less than the unit tests they were supposed to replace.

My general guideline, which I’ll formalize in the table further down: interact with the real things that are part of the contract you’re verifying; simulate the things that aren’t yours and the things too costly to be real in a fast loop. Your router, validation, controller, service, repository, schema, and datastore should all be real. Third-party payment processors, the email gateway, and your cloud provider’s SDK should all be simulated one way or the other. A good question to ask when in doubt: if I personally broke this line of code in production, would the customer notice? If yes, that code should run for real in the test. If the code is related to an external component that doesn’t define your business logic, it can be simulated under test.

The component integration test: validation of the HTTP boundary

The endpoint we’re really validating is POST /invoices/:id/payments. It looks up an invoice, applies a payment (with optional zone discount, per the new legislation), persists the result, and emits a downstream notification (a published event, a queued receipt, a webhook, carrier pigeon, your stack’s flavor) so other services can react. Three things we should assert as part of our test or specification:

Status code 201 with a PaymentProcessed response body containing the right amount and discount applied (API contract).
The invoice’s outstanding balance dropped by the expected amount (amount - discount) and a payment row was persisted (persistence side effect).
A notification of the payment was emitted to downstream consumers (notification side effect).

For the notification side effect, we use a test double. Mocks are one form of test double, but I propose a more feature-complete one: the testable implementation. It is similar to a fake as described in Martin Fowler’s Test Double entry, but more strictly defined. When writing tests, I suggest using a testable implementation instead of a verify-style assertion on a mock. A testable implementation:

It always runs in memory: no external connections and no network. It generally performs no I/O, although it can read and write local files if needed.
It implements the production component’s API exactly, and replicates its observable behavior. (e.g. addFile will result in internal state that allows us to later confirm that a file was added to the abstraction)
It adds test-friendly helpers (findEvent(), entries(), clear()) that the production component doesn’t provide, so a test can query it directly while the system under test still interacts with it through the production interface.
It should be configurable to replicate error conditions (throwing errors, returning nil, null, empty, etc.).
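The four properties above can be sketched as follows. This is a minimal illustration, not a prescribed design: the EventPublisher interface, the Event type, and failNextPublishWith are hypothetical names, while entries(), findEvent(), and clear() mirror the helper names mentioned in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class TestableImplementationSketch {

    // Hypothetical production-shaped interface: all the system under test sees.
    interface EventPublisher {
        void publish(String topic, String payload);
    }

    static final class Event {
        final String topic;
        final String payload;

        Event(String topic, String payload) {
            this.topic = topic;
            this.payload = payload;
        }
    }

    // In-memory testable implementation: same API, plus test-only helpers.
    static class InMemoryEventPublisher implements EventPublisher {
        private final List<Event> events = new ArrayList<>();
        private RuntimeException nextFailure;

        @Override
        public void publish(String topic, String payload) {
            if (nextFailure != null) {
                RuntimeException failure = nextFailure;
                nextFailure = null; // fail once, then behave normally again
                throw failure;
            }
            events.add(new Event(topic, payload));
        }

        // Test-only helpers; the production component would not expose these.
        List<Event> entries() {
            return List.copyOf(events);
        }

        Optional<Event> findEvent(Predicate<Event> match) {
            return events.stream().filter(match).findFirst();
        }

        void clear() {
            events.clear();
        }

        // Configures an error condition for the next publish call.
        void failNextPublishWith(RuntimeException failure) {
            this.nextFailure = failure;
        }
    }

    public static void main(String[] args) {
        InMemoryEventPublisher publisher = new InMemoryEventPublisher();
        publisher.publish("payments", "{\"amount\":\"25.26\"}");
        System.out.println("collected events: " + publisher.entries().size());
    }
}
```

The system under test would receive the publisher typed as EventPublisher; only the test holds the concrete type and therefore the helpers.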

The fake broker that collects published events, the fake mailbox that holds queued receipts — those are testable implementations of RabbitMQ, Kafka, or your SMTP gateway. Your test initializes them or queries them through their helpers; the system under test calls them through the production-shaped interface. Testable implementations enable asserting (1) on side effects and (2) on integration with third-party platforms that are not easily testable.

Testable implementations can be used in unit tests too, or even at the system-test layer when an unstable external service is causing havoc and you want a deterministic test. But component integration is the natural home for this testing “tool”.

A proper enumeration of test doubles (dummies, stubs, spies, mocks, fakes, and now testable implementations) is a topic in its own right. A future post in this series will cover it. Martin Fowler’s Test Double entry is the canonical reference if you want to go deeper. For this article we use the testable implementation to enable our feedback loops. Did we publish the expected data to this destination? It is not about writing tests that confirm RabbitMQ, Kafka, or AWS SQS are themselves working. If those are broken, we have bigger problems.
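A minimal sketch of such a test, assuming only the JDK's built-in HttpServer and HttpClient so it stays self-contained. Everything here is illustrative rather than the article's own listing: the controller, service, and repository are collapsed into one handler, in-memory collections stand in for the test database and the testable broker, the zone_discount request field and the JSON shapes are hypothetical, and the regex helper is a crude substitute for a JSON library.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.math.BigDecimal;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComponentIntegrationSketch {
    // Assumption: in-memory stand-ins. A real suite would use a test database
    // for balances and a testable implementation of the broker for events.
    static final Map<String, BigDecimal> balances = new ConcurrentHashMap<>();
    static final List<String> publishedEvents = new CopyOnWriteArrayList<>();

    // Crude stand-in for a JSON library: reads one string-valued field.
    static String field(String json, String name) {
        Matcher m = Pattern.compile("\"" + name + "\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // The smallest version of the app: one route on an ephemeral local port.
    static HttpServer startApp() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/invoices/", ComponentIntegrationSketch::handlePayment);
        server.start();
        return server;
    }

    // Handles POST /invoices/{id}/payments (hypothetical request/response shape).
    static void handlePayment(HttpExchange exchange) throws IOException {
        String id = exchange.getRequestURI().getPath().split("/")[2];
        String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
        BigDecimal amount = new BigDecimal(field(body, "amount"));
        BigDecimal discount = new BigDecimal(field(body, "zone_discount"));
        balances.computeIfPresent(id, (key, balance) -> balance.subtract(amount.subtract(discount)));
        publishedEvents.add("PaymentProcessed:" + id + ":" + amount);
        byte[] out = ("{\"amount\":\"" + amount + "\",\"discount_applied\":\"" + discount + "\"}")
                .getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().add("Content-Type", "application/json");
        exchange.sendResponseHeaders(201, out.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(out);
        }
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    // The test: seed data, POST over real HTTP, assert on contract and side effects.
    static void happyPathTest() {
        balances.put("inv-1", new BigDecimal("100.00"));
        publishedEvents.clear();
        HttpServer server = null;
        try {
            server = startApp();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort()
                    + "/invoices/inv-1/payments");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"amount\":\"25.26\",\"zone_discount\":\"0.05\"}"))
                    .build();
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            check(response.statusCode() == 201, "status code");
            check("0.05".equals(field(response.body(), "discount_applied")), "discount in body");
            check(balances.get("inv-1").compareTo(new BigDecimal("74.79")) == 0, "balance");
            check(publishedEvents.equals(List.of("PaymentProcessed:inv-1:25.26")), "notification");
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) server.stop(0);
        }
    }

    public static void main(String[] args) {
        happyPathTest();
        System.out.println("happy path passed");
    }
}
```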

Notice what the test is not doing:

It’s not instantiating InvoiceController. The test never references the class. It POSTs to a route.
It’s not stubbing InvoiceService.applyPaymentTowards. The real service runs, the real math happens, the real persistence runs.
It’s not verifying internal method calls. No verify(invoiceService).applyPaymentTowards(...). The signature can change without breaking the test.

What it is doing:

Sending JSON through the real router, middleware, validation, and serialization, just like a client would do.
Verifying the response a real client would see.
Verifying the rows the database actually has after the request, just like the developer or AI agent might do when debugging.
Verifying the notification actually landed in a downstream channel, queried through a testable implementation rather than asserted via verify(mock).called(...). If this ever breaks, it is something we would otherwise hear about from the downstream consumers.

Consider how these automated validations improve the quality and speed of the feedback loop.

The compliance scenario, revisited

The previous article described a real problem: tests passed even when the requirements changed. New legislation added zoneDiscount to applyPaymentTowards. The mock-based test was easy to “fix” (add the argument to the when(...) setup) and stayed green. Production didn’t.

The mock-based test caught nothing because updating the mock setup to match the new signature was a smaller change than actually using the new argument in production. The mock incentivizes the wrong refactor. The component-integration test can’t be “fixed” by tweaking setup. Its assertions are tied to customer-visible behavior, so you have to implement the behavior to make them pass.

Walk through the component-integration test with the same change:

Did the response include discountApplied? If the controller forgot to forward it, the assertion body.discount_applied == "0.05" fails. ✓
Did the balance reflect the discount? If the service ignored zoneDiscount, the balance would drop by the full 25.26 and refreshed.outstanding would be 74.74, not the 74.79 the assertion expects. The test fails loudly. ✓
Did the receipt go out for the payment? The fake collected one event with amount == 25.26. It isn’t catching the missed discount on its own, but it proves the side effect happened. ✓
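The numbers in the walkthrough can be checked with a short worked example. It assumes an opening balance of 100.00 (inferred from 74.74 + 25.26) and the rule from the assertion list that the balance drops by amount minus discount; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

class DiscountArithmeticSketch {

    // Balance after a payment: outstanding - (amount - discount).
    static BigDecimal balanceAfter(BigDecimal outstanding, BigDecimal amount, BigDecimal discount) {
        return outstanding.subtract(amount.subtract(discount));
    }

    public static void main(String[] args) {
        BigDecimal outstanding = new BigDecimal("100.00"); // assumed opening balance
        BigDecimal amount = new BigDecimal("25.26");

        // Discount honored: 100.00 - (25.26 - 0.05)
        System.out.println(balanceAfter(outstanding, amount, new BigDecimal("0.05"))); // prints 74.79

        // Discount ignored by the service: 100.00 - 25.26
        System.out.println(balanceAfter(outstanding, amount, BigDecimal.ZERO)); // prints 74.74
    }
}
```

Using BigDecimal rather than double keeps the five-cent difference exact, which is the whole point of the assertion.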

The test isn’t coupled to the function signature, which can change. It’s coupled to the behavior the customer cares about: the right amount comes off the invoice, the right receipt gets sent, the right response comes back. Refactor the design to improve the code and the test stays valid. Break the behavior and the test fails. That is the ideal feedback loop.

Why is this the ideal feedback loop?

The previous article framed tests as feedback loops: a way to learn from failures, incorporate feedback, and validate the contract. Feedback loops come in all shapes and sizes. Ideally, you want to reduce the time of your feedback loop. Don’t we all want high-quality feedback, and as soon as possible? A test that takes 1 minute is one you skip during iteration and run before you push. A test that takes a few milliseconds to 200 milliseconds is one you run on every save. The difference matters more than the durations suggest, because the loops you don’t optimize rot. You stop running them. You stop trusting them. You stop adding new ones.

Component-integration tests fit in the tight loop in a way unit-with-mocks and full end-to-end don’t:

They run on save, not on push. A 200 ms test against a real web application and an isolated test database is fast enough to fire after every file change. You get nearly instant feedback on regressions on your local machine: before the context switch, not after a code review, not after a deploy.
Real feedback. A test bound to the controller class, or any implementation, will fail when that component is refactored even when the behavior is identical. False failures incentivize engineers to ignore tests. A test bound to the HTTP contract fails only when the contract is broken. That’s the feedback we want.
Automated specification of the API your customer expects. The customer doesn’t call InvoiceController.applyPayment(GinContext). The customer (whether a UI or an API client) POSTs to /invoices/123/payments. Coverage at the customer’s boundary is coverage of the contract you promised. Internal coverage metrics can show 100% while the API still has unverified edges: wrong status codes, missing fields, serializer bugs that escape line coverage entirely.

Taken together, these properties describe the feedback loop the previous article was reaching toward: the ability to quickly iterate, meaningful feedback to trust, and focus on your customer. This feedback loop reinforces your value proposition.

Why don’t we test the implementation?

Conventionally, developers and AI agents writing a test for InvoiceController tend to test it directly. Instantiate it. Mock its dependencies. Call its method. Assert. It passes! Call it a day. The previous article cautioned against this. To reiterate: the controller class is an implementation detail. The customer doesn’t know it exists. They don’t care that there’s a class named InvoiceController with a method named applyPayment. What they care about is that the right behavior happens when they POST to a URL.

When the test references the class directly, consider the potential impact on the test’s value proposition:

Refactoring overhead. Split the controller into two handlers? Tests break. Rename the method? Tests break. Inline a helper, extract a validator, replace one DI pattern with another? Tests break. Every internal refactor now costs you test-maintenance time. This is time you could have spent on real work, or on writing tests that actually catch regressions.
False confidence. The test passes when the class behaves the way the mocks expect. It says nothing about whether the request deserializes correctly, whether request validation is actually working, whether the response serializes the correct fields. The compliance scenario from the previous article is the canonical case. applyPaymentTowards got a new argument, the test “passed”, production broke.
Test decay. A test that breaks on every refactor stops being trusted. Engineers learn to “fix” it by updating mock setups until it goes green again. Eventually the test no longer asserts what it claims to. The green checkmark is present, but the assertions have drifted away from any meaningful contract. The test has rotted in place.

Testing the application functionality through its boundary (e.g., HTTP) avoids all three. The internals can change without the test caring. The test fails only when the customer-impacting behavior changes. When it fails, you know the behavior changed, not that someone moved a method around.

What’s the catch? The unit-test-the-controller-class approach was never actually testing the controller’s value to the customer. It was testing the developer’s ability to validate mock-defined behavior. A few years ago, I would have said the catch was ramp-up time and the additional cognitive overhead. As test-engineering principles have matured and AI-powered workflows have made setup cheaper, those reasons may feel like excuses. Tests in this style require a small investment of time and provide better feedback in return. The trade-off does depend on the application’s age. This is much easier on greenfield apps. But starting to think about applications in terms of inputs and outputs, and how to verify them in isolation, makes older codebases easier to understand too.

When to mock? When to not mock?

My rule of thumb:

Interface directly with the code you own. Simulate the things you don’t.

Layer	Default	Why
HTTP router & middleware	Real	This is part of what the test is checking
Request validation	Real	Bad-shape requests should fail at this layer; the test should see that
`InvoiceService.applyPaymentTowards`	Real	The whole point: exercise the business logic
Database	Test DB (real schema, isolated data per test)	Catches SQL mistakes, schema issues, fixed-point precision issues
Receipt service	Testable Implementation	Don’t want flaky network in tests; the in-memory stand-in lets you assert on what was sent
External payment processor	Testable Implementation	Same reason. Use a recorded stub if the test depends on a specific real-API response shape
Time / clocks	Inject a controllable clock	Otherwise tests fail at month-end or fiscal year boundaries
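For the last row, a minimal sketch of a controllable clock using java.time.Clock from the JDK. PaymentDateStamper is a hypothetical class; the point is only that the code asks an injected Clock for the time instead of calling LocalDate.now() directly.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class ClockInjectionSketch {

    // Hypothetical component: receives a Clock instead of reading system time.
    static class PaymentDateStamper {
        private final Clock clock;

        PaymentDateStamper(Clock clock) {
            this.clock = clock;
        }

        LocalDate paymentDate() {
            return LocalDate.now(clock);
        }
    }

    public static void main(String[] args) {
        // In a test, pin the clock to a boundary that tends to cause trouble.
        Clock monthEnd = Clock.fixed(Instant.parse("2024-01-31T23:59:59Z"), ZoneOffset.UTC);
        System.out.println(new PaymentDateStamper(monthEnd).paymentDate()); // prints 2024-01-31

        // In production, the same class would be wired with Clock.systemUTC().
    }
}
```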

Notes

A note on test databases: embedded or in-memory databases like SQLite or H2 are fast, but in my experience they tend to hide bugs: case sensitivity, JSON column behavior, fixed-point precision, and so on. If your production datastore is Postgres, the test that depends on Postgres semantics should run against Postgres. Just use Docker. It is a bit slower than a pure unit test but much faster than a system test, and much closer to the real system at a fraction of the cost.

A second note, still on the database: isolation between runs is its own can of worms, and I’ll cover it in a dedicated follow-up. The short version: most ecosystems have a built-in primitive for this (rolling back a transaction after each test is a common one); use it. In CI, run the database as a service container. GitHub Actions, GitLab CI, and CircleCI all support that pattern.

A third note, on test data setup: when you persist real rows, you also have to build real data. The examples above create a customer and invoice inline; both are trivial. Patterns exist for this at scale, such as factories: FactoryBot in Ruby, factory_boy in Python, mother-pattern helpers in Java. Out of scope for this article, but it’s a real shape of work in production codebases. Treat it as one.

A fourth note, on authentication: for this article, we assumed auth was already in place for the endpoints we’re testing. Authentication is a cross-cutting concern, and the auth service itself should be tested in isolation. What we should test at this layer is that our component integrates with that service correctly. The mechanics of how you substitute the identity provider are stack-specific. We’re not on the hook for the auth service’s internals, only for the integration boundary. 401 and 403 responses are part of the contract and worth verifying once per endpoint; exhaustive auth-edge-case coverage belongs in the auth service’s own suite.

Another example: a negative test scenario

Let’s say a payment exceeds the outstanding balance. In this scenario, we should return 422 and not change anything. There should be no side effects: no payment row, no balance change, no notification emitted. A unit test would struggle to express that third assertion (verify(receipts, never()).send(any()) is brittle and depends on mock call history). The component-integration test just queries the same testable implementation we used in the positive case, this time asserting it’s empty. The receipt assertion that wasn’t pulling weight in the positive walkthrough is doing the work here. It proves the side effect didn’t happen at all.

This is a business-rule failure: the request was well-formed, but it violated a business rule. There’s a second category worth thinking about: request validation, which usually runs before any business logic (a missing field, a wrong type, an amount with thirty decimal places). The shape of that test is the same: POST a bad body, assert a 400 with the validator’s error envelope, assert nothing in the database changed, assert the fake stayed empty. The “real router and real validation” promise from earlier pays off there.

Here’s a question I am asked routinely: do we need to write this level of integration test if we have unit tests? My response: it depends. You can write the best unit tests ever, with 100% coverage, but if you haven’t validated that the units are integrated with the controller layer, what good are those rock-star unit tests? I suggest starting with at least one negative case at this layer. This confirms the integration of these components. If you have relatively simple validation, a few more component-integration tests don’t hurt. If you have dozens of scenarios all testing business rules at this layer, apply professional judgement and move the edge cases down to unit tests.
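Returning to the overpayment scenario, here is a minimal sketch of its assertions. To keep it short, the HTTP layer is collapsed into a plain method that returns the status code it would send; in a real suite the request would go through the router as in the positive case. The collections are hypothetical in-memory stand-ins for the invoices table, the payments table, and the testable implementation's collected state.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OverpaymentSketch {
    final Map<String, BigDecimal> balances = new HashMap<>(); // stand-in for the invoices table
    final List<String> paymentRows = new ArrayList<>();       // stand-in for the payments table
    final List<String> notifications = new ArrayList<>();     // collected downstream events

    // Stand-in for the endpoint; returns the HTTP status code it would send.
    int applyPayment(String invoiceId, BigDecimal amount) {
        BigDecimal outstanding = balances.get(invoiceId);
        if (outstanding == null) return 404;
        if (amount.compareTo(outstanding) > 0) return 422; // reject before any side effect
        balances.put(invoiceId, outstanding.subtract(amount));
        paymentRows.add(invoiceId + ":" + amount);
        notifications.add("PaymentProcessed:" + invoiceId);
        return 201;
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    static void overpaymentLeavesNoTrace() {
        OverpaymentSketch app = new OverpaymentSketch();
        app.balances.put("inv-1", new BigDecimal("100.00"));

        int status = app.applyPayment("inv-1", new BigDecimal("150.00"));

        check(status == 422, "status code");
        check(app.paymentRows.equals(List.of()), "no payment row");
        check(app.balances.get("inv-1").compareTo(new BigDecimal("100.00")) == 0, "balance unchanged");
        check(app.notifications.equals(List.of()), "no notification");
    }

    public static void main(String[] args) {
        overpaymentLeavesNoTrace();
        System.out.println("overpayment rejected with no side effects");
    }
}
```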

Three assertions about what didn’t happen, expressed plainly. No “verify the mock was never called” gymnastics. Just an equality check against the collected state.

Cost and where this layer fits

Admittedly, these tests are slower than unit tests. But consider the order of magnitude:

Test type	Per test
Pure unit	~1 ms
Component integration	5 ms - 200 ms
Browser e2e	5 s - 5 min

At 5 ms to 200 ms each, 100 component-integration tests add roughly 0.5 to 20 seconds to your suite. A 10,000-test end-to-end suite is a different beast entirely. The split that keeps a test suite valuable:

Cover happy paths at this layer: one per endpoint per major flow. Does the endpoint function?
Cover branch-y validation in unit tests: every edge case of “is this amount well-formed?” doesn’t need a router.
Cover multi-page workflows in e2e: the component-integration test can’t span a sign-up → email-verify → first-payment flow.

And two cases where I’d skip the component-integration layer entirely:

Pure compute: the function that calculates the discount percentage from a zone code. Unit-test it.
Cross-page workflow: “user clicks payment, sees confirmation, lands on receipt page.” That’s e2e.

What about a background worker that pulls jobs off a queue? It’s still a component-integration test, just with a different input and output. The queue payload is the contract; the side effects (DB writes, downstream notifications) are what you assert against.
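A minimal sketch of the worker variant, under the same simplifications as before: an in-memory queue stands in for the broker, a map for the datastore, and a list for the downstream channel. The PaymentRequested payload and the drain() method are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WorkerSketch {

    // Hypothetical queue payload: this is the worker's contract.
    static final class PaymentRequested {
        final String invoiceId;
        final BigDecimal amount;

        PaymentRequested(String invoiceId, BigDecimal amount) {
            this.invoiceId = invoiceId;
            this.amount = amount;
        }
    }

    final Deque<PaymentRequested> queue = new ArrayDeque<>(); // in-memory stand-in for the broker
    final Map<String, BigDecimal> balances = new HashMap<>(); // stand-in for the datastore
    final List<String> notifications = new ArrayList<>();     // collected downstream events

    // Worker under test: drains the queue and applies each payment.
    void drain() {
        PaymentRequested job;
        while ((job = queue.poll()) != null) {
            BigDecimal amount = job.amount;
            balances.computeIfPresent(job.invoiceId, (id, balance) -> balance.subtract(amount));
            notifications.add("PaymentProcessed:" + job.invoiceId + ":" + amount);
        }
    }

    public static void main(String[] args) {
        WorkerSketch app = new WorkerSketch();
        app.balances.put("inv-1", new BigDecimal("100.00"));

        // Input: a message on the queue instead of an HTTP request.
        app.queue.add(new PaymentRequested("inv-1", new BigDecimal("25.26")));
        app.drain();

        // Output: the same kind of side-effect assertions as the HTTP case.
        System.out.println(app.balances.get("inv-1"));
        System.out.println(app.notifications);
    }
}
```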

Closing: a new story?

The previous article asked: What is your test’s value proposition?

For the test we ended that article on, the one mocking InvoiceService.applyPaymentTowards, the value proposition was unclear. It validated nothing the customer cared about. When the customer-facing contract changed, the test still passed. The contract didn’t. Developers were left putting out fires.

Functional validation is just a small fraction of building production software. Component-integration testing is also only one part of that story. But the component-integration test makes the value proposition explicit. Each test corresponds to customer-facing behavior: “if I provide a payment, I expect an acknowledgement, the right balance, and the right receipt.” Each failure point corresponds to a regression that would reach users.

That alignment is what the previous article was alluding to when it framed tests as a way of guaranteeing your value proposition. The component-integration test makes that guarantee operational: a validation process that’s fast enough to be in the inner loop, targeted enough at the API surface to cover what the customer actually pays for, and decoupled enough to survive internal refactors.

What’s next?

Future posts related to this subject matter will cover:

Contract-driven testing with Pact
When to use test doubles
Integrating these tests into your continuous integration pipeline
Building feedback loops that utilize agentic development
Part 3: How to introduce these sorts of feedback loops to your current team