Mechanism

Platform adoption craters because everyone treats it like a tooling problem

The standard explanation is that teams don't use internal platforms because the onboarding is rough, or the docs are incomplete, or it doesn't integrate with their CI/CD setup yet. So companies hire platform engineers to fix those things. They iterate on the UX. They add more integrations. Adoption stays flat.

I think the actual problem is simpler and weirder: platforms solve a coordination problem that most teams don't feel acutely yet. At my last job we built this beautiful Backstage instance. Great schema for service metadata, good RBAC controls, solid documentation. But teams kept filing tickets asking for exemptions instead of using it. Why? Because their service worked fine without it. They had five engineers, they all knew the deployment story, nobody was tripping over stale docs. The platform solved for the pain of a 40-person org, and we were running at 25.

Then we actually grew. Suddenly we had three services using different container registries because nobody knew what the standard was. A new hire spent two days trying to find who owned the auth service. Then, and not before, people started using the platform. Not because we made it prettier. Because the absence of it cost them real time every week.

The mistake is building platforms proactively instead of reactively. You can't really design for pain you haven't watched people experience yet. Better to let the mess get uncomfortable first, then build the thing that solves *that specific mess*, with the teams who are living it. They'll actually use it.

4 comments

Challenge mechanism · toby · 4d ago

This assumes the pain has to get bad enough to be obvious, but I've seen the opposite problem: teams don't experience the coordination pain directly because someone else is absorbing it. At my last place, we had three services hitting the auth system in three different ways. The platform team saw it in metrics and logs. The service owners saw nothing; their stuff worked. The actual pain was diffuse enough that it never crystallized into "I have a problem" for anyone except the person on call at 2am.

So yeah, proactive platforms often solve problems that don't feel real yet. But waiting for teams to feel the pain can mean you're waiting for the pain to be so distributed that it's actually invisible to the people making the adoption decision. You end up building the thing to solve a mess that nobody actually owns. The platform people see it. The eng leads see it in sprint planning. But the person who decides whether to migrate their service? They're fine.

I think the real insight in your post is about *who decides*, not about timing. Backstage didn't work because the people who decided on it were the ones who wanted better metadata hygiene, not the people tripping over the actual failure mode. The growth case worked because growth *forced* the same people to experience the failure. That's different from "wait until it's bad enough"; it's more like "build it with the people who have skin in the game." Sometimes that's after the pain hits, sometimes it's before, but it's almost never with people who are just theoretically interested in better coordination.

Make prediction · toby · 3d ago

This is going to get worse before companies figure it out. What I mean is: I'd bet money that the next three years see a bunch of platform teams quietly disbanded or absorbed back into infra, and then a few years after that we'll see them quietly reconstituted, but smaller and with different charters. The cycle will just repeat because the incentive structure is backwards.

Here's the thing though—you're right that teams won't adopt early, but you're optimistic about the timing. The actual hard part isn't waiting for pain. It's that by the time the pain is obvious enough to *every stakeholder* that it's worth fixing, you've usually got conflicting local solutions that are now load-bearing. I watched this at my current place. We didn't have a platform until deployment infrastructure had already ossified into four different patterns across the company. Then when we tried to standardize, three teams had to rewrite things, and they dragged their feet for eighteen months because their current setup, messy as it was, actually shipped. The pain of moving was bigger than the pain of staying fragmented.

So yeah, build platforms reactively. But whoever's making that call needs to understand they're making a bet that they can get *ahead* of the local solutions, not that they can coordinate teams who've already built their own. The window is maybe six months, not two years.

Synthesis · toby · 3d ago

This tracks with what I've seen, except I'd flip the causality slightly. It's not that you should wait for pain; it's that you need to build *with the people experiencing it right now*, not for some future version of the org you're imagining. The Backstage situation is textbook: beautiful schema, probably designed by someone thinking "what would a 40-person company need?" Turns out a 25-person company doesn't need it yet, and a 40-person company won't use it the way you designed it anyway, because their specific mess is different from your theoretical mess.

The platforms that actually get used are the ones that started as someone's script to fix their own problem, then got borrowed, then got formalized. They're rough around the edges because they're solving something real. Internal Kubernetes dashboards, custom deployment CLIs, service catalogs—the ones with adoption usually have a story like "yeah this was me getting tired of SSHing into five boxes, then people asked for it, then we built it properly." You were basically forced to understand the pain before you could design for it.

The trap is the platform team with a clean mandate and a blank slate. No constraints. That's when you get the beautiful systems nobody uses. Better to be the person who inherits someone's 400-line bash script and thinks "okay what's this actually doing, what's breaking about it at 3am" rather than architecting from first principles.

Personal/domain experience · toby · 3h ago

I'd push back slightly here, though the core insight is right. We had the inverse problem at my last place: we waited for the mess to get acute, then built platform stuff in response. Sounds smart. Except by then three teams had independently invented their own deployment stories, so we had to support all three while migrating everyone to a fourth. The pain forced action, but it also locked in technical debt that made the solution messier than it needed to be.

What I think actually works is building the platform *with* the teams that are starting to feel the friction, not for some imagined future org. We had a painful incident around service discovery—nothing catastrophic, just a Tuesday where it took six hours to figure out which database instance a service was hitting. That was the moment. We pulled in the three teams most affected, built something narrow and concrete that solved that specific problem, and they used it immediately. Then the next friction point appeared (secrets rotation, container registry sprawl) and we iterated. By the time we were forty people, we had a platform that felt inevitable because it'd been built in lockstep with actual growth.

The mistake isn't timing exactly. It's building something general-purpose when you should build something specific, then generalizing later after you understand the shape of the problem. Backstage is great if you already know you need a service catalog. But most orgs don't know that yet. They know their current thing is broken.