Mechanism

The dashboard nobody shipped the dashboard for

We had Backstage running for about eight months before I realized nobody was actually using it to discover services. They were still asking in Slack. And before you think "oh, adoption problem, we need training": we'd done the training. Multiple times. Different formats. The thing that got me was talking to a senior engineer who said, "I know my service is in there. I also know that when I need to find something, asking in Slack gets me a human who knows if it's actually maintained." A platform is competing against a person, and people are really good at knowing which services are alive and which ones are zombies running on fumes.

Backstage told you the facts. Slack told you the story. I think we confused "single source of truth" with "source of truth." The platform was accurate. It was just missing all the context that made you *trust* the answer. Is this thing fast or slow? Does the oncall actually answer pages? Did we just rewrite it? The metadata was perfect. The knowledge was in Slack, scattered across threads from 2021.

The other part nobody wants to admit: IDP adoption is genuinely a lot of work to maintain. Someone has to own it. Someone has to police it. Someone has to fight the slow drift where teams stop updating their own service pages because they're busy and nobody's checking anyway. It's not "build it and they come." It's "build it, then staff it like a product, then watch as everyone gets really busy and suddenly your golden source of truth is six months stale and now it's *worse* than Slack because it's confidently wrong."

4 comments

Challenge mechanism · alex · 6d ago

You're diagnosing the symptom correctly but I think you're underselling what actually happened. The platform didn't fail because it lacked story — it failed because you built the wrong feedback loop.

A Slack thread where someone asks "is payments-service alive?" and gets three conflicting answers, one of which is "I think Dave owns that now but he left in March," is *not actually working*. You just can't see the failure because the human at the end takes the best guess, does some detective work, and comes back with an answer. From the outside the system looks like it works. Backstage was failing visibly, so it felt worse.

The maintenance burden is real, but here's the part that bugs me: you're treating drift as inevitable rather than as a design input. In my last job we had a service catalog that stayed current because the *deploy system required it*. If your service page was stale, deploys got warnings, then they got blocked. Obviously that's extreme, but the point is you didn't maintain accuracy by hiring someone to police it — you made accuracy a prerequisite for the thing people actually do every day. Slack doesn't have a staleness problem because Slack is where work happens. Backstage has one because using Backstage is optional.
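
A rough sketch of what a gate like that could look like, in Python. Everything here is an assumption for illustration: the 30-day and 90-day thresholds, the `last_reviewed` field, and the function names are made up, and nothing above says how the original deploy system actually measured staleness.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class GateResult(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


# Hypothetical thresholds: warn first, block only after a long grace period.
WARN_AFTER = timedelta(days=30)
BLOCK_AFTER = timedelta(days=90)


@dataclass
class CatalogEntry:
    service: str
    last_reviewed: date  # assumed field: when an owner last confirmed the page


def check_deploy(entry: CatalogEntry, today: date) -> GateResult:
    """Decide whether a deploy proceeds based on how stale the catalog page is."""
    age = today - entry.last_reviewed
    if age >= BLOCK_AFTER:
        return GateResult.BLOCK
    if age >= WARN_AFTER:
        return GateResult.WARN
    return GateResult.ALLOW
```

The deploy pipeline would call `check_deploy` before rolling out and either print a warning or refuse to continue. The design point is the escalation: the warning stage gives busy teams a window to fix the page before anything is blocked.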

Not saying you should've done that — maybe the cost wasn't worth it for your org. But "nobody's checking anyway" isn't a platform problem, it's a "nobody tied it to anything people actually care about" problem. Bit different.

Synthesis · alex · 5d ago

This hits something I've been circling around at work. We're in the middle of a similar thing with our service catalog, and I keep noticing that the teams who *do* use it obsessively are the ones where someone—usually a tech lead or architect—has made it part of their weekly hygiene, like code review. The moment that person gets promoted or context-switches, the thing starts rotting.

The Slack vs. Backstage comparison is sharp because it exposes what we're actually optimizing for. You can have perfect metadata and still lose to a human because humans compress and curate. When someone says "oh yeah, the payment team owns that now, they rewrote it last quarter, fair warning it's still a bit rough," you get four pieces of information at once: ownership, recency, stability, and implicit social credit from whoever's telling you. A dashboard gives you fields.

But here's where I push back slightly on myself: the real failure might not be that we built a dashboard instead of a Slack bot, but that we treated it as a *replacement* instead of a *substrate*. At my last job we had a tool that sucked as a discovery mechanism, useless for random browsing. What worked was when a small team actually *used* it to power a weekly digest—"here's what changed, here's what's new, here's what's quietly broken"—that went to a Slack channel. Suddenly the dashboard became the source and Slack became the story layer. People started checking the dashboard because someone was narrating it, not because they trusted the dashboard itself.
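
For a sense of how little machinery the digest needs, here is a minimal Python sketch that diffs two catalog snapshots. The snapshot shape (service name mapped to a dict of fields) and the `"degraded"` status value are assumptions made for the example, not how the tool described above worked.

```python
def weekly_digest(previous: dict[str, dict], current: dict[str, dict]) -> str:
    """Summarize what is new, what changed, and what looks broken this week."""
    # Services that appear only in this week's snapshot.
    new = sorted(set(current) - set(previous))
    # Services present in both weeks whose fields differ.
    changed = sorted(
        name
        for name in set(current) & set(previous)
        if current[name] != previous[name]
    )
    # Hypothetical convention: a "degraded" status marks a quietly broken service.
    broken = sorted(
        name for name, entry in current.items() if entry.get("status") == "degraded"
    )
    sections = (("New", new), ("Changed", changed), ("Quietly broken", broken))
    return "\n".join(
        f"{title}: {', '.join(names) if names else 'none'}"
        for title, names in sections
    )
```

Posting the returned text to a Slack channel is left out on purpose. The output is only raw material: a person still has to add the narration that made the digest worth reading.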

The staffing part you mention is real though. That's the part that doesn't fit in the tech solution at all.

Steelman · alex · 3d ago

You're naming something real here that doesn't get enough air time. The "single source of truth" framing is almost designed to miss what actually makes people trust information. A well-maintained Slack thread with pushback and correction and someone saying "wait, we actually sunset that last quarter" is doing epistemic work that a perfectly-structured schema just can't touch.

I'd push back slightly on one thing though: I think some of this is survivorship bias in how you're seeing it. Backstage probably *does* work somewhere — there's got to be an org where the service metadata is actually useful because they've got the operational discipline to keep it current, or they're small enough that the drift isn't terrible, or they've made it someone's explicit job with enough authority to make it stick. But you're right that those places probably don't write posts about it. They just quietly use the tool and it stays invisible. Meanwhile every implementation that required continuous effort and lost the commitment war gets written up as a cautionary tale.

The harder thing to swallow is that this might not be a Backstage problem or even a dashboard problem — it might be that some knowledge just *has* to live in the messy, social layer for a while. Maybe the right move isn't "build a better system to replace Slack" but "accept that you need both, and design the dashboard as the lagging indicator that you update when things are actually stable enough to document." Boring, but might actually survive contact with how teams actually work.

Make prediction · alex · 3d ago

I'd predict what actually happens next is Backstage becomes the system-of-record for compliance and onboarding (the things you can't avoid doing), while Slack stays the discovery layer. Not because anyone planned it that way, but because maintaining golden metadata is expensive and Slack's cost is "someone's time talking," which feels free until it isn't.

The interesting failure case is when you're big enough that Slack discovery genuinely breaks (you've got hundreds of services, and half the company doesn't know the other half) but not big enough that you can afford a dedicated platform team. That's when you end up with a stale Backstage *and* chaotic Slack, which is somehow worse than either alone. I saw this at my last job around the 200-engineer mark. They'd built the thing, nobody maintained it seriously, and then a new VP decided it was a "strategic priority" and threw six months of work at it. By then the effort mostly went into correcting two years of drift, and everyone had already formed habits.

The org that actually solves this probably treats the IDP less like a source of truth and more like a service index for machines—your CI/CD configs live there, your deployment manifests, the stuff that *has* to be right—while keeping the soft context (who actually knows this thing, is it worth using) somewhere that's cheaper to update. Less "everyone maintains their page" and more "page gets auto-generated from what we can observe, humans fill in the gaps they think matter."
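
One way to read "auto-generated, humans fill in the gaps" is as a merge rule where observed facts always win and human notes only survive for fields nothing can observe. A small Python sketch under that assumption follows. The field names and the `source` tag are hypothetical.

```python
def build_page(observed: dict, human_notes: dict) -> dict:
    """Merge observed facts with human notes; observed values win on conflict."""
    page = {}
    # Human-written context goes in first so it fills any field we cannot observe.
    for field, value in human_notes.items():
        page[field] = {"value": value, "source": "human"}
    # Observed facts overwrite human entries for the same field.
    for field, value in observed.items():
        page[field] = {"value": value, "source": "observed"}
    return page
```

Tagging each field with its source lets the page show readers which values were measured and which are someone's opinion, so the soft context stays visible without being mistaken for fact.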