Modernity != Value
Having a modern data stack doesn't mean an analytics team will actually provide value to the company—it takes more than just adopting shiny tools.
I really dislike the term “Modern Data Stack”. This focus on modernity has taken us further away from the reason behind using certain tools in the first place.
Adopting a particular tool doesn’t mean it’s used in the intended way. You can use Census or Hightouch to write raw data into a Google Sheet or into some other Postgres database for stakeholders to manipulate themselves (read: not a great approach). Maintaining multiple dbt repos across a 1,000+ person organization will likely produce duplicate models, even with three separate teams of ten analytics engineers being paid to model data. And metrics: just because a metrics layer exists doesn’t mean the metric definitions are agreed upon and communicated across the entire company.
It doesn’t matter what tools we use—what matters is how analytics teams can influence the trajectory of their respective organizations. Some tools considered to be part of the modern data stack make delivering this value easier, sure, but aren’t the end-all be-all.
So how do analytics teams actually drive change at an organizational level?
Back to first principles: delivering value
Analytics teams aren’t there to build data platforms. Sure, building them is what they do, but it’s not their reason for being. I fundamentally don’t think teams outside of data professionals should care what tools are used. Why would a sales professional care whether they’re getting Tableau or Looker dashboards emailed to them?
Not caring doesn’t mean a lack of curiosity. Should data teams educate other teams on how they work? Sure. Can those other teams function just the same without this information? I’d argue yes.
Taking a snippet from MongoDB’s definition, a data platform “enables the acquisition, storage, preparation, delivery, and governance of your data, as well as a security layer for users and applications” and is the “key to unlocking the value of your data”. Data platforms are built by and serve analytics teams, just like the team itself serves the rest of the organization.
Analytics teams exist to enable and advise the organization in making data-driven decisions, ultimately leading to improvements in high-level metrics.
Data teams that focus too much on tooling forget how they actually bring value. Spoiler alert: it’s not having the most modern tools. If newer tools help deliver insights faster and better, that’s a win; if a tool just looks nice to use, again, that’s not where value comes from. Analytics teams deliver value by advising other teams on behavioral changes that are expected to make them more efficient.
The team at Continual did an excellent job of summarizing how the tools considered to be part of the modern data stack came to be—namely, previous tooling took too much effort to maintain. This does not mean, however, that we should overcomplicate these new stacks without understanding how that complexity actually influences top-line KPIs. We need to stay mindful of the origin story: use tools that make the task at hand simpler, faster, or possible. Every additional tool layered on top brings only fractional value to the organization.
The minimum viable data stack
A problem definition isn’t a description of a solution.
Not a problem: “We don’t have a metrics layer”
A real problem: “Different departments can’t agree on definitions like what an active user is, and they spend far too much time trying to get on the same page”
Similarly, evaluating which data stack is right for a particular organization shouldn’t start with defining the tools.
Not a good starting point: We definitely need ETL, warehousing, dbt, reverse ETL, oh and of course streaming.
A better starting point: We need all data from third-party marketing apps in one place so we can calculate cost per signup and make campaigns more efficient.
I subscribe to the 80-20 rule. In the above marketing use case, an overcomplication would be going so far as to set up audiences in the marketing tools using reverse ETL to optimize campaign efficiency. This is an optimization, and likely a good one for a more mature analytics organization; but will it result in the largest drop in cost per signup relative to the time put into implementation? According to the rule, no. An alternative first step is to simply report on the worst-performing campaigns and move their budget into the best-performing ones.
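That first step doesn’t need much tooling at all. A minimal sketch of the idea—compute cost per signup per campaign and flag where budget should move—might look like this, where the campaign names and numbers are entirely hypothetical:

```python
# Hypothetical example: per-campaign spend and signups, e.g. pulled from the
# warehouse. Campaign names and figures are illustrative, not real data.
campaigns = [
    {"campaign": "brand_search",     "spend": 12000.0, "signups": 600},
    {"campaign": "display_retarget", "spend": 8000.0,  "signups": 100},
    {"campaign": "social_video",     "spend": 15000.0, "signups": 250},
    {"campaign": "partner_email",    "spend": 3000.0,  "signups": 200},
]

# Cost per signup is the metric the team actually cares about.
for c in campaigns:
    c["cost_per_signup"] = c["spend"] / c["signups"]

# Rank from worst (most expensive signup) to best.
ranked = sorted(campaigns, key=lambda c: c["cost_per_signup"], reverse=True)
worst, best = ranked[0], ranked[-1]

print(f"Shift budget from {worst['campaign']} "
      f"(${worst['cost_per_signup']:.2f}/signup) "
      f"to {best['campaign']} (${best['cost_per_signup']:.2f}/signup)")
```

A report this simple, refreshed weekly, captures most of the value; the reverse ETL audience sync can come later, once this loop is already paying off.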
I consider the minimum viable data stack to be the set of tools that is easiest to set up and maintain while delivering the majority of the value.
It’s not going to be the same for every company—it depends on the team, the budget, and the product, to name just a few variables. Larger team with a smaller budget? Build. Smaller team with a larger budget? Pay for as much as you can. Product that delivers real-time reporting to end customers? Welp, streaming and building for scale might just be unavoidable.
When it comes to the Modern Data Stack narrative, we’ve over-indexed on the modernity of tooling as opposed to the problems analytics teams face in the first place. There are many large, successful companies that don’t use any of these tools (particularly finance companies).
Should we all build companies modeled after finance institutions? Not exactly.
Criteria for evaluating tools
I’ll leave you with this: criteria for evaluating if a tool fits in the minimum viable data stack paradigm.
Problem to value: Can the problem the tool solves be tied directly to value? Value can be evaluated by comparing top-line metrics like increased conversions, revenue, lifetime value, etc., versus cost. If the problem isn’t tied to a metric, or the cost is higher than the gain, ask a different question to reframe the real problem. For instance, reverse ETL doesn’t bring value if the data being pushed isn’t used to influence performance.
Maintenance: Can the tool be realistically maintained by the existing team while keeping value constant? Code-first tools in particular require technical acumen. If the existing talent is already stretched just onboarding the tool, a simpler solution will shorten the time to value.
Scaling features: Is outgrowing the tool already in sight? Over-generalizing for hypothetical future needs takes up a lot of time and is its own form of over-compensation, but do make sure the tool you choose covers all existing use cases.
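The problem-to-value check above often comes down to a back-of-the-envelope calculation. Here is a minimal sketch, where every number is a hypothetical estimate you would fill in for your own organization:

```python
# Hypothetical problem-to-value check: does the tool's expected metric lift
# outweigh its cost? All figures below are illustrative assumptions.
annual_tool_cost = 30_000.0         # license plus estimated maintenance time
expected_extra_conversions = 1_200  # estimated lift from faster insights
value_per_conversion = 50.0         # e.g. average first-year revenue per conversion

expected_gain = expected_extra_conversions * value_per_conversion
roi = (expected_gain - annual_tool_cost) / annual_tool_cost

print(f"Expected gain ${expected_gain:,.0f}, ROI {roi:.0%}")
# If roi <= 0, the problem isn't tied to value; reframe the question instead.
```

The point isn’t precision—it’s forcing the conversation to start from a metric rather than from a tool.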
Thanks for reading! I talk all things data and startups so please reach out. I’d love to hear from you.