Reinventing the Wheel of Data Activation
Data activation means so much more outside of the data team.
In our data bubble, we’ve been arguing about operational analytics, data activation, and warehouse-native apps. Let me summarize the conversation for you: after data gets to the warehouse, how should it get to its next destination? Some say it’s the warehouse vendor’s problem—Snowflake has stepped in here with its app development. Others say vertical vendors like Supergrain or Pocus should step in and pull directly from the warehouse.
Different viewpoints on these topics shape the answers to the underlying questions:
How will Snowflake/GCP/AWS continue to build on the apps framework to benefit the jobs of data teams?
Will tools like Census/Hightouch become redundant or get bought out?
To boil it down, all we are really doing is moving data around. That’s all we’re ever doing. Fivetran? Moving data. dbt? Moving data¹. Moving data only gets harder as there’s more data to move and more ways to use it once it reaches its destination.
Activating data means allowing some tool to use data natively. Teams have been doing this for years whether we acknowledge it or not.
Data activation “bells and whistles”
Heard of Zapier, Make (formerly Integromat), or Workato? They are in the business of activating data, even data that bypasses the warehouse. Any pipeline that doesn’t touch a database or warehouse can make us data people cringe—I’ll touch on this a bit later.
First, let’s start with the basics. Zapier and Make have existed since 2011 and 2012, respectively. Zapier’s story is an interesting one: it lets any user trigger data passing from one app to another through their APIs in an intuitive platform. People latched on to this capability, leading to the company’s early profitability.
During my time at Perpay, I led the data engineering team through building out the data integration from the internal warehouse to Iterable, a cross-channel marketing platform. The integration was first built in house, then moved into Census so we no longer had to keep up with ever-changing API requirements. It never crossed my mind to use Zapier.
If the source had been PostgreSQL, it certainly would have been possible². Browsing Zapier’s integrations now, a new row in a source database can create or update a user’s data in Iterable. That’s basically the same thing Census and Hightouch do, right?
In this particular case, the answer is fundamentally, yes: they both are moving data from a database to an operational tool (Iterable). Census and Hightouch have both positioned themselves as vertical applications beloved by data teams. The supplementary features data teams care about like testing, version control, and API-enabled workflows have made them more attractive than Zapier.
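For a sense of what that row-to-API movement looks like under the hood, here is a minimal reverse-ETL sketch in Python. SQLite stands in for the source database, and the stubbed client stands in for a real HTTP call; Iterable’s actual endpoint and payload shape will differ, so every name here is an illustrative assumption.

```python
import sqlite3

# Minimal reverse-ETL sketch: pull rows changed since the last sync from a
# source database and push each one to an operational tool. SQLite is a
# stand-in for the source, and send_to_iterable() is a stub where a real
# integration would make an HTTP call -- the payload shape is an assumption.

def fetch_changed_users(conn, since):
    """Return users updated after the last sync watermark."""
    rows = conn.execute(
        "SELECT email, plan, updated_at FROM users WHERE updated_at > ?",
        (since,),
    ).fetchall()
    return [
        {"email": email, "dataFields": {"plan": plan}, "updated_at": ts}
        for email, plan, ts in rows
    ]

def send_to_iterable(user, sent):
    """Stub for the API call; records the payload instead of POSTing it."""
    sent.append({"email": user["email"], "dataFields": user["dataFields"]})

def sync(conn, since):
    """One sync pass: fetch changed rows, push each, return a new watermark."""
    sent = []
    users = fetch_changed_users(conn, since)
    for user in users:
        send_to_iterable(user, sent)
    watermark = max((u["updated_at"] for u in users), default=since)
    return sent, watermark
```

The loop itself is simple; the differentiation comes from what gets layered on top of it, which is where the "bells and whistles" below come in.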
Before anyone thinks I’m recommending companies use Zapier for everything, hold your horses. While the theoretical functionality Zapier offers is great, you know what they say—only in theory are practice and theory the same. Data teams value software engineering principles that enable them to build scalable systems, and the key features of testing, version control, error monitoring, etc. simply don’t exist in Zapier.
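To make "testing" concrete: here is the kind of small, version-controlled unit check a data team might keep alongside a sync’s field mapping. The mapping function and target payload shape are hypothetical, not any vendor’s actual schema.

```python
# A tiny, version-controllable unit check for a sync's field mapping -- the
# kind of software-engineering guardrail this post argues Zapier lacks. The
# mapping function and payload shape are illustrative assumptions.

def map_user_row(row):
    """Map a warehouse row to an operational tool's user payload."""
    return {
        "email": row["email"].strip().lower(),
        "dataFields": {
            "plan": row["plan"],
            "ltv_cents": int(row["ltv"] * 100),  # store money as integer cents
        },
    }

def test_map_user_row_normalizes_email_and_units():
    payload = map_user_row({"email": " Ada@Example.COM ", "plan": "pro", "ltv": 12.5})
    assert payload["email"] == "ada@example.com"
    assert payload["dataFields"] == {"plan": "pro", "ltv_cents": 1250}
```

Checks like this live in version control and run in CI before a sync ships—something that is hard to replicate inside a point-and-click Zap.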
Building scalable systems is hard no matter the operational tool used, with extra functionality for scale being more than just bells and whistles: it’s critical. Now back to my original questions surrounding the transformation of the automation space.
Integration bridges will always exist
Zapier would be defunct if applications just talked to each other, just like Census and Hightouch would be defunct if applications just pulled directly from the warehouse. While some may say this is the future, there will always be applications that aren’t warehouse native.
A bridge between the warehouse and these applications will need to exist—maybe as its own product, or as a portion of a larger suite of products. Either way, it needs to exist.
As a child, every summer my family went to Cape Cod, crossing the Bourne Bridge to get there. There’s traffic on almost every bridge I can think of, and in this case it was a necessary evil to get to the best beaches in the Northeast³.
To get back to my original questions:
Snowflake’s releasing new apps, huh?
Will data activation/reverse ETL/[your name of choice here] become redundant?
Snowflake’s expanding set of tools and warehouse-native apps makes the claim that integrations aren’t a necessary evil. The traffic in this analogy is the additional points of failure introduced when moving data through yet another tool.
Across teams there will always be more than one tool in use, and someone needs to integrate with each of them in a scalable way where complexity and errors don’t grow exponentially. Even if we reduce the number of bridges, there will always be water to cross, so some bridges will have to exist. And I will continue loving Cape Cod.
Thanks for reading! I talk all things data and startups so please reach out. I’d love to hear from you.
¹ And also transforming it, but moving data between tables nonetheless.
² Use your imagination: Zapier could easily build a comparable integration to Snowflake/Redshift/BigQuery if they wanted to.
³ Really, this is what you’re contending with?
During my time at Integromat (Make), I have seen so many great use cases of data activation. And believe it or not, the most common sources have always been database tools -- from the likes of Google Sheets and Airtable to Redshift, BQ, and now Snowflake (Make supports them all). Also, FWIW, a lot of the testing and visibility features offered by rETL tools have long been offered by Make -- unfortunately they're not so obvious to early users.
I've always told people that Integromat is first a data tool and then an integration tool. So yeah, I agree that a LOT of rETL use cases can be taken care of by tools like Make (and Zapier to an extent) -- they (Make etc.) have just missed the opportunity to cater to the needs of data teams.
Definitely interesting to highlight that what the current rETL tools do is very similar to what the IaaS tools have offered for a long time. I've recently used Workato to send data from the warehouse to SFDC and it's pretty great in that regard, full version control (including versioning in git), good user management, etc.
There's definitely a lot the rETL tools can learn from IaaS, and it's clear existing IaaS tools have missed a major marketing trick by not jumping into this category as well.