What Substack Analytics Engineers Must Be Thinking
Questions Substack's writer dashboard raised from the perspective of a curious data professional.
Believe it or not, this post had actually been sitting in my drafts for weeks, before the double-view-counting mishap. If you look a bit deeper, the event doesn't necessarily come as a surprise.
It’s probably ironic to write a post about my Substack onboarding experience on Substack, but I just couldn’t resist. Relating this back to the world of analytics, I constantly question whether providing a definition for a dataset or metric results in more clarity or more confusion. I wonder if the analytics engineers at Substack wondered the same.
Let me paint a picture for you.
Shortly after launching a post, I wanted to see how it was performing. After all, enabling other people to make data-driven decisions is my motivation for basically everything I do professionally.
For those unfamiliar, here’s an example taken from Substack’s explanation of what post metrics mean. I’m no stranger to marketing attribution, but I still couldn’t help asking myself: What’s a view? What’s a reader?
And the biggest question of them all: do dashboards exist that don’t set off a cascade of follow-up questions, and if so, what qualities distinguish them?
The purpose of dashboards
Substack’s writer dashboard is there to help writers understand “performance”. I put this in quotes because, well, what does performance really mean? For someone operating a paid newsletter, it probably has something to do with which posts draw more revenue. For someone building an audience for free, they’re probably just focused on reach and engagement.
In short, dashboards exist for the purpose of quickly and concisely answering questions.
There are so many questions. Many good questions, and many, many more that aren’t worth answering. Of course, there’s a reason those questions are being asked. They’re supposed to drive action or a change in behavior. This is data-driven decision making.
I maintain my stance that dashboards are only there to answer questions. They enable decisions, but human intervention is needed to actually implement and act on a decision. This is true for any organization, and for internal and external dashboards alike. Dashboards alone can’t enact change; they trigger it by communicating recommendations to those with the power to act.
The movement to operationalize analytics is a strong one; a lot has been written about reverse ETL by Census, Hightouch, and others. However, many businesses and individuals are still catching up and don’t have many operational tools. They don’t know what actions they want to take, or even what actions are possible.
Understanding must come before action. Understanding also helps us ask the right questions.
Understanding by itself isn’t enough, but it’s a necessary first step. When I go to Substack’s writer dashboard, I’m there to learn how many people might care about what I’m writing and how I can engage them further, for instance by figuring out what type of content they’re interested in. I’m sure my understanding of what’s possible in the creator world will grow, and I’ll have more questions, or deem some of my previous questions largely irrelevant.
They say the more you travel, the more you realize how much of the world you have yet to see. Similarly, the writer dashboard gives me numbers but leaves me wondering how those numbers are calculated.
The purpose of definitions
You’d think no one could really argue about what a “reader” or a “view” is at a high level. It’s a human with a heart and a brain who has read your post. The bottom line is that having readers is a good thing, and you’d want that number to grow over time.
Obviously, the numbers shouldn’t be artificially large or double counted. One view should correspond to one human opening the post.
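As a minimal sketch of that principle (in Python, which I’m choosing for illustration; Substack’s actual pipeline isn’t public, and the event shape here is my own invention), deduplicating raw page-open events is what keeps a double-fired tracking event from inflating the count:

```python
def count_views(events):
    """Count views from raw page-open events.

    Each event is a (reader_id, post_id) tuple. A naive count tallies
    every event; a deduplicated count credits one view per reader per
    post, so the same open recorded twice can't inflate the number.
    """
    naive = len(events)
    unique = len(set(events))  # one view per (reader, post) pair
    return naive, unique

events = [
    ("alice", "post-1"),
    ("alice", "post-1"),  # the same open, recorded twice
    ("bob", "post-1"),
]
naive, unique = count_views(events)
# naive == 3, unique == 2
```

Whether the deduplication key should be the reader, the device, or the browser session is exactly the kind of detail a definition has to pin down.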
However, if someone opened a post in a new tab but never actually navigated to that tab and never read the post, does that count as a reader and a view? At the end of the day, it doesn’t really matter. Maybe it does, maybe it doesn’t. I’m asking the question because I want to make sure my interests are aligned with the message the dashboard is giving me on engagement, rather than it patting me on the back when in reality I could do better.
The biggest problem with Substack’s double-view-counting mishap is that these intentions weren’t aligned. The writer’s understanding of views was far from reality.
Definitions exist to get people on the same page, but the details are largely arbitrary.
It’s not uncommon (unfortunately) for numbers to slightly differ between teams at a company with thousands of employees. I’ve been in the room when two teams spent hours discussing whether the week should start on a Sunday or a Monday for the purposes of reporting week-over-week performance. I’d argue the most important outcome of that discussion is not when the week starts, but that the two teams simply agree on a day, so we don’t deliver two conflicting numbers to a C-suite executive, and agree that the resulting numbers represent reality.
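To make the arbitrariness concrete, here’s a small Python sketch (the helper and its names are mine, not any team’s actual reporting code) showing how the same date lands in different reporting weeks depending on the agreed-upon start day:

```python
from datetime import date, timedelta

def week_start(d: date, first_weekday: int) -> date:
    """Snap a date to the start of its reporting week.

    first_weekday follows Python's weekday() convention:
    0 = Monday ... 6 = Sunday.
    """
    return d - timedelta(days=(d.weekday() - first_weekday) % 7)

d = date(2023, 4, 2)  # a Sunday
week_start(d, 0)  # Monday-start week begins 2023-03-27
week_start(d, 6)  # Sunday-start week begins 2023-04-02
```

Neither answer is wrong; they just bucket the same Sunday into different weeks, which is enough to make two teams’ week-over-week numbers disagree.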
Similarly, I’d like to align my understanding of a view, a reader, and a subscriber with Substack’s definitions. But of course, those definitions can’t objectively misrepresent reality.
A quick tangent on attribution lingo: why we’re perpetually confused
Web analytics is confusing because it’s near impossible to equate a single human to a single series of web visits.
Facebook and Google have both written at length about what clicks and unique clicks might mean in the context of paid ads. In short, if one person on one device (say, the Chrome browser on their laptop) clicks a search ad 5 times, that’s 5 clicks but one unique click. If that person were to open an incognito window on their phone, not logged in anywhere, they could click the ad again and be counted as a completely different click. Same heart, same brain, different device and browser. This issue is largely unavoidable without going deep down the rabbit hole of advertising privacy, an area where Apple has famously taken a stance.
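A tiny Python sketch of why totals and uniques diverge (the identifiers are hypothetical): the platform can only deduplicate on the identifier it actually sees, a cookie or device fingerprint, never the human behind it.

```python
def click_stats(clicks):
    """clicks: a list of identifier strings, one per recorded click.

    The identifier is whatever the ad platform can observe (cookie,
    browser, device) -- not the person. Deduplication happens on that
    identifier, so one human on two browsers looks like two uniques.
    """
    return len(clicks), len(set(clicks))

# One human: 5 clicks from their laptop, 1 from an incognito phone session.
clicks = ["laptop-chrome"] * 5 + ["phone-incognito"]
total, unique = click_stats(clicks)
# total == 6, unique == 2 -- even though it's one heart, one brain
```

The gap between `unique` and the true number of humans is the identity-resolution problem that no amount of dashboard polish can define away.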
Consistency and understanding are key to interpreting metrics with the right context.
Confusion comes from delivering numbers that relate to loosely defined concepts while implying they’re exact. Substack has several posts about interpreting post metrics and dashboard metrics, and I honestly don’t have an issue with their goal: they try to make web analytics as digestible as possible. If you know web analytics, you know the numbers have to be taken with a grain of salt. Is that a trap?
The most explicit definition I could find (across views, subscriptions, signups, opens, clicks, etc.) was for free signups, defined as “the number of people who subscribed to your email list from that post.” Are unsubscribes excluded from this number? If every single person who subscribed from a post then unsubscribed, couldn’t I think my post was doing great when in reality its performance was atrocious?
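Here’s a hypothetical Python sketch of the two possible readings of that definition; `free_signups` and its `net` flag are my own invention for illustration, not Substack’s API or confirmed behavior:

```python
def free_signups(subscribed, unsubscribed, net=False):
    """Count free signups attributed to a post.

    subscribed / unsubscribed are sets of email addresses. The gross
    reading counts everyone who ever subscribed from the post; the
    hypothetical net reading subtracts those who later unsubscribed.
    """
    if net:
        return len(subscribed - unsubscribed)
    return len(subscribed)

subs = {"a@x.com", "b@x.com", "c@x.com", "d@x.com"}
unsubs = {"a@x.com", "b@x.com"}
free_signups(subs, unsubs)            # 4 -- gross signups
free_signups(subs, unsubs, net=True)  # 2 -- net of unsubscribes
```

Both numbers are defensible; the problem is only when the dashboard shows one and the writer assumes the other.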
I know, I know, this could be an extreme. But if there’s anything I’ve learned from the analytics profession it’s that edge cases are rarely actually edge cases.
No definition is perfect, but we can’t live in a state of perpetual confusion.
Did curiosity really kill the cat?
I rarely find myself answering a question directly. In that vein, yes and no.
I’m asking a lot of questions of newsletter metrics because I’m genuinely curious, but I was also recently validated: asking questions may be exactly what exposed an underlying issue. That said, I shouldn’t let this block me from actually gaining insight from the dashboard. I can get a rough idea of how Twitter versus LinkedIn perform at bringing in readers and subscribers within the context of a single post. I can also see whether posts drive subscribers or not.

And maybe I’m asking questions for the sake of writing about them. In that case, curiosity is simply that: curiosity. The risk of going too far down this route is analysis paralysis. Make sure to distinguish between questions asked for the sake of human curiosity and questions that influence understanding or expose an issue.
Analytics teams should always strive to create dashboards that either stand alone or link out to the relevant context. Context is also important for aligning on expectations. Curiosity saves the day when it prevents business users from misinterpreting results, or analytics teams from misreporting them.
Thanks for reading! I love talking data stacks. Shoot me a message.
Unsubscribes are not taken into account when counting free signups. So if four people enter their email and two of them unsubscribe, the post activity will still count four free signups.