Marketing Attribution: A Guided Map for Success
Originally a talk at PLGTM 2024, this post covers the small hills I'm willing to die on when it comes to marketing tracking and attribution.
Alright, let's dive into the world of marketing attribution - a topic that's often misunderstood but absolutely crucial for any marketer looking to make data-driven decisions. I recently gave a talk about this at PLGTM in San Francisco, and while the recording was lost to the ether, I'm excited to share these insights with you here.
One key thing to remember as we go through this: implementation is 80% of the battle. This follows the classic 80/20 rule - you'll get 80% of the value from 20% of the effort. So don't get bogged down trying to create the perfect attribution system from day one. Start simple, and iterate.
Why Attribution Matters
Picture this: you're looking at a graph of user signups over time. There are spikes, there are dips, but what's causing them? That's where attribution comes in. It's all about explaining your growth and identifying those magical levers that drive conversions.
Now, imagine that same graph, but with each spike color-coded to show its source. Suddenly, you can see that the big blue spike came from a webinar, while the brown spikes are from social media posts.
At its core, marketing attribution allows you to understand conversion behavior and double down on what's working. But before we get into the nitty-gritty, let's bust some myths and share some tips to make your attribution journey smoother.
Myth #1: Ad-blockers are the enemy
You might think that with the rise of ad-blockers and privacy concerns, tracking user behavior is a lost cause. But here's the truth: cookies are changing, but they're not dead. And ad-blockers? They're not as widely used as you might think – even by developers!
Let me share an experiment from our experience at Prefect. We compared our server-side signup data (the source of truth, in the database) with our client-side tracking data (in Amplitude). The result? 90.2% of the signups in the database were also present in Amplitude, our front-end event-tracking tool. In other words, about 90% of user signups were still trackable via cookies, so ad-blockers (or any other tracking failure) affected fewer than 10% of them. That's huge, especially when the common belief in the developer space is that ad-blockers run the world 🤯.
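The comparison above boils down to a set overlap. Here's a minimal sketch, assuming you've exported the user IDs of signups from your database and the user IDs attached to signup events in your tracking tool; `trackingCoverage` and the sample IDs are hypothetical.

```javascript
// Share of server-side signups (source of truth) that also appear in client-side tracking.
// Both inputs are arrays of user IDs; how you export them is up to you.
function trackingCoverage(databaseUserIds, trackedUserIds) {
  const tracked = new Set(trackedUserIds);
  const dbUsers = new Set(databaseUserIds); // dedupe so repeated rows don't skew the ratio
  if (dbUsers.size === 0) return 0;
  let matched = 0;
  for (const id of dbUsers) {
    if (tracked.has(id)) matched += 1;
  }
  return matched / dbUsers.size;
}

// Example: 9 of 10 database signups were seen by the tracking tool.
const dbSignups = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8", "u9", "u10"];
const trackedSignups = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8", "u9"];
console.log(`${(trackingCoverage(dbSignups, trackedSignups) * 100).toFixed(1)}% trackable`); // 90.0% trackable
```

A signup missing from the tracking tool could be an ad-blocker, but it could also be a failed identify call or a plain bug, so treat the gap as an upper bound on ad-blocker impact.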
💡 Tip: Set up a reverse proxy to make your tracking requests first-party. This can help bypass some ad-blockers and give you more accurate data.
We implemented a reverse proxy at Prefect. Instead of your website communicating directly with third-party tracking services, requests are first sent to your own server, which then forwards them to the tracking service. Because the browser only ever talks to your domain, the requests appear first-party, which gets them past some ad-blockers.
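To make the idea concrete, here's a minimal Node.js sketch of such a proxy (Node 18+ for the global `fetch`). It assumes your tracking SDK can be pointed at a custom endpoint and that the upstream is Amplitude's HTTP V2 API at `https://api2.amplitude.com/2/httpapi`; the `/collect` path, `upstreamFor`, and `startProxy` are hypothetical names. It only forwards POST bodies and is not production-hardened.

```javascript
const http = require("node:http");

// Hypothetical first-party path that your site's tracking SDK is configured to post to.
const PROXY_PATH = "/collect";
// Upstream tracking endpoint (Amplitude's HTTP V2 API in this sketch).
const UPSTREAM_URL = "https://api2.amplitude.com/2/httpapi";

// Return the upstream URL for requests we should forward, or null for everything else.
function upstreamFor(method, path) {
  if (method !== "POST") return null;
  const { pathname } = new URL(path, "http://localhost"); // ignore any query string
  return pathname === PROXY_PATH ? UPSTREAM_URL : null;
}

// Start the proxy, e.g. startProxy(8080), and route /collect on your own domain to it.
function startProxy(port) {
  const server = http.createServer(async (req, res) => {
    const target = upstreamFor(req.method, req.url);
    if (!target) {
      res.writeHead(404).end();
      return;
    }
    try {
      // Buffer the event payload sent by the browser.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      // Forward it server-to-server; the browser only ever talks to your domain.
      const upstream = await fetch(target, {
        method: "POST",
        headers: { "Content-Type": req.headers["content-type"] ?? "application/json" },
        body: Buffer.concat(chunks),
      });
      // Relay the tracking service's status and response body back to the SDK.
      res.writeHead(upstream.status, { "Content-Type": "application/json" });
      res.end(await upstream.text());
    } catch {
      res.writeHead(502).end(); // upstream unreachable or request aborted
    }
  });
  return server.listen(port);
}
```

Amplitude's Browser SDK has a `serverUrl` option for pointing it at an endpoint like this; check your SDK's docs for the equivalent. One trade-off: the tracking service now sees your server's IP rather than the visitor's, which can affect location data, so look up how your provider expects the client IP to be passed along.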
An important note: at Perpay we reached 85% trackable signups without a reverse proxy, though that was not a developer audience. Run a tracking test without one first to assess how much it matters for your industry.
Implementing Attribution: Step by Step
Now that we've dispelled that myth, let's talk about how to actually implement attribution. It's a three-step process:
Event Tracking
Identity Resolution
Attribution Logic
Let's break these down.
Step #1: Event Tracking
Event tracking is all about capturing user behavior on your website or app. You'll want to track things like page views, button clicks, and form submissions. Here are the key elements:
Initialization: Set up your tracking tool (we use Amplitude at Prefect)
User identification: Tie events to specific users when they log in or sign up, usually by calling an .identify() method.
Custom event emission: Track specific actions that matter to your business. Ensure page views are tracked automatically, and emit custom events (like button clicks) as you see fit.
Here's a basic example of how you might implement event tracking using Amplitude:
// Import the Amplitude Browser SDK (npm package: @amplitude/analytics-browser)
import * as amplitude from '@amplitude/analytics-browser';
// Initialize event tracking SDK (AMPLITUDE_API_KEY is your project's API key)
amplitude.init(AMPLITUDE_API_KEY, {
  defaultTracking: {
    attribution: true, // track UTM parameters and referring domains
    pageViews: true, // auto-emit page views
    sessions: true, // track sessionId
    formInteractions: false,
    fileDownloads: false,
  },
});
// Identify authenticated user
const identifyEvent = new amplitude.Identify();
amplitude.setUserId('user@amplitude.com');
amplitude.identify(identifyEvent);
// Emit custom tracking event
amplitude.track({
  event_type: 'event type',
  event_properties: { eventPropertyKey: 'event property value' },
  groups: { 'orgId': '15' },
});
💡 Tip: Make sure to auto-track page views and UTM parameters/referrers. This will save you a lot of headaches down the line.
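Auto-tracking handles this for you, but it helps to know what's being captured. Here's a minimal sketch of pulling UTM parameters, ad click IDs, and the referring domain out of a landing page visit, assuming plain URL strings; the click-ID names match those used in the SQL later in this post, and `extractAttribution` is a hypothetical helper, not part of any SDK.

```javascript
// Query parameters that commonly carry attribution data.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"];
const CLICK_ID_KEYS = ["gclid", "wbraid", "fbclid", "twclid"];

// Pull attribution fields out of a landing-page URL and its referrer
// (in a browser: window.location.href and document.referrer).
function extractAttribution(pageUrl, referrer = "") {
  const page = new URL(pageUrl);
  const fields = {};
  for (const key of [...UTM_KEYS, ...CLICK_ID_KEYS]) {
    fields[key] = page.searchParams.get(key); // null when the parameter is absent
  }
  // Keep only the referring domain, and ignore internal navigation on the same host.
  let referringDomain = null;
  if (referrer) {
    const host = new URL(referrer).hostname;
    if (host !== page.hostname) referringDomain = host;
  }
  return { ...fields, referring_domain: referringDomain };
}

const touch = extractAttribution(
  "https://www.example.com/pricing?utm_source=linkedin&utm_medium=social",
  "https://www.linkedin.com/"
);
console.log(touch.utm_source, touch.referring_domain); // linkedin www.linkedin.com
```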
Step #2: Identity Resolution
This is where things get interesting. Identity resolution is all about tying multiple online identities back to one individual. It's crucial for understanding the full user journey across devices and sessions.
Imagine this scenario: a user browses your website on their phone during their commute, checks it out again on their work computer, and finally makes a purchase on their home laptop. That's three different browsers on three different devices, but it's all one user journey.
Identity resolution aims to connect these dots. Here's how it typically works:
Browser level: Each browser has a unique identifier, usually stored as a cookie. These can be reset and cleared by the user, though.
Device level: We try to connect multiple browser sessions on the same device by passing cookie IDs.
User level: Finally, we aim to tie all these interactions back to a single user, often through login events or other identifying actions. The user must voluntarily identify themselves to be picked up (like logging in).
Read more in depth on identity resolution here.
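Conceptually, the stitching step is a merge over IDs. Here's a minimal in-memory sketch, assuming each event carries a browser `deviceId` and, once the user logs in, a `userId`; it illustrates the idea and is not how any particular vendor implements it. Once a device has been seen with a user ID, all of that device's events, including the anonymous ones before login, are attributed to that user. `resolveIdentities` and the field names are hypothetical.

```javascript
// Resolve each event to a single identity: the user ID if its device was ever identified,
// otherwise an anonymous ID derived from the device.
function resolveIdentities(events) {
  // Pass 1: learn device -> user mappings from identified events (e.g. after login).
  const userForDevice = new Map();
  for (const e of events) {
    if (e.userId && !userForDevice.has(e.deviceId)) userForDevice.set(e.deviceId, e.userId);
  }
  // Pass 2: backfill anonymous events on identified devices.
  return events.map((e) => ({
    ...e,
    resolvedId: e.userId ?? userForDevice.get(e.deviceId) ?? `anon:${e.deviceId}`,
  }));
}

// One person, three browsers: phone, work computer, home laptop.
const journey = [
  { deviceId: "phone", userId: null, event: "Page View: Website" },
  { deviceId: "work", userId: null, event: "Page View: Website" },
  { deviceId: "home", userId: null, event: "Page View: Pricing" },
  { deviceId: "home", userId: "uuid-123", event: "Sign Up" },
  { deviceId: "phone", userId: "uuid-123", event: "Log In" },
];
for (const e of resolveIdentities(journey)) console.log(e.deviceId, e.resolvedId);
```

The work-computer visit stays anonymous because the user never identified themselves there, which is exactly the gap the user-level step can't close without a login. A real system also has to handle shared devices (two people on one browser), which this sketch sidesteps by keeping the first user seen per device.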
To make identity resolution as thorough as possible:
Choose a consistent user ID (UUID is often better than email)
Implement 'identify' calls whenever a user authenticates
Coordinate across domains if your product spans multiple sites
💡 Tip: Don't be conservative with your 'identify' calls. Add them anywhere a user might log in or sign up.
This gets particularly tricky when trying to identify users across domains. For instance, Prefect’s website is at prefect.io while the application is at app.prefect.cloud. This means authentication happens on a different domain (and thus under a different cookie) from the website, where the page views most useful for attribution are tracked. We solve this by passing device IDs across domains. The approach has flaws, but it gets the job done most of the time.
Here's a code snippet that demonstrates how you might implement cross-domain identity resolution:
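import * as amplitude from '@amplitude/analytics-browser';
// Identify authenticated user
const identifyEvent = new amplitude.Identify();
amplitude.setUserId('user@amplitude.com'); // Email or UserID?
amplitude.identify(identifyEvent);
// Cross-domain identity resolution: add the device ID as a URL parameter on outbound links
const addUrl = (event) => {
  const deviceId = amplitude.getDeviceId();
  if (!deviceId) return; // nothing to pass along yet
  const link = event.currentTarget; // the <a> the listener is attached to, not a child element
  const url = new URL(link.href);
  url.searchParams.set('deviceId', deviceId); // set, not append, so repeat clicks don't duplicate it
  link.href = url.toString();
};
// Attach the handler to links that point at the app domain
document.querySelectorAll('a[href^="https://app.prefect.cloud"]').forEach((link) => {
  link.addEventListener('click', addUrl);
});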
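The snippet above covers the sending side. On the receiving domain (the app, in Prefect's case), the device ID has to be read back from the URL and handed to the SDK before any events fire. Here's a minimal sketch, assuming the `deviceId` parameter name used above; `readDeviceIdFromUrl` is a hypothetical helper. Amplitude's Browser SDK documents a `deviceId` init option for this, but confirm it for your SDK version.

```javascript
// Read a device ID passed across domains via ?deviceId=..., and return the URL without it
// so it doesn't linger in the address bar, bookmarks, or shared links.
function readDeviceIdFromUrl(href) {
  const url = new URL(href);
  const deviceId = url.searchParams.get("deviceId");
  url.searchParams.delete("deviceId");
  return { deviceId: deviceId || null, cleanUrl: url.toString() };
}

// In the browser, before initializing tracking (sketch; check your SDK's option names):
//   const { deviceId, cleanUrl } = readDeviceIdFromUrl(window.location.href);
//   if (deviceId) history.replaceState(null, "", cleanUrl);
//   amplitude.init(AMPLITUDE_API_KEY, { deviceId: deviceId ?? undefined });
const { deviceId, cleanUrl } = readDeviceIdFromUrl(
  "https://app.prefect.cloud/signup?deviceId=abc-123&plan=free"
);
console.log(deviceId, cleanUrl); // abc-123 https://app.prefect.cloud/signup?plan=free
```

Only the `deviceId` parameter is stripped, so UTM parameters on the landing URL are still there for attribution tracking to pick up.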
Myth #2: Anonymous data is enough
You might be tempted to rely solely on tools like Google Analytics, but here's the truth: they often don't have all the relevant data to track conversions accurately. Proper identity resolution is key to getting the full picture.
This is a small hill I will die on, and I could go on for ages about why implementing a tracking tool like Amplitude is worth it, but I’ll leave you with this: it’s the most thorough way to segment your user base.
Step 3: Attribution Logic
Now we're getting to the heart of it. How do you actually attribute conversions to specific marketing channels?
Myth #3: You need complex data science models
Here's some good news: your first attribution model doesn't need to be a fancy marketing-mix model. Start simple with a first-touch, last-touch, or what I call a "ranked touch" model.
The ranked touch attribution model
A ranked touch model takes either the first or last event in a user's journey but prioritizes certain channels over others. For example, you might rank paid ads higher than organic social media traffic to truly understand how your ad money is performing. Here's how it works:
Assign a rank to each marketing channel (e.g., 1 for paid search, 2 for organic search, 3 for social media).
For each user conversion, look at all the touch points in their journey.
Keep only the touch points with the highest rank (the lowest number).
Among those, choose either the first or last touch point (depending on your preference).
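The same logic as a minimal in-memory sketch: rank wins first, and time only breaks ties within a rank, which is the ordering the SQL below uses. It assumes each touch point already carries a numeric `rank` and a timestamp; `rankedTouch`, `isBetter`, and the sample data are hypothetical.

```javascript
// Ranked-touch attribution: pick the best-ranked touch point per user (1 = highest priority),
// breaking ties by time (earliest for first-touch, latest for last-touch).
function rankedTouch(touchPoints, mode = "first") {
  const best = new Map(); // userId -> winning touch point
  for (const tp of touchPoints) {
    const current = best.get(tp.userId);
    if (!current || isBetter(tp, current, mode)) best.set(tp.userId, tp);
  }
  return best;
}

function isBetter(candidate, current, mode) {
  if (candidate.rank !== current.rank) return candidate.rank < current.rank;
  return mode === "first" ? candidate.time < current.time : candidate.time > current.time;
}

const touches = [
  { userId: "u1", channel: "Organic: Social", rank: 2, time: 1 },
  { userId: "u1", channel: "Paid: Search", rank: 1, time: 5 },
  { userId: "u1", channel: "Paid: Social", rank: 1, time: 9 },
  { userId: "u2", channel: "First Party Web", rank: 3, time: 2 },
];
console.log(rankedTouch(touches, "first").get("u1").channel); // Paid: Search
console.log(rankedTouch(touches, "last").get("u1").channel); // Paid: Social
```

Even though the organic social touch came first, the paid touches win on rank. That's the point of the model: it answers "did our ad spend reach this user?" before anything else.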
Here's a simplified version of what the SQL for this might look like (BigQuery syntax). The first section ranks all the events based on click IDs, UTMs, and referrer; the next section orders each user's events by rank and then by time, for a first-touch model; the last section keeps one event per user and names its attribution source.
with ranked_events as (
  -- Assign Rank:
  -- 1 = Paid social, paid search, sponsorships
  -- 2 = Organic search, social, 1st party email
  -- 3 = First party website lands
  -- 4 = Everything else
  -- Assumes UTM params, click IDs, referrer and page_url are already flattened into columns.
  select
    user_id,
    event_id,
    event_time,
    event_type,
    page_url,
    referrer,
    utm_source,
    utm_medium,
    gclid,
    wbraid,
    fbclid,
    twclid,
    coalesce(
      -- Rank 1: Checking click IDs and specific UTM sources to catch paid traffic.
      if(coalesce(gclid, fbclid, twclid, wbraid) is not null
        or contains_substr(utm_source, 'sponsorship')
        , 1, null),
      -- Rank 2: Organic search, social, and email. Checking UTMs and referrer (contains_substr is case-insensitive).
      if(contains_substr(referrer, 'google') or contains_substr(utm_source, 'google')
        or contains_substr(referrer, 'duckduckgo') or contains_substr(utm_source, 'duckduckgo')
        -- Add checks for other search engines
        or contains_substr(referrer, 'twitter') or contains_substr(utm_source, 'twitter')
        or contains_substr(referrer, 'linkedin') or contains_substr(utm_source, 'linkedin')
        -- Add checks for other social networks
        or contains_substr(utm_medium, 'email')
        , 2, null),
      -- Rank 3: First party lands (no captured referrer/UTM)
      if(event_type = 'Page View: Website', 3, null),
      -- Rank 4: Everything else
      4
    ) as attribution_rank
  from amplitude_events
),
highest_ranked_first_event as (
  -- Order each user's events by rank (1 first), then by time (earliest first)
  select
    *,
    row_number() over (partition by user_id order by attribution_rank asc, event_time asc) as rn
  from ranked_events
)
select
  hrfe.*,
  case
    when hrfe.attribution_rank = 1 and coalesce(hrfe.gclid, hrfe.wbraid) is not null then 'Paid: Search'
    when hrfe.attribution_rank = 1 and coalesce(hrfe.fbclid, hrfe.twclid) is not null then 'Paid: Social'
    when hrfe.attribution_rank = 1 and contains_substr(hrfe.utm_source, 'sponsorship') then 'Paid: Sponsorship'
    when hrfe.attribution_rank = 1 then 'Paid: Other'
    when hrfe.attribution_rank = 2 and (contains_substr(hrfe.referrer, 'google') or contains_substr(hrfe.utm_source, 'google')
      or contains_substr(hrfe.referrer, 'duckduckgo') or contains_substr(hrfe.utm_source, 'duckduckgo')) then 'Organic: Search'
    -- Fill in with other search platforms like Bing, etc
    when hrfe.attribution_rank = 2 and (contains_substr(hrfe.referrer, 'linkedin') or contains_substr(hrfe.utm_source, 'linkedin')
      or contains_substr(hrfe.referrer, 'twitter') or contains_substr(hrfe.utm_source, 'twitter')) then 'Organic: Social'
    -- Fill in with other social platforms etc
    when hrfe.attribution_rank = 2 then 'Organic: Other'
    when hrfe.attribution_rank = 3 and hrfe.page_url like '%/blog%' then 'First Party Web: Blog'
    -- Fill in with other pages of interest like: features, pricing, etc
    when hrfe.attribution_rank = 3 then 'First Party Web'
    when hrfe.attribution_rank = 4 then 'Other'
    else null
  end as attribution_name
from highest_ranked_first_event as hrfe
where hrfe.rn = 1
This SQL snippet groups touch points by user and selects the highest-ranked channel (lowest number in our ranking system) as the attributed channel, using the earliest event within that rank.
💡 Tip: Make sure your attribution logic reflects the channels you actually spend time and money on.
Reporting: Making Sense of the Data
Once you've got your attribution system up and running, it's time to turn that data into actionable insights.
Step #1: Identify Your Key Conversion Metric
Focus on the metrics that matter most to your business. They’re going to be different for different industries, sales motions, and so on. Consider the differences between:
User signups (for top-of-funnel PLG)
In-product activated users (for bottom-of-funnel PLG)
Sales leads (for sales-assisted PLG)
Then, look at conversion percentages to determine: how efficient are your top-of-funnel sources from a revenue perspective? How efficient are your ad channels downstream? Consider:
Visitors to signups
Signups to leads
Cost per acquisition (CAC)
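The arithmetic behind these is simple, but it pays to compute it the same way every time. Here's a minimal sketch, assuming you've already aggregated visitors, signups, leads, and spend per attributed channel; CAC here is spend divided by signups (swap in leads or customers, whichever "acquisition" you care about). `funnelMetrics` and the numbers are hypothetical, not real data.

```javascript
// Per-channel funnel efficiency from aggregated counts.
function funnelMetrics({ visitors, signups, leads, spend }) {
  const rate = (num, den) => (den > 0 ? num / den : null); // null instead of dividing by zero
  return {
    visitorToSignup: rate(signups, visitors),
    signupToLead: rate(leads, signups),
    costPerSignup: rate(spend, signups), // CAC, treating a signup as the acquisition
  };
}

// Hypothetical example numbers for two attributed channels.
const channels = {
  "Paid: Search": { visitors: 4000, signups: 200, leads: 20, spend: 10000 },
  "Organic: Search": { visitors: 6000, signups: 150, leads: 30, spend: 0 },
};
for (const [name, counts] of Object.entries(channels)) {
  console.log(name, funnelMetrics(counts));
}
```

In this made-up example, paid search converts visitors to signups at twice the organic rate, but organic signups turn into leads twice as often, which is exactly the kind of trade-off these ratios are meant to surface.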
Every visual in a report must answer a question; otherwise, it isn't actionable. Attribution reports answer questions like these:
Which sources bring the most top-of-funnel awareness? → Spend carefully but strategically.
Where do your most active users come from? → Want product feedback? Lean into this.
Which channels produce the highest-ROI sales leads? → Want money? Spend as long as you have the budget.
The answers to these questions should directly influence where you invest your marketing efforts.
Step #2: Test, Test, Test
Here's a crucial point: attribution is not a one-and-done deal. As your business evolves and new marketing channels emerge, you need to continually test and refine your attribution logic.
Start by making sure the captured events reflect reality. This is best done in a staging environment, though sometimes it's only possible once released to production: open your site in an incognito window and pull up your event tracking tool alongside it. When you view pages and click buttons, do the events in the tool match your behavior?
Usually, in the first version, the answer is no: you might see events that fire when they shouldn’t, or events that are missing when they should fire. Validating that the source data is correct is key to avoiding a garbage-in, garbage-out problem in your attribution model.
💡 Tip: Regularly test your tracking by setting up a web session and comparing it side-by-side with your event tracking data. You might be surprised by what you find!
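If you script your test session, part of the side-by-side check can be automated. Here's a minimal sketch, assuming you've listed the events you expect from a test session and exported the event names the tool actually recorded for that session; `diffEvents` and the event names are hypothetical. It counts occurrences, so an event that fires twice when it should fire once shows up too.

```javascript
// Compare expected vs. observed events from one test session, counting duplicates.
function diffEvents(expected, observed) {
  const counts = new Map();
  for (const name of expected) counts.set(name, (counts.get(name) ?? 0) + 1);
  for (const name of observed) counts.set(name, (counts.get(name) ?? 0) - 1);
  const missing = []; // expected but not recorded (often enough)
  const unexpected = []; // recorded but not expected, or recorded too many times
  for (const [name, n] of counts) {
    const target = n > 0 ? missing : unexpected;
    for (let i = 0; i < Math.abs(n); i++) target.push(name);
  }
  return { missing, unexpected };
}

const expectedEvents = ["Page View: Website", "Page View: Pricing", "Click: Sign Up"];
const observedEvents = ["Page View: Website", "Page View: Website", "Page View: Pricing"];
console.log(JSON.stringify(diffEvents(expectedEvents, observedEvents)));
// {"missing":["Click: Sign Up"],"unexpected":["Page View: Website"]}
```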
Myth #4: Attribution is set-and-forget
Remember, your attribution system needs ongoing attention and refinement. It's not something you can set up once and forget about.
Final thoughts
Attribution might seem daunting at first, but it's an invaluable tool for any marketer looking to make data-driven decisions. By dispelling common myths, following best practices, and continuously refining your approach, you'll be well on your way to understanding and optimizing your marketing efforts.
Remember, the goal isn't perfection from day one. Start simple, learn from your data, and iterate. Before you know it, you'll be making informed decisions that drive real growth for your business.
If you want to dive deeper into this topic or have any questions, feel free to reach out. You can find me on LinkedIn. And if you're looking for a tool to help deploy and observe your Python code (that may or may not implement attribution), give Prefect a try!