Guide to Anonymous Identity Resolution
Fully personalized user journeys can't exist without correlating anonymous source events across applications to a single authenticated user.
This article would not have been possible without what I learned from Nicholas Brown, Jonathan Yu, and other fantastic engineers at Prefect while building our own internal identity resolution system with Amplitude for user attribution. I write this from a mixed perspective of marketing, engineering, architecture design, and general analytics consumption.
Web applications usually live across subdomains (app.mycompany.io versus login.mycompany.io) or even across completely different domains (mycompany.io vs mycompany.com). In theory, the user experience seamlessly transitions across the applications and sites with links and consistent branding. Not so hard to create a slick experience for the user yet, is it?
Not so fast. What about triaging a support ticket for crashing documentation pages, or customizing user onboarding based on how they heard about your company? In both of these cases, you need to map unauthenticated data (documentation and website visits) to a specific user with an email and login.
It gets even more blurry. Consider this: you built your product on app.mycompany.io, but finally got ownership of mycompany.com so the marketing site lives there. The documentation also lives where the marketing site is. The support ticketing system is completely separate and only requires a user’s email, not their potentially different username for the product login. Naturally, you don’t require users to log in to view documentation or the website.
Core problem: personalizing user experience for efficient communication and support.
Derived problem: having complete marketing and product data.
Solution: identity resolution between anonymous and authenticated users.
Steps of identity resolution
First, whose identities are we resolving?
Definition: An anonymous user is a web or app visitor that cannot be tied to self-identifying information. They haven’t provided it and it also cannot be logically deduced.
Definition: An authenticated user is created from an anonymous user after they give information that ties their online session to personal information like an email, phone number, username, etc. A visitor can also be authenticated through deduced information, but that’s usually after being already authenticated elsewhere.
Identity resolution comes down to collecting anonymous visitor data, tying each visitor to an authenticated user (when one exists), and de-duplicating authenticated accounts that correspond to the same person.
Diving in a bit deeper into each of these steps:
1. Implement events on each application.
Implement a tracking plan that answers the most important questions at hand, like "where are signups coming from?" or "how many visitors come to the website?". Fire relevant events like Page View, Button Click, or Form Submit, with properties such as URLs and titles to distinguish between pages and buttons.
2. Identify real people as authenticated users across applications and domains.
Create anonymous user or session identifiers using cookies and URL parameters. Ensure a first-time anonymous visitor to any of the company’s owned domains can be correlated to a signed-up user in the first application they encounter. In this first encounter, they must identify themselves through a piece of personal information.
3. Coordinate authenticated user data across sources.
Identifiers differ across sources: userID, email, phone number, username, and likely more. This last step is about mapping those identifiers to one universal user identifier that can be used across platforms.
By the time data teams are involved, we’re at step 3—identity resolution conversations with analytics teams focus on data that is authenticated. Specifically, user data that comes from systems where a user’s email, username, or userID is known (think: in-app, support systems, etc).
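A minimal sketch of the mapping in step 3, assuming every source record carries an email that can serve as the join key. The source names, record fields, and the buildIdentityMap helper are all hypothetical, and real systems usually need fuzzier matching than a normalized email.

```javascript
// Hypothetical records from three sources; all field names are assumptions.
const records = [
  { source: 'app', userId: 'ABC', username: 'jane_d', email: 'Jane@MyCompany.io' },
  { source: 'support', ticketId: 'T-17', email: 'jane@mycompany.io ' },
  { source: 'billing', customerId: 'cus_9', email: 'other@example.com' },
]

// Normalize the join key so trivially different spellings collapse together.
const normalizeEmail = (email) => email.trim().toLowerCase()

// Group every source-specific identifier under one universal ID.
// Here the universal ID is simply a counter assigned per distinct email.
const buildIdentityMap = (rows) => {
  const byEmail = new Map()
  for (const row of rows) {
    const key = normalizeEmail(row.email)
    if (!byEmail.has(key)) {
      byEmail.set(key, { universalId: `u_${byEmail.size + 1}`, email: key, identifiers: [] })
    }
    // Keep the source-specific identifiers, drop the raw email.
    const { source, email, ...ids } = row
    byEmail.get(key).identifiers.push({ source, ...ids })
  }
  return [...byEmail.values()]
}

const identities = buildIdentityMap(records)
console.log(JSON.stringify(identities, null, 2))
```

In this sketch the app and support records collapse into one universal user because their emails match after normalization, while the billing record stays separate.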
Event-generation is where identity resolution starts (naturally, we start at step 1). Product analytics tools are implemented to track page views, button clicks, or any custom event from mobile apps, web apps, web pages, whatever. Tracking plans should be designed across product, engineering, and analytics teams by gathering thorough requirements.
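As a rough illustration of events that carry page and button properties, the sketch below builds the payloads as plain objects. The property names (url, domain, title, label) and the helper functions are assumptions for illustration, not a prescribed tracking plan. With Amplitude's Browser SDK, the result would typically be sent with amplitude.track(event.eventType, event.properties).

```javascript
// Build a Page View payload from a URL string and a page title.
// The query string is dropped so the same page groups under one URL.
const pageViewEvent = (href, title) => {
  const url = new URL(href)
  return {
    eventType: 'Page View',
    properties: { url: url.origin + url.pathname, domain: url.hostname, title },
  }
}

// Build a Button Click payload; `label` distinguishes buttons on the same page.
const buttonClickEvent = (href, label) => ({
  eventType: 'Button Click',
  properties: { url: href, label },
})

const event = pageViewEvent('https://mycompany.com/pricing?ref=ad', 'Pricing')
const click = buttonClickEvent('https://mycompany.com/pricing', 'Login')
console.log(event, click)
```

Including the domain as its own property makes it easier to tell marketing-site events apart from in-app events once they land in the same project.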
Event tracking implementations across different applications and domains need to talk to each other. Without this, warehouse-native identity resolution simply doesn’t matter. The crux of the solution, then, is figuring out step 2: identify real people as authenticated users across applications and domains.
Correlating anonymous and authenticated users
👉 The meat of this article is here with implementation details for the anonymous to authenticated user journey.
Consider this common situation: visit mycompany.com, click “login”, get redirected to app.mycompany.io. This simple log-in would have events from two different sessions, one on each domain. Correlating them involves using a few techniques in tandem.
For the purposes of illustration, the code snippets below are for implementing event tracking with Amplitude. There are other event tracking tools which work similarly—features specific to Amplitude will be noted.
Cookies
Event tracking works by instantiating a cookie when a website or application loads.
amplitude.init(API_KEY)
Once a cookie exists, all events will be consolidated under one identifier for a visitor on the same device, representing the application and cookie instance. In Amplitude’s case, this is stored as the deviceID. If implemented custom and in-house, consider the deviceID (the variable name doesn’t matter and is used here for clarity) to be an identifier for the cookie being set, which can be passed as an identifier into the event tracking tool. If cookies are cleared, the identifier is reset, as it is stored within the browser.
All events tracked (like below) with the cookie present are then consolidated under the deviceID.
amplitude.track('Page View: Website');
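For the in-house case, a minimal sketch of a cookie-backed identifier could look like the following. The cookie name device_id, the one-year lifetime, and the helper names are assumptions. In a browser, the cookie string would come from document.cookie and the returned setCookie value would be assigned back to it.

```javascript
const COOKIE_NAME = 'device_id' // hypothetical cookie name

// Parse a "k1=v1; k2=v2" cookie string into a plain object.
const parseCookies = (cookieString) =>
  Object.fromEntries(
    cookieString
      .split(';')
      .map((pair) => pair.trim())
      .filter(Boolean)
      .map((pair) => {
        const i = pair.indexOf('=')
        if (i < 0) return [pair, '']
        return [pair.slice(0, i), decodeURIComponent(pair.slice(i + 1))]
      })
  )

// Return the existing deviceId, or mint one and describe the cookie to set.
const getOrCreateDeviceId = (cookieString, generateId) => {
  const existing = parseCookies(cookieString)[COOKIE_NAME]
  if (existing) return { deviceId: existing, setCookie: null }
  const deviceId = generateId()
  const oneYear = 60 * 60 * 24 * 365 // seconds
  return {
    deviceId,
    setCookie: `${COOKIE_NAME}=${encodeURIComponent(deviceId)}; Max-Age=${oneYear}; Path=/; SameSite=Lax`,
  }
}

// First visit: no cookie yet, so an ID is generated.
const first = getOrCreateDeviceId('', () => 'dev-123')
// Returning visit: the stored ID wins and the generator is never called.
const returning = getOrCreateDeviceId('theme=dark; device_id=dev-123', () => 'dev-999')
console.log(first.deviceId, returning.deviceId)
```

The ID generator is passed in as a parameter so the sketch stays independent of any particular UUID library.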
Device URL parameters
The deviceID is constant for a cookie, but different cookies will exist across domains (one for mycompany.com where the marketing site might live and another for mycompany.io where the application might live). The deviceID must then be passed across domains.
One way to do this is to append the deviceID as a parameter to every URL that leads visitors across the different domains.
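// Click handler: add the current Amplitude deviceId to the clicked link's URL.
const addUrl = (event) => {
  const deviceId = amplitude.getDeviceId()
  // closest() also covers clicks on elements nested inside the link
  const link = event.target.closest('a[href]')
  if (!link || !deviceId) return
  const url = new URL(link.href)
  // set() avoids duplicating the parameter on repeated clicks
  url.searchParams.set('deviceId', deviceId)
  link.href = url.toString()
}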
Starting with the code snippet above, find every link within your ecosystem that points to another of your domains (a regex search works) and run the function on click or touch to add the parameter programmatically.
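One way to decide which links need the parameter is to compare each link's hostname against a list of owned domains. This is a sketch under assumptions: the OWNED_DOMAINS list and both helper names are hypothetical, cookies are assumed to be shared across subdomains of the same domain, and the deviceId would come from the tracking tool (for example amplitude.getDeviceId()).

```javascript
const OWNED_DOMAINS = ['mycompany.com', 'mycompany.io'] // assumed list of owned domains

// Owned domain this hostname belongs to, or null if it is not ours.
const ownedDomainOf = (hostname) =>
  OWNED_DOMAINS.find((d) => hostname === d || hostname.endsWith(`.${d}`)) ?? null

// Return href with deviceId added when the link crosses between owned domains;
// otherwise return href unchanged.
const decorateLink = (href, currentHostname, deviceId) => {
  let url
  try {
    url = new URL(href)
  } catch {
    return href // relative or malformed URL: stays on the current site
  }
  const target = ownedDomainOf(url.hostname)
  const current = ownedDomainOf(currentHostname)
  if (!deviceId || !target || target === current) return href
  url.searchParams.set('deviceId', deviceId)
  return url.toString()
}

// Cross-domain link between owned domains: decorated.
const crossDomain = decorateLink('https://app.mycompany.io/login', 'mycompany.com', 'dev-123')
// Same-domain and third-party links: left alone.
const sameDomain = decorateLink('https://mycompany.com/docs', 'mycompany.com', 'dev-123')
const external = decorateLink('https://example.org/post', 'mycompany.com', 'dev-123')
console.log(crossDomain, sameDomain, external)
```

Leaving third-party links untouched keeps the identifier from leaking to sites outside your own ecosystem.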
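On each domain, you’ll also have to fetch the deviceID from the URL in a similar manner and either initialize event tracking with it or set it like so:
const deviceId = new URLSearchParams(window.location.search).get('deviceId')
if (deviceId) amplitude.setDeviceId(deviceId) // reuse the ID passed in the URL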
Amplitude does a lot of the heavy lifting here, which isn’t the case for some other tools like Segment. First, the deviceID is already generated by Amplitude. Second, simply making the deviceID available enables Amplitude to de-duplicate and merge users without any other work.
Merging users
Users will often have anonymous sessions on different browsers or devices, and can authenticate themselves on each of those devices. All sessions and visitors have to be covered under the same umbrella, which is the one authenticated user with a userID.
Consider this:
1. Person visits mycompany.com on their web browser. deviceID = 1
2. Person visits mycompany.io on their web browser (no link click). deviceID = 2
3. Person visits mycompany.com on their web browser again (deviceID = 1) but clicks a link to mycompany.io, creating a link where deviceID 1 = deviceID 2.
4. Person logs in to app.mycompany.io. userID = ABC
The deviceIDs have to be merged under one user persona up to step 3. Amplitude again stands out here by keeping all anonymous device behavior associated with an amplitudeID, which essentially consolidates events across devices when there are links in behavior across cookies. At step 4, the same deviceIDs have to be associated with the given userID. Because the user has authenticated themselves, the best umbrella for the user persona is now the userID.
Amplitude also associates multiple anonymous sessions under one user persona automatically, as long as the deviceID and/or userID are passed and initialized.
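To make the four-step scenario concrete, here is a toy sketch of the merging logic using a union-find structure. This is not a description of how Amplitude implements merging internally. The IdentityGraph class and its method names are hypothetical, and the device: and user: prefixes are just a convention for this example.

```javascript
class IdentityGraph {
  constructor() {
    this.parent = new Map() // identifier -> parent identifier
  }

  // Find the representative of an identifier, registering it if unseen.
  find(id) {
    if (!this.parent.has(id)) this.parent.set(id, id)
    let root = id
    while (this.parent.get(root) !== root) root = this.parent.get(root)
    // Path compression: point every node on the path straight at the root.
    let node = id
    while (node !== root) {
      const next = this.parent.get(node)
      this.parent.set(node, root)
      node = next
    }
    return root
  }

  // Record that two identifiers belong to the same person.
  link(a, b) {
    const rootA = this.find(a)
    const rootB = this.find(b)
    if (rootA === rootB) return
    // Prefer an authenticated userID as the umbrella identifier.
    if (rootA.startsWith('user:')) this.parent.set(rootB, rootA)
    else this.parent.set(rootA, rootB)
  }

  // The umbrella identifier currently representing this identifier.
  persona(id) {
    return this.find(id)
  }
}

const graph = new IdentityGraph()
graph.find('device:1') // step 1: visit mycompany.com
graph.find('device:2') // step 2: visit mycompany.io directly
const mergedBeforeLink = graph.persona('device:1') === graph.persona('device:2')
graph.link('device:1', 'device:2') // step 3: link click carries deviceID 1 over
graph.link('device:2', 'user:ABC') // step 4: login on app.mycompany.io
console.log(mergedBeforeLink, graph.persona('device:1'), graph.persona('device:2'))
```

Before the link click in step 3 the two devices are separate personas. After the login in step 4, both resolve to the userID.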
Blind spots
A complex problem usually means a complex solution. This is one piece of the solution, but it has some key blind spots.
First, it does not cover anonymous visitors navigating directly to another one of your domains (mycompany.io) by typing it in, instead of clicking on a link (a link on mycompany.com will append a deviceID). This can be addressed in a few ways: using a custom identifier that incorporates gclid or other third-party properties; doing further user merging with other properties like IP address; and likely others I haven’t thought of.
Second, it does not resolve unauthenticated visitors across different devices (visit mycompany.com on your browser and your phone) if they do not authenticate on each device. This is a harder problem to solve, which gives a leg up to third-party services like Google Analytics. These services use other login information, like your Google account, to identify you.
Final thoughts
Throughout this article I’ve mentioned tracking anonymous user data. Take this with a grain of salt, as it ranges from creepy to obvious. If you think about it, Google Analytics and social pixels are creepy: they use all sorts of authenticated data across ecosystems on your device, completely unrelated to the site you’re browsing. However, I’ve definitely succumbed to Instagram ads that are actually highly relevant to my needs, style, and interests.
On the other side of the spectrum, anonymous user data can be borderline obvious: a page view from app.mycompany.io/login can easily be followed by, and correlated to, an authenticated page view after logging in without even using a cookie.
Not relying on Google Analytics has three key advantages:
Not as creepy
Easier to implement custom analytics and events
Keep product analytics within your app deployment for more complete data
The happy medium is somewhere in the middle, and different people’s middle will be in different places: use data to build personalized user experiences that are genuinely helpful without invading a user’s privacy.
Thanks for reading! I talk all things product, marketing, data, and startups so please reach out. I’d love to hear from you.