randomdatathoughts.

The channel that "works" depends on which model you trust

First-touch and last-touch attribution tell two different stories about the same campaign data. Here's what changes when you switch models — and what doesn't.

The business question

A marketing team with five channels and one budget eventually has to answer an uncomfortable question: which channel actually deserves the credit for a conversion? Get the answer wrong and you cut a channel that was quietly doing the real work, or keep funding one that was just along for the ride.

Why this matters

Attribution sounds like a measurement problem. It's actually a modeling choice dressed up as one, and that choice changes the recommendation you'd give a CMO even when nothing about the underlying customer behavior has changed.

The approach

I used anonymized e-commerce session data to build a first-touch and a last-touch attribution model on the same conversion events, so the comparison stays clean: same customers, same conversions, and only the crediting rule changes.

SQL — channel performance, ranked

-- Sessions, conversions, and conversion rate per traffic source,
-- ranked from highest to lowest conversion rate.
SELECT
  traffic_source,
  COUNT(DISTINCT session_id) AS sessions,
  COUNT(DISTINCT CASE WHEN converted = 1 THEN session_id END) AS conversions,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN converted = 1 THEN session_id END)
        / COUNT(DISTINCT session_id), 2) AS conversion_rate,
  RANK() OVER (
    ORDER BY 100.0 * COUNT(DISTINCT CASE WHEN converted = 1 THEN session_id END)
             / COUNT(DISTINCT session_id) DESC
  ) AS rank_by_conversion
FROM sessions
GROUP BY traffic_source
ORDER BY rank_by_conversion;

Python — two models, one dataset

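import pandas as pd

# `sessions` is assumed to be a DataFrame with one row per session and the
# columns customer_id, session_start, traffic_source, and converted (0/1).

# Keep only converting customers, and only their sessions up to and including
# the first conversion, so both models credit the same conversion events.
ordered = sessions.sort_values(["customer_id", "session_start"])
prior_conversions = (
    ordered.groupby("customer_id")["converted"].cumsum() - ordered["converted"]
)
converts = ordered.groupby("customer_id")["converted"].transform("max") == 1
journeys = ordered[converts & (prior_conversions == 0)]

# first_touch credits the channel that started the journey;
# last_touch credits the channel of the converting session.
first_touch = journeys.groupby("customer_id")["traffic_source"].first()
last_touch = journeys.groupby("customer_id")["traffic_source"].last()

# Share of conversions credited to each channel under each model.
comparison = pd.DataFrame({
    "first_touch": first_touch.value_counts(normalize=True),
    "last_touch": last_touch.value_counts(normalize=True),
}).fillna(0)

# Positive swing: more credit under last-touch. Negative: more under first-touch.
comparison["swing"] = comparison["last_touch"] - comparison["first_touch"]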

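That swing column is the actual finding. A channel can look strong under one model and mediocre under the other, and the swing shows which way it leans. A positive swing means the channel earns more credit as the last touch ("closes the sale"). A negative swing means it earns more as the first touch ("introduces people to the brand"). The size of the swing tells you how much of the channel's apparent performance depends on which of those two jobs the model rewards.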
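
To make the sign of the swing concrete, here is a minimal, dependency-free sketch on a handful of invented journeys. The channel names and paths are hypothetical illustration data, not results from this project, and the helper `credit_share` is a name made up for this example.

```python
from collections import Counter

# Hypothetical converting journeys: each list is one customer's channels in
# time order, ending at the converting session. Invented for illustration.
journeys = [
    ["organic_social", "email", "paid_search"],
    ["organic_social", "paid_search"],
    ["content", "paid_search"],
    ["email"],
]


def credit_share(touches):
    """Return each channel's share of conversions from a list of credited channels."""
    counts = Counter(touches)
    total = sum(counts.values())
    return {channel: n / total for channel, n in counts.items()}


# First-touch credits the first channel on each path; last-touch the final one.
first_touch = credit_share([path[0] for path in journeys])
last_touch = credit_share([path[-1] for path in journeys])

# Swing = last-touch share minus first-touch share, per channel.
channels = sorted(set(first_touch) | set(last_touch))
swing = {
    channel: last_touch.get(channel, 0.0) - first_touch.get(channel, 0.0)
    for channel in channels
}

for channel in channels:
    print(f"{channel:15s} {swing[channel]:+.2f}")
```

In this toy data paid_search swings by +0.75 and organic_social by -0.50, while email sits at zero because it opens and closes the same one-step journey. The swings always sum to zero, since both models hand out the same total credit.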

[Interactive dashboard: toggle between attribution models]

Findings

Paid search looked strongest under last-touch — unsurprising, since people often search for a brand right before buying it, after discovering it somewhere else first. Organic social and content channels looked meaningfully better under first-touch, which matches their actual job: introducing people to something they hadn't considered yet.

Takeaway

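Neither model is "correct." A team that looks only at last-touch will systematically defund the channels doing the hardest, least-credited work: getting someone's attention in the first place. A team that looks only at first-touch makes the mirror-image mistake and under-credits the channels that close the sale.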

What I'd do next

A proper multi-touch or data-driven attribution model would split credit across the full path rather than picking a single winner. That's the honest next step. This project deliberately starts with the two simplest models because the gap between them is itself the most useful finding, and it is worth establishing before adding more modeling complexity on top.
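
As a rough sketch of what splitting credit could look like, the simplest multi-touch rule is linear attribution: every touch on a converting path gets an equal slice of that conversion. The journeys below are invented for illustration, `linear_attribution` is a hypothetical helper, and a data-driven model would estimate the weights from the data rather than fix them in advance.

```python
from collections import defaultdict

# Hypothetical converting journeys: each list is one customer's channels in
# time order, ending at the converting session. Invented for illustration.
journeys = [
    ["organic_social", "email", "paid_search"],
    ["organic_social", "paid_search"],
    ["content", "paid_search"],
    ["email"],
]


def linear_attribution(paths):
    """Split each conversion equally across the touches on its path."""
    credit = defaultdict(float)
    for path in paths:
        share = 1.0 / len(path)  # each touch gets an equal slice of one conversion
        for channel in path:
            credit[channel] += share
    total = sum(credit.values())  # equals the number of conversions
    return {channel: value / total for channel, value in credit.items()}


linear = linear_attribution(journeys)
for channel, share in sorted(linear.items(), key=lambda item: -item[1]):
    print(f"{channel:15s} {share:.3f}")
```

Under this rule a channel that appears twice on one path collects two slices of that conversion. Whether that is desirable is a modeling choice, just like the choice between first-touch and last-touch.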