Ending Ad Fraud: A Distributed and Punitive Approach

Over the past 20 years of digital advertising, we have seen a drumbeat of news stories about fraud. Often, these stories include some terrifying sum of advertising money that is being wasted buying fraudulent traffic. These stories are often seeded by an anti-fraud vendor or their proxy, who stand to benefit directly if the market adopts their technology to prevent all of this fraud. At the same time, brands and agencies have developed a Pavlovian response: when an article comes out about fraud or brand safety, they shift spend away from the risky world of digital advertising and back to legacy channels like television and radio. We need to stop ringing the fraud bell and convince brands that digital advertising is safe and effective. And to do that, we must eliminate the conditions that make fraud possible and profitable.

The Perverse Incentive Driving Ad Fraud

Almost all internet businesses buy traffic - whether from search engines or social networks or content marketing services. Paying to get visitors to your website makes sense; it's how search engines print money. If you can convert those visitors into subscribers or paying customers, great.

A more challenging traffic acquisition model is when your revenue model is based on advertising - in other words, when you're a publisher. If you pay $0.10 for a click, you need to serve 20 ads to the visitor to break even (assuming a $5.00 CPM, or cost per thousand ads). If the cost of traffic acquisition is less than the ad revenue you get for a visit to the site, it's a profitable business model. If you want to make more money, you have two choices: you can pay less for traffic or you can generate more revenue from each visit. The latter is pretty hard to do; you have to get more ads on a page or get users to visit more pages or get agencies to pay more for ads. So, why not find ways to buy cheaper clicks? This is the perverse incentive at the heart of the fraud problem in advertising. Since there's a strong incentive for publishers to buy cheap traffic, there's a significant market for bad actors to generate traffic and sell it to them - as long as they can avoid detection.
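The break-even arithmetic above can be sketched in a few lines. This is only an illustration of the numbers in the paragraph ($0.10 per click, $5.00 CPM); the function names are made up for the example, and the $0.02 "cheap click" price is an assumed figure, not one from any real traffic source.

```python
def ads_to_break_even(cost_per_click: float, cpm: float) -> float:
    """Number of ads a visitor must see to repay the cost of acquiring them."""
    revenue_per_ad = cpm / 1000.0  # CPM is the price of one thousand ads
    return cost_per_click / revenue_per_ad


def profit_per_visit(cost_per_click: float, cpm: float, ads_per_visit: int) -> float:
    """Ad revenue from one visit minus what the publisher paid for the click."""
    return ads_per_visit * cpm / 1000.0 - cost_per_click


# The example from the text: a $0.10 click at a $5.00 CPM.
print(f"{ads_to_break_even(0.10, 5.00):.1f} ads to break even")

# Hypothetical cheaper click at $0.02 with the same 20 ads per visit:
# the same visit now turns a profit, which is the perverse incentive.
print(f"${profit_per_visit(0.02, 5.00, 20):.2f} profit per visit")
```

Nothing in the formula distinguishes a human visitor from an automated one, which is why the cheaper click looks attractive until someone detects it.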

When a bad actor wants to make a bunch of money, they generate traffic and sell it to a publisher, creating ad impressions that flow through the digital advertising ecosystem, eventually soaking up money from well-meaning marketers. Traffic that looks human and gets past detection systems will drive revenue for the publisher and the bad actor. This is a fundamental failure of the supply chain.

To eliminate fraud, we have to change the incentives that align the publisher with the bad actors. If basic economics tell us that publishers make more money by gaming the system, why wouldn't they try? I believe the only way to end fraud outright is to make it punitively expensive to get caught. The first step is to start detecting fraudulent traffic.

Detecting Fraud is a Turing Test

One way to think about our anti-fraud efforts is that we’re applying a Turing test to separate out valid human traffic from traffic generated in an automated fashion. The Turing test was proposed by Alan Turing as a way to test for artificial intelligence. A human interrogator asks a person and an AI a series of questions through some interface that hides their physical manifestation. If the human can’t tell which is the AI, it passes the Turing test: it is indistinguishable from a human.

What makes the Turing test so challenging is that the AI has no idea what the interrogator is going to ask. “What did you have for dinner last Saturday?” requires the AI to both simulate a human response and to understand the vicissitudes of human memory. If you asked me that question, I would have to think for a bit. I might say “well, let’s see. On Saturday I went to the Georgia – Notre Dame game with my dad. We walked around for a while, then we – oh! Right, I had a cheese steak.” That would be a heck of a story for an AI to tell. And then the interrogator would follow up: “Why was this game important to you?” or “What did the campus remind you of?” or something else connected and tangential. It’s not an easy test for an AI!

In doing digital fraud detection, we are effectively asking the same kinds of questions, except at internet scale. We have to assess the validity of tens of billions of ad impressions every day. Of course, we can’t see the computer screen; we just have to look at what we know about the IP address, the HTTP headers, the behavioral history, the timestamps (see footnote 1). In ad fraud terminology, we’re asking questions like “Is this a legitimate IP address?” and “Does this user browse the web like a human?” As a basic example of how we’d evaluate this, humans tend to go to sleep. If we see a user browsing the internet 24 hours a day, it’s probably not a human.
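To make the "humans go to sleep" question concrete, here is a minimal sketch of that one check. It is not how any real detection system works; the function names and the 20-hour threshold are assumptions chosen for illustration, and it assumes all timestamps for a user fall within a single day.

```python
from datetime import datetime, timezone
from typing import Iterable


def active_hours(timestamps: Iterable[datetime]) -> int:
    """Count the distinct hours of the day in which the user was seen."""
    return len({ts.hour for ts in timestamps})


def looks_sleepless(timestamps: Iterable[datetime], max_active_hours: int = 20) -> bool:
    """Flag a user seen in more hours of one day than a sleeping human plausibly could be.

    max_active_hours is a hypothetical threshold, not an industry value.
    """
    return active_hours(timestamps) > max_active_hours


# One impression per hour, around the clock: almost certainly automated.
bot = [datetime(2018, 1, 1, h, 0, tzinfo=timezone.utc) for h in range(24)]
# Activity from 08:00 through 22:00 only: consistent with a person who sleeps.
human = [datetime(2018, 1, 1, h, 30, tzinfo=timezone.utc) for h in range(8, 23)]

print(looks_sleepless(bot), looks_sleepless(human))
```

A single rule like this is exactly the kind of fixed question a bad actor can learn to answer, for example by pausing a bot overnight, so in practice it would be one signal among many.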

So how do you create an “AI” that can generate high levels of traffic to a particular set of sites that you want to monetize through advertising – and that can trick the interrogator (the anti-fraud team) into thinking you’re human? First, you find some way to obfuscate your IP address. This is easy; you can use any number of proxy services that can route your traffic through computers behind residential IP addresses. Second, you record human behavior browsing through a series of sites, randomize it a bit, and then replay it across this network of borrowed IP addresses. Third, you watch carefully to see if you get blocked or banned by the major anti-fraud companies, and if you do, you change your techniques a bit until you’re not blocked any more. This isn't really artificial intelligence or machine learning; it's smart humans building automated tools to mimic human behavior at scale.

Here’s the big difference between a true Turing test and the world of fraud detection: If the interrogator is human, they can ask you anything. You have to be prepared for almost any kind of question, and then the follow-ups. However, fraud detection is generally automated and limited to a small set of data. The questions are the same every time. You don’t have to build a general AI, just one that can answer these basic questions correctly. When the questions change, as they do every few months as the anti-fraud companies adapt their systems, you change your approach.

The key point is that you have to know whether you’re getting detected or not. To protect clients, the anti-fraud vendor needs to block bad traffic, but in so doing, it provides a clear signal to the bad actor that they need to change their behavior – in essence, to learn why they failed the Turing test and update their answers. The largest anti-fraud vendors – Integral Ad Science (IAS), DoubleVerify (DV), Moat, and increasingly White Ops – are the primary targets of this learning. If you search the web for “IAS safe traffic”, you’ll find a number of very cheap ways to buy traffic from bad actors that have effectively gamed the IAS Turing test. It’s a cottage industry. This isn’t IAS’s fault, nor something they can control. It’s the reality of being a gateway to a lot of money: bad actors are going to try to find a way around you.

Why Verification Companies Can't End Ad Fraud

Who controls the most money? Google, which operates the largest advertising network in the world. If you search Google for “AdSense safe traffic”, you find a bunch of results - many of which guarantee that the traffic you buy won't get you kicked off of AdSense. I think that's a key distinction. If you buy fraudulent traffic and Google figures it out, they'll kick you off of AdSense - probably costing you one of your largest revenue sources. AppNexus operates the same way: if we see bad traffic, we'll blacklist the domain or app. Since neither AppNexus nor Google sells anti-fraud technology as a standalone product, the only way you can use this technology is to operate inside the AppNexus and Google marketplaces. This makes it much harder for a bad actor to know if they’ve been detected, and thus to change their algorithm. And the cost of getting detected is high: you lose access to your largest revenue sources, possibly permanently.

In the Turing test context, this adds a huge amount of risk to trying to beat the system. Imagine if, in order to try to beat the Turing test with your fancy new AI, you have to put half your salary on the line. If you can’t fool the interrogator, you’re out a huge amount of money. In the online advertising context, it makes you think twice before buying traffic from anybody you don’t deeply trust. If you buy bad traffic, you lose access to the largest advertising marketplaces and a huge chunk of your revenue. These punitive consequences are a huge part of why third-party anti-fraud vendors can’t solve the fraud problem. They can definitely find fraud in some cases, or they’d be out of business. But without the ability to disproportionately penalize the bad actors and the supply chain, they don’t change the fundamental incentives that enable bad actors in the first place.

Some media and advertising companies announce partnerships with anti-fraud vendors, hoping to address the fraud problem (or at least give that perception). I worry about these partnerships, because they don't address the underlying incentive problems. First, it tells the bad actors who’s running the Turing test, and thus which interrogator they'll have to trick to get paid. I’m willing to bet that search results for “White Ops safe traffic” increase dramatically over the next couple of months after the recent Trade Desk (TTD) partnership announcement. Second, it provides an easy scapegoat. I can’t tell you how many publishers have said to me “it’s not my fault I have bad traffic; I use vendor X to detect it.” It is your fault if you have bad traffic! Somebody in your supply chain is buying bad traffic, and you’re hoping to get away with it. I want to be clear that I respect White Ops as an anti-fraud vendor, and I know that TTD is actively trying to do the right thing for their clients. The problem is that their partnership doesn’t stop the behavior at its source.

Here’s a simple path for us as an industry to attack the ad fraud problem at its source:

  1. Ad tech companies should build their own proprietary anti-fraud solutions (see footnote 2). This is the equivalent of an AI having to be interrogated by many different people in order to pass the Turing test, each with a different set of questions. If you don't have the resources (or ethics) to invest in anti-fraud technology, you shouldn't be running an ad tech platform.

  2. Use verification vendors to ensure that every company in the value chain – including Google and AppNexus – is doing a good job at rooting out bad traffic.

  3. Stop working with any publisher, network, or exchange that has a non-trivial amount of fraudulent traffic. This is the punitive part. Look at the reaction when brands pull their spend from Google or Facebook for brand safety or fake news issues. It gets their attention and causes them to change their behavior.

If we do these three things, we force the digital advertising value chain to hold itself accountable for its behavior. If you are a publisher and you buy cheap clicks to drive traffic, you will regret it. If you’re an ad network or exchange and you trust your publishers not to buy cheap traffic, you will regret it. If you’re an agency or marketer and you don’t validate that your vendors are taking these measures, you will regret it.

Know Your Customer: RFP Questions You Must Ask

It’s time for the advertising industry to move from an opt-out basis, where we assume that our vendors and publishers are good until proven wrong, to an opt-in basis. We need to follow the “Know your customer” mantra and only work with companies that take punitive action to stop fraud. Here’s an easy questionnaire:

Ask Publishers:

  • Do you buy traffic? From whom? What percentage of your traffic is organic vs paid?
  • How do you know the traffic you buy is legitimate? (note: any answer that refers to engagement time or bounce rate is useless; bad actors know how to game these stats)
  • How much do you pay for clicks / likes from each traffic source? Look for unusually cheap traffic.
  • What percentage of the traffic you purchase uses an ad blocker? What percentage of purchased visitors register or sign into your site? How does the click rate on ads differ between purchased and organic traffic?
  • Are you willing to stake your business on this traffic source (since if it’s sending fraudulent traffic, we’ll terminate you)?

Ask networks and exchanges:

  • How many publishers have you terminated or suspended in the past 3 months for fraudulent traffic? How many new publishers have you signed up?
  • Do you use proprietary techniques to detect fraudulent traffic? How many of the terminated publishers were due to proprietary detection?
  • Who leads your anti-fraud team? Whom does she report to? How do you ensure she operates independently of revenue concerns? How many people are on the anti-fraud team? How much money do you spend on data sources that help you find non-human traffic?
  • If I detect non-human traffic on one of your publishers, what action will you take? Correct answer: 1. I will immediately terminate them and 2. I will update my anti-fraud detection techniques so I catch this behavior across all my publishers.

Ask DSPs:

  • Which ad exchanges and networks do you work with?
  • For each one, how did you confirm that they have adequate anti-fraud techniques?
  • How long has it been since you did an audit on each network and exchange to validate that they are taking action on their publishers?
  • Do you allow multiple layers of intermediaries (network resale) or do you require your exchanges and networks to have direct relationships with publishers? (correct answer: I don’t allow indirect relationships and will not work with companies that do, ideally including a reference to ads.txt enforcement)
  • Who leads your anti-fraud team? Whom does she report to? How do you ensure she operates independently of revenue concerns? How many people are on the anti-fraud team? How much money do you spend on data sources that help you find non-human traffic?
  • How many networks and exchanges have you suspended or terminated in the past three months? How many new ones have you added?
  • Do you use proprietary techniques to detect fraudulent traffic? How many of the terminated networks and exchanges were due to proprietary detection?
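The ads.txt enforcement mentioned in the DSP questions can be illustrated with a small sketch. It assumes a simplified reading of the IAB ads.txt format, where each data line lists an ad system domain, the publisher's account ID on that system, and a DIRECT or RESELLER relationship. The domains and account IDs below are hypothetical, and real enforcement also involves fetching the file from the publisher's own domain and matching it against bid requests, which is not shown here.

```python
from typing import List, NamedTuple


class SellerRecord(NamedTuple):
    ad_system: str     # domain of the exchange or network
    account_id: str    # publisher's account ID on that system
    relationship: str  # "DIRECT" or "RESELLER"


def parse_ads_txt(text: str) -> List[SellerRecord]:
    """Parse the data records of an ads.txt file, ignoring comments and variables."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # blank line, or a variable line such as contact=...
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # malformed record: skip it rather than guess
        records.append(SellerRecord(fields[0].lower(), fields[1], fields[2].upper()))
    return records


def is_direct_seller(records: List[SellerRecord], ad_system: str, account_id: str) -> bool:
    """True only if the publisher lists this account as a DIRECT relationship."""
    return any(
        r.ad_system == ad_system.lower()
        and r.account_id == account_id
        and r.relationship == "DIRECT"
        for r in records
    )


# Hypothetical ads.txt contents for a publisher.
SAMPLE = """\
# ads.txt for publisher.example
contact=adops@publisher.example
exchange.example, 12345, DIRECT
reseller.example, 999, RESELLER  # resold inventory
"""

records = parse_ads_txt(SAMPLE)
print(is_direct_seller(records, "exchange.example", "12345"))
print(is_direct_seller(records, "reseller.example", "999"))
```

A DSP that refuses indirect relationships would bid only when a check like `is_direct_seller` succeeds for the seller named in the bid request.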

Marketers and Agencies:

  • Please, please ask DSPs these questions and require them to answer.

Investors and Boards:

  • Many publishers and ad tech companies feel pressure from their investors to “juice” revenue and hit financial targets. Ask your CEOs these questions. Ask them to produce a supply chain quality report each year, and make this a part of executive compensation. Just as with diversity and inclusion issues, you need to give the management team permission to do the right thing, even if it costs you money in the short term.

Accountability Leads to a Better Internet

I know that these guidelines will have short term revenue impact throughout the supply chain. I also recognize that they won’t fully eliminate bad actors. However, I believe that we can eliminate excuses. If a vendor isn’t following these practices and has a fraud incident, hold them fully accountable. If every link in the supply chain demands accountability, we change the incentives to reward quality and integrity.

I want an internet where the best content gets the most money. The best content producers, the best publishers in the world, don’t need to buy cheap traffic because consumers want to read their articles, hear their music, play their games. Brands are built on great content and human eyeballs. We can’t rely only on technology to make this happen; we need ethical standards, common sense, and the courage to take action to create a better internet.


Acknowledgements

Thank you to Marc Angelico, Michael Misiewicz, David Bookspan, and Mikko Kotila for their thoughtful feedback! Also thanks to the eBay Kleinanzeigen ads team for gently reminding me that it's been a while since I wrote a blog post.

Footnotes
  1. Sometimes clients complain that we accidentally blocked their IP address for unusual traffic patterns. Our clients are unusual - they visit their own properties a lot; they click on ads a lot (usually to test them); and generally don't browse the internet like ordinary people. It's a great data set for us to understand the boundaries of what is "unusual" and what's fraudulent. Clients, your inconvenience is contributing to a good cause!

  2. This means we should NOT create industry standards for how we detect fraud, and we should not post our proprietary detection methods publicly. Let's make it harder for the bad guys to figure out how we're catching them.