Request for Comments: Trusted Data Partners

In a previous post I explained how data leakage happens. In this post I will propose a framework to solve this problem systematically.

Levels of data leakage protection

As publishers think about preventing data leakage, there is an implicit tradeoff between protecting data and maximizing revenue. In general, we think of options 1 and 2 below as "walled gardens", and thus far only massive publishers have been able to create enough leverage with buyers to pull off this strategy. Option 3 is appealing, but the ecosystem isn't ready for it yet, and it would reduce programmatic demand to near zero. Option 5 is the current state of affairs, and it is a disaster.

  1. Don't allow any third parties (Google and Bing search)
  2. Don't allow third parties except a few trusted measurement providers (Facebook)
  3. Require all creatives to be hosted; don't attach user IDs to bid requests (nobody yet)
  4. Whitelist third parties and enforce this list across the ecosystem (proposal below)
  5. Hope that data doesn't leak (almost everybody)

A "Crawl, Walk, Run" approach

I believe that the first step to solving the data leakage problem (and potentially a number of related privacy problems) is for publishers to whitelist the companies they are willing to share data with, and to require every company they work with to honor this whitelist. This makes it difficult for a bad actor to get access to the ecosystem, and limits the number of companies that a publisher, or publishers in general, need to manage in order to protect their data. It also opens the door to a certification program or audit through which technology vendors can demonstrate that they are good stewards of consumer data. Vendors without such an audit would find many publishers unwilling to add them to their whitelists, sharply limiting the vendors' access to consumer data. Finally, a whitelist lets publishers make the tradeoff between leakage protection and revenue incrementally, which is critical if we want to see change happen.

The obvious implication here is that publishers need to be cautious about the partners they work with, and not just in the programmatic context. Every widget that you put on a page carries data leakage risk. How many publishers that installed AddThis to make it easy to share content realized that it would create a data asset worth $200M to Oracle? How much of that value would have gone to publishers had they implemented basic data leakage protections?

Trusted Data Partners: Technical Design

To create a TDP manifest (effectively the whitelist of trusted data partners), the publisher publishes a JSON file that lists its trusted data partners:

{"trusted_data_partners": [
     {"domain": "", "name": "AppNexus", "comments": "programmatic partner"},
     {"domain": "", "comments": "trusted ad server"}
]}

A data partner can be any third party that the publisher works with directly or indirectly. For companies that use multiple domains to deliver content, each domain should be listed as a separate entry in the manifest:

     {"domain": "", "name": "Atlas ad serving"},
     {"domain": "", "name": "Atlas analytics"}

The manifest should be published at a public URL (by default, I propose /tdp.json) so that partners can retrieve it and the general public can easily inspect it.
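As a rough illustration of how a partner might consume the manifest, here is a minimal Python sketch. The helper names (manifest_url, fetch_manifest, parse_manifest, is_trusted) are hypothetical, and treating subdomains of a listed domain as trusted is an assumption on my part rather than part of the proposal; a stricter implementation could require exact matches.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def manifest_url(publisher_domain: str) -> str:
    """Build the default manifest location proposed above (/tdp.json)."""
    return f"https://{publisher_domain}/tdp.json"


def fetch_manifest(url: str, timeout: float = 10.0) -> str:
    """Download the raw manifest JSON (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_manifest(raw: str) -> set[str]:
    """Return the set of trusted domains listed in a TDP manifest."""
    doc = json.loads(raw)
    trusted = set()
    for entry in doc.get("trusted_data_partners", []):
        domain = entry.get("domain", "").strip().lower()
        if domain:  # skip entries whose domain is empty
            trusted.add(domain)
    return trusted


def is_trusted(url: str, trusted: set[str]) -> bool:
    """True if the URL's host is a listed domain or (assumption) a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


if __name__ == "__main__":
    raw = '{"trusted_data_partners": [{"domain": "adserver.example", "name": "Example ad server"}]}'
    trusted = parse_manifest(raw)
    print(is_trusted("https://cdn.adserver.example/ad.js", trusted))  # True
    print(is_trusted("https://tracker.example/pixel.gif", trusted))   # False
```

In practice a partner would probably cache the parsed set and refresh it periodically rather than fetch /tdp.json on every request; how often to refresh is left open here.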

The TDP manifest should be sent on every programmatic ad request through a simple extension to the OpenRTB protocol: a tdp element on the BidRequest whose value is the public TDP manifest URL. (The full list of domains could instead be sent on every request, but that seems heavyweight for a protocol.)
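A minimal sketch of what that might look like on the wire, in Python. The proposal only calls for a tdp element on the BidRequest; putting it under BidRequest.ext, the object OpenRTB 2.x sets aside for extensions, is my assumption, and a real request would carry many more fields (site, device, and so on) than the id and imp shown here. The function names are hypothetical.

```python
import json
from typing import Optional


def build_bid_request(request_id: str, imp_ids: list[str], tdp_url: str) -> str:
    """Serialize a minimal OpenRTB-style BidRequest carrying the TDP manifest URL."""
    request = {
        "id": request_id,
        "imp": [{"id": imp_id} for imp_id in imp_ids],
        # Assumed placement of the proposed "tdp" element.
        "ext": {"tdp": tdp_url},
    }
    return json.dumps(request)


def tdp_url_from(bid_request_json: str) -> Optional[str]:
    """Bidder side: read the manifest URL, or None if the request has no TDP extension."""
    request = json.loads(bid_request_json)
    return (request.get("ext") or {}).get("tdp")


if __name__ == "__main__":
    payload = build_bid_request("req-1", ["1"], "https://publisher.example/tdp.json")
    print(tdp_url_from(payload))  # https://publisher.example/tdp.json
```

Sending only the URL keeps each bid request small; bidders can fetch and cache the manifest itself out of band.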

Creative delivery

The publisher's ad server and SSP should read the TDP manifest in order to screen out creatives that load assets from domains that are not on the manifest. To do this effectively, the creative must be fully rendered to see which URLs it loads. This may be something that third-party tools like The Media Trust could add to their audit process.
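Assuming a separate rendering step (for example, loading the creative in a headless browser) has already produced the list of URLs the creative requested, the screening itself is a simple comparison against the manifest. This Python sketch leaves the rendering out; disallowed_assets and creative_allowed are hypothetical names, and matching subdomains of listed domains is an assumption.

```python
from urllib.parse import urlsplit


def _host_trusted(host: str, trusted: set[str]) -> bool:
    # Exact match, or (assumption) a subdomain of a listed domain.
    return any(host == d or host.endswith("." + d) for d in trusted)


def disallowed_assets(loaded_urls: list[str], trusted: set[str]) -> list[str]:
    """URLs a fully rendered creative loaded from hosts that are not on the manifest.

    `loaded_urls` is assumed to come from a rendering step not shown here.
    """
    return [
        url for url in loaded_urls
        if not _host_trusted((urlsplit(url).hostname or "").lower(), trusted)
    ]


def creative_allowed(loaded_urls: list[str], trusted: set[str]) -> bool:
    """A creative passes only if every asset it loaded is on the manifest."""
    return not disallowed_assets(loaded_urls, trusted)


if __name__ == "__main__":
    trusted = {"adserver.example"}
    loaded = ["https://adserver.example/ad.js", "https://sync.tracker.example/p.gif"]
    print(disallowed_assets(loaded, trusted))  # ['https://sync.tracker.example/p.gif']
```

A single render is a sample rather than a guarantee, since a creative may load different assets on different renders.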

SSPs and Exchanges

The primary programmatic endpoint for most publishers is their SSP. The publisher should configure the SSP with the TDP manifest URL so that it stays in sync with the publisher's preferences.

The SSP must:

  • Not send bid requests to companies not on the TDP manifest
  • Not drop user sync pixels for companies not on the TDP manifest
  • Not allow creatives that load assets not on the TDP manifest
  • Implement the TDP audit protocol described below
  • Send the TDP URL to all bidders and require them to support TDP as well

Bidders must respect the TDP manifest sent on the OpenRTB request, and only send data to companies that are on the list (including pre-bid brand safety integrations).
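The SSP and bidder rules above mostly reduce to filtering by the manifest. A minimal Python sketch, assuming the manifest has already been parsed into a set of domains and using exact domain matches for simplicity; the Bidder type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bidder:
    name: str
    endpoint_domain: str  # domain that receives the bid request
    sync_domain: str      # domain of the bidder's user sync pixel


def eligible_bidders(bidders: list[Bidder], trusted: set[str]) -> list[Bidder]:
    """SSP side: send bid requests only to bidders whose endpoint is on the manifest."""
    return [b for b in bidders if b.endpoint_domain in trusted]


def sync_pixels_to_drop(bidders: list[Bidder], trusted: set[str]) -> list[str]:
    """SSP side: drop user sync pixels only for domains on the manifest."""
    return [b.sync_domain for b in bidders if b.sync_domain in trusted]


def allowed_integrations(domains: list[str], trusted: set[str]) -> list[str]:
    """Bidder side: share data (e.g. with pre-bid brand safety) only with listed domains."""
    return [d for d in domains if d in trusted]


if __name__ == "__main__":
    trusted = {"dsp-a.example", "sync.dsp-a.example", "safety.example"}
    bidders = [
        Bidder("A", "dsp-a.example", "sync.dsp-a.example"),
        Bidder("B", "dsp-b.example", "sync.dsp-b.example"),
    ]
    print([b.name for b in eligible_bidders(bidders, trusted)])  # ['A']
    print(sync_pixels_to_drop(bidders, trusted))  # ['sync.dsp-a.example']
    print(allowed_integrations(["safety.example", "other.example"], trusted))  # ['safety.example']
```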

Consumer audit trail

One challenge of server-to-server data interactions is that, unlike requests made by the browser, they give the consumer no way to see where their data is going. To allow consumers to see what's happening, I propose that all third-party technologies return a list of every company that they have sent consumer data to without browser interaction (in other words, server-to-server).

To request an audit, the request should include an HTTP header (likely added by the browser or a plugin):

GET /tt?id=1234 HTTP/1.1  
X-TDP-Audit: 1  

In the response, the third party (in this case, AppNexus) would return a list of every data partner that it interacted with:

HTTP/1.1 200 OK  

This gives the consumer knowledge of where her data is being sent. If this is implemented, I think the results will be shocking. If we think the long list of companies that we see getting our data from the browser is bad... wait until you see how many are interacting server-to-server.
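On the third party's side, honoring an audit request could be as simple as checking for the X-TDP-Audit header and echoing back the partners it contacted server-to-server. The response format is not specified in the example above, so this Python sketch assumes a hypothetical X-TDP-Partners response header carrying a comma-separated list; a JSON body would work just as well.

```python
from collections.abc import Mapping

AUDIT_REQUEST_HEADER = "X-TDP-Audit"      # from the proposal
AUDIT_RESPONSE_HEADER = "X-TDP-Partners"  # hypothetical; the proposal leaves this open


def audit_response_headers(request_headers: Mapping[str, str],
                           s2s_partners: list[str]) -> dict[str, str]:
    """Extra response headers for a TDP audit request, or {} if no audit was requested."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    normalized = {k.lower(): v.strip() for k, v in request_headers.items()}
    if normalized.get(AUDIT_REQUEST_HEADER.lower()) != "1":
        return {}
    # Deduplicate and sort so the disclosure is stable across requests.
    return {AUDIT_RESPONSE_HEADER: ", ".join(sorted(set(s2s_partners)))}


if __name__ == "__main__":
    headers = {"X-TDP-Audit": "1"}
    print(audit_response_headers(headers, ["dsp-b.example", "dmp.example", "dsp-b.example"]))
    # {'X-TDP-Partners': 'dmp.example, dsp-b.example'}
```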

Consumer server-side opt-out

TDP could also be implemented by the consumer as a way to opt out of tracking even from server-to-server interactions (where a cookie opt-out won't actually work). One easy way to implement this would be to have the browser or a plugin add a new header to the HTTP request that lists the consumer's trusted (or untrusted) data partners. For instance:

GET /index.html HTTP/1.1  

would be a very restrictive whitelist of acceptable data partners, whereas:

GET /index.html HTTP/1.1  

would request that the publisher not send data to or

In both cases, the browser should refuse to send any cookies or identifiers to domains that aren't trusted data partners, and the headers ask the publisher not to send any identifying data to those domains either.
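The request examples above show only the request line, so the header names are not given. This Python sketch uses two hypothetical ones, X-TDP-Trusted (a whitelist) and X-TDP-Untrusted (a blocklist), each carrying a comma-separated list of domains, and shows the decision a browser, plugin, or publisher would make before sending cookies or identifiers to a given domain. Giving the whitelist precedence when both headers are present is also an assumption.

```python
from collections.abc import Mapping

TRUSTED_HEADER = "X-TDP-Trusted"      # hypothetical: consumer's whitelist
UNTRUSTED_HEADER = "X-TDP-Untrusted"  # hypothetical: consumer's blocklist


def _domains(value: str) -> set[str]:
    """Parse a comma-separated header value into a set of lowercased domains."""
    return {d.strip().lower() for d in value.split(",") if d.strip()}


def may_send_identifiers(domain: str, request_headers: Mapping[str, str]) -> bool:
    """Decide whether cookies or identifiers may go to `domain` under the consumer's headers."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    domain = domain.lower()
    if TRUSTED_HEADER.lower() in headers:
        # Whitelist mode: only listed domains may receive identifiers.
        return domain in _domains(headers[TRUSTED_HEADER.lower()])
    if UNTRUSTED_HEADER.lower() in headers:
        # Blocklist mode: every domain except the listed ones may receive identifiers.
        return domain not in _domains(headers[UNTRUSTED_HEADER.lower()])
    return True  # the consumer expressed no preference


if __name__ == "__main__":
    prefs = {"X-TDP-Trusted": "a.example, b.example"}
    print(may_send_identifiers("a.example", prefs))  # True
    print(may_send_identifiers("c.example", prefs))  # False
```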

It might seem that this is overly restrictive and could break the user experience on certain sites, but it's better than outright ad blocking, and addresses the privacy issues more fully than blocking ads (which doesn't address the data leakage and privacy impact of widgets and other third-party code on the page).

To be clear, I don't expect consumers to know enough about the various ad tech and widget companies to know which to trust. I do think, though, that there is an opportunity for browsers or plugins to implement "privacy blocker" functionality, and potentially to link this to a central list of compliant privacy-safe platforms. Or, even better, what if ad blockers were to block ads and widgets on publishers that are not TDP-compliant?

Who needs to do what to implement TDP

Publishers need to decide that data leakage is important to them and ask their key partners to support TDP. Publishers need to be willing to remove technologies from their sites, including widgets, that don't adhere to this standard. If a majority of the top 50 publishers stand behind TDP, it will quickly become a market standard. There will be short-term revenue implications, as a number of significant companies and products will not be able to operate their business models without data leakage, and each publisher will have to decide whether it can tolerate some short-term pain for significant medium-term gain.

Ad servers, especially DFP, need to enforce domain whitelists on all creatives that they deliver. This probably means the end of the "network" or "rotating" creative, where the buyer gets total control over what ad to show.

SSPs, wrappers, and exchanges need to enforce the publisher's TDP list in all programmatic transactions, including creative delivery, bid requests, and user syncing.

Bidders need to evaluate their business model and technical architecture to ensure that they are not storing any user data involved in programmatic transactions except to execute targeting and implement frequency caps.

DMPs need to implement TDP to ensure that data is never shared with partners that the publisher doesn't trust. DMPs need to prevent any blending of data across publishers without explicit consent.

Browser plugins like Ghostery could support TDP by flagging any third parties on the page that are not in the TDP manifest, which would help publishers diagnose leaky partners. In addition, the plugin should surface the TDP audit data, as the server-to-server connections are potential privacy concerns for the consumer.

Regulators might see TDP as a better way to give consumers insight and control over how their personal data is being used. The "This site uses cookies" slider is quite useless; it would be far better for the consumer to know exactly who's getting their data when they use a site. Part of me would love it if regulators required disclosure of every entity that touches consumer data (which TDP would enable) and pushed the industry to create an independent audit of how consumer data is used. This should have minimal impact on good actors and overall monetization.

Browsers could look at TDP as a less-restrictive implementation of Do Not Track - one that is fair to third parties and won't completely destroy ad revenue.

Industry groups like the IAB should help make data leakage a top-of-mind issue and assist in the standardization and adoption of TDP or a similar framework. Industry groups should also make certification and audit of all ad tech and widget products a requirement, not just a nice-to-have, in order to prevent bad actors from taking advantage of publishers.

Consumers should be protected without doing anything proactively. We're not there yet, but shining some light on the use of data will be a very good start.

Ecosystem Impact

The IAB recently released statistics that show that US ad spend on the open internet is flat year-over-year, growing only $40 million against $2.9 billion of growth in spend on Google and Facebook. While this data isn't perfect, it's a reasonable place to start when we talk about the problems that premium publishers face.

I believe that the open internet faces a vicious cycle. Poor outcomes for marketers force publishers to compromise on user experience and lower their ad quality standards by working with networks, showing low-viewability ads, and allowing their data to leak to lower-quality sites. Consumers fight back with ad blockers or just abandon these publishers. Marketer outcomes continue to deteriorate, and the cycle continues.

Fixing data leakage won't magically solve all of the problems that premium publishers face, but it will start to create data scarcity, which will increase prices and the share of wallet that goes to quality publishers. This will especially benefit publishers that have made investments in proprietary data, building DMPs and encouraging users to log in.

I would love to see active discussion of this proposal by industry participants. Please contact me if you have feedback or want to get involved.


How do you define a trusted partner?

Any company involved in the delivery of content that has access to consumer data is effectively a data partner, whether the publisher is aware of it or thinks of it that way. This includes ad servers, SSPs, DSPs, DMPs, rich media vendors, widgets, analytics tools - pretty much anything you see in Ghostery is getting data from the publisher. Now, whether you trust these companies or not is a different question.

Who are some examples of potential trusted partners?

Just off the cuff, you'd start with the list you see in Ghostery. When I looked at the New York Times, Ghostery listed 61 companies on the page.

I have no idea what relationship the Times has with these 61 companies, but I would start by doing a lightweight audit to decide which are worth the potential data risk.

How does a marketer use a trusted partner?

Marketers tend to use a few key technologies:

  • A primary ad server to deliver creative and track impressions
  • One or more attribution or measurement tools to track outcomes
  • Rich media and programmatic creative tools to build and serve creatives
  • Brand safety products to ensure that ads deliver on valid inventory
  • A DMP to manage first-party data and append third-party data
  • One or more DSPs to bid on programmatic inventory
  • Possibly ad networks

For any of these technologies to interact with the publisher, they must be on the publisher's trusted data partner manifest. Obviously, this creates tension between the marketer (or agency) and the publisher, which is the point. Today, the buyers effectively do whatever they want, and publishers have very little visibility or control over where their data goes.