July 6, 2012

Why Higgs Bosons Matter to SEO Science

Posted by Brian Klais – July 6, 2012

This week’s announcement of the Higgs boson discovery was inspiring to me, both as a lover of science and as a practitioner of digital marketing disciplines like SEO. I’ve long believed (as others have) that the SEO discipline shares some bizarre and surprising similarities with the quirky realm of quantum physics (fundamental uncertainty, observational influence, and strange looks, to name a few). But it goes deeper than that.

For scientists, the discovery of the Higgs boson was a big deal because it signals the presence of the so-called Higgs field. This field was theorized a half-century ago as a force that acts at a subatomic level to give particles their mass. The discovery confirms a hypothesis that the universe has a built-in feature that enables order to come from chaos at a fundamental level. Why did it take 50 years?

So we have this quest to understand how order is created from chaos at a fundamental level; the challenge of proving a force which can’t be observed directly; a heavy statistical analysis process required to derive any meaningful conclusions… this is sounding familiar!


The Higgs Search

At a high level, SEO of course has the specific aim of maximizing rankings and traffic, making it an applied science at best and a black art at worst. But the deeper science behind the outcomes requires a similar pursuit: understanding how search engines bring order out of chaos.

Not unlike our universe, the search world has become sophisticated and dynamic: hundreds of ranking variables that change in number over time and contribute at varying levels to the relevance (mass?) of a given URL “particle.”

How exactly does one isolate, quantify, and know with certainty which of these factors are acting upon a URL’s “mass” with respect to a given keyword at a given time, and how much each contributes alone or in combination? It’s like 4-D chess. Unless they’re omniscient (or, in this case, the search engine), SEO practitioners lack the technology to assess:

  1. which variables are being applied by the engine at a given time
  2. the resulting contribution of individual variables towards the sum
  3. the compound contributions of variables toward the sum
  4. all of the above, with respect to time
  5. all of the above, with respect to competing URLs in a market

Here we get down to a struggle common to both pursuits: knowledge of a real force is sought, but its nature is such that only its effects can be seen. It’s impossible to observe directly, though for different reasons. With the Higgs, scientists lack the technology to see the boson before it decays; their hope is to see what the decay leaves behind and conclude from that whether the particle, and by extension the Higgs field, exists. Making matters worse, the predicted effects of the Higgs particle are evidently tiny.

So it’s intriguing to consider the solution for both endeavors: experiments and data analysis. That’s the reason CERN built the $10 billion underground LHC collider instrument – to run 40 million near-light-speed collision experiments every second, 24×7, generating billions of data points for analysis.

As I understand it, Higgs physicists approached their mountain of data by running the data sets through two theoretical lenses: one in which the Higgs field exists and one in which it does not. This rigorous analytical comparison is essentially what allows them to have confidence that they are seeing the unseeable – tiny signs of the Higgs boson’s predicted impact.
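To make the two-lens idea concrete, here is a minimal Python sketch of that kind of comparison. It is an illustration only, not CERN’s actual analysis: the bin counts, the `background` and `signal` expectations, and the function names are all invented for the example, and the counts are assumed to follow a simple Poisson model. It scores the same data under a “background only” world and a “signal plus background” world and reports which one the data favor.

```python
import math


def poisson_log_likelihood(observed, expected):
    """Log-probability of the observed bin counts given the expected counts.

    Every expected count must be greater than zero.
    """
    return sum(
        n * math.log(mu) - mu - math.lgamma(n + 1)
        for n, mu in zip(observed, expected)
    )


def log_likelihood_ratio(observed, background, signal):
    """Positive values favor 'signal plus background' over 'background only'."""
    with_signal = [b + s for b, s in zip(background, signal)]
    return (poisson_log_likelihood(observed, with_signal)
            - poisson_log_likelihood(observed, background))


# Hypothetical numbers: five bins, with the signal predicted in the middle bin.
background = [100.0, 100.0, 100.0, 100.0, 100.0]
signal = [0.0, 0.0, 30.0, 0.0, 0.0]

data_with_bump = [98, 103, 131, 99, 101]
data_without_bump = [98, 103, 100, 99, 101]

print(log_likelihood_ratio(data_with_bump, background, signal))
print(log_likelihood_ratio(data_without_bump, background, signal))
```

A positive ratio means the data sit more comfortably in the world where the effect exists; a negative one means the plain-background world explains them better. The same framing carries over to an SEO hypothesis: model what the numbers should look like with and without the effect, then ask which model the data prefer.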

Here I think the similarities between disciplines are instructive; it’s this kind of rigor that can elevate SEO from an art to a science. Experiments can be designed, executed, analyzed and repeated to separate causation from correlation and inform conclusions on the nature of reality.

The Science of SEO

Here’s an example of how I’ve seen this look in practice.

At a previous SEO company, we’d conduct large-scale concurrent tests across tens of thousands of proxy pages we hosted, dynamically modifying elements of the test-cell pages (the keyword order in the title tag, URL structure, page headings, inbound anchor text, body copy, user-generated content, and so on) in order to measure the results. (Our equivalent to the Large Hadron Collider.)

The goal was to validate (or not) theories espoused internally or generally getting air play. Are fewer title tag keywords better? Is user-generated content dilutive? Do page de-duplication or canonical tags make a difference? Where’s the line between “best practice” and “most practical”?

So we’d design and deploy the experiments, careful to isolate the control group of pages from changes during the test. We’d look for a sustained ranking and traffic delta over a 4-8 week period against the control. If we found a significant delta, we’d then apply a suppression technique to undo the element previously optimized and compare against the control. If proportionate ranking and traffic degradation were observed and sustained over similar periods of time, we’d conclude a causal relationship was likely and decide whether to permanently apply the tactic, or run it against a new challenger scheme.
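The optimize-then-suppress logic above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tooling we actually ran: the metric is assumed to be average visits per page over the test window, significance is judged with a basic permutation test, “proportionate” is taken to mean the drop is within 50% of the original lift, and every name and number below is hypothetical.

```python
import random
from statistics import mean


def mean_delta(test, control):
    """Average metric of the test pages minus that of the control pages."""
    return mean(test) - mean(control)


def permutation_p_value(test, control, trials=2000, seed=0):
    """One-sided p-value: the share of random relabelings of the pages that
    produce a delta at least as large as the one observed."""
    rng = random.Random(seed)
    observed = mean_delta(test, control)
    pooled = list(test) + list(control)
    n = len(test)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if mean_delta(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


def likely_causal(control, optimized, suppressed, alpha=0.05, tolerance=0.5):
    """True when the lift is significant and suppression reverses it
    by a roughly proportionate amount."""
    lift = mean_delta(optimized, control)
    drop = mean_delta(optimized, suppressed)
    lift_is_real = permutation_p_value(optimized, control) < alpha
    drop_is_real = permutation_p_value(optimized, suppressed) < alpha
    proportionate = abs(drop - lift) <= tolerance * abs(lift)
    return lift_is_real and drop_is_real and proportionate


# Hypothetical visits per page: control cell, test cell while optimized,
# and the same test cell after the change was suppressed.
control = [100, 98, 103, 101, 99, 102, 97, 100]
optimized = [118, 121, 117, 123, 119, 120, 122, 116]
suppressed = [101, 99, 102, 100, 98, 103, 100, 97]

print(likely_causal(control, optimized, suppressed))
```

Requiring the reversal as well as the lift is what separates a likely cause from a coincidence: a lift that survives suppression was probably driven by something other than the element being tested.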

(As a footnote, I published our findings of the UGC experiment – which proved counter to internal views that UGC would dilute keyword density and harm rankings.)

This kind of experimentation and data analysis would then form the basis of our dynamic SEO strategies. The findings might inform the first iteration of a site’s information architecture, the page elements we wanted reserved for algorithmic optimization, the algorithm itself, the frequency of change, or all of the above.

In my experience, there are at least three requirements for developing this science gene:

#1 – Experimental Technology:

Like the CERN collider instrument, running experiments minimally requires access to large volumes of diverse pages that can be modified in crucial ways with little overhead. In our case, we found proxy pages to be the perfect vehicle for conducting experiments. If configured properly, they provided the closest thing SEO has to a legitimate A/B testing environment.

#2 – Executive Support:

Top-level support of a scientific methodology can be a big hurdle for executives who are accountable to investors and clients for time-bound results. What’s the payoff for organizations that believe deep down in their DNA that data paves the fastest route to search performance? Innovation and company culture get unlocked in powerful, sustaining ways. Few things can so effectively democratize strategy and counter big egos.

#3 – Dedicated Resources:

Once you’ve got #1 and #2, you need dedicated data analysis resources. These are data geeks with statistical analysis experience, with the ability to translate theories into experiments and communicate results. Dedicate them to the function of controlling experiments and data analysis.

The final quality I find inspiring about the Higgs discovery is less about science, and more about the human spirit. To pursue what you believe to be truth for upwards of 50 years is remarkable. Most of those years were without the CERN (or similar Fermilab) collider technologies needed to produce the requisite experiments and data. But they persisted. They dealt with the critics and ideologues. Now they own the conversation.

Of course, I don’t know the future of SEO. But if it has one, I suspect it will look a lot like the science and spirit we witnessed this week. As we attempt to understand a multi-device, social and local world of search, new hypotheses are needed to explain the nature of reality. What are the “Higgs fields” of this new world? What scientific methods will be used to identify them? As search complexity increases, a commitment to this sort of scientific method is more important than ever.

An obvious challenge is that the SEO industry is resourced quite differently than CERN – namely by paying clients impatient for results, rather than governments interested in scientific discovery! Nonetheless, since this type of SEO science is already possible – and being practiced by some organizations – my prediction is that it will soon become a survival gene (perhaps driven by social and multi-device complexity if nothing else). This science gene will differentiate the companies that have it from those that do not, helping the former win more business and so fund more scientific discovery.
