Kevin Indig, in a piece published on Search Engine Land, argues that publishing proprietary data is necessary for winning an AI citation but not sufficient on its own. The argument matters for GEO (generative engine optimization, the practice of shaping content for LLM-driven answers) because it treats owning a number and getting credited for it as two separate contests. His own citation research suggests the second contest goes to whoever presents the data best, not whoever generated it. Ownership is only the entry fee.
An information gain study from On-Page.ai, cited in Indig’s piece, examined 150 pages holding a top-three Google ranking, drawn from 50 keywords across 10 industry verticals. Each page was graded on a 0 to 100 scale for how much unique substance it contributed beyond its ranking peers. The median page scored 52, and the trait most correlated with that score was original data, ahead of page length and every other factor the study tracked. Pages carrying at most one unique figure averaged an information gain score of 40.2. Pages carrying 15 or more averaged 62.1, with the score rising at each step in between.
The bar for beating today’s top-ranking pages is not high. The study found that the average top-visible result in classic Google search carries only four unique data points, so a page built around five or more original claims, figures, or direct answers already clears that threshold. Indig frames this as a low-lift opportunity rather than a research department project, since most businesses already generate data worth publishing as a byproduct of running the business itself.
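One rough way to check a draft against that threshold is to count the figures on a page that none of its ranking peers carry. The sketch below is a crude proxy and not the On-Page.ai scoring method: it treats each distinct numeric token as a data point and ignores non-numeric claims and direct answers. The names `distinct_figures`, `unique_figures`, `clears_bar`, and `TARGET_UNIQUE_FIGURES` are hypothetical.

```python
import re

# Assumption: the article's "five or more" bar, applied to numeric figures only.
TARGET_UNIQUE_FIGURES = 5

# Integers, decimals, and thousands-separated numbers, with an optional percent sign.
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")


def distinct_figures(text: str) -> set[str]:
    """Return every distinct numeric token found in the text."""
    return set(NUMBER_PATTERN.findall(text))


def unique_figures(page: str, peers: list[str]) -> set[str]:
    """Return figures that appear on the page but on none of the peer pages."""
    peer_figures: set[str] = set()
    for peer in peers:
        peer_figures |= distinct_figures(peer)
    return distinct_figures(page) - peer_figures


def clears_bar(page: str, peers: list[str], target: int = TARGET_UNIQUE_FIGURES) -> bool:
    """True when the page carries at least `target` figures its peers lack."""
    return len(unique_figures(page, peers)) >= target
```

Comparing against peer pages rather than counting raw numbers keeps the check closer to the study's idea of substance contributed beyond what already ranks, though exact string matching will miss the same figure written in a different format.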
That framing is where Indig’s caveat cuts against conventional advice. Most guidance on proprietary data assumes that originating a number secures the credit for it. Growth Memo, the analysis firm Indig writes for, found that entity types tagged as DATE and NUMBER are the strongest predictors of ChatGPT citation. Yet the citation frequently lands on whichever page presents that entity most legibly to a language model, not necessarily the page that generated it first. An aggregator that repackages a brand’s benchmark into a cleaner, answer-ready page can collect the citation the original research earned.
A separate Growth Memo analysis of 18,012 verified ChatGPT citations found what Indig calls a ski-ramp distribution. Some 44.2 percent of citations concentrate in a page’s opening 30 percent, while the middle band, running from 30 to 70 percent, earns 31.1 percent. Material buried in the final third of a long post is cited roughly 2.5 times less often. A follow-up study across seven verticals sharpened that target: AI systems read hardest between the 10 and 20 percent marks of a page. They skip over the opening tenth as navigation or intro filler and give the closing tenth only 2.4 to 4.4 percent of citations, no matter the industry.
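To see which of those bands a given statistic currently falls in, its starting offset can be expressed as a fraction of the page length. The following is a minimal sketch that assumes position is measured in characters of extracted page text, which may differ from how the studies measured it. The function names and band labels are hypothetical, and the cutoffs simply combine the 30 and 70 percent marks from the first analysis with the tenths from the follow-up.

```python
def relative_position(page_text: str, passage: str) -> float:
    """Return where the passage starts, as a fraction of the page length."""
    if not passage:
        raise ValueError("passage must not be empty")
    offset = page_text.find(passage)
    if offset < 0:
        raise ValueError("passage not found in page text")
    return offset / len(page_text)


def position_band(position: float) -> str:
    """Map a fractional position (0.0 up to, not including, 1.0) to a band label."""
    if not 0.0 <= position < 1.0:
        raise ValueError("position must be in the range [0.0, 1.0)")
    if position < 0.10:
        return "opening tenth"
    if position < 0.20:
        return "10-20 percent window"
    if position < 0.30:
        return "rest of opening 30 percent"
    if position < 0.70:
        return "middle band"
    if position < 0.90:
        return "late section"
    return "closing tenth"
```

For example, `position_band(relative_position(page_text, "44.2 percent"))` reports the band for the first occurrence of that string. Character offsets ignore rendered layout, so the result is only an approximation of what a reader or crawler sees.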
Applied to a data-driven page, the research points to a specific build order:
- Put the strongest statistic in the first screen.
- Define what the metric measures and its population in one sentence.
- Box the sample size and collection method for attribution confidence.
- Rank secondary findings by strength instead of narrative sequence.
The long buildup that rewards a human reader, the payoff saved for the closing paragraph, works against a system that reads like a rushed editor rather than a patient student.
The operator decision this creates is not about the next research project. It is about the benchmark reports a brand has already published. Any SEO or content team sitting on an existing data study should run a quick audit:
- Check whether the strongest number sits in the page’s first screen rather than three paragraphs down.
- Check whether the methodology is boxed in a labeled block rather than buried in a closing paragraph.
- Check whether a competitor’s shorter recap of the same numbers is now outranking the original study in citations.
That decision does not require new data. A brand that already owns proprietary numbers should spend the next quarter restructuring its best existing study for extraction before it commissions another one, since the findings already clear the originality bar and only the layout is costing it citations.
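The first two audit checks can be approximated against a page's extracted text. The sketch below rests on assumptions that are not from Indig's piece: the "first screen" is taken to be the first 1,200 characters, and a methodology block counts as labeled when a line starts with one of a few guessed labels. `FIRST_SCREEN_CHARS`, `METHOD_LABELS`, `AuditResult`, and `audit_study` are all hypothetical names.

```python
from dataclasses import dataclass

# Assumption: a stand-in for "first screen"; tune to the site's actual layout.
FIRST_SCREEN_CHARS = 1200

# Assumption: line prefixes that suggest a labeled methodology block.
METHOD_LABELS = ("methodology", "method:", "sample size")


@dataclass
class AuditResult:
    stat_in_first_screen: bool
    methodology_labeled: bool


def audit_study(page_text: str, headline_stat: str) -> AuditResult:
    """Check headline-stat placement and methodology labeling on one page."""
    if not headline_stat:
        raise ValueError("headline_stat must not be empty")
    # find() returns -1 when the statistic is missing, which fails the range check.
    offset = page_text.find(headline_stat)
    stat_in_first_screen = 0 <= offset < FIRST_SCREEN_CHARS
    # A line counts as a label only if it begins with one of the assumed prefixes.
    methodology_labeled = any(
        line.strip().lower().startswith(METHOD_LABELS)
        for line in page_text.splitlines()
    )
    return AuditResult(stat_in_first_screen, methodology_labeled)
```

The third check, whether a competitor's recap is collecting the citations, depends on citation data from the AI systems themselves and is left out of this sketch.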
Kevin Indig’s analysis was published on Search Engine Land on July 2, 2026.