Roughly four in five ChatGPT product recommendations change the moment web search is switched on. That finding, from a Visibility Labs study reported by Search Engine Land on June 17, reframes how GEO and AEO teams should allocate optimization effort across AI shopping queries.
Jeff Oxford, founder and CEO of Visibility Labs, ran 1,000 product-recommendation prompts ten times each with ChatGPT search enabled and ten times with search disabled, generating 20,000 total responses. Only 19.8% of products surfaced in the no-search condition also appeared when search was active. The discontinuity is not marginal noise. It is the dominant pattern.
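The overlap figure can be reproduced on your own data with a few lines of Python. This is a minimal sketch, not the study's code: the article does not say whether Visibility Labs pooled products per prompt or across all prompts, so the function below assumes a per-prompt comparison, and the name `carry_over_rate` and the sample products are hypothetical.

```python
def carry_over_rate(no_search_runs, search_runs):
    """Share of products seen without search that also appear with search.

    Each argument is a list of runs for ONE prompt; each run is a
    collection of product names. Per-prompt pooling is an assumption.
    """
    baseline = set().union(*no_search_runs)    # every product seen with search off
    with_search = set().union(*search_runs)    # every product seen with search on
    if not baseline:
        return 0.0
    return len(baseline & with_search) / len(baseline)


# Hypothetical toy data: two runs per condition for a single prompt.
no_search = [{"A", "B", "C"}, {"A", "B", "D"}]
search = [{"A", "E"}, {"E", "F"}]
print(carry_over_rate(no_search, search))  # 1 of 4 baseline products survives
```

A rate near 0.2 on a real prompt set would be in line with the 19.8% the study reported.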
The counterintuitive result sits at the study’s center. Oxford expected the most consistently recommended products to hold their position when search came on. They did not. Among products that appeared in 100% of no-search responses, only 15.8% carried over once ChatGPT could access the web, below the 19.8% carry-over rate for all products. The products ChatGPT favored most consistently from training data were, if anything, more likely than average to be displaced by live retrieval.
Search also compressed the recommendation set. With search enabled, ChatGPT returned an average of 5.2 products per response and 19 unique products across ten runs of a single prompt. Without search, those figures were 6.2 products per response and 21.8 unique products. A smaller, tighter pool of products gets recommended when the model can pull current sources.
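The two set-size figures above are straightforward to compute for any prompt. The sketch below assumes each run has already been parsed into a list of product names. The function name `set_size_metrics` and the sample data are hypothetical, not taken from the study.

```python
from statistics import mean


def set_size_metrics(runs):
    """Return (average products per response, unique products across runs)."""
    if not runs:
        return 0.0, 0
    per_response = mean(len(set(run)) for run in runs)  # dedupe within a response
    unique_products = len(set().union(*runs))           # pool across all runs
    return per_response, unique_products


# Hypothetical toy data: three runs of one prompt.
runs = [["A", "B", "C"], ["A", "B"], ["A", "D", "E", "F"]]
avg, unique = set_size_metrics(runs)
print(avg, unique)
```

Running this once per condition and comparing the two pairs shows whether search narrows the pool for your category the way it did in the study.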
The study tracked whether products mentioned in ChatGPT’s cited sources appeared more often in its recommendations. Oxford reported a Pearson correlation of 0.4, a moderate positive association, between cited-source mentions and recommendation frequency, using a “Visibility Score” defined as the percentage of runs in which a product appeared for a given prompt. Products named more frequently in cited sources tended to score higher. The analysis, however, did not establish that source citations caused higher recommendation rates. The correlation is real. The mechanism is not confirmed.
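To make the two quantities concrete, here is a minimal sketch of a Visibility Score as defined above, plus a hand-rolled Pearson coefficient. How the study counted cited-source mentions is not described in the article, so the `mentions` values below are hypothetical, as are the function names and the toy runs.

```python
from math import sqrt


def visibility_score(product, runs):
    """Percentage of runs for one prompt in which `product` appears."""
    return 100.0 * sum(product in run for run in runs) / len(runs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: four search-enabled runs of one prompt.
runs = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
products = ["A", "B", "C"]
scores = [visibility_score(p, runs) for p in products]  # 75.0, 50.0, 25.0
mentions = [6, 4, 2]  # assumed count of mentions in cited sources
print(scores, pearson(mentions, scores))
```

The toy data is perfectly linear, so the coefficient comes out at essentially 1.0. Real data will be noisier, and a coefficient near 0.4 describes co-occurrence, not cause.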
That methodological limit matters before drawing tactical conclusions. This was an observational study. It tells practitioners what co-occurs, not what drives what. Whether ranking in the specific sources ChatGPT cites at query time matters more than broad web visibility across thousands of pages remains an open question the data cannot answer.
What the data does clarify is the strategic split. Training-data optimization and live-citation optimization are different games with different levers. If your brand’s product authority lives in evergreen content that fed pre-training corpora, that equity does not automatically transfer to the retrieval layer. When search is on, ChatGPT is pulling from sources available at query time. Appearing in those sources is a separate task from appearing in the model’s parametric memory.
For GEO teams, this suggests auditing which sources ChatGPT actually cites for your product category when search is enabled, then treating presence in those outlets as a distinct objective alongside broader content authority. Review placements, comparison articles, and editorial roundups in the publications that appear in ChatGPT’s citation panel are the likeliest to matter in the retrieval condition, though the study does not prove that they drive recommendations. Training-data coverage does not substitute for them.
Over the next 90 days, GEO practitioners should run their own prompt sets under both conditions for product categories they care about, compare the overlap, and identify which sources recur in ChatGPT’s citation panel when search is active. That list of sources becomes a media and PR target, not just an SEO one. The Visibility Labs findings suggest the gap between parametric recommendations and retrieval-augmented recommendations is wide enough that brands optimized for one condition may be invisible in the other.
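The source-audit step can be sketched as a simple tally over responses you have already collected with search enabled. The record shape below (a dict with a `sources` list of cited domains), the function name `recurring_sources`, the 50% threshold, and the example domains are all assumptions for illustration. How you gather the responses is left out on purpose.

```python
from collections import Counter


def recurring_sources(search_responses, min_share=0.5):
    """Domains cited in at least `min_share` of search-enabled responses.

    Returns (domain, share) pairs, most frequently cited first.
    """
    total = len(search_responses)
    if total == 0:
        return []
    counts = Counter()
    for response in search_responses:
        counts.update(set(response["sources"]))  # count a domain once per response
    return [
        (domain, count / total)
        for domain, count in counts.most_common()
        if count / total >= min_share
    ]


# Hypothetical toy data: cited domains from four search-enabled responses.
responses = [
    {"sources": ["reviews.example", "roundup.example"]},
    {"sources": ["reviews.example"]},
    {"sources": ["reviews.example", "forum.example", "forum.example"]},
    {"sources": ["roundup.example"]},
]
print(recurring_sources(responses))
```

The domains that clear the threshold form the shortlist for outreach. Lowering `min_share` widens the list at the cost of including sources that appear only occasionally.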
Figures and study details reported by Danny Goodwin in Search Engine Land on June 17, 2026, citing research by Jeff Oxford of Visibility Labs.