The problem with how we measure LLM traffic today

As an SEO consultant, I’m increasingly asked to do GEO (Generative Engine Optimization). To my dismay, I see a lot of hype from SaaS builders trying to sell prompt-based LLM monitoring that isn’t helpful to marketers. Let me explain why.

The current approach to “AI visibility monitoring” typically involves tracking how LLMs respond to specific prompts. The fundamental problems with this:

  • No volume data. There is no search or chat volume data that indicates how often a prompt is actually used. It’s like doing keyword research without any data, just making up a list without grounding in actual search volume.
  • Probabilistic busywork. This kind of probabilistic monitoring seems to keep us busy without adding real value for businesses. You’re measuring responses to prompts you invented, not actual user behavior.

In my search for a more meaningful way to report on LLM user activity, I dove into server logs and discovered something surprising: LLM request volumes are remarkably high. Much higher than any referral traffic would make it seem.

I started with an assumption: user agents like ChatGPT-User make requests when a user asks a question. Therefore, these requests might function as a kind of “impression”, drawing a parallel to Google Search Console metrics.

But the request counts were way too high. It dawned on me that query fan-out, where an LLM performs multiple searches through APIs like Google Search, could cause multiple pages from the same website to be requested to answer a single user question. If I could prove this is indeed the case, and that it is detectable, server-side query fan-out tracking would become possible: a groundbreaking insight.

From this insight, I started researching if and how we could bundle multiple requests into some kind of session. That research is what I’m sharing today. And from its conclusions, I can confidently introduce the Query Fan-Out Session as a metric I recommend to anyone who wants to make server-side LLM user-agent activity more meaningful.

Important distinction: The “session” in Query Fan-Out Session is not like a Google Analytics session on your site. It is definitionally closer to a GSC impression: a GSC impression says something about the use of Google SERPs, and a Query Fan-Out Session says something about the use of ChatGPT and other LLMs with distinct “user” bots.

Let me explain how we got here and what we found.

Understanding query fan-out behavior

Here’s what actually happens when someone asks an AI assistant “What are the best options for [topic]?”:

  1. The AI queries a search API (like Google)
  2. It receives a list of relevant results (including your site and competitors)
  3. It simultaneously crawls multiple pages, often within 10-20 milliseconds of each other
  4. It synthesizes information from all sources into a single answer

This behavior is called query fan-out. The AI fans out its retrieval across multiple sources to answer one question. Those 4-5 “separate” requests to your site in your logs? They likely all came from a single user question.

The query fan-out illustrated

Counting individual requests dramatically inflates your numbers while hiding the actual insight: how often does your content help answer real user questions?

Moreover, a bot might hit the same page twice within a burst, so deduplication is also needed to reduce the request count to unique URLs.
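The deduplication step above can be sketched in a few lines. This is a minimal illustration, not the published implementation; the `(timestamp, url)` tuple shape is an assumption about how log rows are parsed.

```python
# Sketch: deduplicate repeated URLs within one burst of bot requests,
# keeping only the earliest hit per URL. Input shape is illustrative.
def dedupe_burst(requests):
    """requests: list of (timestamp_ms, url) tuples, in any order."""
    seen = {}
    for ts, url in sorted(requests):
        if url not in seen:  # keep only the first hit per URL
            seen[url] = ts
    # return (url, first_timestamp) pairs in arrival order
    return sorted(seen.items(), key=lambda kv: kv[1])

burst = [(100, "/guide"), (105, "/guide"), (110, "/faq")]
print(dedupe_burst(burst))  # [('/guide', 100), ('/faq', 110)]
```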

Let’s now dive into what I call a Query Fan-Out Session.

What is a Query Fan-Out Session?

A Query Fan-Out Session is a bundle of web requests from an LLM chat assistant that originated from a single user question and can be uncovered through server-side query fan-out tracking using server logs.

My research analyzed the timing patterns of LLM bot requests to find the ideal time window for grouping ChatGPT-User requests. To find out how to bundle these requests into a related session, I ran a rigorous experiment.

The results show a remarkably consistent behavior:

Observation                        Value
Most common gap between requests   9ms
Median gap                         10ms
Share of request gaps ≤ 20ms       84%
Share of request gaps ≤ 53ms       90%

These ultra-tight timing patterns are distinctive. When an LLM answers a user question, it doesn’t wait, it fires off requests nearly simultaneously. This creates a clear signature we can use to group requests into meaningful sessions.

You might be wondering: how do we know which requests belong together? The answer lies in the timing. LLM systems dispatch their web requests in parallel, creating bursts of activity that are unmistakable once you know what to look for.

The research: finding the right time window

The core challenge is determining the optimal time window: too narrow and we split genuine single-query bursts; too wide and we merge unrelated queries together.

We systematically tested time windows from 50ms to 5,000ms using semantic coherence as our quality metric. The semantic coherence measures whether the URLs in a session are thematically related. If they are, the session likely represents a single focused query.

The results were clear:

Window      High-Coherence Sessions   Assessment
50-100ms    91-92%                    ✅ Recommended
200-500ms   87-90%                    ✅ Viable alternatives
1,000ms+    <83%                      ❌ Degradation begins
5,000ms     67%                       ❌ Not recommended

Visual analysis of window performance

Session bundle size increases with larger windows while semantic coherence (color) degrades. The 100ms window (green circle) achieves optimal results.

A closer look at how semantic coherence degrades in larger bundles

The paper attached below shows more details.

Our recommendation: 100ms as the standard window.

At this threshold:

  • Over 91% of sessions maintain strong thematic coherence
  • We capture the vast majority of legitimate request bursts
  • There’s sufficient buffer for network latency variations
  • Over-bundling (merging unrelated queries) is virtually non-existent at 0.04%
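The bundling itself reduces to a simple rule: sort requests by timestamp and start a new session whenever the gap to the previous request exceeds the window. A minimal sketch, assuming requests from a single bot provider as `(timestamp_ms, url)` tuples (an illustrative shape, not the open-source implementation):

```python
def bundle_sessions(requests, window_ms=100):
    """Group (timestamp_ms, url) requests into query fan-out sessions.

    A new session starts whenever the gap to the previous request
    exceeds window_ms. Assumes all requests come from one bot provider.
    """
    sessions, current = [], []
    for ts, url in sorted(requests):
        if current and ts - current[-1][0] > window_ms:
            sessions.append(current)  # gap too large: close the session
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

reqs = [(0, "/a"), (12, "/b"), (25, "/c"), (4000, "/x"), (4009, "/y")]
sessions = bundle_sessions(reqs)
print(len(sessions))  # 2
```

Five raw requests collapse into two sessions: one three-page fan-out burst and one two-page burst four seconds later.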

We validated this on hold-out data with 100% ranking agreement. The methodology is robust.

What Query Fan-Out Sessions reveal about your content

Query Fan-Out Sessions reveal insights that individual request counting cannot. Let me walk you through the three most important ones.

1. True AI query volume

Instead of seeing “ChatGPT made 500 requests today,” you see “Your content answered approximately 220 AI-assisted user questions.”

This is the real number that matters. It tells you how often your content is being used to answer real user questions in AI interfaces.

2. Topical authority signals

When an LLM pulls multiple pages from your site in a single query fan-out, it indicates topical authority. The AI found multiple pieces of relevant content from your domain, a strong signal that your coverage of that topic is comprehensive.

Single-page sessions are normal and expected. Your site typically appears once per query alongside competitors. But multi-page sessions? That’s where you’re winning. The LLM is pulling 2, 3, or even 4 pages from your site to answer one question. That’s topical dominance.

3. Content relationship mapping

By analyzing which pages appear together in sessions, you discover natural content clusters from the AI’s perspective. This reveals how AI systems understand your content topology.

For example, if your mortgage calculator page frequently appears alongside your first-time buyer guide, the AI sees these as related. This is valuable intelligence for internal linking and content strategy.


A Query Fan-Out Session with multiple URLs reveals a lot about the decision journey that the LLM’s answer may cover.

The Query Fan-Out Session data model

Based on our research, we’ve developed a complete specification for tracking Query Fan-Out Sessions. Here’s what a production implementation should capture:

Core session data

Field               Purpose
session_id          Unique identifier (UUID)
session_start_time  First request timestamp
duration_ms         Session length in milliseconds
bot_provider        OpenAI, Perplexity, etc.
request_count       Total requests in session
unique_urls         Distinct pages requested
confidence_level    Quality classification (high/medium/low)
url_list            All URLs in the session
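The specification above maps naturally onto a small record type. A sketch of how it might look in Python (field names follow the table; defaults and types are my assumptions, not the published schema):

```python
from dataclasses import dataclass, field
from typing import List
import uuid

@dataclass
class QueryFanOutSession:
    """Core session record; field names follow the specification table."""
    session_start_time: float  # first request timestamp (epoch ms)
    duration_ms: float
    bot_provider: str          # e.g. "OpenAI", "Perplexity"
    request_count: int         # total requests in the session
    url_list: List[str] = field(default_factory=list)
    confidence_level: str = "medium"  # high / medium / low
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def unique_urls(self) -> int:
        """Distinct pages requested (derived from url_list)."""
        return len(set(self.url_list))

s = QueryFanOutSession(0.0, 42.0, "OpenAI", 3, ["/a", "/b", "/a"])
print(s.unique_urls)  # 2
```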

Session naming: making data actionable

Raw session IDs (UUIDs) are great for databases but useless for reporting. We need human-readable names that stakeholders can immediately understand. I’ve developed two approaches.

Approach 1: URL slug derivation (current POC)

Extract the session name from the first URL’s path segment:

First URL                                    Session Name
example.com/blog/home-buying-guide           “home buying guide”
example.com/tips/first-time-buyer-checklist  “first time buyer checklist”
example.com/                                 “homepage”

This provides immediate context without any additional processing. It’s simple and works well for sites with descriptive URL structures.
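The slug-derivation rule is a one-liner in practice. A minimal sketch (the function name is mine; it simply takes the last path segment and replaces hyphens with spaces, falling back to “homepage” for the root):

```python
from urllib.parse import urlparse

def session_name(first_url: str) -> str:
    """Derive a human-readable session name from the first URL's last path segment."""
    path = urlparse(first_url).path.strip("/")
    if not path:
        return "homepage"  # root URL has no slug to derive from
    return path.split("/")[-1].replace("-", " ")

print(session_name("https://example.com/blog/home-buying-guide"))  # home buying guide
print(session_name("https://example.com/"))                        # homepage
```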

Approach 2: AI-powered labeling (future enhancement)

For more sophisticated reporting, a lightweight language model (think GPT-5-nano or similar cost-efficient model) could analyze the URL list in each session and generate a meaningful topic label:

Session URLs AI-Generated Label
/mortgage/calculator, /blog/home-buying-guide, /tips/first-time-buyer, /faq/closing-costs “First-Time Home Buying Decision Support”

This transforms raw log data into actionable reporting categories. Imagine a dashboard where you see “Your content answered 47 questions about First-Time Home Buying this week.” That’s the level of insight we’re working towards.

Semantic coherence: validating session quality

How do we know our sessions are meaningful? We measure semantic coherence, the thematic similarity of URLs within each session.

Using TF-IDF embeddings of URL paths, we calculate pairwise cosine similarity and classify sessions:

Level   Mean Similarity  Interpretation
High    ≥ 0.7            URLs are strongly related thematically
Medium  0.5 – 0.7        URLs share common themes
Low     < 0.5            URLs may be loosely related or diverse

At the 100ms window, over 91% of sessions achieve high coherence, indicating they likely represent single, focused user questions.
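The coherence calculation can be sketched without any ML dependency. This simplified version uses plain token-count vectors of path segments instead of full TF-IDF weighting (an assumption for brevity; the research uses TF-IDF embeddings), but the mean-pairwise-cosine structure is the same:

```python
import math
from collections import Counter
from itertools import combinations

def tokens(url_path):
    """Split a URL path into word tokens (hyphens treated as separators)."""
    return [t for t in url_path.replace("-", "/").strip("/").split("/") if t]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def session_coherence(urls):
    """Mean pairwise cosine similarity of URL token vectors (TF-IDF proxy)."""
    vecs = [Counter(tokens(u)) for u in urls]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 1.0  # single-URL sessions are trivially coherent
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

related = ["/mortgage/calculator", "/mortgage/rates-guide"]
diverse = ["/mortgage/calculator", "/recipes/banana-bread"]
print(session_coherence(related) > session_coherence(diverse))  # True
```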

Key metrics for LLM visibility reporting

Based on our research, here are the KPIs that matter for Query Fan-Out Session reporting:

Metric                   Definition                                   Why It Matters
Sessions per Day         Count of unique query fan-out sessions       True AI query volume
Multi-Page Session Rate  % of sessions with 2+ pages from your site   Topical authority indicator
Avg Pages per Session    Mean unique URLs per session                 Content depth signal
Fan-Out Ratio            Total requests ÷ Total sessions              Request amplification factor
High-Confidence Rate     % of sessions with high coherence            Data quality indicator

The multi-page session rate, together with the actual grouped URLs, is particularly interesting. A higher rate indicates that LLMs are finding multiple relevant pages on your site for user queries, a sign of strong topical authority. This is the AI equivalent of ranking for multiple keywords in a topic cluster.

Moreover, seeing multiple URLs in one Query Fan-Out Session can reveal what kinds of answers are given, and what underlying question might tie those URLs together: a window into the decision journey that the LLM’s answer may cover.
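Once requests are bundled, the KPIs in the table follow directly from the sessions. A sketch, assuming each session is represented as a list of requested URLs with repeats preserved (an illustrative shape):

```python
def fanout_kpis(sessions):
    """Compute reporting KPIs from a non-empty list of bundled sessions.

    sessions: list of sessions, each a list of requested URLs (repeats kept).
    """
    total_requests = sum(len(s) for s in sessions)
    uniq = [set(s) for s in sessions]  # deduplicate within each session
    multi = sum(len(u) >= 2 for u in uniq)
    return {
        "sessions": len(sessions),
        "multi_page_session_rate": multi / len(sessions),
        "avg_pages_per_session": sum(len(u) for u in uniq) / len(sessions),
        "fan_out_ratio": total_requests / len(sessions),
    }

demo = [["/a", "/b"], ["/c"], ["/d", "/d", "/e"], ["/f"]]
k = fanout_kpis(demo)
print(k)  # 4 sessions, 50% multi-page, 1.5 pages/session, 1.75 fan-out ratio
```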

Methodological transparency

I want to be upfront about limitations: this is new research, and far from complete.

What we validated:

  • 100% ranking agreement on hold-out data (80/20 train/validation split)
  • Consistent results across bot providers (OpenAI and Perplexity show similar burst patterns at 17-22ms median)
  • Session quality metrics remain stable across the validation set

What we acknowledge:

  • Without ground-truth labels from the LLM providers, we use semantic coherence as a proxy for session quality
  • The methodology was validated on data from a single mid-sized content publisher
  • Results may vary for sites with significantly different content structures

However, the consistency of our findings across validation sets and the tight alignment with known LLM retrieval patterns give me confidence in the practical utility of this approach.

Looking ahead: collision detection and refinement

What happens when two unrelated user queries arrive within the same 100ms window? This rare scenario creates “collisions”, sessions that incorrectly merge different queries.

Our analysis shows this affects less than 1% of sessions at the 100ms window. For even higher precision, we’ve designed a two-stage refinement approach:

  1. Stage 1 (Temporal): Apply the 100ms window bundling
  2. Stage 2 (Semantic): For anomalous bundles (size ≥ 4 AND coherence < 0.5), apply graph-based splitting to separate collided queries

This hybrid approach maintains processing efficiency while addressing edge cases.
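Stage 2 can be sketched as a connected-components split over a URL similarity graph: connect URLs whose pairwise similarity clears a threshold, and each resulting component becomes its own sub-session. The similarity function below (shared path segments, Jaccard-style) and the threshold are illustrative stand-ins, not the research parameters:

```python
def split_collided(urls, sim, threshold=0.5):
    """Split a low-coherence bundle into connected components of a
    similarity graph. sim(a, b) -> float is any pairwise URL similarity."""
    n = len(urls)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim(urls[i], urls[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    seen, parts = set(), []
    for i in range(n):  # depth-first walk over each unvisited component
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            comp.append(urls[k])
            stack.extend(adj[k] - seen)
        parts.append(comp)
    return parts

def seg_sim(a, b):
    """Toy similarity: Jaccard overlap of path segments (illustrative only)."""
    sa, sb = set(a.strip("/").split("/")), set(b.strip("/").split("/"))
    return len(sa & sb) / max(len(sa | sb), 1)

bundle = ["/mortgage/rates", "/mortgage/calculator", "/recipes/pasta"]
parts = split_collided(bundle, seg_sim, threshold=0.3)
print(parts)  # [['/mortgage/rates', '/mortgage/calculator'], ['/recipes/pasta']]
```

A collided bundle of two mortgage pages and an unrelated recipe page splits cleanly into two sub-sessions.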

The bottom line

As LLM-powered search becomes an important gateway to web content, publishers need new metrics that accurately reflect this reality. Traditional page views and requests tell you what was accessed. Query Fan-Out Sessions tell you why: how your content contributes to answering real user questions.

Key takeaways:

  1. 100ms is the optimal window for grouping LLM bot requests into sessions
  2. 91%+ of sessions maintain high thematic coherence at this threshold
  3. Multi-page sessions indicate topical authority, the LLM finds multiple relevant pages from your domain
  4. Session naming (via URL slugs or AI labeling) makes reporting actionable
  5. The methodology is validated and ready for production implementation

As AI assistants handle an increasing share of information retrieval, understanding your visibility in this new paradigm is essential for content strategy, SEO, and digital marketing measurement.

Automated Query Fan-Out Session tracking

The research methodology and code to transform Cloudflare logs into Query Fan-Out Session reporting is open source and free to use, available on GitHub:

Server-Side Query Fan-Out Session monitoring & Reporting (Github Repo)

This allows you to implement session tracking on your own infrastructure using Cloudflare’s log data. Support for more platforms is coming soon.

Citation

You can read the technical research paper here: Query Fan-Out Session Analysis: Determining Optimal Time Windows for LLM Bot Request Bundling

If you reference this research in your work: 

Remy, R. (2025). Query Fan-Out Session Analysis: Determining Optimal Time Windows for LLM Bot Request Bundling. Conversem Research Report.

This article is part of Conversem’s ongoing research into AI-driven web traffic and the evolving landscape of content discovery.

My follow up research to read next: Why IP Addresses Don’t Help Detect Query Fan-Out Sessions in server logs