A follow-up to “Introducing the Query Fan-Out Session”
The research question that seemed obvious
After publishing my research on Query Fan-Out Sessions, one question kept nagging me: “Could IP addresses help detect when two unrelated queries accidentally get bundled together?”
The intuition makes sense. If a temporal bundle contains requests from multiple IP addresses, perhaps that indicates we’ve accidentally merged requests from different users. Different IPs, different users, right?
I set out to test this hypothesis systematically. The results surprised me, and led to a deeper understanding of how LLM systems actually work under the hood.
Spoiler: IP address is useless for detecting collisions within the split-second time windows used for query fan-out session bundling. But the reason why it’s useless is fascinating.
The hypothesis we tested
Our original Query Fan-Out Session methodology uses a 50 to 100ms time window to bundle requests. This works because LLM systems dispatch web requests in parallel bursts, with 84% of request gaps ≤ 20ms.
But what about edge cases? What if two different users happen to ask questions within the same 100ms window? These “collisions” would incorrectly merge unrelated queries into a single session.
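To make the bundling step concrete, here is a minimal sketch of gap-based temporal bundling. The field names (`timestamp_ms`, `url`, `ip`) are my own, not taken from the published script, and a real implementation would first partition the log by bot provider, as the methodology specifies:

```python
from dataclasses import dataclass

@dataclass
class Request:
    timestamp_ms: float  # request time from the server log, in milliseconds
    url: str
    ip: str

def bundle_by_gap(requests, max_gap_ms=100):
    """Group requests into temporal bundles: a new bundle starts whenever
    the gap to the previous request exceeds max_gap_ms."""
    bundles = []
    for req in sorted(requests, key=lambda r: r.timestamp_ms):
        if bundles and req.timestamp_ms - bundles[-1][-1].timestamp_ms <= max_gap_ms:
            bundles[-1].append(req)
        else:
            bundles.append([req])
    return bundles
```

With 84% of gaps at 20ms or less, a 100ms threshold comfortably keeps one fan-out burst together while separating it from the next user question.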

The hypothesis: IP diversity within a bundle indicates a collision. If requests come from multiple IP addresses, they might be from different users.
To test this, I analyzed approximately 90,000 requests, which consolidated into roughly 43% as many temporal bundles.
What we measured
We computed several fingerprint metrics for each bundle:
| Metric | Definition |
|---|---|
| IP Homogeneity | Percentage of requests sharing the most common IP |
| Subnet Homogeneity | Same, but using /24 subnet prefix |
| Country Consistency | Percentage of requests from the same country |
| MIBCS | Mean Intra-Bundle Cosine Similarity (semantic coherence) |
MIBCS is our ground truth for collisions (see previous article). If URLs in a bundle are thematically unrelated (low MIBCS), that’s a collision. We then tested whether IP diversity correlates with low MIBCS.
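For reference, the fingerprint metrics above are straightforward to compute. A minimal stdlib sketch follows; note that real MIBCS uses embeddings of the URLs’ content from an embedding model, which I stand in for here with plain numeric vectors:

```python
import math
from collections import Counter
from itertools import combinations

def homogeneity(values):
    """Share of items matching the most common value (0..1)."""
    values = list(values)
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)

def ip_fingerprint(ips, countries):
    """IP, /24-subnet, and country homogeneity for one bundle."""
    return {
        "ip_homogeneity": homogeneity(ips),
        "subnet_homogeneity": homogeneity(ip.rsplit(".", 1)[0] for ip in ips),
        "country_consistency": homogeneity(countries),
    }

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mibcs(embeddings):
    """Mean Intra-Bundle Cosine Similarity over all pairs of URL
    embeddings; a single-URL bundle is trivially coherent (1.0)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```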
The surprising findings
Finding 1: Most bundles with 2+ unique URLs have multiple IPs
| Metric | Value |
|---|---|
| Bundles with single IP | 15.4% |
| Bundles with single /24 subnet | 23.0% |
| Mean unique IPs per bundle | 2.04 |
You might have expected the opposite. If each bundle represents a single user query, shouldn’t most requests come from the same IP? Instead, 84.6% of bundles contain requests from multiple IP addresses.
This was my first hint that something interesting was going on.
Finding 2: IP diversity doesn’t correlate with collisions
Here’s the key analysis. We compared IP homogeneity between “clean” bundles (high MIBCS, thematically coherent) and “collision” bundles (low MIBCS, thematically incoherent):
| Bundle Type | IP Homogeneity |
|---|---|
| Clean (MIBCS ≥ 0.5) | 43.0% |
| Collision (MIBCS < 0.5) | 42.4% |
| Difference | 0.6% |
The difference is negligible. Clean bundles and collision bundles have virtually identical IP homogeneity rates.
The correlation coefficient between MIBCS and IP homogeneity? r = 0.023. Essentially zero.
Conclusion: IP diversity cannot distinguish collisions from legitimate sessions.
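The cohort comparison itself is only a few lines of analysis. Here is a sketch, assuming each bundle has already been scored with the fingerprint metrics (the `mibcs` and `ip_homogeneity` dict keys are my naming, not from the published script):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def collision_signal_check(bundles, threshold=0.5):
    """Mean IP homogeneity per cohort, plus the MIBCS-homogeneity
    correlation. Each bundle is a dict with 'mibcs' and
    'ip_homogeneity' in [0, 1]."""
    clean = [b["ip_homogeneity"] for b in bundles if b["mibcs"] >= threshold]
    coll = [b["ip_homogeneity"] for b in bundles if b["mibcs"] < threshold]
    return {
        "clean_mean": sum(clean) / len(clean) if clean else None,
        "collision_mean": sum(coll) / len(coll) if coll else None,
        "pearson_r": pearson([b["mibcs"] for b in bundles],
                             [b["ip_homogeneity"] for b in bundles]),
    }
```

On our data this check yields the near-identical cohort means and the near-zero r reported above.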
Finding 3: Geographic diversity follows the same pattern
| Cohort | Country Consistency |
|---|---|
| High coherence (MIBCS > 0.7) | 67.9% |
| Low coherence (MIBCS < 0.5) | 59.2% |
There’s a modest 8.7 percentage point difference here, but it’s not actionable. You can’t use a 60% vs 68% metric to make binary decisions about session validity.
Why doesn’t IP work? The atomic bundle hypothesis
The data forced me to rethink what’s actually happening inside LLM infrastructure. Here’s my hypothesis, informed by both the data patterns and my experience building a distributed system for my keyword research automation SaaS.
How I think query fan-out actually works
When you ask ChatGPT “How do I prepare for my first marathon?”, the system doesn’t just search once. It:
- Decomposes your question into multiple search queries:
  - “marathon training plan beginners”
  - “marathon nutrition guide”
  - “marathon running gear essentials”
- Creates atomic bundles for each search query. An atomic bundle is a unit of work: one SERP fetch plus the subsequent webpage crawls (typically 5-10 pages).
- Distributes these bundles across worker nodes for parallel execution. Node A might handle queries 1 and 3. Node B handles query 2.
- Aggregates and deduplicates at response generation time, not before crawling.
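I can’t verify the providers’ internals, but the mechanics I’m hypothesizing look roughly like this sketch, with round-robin assignment standing in for whatever load balancer they actually use (`AtomicBundle`, `distribute`, and `aggregate` are all my own illustrative names):

```python
from dataclasses import dataclass

@dataclass
class AtomicBundle:
    query: str
    urls: list  # SERP results to crawl for this query

def distribute(bundles, n_workers):
    """Round-robin bundles across workers. Each worker has its own
    egress IP, so one user question surfaces in the server log as
    requests from several IPs."""
    assignments = [[] for _ in range(n_workers)]
    for i, bundle in enumerate(bundles):
        assignments[i % n_workers].append(bundle)
    return assignments

def aggregate(crawled_pages):
    """Deduplicate only at aggregation time, after all workers return;
    workers never coordinate, so duplicate fetches are expected."""
    seen, unique = set(), []
    for url, content in crawled_pages:
        if url not in seen:
            seen.add(url)
            unique.append((url, content))
    return unique
```

Because related queries return overlapping SERPs, two workers will often fetch the same URL independently, and only `aggregate` removes the duplicates.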
This architecture explains everything we observed:
From query fan-out to atomic bundles

Why this explains the data
Why multiple IPs per bundle? Because a single user question gets distributed across multiple worker nodes. Different nodes have different IPs.
Why doesn’t IP correlate with collisions? Because IP diversity reflects infrastructure load balancing, not user boundaries. Both clean bundles and collisions are subject to the same distribution pattern.
Why do we see duplicate URL requests across IPs? Related search queries return overlapping results. When Node A and Node B each crawl the top 10 results for related searches, they’ll fetch some of the same pages. This indicates deduplication happens downstream, not before crawling.
This isn’t speculation. I see the same pattern in my own work building a SaaS that distributes atomic task bundles across worker nodes. Each node operates independently, unaware of what other nodes are fetching. The data seems to suggest the aggregation layer handles deduplication.
What about the 15% with single IP?
You might ask: if load balancing distributes bundles across nodes, why do 15% of bundles have a single IP?
Simple explanation: small bundles. If a user asks a narrow question that generates only one search query, that’s one atomic bundle sent to one node. No distribution needed.
This aligns with our observation that single-URL bundles (which inherently have MIBCS = 1.0) skew the single-IP statistics.
Practical implications
What this means for session detection
- Temporal bundling remains correct. The 100ms window captures the full query fan-out from a single user question, regardless of how many nodes execute it.
- IP-based splitting would be harmful. Splitting by IP would incorrectly fragment single user sessions that happened to be distributed across nodes.
- MIBCS is the reliable collision signal. Semantic coherence, whether the URLs are thematically related, is what distinguishes collisions from legitimate sessions.
What this reveals about LLM infrastructure
The IP diversity pattern tells us something about how major LLM providers architect their systems:
- They prioritize speed. Distributing work across nodes enables parallel execution.
- They accept redundant crawling. It’s faster to fetch duplicates and deduplicate later than to coordinate upfront.
- They use stateless workers. Each node handles its atomic bundle independently.
This is a sensible architecture. The “inefficiency” of duplicate requests is a trade-off for lower latency. When you’re racing to answer a user question in seconds, you optimize for speed, not bandwidth.
The refined methodology
Based on this research, here’s our recommended approach to collision detection:
| Step | Method |
|---|---|
| 1. Temporal bundling | 100ms window, group by bot provider |
| 2. Collision detection | MIBCS < 0.5 indicates potential collision |
| 3. Semantic refinement | Graph-based splitting for low-coherence bundles |
IP address is explicitly not used: it provides no signal. I will soon update the GitHub script to add step 3.
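Step 3 isn’t in the published script yet, but the idea is simple: treat URLs as nodes, connect pairs whose embeddings are similar enough, and split a low-coherence bundle along the connected components. A sketch, assuming a precomputed pairwise similarity matrix; the 0.5 threshold mirrors the collision cutoff but is my choice here:

```python
def split_bundle(urls, sim, threshold=0.5):
    """Split a low-coherence bundle into sub-sessions: link URL i and j
    when sim[i][j] >= threshold, then return connected components."""
    n = len(urls)
    adj = [[j for j in range(n) if j != i and sim[i][j] >= threshold]
           for i in range(n)]
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            comp.append(urls[node])
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components
```

Each resulting component becomes its own session candidate, so a genuine collision splits cleanly into its constituent user questions.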
Summary
| Finding | Implication |
|---|---|
| 84.6% of bundles have multiple IPs | Load balancing distributes work across nodes |
| IP homogeneity: 43% (clean) vs 42% (collision) | IP cannot distinguish collision from legitimate session |
| MIBCS-IP correlation: 0.023 | No meaningful relationship |
| Duplicate URLs across IPs | Related searches return overlapping results |
My hypothesis, consistent with the research outcomes: IP diversity within a query fan-out session reflects infrastructure load balancing, not multiple users. You can’t track a user across time by IP, because the same IP might handle different users’ queries, and the same user’s query might be handled by multiple IPs.
Semantic coherence is the only reliable signal for detecting when unrelated queries accidentally merge. The 50-100ms temporal window remains our recommended approach.
Implications for server-side LLM activity monitoring
This research reinforces why Query Fan-Out Sessions matter. Unfortunately, IP addresses don’t seem to help us make query fan-out sessions cleaner.
The fact that LLMs distribute their crawling across nodes, creating multiple IP sources per session, is just another layer of abstraction we need to see through. The session, which is the bundled burst of requests, is the meaningful unit.
