A follow-up to “Introducing the Query Fan-Out Session”
The research question that seemed obvious
After publishing my research on Query Fan-Out Sessions, one question kept nagging me: “Could IP addresses help detect when two unrelated queries accidentally get bundled together?”
The intuition makes sense. If a temporal bundle contains requests from multiple IP addresses, perhaps that indicates we’ve accidentally merged requests from different users. Different IPs, different users, right?
I set out to test this hypothesis systematically. The results surprised me, and led to a deeper understanding of how LLM systems actually work under the hood.
Spoiler: IP address is useless for detecting collisions within the split-second time windows used for query fan-out session bundling. But the reason why it’s useless is fascinating.
The hypothesis we tested
Our original Query Fan-Out Session methodology uses a 50 to 100ms time window to bundle requests. This works because LLM systems dispatch web requests in parallel bursts, with 84% of request gaps ≤ 20ms.
But what about edge cases? What if two different users happen to ask questions within the same 100ms window? These “collisions” would incorrectly merge unrelated queries into a single session.
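To make the bundling step concrete, here is a minimal sketch of gap-based temporal bundling. The field names (`timestamp_ms`, `url`, `ip`) are my own, not taken from the published script, and a real implementation would first partition the log by bot provider, as the methodology specifies:

```python
from dataclasses import dataclass

@dataclass
class Request:
    timestamp_ms: float  # request time from the server log, in milliseconds
    url: str
    ip: str

def bundle_by_gap(requests, max_gap_ms=100):
    """Group requests into temporal bundles: a new bundle starts whenever
    the gap to the previous request exceeds max_gap_ms."""
    bundles = []
    for req in sorted(requests, key=lambda r: r.timestamp_ms):
        if bundles and req.timestamp_ms - bundles[-1][-1].timestamp_ms <= max_gap_ms:
            bundles[-1].append(req)
        else:
            bundles.append([req])
    return bundles
```

With 84% of gaps at 20ms or less, a 100ms threshold comfortably keeps one fan-out burst together while separating it from the next user question.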

The hypothesis: IP diversity within a bundle indicates a collision. If requests come from multiple IP addresses, they might be from different users.
To test this, I analyzed approximately 90,000 requests, which consolidated into roughly 43% as many temporal bundles.
What we measured
We computed several fingerprint metrics for each bundle:
| Metric | Definition |
|---|---|
| IP Homogeneity | Percentage of requests sharing the most common IP |
| Subnet Homogeneity | Same, but using /24 subnet prefix |
| Country Consistency | Percentage of requests from the same country |
| MIBCS | Mean Intra-Bundle Cosine Similarity (semantic coherence) |
MIBCS is our ground truth for collisions (see previous article). If URLs in a bundle are thematically unrelated (low MIBCS), that’s a collision. We then tested whether IP diversity correlates with low MIBCS.
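For reference, the fingerprint metrics above are straightforward to compute. A minimal stdlib sketch follows; note that real MIBCS uses embeddings of the URLs’ content from an embedding model, which I stand in for here with plain numeric vectors:

```python
import math
from collections import Counter
from itertools import combinations

def homogeneity(values):
    """Share of items matching the most common value (0..1)."""
    values = list(values)
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)

def ip_fingerprint(ips, countries):
    """IP, /24-subnet, and country homogeneity for one bundle."""
    return {
        "ip_homogeneity": homogeneity(ips),
        "subnet_homogeneity": homogeneity(ip.rsplit(".", 1)[0] for ip in ips),
        "country_consistency": homogeneity(countries),
    }

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mibcs(embeddings):
    """Mean Intra-Bundle Cosine Similarity over all pairs of URL
    embeddings; a single-URL bundle is trivially coherent (1.0)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```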
The surprising findings
Finding 1: Most bundles with 2+ unique URLs have multiple IPs
| Metric | Value |
|---|---|
| Bundles with single IP | 15.4% |
| Bundles with single /24 subnet | 23.0% |
| Mean unique IPs per bundle | 2.04 |
You might have expected the opposite. If each bundle represents a single user query, shouldn’t most requests come from the same IP? Instead, 84.6% of bundles contain requests from multiple IP addresses.
This was my first hint that something interesting was going on.
Finding 2: IP diversity doesn’t correlate with collisions
Here’s the key analysis. We compared IP homogeneity between “clean” bundles (high MIBCS, thematically coherent) and “collision” bundles (low MIBCS, thematically incoherent):
| Bundle Type | IP Homogeneity |
|---|---|
| Clean (MIBCS ≥ 0.5) | 43.0% |
| Collision (MIBCS < 0.5) | 42.4% |
| Difference | 0.6% |
The difference is negligible. Clean bundles and collision bundles have virtually identical IP homogeneity rates.
The correlation coefficient between MIBCS and IP homogeneity? r = 0.023. Essentially zero.
Conclusion: IP diversity cannot distinguish collisions from legitimate sessions.
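The cohort comparison itself is only a few lines of analysis. Here is a sketch, assuming each bundle has already been scored with the fingerprint metrics (the `mibcs` and `ip_homogeneity` dict keys are my naming, not from the published script):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def collision_signal_check(bundles, threshold=0.5):
    """Mean IP homogeneity per cohort, plus the MIBCS-homogeneity
    correlation. Each bundle is a dict with 'mibcs' and
    'ip_homogeneity' in [0, 1]."""
    clean = [b["ip_homogeneity"] for b in bundles if b["mibcs"] >= threshold]
    coll = [b["ip_homogeneity"] for b in bundles if b["mibcs"] < threshold]
    return {
        "clean_mean": sum(clean) / len(clean) if clean else None,
        "collision_mean": sum(coll) / len(coll) if coll else None,
        "pearson_r": pearson([b["mibcs"] for b in bundles],
                             [b["ip_homogeneity"] for b in bundles]),
    }
```

On our data this check yields the near-identical cohort means and the near-zero r reported above.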
Finding 3: Geographic diversity follows the same pattern
| Cohort | Country Consistency |
|---|---|
| High coherence (MIBCS > 0.7) | 67.9% |
| Low coherence (MIBCS < 0.5) | 59.2% |
There’s a modest 8.7 percentage point difference here, but it’s not actionable. You can’t use a 60% vs 68% metric to make binary decisions about session validity.
Why doesn’t IP work? The atomic bundle hypothesis
The data forced me to rethink what’s actually happening inside LLM infrastructure. Here’s my hypothesis, informed by both the data patterns and my experience building a distributed system for my keyword research automation SaaS.
How I think query fan-out actually works
When you ask ChatGPT “How do I prepare for my first marathon?”, the system doesn’t just search once. It:
- Decomposes your question into multiple search queries:
  - “marathon training plan beginners”
  - “marathon nutrition guide”
  - “marathon running gear essentials”
- Creates atomic bundles for each search query. An atomic bundle is a unit of work: one SERP fetch plus the subsequent webpage crawls (typically 5-10 pages).
- Distributes these bundles across worker nodes for parallel execution. Node A might handle queries 1 and 3. Node B handles query 2.
- Aggregates and deduplicates at response generation time, not before crawling.
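I can’t verify the providers’ internals, but the mechanics I’m hypothesizing look roughly like this sketch, with round-robin assignment standing in for whatever load balancer they actually use (`AtomicBundle`, `distribute`, and `aggregate` are all my own illustrative names):

```python
from dataclasses import dataclass

@dataclass
class AtomicBundle:
    query: str
    urls: list  # SERP results to crawl for this query

def distribute(bundles, n_workers):
    """Round-robin bundles across workers. Each worker has its own
    egress IP, so one user question surfaces in the server log as
    requests from several IPs."""
    assignments = [[] for _ in range(n_workers)]
    for i, bundle in enumerate(bundles):
        assignments[i % n_workers].append(bundle)
    return assignments

def aggregate(crawled_pages):
    """Deduplicate only at aggregation time, after all workers return;
    workers never coordinate, so duplicate fetches are expected."""
    seen, unique = set(), []
    for url, content in crawled_pages:
        if url not in seen:
            seen.add(url)
            unique.append((url, content))
    return unique
```

Because related queries return overlapping SERPs, two workers will often fetch the same URL independently, and only `aggregate` removes the duplicates.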
This architecture explains everything we observed:
From query fan-out to atomic bundles

Why this explains the data
Why multiple IPs per bundle? Because a single user question gets distributed across multiple worker nodes. Different nodes have different IPs.
Why doesn’t IP correlate with collisions? Because IP diversity reflects infrastructure load balancing, not user boundaries. Both clean bundles and collisions are subject to the same distribution pattern.
Why do we see duplicate URL requests across IPs? Related search queries return overlapping results. When Node A and Node B each crawl the top 10 results for related searches, they’ll fetch some of the same pages. This indicates deduplication happens downstream, not before crawling.
This isn’t speculation. I see the same pattern in my own work building a SaaS that distributes atomic task bundles across worker nodes. Each node operates independently, unaware of what other nodes are fetching. The data seems to suggest the aggregation layer handles deduplication.
What about the 15% with single IP?
You might ask: if load balancing distributes bundles across nodes, why do 15% of bundles have a single IP?
Simple explanation: small bundles. If a user asks a narrow question that generates only one search query, that’s one atomic bundle sent to one node. No distribution needed.
This aligns with our observation that single-URL bundles (which inherently have MIBCS = 1.0) skew the single-IP statistics.
Practical implications
What this means for session detection
- Temporal bundling remains correct. The 100ms window captures the full query fan-out from a single user question, regardless of how many nodes execute it.
- IP-based splitting would be harmful. Splitting by IP would incorrectly fragment single user sessions that happened to be distributed across nodes.
- MIBCS is the reliable collision signal. Semantic coherence, whether the URLs are thematically related, is what distinguishes collisions from legitimate sessions.
What this reveals about LLM infrastructure
The IP diversity pattern tells us something about how major LLM providers architect their systems:
- They prioritize speed. Distributing work across nodes enables parallel execution.
- They accept redundant crawling. It’s faster to fetch duplicates and deduplicate later than to coordinate upfront.
- They use stateless workers. Each node handles its atomic bundle independently.
This is a sensible architecture. The “inefficiency” of duplicate requests is a trade-off for lower latency. When you’re racing to answer a user question in seconds, you optimize for speed, not bandwidth.
The refined methodology
Based on this research, here’s our recommended approach to collision detection:
| Step | Method |
|---|---|
| 1. Temporal bundling | 100ms window, group by bot provider |
| 2. Collision detection | MIBCS < 0.5 indicates potential collision |
| 3. Semantic refinement | Graph-based splitting for low-coherence bundles |
IP address is explicitly not used: it provides no signal. I will soon update the GitHub script to add step 3.
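Step 3 isn’t in the published script yet, but the idea is simple: treat URLs as nodes, connect pairs whose embeddings are similar enough, and split a low-coherence bundle along the connected components. A sketch, assuming a precomputed pairwise similarity matrix; the 0.5 threshold mirrors the collision cutoff but is my choice here:

```python
def split_bundle(urls, sim, threshold=0.5):
    """Split a low-coherence bundle into sub-sessions: link URL i and j
    when sim[i][j] >= threshold, then return connected components."""
    n = len(urls)
    adj = [[j for j in range(n) if j != i and sim[i][j] >= threshold]
           for i in range(n)]
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            comp.append(urls[node])
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components
```

Each resulting component becomes its own session candidate, so a genuine collision splits cleanly into its constituent user questions.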
Summary
| Finding | Implication |
|---|---|
| 84.6% of bundles have multiple IPs | Load balancing distributes work across nodes |
| IP homogeneity: 43% (clean) vs 42% (collision) | IP cannot distinguish collision from legitimate session |
| MIBCS-IP correlation: 0.023 | No meaningful relationship |
| Duplicate URLs across IPs | Related searches return overlapping results |
My hypothesis, consistent with the research outcomes: IP diversity within a query fan-out session reflects infrastructure load balancing, not multiple users. You can’t track a user across time by IP, because the same IP might handle different users’ queries, and the same user’s query might be handled by multiple IPs.
Semantic coherence is the only reliable signal for detecting when unrelated queries accidentally merge. The 50-100ms temporal window remains our recommended approach.
Implications for server-side LLM activity monitoring
This research reinforces why Query Fan-Out Sessions matter. Unfortunately, IP addresses don’t seem to help us make query fan-out sessions cleaner.
The fact that LLMs distribute their crawling across nodes, creating multiple IP sources per session, is just another layer of abstraction we need to see through. The session, which is the bundled burst of requests, is the meaningful unit.
