You ran Lighthouse and got a 95 performance score. Your CrUX data says you fail LCP. You ran Lighthouse again and got a 72 on a page that passes in the field. Both discrepancies are normal and predictable once you understand that Lighthouse and CrUX measure fundamentally different things under fundamentally different conditions. Lighthouse simulates a single page load on your machine with artificial throttling. CrUX aggregates thousands of real page loads across diverse devices, networks, and browser states. The two will never agree exactly, and expecting them to is the first mistake in a performance optimization workflow.
Simulated Throttling Versus Real-World Network and CPU Constraints
Lighthouse applies CPU and network throttling to simulate a mid-tier mobile device on a 4G connection. By default it uses simulated throttling (the Lantern model): the page loads under the machine's actual conditions and Lighthouse estimates what the metrics would have been under the throttled profile. An optional mode applies the throttling for real through the Chrome DevTools Protocol. In either case the target profile is the same: CPU execution slowed by a multiplier (typically 4x), and network bandwidth constrained to approximately 1.6 Mbps download with 150ms round-trip latency. The simulation runs on the developer's hardware, meaning the underlying CPU, memory, and GPU resources are those of a modern desktop or laptop machine, merely throttled to approximate slower execution.
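For reference, the default mobile profile can be written down as the `throttling` object Lighthouse's Node API accepts. The values below match Lighthouse's published defaults at the time of writing; treat the exact numbers as version-dependent.

```javascript
// Lighthouse's default mobile throttling profile, expressed as the
// `throttling` config object accepted by Lighthouse's Node API.
const mobileThrottling = {
  rttMs: 150,               // simulated round-trip time
  throughputKbps: 1638.4,   // download bandwidth
  cpuSlowdownMultiplier: 4, // 4x CPU slowdown
};

// Sanity check: 1638.4 Kbps is the "~1.6 Mbps" quoted above.
const downloadMbps = mobileThrottling.throughputKbps / 1024;
console.log(downloadMbps.toFixed(1)); // "1.6"
```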
Real-world mobile devices operate under fundamentally different constraints. Hardware-level thermal throttling reduces CPU frequency when the device heats up during sustained processing — a common occurrence on mid-tier Android phones running complex JavaScript. Limited RAM (2-4GB on devices in the CrUX 75th percentile population) causes browser tab reloads and memory pressure that slows rendering. GPU memory limits affect image decoding pipeline throughput. Cellular network conditions fluctuate within a single page load as the device moves through different signal strength zones.
Simulated throttling approximates an average case but cannot model the tail behaviors that CrUX captures. CrUX reports the 75th percentile, meaning 25% of user experiences are worse than the reported value. Sitting that deep in the distribution, the p75 reflects unlucky combinations of slow device, poor network, background app interference, and thermal throttling: conditions that software throttling on a developer machine cannot replicate.
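To make the percentile arithmetic concrete, here is a minimal nearest-rank percentile over a set of invented LCP samples. This is a sketch of the idea, not CrUX's exact aggregation method.

```javascript
// Value at a given percentile of a sample, nearest-rank style.
// CrUX reports p75 over real user loads; a single Lighthouse run is
// closer to one draw from an approximation of this distribution.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical LCP samples in ms: most loads are fast, the tail is not.
const lcpSamples = [1200, 1300, 1400, 1500, 1600, 1800, 2100, 2600, 3400, 5200];
console.log(percentile(lcpSamples, 50)); // 1600: the "typical" load
console.log(percentile(lcpSamples, 75)); // 2600: what CrUX reports
```

Note how the p75 (2600ms, failing the LCP threshold) sits far above the median a lab run is likely to resemble.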
The delta between simulated and real throttling is most significant for JavaScript-heavy pages. Lighthouse’s CPU throttle multiplier applies uniformly, while real device throttling is non-linear: a phone that starts rendering quickly may dramatically slow down as the CPU heats during JavaScript execution, producing a performance degradation curve that a constant multiplier cannot model.
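The difference can be sketched numerically. Under a constant multiplier every task slows by the same factor; under a thermal ramp (the ramp rate below is purely illustrative, not a real device model), the slowdown grows as cumulative CPU work heats the device.

```javascript
// Wall-clock time for a sequence of JS tasks (baseline ms each)
// under a constant slowdown multiplier, as Lighthouse applies.
function constantThrottle(tasks, multiplier) {
  return tasks.reduce((sum, t) => sum + t * multiplier, 0);
}

// Toy thermal model: the multiplier ramps up as cumulative baseline
// work "heats" the device. Ramp rate is invented for illustration.
function thermalThrottle(tasks, baseMultiplier, rampPerMs) {
  let elapsedBaseline = 0;
  let total = 0;
  for (const t of tasks) {
    const multiplier = baseMultiplier + elapsedBaseline * rampPerMs;
    total += t * multiplier;
    elapsedBaseline += t;
  }
  return total;
}

const tasks = [200, 200, 200, 200, 200]; // 1s of baseline JS work
console.log(constantThrottle(tasks, 4));       // 4000
console.log(thermalThrottle(tasks, 4, 0.005)); // well past 4000: later tasks run hotter
```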
Warm Cache and Connection Reuse in Lab Environments
Lighthouse initiates each test with a cold browser cache (no cached resources from prior visits) but runs on a machine with pre-established network conditions. DNS resolution for the test domain may be cached at the operating system level from prior tests. TCP connections may benefit from connection pooling at the OS or proxy layer. TLS session tickets from prior connections may enable abbreviated handshakes.
Real users, particularly on mobile, frequently arrive with a fully cold network stack. After switching from WiFi to cellular, DNS caches are often cleared, new TCP connections must be established, and TLS negotiations proceed from scratch. These cold-start costs add 100-500ms to real-world time to first byte (TTFB) that Lighthouse does not consistently incur.
Conversely, Lighthouse does not model the performance benefit of browser cache on repeat visits. A returning user with cached CSS, JavaScript, and images experiences dramatically faster LCP than a first-time visitor. CrUX captures both first-visit and return-visit experiences in its distribution, and the proportion of return visitors affects the 75th percentile. A site with 70% returning visitors (most with warm caches) may show better CrUX LCP than Lighthouse predicts, because Lighthouse always simulates a first-visit cold-cache scenario.
The cache asymmetry means Lighthouse is pessimistic about return-visit performance and potentially optimistic about first-visit network performance. The direction of the lab-field discrepancy depends on the site’s visitor return rate and the network conditions of its actual user population.
Single-Page Scope Versus Full-Session Measurement
Lighthouse measures a single page load event. It does not scroll, click, navigate, or interact with the page after the initial load. This single-event scope creates systematic measurement differences for each Core Web Vitals (CWV) metric:
CLS in Lighthouse captures only layout shifts that occur during the initial page load and stabilization period. Shifts triggered by user scroll (lazy-loaded images without dimension reservation, sticky headers that reposition on scroll, infinite-scroll content injection) are invisible to Lighthouse. CrUX measures CLS across the entire page session, including all scroll-triggered and interaction-triggered shifts. This difference explains why Lighthouse frequently reports lower CLS than CrUX for pages with scroll-dependent dynamic content.
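The field definition of CLS makes the gap mechanical: shifts are grouped into "session windows" (shifts less than 1s apart, each window capped at 5s) and the largest window's sum is reported. The sketch below implements that grouping in simplified form (input-adjacent shifts are assumed to be excluded upstream; shift times and values are invented).

```javascript
// CLS as the field metric defines it: group layout shifts into session
// windows (gap between shifts < 1s, window capped at 5s) and report the
// largest window's summed score.
function clsFromShifts(shifts) {
  // shifts: [{ time: ms since navigation, value: shift score }]
  let maxWindow = 0;
  let windowSum = 0;
  let windowStart = -Infinity;
  let prevTime = -Infinity;
  for (const { time, value } of shifts) {
    if (time - prevTime >= 1000 || time - windowStart >= 5000) {
      windowSum = 0; // start a new session window
      windowStart = time;
    }
    windowSum += value;
    prevTime = time;
    maxWindow = Math.max(maxWindow, windowSum);
  }
  return maxWindow;
}

// Load-time shifts (visible to Lighthouse) plus a scroll-triggered burst
// later in the session (visible only to field measurement).
const loadShifts = [{ time: 300, value: 0.02 }, { time: 900, value: 0.03 }];
const scrollShifts = [{ time: 12000, value: 0.12 }, { time: 12400, value: 0.08 }];
console.log(clsFromShifts(loadShifts).toFixed(2));                       // "0.05": passes
console.log(clsFromShifts([...loadShifts, ...scrollShifts]).toFixed(2)); // "0.20": fails
```

The same page passes the 0.1 threshold in the lab and fails it in the field, purely because the scroll-triggered window exists only in real sessions.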
INP is not measured by Lighthouse at all. Lighthouse reports Total Blocking Time (TBT) as a lab proxy for interactivity, measuring the total milliseconds of main-thread blocking during page load. TBT and INP measure different phenomena: TBT captures load-time blocking, while INP captures the responsiveness to actual user interactions throughout the session. A page with low TBT (fast initial load) can have high INP (slow response to button clicks or form interactions after load). Optimizing for TBT to improve the Lighthouse score may not improve INP in the field.
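The TBT definition makes the mismatch with INP concrete: only main-thread work between First Contentful Paint and Time to Interactive counts, and only the portion of each long task beyond 50ms. This is a simplified sketch (real Lighthouse clips tasks that partially overlap the window; task timings are invented).

```javascript
// Total Blocking Time: for each long task between FCP and TTI, the
// portion of its duration beyond 50ms counts as blocking. Times in ms.
function totalBlockingTime(longTasks, fcp, tti) {
  return longTasks
    .filter((t) => t.start >= fcp && t.start + t.duration <= tti)
    .reduce((sum, t) => sum + Math.max(0, t.duration - 50), 0);
}

const loadTasks = [
  { start: 600, duration: 120 }, // 70ms of blocking
  { start: 1400, duration: 90 }, // 40ms of blocking
];
console.log(totalBlockingTime(loadTasks, 500, 3000)); // 110
```

A click handler that blocks the main thread for 400ms thirty seconds after load contributes nothing to this number, yet it would dominate the page's field INP.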
LCP in Lighthouse captures the LCP element as determined by the lab rendering conditions. The LCP element’s identity can vary by viewport size (Lighthouse uses a fixed viewport, real users use varying viewport sizes), by lazy-loading trigger thresholds (Lighthouse does not scroll, so lazy-loaded elements that become LCP candidates on scroll are missed), and by A/B test variant (Lighthouse sees a single variant, field data captures all variants). If Lighthouse identifies a different element as LCP than real users encounter, the two measurements are comparing different things entirely.
Third-Party Script Variability
Third-party scripts — advertising, analytics, A/B testing, chat widgets, consent management platforms — are the single largest category of lab-field discrepancy for sites with significant external script dependencies.
Several mechanisms cause third-party scripts to behave differently in lab versus field:
- Bot detection: many ad-tech and analytics scripts detect headless browser environments or automated testing conditions and serve lighter responses, no-op stubs, or no content at all. Lighthouse may not trigger real ad auctions, real A/B test variant assignment, or real chat widget initialization. The performance impact of these scripts in Lighthouse is understated.
- Geographic and behavioral targeting: ad scripts serve different creative sizes and quantities based on user location, browsing history, and audience segments. Lighthouse requests from a data center IP may receive different (typically lighter) ad responses than a real user request from a residential IP.
- Consent-dependent loading: in GDPR-regulated markets, third-party scripts fire only after user consent. Lighthouse does not interact with consent banners, so consent-dependent scripts never load in lab testing. Real users who accept consent trigger the full third-party script cascade.
- Time-of-day and auction variability: ad script performance varies by time of day (peak hours produce slower ad auctions due to higher demand). Lighthouse tests at a single point in time and misses this temporal variability.
The practical impact: a page that loads six ad scripts, an A/B testing platform, a chat widget, and a consent management platform in the field may load none of these in Lighthouse. The lab score reflects a clean, script-free page. The field score reflects the full third-party payload.
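Consent-dependent loading in particular is usually just a guard in front of the script injector, so a lab run that never grants consent never reaches the expensive branch. A minimal sketch of that gating logic (all script names and categories are hypothetical):

```javascript
// Hypothetical third-party manifest: each entry declares the consent
// category it requires. A Lighthouse run grants no consent, so only
// consent-free entries load; an "accept all" user loads everything.
const thirdPartyScripts = [
  { name: "consent-banner", category: "none" },
  { name: "analytics", category: "analytics" },
  { name: "ad-loader", category: "advertising" },
  { name: "chat-widget", category: "functional" },
];

function scriptsToLoad(grantedCategories) {
  return thirdPartyScripts
    .filter((s) => s.category === "none" || grantedCategories.includes(s.category))
    .map((s) => s.name);
}

console.log(scriptsToLoad([]));                                        // lab-like run: 1 script
console.log(scriptsToLoad(["analytics", "advertising", "functional"])); // accept-all user: 4 scripts
```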
When Lab Overestimates Performance (Field Is Worse)
Lab overestimates real-world performance when:
- The test machine’s throttled CPU is still faster than real low-end mobile devices at the 75th percentile.
- Third-party scripts are absent or lighter in lab than in production field conditions.
- The test does not trigger scroll-dependent or interaction-dependent performance degradation (CLS from scroll, INP from interactions).
- The single-test sample captures an optimistic scenario that does not represent the distribution tail.
- DNS and connection establishment are faster in the lab environment than for first-visit mobile users.
This is the more dangerous direction because it creates false confidence. Teams that achieve high Lighthouse scores may believe CWV are passing, only to discover CrUX failures that have been affecting rankings for weeks.
When Lab Underestimates Performance (Field Is Better)
Lab underestimates real-world performance when:
- The Lighthouse throttling profile is more aggressive than the actual user population’s median connection quality. A site primarily visited by users on fast WiFi connections has field performance better than the simulated 4G profile.
- Most real users are return visitors with warm browser caches. The cold-cache lab test is pessimistic relative to the cached field experience.
- The test location has higher network latency to the server than the majority of real users experience (e.g., testing from a US location for a site with primarily EU users served by an EU CDN edge).
- The Lighthouse test captures a transient performance anomaly (server spike, CDN cache miss) that does not represent typical conditions.
This direction is less dangerous for ranking purposes (CrUX passing despite lab failures) but can mislead teams into deprioritizing optimizations that the lab suggests are needed.
Does Lighthouse test with third-party cookies enabled or disabled?
Lighthouse runs in a clean browser profile without any cookies, including third-party cookies. Real users may have consent management cookies, analytics cookies, and tracking cookies that trigger additional JavaScript execution on page load. This clean-profile testing means Lighthouse misses the performance overhead that cookie-dependent scripts add in real browsing sessions, contributing to lab-field divergence.
Can running Lighthouse on different machines produce significantly different scores for the same page?
Yes. Lighthouse scores depend on the testing machine’s CPU speed, available memory, network conditions, and background processes. A developer running Lighthouse on a high-spec workstation may see scores 10-20 points higher than the same test run on a CI server under load. Using Lighthouse CI with consistent hardware specifications and multiple run averaging reduces this variability.
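The standard mitigation is aggregating repeated runs rather than trusting a single draw; Lighthouse CI exposes this via its `numberOfRuns` collect option, and the underlying aggregation is just a median (run scores below are invented):

```javascript
// Median performance score across repeated Lighthouse runs.
// An odd run count avoids averaging two middle values.
function medianScore(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Five runs of the same page on the same machine can still vary widely.
console.log(medianScore([72, 88, 91, 85, 79])); // 85
```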
Does Lighthouse measure INP?
No. Lighthouse is a lab tool that loads a page without user interaction, so it cannot measure Interaction to Next Paint, which requires actual user input events. Lighthouse measures Total Blocking Time (TBT) as a lab proxy for interactivity, but TBT and INP measure different things. TBT captures main-thread blocking during load, while INP captures responsiveness to actual interactions throughout the session.