Scraped Data vs Streamed Data — Claude Live Artifacts Made the Gap Visible

Claude Desktop Live Artifacts can now pull scraped GA4 numbers via Apify and streamed first-party events from BigQuery into the same dashboard. When you do, the gap between what GA4 displays and what actually happened on your WooCommerce store becomes visible on a single screen. GA4 applies sampling, modelled conversions, threshold suppression, and attribution remapping before you see a number. Server-side streamed events bypass every one of those layers. Which truth the dashboard tells depends entirely on which data source you feed it.

Two Data Sources, One Screen, Different Numbers

Until Live Artifacts shipped, scraped platform numbers and streamed warehouse events almost never appeared in the same dashboard — so the gap between them stayed invisible.

GA4 says yesterday’s revenue was $48,200. Your BigQuery events table says $58,900. Both numbers are sitting in the same Claude Live Artifact, pulled from two different MCP servers, refreshed when you opened the dashboard this morning. The gap between them, about 22% when measured against the GA4 figure, is the data quality story most WooCommerce store owners have never seen, because until April 20, 2026, scraped and streamed numbers almost never shared a dashboard surface.
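The size of that gap depends on which number you treat as the baseline. A minimal Python sketch of the arithmetic, using the two revenue figures above (the function name is only illustrative):

```python
def gap_percent(scraped: float, streamed: float) -> tuple[float, float]:
    """Return the gap as a percentage of the scraped figure, then of the streamed figure."""
    diff = streamed - scraped
    return diff / scraped * 100, diff / streamed * 100


over_scraped, over_streamed = gap_percent(48_200, 58_900)
print(f"{over_scraped:.1f}% of the GA4 figure")       # 22.2% of the GA4 figure
print(f"{over_streamed:.1f}% of the BigQuery figure")  # 18.2% of the BigQuery figure
```

Put differently, BigQuery shows about 22% more revenue than GA4, or GA4 shows about 18% less than BigQuery. Whichever baseline you pick, keep it the same from day to day.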

The gap isn’t a bug. It’s a consequence of two fundamentally different data paths. One number was scraped from a platform’s UI — the version of your data that GA4 or Meta Ads chose to display after applying its own processing layers. The other number was streamed from server-side events captured at the WooCommerce hook level and written to BigQuery with every parameter intact. Same store, same day, different answers.

Claude Desktop Live Artifacts, launched on April 20, 2026, connect to data sources through MCP servers and refresh with current data every time you reopen them. They’re available on all paid Claude plans, starting with Pro at $20 per month. For the first time, a WooCommerce store owner can put the scraped number and the streamed number next to each other on a single page and ask the question that matters: which one is right?

The Apify Workaround That Made the Gap Visible

Marketers discovered they could scrape GA4 and Meta Ads into Live Artifacts via Apify — and the workaround accidentally surfaced a data quality distinction nobody had planned to examine.

Claude Cowork doesn’t have native GA4 or Meta Ads connectors. That’s the gap Annika Helendi, a marketing practitioner, documented in her May 2026 review of Live Artifacts: the tools most marketers actually use for reporting aren’t natively connected. The workaround: install the Apify connector, point a scraper at the GA4 UI or Meta Ads dashboard, and feed the scraped numbers into a Live Artifact.

Apify provides $5 of free credits monthly — enough to run hundreds of scrapes — and integrates directly with Claude Cowork as a connector. The workflow is real and it works. Helendi described the setup as taking half a day for a marketer comfortable with Cowork, with the main friction being the connector configuration rather than the artifact itself.

The Apify-scraper workaround for missing GA4 and Meta Ads connectors in Claude Cowork works mechanically but inherits every interpretive layer the platform UI applied — the scraper captures the platform’s opinion, not the raw events.

The workaround is fine for what it is. If you need to see what GA4 reports, scraping the GA4 UI gives you what GA4 reports. But the moment a store owner puts that scraped GA4 number next to a BigQuery number from the same events stream, the gap appears — and the gap demands an explanation.

What Scraped Data Inherits From the Platform

Between your WooCommerce event and the number GA4 displays, at least four processing layers run — and a scraper captures the output of all four, not the input.

Every number you see in the GA4 UI has already been processed. Understanding what that processing does is the key to understanding why scraped data and streamed data produce different numbers for the same store.

Layer one: sampling. GA4 exploration reports apply sampling when data volume exceeds processing thresholds. Observed average error rates are around 5%, but they can climb to 30% for smaller date ranges. The standard reports use 100% of the data, but the moment you apply a secondary dimension or custom filter — the moment you actually ask a useful question — sampling can engage.

Layer two: threshold suppression. When the user count for a specific dimension is small enough that individual users could potentially be identified, GA4 hides the row entirely. You don’t see a suppressed number. You see a blank. The scraper captures the blank.
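One practical consequence: whatever parses the scraped output should treat a blank cell as unknown rather than as zero. A minimal sketch in Python, assuming the scraper returns each cell as a string such as "$1,240.50" (the cell format and function name are hypothetical, not Apify's actual output):

```python
from decimal import Decimal
from typing import Optional


def parse_scraped_cell(text: str) -> Optional[Decimal]:
    """Convert a scraped report cell to a number; blank cells become None."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned == "":
        return None  # suppressed or missing: unknown, not zero
    return Decimal(cleaned)


# Illustrative cells only: two values and two blanks.
cells = ["$1,240.50", "", "  ", "$310"]
values = [parse_scraped_cell(c) for c in cells]
known_total = sum(v for v in values if v is not None)
suppressed_rows = sum(1 for v in values if v is None)
print(known_total, suppressed_rows)
```

Reporting the count of blank rows next to the total makes it visible that the total is a lower bound.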

Layer three: modelled conversions. For users who didn’t consent to tracking, GA4 uses machine learning to estimate what they probably did. It blends these modelled conversions with observed data in the same report, without distinguishing which is which. To activate modelling, a property needs at least 1,000 daily events with analytics_storage set to denied for seven or more days. Small stores often can’t meet the threshold, so their numbers are purely observed. Mid-size stores get a mix. The report doesn’t tell you the ratio.

Layer four: attribution remapping. GA4’s data-driven attribution distributes credit across touchpoints using algorithmic weighting. But it requires at least 400 conversions per key event to activate. Below that threshold, GA4 silently falls back to last-click — without telling you in the interface. Many WooCommerce stores believe they’re running data-driven attribution when they’re actually running last-click with a data-driven label.

GA4 applies at least four interpretive layers between event capture and UI display — sampling, threshold suppression, modelled conversions, and attribution remapping — and none of these are disclosed at the row level in the report.

A scraper reading the GA4 UI inherits every one of these layers. It captures the platform’s interpretation, rendered as a number, with no metadata indicating which processing steps were applied to produce it.
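Because the interface does not say which layers were active, a store can at least estimate it from its own volumes. The sketch below, in Python, encodes the two thresholds quoted above; the numbers come from this article and should be checked against current GA4 documentation, and the `PropertyStats` fields are hypothetical inputs you would fill in yourself.

```python
from dataclasses import dataclass

# Thresholds as quoted in this article (assumptions, not read from any GA4 API).
MODELLING_MIN_DENIED_EVENTS_PER_DAY = 1_000
MODELLING_MIN_DAYS = 7
DDA_MIN_CONVERSIONS_PER_KEY_EVENT = 400


@dataclass
class PropertyStats:
    """Hypothetical summary of one GA4 property's recent volume."""
    denied_events_per_day: int
    days_at_that_volume: int
    conversions_per_key_event: int


def likely_layers(stats: PropertyStats) -> dict:
    """Guess which interpretive layers are probably active for this property."""
    modelling = (
        stats.denied_events_per_day >= MODELLING_MIN_DENIED_EVENTS_PER_DAY
        and stats.days_at_that_volume >= MODELLING_MIN_DAYS
    )
    if stats.conversions_per_key_event >= DDA_MIN_CONVERSIONS_PER_KEY_EVENT:
        attribution = "data-driven"
    else:
        attribution = "last-click fallback"
    return {"modelled_conversions": modelling, "attribution": attribution}


small_store = PropertyStats(denied_events_per_day=300, days_at_that_volume=30,
                            conversions_per_key_event=120)
print(likely_layers(small_store))
# {'modelled_conversions': False, 'attribution': 'last-click fallback'}
```

This is only a rough self-check. It says nothing about sampling or threshold suppression, which depend on the specific report and query.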

What Streamed Data Preserves That Scraped Data Loses

Server-side streamed events in BigQuery contain the raw record — before any platform decided what to show you.

Property | Scraped (via Apify from GA4) | Streamed (server-side to BigQuery)
Sampling | Applied (5–30% error range) | Not applied (100% of events)
Threshold suppression | Rows hidden for privacy | No suppression (your data, your rules)
Modelled conversions | Blended with observed, ratio undisclosed | Not applicable (real events only)
Attribution model | Platform-assigned (may silently be last-click) | Raw events with full touchpoint sequence preserved
Latency | 24–48 hours for standard reports | Seconds via BigQuery Streaming Insert API
Ad-blocked visitors | Missing (GA4 tag never fired) | Captured (server-side, no browser dependency)
Consent-rejected visitors | Modelled or missing | Captured with consent status as a parameter

The BigQuery Streaming Insert API costs approximately $0.01 per 200MB of inserted data and makes events queryable within seconds of insertion. For a WooCommerce store, the streamed dataset is the canonical record — every event, every parameter, every timestamp, exactly as it happened at the server.
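To get a feel for what that rate means at store scale, here is a back-of-the-envelope estimate in Python. The event count and average event size are made-up inputs, the rate is the one quoted above, and the sketch assumes decimal megabytes and ignores any per-row minimums or free tiers, so treat the result as an order of magnitude only.

```python
PRICE_PER_UNIT_USD = 0.01   # rate quoted in this article
UNIT_BYTES = 200 * 10**6    # 200 MB, assuming decimal megabytes


def estimate_streaming_cost(events: int, avg_event_bytes: int) -> float:
    """Estimate the streaming insert cost in USD for a number of events."""
    total_bytes = events * avg_event_bytes
    return total_bytes / UNIT_BYTES * PRICE_PER_UNIT_USD


# Hypothetical store: one million events in a month at about 2 KB each.
monthly = estimate_streaming_cost(events=1_000_000, avg_event_bytes=2_000)
print(f"${monthly:.2f}")  # $0.10
```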

Linking GA4 to BigQuery provides a partial bridge: the GA4 BigQuery export bypasses sampling, thresholding, and cardinality limits. But it still only contains events that GA4 captured in the first place — which excludes ad-blocked visitors, consent-rejected sessions, and any interaction where the GA4 JavaScript failed to load. Server-side streaming captures at the WordPress hook level, before the browser is involved.
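The coverage difference can be written down as a small rule. The Python sketch below follows this article's simplified description: the GA4 export holds only events from visits where the tag loaded and consent was given, while the server-side stream holds everything. The `Visit` flags and dataset names are hypothetical labels, not real table names.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical flags describing one store visit."""
    ga4_tag_loaded: bool    # False when an ad blocker or script failure stops the tag
    consent_granted: bool   # False when the visitor rejected tracking


def datasets_containing(visit: Visit) -> set[str]:
    """Return which datasets would hold events from this visit."""
    # Server-side capture runs at the WordPress hook, independent of the browser.
    datasets = {"server_side_stream"}
    # Simplification following the article: the export only has what GA4 received.
    if visit.ga4_tag_loaded and visit.consent_granted:
        datasets.add("ga4_bigquery_export")
    return datasets


blocked = Visit(ga4_tag_loaded=False, consent_granted=True)
consenting = Visit(ga4_tag_loaded=True, consent_granted=True)
print(sorted(datasets_containing(blocked)))     # ['server_side_stream']
print(sorted(datasets_containing(consenting)))  # ['ga4_bigquery_export', 'server_side_stream']
```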

The Side-by-Side Test That Settles It

Put both numbers in the same Live Artifact and the conversation stops being theoretical.

The theoretical argument about data quality has existed for years. Every analytics professional knows that GA4 samples, models, and remaps. What Live Artifacts changed is the visibility: when both numbers appear on the same screen, refreshed from live sources, the gap becomes something a store owner can see rather than something an analyst has to explain.

The test is straightforward. Build a Claude Live Artifact with two data sources: Apify scraping GA4’s e-commerce revenue report for yesterday, and the BigQuery MCP server querying the same store’s server-side events for the same date. Display both numbers. Every morning, the artifact repaints with current data. The gap will fluctuate — some days it’s 10%, some days it’s 30% — and the fluctuation itself tells a story about which processing layers affected that day’s data more or less.
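Tracking that fluctuation takes very little logic. A minimal Python sketch, assuming you already have each source's daily revenue keyed by date; the dates and amounts below are invented for illustration, and the gap is expressed against the scraped figure to match the 22% example earlier.

```python
def daily_gaps(scraped: dict[str, float], streamed: dict[str, float]) -> dict[str, float]:
    """Gap per day, as a percentage of the scraped figure, for dates present in both sources."""
    gaps = {}
    for day in sorted(scraped.keys() & streamed.keys()):
        if scraped[day] > 0:  # skip days with no scraped revenue to avoid dividing by zero
            gaps[day] = round((streamed[day] - scraped[day]) / scraped[day] * 100, 1)
    return gaps


# Invented sample data: one source per dictionary, revenue keyed by ISO date.
scraped = {"2026-05-01": 50_000, "2026-05-02": 50_000}
streamed = {"2026-05-01": 55_000, "2026-05-02": 65_000, "2026-05-03": 61_000}

gaps = daily_gaps(scraped, streamed)
print(gaps)  # {'2026-05-01': 10.0, '2026-05-02': 30.0}
```

Days missing from either source are skipped rather than treated as zero, so a failed scrape does not show up as a 100% gap.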

Over time, the pattern becomes clear: the scraped number is consistently lower. That is not because GA4 miscounts the events it receives, but because GA4 cannot count what it never received. Ad-blocked sessions, consent-rejected visitors, ITP-expiring cookies, and mobile browser limitations all reduce GA4’s intake. Server-side capture doesn’t have those constraints.

Which Truth Do You Want the Dashboard to Tell?

The dashboard layer is free — the only question left is which data source you wire it to.

Claude Live Artifacts build from a prompt and refresh on open, at no extra cost beyond a $20 per month plan. The competitive question has moved entirely from “can I build a dashboard” to “what data does my dashboard read from.” That’s a data-layer decision, not a tool decision.

For WooCommerce stores that need to monitor what GA4 and Meta Ads report — for budget reconciliation, for ad-platform optimization, for cross-platform comparison — the Apify workaround serves its purpose. Platform numbers are useful for managing the platform.

For stores that need to know what actually happened — how many visitors came, what they did, what they bought, and what brought them there — the data source has to be the warehouse. Not the warehouse populated by a scraped re-rendering of the platform UI, but the warehouse populated by events streamed from the server at the moment they occurred.

The dashboard layer is now free — Claude Live Artifacts build from a prompt and refresh on open — so the only competitive question left is which version of your data you’re feeding it.

Transmute Engine™ streams the raw events to BigQuery — every page view, add-to-cart, checkout step, and purchase, captured at the WordPress hook level with GA4-recommended naming. When Claude Desktop reads that dataset through the BigQuery MCP server, it reads the source data, not a scraped re-rendering of it. The data layer decision is which truth you want the dashboard to tell.
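For a sense of what one such record could look like, here is a Python sketch that maps an order to a row using GA4's recommended `purchase` event name and parameters (`transaction_id`, `value`, `currency`, `items`). The order dictionary, the `consent_granted` column, and the function itself are hypothetical illustrations, not Transmute Engine's actual schema or code.

```python
import json
from datetime import datetime, timezone


def purchase_event_row(order: dict, consent_granted: bool) -> dict:
    """Build one warehouse row for a completed order (illustrative shape only)."""
    return {
        "event_name": "purchase",  # GA4 recommended event name
        "event_timestamp": datetime.now(timezone.utc).isoformat(),
        "transaction_id": str(order["id"]),
        "value": float(order["total"]),
        "currency": order["currency"],
        "items": [
            {
                "item_id": str(item["sku"]),
                "item_name": item["name"],
                "price": float(item["price"]),
                "quantity": int(item["qty"]),
            }
            for item in order["items"]
        ],
        # Kept as a column so consent can be filtered at query time.
        "consent_granted": consent_granted,
    }


# Hypothetical order, shaped for this example only.
order = {
    "id": 1042,
    "total": "59.90",
    "currency": "USD",
    "items": [{"sku": "TEE-01", "name": "T-shirt", "price": "29.95", "qty": 2}],
}
row = purchase_event_row(order, consent_granted=False)
print(json.dumps(row, indent=2))
```

The row is plain JSON-serializable data, so it could be passed to whichever BigQuery insert method your pipeline uses.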

Key Takeaways

  • Scraped data reflects the platform’s interpretation: when you scrape GA4 or Meta Ads via Apify into a Live Artifact, the dashboard displays whatever the platform chose to show after applying sampling, modelling, threshold suppression, and attribution logic.
  • Streamed data reflects what actually happened: server-side events captured at the WooCommerce hook level and written to BigQuery contain every event, every parameter, and every timestamp — no interpretive layers, no UI rounding.
  • Live Artifacts made the gap visible: before April 20, 2026, scraped and streamed data rarely appeared in the same dashboard — now they can sit side by side, and the revenue discrepancy becomes impossible to ignore.
  • The Apify workaround is valid for platform monitoring: if you need to see what GA4 reports for budget management or platform optimization, scraping the UI gives you exactly that — just know it’s the platform’s opinion, not your raw events.
  • The competitive advantage is now the data layer: with dashboards free and authoring reduced to a prompt, the only remaining moat is the quality and completeness of the warehouse feeding the artifact.

Frequently Asked Questions

What is the difference between scraped data and streamed data for a WooCommerce dashboard?

Scraped data is captured by reading a platform’s UI — for example, using Apify to pull numbers from the GA4 or Meta Ads interface. It contains whatever the platform decided to display after applying sampling, modelling, and attribution logic. Streamed data is captured at the server level as events happen and written directly to a warehouse like BigQuery. It contains the actual events before any platform interpreted them. The difference determines whether your dashboard shows a platform’s opinion or the raw truth.

Does the Apify workaround for Claude Live Artifacts produce accurate data?

It produces accurate scraped data, meaning it faithfully captures what GA4 or Meta Ads chose to display. But that displayed number already has sampling, modelled conversions, threshold suppression, and attribution logic baked in, and the scraper inherits every one of those layers. If you need the actual events as they happened, scraping cannot supply them. It works mechanically, but it doesn’t close the data quality gap.

Why do GA4 and BigQuery show different numbers for the same WooCommerce store?

The GA4 UI applies sampling for high-volume properties, suppresses rows where user counts are too small, blends modelled conversions with observed data, and remaps attribution credit algorithmically. The GA4 BigQuery export bypasses all of those layers — it contains every event GA4 captured, unsampled and unthresholded. Server-side event capture in BigQuery goes further, recording events that GA4 never saw because the browser tag was blocked or rejected.

Which data source should I feed into a Claude Live Artifact?

If you want to see what GA4 or Meta Ads reports — for comparing platform narratives or monitoring ad-platform metrics — scrape or connect those UIs. If you want to see what actually happened on your store, feed the artifact from a BigQuery dataset populated by server-side event streaming. The best setup does both: platform numbers for context, warehouse events for truth, displayed side by side so the gap is visible.

References

  • Annika Helendi, “Claude Live Artifacts Are Actually Useful for Marketing Reporting (Kinda),” annikahelendi.substack.com, May 2026
  • Anthropic, “Live Artifacts in Claude Cowork,” claude.com, April 20, 2026
  • Mauro Romanella, “GA4 Data Quality: Sampling, Thresholding and Cardinality Explained,” mauroromanella.com, 2025
  • Plausible Analytics, “Consent Mode and How GA4 Fills Missing Data with Behavioral Modeling,” plausible.io, November 2025
  • 1ClickReport, “GA4 Attribution Report 2026: How to Read It Without Getting Misled,” 1clickreport.com, February 2026
  • Google Cloud, “BigQuery Streaming Insert API Pricing,” cloud.google.com, 2025
  • Seresa, “Every WordPress to BigQuery Tool Compared,” seresa.io, February 2026
  • Seresa, “Your WooCommerce BigQuery Integration Is Missing 90% of Your Data,” seresa.io, February 2026

The dashboard is free. The question is which version of your data it reads. If you want the version that skips the platform’s opinion and starts with the events themselves, talk to Seresa about streaming your WooCommerce events to BigQuery.