Before you ask Claude what your best-selling product is, ask it whether your data can be trusted. That’s not a metaphor. Running AI on unaudited data doesn’t produce insights; it produces confident misinformation. The good news: run five checks once, before any business question gets asked, and every conversation after that is actually reliable.
Here’s the five-step pre-AI data quality audit every WooCommerce store should complete before touching their analytics with an AI tool.
Why Data Quality Matters More When AI Is Asking the Questions
There’s a dangerous version of the Claude + BigQuery setup. The store owner installs the tools, connects to their event data, and starts asking questions. Claude answers confidently. The numbers look plausible. Decisions get made. But the underlying data has gaps — duplicate events, missing fields, revenue that doesn’t reconcile — and every confident answer was built on shaky ground.
31.5% of web visitors block client-side tracking entirely (PageFair, 2023). That alone means your GA4 data is already incomplete before any technical issues enter the picture. Add in GTM misfires, theme updates that silently break hooks, and retry logic that double-fires events, and you’ve got a data set that looks complete but isn’t.
The audit isn’t optional. It’s the prerequisite. You wouldn’t ask your accountant to forecast revenue from a spreadsheet with rows missing. Don’t ask Claude to do it either.
Check 1: Revenue Reconciliation
This is the foundational check. Compare the total purchase revenue recorded in your BigQuery events table against the total completed order revenue from WooCommerce for the same date range — say, the last 90 days.
The benchmark: BigQuery revenue should be within 5% of WooCommerce orders. If it’s off by more than 5%, you have meaningful data gaps.
A 10% gap means one in ten purchases isn’t being captured. If you’re then asking Claude to find your highest-margin products or calculate true ROAS, that analysis is built on a dataset that’s missing a tenth of your business.
Common causes of revenue gaps: ad blockers intercepting client-side purchase events, GTM timing issues where the thank-you page loads before the event fires, and server-side tracking misconfigurations where purchase events don’t complete delivery to BigQuery.
Fix the gap before you ask the questions.
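To make the check concrete, here’s a minimal Python sketch of the reconciliation logic. The totals are invented; in practice you’d pull one from a SUM over purchase events in BigQuery and the other from completed orders in WooCommerce for the same 90-day window.

```python
# Illustrative revenue reconciliation check. The totals below are made up;
# WooCommerce is treated as the source of truth.

def revenue_gap_pct(bq_revenue: float, woo_revenue: float) -> float:
    """Absolute gap between the two totals, as a percentage of WooCommerce revenue."""
    if woo_revenue == 0:
        raise ValueError("WooCommerce revenue is zero; nothing to reconcile")
    return abs(woo_revenue - bq_revenue) / woo_revenue * 100

# Toy 90-day totals: BigQuery captured 91,800 of 102,000 in completed-order revenue.
bq_total = 91_800.0
woo_total = 102_000.0

gap = revenue_gap_pct(bq_total, woo_total)
print(f"gap: {gap:.1f}% -> {'PASS' if gap <= 5 else 'FAIL'}")  # gap: 10.0% -> FAIL
```

A 10% gap like the one above is exactly the scenario described: one in ten purchases never reached BigQuery, and every downstream analysis inherits that hole.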
You may be interested in: What Does a Good WooCommerce Data Stack Look Like in 2026?
Check 2: Null Field Rate
A null field is a data point that arrived empty. Product ID, user ID, UTM source — fields that are supposed to be populated but aren’t.
The benchmark: a null rate above 5% in any key field signals a tracking problem.
This check runs against each critical column in your events table: product_id, session_id, user_id, page_location, source, medium. For each, you’re looking at what percentage of events arrived with that field blank.
A 15% null rate on product_id means roughly one in seven of your product view or add-to-cart events has no product attached. Ask Claude which products drive the most consideration, and it’s working with a data set that’s missing 15% of the consideration events. The ranking it returns isn’t wrong in an obvious way. It’s wrong in a quietly misleading way.
Null rates spike after theme updates, plugin changes, or WooCommerce version upgrades that alter hook behaviour. This is why the audit should be run quarterly, not just once.
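Here’s a hedged sketch of the null-rate check as plain Python rather than SQL. The field names mirror the columns listed above; the sample events are invented.

```python
# Null-rate check: what percentage of events arrived with each key field
# missing or empty? Sample events below are illustrative, not a real schema.

def null_rates(events, fields):
    """Map each field name to the percentage of events where it is blank."""
    total = len(events)
    return {
        f: 100 * sum(1 for e in events if not e.get(f)) / total
        for f in fields
    }

events = [
    {"product_id": "sku-1", "session_id": "s1", "source": "google"},
    {"product_id": None,    "session_id": "s2", "source": "email"},
    {"product_id": "sku-3", "session_id": "s3", "source": None},
    {"product_id": "sku-4", "session_id": "s4", "source": "google"},
]

for field, rate in null_rates(events, ["product_id", "session_id", "source"]).items():
    flag = "FLAG" if rate > 5 else "ok"
    print(f"{field}: {rate:.0f}% null ({flag})")
```

In SQL this is a COUNTIF over each column divided by a total COUNT; the logic is identical.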
Check 3: Duplicate Events
Duplicate events are the silent metric inflator. They don’t announce themselves. Your dashboard shows a purchase. And then it shows it again. Duplicates silently inflate every metric they touch — conversion counts, ROAS, revenue per session, product performance.
The check: query for event records with identical event_name, timestamp, and transaction_id. Any result with a count above 1 is a duplicate.
Common sources: both client-side and server-side tracking firing without a shared deduplication key; GTM tags with retry logic sending events twice after a slow page load; and BigQuery streaming insert retries that don’t check for existing records.
If you’re running a server-side tracking setup alongside a client-side layer (a valid configuration for redundancy), deduplication logic is not optional. Without it, every purchase is counted twice. Your revenue and ROAS look double what they actually are, and your ad platform sees twice the conversion signal it should.
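The grouping logic behind the duplicate check is simple enough to sketch in a few lines. The event records here are invented; the three identifying fields match the ones named in the check.

```python
from collections import Counter

# Duplicate-event check: group on (event_name, timestamp, transaction_id)
# and report any combination seen more than once. Sample data is invented.

def find_duplicates(events):
    keys = Counter(
        (e["event_name"], e["timestamp"], e["transaction_id"]) for e in events
    )
    return {k: n for k, n in keys.items() if n > 1}

events = [
    {"event_name": "purchase", "timestamp": "2026-01-10T09:00:00Z", "transaction_id": "T100"},
    {"event_name": "purchase", "timestamp": "2026-01-10T09:00:00Z", "transaction_id": "T100"},  # retry double-fire
    {"event_name": "purchase", "timestamp": "2026-01-10T09:05:00Z", "transaction_id": "T101"},
]

print(find_duplicates(events))  # one key with a count of 2
```

In BigQuery the same idea is a GROUP BY over those three columns with a HAVING COUNT(*) > 1 filter.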
Check 4: Session Attribution
Attribution tells you where a customer came from before they bought. If session attribution is broken, you don’t know which channel is driving revenue — and every marketing budget decision you make is a guess.
The benchmark: what percentage of purchase events have a populated, valid UTM source? If more than 10% of purchases are attributing to “direct” or arriving with null attribution, there’s a problem.
Check for: UTM parameters being stripped by redirects (common with payment gateways), cross-domain tracking failures where the attribution gets lost between your main domain and a checkout subdomain, and first-click vs last-click cookie inconsistencies where the source field gets overwritten mid-session.
This check protects every AI conversation about marketing performance. If the attribution is broken, Claude’s answer to “which channel drives the most revenue?” is accurate for the data — and wrong for the business.
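The attribution benchmark reduces to one percentage: what share of purchases arrive with no usable source? A minimal sketch, with invented purchase records and the "(direct)" placeholder as an assumption about how your pipeline labels direct traffic:

```python
# Attribution check: percentage of purchases with a null, empty, or
# "(direct)" source. The sample purchases and labels are illustrative.

def unattributed_share(purchases) -> float:
    bad = sum(1 for p in purchases if p.get("source") in (None, "", "(direct)"))
    return 100 * bad / len(purchases)

purchases = [
    {"source": "google"}, {"source": "(direct)"}, {"source": "email"},
    {"source": None}, {"source": "facebook"},
]

share = unattributed_share(purchases)
print(f"{share:.0f}% unattributed -> {'FLAG' if share > 10 else 'ok'}")  # 40% unattributed -> FLAG
```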
You may be interested in: The Intelligence Layer: BigQuery + Claude as a WooCommerce Co-Pilot
Check 5: Event Completeness Across the Funnel
The final check is a funnel sanity test. Your events should follow logical ratios: page views lead to product views, product views lead to add-to-cart, add-to-cart leads to checkout initiation, checkout initiation leads to purchase.
If your checkout_initiated count is higher than your add_to_cart count, events are firing out of order. If your purchase event count is higher than your checkout_initiated count, something is double-firing at the purchase step.
Funnel inversions — where a downstream event has a higher count than an upstream one — are a clear signal of tracking logic errors. They don’t always appear in GA4 dashboards because GA4 aggregates and samples. In BigQuery, at the raw event level, they’re visible.
This check also catches ghost events: purchase events firing on page refresh, add-to-cart events triggering on wishlist actions, or product views recording from internal admin sessions.
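The funnel sanity test is a pairwise comparison down the funnel. Here’s a sketch with invented step names and counts, where the checkout count has been deliberately inflated to show an inversion being caught:

```python
# Funnel completeness check: each downstream step should have a count no
# higher than the step above it. Step names and counts are illustrative.

FUNNEL = ["page_view", "product_view", "add_to_cart", "checkout_initiated", "purchase"]

def funnel_inversions(counts):
    """Return (upstream, downstream) pairs where the downstream count
    exceeds the upstream one (a tracking-logic error)."""
    return [
        (a, b)
        for a, b in zip(FUNNEL, FUNNEL[1:])
        if counts.get(b, 0) > counts.get(a, 0)
    ]

counts = {
    "page_view": 50_000,
    "product_view": 18_000,
    "add_to_cart": 2_400,
    "checkout_initiated": 2_900,  # higher than add_to_cart: inverted
    "purchase": 900,
}

print(funnel_inversions(counts))  # [('add_to_cart', 'checkout_initiated')]
```

A clean funnel returns an empty list; any pair in the output points you straight at the step where tracking logic broke.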
Running the Audit in One Claude Session
The most efficient way to run all five checks is a single Claude session connected to your BigQuery dataset via MCP (Model Context Protocol). You’re asking Claude to query your raw event data directly — not a dashboard, not an aggregated report, the actual events table.
Give Claude access to your BigQuery dataset and ask it to run the pre-AI data quality audit. A well-configured session can complete all five checks in under 20 minutes, flag which checks failed, and show you exactly which event types or date ranges have the issues.
The output isn’t a decision. It’s a data trust score. You finish the session knowing which of your data layers you can rely on and which need fixing before you ask business questions.
Where Transmute Engine Fits In
If your audit surfaces significant gaps — revenue discrepancies above 5%, null rates above 10%, duplicate purchase events — the root cause is almost always the tracking layer, not the data storage.
Transmute Engine™ is the server-side tracking infrastructure that closes these gaps at the source. It runs first-party on your subdomain (bypassing ad blockers entirely), delivers events to BigQuery via authenticated API rather than client-side JavaScript, and includes built-in deduplication logic so purchase events are never counted twice.
Stores running Transmute Engine as their primary event pipeline typically pass the revenue reconciliation check within 1–2% — not because the data is cleaned after the fact, but because the gaps never appear in the first place.
The Audit Is a One-Time Cost With a Permanent Payoff
Run the audit once before you start using AI on your analytics data. After that, it becomes a quarterly maintenance check — 20 minutes, five queries, a data trust score that tells you whether your AI conversations are operating on solid ground.
Every AI conversation that happens after a clean audit is actually trustworthy. That’s the payoff. Not smarter questions. Not better AI tools. Just data you can actually believe — and answers that are worth acting on.
Frequently Asked Questions

What is the pre-AI data quality audit?

It’s a five-check process you run on your BigQuery event data before using AI to query it: revenue reconciliation, null field rate check, deduplication check, session attribution validation, and event completeness across the funnel. It verifies your data is accurate enough to make business decisions on.

How do I check whether my BigQuery revenue matches WooCommerce?

Run a revenue reconciliation query that compares total purchase revenue in your BigQuery events table against total completed order revenue in WooCommerce for the same date range. If the difference is more than 5%, you have meaningful data gaps that need fixing before using the data for decisions.

What null rate signals a tracking problem?

A null rate above 5% in key fields like product ID, user ID, or UTM source signals a tracking problem. If more than 5 in 100 events arrive with missing data in critical fields, the tracking implementation has gaps that need fixing.

Where do duplicate events come from?

Duplicates are common when both client-side and server-side tracking fire without a shared deduplication key. They also come from GTM retry logic, browser back-button behaviour, and BigQuery streaming insert retries. Without a dedup check, every metric built on that data is inflated.

How often should the audit run?

Run it once before you start using AI to query your data, then quarterly — or any time you make significant changes to your tracking setup. A new plugin, a theme switch, a GTM change. Tracking implementations break silently and quarterly checks catch problems before they compound.
