If you are using the GA4 export to BigQuery, one of the most common issues you will encounter is that the traffic source data shown in GA4 does not match the session-level source data in BigQuery.
This discrepancy typically arises because event traffic source data is attributed differently in GA4 and in the BigQuery export.
The issue occurs when events and sessions from Google Ads search campaigns are mistakenly classified as organic or direct traffic. This happens when a page view event is generated by a click on a Google search ad and the page URL includes the “gclid” parameter. Even though that parameter identifies an ad click, the event is still attributed to organic or direct traffic.
This misattribution can significantly distort your results when you query event- or session-level traffic acquisition fields, such as channel grouping, source, medium, and campaign.
This issue has been known for some time, yet Google hasn’t provided a fix.
So, what can you do?
One approach is to build a workaround that improves the accuracy of your session-based traffic acquisition queries for dimensions like source/medium, campaign, and default channel grouping.
Begin by extracting traffic source information from events and aggregating it at the session level, grouping by user_pseudo_id and ga_session_id together, since a ga_session_id is only unique within a single user.
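To make the aggregation step concrete, here is a minimal Python sketch of the idea. It assumes the export rows have already been flattened into dictionaries, so that ga_session_id, page_location, source, medium, and campaign are plain keys rather than nested event parameters. The function name sessions_from_events and the "first non-empty value wins" rule are illustrative choices, not something GA4 or BigQuery prescribes.

```python
def sessions_from_events(events):
    """Collapse flattened event rows into one record per session.

    Assumption: each event is a dict with user_pseudo_id, ga_session_id,
    event_timestamp, and optional source / medium / campaign / page_location.
    """
    sessions = {}
    # Process events in time order so the earliest traffic source wins.
    for ev in sorted(events, key=lambda e: e["event_timestamp"]):
        if ev.get("ga_session_id") is None:
            continue  # events without a session id cannot be grouped
        # A session is identified by the user AND the session id together.
        key = (ev["user_pseudo_id"], ev["ga_session_id"])
        session = sessions.setdefault(
            key,
            {"source": None, "medium": None, "campaign": None, "page_locations": []},
        )
        for field in ("source", "medium", "campaign"):
            # Keep the first non-empty value seen in the session (illustrative rule).
            if session[field] is None and ev.get(field):
                session[field] = ev[field]
        if ev.get("page_location"):
            session["page_locations"].append(ev["page_location"])
    return sessions
```

Keeping every page_location on the session record is deliberate: it lets a later step look for a click identifier anywhere in the session, not only on the first event.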
Identify events originating from paid search clicks by checking whether the ‘gclid’ parameter is present in the page_location event parameter, and store the result in a flag such as ‘has_gclid’. Once you have this flag, you can apply a fix that overwrites the source, medium, and campaign fields whenever ‘has_gclid’ is true.
This ensures that sessions previously misattributed to organic search are correctly assigned to paid search.
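The detection and overwrite steps might look like the following sketch. It uses Python's standard urllib.parse to look for gclid in the query string. The override values google / cpc and the campaign placeholder "(cpc)" are assumptions chosen for illustration, so substitute whatever labels your own reporting uses. The session dictionary shape (a page_locations list plus source, medium, and campaign keys) is also hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed replacement values for sessions that carry a gclid.
PAID_SEARCH_OVERRIDE = {"source": "google", "medium": "cpc", "campaign": "(cpc)"}


def has_gclid(page_location):
    """Return True if the URL's query string contains a non-empty gclid."""
    query = urlsplit(page_location or "").query
    # parse_qs drops parameters with blank values by default.
    return "gclid" in parse_qs(query)


def fix_attribution(session):
    """Return a copy of the session with paid search attribution applied."""
    fixed = dict(session)
    fixed["has_gclid"] = any(
        has_gclid(url) for url in session.get("page_locations", [])
    )
    if fixed["has_gclid"]:
        # Overwrite source, medium, and campaign only when a gclid was seen.
        fixed.update(PAID_SEARCH_OVERRIDE)
    return fixed
```

Parsing the query string rather than searching the raw URL for the text "gclid" avoids false positives, for example when the string appears in a path or in another parameter's value.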
The fixed attribution can now be used in various queries. You can count sessions by source/medium, and if necessary, further analyze campaigns and default channel grouping to gain strategic insights from your GA4 data.
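As a rough illustration of the reporting step, the sketch below counts sessions by source/medium from a list of already corrected session records. The record shape and the function name count_sessions_by_source_medium are hypothetical, and the fallback labels "(direct)" and "(none)" are assumed here to mirror how GA4 usually labels sessions with no source information.

```python
from collections import Counter


def count_sessions_by_source_medium(sessions):
    """Count sessions per 'source / medium' label, most frequent first.

    Assumption: each session is a dict with optional source and medium keys.
    """
    counts = Counter()
    for session in sessions:
        # Fall back to direct-traffic labels when a value is missing.
        source = session.get("source") or "(direct)"
        medium = session.get("medium") or "(none)"
        counts[f"{source} / {medium}"] += 1
    return counts.most_common()
```

The same pattern extends to campaign or channel grouping by changing the key that is counted.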
This approach helps you work around the misattribution issue, providing more accurate insights into your data.