Impact of iOS 17+ Updates on Digital Advertising and the Importance of UTM Parameters
Published Thu, 29 Feb 2024 (https://webdatageek.com/impact-of-ios-17-updates-on-digital-advertising-and-the-importance-of-utm-parameters/)

The post Impact of iOS 17+ Updates on Digital Advertising and the Importance of UTM Parameters appeared first on Web Data Geek.

In the digital advertising landscape, the iOS 17+ updates have introduced significant changes, particularly in how traffic from Google and Meta Ads is tracked and reported. Apple's privacy-focused 'Link Tracking Protection' feature, introduced with iOS 17, has created a new challenge for digital marketers who rely on GA4 reports for analytics and insights.

Understanding ‘Link Tracking Protection’

Apple's iOS 17+ updates have rolled out a feature known as 'Link Tracking Protection'. It removes the 'gclid' (Google Click Identifier) and 'fbclid' (Facebook Click Identifier) tracking parameters from URLs when a user browses in Safari Private Browsing mode or opens links in Apple's native Mail or Messages apps. Analytics tools rely on these parameters to attribute a visit to a specific ad click, so removing them can lead to misreported traffic sources.
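To make the effect concrete, here is a small Python simulation of what the paragraph above describes: the click identifiers disappear from the landing-page URL, while other query parameters stay. It is only a sketch under stated assumptions. It models just gclid and fbclid, and the example URL is hypothetical. It does not reproduce Apple's implementation or its full list of stripped parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only the two click IDs discussed in this post are modelled.
# This simulates the described effect; it is NOT Apple's implementation.
CLICK_ID_PARAMS = {"gclid", "fbclid"}


def strip_click_ids(url: str) -> str:
    """Return `url` with gclid/fbclid removed and every other parameter kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in CLICK_ID_PARAMS
    ]
    # urlencode re-encodes the surviving parameters in their original order.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Example (hypothetical URL): the ad click ID disappears, "color" survives.
print(strip_click_ids("https://example.com/shoes?gclid=abc123&color=red"))
# https://example.com/shoes?color=red
```

Without the gclid, GA4 has nothing left on the URL that ties the visit to the ad click. That is why such visits can surface as direct or organic traffic.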

Consequences for Google and Meta Ads

Without these identifiers, traffic originating from Google and Meta ads may be inaccurately categorized as direct or organic in GA4 reports. This misclassification is particularly troubling for Google Ads advertisers. While Meta advertisers might not feel a significant impact due to the prevalence of the native Facebook app among users, the updates pose a direct challenge to Google’s ability to track ad performance.

The Google Ads Dilemma

For Google Ads, these updates effectively break auto-tagging when users click ads in Safari Private Browsing mode. Auto-tagging works by appending the gclid to the landing-page URL, and that is exactly the parameter being removed. Google relies on auto-tagging to collect the detailed click data it uses to optimize ad performance. Losing that data means losing critical insights into user behavior and ad effectiveness.

The Solution: UTM Parameters and Server-Side Tagging

To counteract these tracking limitations, digital marketers must adapt. One effective strategy is the use of UTM parameters in ad URLs. Unlike ‘gclid’ and ‘fbclid’, iOS 17+ updates do not strip away UTM parameters, making them a reliable alternative for tracking ad performance in GA4 reports.
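As a minimal sketch of what "using UTM parameters in ad URLs" means in practice, the hypothetical Python helper below appends utm_source, utm_medium and utm_campaign to a landing-page URL. It keeps any existing query parameters and replaces old UTM values. The values shown (google / cpc / spring_sale) are placeholders; use whatever naming convention your GA4 reporting expects.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def add_utm_params(url: str, source: str, medium: str, campaign: str) -> str:
    """Hypothetical helper: tag a landing-page URL with UTM parameters.

    Non-UTM query parameters are kept in order; existing UTM values are replaced.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in UTM_KEYS
    ]
    kept += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Placeholder campaign values for illustration only.
print(add_utm_params("https://example.com/shoes?color=red", "google", "cpc", "spring_sale"))
# https://example.com/shoes?color=red&utm_source=google&utm_medium=cpc&utm_campaign=spring_sale
```

Because the tags are ordinary query parameters rather than click identifiers, the visit still carries its source, medium and campaign when the gclid or fbclid is removed.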

Server-side tagging is also worth considering. It mitigates tracking limitations imposed by browser and device restrictions and provides a more controlled and reliable data collection mechanism.

Navigating the New Landscape

The iOS 17+ updates underscore the evolving nature of digital advertising and the importance of flexibility in marketing strategies. Advertisers and marketers must stay informed about these changes and adapt their tactics accordingly to ensure accurate tracking and reporting. By leveraging UTM parameters and exploring server-side tagging, businesses can continue to gain valuable insights from their digital advertising efforts, despite the challenges posed by the latest iOS updates.

GA4 and BigQuery Introduction
Published Fri, 24 Mar 2023 (https://webdatageek.com/introduction-to-ga4-and-bigquery/)

The post GA4 and BigQuery Introduction appeared first on Web Data Geek.

Discover the power of GA4 and BigQuery integration to unlock advanced analytics capabilities. Learn how combining Google Analytics 4 with BigQuery allows for in-depth data analysis, custom reporting, and data-driven decision-making.

Google has introduced a groundbreaking new approach to measuring app and web analytics with Google Analytics 4 (GA4). At the time of writing, most of us are dreading the move from Universal Analytics to GA4, but it marks a significant shift for both app and web analytics.

It's not all bad news, though: one of the most exciting features in GA4 is its BigQuery integration. Every GA4 property owner can now enable data export to BigQuery and work with the raw event data collected from their websites and apps.

In the previous version of Google Analytics (Universal Analytics), this integration was exclusive to GA360 enterprise properties. GA4 makes the export available to everyone for free. You pay only for storage and querying beyond Google Cloud's free tier, which covers 1 TB of querying per month and 10 GB of storage.

You can also use the BigQuery sandbox without a credit card, but be aware that tables in the sandbox expire after 60 days by default.

Why should you enable BigQuery linking for GA4?

Some reasons include:

  • Store your data in BigQuery (Google Cloud) and/or send it to a data warehouse on another platform, such as AWS, Azure, or Snowflake
  • Combine and enrich your data with other marketing/CRM/contextual data
  • Visualize your data using tools like Data Studio (now Looker Studio), Tableau, Looker, or Power BI
  • Perform advanced data analysis
  • Use your data as input for (machine learning) models

Don’t waste any time; start sending data immediately, as there’s no backfill for historical data already collected in GA4.

Follow these steps to link Google Analytics 4 to BigQuery


To set up GA4 and BigQuery integration from scratch, follow these step-by-step instructions:

Set up a Google Cloud Platform (GCP) account:

  1. Go to https://console.cloud.google.com/ and sign in with your Google account.
  2. Create a new project: click the project selector dropdown at the top of the console, then click "New Project."
  3. Enter a project name, select a billing account, and click “Create.”

Enable BigQuery API:

  1. From the GCP console, click the hamburger menu in the top-left corner and select “APIs & Services” > “Library.”
  2. Search for “BigQuery API” and click on the result.
  3. Click “Enable” to activate the BigQuery API for your project.

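Set up a BigQuery dataset (optional: the GA4 export writes to its own dataset, named analytics_<property_id>, which GA4 creates when you link the property):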

  1. In the GCP console, navigate to “BigQuery” from the left-hand menu.
  2. Click on your project name in the left sidebar, then click “Create Dataset.”
  3. Enter a dataset ID (e.g., “ga4_data”), select a data location, and configure other settings as needed. Click “Create dataset.”
[Screenshot: Create Dataset in BigQuery]

Set up Google Analytics 4 (GA4) property:

  1. Go to https://analytics.google.com/ and sign in with your Google account.
  2. If you don’t have a GA4 property yet, create one by following the on-screen instructions.
  3. Once your GA4 property is created, navigate to the property’s “Admin” panel by clicking the gear icon in the bottom-left corner.
[Screenshot: GA4 account setup]

Link GA4 to BigQuery:

  1. In the GA4 "Admin" panel, go to "Product links" and click "BigQuery links," then click "Link."
  2. Click "Choose a BigQuery project," select the GCP project you created earlier, and click "Confirm."
  3. Select a data location and click "Next."
  4. Choose which data streams (e.g., web or app) to export and the export frequency ("Daily" and/or "Streaming"), then click "Next."
  5. Review the linking settings and click "Submit" to start the integration.
[Screenshot: Link BigQuery in GA4]

Check the data in BigQuery:

After linking GA4 to BigQuery, it can take up to 24 hours for data to appear in BigQuery. Once the data starts flowing, you can query it in the BigQuery console:

  • Go to the GCP console and navigate to “BigQuery” from the left-hand menu.
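  • Locate the export dataset in the left sidebar (GA4 names it analytics_<property_id>) and click it to view the tables: events_YYYYMMDD for the daily export and events_intraday_YYYYMMDD for the streaming export.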
  • Click on a table to preview the data or click “Query Table” to write and execute SQL queries.
[Screenshot: Check data in BigQuery]
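If you want to check which daily tables should exist for a date range, a small helper can list the expected names. This is a sketch that assumes the daily export's events_YYYYMMDD naming. The helper name is hypothetical. The same YYYYMMDD strings are what _table_suffix is compared against when you query the events_* wildcard table.

```python
from datetime import date, timedelta
from typing import List


def daily_export_tables(start: date, end: date) -> List[str]:
    """Hypothetical helper: expected daily GA4 export table names, start..end inclusive.

    Assumes the daily export naming events_YYYYMMDD; streaming export tables
    (events_intraday_YYYYMMDD) are not included.
    """
    n_days = (end - start).days
    days = [start + timedelta(days=i) for i in range(n_days + 1)]
    return ["events_" + d.strftime("%Y%m%d") for d in days]


print(daily_export_tables(date(2024, 2, 28), date(2024, 3, 1)))
# ['events_20240228', 'events_20240229', 'events_20240301']
```

Comparing this list with the tables shown in the BigQuery sidebar is a quick way to spot days the export missed.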

Why should you set up GA4 and BigQuery now?

Remember that GA4 to BigQuery integration doesn’t provide historical data backfill. Once the integration is set up, only new data will be exported to BigQuery.
Now that you’ve set up GA4 to BigQuery integration, you can start analyzing your raw GA4 data, join it with other datasets, create visualizations with tools like Data Studio, and apply advanced analytics or machine learning models to your data.

We hope this has given you a good introduction to GA4 and BigQuery, and to why you should set up the GA4-to-BigQuery export now.

Querying GA4 Session Last Non-direct Traffic Source in BigQuery
Published Fri, 24 Mar 2023 (https://webdatageek.com/querying-ga4-session-last-non-direct-traffic-source-in-bigquery/)

The post Querying GA4 Session Last Non-direct Traffic Source in BigQuery appeared first on Web Data Geek.

Hello! I recently found a solution to one of GA4 BigQuery export’s limitations – the lack of session-level traffic source data. I will share how I recreated GA4 session traffic source dimensions using BigQuery export event data, resulting in a lookup table containing the last non-direct source of traffic for each unique session id.

The Basics: Traffic Source Data in BigQuery

The GA4 export can be confusing, especially its traffic_source fields. Remember that traffic_source holds the user-level, first-touch traffic source (the source that first acquired the user), not the session's traffic source. Individual events in BigQuery can instead carry event parameters such as source, medium, and campaign, nested within the event's event_params field.
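The distinction is easier to see with a concrete row. Below is a Python sketch with a made-up event shaped like the export: event_params is a list of key/value records, and traffic_source is the user-level record. It also includes a hypothetical get_param helper that mirrors the SQL pattern (SELECT value.string_value FROM UNNEST(event_params) WHERE key = '...') used later in this post. All values are invented for illustration.

```python
from typing import Any, Dict, Optional

# A made-up event shaped like a GA4 BigQuery export row (values are invented).
SAMPLE_EVENT: Dict[str, Any] = {
    "event_name": "page_view",
    # User-level FIRST-touch source, not this session's source.
    "traffic_source": {"source": "newsletter", "medium": "email", "name": "launch"},
    # Event-level parameters: repeated key / value records.
    "event_params": [
        {"key": "ga_session_id", "value": {"int_value": 1711300000}},
        {"key": "source", "value": {"string_value": "google"}},
        {"key": "medium", "value": {"string_value": "cpc"}},
    ],
}


def get_param(event: Dict[str, Any], key: str, value_type: str = "string_value") -> Optional[Any]:
    """Hypothetical helper mirroring
    (SELECT value.<value_type> FROM UNNEST(event_params) WHERE key = <key>).
    Returns None when the parameter is absent."""
    for param in event.get("event_params", []):
        if param["key"] == key:
            return param["value"].get(value_type)
    return None


print(get_param(SAMPLE_EVENT, "source"))                # google  (event-level)
print(SAMPLE_EVENT["traffic_source"]["source"])         # newsletter  (user first touch)
print(get_param(SAMPLE_EVENT, "ga_session_id", "int_value"))  # 1711300000
```

The two "sources" in the same row disagree. That is exactly why traffic_source cannot be used as the session's source.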

Defining Session Traffic Source in GA4

In Universal Analytics, traffic source dimensions were straightforward. However, GA4 is different – a change in the source of traffic no longer triggers a new session. GA4 reports the session source and medium by tracking the first event’s traffic source as the source for the entire session, excluding automatically generated first_visit and session_start events.
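The session-level rule can be sketched in Python over a simplified, assumed event shape (flat dicts with event_name, event_timestamp and optional source/medium/campaign). This sketch reads "the first event's traffic source" as the first event in the session, by timestamp, that carries any of source, medium or campaign, after skipping first_visit and session_start. That reading is an assumption, not GA4's documented algorithm.

```python
from typing import Any, Dict, List, Optional, Tuple

Source = Tuple[Optional[str], Optional[str], Optional[str]]  # (source, medium, campaign)

# Automatically generated events excluded when picking the session source.
IGNORED_EVENTS = {"session_start", "first_visit"}


def session_traffic_source(events: List[Dict[str, Any]]) -> Optional[Source]:
    """Return (source, medium, campaign) of the first qualifying event in one
    session that carries any traffic source value, or None if none does."""
    for event in sorted(events, key=lambda e: e["event_timestamp"]):
        if event["event_name"] in IGNORED_EVENTS:
            continue
        triple = (event.get("source"), event.get("medium"), event.get("campaign"))
        if any(value is not None for value in triple):
            return triple
    return None


# Invented session: session_start is skipped; the first page_view wins.
session = [
    {"event_name": "session_start", "event_timestamp": 1, "source": "google", "medium": "organic"},
    {"event_name": "page_view", "event_timestamp": 2, "source": "newsletter", "medium": "email", "campaign": "launch"},
    {"event_name": "page_view", "event_timestamp": 3, "source": "google", "medium": "cpc"},
]
print(session_traffic_source(session))  # ('newsletter', 'email', 'launch')
```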

Last Non-direct Attribution in GA4

GA4's approach for last non-direct attribution (cross-channel last click) is slightly different. When a session has no traffic source of its own, GA4 takes the source from the last event carrying traffic source details in the user's earlier sessions, rather than from the first event of each preceding session.
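A minimal Python sketch of that fallback for one user follows. It assumes each session has already been reduced to session_start (the ga_session_id value, in Unix seconds), first_source and last_source. It uses the same 90-day lookback as the second query in this post, and it looks only at earlier sessions. The function and field names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Source = Tuple[Optional[str], Optional[str], Optional[str]]  # (source, medium, campaign)

# 90 days in seconds, the same lookback as the second query in this post.
LOOKBACK_SECONDS = 90 * 24 * 60 * 60


def last_non_direct(
    sessions: List[Dict[str, Any]], lookback: int = LOOKBACK_SECONDS
) -> List[Optional[Source]]:
    """Session source with a last non-direct fallback, for ONE user.

    Assumed session shape: session_start (Unix seconds), first_source and
    last_source (tuple or None). Results are in session_start order.
    """
    ordered = sorted(sessions, key=lambda s: s["session_start"])
    results: List[Optional[Source]] = []
    for i, current in enumerate(ordered):
        if current["first_source"] is not None:
            # The session has its own traffic source: use it.
            results.append(current["first_source"])
            continue
        fallback: Optional[Source] = None
        # Walk back through earlier sessions, newest first, within the lookback.
        for previous in reversed(ordered[:i]):
            if current["session_start"] - previous["session_start"] > lookback:
                break
            if previous["last_source"] is not None:
                fallback = previous["last_source"]
                break
        results.append(fallback)
    return results


DAY = 86_400
user_sessions = [
    {"session_start": 0, "first_source": ("google", "cpc", None), "last_source": ("google", "cpc", None)},
    {"session_start": 10 * DAY, "first_source": None, "last_source": None},   # inherits google / cpc
    {"session_start": 200 * DAY, "first_source": None, "last_source": None},  # too far back: stays None
]
print(last_non_direct(user_sessions))
```

A session that remains None here would be reported as direct.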

Query Process

I split the process into two SQL queries:

  1. Create a table with the first and last traffic source per session.
  2. Identify the last non-direct source of traffic if the current session had no traffic source data.

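Splitting the work this way makes the queries less expensive: the costly scan of raw event data runs only once, and the second query reads only the much smaller session-level table.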

Querying the Session Traffic Source

The first SQL query goes through the events and returns the first and last traffic source per unique session id. The values will be null if the traffic source data doesn’t exist. Save the query’s results in a new table and reference it in the next query instead of rerunning the entire thing with all event data.

Identifying the Last Non-direct Traffic Source

The second query uses the results of the first query to fill in the last non-direct traffic source if the session had no traffic source data.

Now you can recreate the GA4 session traffic source dimensions using BigQuery export event data and obtain the last non-direct source of traffic for each unique session id. I’ve added an example of this code below for you to copy and adapt. Happy querying!

GA4 BigQuery Example SQL

Query the Session Traffic Source

The first SQL query iterates through the events, returning the first and last traffic source for each unique session ID. If traffic source data is unavailable, the values will be null.

To avoid rerunning the entire query with all event data, save the results in a new table and reference it in the subsequent query.

Here’s the optimized session traffic source query:

-- Extract event data for session traffic source details
WITH events AS (
  SELECT
    CAST(event_date AS DATE FORMAT 'YYYYMMDD') AS date,
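    -- session_id = user_pseudo_id + ga_session_id (the INT64 session ID is cast to STRING for CONCAT)
    CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM unnest(event_params) WHERE key = 'ga_session_id') AS STRING)) AS session_id,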
    user_pseudo_id,
    (SELECT value.int_value FROM unnest(event_params) WHERE key = 'ga_session_id') AS session_start,
    (SELECT value.string_value FROM unnest(event_params) WHERE key = 'source') AS source,
    (SELECT value.string_value FROM unnest(event_params) WHERE key = 'medium') AS medium,
    (SELECT value.string_value FROM unnest(event_params) WHERE key = 'campaign') AS campaign,
    event_timestamp
  FROM
    `<project>.<dataset>.events_*`
  WHERE
    (_table_suffix >= '<start date>' AND _table_suffix <= '<end date>')
    AND event_name NOT IN ('session_start', 'first_visit')
)
SELECT
  date,
  session_id,
  user_pseudo_id,
  session_start,
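  -- First event (by timestamp) carrying any traffic source value; NULL if no event does
  FIRST_VALUE(
    IF(
      COALESCE(source, medium, campaign) IS NOT NULL,
      (SELECT AS STRUCT source, medium, campaign),
      NULL
    ) IGNORE NULLS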
  ) OVER(
    PARTITION BY session_id
    ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS session_first_traffic_source,
  LAST_VALUE(
    IF(
      COALESCE(source, medium, campaign) IS NOT NULL,
      (SELECT AS STRUCT source, medium, campaign),
      NULL
    ) IGNORE NULLS
  ) OVER(
    PARTITION BY session_id
    ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS session_last_traffic_source,
  ROW_NUMBER() OVER(
    PARTITION BY session_id
    ORDER BY event_timestamp
  ) AS session_event_number
FROM
  events
-- Select only one row per session
QUALIFY session_event_number = 1

This query extracts only the event data needed for session traffic source details and returns one row per session, so the table it produces is much smaller than the raw event export.

Querying the Last Non-direct Traffic Source

The second SQL query retrieves session-level data from the previously created table and fills in missing traffic sources with the last non-direct source. If session_first_traffic_source doesn’t contain any traffic source data, the query searches for the same user’s previous non-null value for session_last_traffic_source.

Here’s the optimized last non-direct traffic source query:

SELECT
  date,
  session_id,
  user_pseudo_id,
  session_start,
  session_first_traffic_source,
  IFNULL(
    session_first_traffic_source,
    LAST_VALUE(session_last_traffic_source IGNORE NULLS) OVER(
      PARTITION BY user_pseudo_id
      ORDER BY session_start RANGE BETWEEN 7776000 PRECEDING AND CURRENT ROW -- 90 day lookback
    )
  ) AS session_traffic_source_last_non_direct
FROM
  `<project>.<dataset>.<session traffic source table>`

This query uses a 90-day lookback: 90 days × 86,400 seconds per day = 7,776,000 seconds. To change it, replace 7776000 with the number of days multiplied by 86,400. The window is ordered, and its range measured, by session_start, which holds the ga_session_id parameter value: a Unix timestamp (in seconds) assigned at the beginning of a session.

Keep in mind that a larger lookback window can find traffic sources further back in time but will also increase the query’s cost.
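Because ga_session_id is in seconds, the RANGE offset is simply days × 86,400. The tiny Python helper below (a hypothetical name) computes the value to paste into the query for a different lookback.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def lookback_seconds(days: int) -> int:
    """Offset for RANGE BETWEEN <n> PRECEDING when the window is ordered by
    ga_session_id (a Unix timestamp in seconds)."""
    return days * SECONDS_PER_DAY


print(lookback_seconds(90))  # 7776000, the value used in the query above
print(lookback_seconds(30))  # 2592000
```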

Further Considerations

The query performed well on my test dataset, and the base logic seems accurate. However, that dataset did not contain a few common issues, such as the one described below.

Google Ads and GCLID Bug

GA4 has some challenges when it comes to tracking Google Ads traffic with auto-tagging enabled. Auto-tagging uses the GCLID parameter to link website visits to their campaigns, which in theory eliminates the need for standard UTM tags.

Unfortunately, this doesn’t always function as expected with BigQuery exports. In BigQuery, sessions that begin with an event containing the GCLID parameter may either lack source and medium details or display incorrect information such as “google / organic” or “youtube.com / referral” for their source medium.
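If you want to measure how often this happens in your own data, one option is to flag affected sessions for review. The Python sketch below is hypothetical. It assumes you have each session's landing URL (for example, the page_location parameter of its first event) and the exported source and medium. It flags sessions whose URL contains a gclid while the source and medium are missing or match one of the two incorrect pairs mentioned above. It only detects the symptom; it does not correct it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# The incorrect source/medium pairs this post reports for gclid sessions.
SUSPECT_PAIRS = {("google", "organic"), ("youtube.com", "referral")}


def is_suspect_gclid_session(page_location: str, source: Optional[str], medium: Optional[str]) -> bool:
    """Hypothetical check: True if the landing URL has a gclid but the exported
    source/medium is missing or one of SUSPECT_PAIRS."""
    has_gclid = "gclid" in parse_qs(urlsplit(page_location).query)
    if not has_gclid:
        return False
    return source is None or medium is None or (source, medium) in SUSPECT_PAIRS


print(is_suspect_gclid_session("https://example.com/?gclid=abc", "google", "organic"))  # True
print(is_suspect_gclid_session("https://example.com/?gclid=abc", "google", "cpc"))      # False
```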
