Extracting Data from JSON Lists and Arrays in Google BigQuery SQL

Google BigQuery SQL: Extract Data from JSON (List and Array) into Columns

=============================================

In this article, we will explore how to extract data from a list and array in JSON format into separate columns using Google BigQuery SQL.

Background


BigQuery is a fully-managed enterprise data warehouse service by Google. It provides a powerful and flexible way to analyze large datasets. However, when working with JSON data, it can be challenging to extract specific fields or values from the list and array structures.

Problem Statement


The problem presented in the Stack Overflow question states that we have a table with a JSON string field, which contains a list of objects. We need to extract the individual object’s fields (e.g., id, value, os_type, etc.) into separate columns.

Solution Overview


To solve this problem, we will use BigQuery SQL functions such as json_extract_array and unnest. The steps involve:

  1. Extracting the JSON array: We’ll first extract the JSON array from the list using json_extract_array.
  2. Unnesting the array: Next, we’ll use unnest to expand the array into individual rows.
  3. Extracting scalar values: Finally, we’ll extract the scalar values (e.g., id, value, etc.) from each row using json_extract_scalar.

Solution Step-by-Step


Step 1: Extracting the JSON Array

SELECT iUserID, json_extract_array(json_string) as json_array
FROM `project.dataset.table`

This step extracts the JSON array from the list.

Step 2: Unnesting the Array

We’ll use unnest to expand the array into individual rows.

with `project.dataset.table` as (
    select 100 UserID, '[{"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' json_string union all
    select 100, '[{"id": 77379514, "value": "38.658", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
    select 100, '[{"id": 77379515, "value": "40.569", "os_type": null, "amount": "150", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]' union all
    select 200, '[{"id": 77378899, "value": "25.365", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' union all
    select 200, '[{"id": 77378900, "value": "35.898", "os_type": null, "amount": "500", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
    select 200, '[{"id": 77378901, "value": "41.258", "os_type": null, "amount": "400", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]' 
)

This step creates a temporary view project.dataset.table and populates it with sample data.

Step 3: Extracting Scalar Values

We’ll use json_extract_scalar to extract the scalar values (e.g., id, value, etc.) from each row in the array.

SELECT 
    UserID, 
    json_extract_scalar(json, '$.id') as id,
    json_extract_scalar(json, '$.value') as value,
    json_extract_scalar(json, '$.os_type') as os_type,
    json_extract_scalar(json, '$.amount') as amount,
    json_extract_scalar(json, '$.created_at') as created_at,
    json_extract_scalar(json, '$.updated_at') as updated_at,
    json_extract_scalar(json, '$.Type_name') as Type_name
from `project.dataset.table`,
unnest(json_extract_array(replace(json_string, "'", '"'), '$')) json 

Note: The replace function is used to replace single quotes with double quotes, which are required for the json_extract_array function.

This step extracts the scalar values from each row in the array.

Conclusion


In this article, we have demonstrated how to extract data from a list and array in JSON format into separate columns using Google BigQuery SQL. We used the json_extract_array, unnest, and json_extract_scalar functions to achieve this. The steps involve extracting the JSON array, unnesting the array, and extracting scalar values.

Example Use Cases


  • Analyzing customer data in a list format.
  • Extracting field values from a JSON string in a list format.
  • Performing data cleaning and preprocessing tasks on large datasets.

Tips and Variations


  • Make sure to replace single quotes with double quotes when using the json_extract_array function.
  • Use the replace function to fix issues with quote characters in your JSON string.
  • Experiment with different BigQuery SQL functions and techniques to improve data extraction and analysis.

Last modified on 2023-08-23