Google BigQuery SQL: Extract Data from JSON (List and Array) into Columns
=============================================
In this article, we will explore how to extract data from a list and array in JSON format into separate columns using Google BigQuery SQL.
Background
BigQuery is a fully-managed enterprise data warehouse service by Google. It provides a powerful and flexible way to analyze large datasets. However, when working with JSON data, it can be challenging to extract specific fields or values from the list and array structures.
Problem Statement
The problem presented in the Stack Overflow question states that we have a table with a JSON string field, which contains a list of objects. We need to extract the individual object’s fields (e.g., id, value, os_type, etc.) into separate columns.
Solution Overview
To solve this problem, we will use BigQuery SQL functions such as json_extract_array and unnest. The steps involve:
- Extracting the JSON array: We’ll first extract the JSON array from the list using
json_extract_array. - Unnesting the array: Next, we’ll use
unnestto expand the array into individual rows. - Extracting scalar values: Finally, we’ll extract the scalar values (e.g.,
id,value, etc.) from each row usingjson_extract_scalar.
Solution Step-by-Step
Step 1: Extracting the JSON Array
SELECT iUserID, json_extract_array(json_string) as json_array
FROM `project.dataset.table`
This step extracts the JSON array from the list.
Step 2: Unnesting the Array
We’ll use unnest to expand the array into individual rows.
with `project.dataset.table` as (
select 100 UserID, '[{"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' json_string union all
select 100, '[{"id": 77379514, "value": "38.658", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
select 100, '[{"id": 77379515, "value": "40.569", "os_type": null, "amount": "150", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]' union all
select 200, '[{"id": 77378899, "value": "25.365", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' union all
select 200, '[{"id": 77378900, "value": "35.898", "os_type": null, "amount": "500", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
select 200, '[{"id": 77378901, "value": "41.258", "os_type": null, "amount": "400", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]'
)
This step creates a temporary view project.dataset.table and populates it with sample data.
Step 3: Extracting Scalar Values
We’ll use json_extract_scalar to extract the scalar values (e.g., id, value, etc.) from each row in the array.
SELECT
UserID,
json_extract_scalar(json, '$.id') as id,
json_extract_scalar(json, '$.value') as value,
json_extract_scalar(json, '$.os_type') as os_type,
json_extract_scalar(json, '$.amount') as amount,
json_extract_scalar(json, '$.created_at') as created_at,
json_extract_scalar(json, '$.updated_at') as updated_at,
json_extract_scalar(json, '$.Type_name') as Type_name
from `project.dataset.table`,
unnest(json_extract_array(replace(json_string, "'", '"'), '$')) json
Note: The replace function is used to replace single quotes with double quotes, which are required for the json_extract_array function.
This step extracts the scalar values from each row in the array.
Conclusion
In this article, we have demonstrated how to extract data from a list and array in JSON format into separate columns using Google BigQuery SQL. We used the json_extract_array, unnest, and json_extract_scalar functions to achieve this. The steps involve extracting the JSON array, unnesting the array, and extracting scalar values.
Example Use Cases
- Analyzing customer data in a list format.
- Extracting field values from a JSON string in a list format.
- Performing data cleaning and preprocessing tasks on large datasets.
Tips and Variations
- Make sure to replace single quotes with double quotes when using the
json_extract_arrayfunction. - Use the
replacefunction to fix issues with quote characters in your JSON string. - Experiment with different BigQuery SQL functions and techniques to improve data extraction and analysis.
Last modified on 2023-08-23