How-To

Partitioning Tables in BigQuery



If you use BigQuery, especially with large amounts of data, you can optimize your work by partitioning your tables - that is, dividing them into smaller segments. This improves query performance and reduces costs, because each query then reads fewer bytes.
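
For example, suppose a hypothetical events table is partitioned by its event_date column (both names are placeholders, not part of any real setup). A query that filters on the partitioning column scans only the matching partitions rather than the whole table:

-- Scans only the January 2024 partitions instead of the full table
SELECT COUNT(*)
FROM my_dataset.events
WHERE event_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';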

BigQuery supports three types of partitioning:

  1. Integer range partitioning: a table is divided based on ranges of values in a specific INTEGER column.

  2. Time-unit partitioning: a table is partitioned on a DATE, TIMESTAMP, or DATETIME column.

  3. Ingestion time partitioning: when a table is partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when it ingests the data.
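
As a rough sketch, the three flavors look like this in SQL (all table and column names below are placeholders, not taken from any real schema):

CREATE TABLE my_dataset.orders (customer_id INT64, amount NUMERIC)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100000, 1000)); -- integer range

CREATE TABLE my_dataset.events (event_date DATE, amount NUMERIC)
PARTITION BY event_date; -- time-unit column

CREATE TABLE my_dataset.raw_data (payload STRING)
PARTITION BY _PARTITIONDATE; -- ingestion time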

One important limitation is that BigQuery only supports partitioning by a single column.

In this article, we’ll describe how to convert a non-partitioned table into a partitioned one: one procedure for integer range and time-unit partitioning, and another for ingestion time partitioning.

Creating an integer range or time-unit partitioned table

If you want to partition a table by an integer range or a time unit, the first step is to pause the connector in your dashboard.

When the connector is paused, go to the BigQuery console and execute one of the following SQL statements, depending on the column you want to partition your table on:

  • INTEGER:
CREATE TABLE [schema-name].copy
PARTITION BY RANGE_BUCKET([integer-column], GENERATE_ARRAY([start],[end],[interval]))
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
  • DATE:
CREATE TABLE [schema-name].copy
PARTITION BY [date-column]
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
  • TIMESTAMP:
CREATE TABLE [schema-name].copy
PARTITION BY DATE([timestamp-column])
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
  • DATETIME:
CREATE TABLE [schema-name].copy
PARTITION BY DATETIME_TRUNC([datetime-column], DAY)
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];

As a result, a partitioned table named copy is created and populated with the data from the original table, and the original table is then dropped.
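
Before moving on, you can optionally confirm that the copy table is partitioned as expected, for example by querying the dataset’s INFORMATION_SCHEMA.PARTITIONS view (a sketch; substitute your dataset for [schema-name]):

SELECT partition_id, total_rows
FROM [schema-name].INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'copy'
ORDER BY partition_id;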

Next, open the BigQuery user interface in the console, go to the navigation bar, and select your project:

[Screenshot bigquery1.png: selecting the project in the BigQuery navigation bar]

Next, select the dataset and, within it, select the copy table created earlier:

[Screenshot bigquery2.png: selecting the copy table in the dataset]

Then, click ‘Copy’ to copy it back under the original table’s name:

[Screenshot bigquery3.png: the ‘Copy’ button for the copy table]

In the ‘Table name’ field, enter the original table’s name and click ‘Copy’:

[Screenshot bigquery4.png: the copy dialog with the ‘Table name’ field]
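
If you prefer to stay in SQL rather than use the console UI, the same copy can likely be done with a CREATE TABLE ... COPY statement, which also preserves the partitioning of the copy table (a sketch using the same placeholders as above):

CREATE TABLE [schema-name].[table-name]
COPY [schema-name].copy;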

Finally, drop the copy table by running the following statement in the BigQuery console:

DROP TABLE [schema-name].copy;

Almost done! Now go to your dashboard and enable the connector again.

Creating an ingestion time partitioned table

To partition a table based on ingestion time, go to the BigQuery user interface in the console, and select your project in the navigation bar:

[Screenshot bigquery5.png: selecting the project in the BigQuery navigation bar]

Next, find the dataset to be used as the destination or create a new dataset. Don’t forget to grant your service account the bigquery.dataOwner role on this dataset so that it can create, delete, and update tables in it.
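
One way to grant that access is with a GRANT statement in the BigQuery console (a sketch; the dataset name and service account address are placeholders):

GRANT `roles/bigquery.dataOwner`
ON SCHEMA [schema-name]
TO "serviceAccount:my-connector@my-project.iam.gserviceaccount.com";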

Next, click ‘+ Create table’ and retain the default values for the ‘Source’, ‘Destination’, and ‘Schema’ sections. In the ‘Table name’ field, enter the same table name as in your source.

Next, go to ‘Partition and cluster settings', select ‘Partition by ingestion time’, and click ‘Create table’:

[Screenshot bigquery6.png: ‘Partition and cluster settings’ with ‘Partition by ingestion time’ selected]
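
If you prefer SQL over the UI, the equivalent is to create the table with the _PARTITIONDATE pseudocolumn as the partitioning expression (a sketch with a made-up two-column schema; match the column list to your source):

CREATE TABLE [schema-name].[table-name]
(
  id INT64,
  payload STRING
)
PARTITION BY _PARTITIONDATE;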

That’s it, you’re all set!

