BigQuery

Google Cloud BigQuery is a fully managed, petabyte scale, low cost analytics data warehouse.

Spring Cloud GCP provides:

  • A convenience starter which provides autoconfiguration for the BigQuery client objects with credentials needed to interface with BigQuery.

  • A Spring Integration message handler for loading data into BigQuery tables in your Spring integration pipelines.

Maven coordinates, using Spring Cloud GCP BOM:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-gcp-starter-bigquery</artifactId>
</dependency>

Gradle coordinates:

dependencies {
    implementation("org.springframework.cloud:spring-cloud-gcp-starter-bigquery")
}

Configuration

The following application properties may be configured with Spring Cloud GCP BigQuery libraries.

Name

Description

Required

Default value

spring.cloud.gcp.bigquery.datasetName

The BigQuery dataset that the BigQueryTemplate and BigQueryFileMessageHandler is scoped to.

Yes

spring.cloud.gcp.bigquery.enabled

Enables or disables Spring Cloud GCP BigQuery autoconfiguration.

No

true

spring.cloud.gcp.bigquery.project-id

GCP project ID of the project using BigQuery APIs, if different from the one in the Spring Cloud GCP Core Module.

No

Project ID is typically inferred from gcloud configuration.

spring.cloud.gcp.bigquery.credentials.location

Credentials file location for authenticating with the Google Cloud BigQuery APIs, if different from the ones in the Spring Cloud GCP Core Module

No

Inferred from Application Default Credentials, typically set by gcloud.

BigQuery Client Object

The GcpBigQueryAutoConfiguration class configures an instance of BigQuery for you by inferring your credentials and Project ID from the machine’s environment.

Example usage:

// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQuery bigquery;

public void runQuery() throws InterruptedException {
  String query = "SELECT column FROM table;";
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(query).build();

  // Run the query using the BigQuery object
  for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
    for (FieldValue val : row) {
      System.out.println(val);
    }
  }
}

This object is used to interface with all BigQuery services. For more information, see the BigQuery Client Library usage examples.

BigQueryTemplate

The BigQueryTemplate class is a wrapper over the BigQuery client object and makes it easier to load data into BigQuery tables. A BigQueryTemplate is scoped to a single dataset. The autoconfigured BigQueryTemplate instance will use the dataset provided through the property spring.cloud.gcp.bigquery.datasetName.

Below is a code snippet of how to load a CSV data InputStream to a BigQuery table.

// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQueryTemplate bigQueryTemplate;

public void loadData(InputStream dataInputStream, String tableName) {
  ListenableFuture<Job> bigQueryJobFuture =
      bigQueryTemplate.writeDataToTable(
          tableName,
          dataFile.getInputStream(),
          FormatOptions.csv());

  // After the future is complete, the data is successfully loaded.
  Job job = bigQueryJobFuture.get();
}

Spring Integration

Spring Cloud GCP BigQuery also provides a Spring Integration message handler BigQueryFileMessageHandler. This is useful for incorporating BigQuery data loading operations in a Spring Integration pipeline.

Below is an example configuring a ServiceActivator bean using the BigQueryFileMessageHandler.

@Bean
public DirectChannel bigQueryWriteDataChannel() {
  return new DirectChannel();
}

@Bean
public DirectChannel bigQueryJobReplyChannel() {
  return new DirectChannel();
}

@Bean
@ServiceActivator(inputChannel = "bigQueryWriteDataChannel")
public MessageHandler messageSender(BigQueryTemplate bigQueryTemplate) {
  BigQueryFileMessageHandler messageHandler = new BigQueryFileMessageHandler(bigQueryTemplate);
  messageHandler.setFormatOptions(FormatOptions.csv());
  messageHandler.setOutputChannel(bigQueryJobReplyChannel());
  return messageHandler;
}

BigQuery Message Handling

The BigQueryFileMessageHandler accepts the following message payload types for loading into BigQuery: java.io.File, byte[], org.springframework.core.io.Resource, and java.io.InputStream. The message payload will be streamed and written to the BigQuery table you specify.

By default, the BigQueryFileMessageHandler is configured to read the headers of the messages it receives to determine how to load the data. The headers are specified by the class BigQuerySpringMessageHeaders and summarized below.

Header

Description

BigQuerySpringMessageHeaders.TABLE_NAME

Specifies the BigQuery table within your dataset to write to.

BigQuerySpringMessageHeaders.FORMAT_OPTIONS

Describes the data format of your data to load (i.e. CSV, JSON, etc.).

Alternatively, you may omit these headers and explicitly set the table name or format options by calling setTableName(…​) and setFormatOptions(…​).

BigQuery Message Reply

After the BigQueryFileMessageHandler processes a message to load data to your BigQuery table, it will respond with a Job on the reply channel. The Job object provides metadata and information about the load file operation.

By default, the BigQueryFileMessageHandler is run in asynchronous mode, with setSync(false), and it will reply with a ListenableFuture<Job> on the reply channel. The future is tied to the status of the data loading job and will complete when the job completes.

If the handler is run in synchronous mode with setSync(true), then the handler will block on the completion of the loading job and block until it is complete.

If you decide to use Spring Integration Gateways and you wish to receive ListenableFuture<Job> as a reply object in the Gateway, you will have to call .setAsyncExecutor(null) on your GatewayProxyFactoryBean. This is needed to indicate that you wish to reply on the built-in async support rather than rely on async handling of the gateway.

Sample

A BigQuery sample application is available.