6. JdbcHdfs Task

This task exports data from a Jdbc table(s) into CSV files on an HDFS using a single batch job. When launching the task, you must either supply the select statement by setting the sql parameter, or you can supply both tableName and columns options (which will be used to build the SQL statement).

6.1 Options

The jdbchdfs task has the following options:

jdbchdfs.check-column
The name of the column used to determine if the data should be read. (String, default: <none>)
jdbchdfs.column-names
The name of the columns to be queried. (String, default: <none>)
jdbchdfs.commit-interval
The commit interval for the application. (Integer, default: 1000)
jdbchdfs.datasource.driver-class-name
The driver of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.password
The password of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.url
The url of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.username
The username of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.delimiter
The delimiter used to split the columns of data in the output file. (String, default: ,)
jdbchdfs.directory
The directory the files are to be written. (String, default: /data)
jdbchdfs.file-extension
The extension that will be applied to the files. (String, default: csv)
jdbchdfs.file-name
The name of the file to be written to the file system. (String, default: jdbchdfs)
jdbchdfs.fs-uri
The URI to the hadoop file system. (String, default: <none>)
jdbchdfs.max-workers
Maximum number of concurrent workers. (Integer, default: 2)
jdbchdfs.name-node-principal
The name node principal to be used. (String, default: <none>)
jdbchdfs.partition-column
The name of the column used to partition the data. (String, default: <none>)
jdbchdfs.partitions
The number of partitions to be created. (Integer, default: 4)
jdbchdfs.properties-location
The properties location to be used. (String, default: <none>)
jdbchdfs.register-url-handler
The register url handler to be used. (String, default: <none>)
jdbchdfs.restartable
Is the batch job restartable. (Boolean, default: false)
jdbchdfs.rm-manager-principal
The rm manager principal to be used. (String, default: <none>)
jdbchdfs.rollover
The number of bytes to be written before file rolls over. (Long, default: 1000000000)
jdbchdfs.security-method
The security method to be used. (String, default: <none>)
jdbchdfs.sql
Sql to be used to retrieve the data. (String, default: <none>)
jdbchdfs.table-name
The name of the table to be queried. (String, default: <none>)
jdbchdfs.user-key-tab
The user key tab to be used. (String, default: <none>)
jdbchdfs.user-principal
The user principal to be used. (String, default: <none>)
[Note]Note

If the jdbchdfs.datasource properties are not set the application will use the spring.datasource properties as the settings for the datasource that retrieves table data.

6.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/jdbchdfs-local-task
$ ./mvnw clean package

6.3 Example

The following example reads the FOO table and writes files to the local file system based on the partition column specified in the parameters. The jdbchdfs.datasource.* refers to the datastore that contains the data to be migrated, while the spring.datasource.* refers to the datastore that will hold the task and batch information.

java -jar target/jdbchdfs-local-task-{version}.jar \
--server.port=0 \
--jdbchdfs.directory=/data/jdbchdfstask2/ \
--jdbchdfs.tableName=FOO \
--jdbchdfs.partitionColumn=id \
--jdbchdfs.columnNames=PROPKEY,PROPVALUE \
--spring.application.name=helloWorld \
--jdbchdfs.datasource.url=jdbc:postgresql://localhost:5432/mydb \
--jdbchdfs.datasource.username=myusername  \
--jdbchdfs.datasource.driverClassName=org.postgresql.Driver \
--jdbchdfs.datasource.password=mypassword \
--spring.datasource.url=jdbc:mariadb://localhost:3306/practice \
--spring.datasource.username=myusername \
--spring.datasource.password=mypassword \
--spring.datasource.driverClassName=org.mariadb.jdbc.Driver \
spring.profiles.active=master