6. JdbcHdfs Task

6. JdbcHdfs Task
Prev	Part II. Tasks	Next

This task exports data from a Jdbc table(s) into CSV files on an HDFS using a single batch job. When launching the task, you must either supply the select statement by setting the sql parameter, or you can supply both tableName and columns options (which will be used to build the SQL statement).

6.1 Options

The jdbchdfs task has the following options:

jdbchdfs.check-column: The name of the column used to determine if the data should be read. (String, default: <none>)
jdbchdfs.column-names: The name of the columns to be queried. (String, default: <none>)
jdbchdfs.commit-interval: The commit interval for the application. (Integer, default: 1000)
jdbchdfs.datasource.driver-class-name: The driver of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.password: The password of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.url: The url of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.datasource.username: The username of the datasource that will be used by jdbhdfs app to retrieve table input. (String, default: <none>)
jdbchdfs.delimiter: The delimiter used to split the columns of data in the output file. (String, default: ,)
jdbchdfs.directory: The directory the files are to be written. (String, default: /data)
jdbchdfs.file-extension: The extension that will be applied to the files. (String, default: csv)
jdbchdfs.file-name: The name of the file to be written to the file system. (String, default: jdbchdfs)
jdbchdfs.fs-uri: The URI to the hadoop file system. (String, default: <none>)
jdbchdfs.max-workers: Maximum number of concurrent workers. (Integer, default: 2)
jdbchdfs.name-node-principal: The name node principal to be used. (String, default: <none>)
jdbchdfs.partition-column: The name of the column used to partition the data. (String, default: <none>)
jdbchdfs.partitions: The number of partitions to be created. (Integer, default: 4)
jdbchdfs.properties-location: The properties location to be used. (String, default: <none>)
jdbchdfs.register-url-handler: The register url handler to be used. (String, default: <none>)
jdbchdfs.restartable: Is the batch job restartable. (Boolean, default: false)
jdbchdfs.rm-manager-principal: The rm manager principal to be used. (String, default: <none>)
jdbchdfs.rollover: The number of bytes to be written before file rolls over. (Long, default: 1000000000)
jdbchdfs.security-method: The security method to be used. (String, default: <none>)
jdbchdfs.sql: Sql to be used to retrieve the data. (String, default: <none>)
jdbchdfs.table-name: The name of the table to be queried. (String, default: <none>)
jdbchdfs.user-key-tab: The user key tab to be used. (String, default: <none>)
jdbchdfs.user-principal: The user principal to be used. (String, default: <none>)

	Note
	If the jdbchdfs.datasource properties are not set the application will use the spring.datasource properties as the settings for the datasource that retrieves table data.

6.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/jdbchdfs-local-task
$ ./mvnw clean package

6.3 Example

The following example reads the FOO table and writes files to the local file system based on the partition column specified in the parameters. The jdbchdfs.datasource.* refers to the datastore that contains the data to be migrated, while the spring.datasource.* refers to the datastore that will hold the task and batch information.

java -jar target/jdbchdfs-local-task-{version}.jar \
--server.port=0 \
--jdbchdfs.directory=/data/jdbchdfstask2/ \
--jdbchdfs.tableName=FOO \
--jdbchdfs.partitionColumn=id \
--jdbchdfs.columnNames=PROPKEY,PROPVALUE \
--spring.application.name=helloWorld \
--jdbchdfs.datasource.url=jdbc:postgresql://localhost:5432/mydb \
--jdbchdfs.datasource.username=myusername  \
--jdbchdfs.datasource.driverClassName=org.postgresql.Driver \
--jdbchdfs.datasource.password=mypassword \
--spring.datasource.url=jdbc:mariadb://localhost:3306/practice \
--spring.datasource.username=myusername \
--spring.datasource.password=mypassword \
--spring.datasource.driverClassName=org.mariadb.jdbc.Driver \
spring.profiles.active=master

Prev	Up	Next
5. Timestamp Task	Home	7. Composed Task Runner