Spring Cloud Data Flow for Apache YARN

Sabby Anandan, Marius Bogoevici, Eric Bottard, Mark Fisher, Ilayaperumal Gopinathan, Gunnar Hillert, Mark Pollack, Patrick Peralta, Glenn Renfro, Thomas Risberg, Dave Syer, David Turanski, Janne Valkealahti


Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

I. Preface
1. About the documentation
2. Getting help
II. Introduction
3. Introducing the Spring Cloud Data Flow for Apache YARN project
4. Spring Cloud Data Flow
5. Spring Cloud Stream
6. Spring Cloud Task
III. Architecture
7. Introduction
8. Microservice Architectural Style
8.1. Comparison to other Platform architectures
9. Streaming Applications
9.1. Imperative Programming Model
9.2. Functional Programming Model
10. Streams
10.1. Topologies
10.2. Concurrency
10.3. Partitioning
10.4. Message Delivery Guarantees
11. Analytics
12. Task Applications
13. Data Flow Server
13.1. Endpoints
13.2. Customization
13.3. Security
14. Runtime
14.1. Fault Tolerance
14.2. Resource Management
14.3. Scaling at runtime
14.4. Application Versioning
IV. Spring Cloud Data Flow Runtime
15. Deploying on YARN
15.1. Prerequisites
15.2. Download and Extract Distribution
15.3. Configure Settings
15.4. Start Server
15.5. Connect Shell
15.6. Register Applications
15.6.1. Sourcing Applications from HDFS
15.7. Create Stream
15.8. Create Task
15.9. Using YARN CLI
15.9.1. Check YARN App Statuses
15.9.2. Push Apps
15.10. Using Metric Collectors
16. Deploying on Ambari
16.1. Install Ambari Server
16.2. Deploy Data Flow
16.3. Using Configuration
16.3.1. Change Datasource
17. Configuring Runtime Settings and Environment
17.1. Generic App Settings
17.2. Configuring Application Resources
17.3. Configure Base Directory
17.4. Pre-populate Applications
17.5. Configure Logging
17.6. Configure Metrics
17.7. Global YARN Memory Settings
17.8. Configure Kerberos
17.8.1. Working with Kerberized Kafka
17.9. Configure HDFS HA
17.10. Configure Database
17.11. Configure Network Discovery
18. How YARN Deployment Works
19. Troubleshooting
20. Using Sandboxes
20.1. Hortonworks Sandbox
V. Streams
21. Introduction
22. Stream DSL
23. Register a Stream App
23.1. Whitelisting application properties
23.2. Creating and using a dedicated metadata artifact
23.2.1. Using the companion artifact
24. Creating custom applications
25. Creating a Stream
25.1. Application properties
25.1.1. Passing application properties when creating a stream
25.2. Deployment properties
25.2.1. Application properties versus Deployer properties
25.2.2. Passing instance count as deployment property
25.2.3. Inline vs file reference properties
25.2.4. Passing application properties when deploying a stream
25.2.5. Passing Spring Cloud Stream properties for the application
25.2.6. Passing per-binding producer and consumer properties
25.2.7. Passing stream partition properties during stream deployment
25.2.8. Passing application content type properties
25.2.9. Overriding application properties during stream deployment
25.3. Common application properties
26. Destroying a Stream
27. Deploying and Undeploying Streams
28. Other Source and Sink Application Types
29. Simple Stream Processing
30. Stateful Stream Processing
31. Tap a Stream
32. Using Labels in a Stream
33. Explicit Broker Destinations in a Stream
34. Directed Graphs in a Stream
35. Stream applications with multiple binder configurations
VI. Tasks
36. Introducing Spring Cloud Task
37. The Lifecycle of a task
37.1. Creating a custom Task Application
37.2. Registering a Task Application
37.3. Creating a Task
37.4. Launching a Task
37.4.1. Common application properties
37.5. Reviewing Task Executions
37.6. Destroying a Task
38. Task Repository
38.1. Configuring the Task Execution Repository
38.1.1. Local
38.1.2. Task Application Repository
38.2. Datasource
39. Subscribing to Task/Batch Events
40. Launching Tasks from a Stream
40.1. TriggerTask
40.2. TaskLaunchRequest-transform
41. Composed Tasks
41.1. Configuring the Composed Task Runner in Spring Cloud Data Flow
41.1.1. Registering the Composed Task Runner application
41.1.2. Configuring the Composed Task Runner application
41.2. Creating, Launching, and Destroying a Composed Task
41.2.1. Creating a Composed Task
Task Application Parameters
41.2.2. Launching a Composed Task
Exit Statuses
41.2.3. Destroying a Composed Task
41.2.4. Stopping a Composed Task
41.2.5. Restarting a Composed Task
41.3. Composed Task DSL
41.3.1. Conditional Execution
41.3.2. Transitional Execution
Basic Transition
Transition With a Wildcard
Transition With a Following Conditional Execution
41.3.3. Split Execution
Split Containing Conditional Execution
VII. Dashboard
42. Introduction
43. Apps
43.1. Bulk Import of Applications
44. Runtime
45. Streams
46. Create Stream
47. Tasks
47.1. Apps
47.1.1. Create a Task Definition from a selected Task App
47.1.2. View Task App Details
47.2. Definitions
47.2.1. Creating Task Definitions using the bulk define interface
47.2.2. Creating Composed Task Definitions
47.2.3. Launching Tasks
47.3. Executions
48. Jobs
48.1. List job executions
48.1.1. Job execution details
48.1.2. Step execution details
48.1.3. Step Execution Progress
49. Analytics
VIII. ‘How-to’ guides
50. Configure Maven Properties
51. Logging
51.1. Deployment Logs
51.2. Application Logs
52. Frequently asked questions
52.1. Advanced SpEL expressions
52.2. How to use JDBC-sink?
52.3. How to use multiple message-binders?
IX. Appendices
A. Migrating from Spring XD to Spring Cloud Data Flow
A.1. Terminology Changes
A.2. Modules to Applications
A.2.1. Custom Applications
A.2.2. Application Registration
A.2.3. Application Properties
A.3. Message Bus to Binders
A.3.1. Message Bus
A.3.2. Binders
A.3.3. Named Channels
A.3.4. Directed Graphs
A.4. Batch to Tasks
A.5. Shell/DSL Commands
A.7. UI / Flo
A.8. Architecture Components
A.8.1. ZooKeeper
A.8.2. RDBMS
A.8.3. Redis
A.8.4. Cluster Topology
A.9. Central Configuration
A.10. Distribution
A.11. Hadoop Distribution Compatibility
A.12. YARN Deployment
A.13. Use Case Comparison
A.13.1. Use Case #1
A.13.2. Use Case #2
A.13.3. Use Case #3
B. Building
B.1. Documentation
B.2. Working with the code
B.2.1. Importing into Eclipse with m2eclipse
B.2.2. Importing into Eclipse without m2eclipse
C. Contributing
C.1. Sign the Contributor License Agreement
C.2. Code Conventions and Housekeeping