Spring Cloud Data Flow for Apache YARN

Sabby Anandan, Marius Bogoevici, Eric Bottard, Mark Fisher, Ilayaperumal Gopinathan, Gunnar Hillert, Mark Pollack, Patrick Peralta, Glenn Renfro, Thomas Risberg, Dave Syer, David Turanski, Janne Valkealahti


Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

I. Preface
1. About the documentation
2. Getting help
II. Introduction
3. Introducing the Spring Cloud Data Flow for Apache YARN Project
4. Spring Cloud Data Flow
5. Spring Cloud Stream
6. Spring Cloud Task
III. Architecture
7. Introduction
8. Microservice Architectural Style
8.1. Comparison to other Platform architectures
9. Streaming Applications
9.1. Imperative Programming Model
9.2. Functional Programming Model
10. Streams
10.1. Topologies
10.2. Concurrency
10.3. Partitioning
10.4. Message Delivery Guarantees
11. Analytics
12. Task Applications
13. Data Flow Server
13.1. Endpoints
13.2. Customization
13.3. Security
14. Runtime
14.1. Fault Tolerance
14.2. Resource Management
14.3. Scaling at runtime
14.4. Application Versioning
IV. Spring Cloud Data Flow Runtime
15. Deploying on YARN
15.1. Prerequisites
15.2. Download and Extract Distribution
15.3. Configure Settings
15.4. Start Server
15.5. Connect Shell
15.6. Register Applications
15.6.1. Sourcing Applications from HDFS
15.7. Create Stream
15.8. Create Task
15.9. Using the YARN CLI
15.9.1. Check YARN App Statuses
15.9.2. Push Apps
16. Deploying on Ambari
16.1. Install Ambari Server
16.2. Deploy Data Flow
16.3. Using Configuration
16.3.1. Change Datasource
17. Configuring Runtime Settings and Environment
17.1. Generic App Settings
17.2. Configuring Application Resources
17.3. Configure Base Directory
17.4. Pre-populate Applications
17.5. Configure Logging
17.6. Global YARN Memory Settings
17.7. Configure Kerberos
17.7.1. Working with Kerberized Kafka
17.8. Configure HDFS HA
17.9. Configure Database
17.10. Configure Network Discovery
18. How YARN Deployment Works
19. Troubleshooting
20. Using Sandboxes
20.1. Hortonworks Sandbox
V. Streams
21. Introduction
22. Stream DSL
23. Register a Stream App
23.1. Whitelisting application properties
24. Creating custom applications
25. Creating a Stream
25.1. Application properties
25.1.1. Passing application properties when creating a stream
25.2. Deployment properties
25.2.1. Passing instance count as deployment property
25.2.2. Inline vs file reference properties
25.2.3. Passing application properties when deploying a stream
25.2.4. Passing Spring Cloud Stream properties for the application
25.2.5. Passing per-binding producer and consumer properties
25.2.6. Passing stream partition properties during stream deployment
25.2.7. Passing application content type properties
25.2.8. Overriding application properties during stream deployment
25.3. Deployment properties
25.3.1. Passing instance count as deployment property
25.3.2. Inline vs file reference properties
26. Destroying a Stream
27. Deploying and Undeploying Streams
28. Other Source and Sink Application Types
29. Simple Stream Processing
30. Stateful Stream Processing
31. Tap a Stream
32. Using Labels in a Stream
33. Explicit Broker Destinations in a Stream
34. Directed Graphs in a Stream
34.1. Common application properties
35. Stream applications with multiple binder configurations
VI. Tasks
36. Introducing Spring Cloud Task
37. The Lifecycle of a task
37.1. Creating a custom Task Application
37.2. Registering a Task Application
37.3. Creating a Task
37.4. Launching a Task
37.5. Reviewing Task Executions
37.6. Destroying a Task
38. Task Repository
38.1. Configuring the Task Execution Repository
38.1.1. Local
38.1.2. Task Application Repository
38.2. Datasource
39. Subscribing to Task/Batch Events
40. Launching Tasks from a Stream
40.1. TriggerTask
40.2. Translator
VII. Dashboard
41. Introduction
42. Apps
42.1. Bulk Import of Applications
43. Runtime
44. Streams
45. Create Stream
46. Tasks
46.1. Apps
46.1.1. Create a Task Definition from a selected Task App
46.1.2. View Task App Details
46.2. Definitions
46.2.1. Creating Task Definitions using the bulk define interface
46.2.2. Launching Tasks
46.3. Executions
47. Jobs
47.1. List job executions
47.1.1. Job execution details
47.1.2. Step execution details
47.1.3. Step Execution Progress
48. Analytics
VIII. ‘How-to’ guides
49. Configure Maven Properties
50. Logging
50.1. Deployment Logs
50.2. Application Logs
IX. Appendices
A. Migrating from Spring XD to Spring Cloud Data Flow
A.1. Terminology Changes
A.2. Modules to Applications
A.2.1. Custom Applications
A.2.2. Application Registration
A.2.3. Application Properties
A.3. Message Bus to Binders
A.3.1. Message Bus
A.3.2. Binders
A.3.3. Named Channels
A.3.4. Directed Graphs
A.4. Batch to Tasks
A.5. Shell/DSL Commands
A.7. UI / Flo
A.8. Architecture Components
A.8.1. ZooKeeper
A.8.2. RDBMS
A.8.3. Redis
A.8.4. Cluster Topology
A.9. Central Configuration
A.10. Distribution
A.11. Hadoop Distribution Compatibility
A.12. YARN Deployment
A.13. Use Case Comparison
A.13.1. Use Case #1
A.13.2. Use Case #2
A.13.3. Use Case #3
B. Building
B.1. Documentation
B.2. Working with the code
B.2.1. Importing into Eclipse with m2eclipse
B.2.2. Importing into Eclipse without m2eclipse
C. Contributing
C.1. Sign the Contributor License Agreement
C.2. Code Conventions and Housekeeping