Spring Cloud Data Flow for Apache YARN

Sabby Anandan, Marius Bogoevici, Eric Bottard, Mark Fisher, Ilayaperumal Gopinathan, Gunnar Hillert, Mark Pollack, Patrick Peralta, Glenn Renfro, Thomas Risberg, Dave Syer, David Turanski, Janne Valkealahti

1.0.0.BUILD-SNAPSHOT

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.


Table of Contents

I. Preface
1. About the documentation
2. Getting help
II. Introduction
3. Introducing Spring Cloud Data Flow for Apache YARN project
4. Spring Cloud Data Flow
5. Spring Cloud Stream
6. Spring Cloud Task
III. Architecture
7. Introduction
8. Microservice Architectural Style
8.1. Comparison to other Platform architectures
9. Streaming Applications
9.1. Imperative Programming Model
9.2. Functional Programming Model
10. Streams
10.1. Topologies
10.2. Concurrency
10.3. Partitioning
10.4. Message Delivery Guarantees
11. Analytics
12. Task Applications
13. Data Flow Server
13.1. Endpoints
13.2. Customization
13.3. Security
14. Runtime
14.1. Fault Tolerance
14.2. Resource Management
14.3. Scaling at runtime
14.4. Application Versioning
IV. Spring Cloud Data Flow Runtime
15. Deploying on YARN
15.1. Prerequisites
15.2. Download and Extract Distribution
15.3. Configure Settings
15.4. Start Server
15.5. Connect Shell
15.6. Register Applications
15.6.1. Sourcing Applications from HDFS
15.7. Create Stream
15.8. Create Task
15.9. Check YARN App Statuses
16. Deploying on AMBARI
16.1. Install Ambari Server
16.2. Deploy Data Flow
16.3. Using Configuration
17. Configuring Runtime Settings and Environment
17.1. Configuring Application Resources
17.2. Configure Base Directory
17.3. Pre-populate Applications
17.4. Configure Logging
17.5. Global YARN Memory Settings
18. How YARN Deployment Works
19. Troubleshooting
20. Using Sandboxes
20.1. Hortonworks Sandbox
V. Streams
21. Introduction
22. Stream DSL
23. Register a Stream App
23.1. Whitelisting application properties
24. Creating a Stream
24.1. Application properties
24.1.1. Passing application properties when creating a stream
24.1.2. Passing application properties when deploying a stream
24.1.3. Passing stream partition properties during stream deployment
24.1.4. Overriding application properties during stream deployment
24.2. Deployment properties
24.2.1. Passing instance count as deployment property
24.2.2. Inline vs file reference properties
25. Destroying a Stream
26. Deploying and Undeploying Streams
27. Other Source and Sink Application Types
28. Simple Stream Processing
29. Stateful Stream Processing
30. Tap a Stream
31. Using Labels in a Stream
32. Explicit Broker Destinations in a Stream
33. Directed Graphs in a Stream
33.1. Common application properties
VI. Tasks
34. Introducing Spring Cloud Task
35. The Lifecycle of a task
35.1. Registering a Task Application
35.2. Creating a Task
35.3. Launching a Task
35.4. Reviewing Task Executions
35.5. Destroying a Task
36. Task Repository
36.1. Configuring the Task Execution Repository
36.1.1. Local
36.2. Datasource
37. Subscribing to Task/Batch Events
38. Launching Tasks from a Stream
38.1. TriggerTask
38.2. Translator
VII. Dashboard
39. Introduction
40. Apps
41. Runtime
42. Streams
43. Create Stream
44. Tasks
44.1. Apps
44.1.1. Create a Task Definition from a selected Task App
44.1.2. View Task App Details
44.2. Definitions
44.2.1. Launching Tasks
44.3. Executions
45. Jobs
45.1. List job executions
45.1.1. Job execution details
45.1.2. Step execution details
45.1.3. Step Execution Progress
46. Analytics
VIII. ‘How-to’ guides
47. Configure Maven Properties
IX. Appendices
A. Migrating from Spring XD to Spring Cloud Data Flow
A.1. Terminology Changes
A.2. Modules to Applications
A.2.1. Custom Applications
A.2.2. Application Registration
A.2.3. Application Properties
A.3. Message Bus to Binders
A.3.1. Message Bus
A.3.2. Binders
A.3.3. Named Channels
A.3.4. Directed Graphs
A.4. Batch to Tasks
A.5. Shell/DSL Commands
A.6. REST-API
A.7. UI / Flo
A.8. Architecture Components
A.8.1. ZooKeeper
A.8.2. RDBMS
A.8.3. Redis
A.8.4. Cluster Topology
A.9. Central Configuration
A.10. Distribution
A.11. Hadoop Distribution Compatibility
A.12. YARN Deployment
A.13. Use Case Comparison
A.13.1. Use Case #1
A.13.2. Use Case #2
A.13.3. Use Case #3
B. Building
B.1. Documentation
B.2. Working with the code
B.2.1. Importing into Eclipse with m2eclipse
B.2.2. Importing into Eclipse without m2eclipse
C. Contributing
C.1. Sign the Contributor License Agreement
C.2. Code Conventions and Housekeeping