Spring Cloud Data Flow for Apache YARN

Sabby Anandan, Marius Bogoevici, Eric Bottard, Mark Fisher, Ilayaperumal Gopinathan, Gunnar Hillert, Mark Pollack, Patrick Peralta, Glenn Renfro, Thomas Risberg, Dave Syer, David Turanski, Janne Valkealahti

1.0.3.BUILD-SNAPSHOT

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.


Table of Contents

I. Preface
1. About the documentation
2. Getting help
II. Introduction
3. Introducing Spring Cloud Data Flow for Apache YARN project
4. Spring Cloud Data Flow
5. Spring Cloud Stream
6. Spring Cloud Task
III. Architecture
7. Introduction
8. Microservice Architectural Style
8.1. Comparison to other Platform architectures
9. Streaming Applications
9.1. Imperative Programming Model
9.2. Functional Programming Model
10. Streams
10.1. Topologies
10.2. Concurrency
10.3. Partitioning
10.4. Message Delivery Guarantees
11. Analytics
12. Task Applications
13. Data Flow Server
13.1. Endpoints
13.2. Customization
13.3. Security
14. Runtime
14.1. Fault Tolerance
14.2. Resource Management
14.3. Scaling at runtime
14.4. Application Versioning
IV. Spring Cloud Data Flow Runtime
15. Deploying on YARN
15.1. Prerequisites
15.2. Download and Extract Distribution
15.3. Configure Settings
15.4. Start Server
15.5. Connect Shell
15.6. Register Applications
15.6.1. Sourcing Applications from HDFS
15.7. Create Stream
15.8. Create Task
15.9. Using YARN CLI
15.9.1. Check YARN App Statuses
15.9.2. Push Apps
16. Deploying on AMBARI
16.1. Install Ambari Server
16.2. Deploy Data Flow
16.3. Using Configuration
16.3.1. Change Datasource
17. Configuring Runtime Settings and Environment
17.1. Generic App Settings
17.2. Configuring Application Resources
17.3. Configure Base Directory
17.4. Pre-populate Applications
17.5. Configure Logging
17.6. Global YARN Memory Settings
17.7. Configure Kerberos
17.7.1. Working with Kerberized Kafka
17.8. Configure HDFS HA
17.9. Configure Database
18. How YARN Deployment Works
19. Troubleshooting
20. Using Sandboxes
20.1. Hortonworks Sandbox
V. Streams
21. Introduction
22. Stream DSL
23. Register a Stream App
23.1. Whitelisting application properties
24. Creating a Stream
24.1. Application properties
24.1.1. Passing application properties when creating a stream
24.1.2. Passing application properties when deploying a stream
24.1.3. Passing stream partition properties during stream deployment
24.1.4. Overriding application properties during stream deployment
24.2. Deployment properties
24.2.1. Passing instance count as deployment property
24.2.2. Inline vs file reference properties
25. Destroying a Stream
26. Deploying and Undeploying Streams
27. Other Source and Sink Application Types
28. Simple Stream Processing
29. Stateful Stream Processing
30. Tap a Stream
31. Using Labels in a Stream
32. Explicit Broker Destinations in a Stream
33. Directed Graphs in a Stream
33.1. Common application properties
34. Stream applications with multiple binder configurations
VI. Tasks
35. Introducing Spring Cloud Task
36. The Lifecycle of a task
36.1. Registering a Task Application
36.2. Creating a Task
36.3. Launching a Task
36.4. Reviewing Task Executions
36.5. Destroying a Task
37. Task Repository
37.1. Configuring the Task Execution Repository
37.1.1. Local
37.2. Datasource
38. Subscribing to Task/Batch Events
39. Launching Tasks from a Stream
39.1. TriggerTask
39.2. Translator
VII. Dashboard
40. Introduction
41. Apps
42. Runtime
43. Streams
44. Create Stream
45. Tasks
45.1. Apps
45.1.1. Create a Task Definition from a selected Task App
45.1.2. View Task App Details
45.2. Definitions
45.2.1. Launching Tasks
45.3. Executions
46. Jobs
46.1. List job executions
46.1.1. Job execution details
46.1.2. Step execution details
46.1.3. Step Execution Progress
47. Analytics
VIII. ‘How-to’ guides
48. Configure Maven Properties
IX. Appendices
A. Migrating from Spring XD to Spring Cloud Data Flow
A.1. Terminology Changes
A.2. Modules to Applications
A.2.1. Custom Applications
A.2.2. Application Registration
A.2.3. Application Properties
A.3. Message Bus to Binders
A.3.1. Message Bus
A.3.2. Binders
A.3.3. Named Channels
A.3.4. Directed Graphs
A.4. Batch to Tasks
A.5. Shell/DSL Commands
A.6. REST-API
A.7. UI / Flo
A.8. Architecture Components
A.8.1. ZooKeeper
A.8.2. RDBMS
A.8.3. Redis
A.8.4. Cluster Topology
A.9. Central Configuration
A.10. Distribution
A.11. Hadoop Distribution Compatibility
A.12. YARN Deployment
A.13. Use Case Comparison
A.13.1. Use Case #1
A.13.2. Use Case #2
A.13.3. Use Case #3
B. Building
B.1. Documentation
B.2. Working with the code
B.2.1. Importing into Eclipse with m2eclipse
B.2.2. Importing into Eclipse without m2eclipse
C. Contributing
C.1. Sign the Contributor License Agreement
C.2. Code Conventions and Housekeeping