Spring Cloud Data Flow for Apache YARN

Sabby Anandan, Marius Bogoevici, Eric Bottard, Mark Fisher, Ilayaperumal Gopinathan, Gunnar Hillert, Mark Pollack, Patrick Peralta, Glenn Renfro, Thomas Risberg, Dave Syer, David Turanski, Janne Valkealahti

1.2.0.RELEASE

Copyright © 2013-2017 Pivotal Software, Inc.

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

I. Preface

1. About the documentation
2. Getting help

II. Introduction

3. Introducing the Spring Cloud Data Flow for Apache YARN project
4. Spring Cloud Data Flow
5. Spring Cloud Stream
6. Spring Cloud Task

III. Architecture

7. Introduction

8. Microservice Architectural Style

8.1. Comparison to other Platform architectures

9. Streaming Applications

9.1. Imperative Programming Model
9.2. Functional Programming Model

10. Streams

10.1. Topologies
10.2. Concurrency
10.3. Partitioning
10.4. Message Delivery Guarantees

11. Analytics

12. Task Applications

13. Data Flow Server

13.1. Endpoints
13.2. Customization
13.3. Security

14. Runtime

14.1. Fault Tolerance
14.2. Resource Management
14.3. Scaling at runtime
14.4. Application Versioning

IV. Spring Cloud Data Flow Runtime

15. Deploying on YARN

15.1. Prerequisites

15.2. Download and Extract Distribution

15.3. Configure Settings

15.4. Start Server

15.5. Connect Shell

15.6. Register Applications

15.6.1. Sourcing Applications from HDFS

15.7. Create Stream

15.8. Create Task

15.9. Using YARN CLI

15.9.1. Check YARN App Statuses
15.9.2. Push Apps

15.10. Using Metric Collectors

16. Deploying on Ambari

16.1. Install Ambari Server

16.2. Deploy Data Flow

16.3. Using Configuration

16.3.1. Change Datasource

17. Configuring Runtime Settings and Environment

17.1. Generic App Settings

17.2. Configuring Application Resources

17.3. Configure Base Directory

17.4. Pre-populate Applications

17.5. Configure Logging

17.6. Configure Metrics

17.7. Global YARN Memory Settings

17.8. Configure Kerberos

17.8.1. Working with Kerberized Kafka

17.9. Configure HDFS HA

17.10. Configure Database

17.11. Configure Network Discovery

18. How YARN Deployment Works

19. Troubleshooting

20. Using Sandboxes

20.1. Hortonworks Sandbox

V. Streams

21. Introduction

22. Stream DSL

23. Register a Stream App

23.1. Whitelisting application properties

23.2. Creating and using a dedicated metadata artifact

23.2.1. Using the companion artifact

24. Creating custom applications

25. Creating a Stream

25.1. Application properties

25.1.1. Passing application properties when creating a stream

25.2. Deployment properties

25.2.1. Application properties versus Deployer properties
25.2.2. Passing instance count as deployment property
25.2.3. Inline vs file reference properties
25.2.4. Passing application properties when deploying a stream
25.2.5. Passing Spring Cloud Stream properties for the application
25.2.6. Passing per-binding producer and consumer properties
25.2.7. Passing stream partition properties during stream deployment
25.2.8. Passing application content type properties
25.2.9. Overriding application properties during stream deployment

25.3. Common application properties

26. Destroying a Stream

27. Deploying and Undeploying Streams

28. Other Source and Sink Application Types

29. Simple Stream Processing

30. Stateful Stream Processing

31. Tap a Stream

32. Using Labels in a Stream

33. Explicit Broker Destinations in a Stream

34. Directed Graphs in a Stream

35. Stream applications with multiple binder configurations

VI. Tasks

36. Introducing Spring Cloud Task

37. The Lifecycle of a task

37.1. Creating a custom Task Application

37.2. Registering a Task Application

37.3. Creating a Task

37.4. Launching a Task

37.4.1. Common application properties

37.5. Reviewing Task Executions

37.6. Destroying a Task

38. Task Repository

38.1. Configuring the Task Execution Repository

38.1.1. Local
38.1.2. Task Application Repository

38.2. Datasource

39. Subscribing to Task/Batch Events

40. Launching Tasks from a Stream

40.1. TriggerTask
40.2. TaskLaunchRequest-transform

41. Composed Tasks

41.1. Configuring the Composed Task Runner in Spring Cloud Data Flow

41.1.1. Registering the Composed Task Runner application
41.1.2. Configuring the Composed Task Runner application

41.2. Creating, Launching, and Destroying a Composed Task

41.2.1. Creating a Composed Task

Task Application Parameters

41.2.2. Launching a Composed Task

Exit Statuses

41.2.3. Destroying a Composed Task

41.2.4. Stopping a Composed Task

41.2.5. Restarting a Composed Task

41.3. Composed Task DSL

41.3.1. Conditional Execution

41.3.2. Transitional Execution

Basic Transition
Transition With a Wildcard
Transition With a Following Conditional Execution

41.3.3. Split Execution

Split Containing Conditional Execution

VII. Dashboard

42. Introduction

43.1. Bulk Import of Applications

44. Runtime

45. Streams

46. Create Stream

47.1.1. Create a Task Definition from a selected Task App
47.1.2. View Task App Details

47.2. Definitions

47.2.1. Creating Task Definitions using the bulk define interface
47.2.2. Creating Composed Task Definitions
47.2.3. Launching Tasks

47.3. Executions

48.1. List job executions

48.1.1. Job execution details
48.1.2. Step execution details
48.1.3. Step Execution Progress

49. Analytics

VIII. ‘How-to’ guides

50. Configure Maven Properties

51. Logging

51.1. Deployment Logs
51.2. Application Logs

52. Frequently asked questions

52.1. Advanced SpEL expressions
52.2. How to use JDBC-sink?
52.3. How to use multiple message-binders?

IX. Appendices

A. Migrating from Spring XD to Spring Cloud Data Flow

A.1. Terminology Changes

A.2. Modules to Applications

A.2.1. Custom Applications
A.2.2. Application Registration
A.2.3. Application Properties

A.3. Message Bus to Binders

A.3.1. Message Bus
A.3.2. Binders
A.3.3. Named Channels
A.3.4. Directed Graphs

A.4. Batch to Tasks

A.5. Shell/DSL Commands

A.6. REST-API

A.7. UI / Flo

A.8. Architecture Components

A.8.1. ZooKeeper
A.8.2. RDBMS
A.8.3. Redis
A.8.4. Cluster Topology

A.9. Central Configuration

A.10. Distribution

A.11. Hadoop Distribution Compatibility

A.12. YARN Deployment

A.13. Use Case Comparison

A.13.1. Use Case #1
A.13.2. Use Case #2
A.13.3. Use Case #3

B. Building

B.1. Documentation

B.2. Working with the code

B.2.1. Importing into Eclipse with m2eclipse
B.2.2. Importing into Eclipse without m2eclipse

C. Contributing

C.1. Sign the Contributor License Agreement
C.2. Code Conventions and Housekeeping
