
AWS Step Functions – Doing Serverless is Easier Than You Think


Editor’s Note: This post was originally published in January 2019 and has been updated in July 2020.

AWS Step Functions are a powerful tool for building dynamic state machines that control the flow of your serverless application. Featuring deep integration with AWS Lambda, Step Functions are critical in managing application workflows for serverless software.

In this article, we’ll briefly review step functions, including some of their more advanced features. We’ll then take a look at a few new features before closing out with some tips on pricing and links to additional resources.

A Review of Step Functions

AWS Step Functions let you build a state-based control flow mechanism using easy-to-read JSON objects. These objects, written in Amazon States Language, define a series of states and state transitions, linking the outputs of each state to the inputs of the states that follow. This gives you greater flexibility and makes establishing control flows in your application less complicated: you can build complex batch-processing systems without having to design the scalable infrastructure a more traditional approach would require.
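For orientation, here is a minimal sketch of a complete state machine; the "HelloWorld" state name and the YOUR-FUNCTION-ARN placeholder are illustrative stand-ins, not part of any real application:

{
  "Comment": "A minimal one-state machine that invokes a single Lambda function",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "YOUR-FUNCTION-ARN",
      "End": true
    }
  }
}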

Step functions also provide several support mechanisms for when things go wrong. These include dynamic error handling and routing, automatic retries, and custom routing based on function outputs.

Dynamic Error Handling

Below is a fairly straightforward state machine with some simple error handling:

"HelloWorld": {    "Type": "Task",    "Resource": "YOUR-FUNCTION-ARN",    "Catch": [{            "ErrorEquals": ["CustomError"],            "Next": "CustomErrorFallback"        },        {            "ErrorEquals": ["States.TaskFailed"],            "Next": "ReservedTypeFallback"        },        {            "ErrorEquals": ["States.ALL"],            "Next": "CatchAllFallback"         }     ],     "End": true}

Source: Thundra Blog Article

In the sample code above, we define a task called "HelloWorld" that invokes the Lambda function at "YOUR-FUNCTION-ARN". Potential future states from this task depend on the function output. If your Lambda function ends successfully, then all is well! Your state machine completes successfully, letting your application move on to the next phase of its operation.

But when things go wrong, the flexibility of step-function error handling really begins to shine. The above definition includes three potential fallback scenarios based on the function output. This lets you respond easily to failures when they occur and build custom remediation flows for your data before sending it back through the state machine.
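To sketch how the automatic retries mentioned earlier combine with Catch, the same hypothetical "HelloWorld" task could retry transient failures before falling through to a handler; the interval, backoff, and attempt values here are illustrative:

"HelloWorld": {
    "Comment": "Retries run first; Catch fires only once the retries are exhausted",
    "Type": "Task",
    "Resource": "YOUR-FUNCTION-ARN",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0
        }
    ],
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "Next": "CatchAllFallback"
        }
    ],
    "End": true
}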

Dynamic Parallelism Improves Throughput

A significant change since our last article has been the introduction of dynamic parallelism to AWS Step Functions. Dynamic parallelism defines a new state type, Map, which requires an Iterator, a complete sub-flow defined in your state machine. The Map type also lets you set a maximum concurrency, giving you full control over the rate at which items are processed. This can prove crucial in batch workflows, where large numbers of records are processed as a set rather than one at a time.

Walking Through an Example

To really understand the power of dynamic parallelism, let’s look at an example of a step function flow, pulled from the launch article on the AWS blog:

{  "StartAt": "ValidatePayment",  "States": {    "ValidatePayment": {      "Type": "Task",      "Resource": VALIDATE-PAYMENT-ARN",      "Next": "CheckPayment"    },    "CheckPayment": {      "Type": "Choice",      "Choices": [        {          "Not": {            "Variable": "$.payment",            "StringEquals": "Ok"          },          "Next": "PaymentFailed"        }      ],      "Default": "ProcessAllItems"    },    "PaymentFailed": {      "Type": "Task",      "Resource": "PAYMENT-FAILED-ARN",      "End": true    },    "ProcessAllItems": {      "Type": "Map",      "InputPath": "$.detail",      "ItemsPath": "$.items",      "MaxConcurrency": 3,      "Iterator": {        "StartAt": "CheckAvailability",        "States": {          "CheckAvailability": {            "Type": "Task",            "Resource": "CHECK-AVAILABILITY-ARN",            "Retry": [              {                "ErrorEquals": [                  "TimeOut"                ],                "IntervalSeconds": 1,                "BackoffRate": 2,                "MaxAttempts": 3              }            ],            "Next": "PrepareForDelivery"          },          "PrepareForDelivery": {            "Type": "Task",            "Resource": "PREPARE-FOR-DELIVERY-ARN",            "Next": "StartDelivery"          },          "StartDelivery": {            "Type": "Task",            "Resource": "START-DELIVERY-ARN",            "End": true          }        }      },      "ResultPath": "$.detail.processedItems",      "Next": "SendOrderSummary"    },    "SendOrderSummary": {      "Type": "Task",      "InputPath": "$.detail.processedItems",      "Resource": "SEND-ORDER-SUMMARY-ARN",      "ResultPath": "$.detail.summary",      "End": true    }  }}

Source: AWS Blog

The key definition here is the one for ProcessAllItems. This state defines a sub-state flow that processes up to three items concurrently (via MaxConcurrency). Each of the defined substates is a fully featured state compliant with Amazon States Language, giving you the flexibility to run concurrent workflows on large batches of data.

In the example above, each record is passed through the sub-state machine starting at the state CheckAvailability. This function has a retry mechanism built in; upon success, it passes the record to the state PrepareForDelivery.

This is a simple passthrough function call that adds information to the order item before passing it to the StartDelivery end-state. The results are then collated into the item $.detail.processedItems, which we use in SendOrderSummary as input to generate a user-readable summary table for the order.

With the above small definition, you have a robust availability and fulfillment flow that can efficiently operate on a large batch, intelligently routing the results into a collection you can use to report on aggregate behavior. And the only code you have to write is the functional logic of each step; the orchestration itself is declarative. This powerful improvement to AWS Step Functions opens up additional flexibility via a simple JSON interface.

More Flexibility With New Features

In addition to dynamic parallelism, there are a few additional features worth exploring: Callback Patterns, Nested Workflows, and CodePipeline integration.

Callback Patterns

Callback patterns allow for deeper integration with third-party services as well as enhanced capabilities when working with human-driven interactions. Callback patterns pause your step-function state machines mid-flow while you contact third parties or await human interaction. Once the desired step has been completed, you can resume the state machine via a call to the AWS Step Functions API (SendTaskSuccess or SendTaskFailure), passing back the task token issued when the state paused.
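As a rough sketch of the pattern, a task state uses a service integration with the .waitForTaskToken suffix and hands out the token from the context object; the execution then pauses at that state until the token comes back. The queue URL and state names below are hypothetical:

"WaitForApproval": {
    "Comment": "Execution pauses here until SendTaskSuccess or SendTaskFailure returns the token",
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "YOUR-QUEUE-URL",
        "MessageBody": {
            "order.$": "$",
            "taskToken.$": "$$.Task.Token"
        }
    },
    "Next": "ProcessApproval"
}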

Callback patterns are supported by a number of AWS services, such as Amazon SQS, Amazon SNS, and others. The steps of your workflow can also exist anywhere, letting you easily tie disparate functions together into a cohesive whole via a simple JSON state definition.

Nested Workflows

Nested workflows, launched in August of 2019, allow for highly configurable sub-flows of steps that are managed as singular units, instead of being replicated across multiple different states in your Amazon States Language definitions. Nested workflows construct reusable component flows that can be included in a step function as a single state, reducing configuration complexity by making the individual elements of your step functions easier to categorize and organize.
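A quick sketch of what this looks like in a parent definition: a task state uses the states:startExecution.sync service integration to start a child execution and wait for it to finish. The state name and the CHILD-STATE-MACHINE-ARN placeholder are hypothetical:

"RunChildWorkflow": {
    "Comment": "Starts the child state machine and blocks until it completes",
    "Type": "Task",
    "Resource": "arn:aws:states:::states:startExecution.sync",
    "Parameters": {
        "StateMachineArn": "CHILD-STATE-MACHINE-ARN",
        "Input": {
            "order.$": "$.order"
        }
    },
    "End": true
}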

CodePipeline Integration

As most applications grow, their complexity scales as well, hopefully in a linear fashion, not a geometric one. This increase in complexity often comes with more complex deployment mechanisms that require a finely tuned approach to keep the ensemble running smoothly.

In support of this, AWS added a deeper integration between Step Functions and AWS CodePipeline, allowing for a more fine-grained approach to your CodePipeline-driven deployments. By leveraging the new CodePipeline action type, you can trigger a custom step-function state machine that handles the complexity of your deployment environment.
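As a hedged sketch, an action in a pipeline definition that invokes a state machine might look roughly like this; the action name, ARN placeholder, and execution prefix are illustrative:

{
    "name": "InvokeDeploymentStateMachine",
    "actionTypeId": {
        "category": "Invoke",
        "owner": "AWS",
        "provider": "StepFunctions",
        "version": "1"
    },
    "configuration": {
        "StateMachineArn": "YOUR-STATE-MACHINE-ARN",
        "ExecutionNamePrefix": "deploy"
    },
    "runOrder": 1
}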

An Important Note on Step Function Pricing

When proposing any technical solution to a problem, the first and most important question raised is “How much will it cost?” Step functions are billed by the number of state transitions executed each month. The first 4,000 state transitions each month are free; thereafter, you are charged $0.025 per 1,000 state transitions.
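To make that concrete with a hypothetical example: a workflow with five state transitions per record, processing one million records in a month, generates five million transitions. After the 4,000-transition free tier, that bills at $0.025 per 1,000 transitions, or roughly $125 for the month, before any Lambda compute charges.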

If you’re looking to process millions of records a month through Step Functions, your costs will scale accordingly, and this will be on top of any data-processing and compute costs incurred by your step-function state machine’s Lambda functions. It is important to be aware of the general level of activity in your states, particularly when using powerful tools like dynamic parallelism and nested workflows, which simplify your workflow definitions but multiply the underlying state transitions. These transitions are counted individually and can add up quickly.

There are alternative mechanisms available. One of the powerful features of step functions is that they let you progressively build up intricate synchronous data workflows to match your business specifications.

But if you can handle processing your data in an asynchronous fashion, you can likely save a lot of money—and time—by evaluating event-based flows using tools like Amazon’s Simple Queue Service or Simple Notification Service. These event-driven tools can serve your asynchronous data modification pipeline by replacing simple state flows with event queues and event consumers. It’s important to keep in mind: You may not actually need the complexity of AWS Step Functions to achieve your goals.

Getting More Information on Step Functions

We wanted to close out our review of step functions with a collection of resources that you may find useful:

Stepping Forward

AWS Step Functions have continued to grow their capabilities, providing deeper support for complexity and new features that enhance throughput and increase modularity. Dynamic parallelism adds a new way to improve information flow in your step functions, letting you control the degree to which your batch processing is parallelized via simple JSON state definitions.

Other new features like nested workflows and callback patterns allow for more complex pipeline integrations. The ability to integrate with CodePipeline is a powerful tool for managing infrastructure configuration complexity as your applications scale. However, this can come at a cost, namely the sheer number of state transitions you need to achieve your goals.

Third-party tools like Thundra can help you manage this complexity by providing metrics and monitoring of your step functions in an easy-to-use dashboard, letting you build the metrics required to manage the complexity you need.

*** This is a Security Bloggers Network syndicated blog from Thundra blog authored by Mustafa Motani. Read the original post at: https://blog.thundra.io/aws-step-functions