Important Considerations For Building With Serverless
The increasing abstraction of a Serverless solution requires organizations to update their operating
models, taking different approaches and selecting different tools to manage these new
deployments. The Serverless community is working hard to address the following considerations.
Networking in Serverless is abstracted by the cloud into Events and Triggers, removing any
access for users beyond what is provided in environment variables. Functions run in multi-tenanted
containers, which incur a high overhead when binding to a private network interface. Event-driven architectures
have a big impact on networking, and we recommend that Cloud managers consider how to access data currently
residing in VPC-based deployments, and how to optimize architectures for Serverless workloads.
Security is also impacted by the move from networks to Events: existing security tools
like NIDS and WAFs rely on inspecting networks and packets, which are not present in Serverless. Functions
are typically daisy-chained together with other services and resources, which raises the question of how to
manage the combinations of events and services and block execution paths that are not permitted. Best practices are to apply
per-Function least-privilege access policies, which must be managed and updated as new service calls
are added to code releases. We recommend a proof of concept to evaluate threats and new tooling.
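As a minimal sketch of the per-Function least-privilege practice described above, the following Python checks the service calls in a release against that Function's allow-list. The policy model, the Function name and the service and action names are all hypothetical; real cloud providers express such policies in their own formats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionPolicy:
    """Allow-list of (service, action) pairs for one Function (hypothetical model)."""
    function_name: str
    allowed: frozenset = field(default_factory=frozenset)

    def permits(self, service: str, action: str) -> bool:
        # Anything not explicitly listed is denied.
        return (service, action) in self.allowed


def undeclared_calls(policy, calls):
    """Return the service calls made by a release that the policy does not cover."""
    return sorted(set(calls) - policy.allowed)


# Hypothetical policy: this Function may only read and write one object store.
resize_policy = FunctionPolicy(
    "resize-image",
    frozenset({("object-store", "get"), ("object-store", "put")}),
)

# Calls found in a new code release (assumed to be collected by some other tool).
release_calls = [
    ("object-store", "get"),
    ("object-store", "put"),
    ("queue", "publish"),
]
missing = undeclared_calls(resize_policy, release_calls)
```

A check of this kind could run at release time, so that a newly added service call either gets an explicit policy update or blocks the deployment.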
Observability in Serverless solutions is the responsibility of the end user, who must create detailed
insights into operational health and end-to-end processes. Public cloud logging is eventually consistent,
with delays of seconds to several minutes before the operational data required to understand
and fix issues becomes available. Functions run in highly restricted environments with common performance tools removed,
so adopters must consider Function-specific monitoring solutions to provide operational insights.
Debugging of production Function code and live data collection are essential for developers
and site reliability engineers working to improve customer experiences. Public cloud Functions execute
in multi-tenanted containers, and require logging of data state at key code breakpoints. New tools are
required to provide streaming logs and insights into data state and metrics in running Functions.
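One way to log data state at key points in a Function, sketched here with only the Python standard library, is to emit one JSON line per checkpoint. The `checkpoint` helper, the `handler` signature and the event shape are assumptions made for illustration, not part of any provider's API.

```python
import json
import logging
import sys
import time

logger = logging.getLogger("function")
logger.setLevel(logging.INFO)
if not logger.handlers:
    # Write to stdout, which hosted Function runtimes commonly capture as logs.
    logger.addHandler(logging.StreamHandler(sys.stdout))


def checkpoint(request_id, name, **state):
    """Emit one JSON log line capturing data state at a named point in the code."""
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "checkpoint": name,
        "state": state,
    }
    line = json.dumps(record, default=str, sort_keys=True)
    logger.info(line)
    return line


def handler(event, request_id="req-1"):
    """Hypothetical Function body with two logged checkpoints."""
    checkpoint(request_id, "received", keys=sorted(event))
    total = sum(item["price"] * item["qty"] for item in event["items"])
    checkpoint(request_id, "priced", total=total, item_count=len(event["items"]))
    return {"total": total}
```

Tagging every line with a request identifier lets a log search reassemble the state of a single invocation even when log delivery is delayed.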
Testing code in a Serverless solution also requires a change in approach: Function
code is simple, but the interactions between many functions, services and events in a highly distributed
stateless system are complex. Best practices are for Functions to perform a single purpose and to be written as
a few hundred lines of code, so unit testing becomes relatively simple. Outside of the Function,
there are combinations of events, data, workflows and service interactions to be tested. The emphasis
of serverless testing shifts from code units to end-to-end calls for chains of events across a set
of distributed services.
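The contrast between the two kinds of test can be sketched as follows. The two Functions, the topic names and the `InMemoryBus` class are hypothetical stand-ins for real triggers and services; an actual end-to-end test would exercise the deployed event chain.

```python
from collections import defaultdict, deque


def apply_discount(event):
    """Single-purpose Function: add a discounted payable amount to an order event."""
    total = event["total"]
    rate = 0.1 if total >= 100 else 0.0
    return {**event, "payable": round(total * (1 - rate), 2)}


def build_receipt(event):
    """Single-purpose Function: turn a priced order into a receipt event."""
    text = f"Order {event['order_id']}: {event['payable']:.2f}"
    return {"order_id": event["order_id"], "receipt": text}


class InMemoryBus:
    """Hypothetical stand-in for managed triggers: routes outputs to the next topic."""

    def __init__(self):
        self.routes = defaultdict(list)     # topic -> [(function, output_topic)]
        self.delivered = defaultdict(list)  # topic -> events seen on that topic

    def subscribe(self, topic, function, output_topic):
        self.routes[topic].append((function, output_topic))

    def publish(self, topic, event):
        queue = deque([(topic, event)])
        while queue:
            current_topic, current_event = queue.popleft()
            self.delivered[current_topic].append(current_event)
            for function, output_topic in self.routes[current_topic]:
                queue.append((output_topic, function(current_event)))


def test_unit():
    # Unit level: one Function, one input, one output.
    assert apply_discount({"order_id": 1, "total": 200})["payable"] == 180.0
    assert apply_discount({"order_id": 2, "total": 50})["payable"] == 50


def test_end_to_end():
    # Chain level: assert on what arrives at the end of the event chain.
    bus = InMemoryBus()
    bus.subscribe("orders", apply_discount, "priced")
    bus.subscribe("priced", build_receipt, "receipts")
    bus.publish("orders", {"order_id": 7, "total": 120})
    assert bus.delivered["receipts"] == [{"order_id": 7, "receipt": "Order 7: 108.00"}]
```

The unit test stays small because each Function does one thing; most of the test effort moves to the second test, which asserts on the outcome of the whole chain.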
Team roles evolve when moving to Serverless: the focus of the team shifts away from managing
infrastructure deployments, which increases the amount of time the team can spend on feature development
and Site Reliability Engineering (SRE) activities. The SRE role is an evolution beyond the DevOps transformation,
where operations engineers spend their shift time coding application improvements and fixing bugs in real time,
rather than ensuring infrastructure uptime. This process becomes a virtuous cycle, where SREs act
as a flywheel for long-term reliability improvements in code, and reduce the operational support required
for a well-instrumented and automated workload.
Runaway costs occur when developers accidentally create infinite event loops between invocations,
where triggers cause code to execute continuously. A common infinite loop is event sourcing from a
datastore, where a Function writes to the table, which triggers a stream to emit a new event, which
in turn triggers the initial Function to run and loop infinitely [7]. Serverless adopters must
actively manage concurrency limits across all services, activate billing
alarms, and establish a circuit breaker for a deployment to buy some time to debug and release code