Looking Back on 5 Years of using Serverless Infrastructure on AWS
How we've kept our infrastructure cheap to run and fast to build, by going fully serverless on AWS in 2019.
NearSt was relatively early in the trend of going 100% serverless. By 2019, 99% of our infrastructure was running on AWS Lambda, DynamoDB and Kinesis.
What initially attracted us to AWS's serverless offering was the low starting cost, and the promise of scalability and uptime that comes with managed services. This has allowed NearSt to grow to process billions of stock updates from thousands of retailers around the world, with 99.99% uptime for our inventory processing pipeline, without any full-time DevOps engineers.
Over time we learned a lot about how to use the different serverless AWS services more effectively, from both an engineering and a cost-management perspective. We've also added many tools to our tool belt, and now use services like Step Functions, EventBridge, Glue and many more on a daily basis to power our tools for retailers.
Cost management
One of the things that serverless infrastructure has really allowed us to do is optimise our cloud costs. Even with billions of daily events happening across our platform, our monthly bill is in the low four-figure range.
I know that lead engineers who aren't used to the granular, usage-based billing that serverless offers often see the complexity of AWS billing as a downside, but we tend to look at it as an opportunity.
Because of AWS's breadth of services, there are always multiple ways of implementing a new feature, and as engineers we spend some time discussing the cost, complexity, testability and observability of each method to figure out what the right trade-off is. This allows us to be mindful about whether we want to build a cheap, quick experiment or a solid core production service that we're willing to spend a bit more of our budget on.
Tools to manage this billing complexity have also come a long way in the last few years. The AWS Console now offers a whole range of cost management and reporting tools, and tools like Vantage are a godsend for drilling down further and finding parts of your infrastructure that can be further cost-optimised.
Scalability and uptime
One of the great things about our serverless setup is that we spend very little time thinking about scalability or uptime. There are no back-ends running on servers that can crash (at least not within our management), and there are no databases that get overwhelmed by requests.
Beyond some fine-tuning of our batch sizes and the rates at which we send events between services, there is very little DevOps work involved in keeping the platform up and running. We scaled from a few thousand products to millions without needing to spend much time thinking about the underlying infrastructure. As long as you follow the AWS Well-Architected Framework principles, you are well on your way to building scalable software.
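To give an idea of what tuning a batch size looks like in practice, here is a minimal sketch of a helper that splits a stream of events into fixed-size batches before they are handed to the next service. The names `batched` and `KINESIS_MAX_BATCH` are made up for this example and are not taken from our codebase. The default of 500 reflects the per-request record limit of the Kinesis `PutRecords` API, and the right value for any given service is something you find by measuring.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Kinesis PutRecords accepts at most 500 records per request.
# The constant name is hypothetical, chosen for this example.
KINESIS_MAX_BATCH = 500


def batched(items: Iterable[T], batch_size: int = KINESIS_MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # 1,200 events are split into batches of 500, 500 and 200.
    for batch in batched(range(1200)):
        print(len(batch))
```

Because the batch size is a single parameter, changing it is a one-line experiment rather than a redesign.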
Prototyping
The other amazing thing about serverless is that it is a library of building blocks.
Once you've gained some basic understanding of the different core elements, it is very easy to quickly chain them together to build prototypes. I can spin up a new API with some basic database storage in a few minutes, and deploy it to a public endpoint immediately, allowing for some really quick testing of new concepts.
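As a rough illustration of how small such a prototype can be, here is a sketch of a Lambda handler that stores and fetches items. It assumes an API Gateway HTTP API (or Lambda function URL) event in payload format 2.0 and a DynamoDB table whose partition key is called `id`. The `make_handler` name, the `TABLE_NAME` environment variable and the key name are all assumptions made for this example.

```python
import json


def make_handler(table):
    """Build a Lambda handler around a DynamoDB-like table object.

    `table` only needs put_item(Item=...) and get_item(Key=...), which is
    the shape of a boto3 DynamoDB Table resource. Passing it in keeps the
    handler easy to test without touching AWS.
    """

    def respond(status, payload):
        # default=str keeps Decimal values returned by DynamoDB serialisable.
        return {"statusCode": status, "body": json.dumps(payload, default=str)}

    def handler(event, context=None):
        method = event.get("requestContext", {}).get("http", {}).get("method")
        if method == "POST":
            item = json.loads(event.get("body") or "{}")
            if "id" not in item:
                return respond(400, {"error": "id is required"})
            table.put_item(Item=item)
            return respond(201, item)
        if method == "GET":
            item_id = (event.get("pathParameters") or {}).get("id")
            found = table.get_item(Key={"id": item_id}).get("Item")
            if found is None:
                return respond(404, {"error": "not found"})
            return respond(200, found)
        return respond(405, {"error": "method not allowed"})

    return handler


# In a deployed function the wiring would look roughly like this
# (TABLE_NAME is a hypothetical environment variable):
#
#   import os, boto3
#   handler = make_handler(boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"]))
```

The table and the public endpoint themselves would be declared in whichever Infrastructure-as-Code tool you already use. The handler above is only the application half of the prototype.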
This is a real shift in thinking that developers new to serverless often have to get used to - cloud infrastructure is cheap, and can be torn down easily, so the best way to try something new is to deploy it and see what happens.
AWS Organizations makes it easy to set up a separate environment for each of your devs to play in, with real cloud resources in a real production-like environment, whilst retaining top-level control over security and costs.
Whilst we're still small enough to set these environments up manually, there's a way to fully automate this process by creating new AWS accounts through CloudFormation, allowing you to manage dev accounts through Infrastructure-as-Code, like any other cloud resource you create in those accounts.
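Since we haven't automated this ourselves, the following is only a sketch of what it could look like: a small script that generates a CloudFormation template with one `AWS::Organizations::Account` resource per developer. The function name, the account naming scheme, the email pattern and the OU id in the usage example are all hypothetical. As far as I know, a template like this has to be deployed from the organisation's management account.

```python
import json
import re


def dev_account_template(developers, email_domain, parent_ou_id):
    """Return a CloudFormation template (as a dict) with one account per developer."""
    resources = {}
    previous_id = None
    for name in developers:
        # Logical IDs must be alphanumeric, so strip everything else.
        logical_id = "DevAccount" + re.sub(r"[^A-Za-z0-9]", "", name.title())
        resource = {
            "Type": "AWS::Organizations::Account",
            # Retain the account if the resource is ever removed from the
            # template, so a template change cannot close a real account.
            "DeletionPolicy": "Retain",
            "Properties": {
                "AccountName": f"dev-{name}",
                # Every AWS account needs its own unique email address.
                "Email": f"aws-dev-{name}@{email_domain}",
                "ParentIds": [parent_ou_id],
            },
        }
        # Chain the accounts with DependsOn so they are created one at a time.
        if previous_id is not None:
            resource["DependsOn"] = previous_id
        resources[logical_id] = resource
        previous_id = logical_id
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    # All values below are placeholders.
    template = dev_account_template(["alice", "bob"], "example.com", "ou-abcd-12345678")
    print(json.dumps(template, indent=2))
```

Adding a developer then becomes a one-line change to the list, reviewed and deployed like any other infrastructure change.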
Integrations
What makes these prototypes, and production applications, so great to build is the event-driven nature of the platform, and the way services integrate with each other.
Services like AWS Step Functions make a big difference here. Step Functions allows you to chain together actions in over 200 AWS services, without writing much code to orchestrate those workflows. This is great for things like building a signup flow that executes many different steps, or taking a large file stored in S3 and processing it line-by-line, but it also handles simple workflows like getting the latest record from a database and sending a Slack message.
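To make that last example concrete, here is a sketch of what such a state machine definition could look like in Amazon States Language, built as a Python dict so it can be serialised to JSON. It assumes a DynamoDB table with a partition key named `pk` and a sort key that orders records by time, plus a separate Lambda function that posts to Slack. The function name, table name, key name and notifier function are hypothetical.

```python
import json


def latest_record_to_slack_definition(table_name, partition_key, notifier_function):
    """Return an Amazon States Language definition with two Task states."""
    return {
        "Comment": "Fetch the latest record and hand it to a Slack notifier",
        "StartAt": "GetLatestRecord",
        "States": {
            "GetLatestRecord": {
                "Type": "Task",
                # AWS SDK service integration: calls DynamoDB Query directly,
                # with no Lambda function in between.
                "Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
                "Parameters": {
                    "TableName": table_name,
                    "KeyConditionExpression": "pk = :pk",
                    "ExpressionAttributeValues": {":pk": {"S": partition_key}},
                    # Descending sort-key order with a limit of 1 returns
                    # the most recent item for this partition key.
                    "ScanIndexForward": False,
                    "Limit": 1,
                },
                "ResultPath": "$.latest",
                "Next": "SendSlackMessage",
            },
            "SendSlackMessage": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": notifier_function,
                    # This path fails at runtime if the query returned no
                    # items; a real workflow would add a Choice state first.
                    "Payload": {"record.$": "$.latest.Items[0]"},
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = latest_record_to_slack_definition("orders", "shop-123", "slack-notifier")
    print(json.dumps(definition, indent=2))
```

The only custom code left to write is the Slack notifier. The query, the sequencing and the hand-off between steps are all described in the definition.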
Trade-offs
Of course it's not all sunshine and rainbows. Serverless on AWS also comes with some trade-offs:
- Vendor lock-in: our codebase is highly dependent on the AWS SDK to function, and moving to another cloud vendor would require a major rebuild of many of the core components of our platform.
- Observability: with event-based infrastructure, having a clear overview of what's happening (e.g. what events are triggering what, where an event originated, etc.) can be difficult. Luckily, there are great third-party tools like Lumigo out there that can help make this a lot easier.
These trade-offs are currently more than worth it for us as a business, but of course as we continue to grow, we'll keep evaluating these and maintain a balance between quick and easy development and business value.