E2E Load Testing 🌡️
« Application failing to scale up to meet a sudden surge of requests. »
Why
Load testing is critical to ensure systems can handle peak traffic and to prevent outages during major events that drive a significant part of business earnings (Black Friday sales, breaking news, etc.).
Requirements
- Engineering standards SHOULD define standardized event/request/message headers, including how to flag test events vs. real events. This prevents test traffic from propagating externally to a third-party vendor.
- Teams MUST create dedicated QA / load testing environments. These don’t need to run 24/7 and SHOULD be provisioned on demand.
- Teams SHOULD use this exercise as an opportunity to review all parts of their applications: scalability, monitoring, alerts, retry strategies, error handling.
- Load test results MUST be shared across teams to identify bottlenecks, so that teams can optimize and scale their services accordingly.
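As an illustration of the first requirement, here is a minimal sketch of how a service could use a test flag to keep load test traffic away from a third-party vendor. The header name `x-load-test`, its `"true"` value, and the `send` callable are assumptions made for this example; the real convention would come from your engineering standards.

```python
# Hypothetical header name: the real one would be defined by the engineering standards.
LOAD_TEST_HEADER = "x-load-test"


def is_test_event(headers: dict[str, str]) -> bool:
    """Return True when the event carries the load test flag."""
    # Header names are compared case-insensitively, values are trimmed.
    normalized = {key.lower(): value.strip().lower() for key, value in headers.items()}
    return normalized.get(LOAD_TEST_HEADER) == "true"


def forward_to_vendor(event: dict, headers: dict[str, str], send) -> bool:
    """Call the vendor only for real events and report whether the call was made.

    `send` is a placeholder for whatever client talks to the third-party vendor.
    """
    if is_test_event(headers):
        return False  # test traffic stops here and never leaves the system
    send(event)
    return True
```

The same check can be applied at every boundary where an event would leave your own systems (payment providers, email gateways, analytics vendors, and so on).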
Process
- Identify end-to-end workflows (input and expected output)
- Estimate throughput with safety margins, based on data from the previous year and the growth observed in the current year
- Schedule the segments that need to be tested
- Prepare a load testing dashboard to narrow down issues quickly
- Send reminders and heads-ups, and set up a communication channel for sharing test results
- Repeat the scenario until it’s successful
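The throughput estimate in the second step can be sketched as simple arithmetic. The function below is only an illustration: the parameter names and the idea of expressing growth and safety margin as fractions are assumptions, and the numbers in the usage note are made up.

```python
import math


def target_throughput(last_year_peak_rps: float, growth_rate: float, safety_margin: float) -> int:
    """Estimate the requests per second the load test should sustain.

    growth_rate and safety_margin are fractions (0.25 means +25%).
    """
    if last_year_peak_rps < 0 or growth_rate < -1 or safety_margin < 0:
        raise ValueError("inputs out of range")
    # Project last year's peak onto this year's traffic level.
    expected_peak = last_year_peak_rps * (1 + growth_rate)
    # Add headroom on top of the projection and round up.
    return math.ceil(expected_peak * (1 + safety_margin))
```

For example, with a peak of 2000 requests per second last year, 25% growth, and a 50% safety margin, `target_throughput(2000, 0.25, 0.5)` returns `3750`.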
Making it a self-service platform
In some companies, every team is expected to implement these load tests on its own. This often leads to copy-pasted solutions, a diverging set of tools, and significant extra work to fit into project planning.
Instead, a platform can be built that focuses on:
- Being extensible and supporting any I/O (HTTP, Kafka, Pub/Sub, …): you select your endpoint / topic / …
- Accepting any payload type, which can be:
  - a human-readable format like JSON
  - a file containing bytes (Avro, Protobuf, etc…)
- Selecting a load test mode, for example pre-populating a topic with X events or having it continuously receive a certain throughput.
- Validating the output: either a trivial validation focused on throughput, or a semantic validation of the payload with regexes.
- (Optional) Sending alerts to a channel, if needed, when a load test is scheduled, starts, is interrupted, or ends, along with its results.
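To make these capabilities more concrete, here is a rough sketch of what a load test definition and its output validation could look like on such a platform. Everything in it (`Mode`, `LoadTestSpec`, `validate`, and all field names) is hypothetical and only illustrates the ideas listed above; it does not send any traffic itself.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    PREPOPULATE = "prepopulate"  # push a fixed number of events, then stop
    CONTINUOUS = "continuous"    # hold a target throughput for a duration


@dataclass(frozen=True)
class LoadTestSpec:
    target: str                           # endpoint or topic chosen by the user
    payload: bytes                        # JSON text or binary (Avro, Protobuf), sent as-is
    mode: Mode
    event_count: int = 0                  # used by PREPOPULATE
    events_per_second: float = 0.0        # used by CONTINUOUS
    duration_seconds: float = 0.0         # used by CONTINUOUS
    output_pattern: Optional[str] = None  # optional regex each output must match

    def total_events(self) -> int:
        """Number of events the run is expected to produce."""
        if self.mode is Mode.PREPOPULATE:
            return self.event_count
        return int(self.events_per_second * self.duration_seconds)


def validate(spec: LoadTestSpec, outputs: list[str], elapsed_seconds: float) -> list[str]:
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    # Trivial validation: did we observe as many outputs as expected?
    if len(outputs) < spec.total_events():
        failures.append(f"expected {spec.total_events()} outputs, got {len(outputs)}")
    # Throughput validation only makes sense for continuous runs.
    if spec.mode is Mode.CONTINUOUS and elapsed_seconds > 0:
        observed = len(outputs) / elapsed_seconds
        if observed < spec.events_per_second:
            failures.append(f"throughput {observed:.1f}/s below target {spec.events_per_second:.1f}/s")
    # Semantic validation: every output must match the regex, when one is given.
    if spec.output_pattern is not None:
        pattern = re.compile(spec.output_pattern)
        mismatches = sum(1 for output in outputs if not pattern.search(output))
        if mismatches:
            failures.append(f"{mismatches} outputs did not match the expected pattern")
    return failures
```

Keeping the definition declarative like this is one possible design choice: teams describe what to send and what to expect, while the platform owns the connectors for each I/O type and the alerting.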
Tools
and others like Artillery.io, JMeter, Gatling, etc…