
E2E Load Testing 🌡️

Production On Fire « Application failing to scale up to meet a sudden surge of requests. »

Why

Load testing is critical to ensure systems can handle peak traffic and to prevent outages during major events that drive a significant share of business revenue (Black Friday sales, breaking news, etc.).

Requirements

  • Engineering standards SHOULD define standardized event/request/message headers, including how to flag test events vs real events. This prevents test traffic from propagating externally to a third-party vendor.
  • Teams MUST create dedicated QA / load testing environments. These environments don’t need to run 24/7 and SHOULD be provisioned on demand.
  • Teams SHOULD take the opportunity of this exercise to review all parts of their applications: scalability, monitoring, alerts, retry strategies, error handling.
  • Load test results MUST be shared across teams to identify bottlenecks, so that teams can optimize and scale their services accordingly.
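
As an illustration of the first requirement, here is a minimal sketch of how a service might honor a test flag before calling a third-party vendor. The header name `x-load-test`, its `"true"` value, and both function names are assumptions made for this example; the actual convention would be whatever your engineering standards define.

```python
from typing import Callable, Mapping

# Hypothetical header name; the real one comes from your engineering standards.
TEST_HEADER = "x-load-test"


def is_test_event(headers: Mapping[str, str]) -> bool:
    """Return True when the event carries the (assumed) load-test flag."""
    # Header names are compared case-insensitively, as in HTTP.
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get(TEST_HEADER, "").strip().lower() == "true"


def forward_to_vendor(
    headers: Mapping[str, str],
    payload: dict,
    send: Callable[[dict], None],
) -> bool:
    """Call the third-party vendor only for real events.

    Returns True if the payload was sent, False if it was skipped.
    """
    if is_test_event(headers):
        return False  # test traffic stops here and never reaches the vendor
    send(payload)
    return True
```

The flag only helps if every service in the workflow copies it onto the events it emits downstream; otherwise the last hop before the vendor cannot tell test traffic from real traffic.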

Process

  1. Identify end-to-end workflows (input and expected output)
  2. Estimate throughput with safety margins, based on the previous year’s data and the current year’s growth
  3. Schedule the different segments that need to be tested
  4. Prepare a load testing dashboard to quickly narrow down issues
  5. Send reminders and heads-ups, and set up a communication channel for sharing test results
  6. Repeat the scenario until it’s successful
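
Step 2 can be made concrete with a small calculation. The sketch below assumes one simple model: project last year's peak forward by the observed year-over-year growth, then pad the result with a safety margin. The function name, its parameters, and the figures in the usage example are illustrative only and do not come from real data.

```python
def target_throughput(
    last_year_peak_rps: float,
    yoy_growth: float,
    safety_margin: float,
) -> float:
    """Estimate the request rate a load test should reach.

    last_year_peak_rps: peak requests per second observed last year
    yoy_growth:         current-year growth as a fraction (0.25 means +25%)
    safety_margin:      extra headroom as a fraction (0.5 means +50%)
    """
    if last_year_peak_rps < 0 or yoy_growth < -1 or safety_margin < 0:
        raise ValueError("inputs must describe a non-negative load")
    projected_peak = last_year_peak_rps * (1 + yoy_growth)
    return projected_peak * (1 + safety_margin)


# Illustrative numbers: 2,000 rps last year, +25% growth, +50% margin.
print(target_throughput(2000, 0.25, 0.5))  # 3750.0
```

The same arithmetic applies per segment of the workflow, so each team can derive its own target from the shared end-to-end estimate.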

Making it a self-service platform

In some companies, every team is expected to implement these load tests itself. This often leads to copy-paste solutions, a diverging set of tools, and significant work to include in project planning.

Instead, a platform can be built that focuses on:

  • Being extensible and supporting any I/O (HTTP, Kafka, Pub/Sub, …) - you select your endpoint / topic / …
  • Accepting any payload type, which can be:
    • a human-readable format like JSON
    • a file containing bytes (Avro, Protobuf, etc…)
  • Selecting a load test mode, for example pre-populating a topic with X events or continuously sending events at a given throughput.
  • Validating the output - this can be a simple throughput check, or a semantic validation of the payload using regular expressions.
  • (Optional) Sending an alert to a channel, if needed, when the load test is scheduled, starts, is interrupted, or ends, and when results are available.
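
To make the list above more tangible, here is one possible shape for a load test definition on such a platform, sketched under assumptions. `LoadTestSpec`, `Mode`, `validate_outputs`, and every field name are hypothetical and do not refer to an existing tool; the sketch covers only mode selection and output validation, not the actual I/O.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Mode(Enum):
    PREPOPULATE = "prepopulate"  # push a fixed number of events, then stop
    CONTINUOUS = "continuous"    # sustain a target rate for a duration


@dataclass(frozen=True)
class LoadTestSpec:
    target: str                           # endpoint or topic chosen by the team
    mode: Mode
    event_count: int = 0                  # used by PREPOPULATE
    events_per_second: float = 0.0        # used by CONTINUOUS
    duration_seconds: float = 0.0         # used by CONTINUOUS
    output_pattern: Optional[str] = None  # optional regex for semantic checks

    def total_events(self) -> int:
        """Number of events the test is expected to produce."""
        if self.mode is Mode.PREPOPULATE:
            return self.event_count
        return int(self.events_per_second * self.duration_seconds)


def validate_outputs(spec: LoadTestSpec, outputs: Sequence[str]) -> dict:
    """Check throughput (count) and, if a pattern is set, payload content."""
    pattern = re.compile(spec.output_pattern) if spec.output_pattern else None
    mismatches = [o for o in outputs if pattern and not pattern.search(o)]
    return {
        "expected": spec.total_events(),
        "received": len(outputs),
        "throughput_ok": len(outputs) >= spec.total_events(),
        "semantic_ok": not mismatches,
    }
```

A team would then only declare a spec, for instance `LoadTestSpec(target="kafka://orders-test", mode=Mode.PREPOPULATE, event_count=1000)` (the target string is made up), and leave the sending, collection, and alerting to the platform.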

Tools

and others like Artillery.io, JMeter, Gatling, etc…
