To get comparable results from very different web technologies, we first fix the use case. Rather than reinvent the wheel, we use the Hotel Booking Example, which is well known to Java folks.
It is simple to implement but incorporates a landing page, hotel search, user registration, a hotel detail page and the actual booking as a transactional database write operation. Thus it is "simple, but not too simple".
The Test Script
We of course want an objective, quantifiable result. Therefore we run a script against your solution that requests pages, measures response times, and checks the results. This required a single script that can interact robustly with web apps built on different technologies. As you well know, subtle details of state and session management vary from one technology to the next, so classical load testing tools such as JMeter have to be tuned to the app under test.
Our tagbrowser is a very simple layer on top of Apache HttpComponents and the jsoup HTML parser. At the expense of parsing the pages your solution delivers, it provides a way to interact with the app as a user does: by clicking on links or submit buttons, without ever handling concrete HTTP details like URLs or cookies.
Therefore we provide one tagbrowser script that can interact with solutions built in any technology. You can download this script in advance to test whether your app conforms to the test requirements.
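To illustrate the idea of "clicking" a link by its visible label instead of hard-coding its URL, here is a minimal sketch in Python using only the standard library. It is not the tagbrowser API: the names `LinkCollector` and `resolve_click`, the sample page, and the URLs are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (visible text, href) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def resolve_click(page_url, html, link_text):
    """Return the absolute URL a user would reach by clicking `link_text`."""
    collector = LinkCollector()
    collector.feed(html)
    for text, href in collector.links:
        if text == link_text:
            return urljoin(page_url, href)
    raise LookupError(f"no link labelled {link_text!r}")


# Hypothetical page: the session id in the URL is the kind of detail
# that differs from one web technology to the next.
page = (
    '<p><a href="search?sid=abc123">Find Hotels</a> '
    '<a href="/register">Register</a></p>'
)
next_url = resolve_click("http://localhost:8080/booking/home", page, "Find Hotels")
print(next_url)
```

The test script only needs to know the label "Find Hotels"; whatever URL scheme or session token the app puts behind that label is picked up from the delivered page.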
Our Test Setup
After you submit your solution, we install it on an EC2 instance. This instance is preconfigured with a PHP-enabled Apache, a Tomcat, a MySQL instance, and the test script. Although this all-on-one-machine setup is unusual and not production-like, it eliminates network effects. We then run the script, check the final state of the database, and obtain the benchmark results.
Result interpretation and ranking
The test script executes a sequence of tests on the freshly installed application. Each test uses a larger number of parallel clients than the one before. At the beginning of each test, a single client performs a full scan of the application. After that, all parallel clients perform one scan each, starting without ramp-up time. A scan consists of a number of walkthroughs performed in sequence; each walkthrough is a sequence of requests to the application, in which the client parses each response and constructs the next request from it.
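The structure of one test can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the actual test script: the page names, the `fake_request` stand-in, the number of walkthroughs per scan, and the choice to discard the timings of the initial single-client scan are all hypothetical.

```python
import threading
import time


def walkthrough(request):
    """One walkthrough: a fixed sequence of requests, each timed individually."""
    timings = []
    for page in ("home", "search", "detail", "book"):  # hypothetical page names
        start = time.perf_counter()
        request(page)
        timings.append((page, time.perf_counter() - start))
    return timings


def scan(request, walkthroughs):
    """One scan: several walkthroughs performed in sequence by one client."""
    timings = []
    for _ in range(walkthroughs):
        timings.extend(walkthrough(request))
    return timings


def run_test(request, clients, walkthroughs=2):
    """Run one initial single-client scan, then start `clients` scans at once."""
    scan(request, walkthroughs)            # initial scan; timings discarded here (assumption)
    barrier = threading.Barrier(clients)   # releases all clients together: no ramp-up
    results = []
    lock = threading.Lock()

    def client():
        barrier.wait()
        timings = scan(request, walkthroughs)
        with lock:
            results.extend(timings)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_request(page):
    time.sleep(0.001)  # stand-in for an HTTP round trip


# Each test uses more parallel clients than the one before.
series = {n: run_test(fake_request, n) for n in (1, 2, 4)}
```

The barrier is what models "without ramp-up time": every client thread waits until all of them exist, and then they all issue their first request at the same moment.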
During each test, for each page and for all pages together, the following numbers are measured:
- the average of the response times
- the standard deviation of the response times
- the average time used to parse the responses
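As a small worked example, the three numbers above can be computed per page and for all pages together like this. The sample values are made up, and the use of the population standard deviation (`pstdev`) is an assumption; the text does not say which variant the test script uses.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical samples: (page, response time in ms, parse time in ms)
samples = [
    ("home", 10.0, 1.0),
    ("home", 14.0, 3.0),
    ("search", 30.0, 2.0),
    ("search", 50.0, 4.0),
]


def summarize(rows):
    """Average and standard deviation of response times, average parse time."""
    responses = [response for _, response, _ in rows]
    parses = [parse for _, _, parse in rows]
    return {
        "avg_response": mean(responses),
        "std_response": pstdev(responses),
        "avg_parse": mean(parses),
    }


# Group the samples by page, then summarize each group and the whole set.
by_page = defaultdict(list)
for row in samples:
    by_page[row[0]].append(row)

report = {page: summarize(rows) for page, rows in by_page.items()}
report["all pages"] = summarize(samples)
```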
At this point, we have a series of average response times versus client counts. The throughput (the number of pages that can be delivered in one second) is usually calculated as the number of clients divided by the average response time. The throughput usually has a sharp peak at relatively low client numbers. This peak marks the number of clients that can be served without waiting for resources. We take this maximum throughput, maxTP, as the primary indicator for the performance ranking.
Another interesting property of the application is how its response times scale when additional clients request pages beyond the maximum-throughput barrier. To get a feeling for this client scaling, we simply calculate how many microseconds each additional client adds to the response time on average. This of course has nothing to do with scaling with the number of CPU nodes.
We hope this site will be continuously updated with better solutions and newer technologies.
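The throughput calculation can be made concrete with a few lines of Python. The response times below are invented for illustration, and the variable names are assumptions; only the formula (clients divided by average response time) and the name maxTP come from the text.

```python
# Hypothetical series: client count -> average response time in seconds
avg_response = {1: 0.020, 2: 0.021, 4: 0.025, 8: 0.060, 16: 0.140}

# Throughput in pages per second for each client count
throughput = {n: n / t for n, t in avg_response.items()}

# maxTP is the peak of that series; peak_clients is where it occurs
peak_clients = max(throughput, key=throughput.get)
max_tp = throughput[peak_clients]

for n in sorted(throughput):
    print(f"{n:3d} clients: {throughput[n]:7.1f} pages/s")
```

With these made-up numbers the throughput rises up to 4 clients and falls afterwards, so the ranking figure would be the throughput measured at 4 clients.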
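One plausible way to obtain such a per-client figure is a least-squares line through the measurements beyond the throughput peak, whose slope is the added response time per additional client. The text does not say how the number is actually computed, so the fitting method, the function name `slope`, and the sample data below are all assumptions.

```python
def slope(xs, ys):
    """Least-squares slope of ys over xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical measurements beyond the throughput peak:
# client counts and average response times in microseconds
clients = [4, 8, 16, 32]
avg_response_us = [25_000, 45_000, 85_000, 165_000]

us_per_client = slope(clients, avg_response_us)
print(f"{us_per_client:.0f} microseconds per additional client")
```

A smaller slope means the application degrades more gracefully once it is saturated.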
If we receive lots of solutions, we consider automating the checkout, install and test procedure.