Your Application

To get comparable results from very different web technologies, we first fix the use case. Rather than reinventing the wheel, we use the Hotel Booking example well known to Java folks.

It is simple to implement, yet it incorporates a landing page, a hotel search, user registration, a hotel detail page, and the actual booking as a transactional database write operation. It is thus "simple, but not too simple".
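To illustrate what the transactional write behind the booking step amounts to, here is a minimal JDBC sketch. The table and column names are placeholders of our own; every submitted solution defines its own schema and persistence layer.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BookingWriter {

        // Persists a booking as one transaction: either the row is written
        // completely or nothing changes. Table and column names are illustrative.
        public void book(String jdbcUrl, long userId, long hotelId,
                         java.sql.Date checkin, java.sql.Date checkout) throws SQLException {
            try (Connection con = DriverManager.getConnection(jdbcUrl)) {
                con.setAutoCommit(false);
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO booking (user_id, hotel_id, checkin, checkout) VALUES (?, ?, ?, ?)")) {
                    ps.setLong(1, userId);
                    ps.setLong(2, hotelId);
                    ps.setDate(3, checkin);
                    ps.setDate(4, checkout);
                    ps.executeUpdate();
                    con.commit();
                } catch (SQLException e) {
                    con.rollback();  // undo the partial write on any failure
                    throw e;
                }
            }
        }
    }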

The Test Script

We of course want an objective, quantifiable result. Therefore we run a script against your solution that requests pages, benchmarks them, and checks the results. This confronted us with the requirement to write a single script able to interact robustly with web apps built on different technologies. As you well know, subtle details of state and session management vary from technology to technology, so classical load testing tools such as JMeter have to be tuned to the app under test.

Our tagbrowser is a very simple layer on top of Apache's HttpComponents and the Jsoup HTML parser. At the expense of parsing the pages your solution delivers, it provides a way to interact with the app as a user does: by clicking on links or submit buttons, without ever handling concrete HTTP details like URLs or cookies.
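The following sketch shows roughly how such a layer works, assuming HttpClient 4.x and Jsoup. The start URL and the link text "Search" are placeholders; the real, downloadable script is what we actually run.

    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class TagBrowserSketch {

        public static void main(String[] args) throws Exception {
            String start = "http://localhost:8080/hotelapp/";  // placeholder URL

            // The default client keeps a cookie store, so sessions survive across requests.
            try (CloseableHttpClient client = HttpClients.createDefault()) {

                // Fetch the landing page and parse it.
                String html = EntityUtils.toString(
                        client.execute(new HttpGet(start)).getEntity());
                Document page = Jsoup.parse(html, start);

                // "Click" a link by its visible text instead of hard-coding its URL.
                Element searchLink = page.select("a:contains(Search)").first();
                String nextUrl = searchLink.absUrl("href");

                String searchHtml = EntityUtils.toString(
                        client.execute(new HttpGet(nextUrl)).getEntity());
                System.out.println("Fetched " + nextUrl + ", " + searchHtml.length() + " bytes");
            }
        }
    }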

We therefore provide a single tagbrowser script capable of interacting with solutions in any of the technologies. You can download this script in advance to test whether your app conforms to the test requirements.

Our Test Setup

After you submit your solution, we install it on an EC2 instance. This instance is preconfigured with a PHP-enabled Apache, a Tomcat, a MySQL instance, and the test script. Running everything on one machine is of course unusual and not production-like, but it eliminates network effects. We then run the script, check the final state of the database, and collect the benchmark results.

Result interpretation and ranking

The test script executes a sequence of tests against the freshly installed application, each test with an increasing number of parallel clients. At the beginning of each test, a single client performs a full scan of the application. After that, the parallel clients perform one scan each, without ramp-up time. A scan performs a number of walkthroughs in sequence, each walkthrough being a sequence of requests to the application, where each response is parsed to construct the next request.
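In pseudo-Java, the structure of a test run looks roughly like this; the client counts and the number of walkthroughs per scan are illustrative, not the values we actually use.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class TestRun {

        static final int WALKTHROUGHS_PER_SCAN = 10;                              // assumed value
        static final List<Integer> CLIENT_COUNTS = List.of(1, 2, 5, 10, 20, 50);  // assumed steps

        public static void main(String[] args) throws Exception {
            for (int clients : CLIENT_COUNTS) {
                scan();                                 // single warm-up scan before each test

                ExecutorService pool = Executors.newFixedThreadPool(clients);
                for (int c = 0; c < clients; c++) {
                    pool.submit(TestRun::scan);         // all clients start at once, no ramp-up
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
            }
        }

        // One scan = a fixed number of walkthroughs, each walkthrough being a sequence
        // of requests where every response is parsed to construct the next request.
        static void scan() {
            for (int w = 0; w < WALKTHROUGHS_PER_SCAN; w++) {
                // walkthrough(): landing page -> search -> detail -> register -> book ...
            }
        }
    }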

During each test, for each page and for all pages together, we measure the response times, their deviation, and the parse times.

The response times are thus the primary number for performance assessment. Their deviation allows us to assess the confidence of the results, and the parse times allow us to estimate whether the dynamic request construction required for technology-agnostic benchmarking does not distort the load results.
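For illustration, these numbers boil down to simple statistics over the recorded timings per page; the sample values below are made up.

    import java.util.List;

    public class PageStats {

        // Per-page summary over all requests of one test: average response time,
        // its standard deviation, and the average time spent parsing responses.
        static double mean(List<Double> values) {
            return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        }

        static double stdDev(List<Double> values) {
            double m = mean(values);
            double var = values.stream()
                    .mapToDouble(v -> (v - m) * (v - m))
                    .average().orElse(0.0);
            return Math.sqrt(var);
        }

        public static void main(String[] args) {
            List<Double> responseMillis = List.of(42.0, 40.5, 55.1, 43.2);  // made-up sample
            List<Double> parseMillis    = List.of(1.2, 1.1, 1.4, 1.2);      // made-up sample

            System.out.printf("response: %.1f ms +/- %.1f ms, parse: %.1f ms%n",
                    mean(responseMillis), stdDev(responseMillis), mean(parseMillis));
        }
    }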

At this point, a series of average response times versus client counts is at hand. The throughput (the number of pages that can be delivered in one second) is usually calculated as the number of clients divided by the average response time. The throughput typically has a sharp peak at relatively low client numbers; this peak denotes the number of clients that can be served without waiting for resources. We take this maximum throughput, maxTP, as the primary indicator for the performance ranking.
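A small sketch of the maxTP calculation, with made-up response times:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class Throughput {

        public static void main(String[] args) {
            // Average response time in seconds per client count (illustrative numbers only).
            Map<Integer, Double> avgResponseSeconds = new LinkedHashMap<>();
            avgResponseSeconds.put(1, 0.020);
            avgResponseSeconds.put(2, 0.021);
            avgResponseSeconds.put(5, 0.026);
            avgResponseSeconds.put(10, 0.060);
            avgResponseSeconds.put(20, 0.130);

            double maxTP = 0.0;
            int bestClients = 0;
            for (Map.Entry<Integer, Double> e : avgResponseSeconds.entrySet()) {
                // throughput = clients / average response time (pages per second)
                double tp = e.getKey() / e.getValue();
                if (tp > maxTP) {
                    maxTP = tp;
                    bestClients = e.getKey();
                }
            }
            System.out.printf("maxTP = %.0f pages/s at %d clients%n", maxTP, bestClients);
        }
    }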

Another interesting behaviour of the application is how its response times scale when additional clients request pages beyond the maximum throughput barrier. We simply calculate the number of microseconds each additional client adds on average, to get a feeling for client scaling. This of course has nothing to do with scaling over the number of CPU nodes.
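One way to compute that number is a least-squares slope of the average response time over the client count, restricted to measurements beyond the throughput peak; again, the numbers below are illustrative only.

    public class ClientScaling {

        // Least-squares slope of average response time (in microseconds) over client
        // count, restricted to client counts beyond the throughput peak.
        static double slopeMicrosPerClient(int[] clients, double[] avgResponseMicros) {
            double n = clients.length, sx = 0, sy = 0, sxy = 0, sxx = 0;
            for (int i = 0; i < clients.length; i++) {
                sx += clients[i];
                sy += avgResponseMicros[i];
                sxy += clients[i] * avgResponseMicros[i];
                sxx += (double) clients[i] * clients[i];
            }
            return (n * sxy - sx * sy) / (n * sxx - sx * sx);
        }

        public static void main(String[] args) {
            // Illustrative numbers: measurements above the maxTP client count.
            int[] clients = {10, 20, 50};
            double[] avgMicros = {60_000, 130_000, 320_000};
            System.out.printf("~%.0f microseconds added per additional client%n",
                    slopeMicrosPerClient(clients, avgMicros));
        }
    }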

Future Plans

We hope to continuously update this site with better solutions and newer technologies.

If we receive lots of solutions, we will consider automating the checkout, install, and test procedure.
