You may not know it, but Tomcat has provided a concurrency limiting implementation for years. It is based on a semaphore which controls the concurrency.

It has several configuration attributes:

  • concurrency: the central setting, defining the maximum number of concurrent requests you accept.
  • fairness: whether the semaphore is fair. If true, the order of acquire calls is respected when granting permits; otherwise, under contention, a request that entered after another one can be let through first.
  • block: whether waiting requests should block until a permit is available (queueing the requests).
  • interruptible: if blocking, whether the permit acquisition should be interruptible, or whether the semaphore should acquire the permit anyway and the thread be interrupted afterwards.
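
To make these semantics concrete, here is a rough sketch of how the options map onto java.util.concurrent.Semaphore (the ConcurrencyGate class is purely illustrative, not the valve's actual code):

import java.util.concurrent.Semaphore;

// Illustration only: rough mapping of the valve options onto
// java.util.concurrent.Semaphore, not the valve's actual code.
public class ConcurrencyGate {
    private final Semaphore semaphore;
    private final boolean block;
    private final boolean interruptible;

    public ConcurrencyGate(final int concurrency, final boolean fairness,
                           final boolean block, final boolean interruptible) {
        // "concurrency" = number of permits, "fairness" = FIFO granting
        this.semaphore = new Semaphore(concurrency, fairness);
        this.block = block;
        this.interruptible = interruptible;
    }

    public boolean enter() throws InterruptedException {
        if (!block) {
            return semaphore.tryAcquire(); // reject immediately when no permit is free
        }
        if (interruptible) {
            semaphore.acquire(); // wait in the queue, can be interrupted
        } else {
            semaphore.acquireUninterruptibly(); // wait in the queue, ignoring interrupts
        }
        return true;
    }

    public void exit() {
        semaphore.release();
    }
}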

Here is a sample definition in server.xml:

<Valve className="org.apache.catalina.valves.SemaphoreValve"
       concurrency="150" block="false" fairness="false" />

This configuration can be placed anywhere valves are accepted, so typically either on the Host to control the whole instance or on a Context to control a single web application.
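
For instance, to scope the limit to a single web application, the same valve can be declared inside the Context (an illustrative snippet, the path is an assumption):

<Context path="/test">
  <Valve className="org.apache.catalina.valves.SemaphoreValve"
         concurrency="150" block="false" fairness="false" />
</Context>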

This works fine; if you test with ab, for instance, you get something like:

> ab -c 50 -n 1000 http://localhost:8080/test
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /test
Document Length:        2 bytes

Concurrency Level:      50
Time taken for tests:   0.547 seconds
Complete requests:      1000
Failed requests:        903
   (Connect: 0, Receive: 0, Length: 903, Exceptions: 0)
Total transferred:      94716 bytes
HTML transferred:       194 bytes
Requests per second:    1826.98 [#/sec] (mean)
Time per request:       27.368 [ms] (mean)
Time per request:       0.547 [ms] (mean, across all concurrent requests)
Transfer rate:          168.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.6      0       3
Processing:     1   26  32.0     16     216
Waiting:        1   25  31.9     15     215
Total:          1   26  32.4     16     218
WARNING: The median and mean for the initial connection time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     21
  75%     25
  80%     29
  90%     54
  95%    104
  98%    149
  99%    150
 100%    218 (longest request)

The interesting part is:

Failed requests:        903

If I remove the valve, I get:

Failed requests:        0

So this works, but it has two issues at the moment:

  • the finest granularity is the web application, not a sub-context or a set of paths
  • when a request is rejected by the valve you still get an HTTP 200

To solve that, the valve is designed to be subclassed, and you get two convenient hooks for these two use cases:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletResponse;
import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.SemaphoreValve;

public class MySemaphoreValve extends SemaphoreValve {
    // only requests under /test go through the semaphore
    @Override
    public boolean controlConcurrency(final Request request, final Response response) {
        return request.getRequestURI().startsWith("/test");
    }

    // when no permit is available, reply with HTTP 412 instead of an empty 200
    @Override
    public void permitDenied(final Request request, final Response response) throws IOException, ServletException {
        response.sendError(HttpServletResponse.SC_PRECONDITION_FAILED);
    }
}
  1. controlConcurrency lets us match exactly the paths we want to limit. It also means that by adding this valve N times you can limit concurrency per endpoint, or even combine a global concurrency limit with per-endpoint limits. For example: the global limit is 1000, but GET /api/user is capped at 800 and POST /api/user at 500. This gives more room to high-throughput endpoints while keeping a safety limit on the overall instance (see the sketch after this list).
  2. the permitDenied hook gives you the opportunity to change the response (the status in our case) before the valve exits when the limit is reached. Here we just set an HTTP 412 to show the request was not actually executed (an HTTP 200 would be quite misleading for the client).
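
As a sketch of that per-endpoint idea (the class and endpoint names are hypothetical), each limited endpoint gets its own small subclass, declared in server.xml with its own concurrency attribute after the valve holding the global limit:

import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.SemaphoreValve;

// Hypothetical: declared with concurrency="800", after a plain
// SemaphoreValve carrying the global limit (concurrency="1000").
// A symmetric PostUserSemaphoreValve would match POST with concurrency="500".
public class GetUserSemaphoreValve extends SemaphoreValve {
    @Override
    public boolean controlConcurrency(final Request request, final Response response) {
        return "GET".equals(request.getMethod())
                && request.getRequestURI().startsWith("/api/user");
    }
}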

Now that we set an HTTP 412 on concurrency-limited requests, we get this additional output with ab:

Non-2xx responses:      535

So, overall, this valve gives you the opportunity to keep control of your instance and avoid accepting too many requests, in a smooth manner. Note, however, that there is one pitfall with this implementation: if you rely a lot on AsyncContext (servlet asynchronism), directly or transitively (through JAX-RS @Suspended for instance), then you would only limit the creation of asynchronous requests and not the requests themselves; covering that case needs your own semaphore (through a filter supporting asynchronism and wrapping the AsyncContext, for instance).
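
Here is a minimal sketch of such a filter (assuming the Servlet 3.x API; the AsyncSemaphoreFilter name and the limit of 150 are illustrative), which holds its own semaphore and releases the permit only once asynchronous processing actually completes:

import java.io.IOException;
import java.util.concurrent.Semaphore;
import javax.servlet.AsyncEvent;
import javax.servlet.AsyncListener;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class AsyncSemaphoreFilter implements Filter {
    private final Semaphore semaphore = new Semaphore(150);

    @Override
    public void init(final FilterConfig filterConfig) {
        // no-op
    }

    @Override
    public void doFilter(final ServletRequest request, final ServletResponse response,
                         final FilterChain chain) throws IOException, ServletException {
        if (!semaphore.tryAcquire()) { // no permit available: reject immediately
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_PRECONDITION_FAILED);
            return;
        }
        try {
            chain.doFilter(request, response);
        } finally {
            if (request.isAsyncStarted()) {
                // asynchronous case: keep the permit until the request really completes
                request.getAsyncContext().addListener(new AsyncListener() {
                    @Override
                    public void onComplete(final AsyncEvent event) {
                        semaphore.release(); // fired at the end of the async cycle
                    }

                    @Override
                    public void onTimeout(final AsyncEvent event) {
                        // no-op, the container ends the cycle with a complete event
                    }

                    @Override
                    public void onError(final AsyncEvent event) {
                        // no-op, same as onTimeout
                    }

                    @Override
                    public void onStartAsync(final AsyncEvent event) {
                        // no-op
                    }
                });
            } else {
                semaphore.release(); // synchronous case: release right away
            }
        }
    }

    @Override
    public void destroy() {
        // no-op
    }
}

Releasing only in onComplete avoids double releases: the container always terminates the asynchronous cycle with a complete event, even after a timeout or an error.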
