Why does my Kubernetes HTTP probe fail with a "Broken pipe" exception?
Recently I faced a weird issue on OpenShift/Kubernetes where a Tomcat HTTP probe (a Camel health check, but the same applies to Spring Boot or MicroProfile health checks) was failing with a Broken pipe exception:
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:351)
at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:776)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:298)
at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:251)
The exception says the client aborted the call, but why? The impact, when the underlying endpoint is used as an HTTP probe, is that Kubernetes considers the deployment failed, so the new version/container never comes up.
After reviewing the Kubernetes HTTP probe code, I realized its HTTP client has some limitations that are not obvious upfront. The critical line in my case was this one: https://github.com/kubernetes/kubernetes/blob/c4cf0d49b1108f16d4c560954275f8a1d76175e8/pkg/probe/http/http.go#L36: the client limits the health check payload size to 10KB. Back in my application (a Camel context using Camel health checks), I realized the payload was ~50KB! So what happens? The Kubernetes HTTP probe gets an HTTP 200 response, starts reading the health check response payload, stops after 10KB of data, and considers it a probe failure. Closing the connection mid-read is exactly what the server side then sees as a broken pipe when it tries to flush the rest of the response.
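To make that concrete, here is a rough Java sketch of what the probe client effectively does (the real client is written in Go; only the 10KB cap comes from the linked source, the class and method names below are illustrative):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ProbeSketch {

    // Mirrors maxRespBodyLength in the linked Go source: 10KB.
    private static final int MAX_RESP_BODY_LENGTH = 10 * 1024;

    // Returns true if the endpoint would pass a probe with a 10KB body cap.
    public static boolean probe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            int code = conn.getResponseCode();
            if (code < 200 || code >= 400) {
                return false; // failing status code: probe fails regardless of body
            }
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[1024];
                int total = 0;
                int read;
                while ((read = in.read(buf)) != -1) {
                    total += read;
                    if (total > MAX_RESP_BODY_LENGTH) {
                        // The real client stops reading here and reports a failure;
                        // abandoning the connection mid-read is what the server
                        // observes as a broken pipe.
                        return false;
                    }
                }
            }
            return true;
        } finally {
            conn.disconnect();
        }
    }
}

So an HTTP 200 with a 50KB body still fails: the status code is fine, but the body read is cut short and the probe gives up.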
The fix was obvious: make the output less verbose. It is not that nice, because when a check fails you lose the details (unless you use a "compressed" convention like ok_xxx/ko_xxx), but it works and rolling updates are functional again.
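For illustration, here is a minimal sketch of that "compressed" convention, assuming a plain servlet fronting the checks (the runChecks hook and the check names are hypothetical, not a Camel or MicroProfile API):

import java.io.IOException;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CompactHealthServlet extends HttpServlet {

    // Hypothetical hook returning each check's name and status.
    protected Map<String, Boolean> runChecks() {
        return Map.of("db", true, "broker", true);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Map<String, Boolean> checks = runChecks();
        boolean up = checks.values().stream().allMatch(Boolean::booleanValue);
        resp.setStatus(up ? 200 : 503);
        resp.setContentType("text/plain");
        // One short token per check keeps the payload far below the 10KB cap
        // while still hinting at which check failed.
        StringBuilder body = new StringBuilder();
        checks.forEach((name, ok) -> body.append(ok ? "ok_" : "ko_").append(name).append('\n'));
        resp.getWriter().print(body);
    }
}

The point is simply that the body stays a few bytes per check: the probe relies on the status code, and the short tokens are only there for a human tailing the endpoint.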