We are currently running a Java 17 app on Cloud Run and have encountered an unusual issue. While the service usually operates smoothly, a small percentage of requests (both GET and POST) fail unexpectedly.
These failed requests return either a 503 or 504 status, often appearing in pairs (which I observed today). Additionally, the failed requests share the same instanceID, and oddly, some successful requests are also associated with this instance. Meanwhile, the liveness probe is functioning correctly without any issues, despite customer-facing requests failing. The liveness probe checks our database, Redis connections, and other integrations, such as file storage connections.
The 503s include the following text payload:
The request failed because either the HTTP response was malformed or connection to the instance had an error. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#malformed-response-or-connection-error
Another Spring Boot app, trying to access the API via a FeignClient, is receiving a feign.FeignException$ServiceUnavailable. I'm wondering if this could be related to a load balancer issue. Perhaps the health checks are passing correctly because they bypass the load balancer, but the actual requests are being affected by it?
Our CPU and memory usage are within reasonable limits, so I don't believe the issue is due to our instances being under-provisioned. Many of the failing requests are "simple" requests that typically respond in under 100ms.
- Can you share more about your Cloud Run configuration, your code, and whether you are doing something "special" (websocket, streaming, ...)? – guillaume blaquiere, Jan 15, 2025 at 16:31
- My code involves querying a database, where some operations are trivial (usually returning in 50-100ms), while others are more complex, such as accessing Google Cloud Storage and performing calculations, which can take 5-10 seconds to complete. Here's a high-level overview of my Cloud Run configuration: 8 CPU units, 8GB of RAM, and 8 minimum instances. – XII, Jan 16, 2025 at 6:58
- I think it's more platform related. Google support should be able to help you with this. – guillaume blaquiere, Jan 16, 2025 at 19:36
- Unfortunately, the last time I reached out to them, they weren't very helpful. They simply referred me to their public documentation on 503 errors and, as far as I could tell, didn’t conduct any specific investigation. – XII, Jan 17, 2025 at 8:30
1 Answer
In case you haven’t tried yet, please check the troubleshooting guide for recommended steps to rule out application side failure:
Check Cloud Logging
App-level timeouts
Downstream network bottleneck
Inbound request limit to a single container
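For the Cloud Logging check, it can help to confirm whether the 503/504 responses really cluster on a single instance, as the question suggests. Below is a minimal sketch of that tally, assuming you have already exported the instance ID and HTTP status of each request log entry. The `RequestLog` record, the `summarize` helper, and the sample data are hypothetical and only illustrate the grouping; they are not part of any Google Cloud API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InstanceFailureSummary {

    // Hypothetical, simplified view of one exported request log entry.
    record RequestLog(String instanceId, int status) {}

    // Running totals for a single instance.
    record Counts(int total, int failed) {
        Counts add(boolean isFailure) {
            return new Counts(total + 1, failed + (isFailure ? 1 : 0));
        }
    }

    // Groups entries by instance ID and counts how many returned 503 or 504.
    static Map<String, Counts> summarize(List<RequestLog> logs) {
        Map<String, Counts> byInstance = new TreeMap<>();
        for (RequestLog log : logs) {
            boolean isFailure = log.status() == 503 || log.status() == 504;
            byInstance.compute(log.instanceId(),
                (id, counts) -> (counts == null ? new Counts(0, 0) : counts).add(isFailure));
        }
        return byInstance;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<RequestLog> sample = List.of(
            new RequestLog("instance-a", 200),
            new RequestLog("instance-a", 503),
            new RequestLog("instance-a", 504),
            new RequestLog("instance-b", 200),
            new RequestLog("instance-b", 200));

        summarize(sample).forEach((id, c) ->
            System.out.printf("%s: %d of %d requests failed%n", id, c.failed(), c.total()));
    }
}
```

If one instance carries nearly all of the failures while still serving some successful requests, that points toward something specific to that container (for example, its per-instance request limit or a stalled connection) rather than the service as a whole.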
Another thing to consider is whether there's a mismatch in the location (region) of your resources. Fixing such a mismatch worked in a similar case and could be useful to you (hopefully).
If the options above still don't resolve it, this could be a Cloud Run specific issue that is better addressed by the Google Cloud Support team. You may reach out to them via the channels below:
Premium support - paid support option
Cloud Run Public Issue Tracker - full list of open tickets