I'm running an EKS cluster in AWS and experiencing an issue with a pod hosting an import service (a Python FastAPI endpoint). The pod restarts upon file import.
Reason: OOMKilled (exit code 137)
The initial Helm pod setup assigns:

    resources:
      limits:
        memory: "512Mi"
      requests:
        memory: "128Mi"
The file weighs 3.75 MB. Memory monitoring does not show any peak (here is a screenshot of some metrics). Locally, the container uses 137 MB of RAM.
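(Side note: the monitoring graphs are scraped at a fixed interval, so a short allocation spike could be missed entirely. Something like this around the import would record the real peak from inside the process; a minimal sketch, assuming Linux, where ru_maxrss is reported in kilobytes:)

    import resource

    def log_peak_rss(tag: str) -> None:
        # Peak resident set size of this process since it started;
        # kilobytes on Linux, bytes on macOS.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"[{tag}] peak RSS: {peak_kb / 1024:.1f} MiB")

Calling it before and after the load would show whether the import itself spikes past the 512Mi limit even though the dashboard stays flat.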
I eventually decided to increase the memory to:

    resources:
      limits:
        memory: "1024Mi"
      requests:
        memory: "256Mi"
And then it works.
But I find it hard to accept that this amount of memory is needed to handle fairly light files.
Additional info:
Cluster nodes are not under memory pressure (checked with kubectl describe node <node_name>). There are no errors in the pod logs, but the restart occurs more or less when the code is loading the file:
    response = s3_client.get_object(Bucket=settings.AWS_BUCKET, Key=import_create.s3_key)
    excel_content = response['Body'].read()
    df = pd.read_excel(io.BytesIO(excel_content), header=1)

It works on my machine (typical computer guy statement...).
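One thing I realized while digging: .xlsx is zipped XML, so the 3.75 MB file inflates substantially once openpyxl parses it, and the flow above briefly holds several copies at once (the raw bytes, the BytesIO buffer, the parsed workbook, then the DataFrame). To see the Python-level peak I was thinking of wrapping the load with tracemalloc; a sketch (tracemalloc only sees allocations made through the Python allocator, so it can undercount native buffers):

    import io
    import tracemalloc

    tracemalloc.start()
    response = s3_client.get_object(Bucket=settings.AWS_BUCKET, Key=import_create.s3_key)
    excel_content = response['Body'].read()
    df = pd.read_excel(io.BytesIO(excel_content), header=1)
    current, peak = tracemalloc.get_traced_memory()  # both values in bytes
    print(f"current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")
    tracemalloc.stop()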
The issue is not systematic: I have been able to import another, slightly bigger file.
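Since it's intermittent, I was also planning to correlate each restart with the request that triggered it by reading the last terminated state off the pod; a sketch using the official Kubernetes Python client (pod name and namespace are placeholders):

    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() from inside the cluster
    v1 = client.CoreV1Api()
    pod = v1.read_namespaced_pod(name="import-service-xyz", namespace="default")
    for cs in pod.status.container_statuses or []:
        term = cs.last_state.terminated
        if term:
            # reason should read "OOMKilled"; the timestamps let me line the
            # kill up against the import requests in the application logs.
            print(cs.name, term.reason, term.exit_code, term.started_at, term.finished_at)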
My feeling is that it might not actually be a memory issue. What do you think? How would you troubleshoot that issue? How/where would you look for more detailed information?
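For reference, the next thing I plan to try is reading the cgroup counters from inside the pod, since kubectl top polls metrics-server and can easily miss a transient spike; a sketch assuming cgroup v2 (memory.peak needs a reasonably recent kernel, and the paths differ under cgroup v1):

    from pathlib import Path

    def cgroup_memory() -> dict[str, str]:
        # Live usage, high-water mark, and the enforced limit as the
        # kernel sees them for this container's cgroup.
        stats = {}
        for name in ("memory.current", "memory.peak", "memory.max"):
            path = Path("/sys/fs/cgroup") / name
            if path.exists():
                raw = path.read_text().strip()
                stats[name] = raw if raw == "max" else f"{int(raw) / 2**20:.1f} MiB"
        return stats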