
I'm running a cluster in AWS (EKS). I'm experiencing an issue with a pod hosting an import service (Python FastAPI endpoint). The pod restarts upon file import.

Reason: OOMKilled (exit code 137)

The initial helm pod setup assigns:

    resources:
      limits:
        memory: "512Mi"
      requests:
        memory: "128Mi"

The file weighs 3.75 MB. Memory monitoring does not show any peak (here is a screenshot of some metrics). Locally, the container uses 137 MB of RAM.

[Memory monitoring screenshot]

I eventually decided to increase memory to:

    resources:
      limits:
        memory: "1024Mi"
      requests:
        memory: "256Mi"

And then it works.
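One thing I thought of checking is what limit the container actually sees from the inside, in case the limit applied to the pod is not what I think it is. A minimal sketch (the cgroup file paths are assumptions and differ between cgroup v1 and v2 nodes):

```python
from pathlib import Path
from typing import Optional

# Candidate cgroup memory-limit files; which one exists depends on
# whether the node uses cgroup v2 or v1.
CANDIDATES = [
    Path("/sys/fs/cgroup/memory.max"),                      # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),    # cgroup v1
]

def container_memory_limit() -> Optional[int]:
    """Return the memory limit in bytes as seen from inside the container,
    or None if no limit is set (or no cgroup file is found)."""
    for path in CANDIDATES:
        if path.exists():
            raw = path.read_text().strip()
            if raw != "max":        # cgroup v2 reports "max" when unlimited
                return int(raw)
    return None

print(f"cgroup memory limit: {container_memory_limit()}")
```

Running this from a shell inside the pod (`kubectl exec`) would confirm whether the 512Mi limit is really what the process is constrained by.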

But I find it hard to accept that this much memory is needed to handle fairly light files.

Additional info:

  1. Cluster nodes are not under memory pressure, per:

    kubectl describe node <node_name>

  2. No errors in the pod logs. But the restart occurs more or less when the code is loading the file:

    # Download the file from S3 and load it into a DataFrame
    response = s3_client.get_object(Bucket=settings.AWS_BUCKET, Key=import_create.s3_key)
    excel_content = response['Body'].read()
    df = pd.read_excel(io.BytesIO(excel_content), header=1)
    
  3. It works on my machine (Typical computer guy statement...)

  4. The issue is not systematic: I have been able to import another, slightly bigger file.
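One thing I realize may be relevant: an .xlsx file is a zip archive of XML, so 3.75 MB on disk can expand to far more once decompressed and parsed into a DataFrame. A stdlib-only sketch of how dramatically that kind of repetitive XML compresses (synthetic data, not my actual file):

```python
import zlib

# Synthetic, repetitive XML in the spirit of a sheet1.xml inside an .xlsx
# (hypothetical content -- not the real file from this question).
row = b'<row r="1"><c r="A1" t="s"><v>12345</v></c></row>\n'
xml = row * 500_000                     # ~25 MB of uncompressed sheet data
compressed = zlib.compress(xml, 6)      # roughly what zip's deflate achieves

print(f"uncompressed: {len(xml) / 1e6:.1f} MB")
print(f"compressed:   {len(compressed) / 1e6:.2f} MB")
```

So the on-disk size may badly understate the in-memory footprint: the full decompressed XML, the `excel_content` bytes, the BytesIO wrapper, and the resulting DataFrame can all be alive at once during the import.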

My feeling is that it might not actually be a memory issue. What do you think? How would you troubleshoot this issue? How/where would you look for more detailed information?
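For more detailed information, one idea I had is to instrument the load path itself with `tracemalloc` so the logs record the peak allocation during `read_excel` before any OOM kill. A sketch (the decorator name is mine, and `bytearray` stands in for the real load):

```python
import functools
import tracemalloc

def trace_peak_memory(func):
    """Wrap a callable and log tracemalloc's peak traced memory during the call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: peak ~{peak / 1e6:.1f} MB")
        return result
    return wrapper

@trace_peak_memory
def allocate(n):
    return bytearray(n)     # stand-in for pd.read_excel(io.BytesIO(...), header=1)

allocate(10_000_000)        # prints roughly "allocate: peak ~10.0 MB"
```

Wrapping the real import function this way would show whether the pandas load spikes past the 512Mi limit even though the coarse-grained monitoring graph never catches it.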

asked Dec 1 at 22:21 by Julien (new contributor)