
I'm using a Postgres container to run some small non-critical apps and sites. It's been stable for a while, but now the container has started to consume some serious CPU after it's been running for a short period of time. I have removed all other containers which use the Postgres container, and even after starting a new instance, the excessive CPU utilisation reoccurs. In my host (docker stats), I see this:

CONTAINER ID   NAME                                          CPU %     MEM USAGE / LIMIT     MEM %   NET I/O         BLOCK I/O    PIDS
cd553249727d   data_postgresql.1.ft2gof5jci25xs5w5uqw6eezi   814.52%   22.11MiB / 46.95GiB   0.05%   129kB / 116kB   0B / 692kB   23

And this (top):

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28923 70 20 0 633580 19664 488 S 696.7 0.0 2408:51 Dp2N

In the container (top), I see this:

Mem: 42042244K used, 7183656K free, 3622600K shrd, 1952K buff, 30585480K cached
CPU: 63% usr 9% sys 0% nic 26% idle 0% io 0% irq 0% sirq
Load average: 9.77 9.70 9.66 13/508 11090
 PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
 94 1 postgres S 618m 1% 3 58% ./Dp2N <----- WTF?!?!?
 53 52 postgres S 1588 0% 1 1% {systemd} /bin/sh ./systemd
 47 1 postgres S 163m 0% 8 0% postgres: postgrea67 postgres 10.2
 22 1 postgres S 161m 0% 0 0% postgres: autovacuum launcher proc
 20 1 postgres S 161m 0% 8 0% postgres: writer process
 21 1 postgres S 161m 0% 5 0% postgres: wal writer process
 1 0 postgres S 161m 0% 0 0% postgres
 19 1 postgres S 161m 0% 8 0% postgres: checkpointer process
 23 1 postgres S 19988 0% 1 0% postgres: stats collector process
11081 53 postgres R 1588 0% 4 0% [systemd]
 33 0 root S 1576 0% 9 0% sh
 52 47 postgres S 1568 0% 10 0% sh -c setsid ./systemd
 39 33 root R 1508 0% 11 0% top
11083 11081 postgres Z 0 0% 5 0% [grep]
11084 11081 postgres Z 0 0% 4 0% [awk]

Query activity (no idea what select fun308928987('setsid ./systemd') does):

postgres=# select backend_start, usename, application_name, client_addr, client_hostname, query from pg_stat_activity;
 backend_start | usename | application_name | client_addr | client_hostname | query
-------------------------------+------------+------------------+-------------+-----------------+-------------------------------------------------------------------------------------------------------------
 2018-05-23 07:34:14.694057+00 | postgres   | psql             |             |                 | select backend_start, usename, application_name, client_addr, client_hostname, query from pg_stat_activity;
 2018-05-23 01:26:55.235556+00 | postgrea67 |                  | 10.255.0.2  |                 | select fun308928987('setsid ./systemd');
 2018-05-23 07:26:03.519231+00 | postgrea67 |                  | 10.255.0.2  |                 | select fun308928987('setsid ./systemd');
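
One way to dig further from psql (a sketch; adjust the connection flags for your setup, and be aware the backends may simply reconnect while the attacker still has working credentials):

```shell
# Hedged sketch: inspect the mystery function and kill its backends.
# fun308928987 is the function name seen in pg_stat_activity above.

# Show the function's definition. For a C-language UDF, probin reveals
# the shared-library path it was loaded from.
psql -U postgres -c \
  "SELECT proname, prolang, probin, prosrc
     FROM pg_proc
    WHERE proname = 'fun308928987';"

# Terminate the backends currently running the malicious query.
psql -U postgres -c \
  "SELECT pg_terminate_backend(pid)
     FROM pg_stat_activity
    WHERE query LIKE '%fun308928987%'
      AND pid <> pg_backend_pid();"
```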

In the service logs there are also a large number of instances of this error:

data_postgresql.1.ft2gof5jci25@IS-57436 | ps: bad -o argument 'command', supported arguments: user,group,comm,args,pid,ppid,pgid,etime,nice,rgroup,ruser,time,tty,vsz,stat,rss

If I kill the Dp2N process within the container, CPU usage returns to normal, but then something immediately spins that process back up. I have googled to see if I can find any info on Dp2N, but to no avail. It's located in an externally mounted volume:

/ # ls -al /var/lib/postgresql/data/pgdata/Dp2N
-rwxrwxrwx 1 postgres postgres 1886536 May 22 23:25 /var/lib/postgresql/data/pgdata/Dp2N

but it is seemingly created on the fly, as it's not part of the base image as far as I can tell.

I'm using postgres:9.6.9-alpine. The problem started with postgres:9.6.8-alpine, but upgrading didn't fix it. Any help would be greatly appreciated as this is driving me nuts!

Additional details

Results of running file:

sudo file /var/data/pgdata/pgdata/Dp2N
/var/data/pgdata/pgdata/Dp2N: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.24, BuildID[sha1]=bcb5ccf2bc22d1fcb0676506d7c7f31a9b7148bc, stripped
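
A few more triage steps on the unknown binary before deleting it (a sketch; the path is the one from the question, and the grep patterns are just common miner giveaways, not anything confirmed about Dp2N):

```shell
# Basic forensics on the suspicious binary.
BIN=/var/lib/postgresql/data/pgdata/Dp2N

file "$BIN"         # confirm it's a statically linked ELF executable
sha256sum "$BIN"    # hash to look up on e.g. VirusTotal

# Printable strings often betray a cryptominer (pool URLs, wallet IDs).
strings "$BIN" | grep -iE 'pool|stratum|xmr|wallet' | head
```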

It turns out that Alpine comes with a limited version of the ps command. Running this:

apk --no-cache add procps

gets the full-featured version and prevents the ps-related error in the logs. I've updated the Postgres image to include this, and so far the problem hasn't resurfaced. My speculation is that the CPU was being thrashed by repeated attempts to re-execute the command after each failure.
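
For reference, the fix can be applied to a running container or baked into a derived image (a sketch; the container ID and Dockerfile layout are assumptions, not my actual setup):

```shell
# One-off fix inside a running container:
docker exec -it <container_id> apk --no-cache add procps

# Or bake it into a derived image (Dockerfile):
#   FROM postgres:9.6.9-alpine
#   RUN apk --no-cache add procps
```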

Diagnosis

As per the answer below, it turns out I've been hacked. I'm currently at a loss as to how they got in, though. The server is locked down to a specific user with SSH certificate / no password access, and root login is disabled. ('last' only shows my accesses, unless that's been tampered with.) There is no public access to PostgreSQL, the database admin password is very strong, and the database is currently accessed from only one other container. It seems likely that they got in via the web sites on the server, but in this case only got as far as the container operating system, not the host OS. FWIW I'm running a WordPress site, Grafana, Kibana, Traefik, Portainer and my own .NET-based API. I'm starting with a WordPress shakedown first, as I've experienced plug-in-related infections with it before.

For educational purposes:

https://www.imperva.com/blog/2018/03/deep-dive-database-attacks-scarlett-johanssons-picture-used-for-crypto-mining-on-postgre-database/

asked May 23, 2018 at 7:28
  • What does "file /var/lib/postgresql/data/pgdata/Dp2N" give? It's started by systemd, as far as I care to see. Look for it in the systemd services? Commented May 23, 2018 at 7:37
  • Hi Gerard, thanks for your swift response. I've added the requested info to the question. Still investigating, as systemd seems a likely candidate for starting / restarting processes. It looks as though running 'apk --no-cache add procps' should fix the 'ps'-related log issue, as Alpine comes with a basic version of the command. Commented May 23, 2018 at 8:34
  • Systemd is also a likely candidate for other things. What is an executable doing in that directory??? Commented May 23, 2018 at 8:39
  • No idea what it's doing there; it gets created on the fly. First hypothesis: the CPU is being thrashed because, for some reason, the container is continually executing the 'ps' command and erroring. So I restarted the container and immediately ran the apk command to get the updated version of ps. Quiet so far - we'll see. Commented May 23, 2018 at 8:51
  • Why "file /var/data/pgdata/pgdata/Dp2N" when I asked for "file /var/lib/postgresql/data/pgdata/Dp2N"? cfr your "ls -al /var/lib/postgresql/data/pgdata/Dp2N" Commented May 23, 2018 at 8:59

1 Answer


You have been hacked, and are now mining cryptocurrency for the hacker.

They got in by guessing the password for your PostgreSQL server's superuser account. Then they used the lo_export facility to drop the binary for a user-defined function which executes arbitrary shell commands. That is what fun308928987 is: the SQL function which was created to wrap this binary.
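
You can audit for attacker-created objects of this kind (a sketch, assuming superuser access via psql; only fun308928987 comes from your question, everything else here is generic):

```shell
# List user-defined functions outside the system schemas. A stock 9.6
# install typically has none, so anything listed deserves scrutiny,
# especially C-language functions with a probin path.
psql -U postgres -c \
  "SELECT n.nspname, p.proname, l.lanname, p.probin
     FROM pg_proc p
     JOIN pg_language l  ON l.oid = p.prolang
     JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');"

# Large objects are where the payload is staged before lo_export
# writes it to disk.
psql -U postgres -c "SELECT DISTINCT loid FROM pg_largeobject;"
```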

The best cleanup is to destroy the server and rebuild it, this time setting a genuinely strong password for the superuser account. Better yet, also change pg_hba.conf to disallow superuser connections, or preferably any connections, from the outside world.
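
A pg_hba.conf lockdown along these lines is one starting point (the database name, user, and address range are examples; substitute your actual application network):

```
# pg_hba.conf sketch -- order matters, first matching line wins.
# TYPE  DATABASE  USER       ADDRESS       METHOD
local   all       postgres                 peer    # superuser: local socket only
host    myapp     myappuser  10.0.1.0/24   md5     # app network only
host    all       all        0.0.0.0/0     reject  # everyone else
```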

answered May 23, 2018 at 14:11
  • Hi, thanks for this. I'm currently at a loss as to how they got in, though. Server locked down to specific user with SSH cert / no password access and root disabled. ('last' only shows my accesses, unless that's been tampered with.) No public access to PostgreSQL. Very strong admin password. Only accessed from 1 other container currently. There are around 5 web entrypoints on the server, so I guess that's where they came in. Will have to do some more digging to see how I can prevent this in future. I think they only got as far as the container operating system in this case, not the host OS. Commented May 23, 2018 at 19:10
  • The password I'm talking about is the database user password, not the OS user. last wouldn't help you, but log_connections = on would (for the next time). Are you sure your settings in pg_hba are locked down? Your pg_stat_activity does show the connection coming from a private address, 10.255.0.2, so maybe they compromised that machine and then compromised the PostgreSQL service from there. Commented May 23, 2018 at 22:25
  • I was using 'last' just to see if they'd managed to gain login access to the server - the strong admin password was a strong database password. Suggestions appreciated though, I will continue looking. Commented May 24, 2018 at 11:13
