The following Bash script is meant to check, every second, the number of NEW (relative to the previous second) network socket files. At the end of the run it summarizes every 60 entries (which should cover 60 seconds) and outputs a file called verdict.csv that tells me how many new network sockets were opened in each minute (I run under the assumption that those sockets live for more than 1 second, and hence I don't miss new ones).
The problem starts when I run it on a busy server where a lot of new network sockets are being opened: the lsof_func iterations take much more than 1 second (sometimes even more than a minute), and then I cannot trust the output of this script.
#!/bin/bash
TIMETORUN=84600 # Time for the script to run in seconds
NEWCONNECTIONSPERMINUTE=600
# collect number of new socket files in the last second
lsof_func () {
echo "" > /tmp/lsof_test
while [[ $TIME -lt $TIMETORUN ]]; do
lsof -i -t > /tmp/lsof_test2
echo "$(date +"%Y-%m-%d %H:%M:%S"),$(comm -23 <(cat /tmp/lsof_test2|sort) <(cat /tmp/lsof_test|sort) | wc -l)" >> /tmp/results.csv # comm command is used as a set subtractor operator (lsof_test minus lsof_test2)
mv /tmp/lsof_test2 /tmp/lsof_test
TIME=$((TIME+1))
sleep 0.9
done
}
# Calculate the number of new connections per minute
verdict () {
cat /tmp/results.csv | uniq > /tmp/results_for_verdict.csv
echo "Timestamp,New Procs" > /tmp/verdict.csv
while [[ $(cat /tmp/results_for_verdict.csv | wc -l) -gt 60 ]]; do
echo -n $(cat /tmp/results_for_verdict.csv | head -n 1 | awk -F, '{print 1ドル}'), >> /tmp/verdict.csv
cat /tmp/results_for_verdict.csv | head -n 60 | awk -F, '{s+=2ドル} END {print s}' >> /tmp/verdict.csv
sed -n '61,$p' < /tmp/results_for_verdict.csv > /tmp/tmp_results_for_verdict.csv
mv /tmp/tmp_results_for_verdict.csv /tmp/results_for_verdict.csv
done
echo -n $(cat /tmp/results_for_verdict.csv | head -n 1 | awk -F, '{print 1ドル}'), >> /tmp/verdict.csv
cat /tmp/results_for_verdict.csv | head -n 60 | awk -F, '{s+=2ドル} END {print s}' >> /tmp/verdict.csv
}
lsof_func
verdict
#cleanup
rm /tmp/lsof_test
#rm /tmp/lsof_test2
rm /tmp/results.csv
rm /tmp/results_for_verdict.csv
How can I make the iterations of the lsof_func function more consistent / faster, so that it collects this data every second?
2 Answers
We have a simple bug: using lsof -t causes it to print one line per process rather than one line per socket. If we want to observe changes to the open sockets, as claimed in the question, then we'll want something like lsof -i -b -n -F 'n' | grep '^n'.
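To see the difference, compare the two counts on a live system (exact numbers will vary with whatever is running):

lsof -i -t | wc -l                 # one line per process with an Internet socket
lsof -i -b -n -F n | grep -c '^n'  # one line per socket ('n' is the name field)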
Instead of using lsof, it may be more efficient to use netstat; on my lightly-loaded system it's about 10-20 times as fast, but you should benchmark the two on your target system.
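For example, a rough comparison (repeat each a few times and ignore the first, cold-cache run):

time lsof -i -b -n -F n >/dev/null
time netstat -tn >/dev/null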
So instead of comparing subsequent runs of lsof -i -t | sort, we could compare runs of:
netstat -tn | awk '{print 4,ドル5ドル}' | sort
Some things to note here:
- netstat -t examines TCP connections over IPv4 and IPv6. I believe that's what's wanted.
- netstat -n, like lsof -n, saves a vast amount of time by not doing DNS reverse lookups.
- awk is more suitable than cut for selecting columns, since netstat uses a variable number of spaces to separate fields.
- Netstat includes a couple of header lines, but because these are the same in every invocation, they will disappear in the comparison. We could remove them if we really want: awk 'FNR>2 {print 4,ドル5ドル}'.
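Putting these together, a minimal sketch of the per-second measurement using netstat (the snapshot file names here are illustrative, and the loop runs until interrupted):

touch netstat_old  # first pass diffs against an empty snapshot
while sleep 1
do
    # current TCP connections as "local remote" pairs, sorted for comm
    netstat -tn | awk 'FNR>2 {print 4,ドル5ドル}' | sort >netstat_new
    # count connections present now but absent one second ago
    echo "$(date +'%F %T'),$(comm -23 netstat_new netstat_old | wc -l)"
    mv netstat_new netstat_old
done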
Temporary files
Don't assume that /tmp/ is the right location for temporary files: if $TMPDIR is set, we should prefer that directory instead (e.g. where pam_tmpdir is used for per-user temp directories).
It's a good idea to create a single directory for all the script's temporary files; then we can arrange for it to be cleaned up however the script exits:
export TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT
By using the well-known TMPDIR environment variable for this, we also clean up any of our subprocesses' temporary files if they get left lying around.
We can also simplify the code by changing into the temporary directory (but always fail if cd is unsuccessful, either explicitly or by using set -e).
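Either style works; for instance:

cd "$TMPDIR" || exit 1   # explicit failure
# or simply, once 'set -e' is in effect:
cd "$TMPDIR"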
Variables
Prefer lower-case for non-exported shell variables, as they share a namespace with environment variables, and upper-case is conventionally used for communicating between programs.
Extending the comment would expose a bug:
timetorun=84600 # Time for the script to run (1 day)
We could then see that 86,400 seconds (24 × 60 × 60) was actually intended; 84,600 seconds is only 23½ hours.
NEWCONNECTIONSPERMINUTE certainly needs a comment, as it appears to be completely unused.
Unnecessary cat
There's no need to concatenate a single file like this:
cat results.csv | uniq > results_for_verdict.csv
Simply redirect standard input using <:
<results.csv uniq >results_for_verdict.csv
or pass the filenames as arguments to uniq:
uniq results.csv results_for_verdict.csv
An even more egregious case is the use of cat in process substitution here:
comm -23 <(cat lsof_test2|sort) <(cat lsof_test|sort)
The obvious transformation is to:
comm -23 <(sort lsof_test2) <(sort lsof_test)
But if we arrange for the files to be sorted (lsof -i -t | sort >lsof_test2), then we can just pass the file names directly, and we only sort each file once rather than twice:
comm -23 lsof_test2 lsof_test
Timing
Counting loop iterations is a very approximate method of timing. A better way is to calculate the time at which we should finish, and loop until that time is reached. We can use the Bash magic variable SECONDS to determine how long the script has been running, so our test becomes:
local -i endtime=$SECONDS+$timetorun
while [ $SECONDS -lt $endtime ]
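As a self-contained demonstration of the pattern (ten seconds here, just to show the mechanics):

declare -i timetorun=10
declare -i endtime=$SECONDS+$timetorun
while [ "$SECONDS" -lt "$endtime" ]
do
    date +%T   # roughly one tick per second
    sleep 1
done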
Multiple opens of output files
Instead of opening for append each time around the loop, we can redirect the entire loop's output:
while ...
⋮
done >results.csv
Or let lsof_func just write to its standard output, and pipe that into verdict (which can also write to its standard output):
lsof_func | verdict >verdict.csv
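As a tiny illustration of the pattern, with emit and summarize as hypothetical stand-ins for lsof_func and verdict:

emit() {
    local -i i
    for i in 1 2 3
    do echo "$i"   # the function's entire output goes to stdout
    done
}
summarize() {
    awk '{s+=1ドル} END {print s}'   # reads everything emit wrote
}
emit | summarize >out.txt   # out.txt now contains 6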
Splitting by minute
Instead of a shell loop to count lines, we could use the standard split utility to break our input into 60-line files. That simplifies our code a great deal:
verdict() {
echo "Timestamp,New Procs"
# Create 1 file per 60 seconds
uniq | split -l 60 - results_
for file in results_*
do
printf '%s' "$(head -n 1 "$file" | awk -F, '{print 1ドル}'),"
awk -F, '{s+=2ドル} END {print s}' "$file"
done
}
I don't think we need two separate awk programs here, as a single one can capture the initial timestamp and accumulate the values:
for file in results_*
do
awk -F, 'FNR==1{ts=1ドル} {s+=2ドル} END{OFS="," ; print ts,s}' "$file"
done
The full program would then be:
#!/bin/bash
set -eu
timetorun=86400 # Gather statistics for 1 day
export TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT
cd "$TMPDIR" || exit $?
# collect number of new socket files in the last second
lsof_func() {
true > lsof_test
local -i endtime=$SECONDS+$timetorun
while [ $SECONDS -lt $endtime ]
do
lsof -i -t | sort >lsof_test2
date +"%F %T,$(comm -23 lsof_test2 lsof_test | wc -l)"
mv lsof_test2 lsof_test
sleep 0.9
done
}
# Calculate the number of new connections per minute
verdict() {
echo "Timestamp,New Procs"
# Create 1 file per 60 seconds
uniq | split -l 60 - results_
for file in results_*
do
awk -F, 'FNR==1{ts=1ドル} {s+=2ドル} END{OFS="," ; print ts,s}' "$file"
done
}
lsof_func | verdict
Alternative approach
Instead of writing to files and post-processing them with awk, we could simply hold all our data in memory, in a Bash array. We can add as we go, so that we're only storing one value per minute.
Easiest of all is if we care only about calendar minutes, and don't mind that we have a partial minute at start and end:
# collect number of new socket files in each second, and add to per-minute counter
true >lsof_test   # start with an empty snapshot so the first comm succeeds
declare -i endtime=$SECONDS+$timetorun
declare -A -i minute
while [ "$SECONDS" -lt "$endtime" ]
do
lsof -i -t | sort >lsof_test2
minute["$(date +'%F %R')"]+=$(comm -23 lsof_test2 lsof_test | wc -l)
mv lsof_test2 lsof_test
sleep 0.9
done
# Output the number of new connections per minute
echo "Timestamp,New Procs"
for t in "${!minute[@]}"
do
printf '%s,%u\n' "$t" "${minute[$t]}"
done
If we want to keep the current behaviour, we'll need to store into arbitrary 60-second chunks, and store the times separately:
true >lsof_test   # again, start with an empty snapshot
declare -i endtime=$SECONDS+$timetorun
declare -A -i minute
declare -A date
while [ $SECONDS -lt $endtime ]
do
lsof -i -t | sort >lsof_test2
declare -i m=$SECONDS/60
minute[$m]+=$(comm -23 lsof_test2 lsof_test | wc -l)
date[$m]=$(date -d -1min '+%F %T')
mv lsof_test2 lsof_test
sleep 0.9
done
# Output the number of new connections per minute
echo "Timestamp,New Procs"
for m in "${!minute[@]}"
do
printf '%s,%u\n' "${date[$m]}" "${minute[$m]}"
done | sort
Modified code
The full, rewritten version (with no Shellcheck warnings):
#!/bin/bash
set -eu -o pipefail
declare -i endtime=86400 # Stop after 1 day
TMPDIR=$(mktemp -d); export TMPDIR
trap 'rm -rf "$TMPDIR"' EXIT
cd "$TMPDIR"
list_sockets() {
lsof -i -t | sort
}
# Initial open sockets
list_sockets >sockets
# Update per-minute counters with new sockets
declare -A -i minute
declare -A date
while [ $SECONDS -lt $endtime ]
do
mv sockets sockets.old
list_sockets >sockets
declare -i m=$SECONDS/60
minute[$m]+=$(comm -23 sockets sockets.old | wc -l)
date[$m]=$(date -d -1min '+%F %T')
sleep 0.5
done
# Output the number of new connections per minute
echo "Timestamp,New Procs"
for m in "${!minute[@]}"
do printf '%s,%u\n' "${date[$m]}" "${minute[$m]}"
done | sort
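To reproduce the original behaviour of producing verdict.csv, just redirect the script's standard output when running it (the script name here is whatever you saved it as):

./socket-stats.sh >verdict.csv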
This is great, thank you very much for all the output. I didn't try everything, but I did use variables instead of files and that alone made performance much better; I will try to implement your other suggestions, thanks! – Noam Salit, Sep 5, 2021 at 11:36
– ... ss -s for a start. This could be enough for your purpose: you only need counters, not the full details, even if the information is aggregated after collection.
– Is 84600 a typo for 86400? If not, what's significant about 23½ hours that makes it a good sampling period?
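A minimal sketch of that ss -s idea, assuming the usual summary layout in which the TCP line starts with "TCP:" followed by the current socket count (note this samples a running total of open sockets rather than diffing per-socket lists):

while sleep 1
do
    # timestamp plus the aggregate TCP socket count, no per-socket listing
    printf '%s,%s\n' "$(date +'%F %T')" "$(ss -s | awk '/^TCP:/ {print 2ドル; exit}')"
done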