Parsing a command output - Python

Question 1

I'm running a utility that parses the output of the df command. I capture the output and send it to my parser. Here's a sample:

Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk2 1996082176 430874208 1564695968 22% 2429281 4292537998 0% /
devfs 668 668 0 100% 1156 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home

Here's the function:

def parse_df(self, content):
    """Parse the `df` content output
    :param content: The command content output
    :return: (list) A list of objects of the type being parsed
    """
    entries = []
    if not content:
        return entries
    # Split the content by line and check if we should ignore first line
    for line in content.split("\n"):
        if line.startswith("Filesystem"):
            continue
        tokens = line.split()
        print(tokens)

However I'm getting the following output:

['/dev/disk2', '1996082176', '430876480', '1564693696', '22%', '2429288', '4292537991', '0%', '/']
['devfs', '668', '668', '0', '100%', '1156', '0', '100%', '/dev']
['map', '-hosts', '0', '0', '0', '100%', '0', '0', '100%', '/net']
['map', 'auto_home', '0', '0', '0', '100%', '0', '0', '100%', '/home']

The issue is that map -hosts is supposed to be a single element (for the Filesystem column). I've tried splitting with a regex, tokens = re.split(r'\s{2,}', line), but the result was still not correct:

['/dev/disk2', '1996082176 430869352 1564700824', '22% 2429289 4292537990', '0%', '/']

What would be the correct way to parse the output?

Question 2

You need to use a different delimiter, maybe \t? Splitting on multiple spaces should also work.

Question 3

Each column has a fixed width. You could try splitting based on that.

Question 4

@Nishant: Splitting by \t: ['/dev/disk2 1996082176 430874728 1564695448 22% 2429300 4292537979 0% /']

Question 5

Sounds like a job for regular expressions; or os.statvfs.

Question 6

Unrelated, but there are system calls (e.g. statvfs) that will probably get what you want more directly.
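As a rough sketch of the statvfs route mentioned in the comments: the snippet below reads the figures straight from os.statvfs instead of parsing df. The helper name fs_usage and the choice of "/" as the path are illustrative assumptions, and os.statvfs is only available on Unix-like systems.

```python
import os


def fs_usage(path):
    """Return (total, used, available) in bytes for the filesystem holding path.

    fs_usage is a hypothetical helper name, not part of any library.
    """
    st = os.statvfs(path)
    block = st.f_frsize                # fundamental block size in bytes
    total = st.f_blocks * block        # size of the whole filesystem
    free = st.f_bfree * block          # free space, including root-reserved blocks
    available = st.f_bavail * block    # free space usable by unprivileged users
    used = total - free
    return total, used, available


total, used, available = fs_usage("/")
print(total, used, available)
```

The numbers may differ slightly from what df prints, since df reports in its own block units and rounds the capacity percentage.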

Question 7

Just split on one or more whitespace characters that are followed by a digit or /.

>>> import re
>>> s = '''/dev/disk2 1996082176 430874208 1564695968 22% 2429281 4292537998 0% /
devfs 668 668 0 100% 1156 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home'''.splitlines()
>>> for line in s:
...     print(re.split(r'\s+(?=[\d/])', line))
...
['/dev/disk2', '1996082176', '430874208', '1564695968', '22%', '2429281', '4292537998', '0%', '/']
['devfs', '668', '668', '0', '100%', '1156', '0', '100%', '/dev']
['map -hosts', '0', '0', '0', '100%', '0', '0', '100%', '/net']
['map auto_home', '0', '0', '0', '100%', '0', '0', '100%', '/home']
>>>
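For completeness, here is one way the lookahead split could be folded back into the parser from the question. This is a sketch under assumptions: the key names in COLUMNS are invented for illustration (they mirror the header of the sample output), the function is written as a plain function rather than a method, and lines that do not yield exactly nine fields are skipped. It would still mis-split a filesystem name whose second word starts with a digit or /, or a mount point containing spaces.

```python
import re

# Hypothetical key names, one per column of the sample df header.
COLUMNS = ["filesystem", "blocks", "used", "available", "capacity",
           "iused", "ifree", "pct_iused", "mounted_on"]

# Split on whitespace only when the next character is a digit or "/".
SPLIT_RE = re.compile(r"\s+(?=[\d/])")


def parse_df(content):
    """Parse df output into a list of dicts, one per filesystem line."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("Filesystem"):
            continue  # skip blank lines and the header
        tokens = SPLIT_RE.split(line)
        if len(tokens) != len(COLUMNS):
            continue  # line does not fit the expected nine-column layout
        entries.append(dict(zip(COLUMNS, tokens)))
    return entries


sample = """Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk2 1996082176 430874208 1564695968 22% 2429281 4292537998 0% /
map -hosts 0 0 0 100% 0 0 100% /net"""

for entry in parse_df(sample):
    print(entry["filesystem"], "->", entry["mounted_on"])
```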

Question 8

If that is the behavior that you want, the easiest way I can see is to join the leading elements of the list until you reach a numeric element.

So something like this:

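tokens = line.split()
n = 1
# Advance n to the first purely numeric token (the block count)
while n < len(tokens) and not tokens[n].isdigit():
    n += 1
# Everything before that token belongs to the Filesystem column
tokens = [' '.join(tokens[:n])] + tokens[n:]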

Alternatively you could try @cricket_007’s suggestion of splitting on the fixed column width (this assumes the Filesystem column is 15 characters wide):

first_token = line[:15].strip()
tokens = [first_token] + line[15:].split()
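The join-until-numeric idea above can be wrapped in a small helper so it is easy to try against the sample lines. The function name split_df_line is made up for this sketch, and the empty-line guard is an addition that the original snippet does not have.

```python
def split_df_line(line):
    """Split one df line, keeping a multi-word filesystem name as one token."""
    tokens = line.split()
    if not tokens:
        return []  # blank line: nothing to split
    n = 1
    # Advance to the first purely numeric token (the block count column).
    while n < len(tokens) and not tokens[n].isdigit():
        n += 1
    return [" ".join(tokens[:n])] + tokens[n:]


print(split_df_line("map auto_home 0 0 0 100% 0 0 100% /home"))
print(split_df_line("devfs 668 668 0 100% 1156 0 100% /dev"))
```

This relies on the second column always being a plain integer, which holds for the sample in the question but not for human-readable output such as df -h.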

Question 9

Since the Filesystem column is probably followed by multiple spaces, and as long as you can determine that in advance, you can split with two different delimiters and combine the results.

fs, rest = re.split(r'\s{2,}', line, maxsplit=1)
result = [fs] + rest.split()

But this won't work if the filesystem name is separated from the next column by only a single space, which can happen with a long name.

I agree with the comments that os.statvfs(path) is a better tool for this; df would require a subprocess call.
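To make the single-space caveat concrete, here is a small demonstration of the two-step split. The helper name split_on_wide_gap is hypothetical, and the spacing inside the two sample strings is an assumption made for illustration, since the column alignment of the original df output is not preserved in the question.

```python
import re


def split_on_wide_gap(line):
    """Split off the first column at the first run of 2+ whitespace characters."""
    fs, rest = re.split(r"\s{2,}", line.strip(), maxsplit=1)
    return [fs] + rest.split()


# Assumed spacing: a wide gap follows the filesystem name, so the split works.
aligned = "map -hosts           0         0          0   100%       0          0  100%   /net"
print(split_on_wide_gap(aligned)[0])   # map -hosts

# Assumed spacing: only one space follows the name, so the first wide gap
# comes one column later and the block count is absorbed into the name.
tight = "/dev/disk2 1996082176  430874208 1564695968    22% 2429281 4292537998    0%   /"
print(split_on_wide_gap(tight)[0])     # /dev/disk2 1996082176
```

If a line contains no run of two or more spaces at all, the unpacking into fs and rest raises ValueError, so real code would need to handle that case as well.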

Accepted answer (the re.split answer above): Avinash Raj · 2017-01-09 05:57:55Z

