Parsing the lsblk output

Question 1

I am a Python beginner learning Python 3. I have written two small functions that parse the lsblk output and return Linux physical and logical disks. Here is the first function:

from subprocess import run, PIPE
def physical_drives():
 """
 Gets all physical drive names.
 Gets all physical drive names on a Linux system,
 parsing the lsblk utility output.
 Parameters
 ----------
 Returns
 -------
 list
 A list of strings representing drive names.
 """
 command = ['lsblk -d -o name -n']
 output = run(command, shell=True, stdout=PIPE)
 output_string = output.stdout.decode('utf-8')
 output_string = output_string.strip()
 results = output_string.split('\n')
 return results
def main():
 print(physical_drives())
if __name__ == '__main__':
 main()

The second function:

from subprocess import run, PIPE
def partitions(disk):
 """
 Gets all partitions for a given physical disk.
 Gets all partitions present on a physical disk
 on a Linux system.
 The function parses the lsblk utility output.
 Parameters
 ----------
 disk : string
 A string containing a disk name such as 'sda'
 Returns
 -------
 list
 A list of strings representing partitions.
 """
 command = ['lsblk -o name -n -s -l']
 output = run(command, shell=True, stdout=PIPE)
 output_string = output.stdout.decode('utf-8')
 output_string = output_string.strip()
 results = list()
 results.extend(output_string.split('\n'))
 results = [x for x in results if x != disk and disk in x]
 return results
def main():
 from disks import physical_drives
 for drive in physical_drives():
 print(drive)
 parts = partitions(drive)
 for partition in parts:
 print('\t' + partition)
if __name__ == '__main__':
 main()

The functions are in two different files in the same directory. I would appreciate a quick review on anything that is not idiomatic/pythonic code. I also have one specific question. I have coded the functions in Ninja-Ide with lint and PEP8 suggestions turned on. In both files the IDE suggests that my print() statements:

print('\t' + partition)
print(physical_drives())

should be written with doubled parentheses:

print(('\t' + partition))
print((physical_drives()))

For Python 3 support, I have checked the print() documentation, but have been unable to find a reference for including function calls and string concatenation in double parentheses when calling print().

Question 2

The double-parentheses suggestion makes no sense.

Question 3

lsblk

The -s option to lsblk was introduced to util-linux rather recently, in release 2.22. You may experience compatibility issues on slightly older GNU/Linux installations.

But I don't see why you would want the -s option at all — it just gives you an inverted device tree. For example, on my machine:

$ lsblk -o name -n -s -l
sda1
sda
sda2
sda
sr0
vg-root
sda3
sda
vg-var
sda3
sda
vg-data
sda3
sda

In the output, sda appears multiple times. To understand the output, you need to drop the -l flag so that the list appears in tree form:

$ lsblk -o name -n -s
sda1
└─sda
sda2
└─sda
sr0
vg-root
└─sda3
 └─sda
vg-var
└─sda3
 └─sda
vg-data
└─sda3
 └─sda

Now, it's more apparent that the -s option isn't helpful. If you drop it, then the output makes more sense:

$ lsblk -o name -n
sda
├─sda1
├─sda2
└─sda3
 ├─vg-root
 ├─vg-var
 └─vg-data
sr0
$ lsblk -o name -n -l
sda
sda1
sda2
sda3
vg-root
vg-var
vg-data
sr0

To list the devices on sda, it would be better to run lsblk -o name -n -l /dev/sda — that would immediately drop sr0 from consideration, for example. Note that LVM volumes (such as vg-root above) would still appear in the output. I don't think that doing a substring search (if x != disk and disk in x in your code) is a reliable filter. It could be fooled if there are more than 26 physical disks: the 27th disk would be named sdaa. It might also be fooled by exceptionally tricky naming of LVM volumes.

Subprocess execution

Whenever practical, I recommend avoiding the shell when executing subprocesses. The shell introduces a set of potential security vulnerabilities — for example, shenanigans with the PATH environment variable. Best practice would be to run the command with a specific executable and pre-parsed command-line options:

run('/bin/lsblk -o name -n -s -l'.split(), stdout=PIPE)

Alternative solution

I actually wouldn't bother with parsing the output of lsblk at all. After all, lsblk is just a way to report the contents of the sysfs filesystem. You would be better off inspecting /sys directly.

from glob import glob
from os.path import basename, dirname
def physical_drives():
 drive_glob = '/sys/block/*/device'
 return [basename(dirname(d)) for d in glob(drive_glob)]
def partitions(disk):
 if disk.startswith('.') or '/' in disk:
 raise ValueError('Invalid disk name {0}'.format(disk))
 partition_glob = '/sys/block/{0}/*/start'.format(disk)
 return [basename(dirname(p)) for p in glob(partition_glob)]

Question 4

I used the -s option because the output seemed easier to parse. I will definitelly check out your alternative solution. I didn't like using text parsing for my functions either, i was hoping for something like an API call but was unable to find one. Thank you.

Question 5

This code looks okay, but I would simplify a few things:

This:

output_string = output.stdout.decode('utf-8')
output_string = output_string.strip()
results = output_string.split('\n')
return results

could be :

return output.stdout.decode('utf-8').strip().split('\n')

this:

def main():
 print(physical_drives())
if __name__ == '__main__':
 main()

could be:

if __name__ == '__main__':
 print(physical_drives())

this:

output_string = output.stdout.decode('utf-8')
output_string = output_string.strip()
results = list()
results.extend(output_string.split('\n'))
results = [x for x in results if x != disk and disk in x]
return results

could be:

results = output.stdout.decode('utf-8').strip().split('\n'))
return [x for x in results if x != disk and disk in x]

this:

parts = partitions(drive)
for partition in parts:
 print('\t' + partition)

could be:

for partition in partitions(drive):
 print('\t' + partition)

And finally, I see no reason for the double paren to the print function.

Question 6

Thank you for your detailed suggestions. I am sorry I can mark only one post as the answer.

Question 7

@Anonimista No worries. The concept that a single code review is the answer is just odd. Hopefully code reviews are additive and they are all useful is some way. :-)

Question 8

Especially because the answers complement one another. Your answer suggests how to write more pythonic code and the other answer is related to my use of lsblk. Oh, well.

Question 9

The logic can be moved out to an external process, if you're able to install jq (command-line JSON parsing tool).

Command: apt install jq

Then the code can be reduced to just:

import os
resp = os.popen('lsblk -J | jq -c .').readline().strip()

Question 10

Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer.

Question 11

@TobySpeight I disagree. I think that processing JSON is clearly simpler and better, and passes the "Oh, duh!" test.

Question 12

@200 I've added an introductory sentence or two to make this a review. HTH.

score 11 · Accepted Answer · 2017-01-13 07:56:49Z

lsblk

The -s option to lsblk was introduced to util-linux rather recently, in release 2.22. You may experience compatibility issues on slightly older GNU/Linux installations.

But I don't see why you would want the -s option at all — it just gives you an inverted device tree. For example, on my machine:

$ lsblk -o name -n -s -l
sda1
sda
sda2
sda
sr0
vg-root
sda3
sda
vg-var
sda3
sda
vg-data
sda3
sda

In the output, sda appears multiple times. To understand the output, you need to drop the -l flag so that the list appears in tree form:

$ lsblk -o name -n -s
sda1
└─sda
sda2
└─sda
sr0
vg-root
└─sda3
 └─sda
vg-var
└─sda3
 └─sda
vg-data
└─sda3
 └─sda

Now, it's more apparent that the -s option isn't helpful. If you drop it, then the output makes more sense:

$ lsblk -o name -n
sda
├─sda1
├─sda2
└─sda3
 ├─vg-root
 ├─vg-var
 └─vg-data
sr0
$ lsblk -o name -n -l
sda
sda1
sda2
sda3
vg-root
vg-var
vg-data
sr0

To list the devices on sda, it would be better to run lsblk -o name -n -l /dev/sda — that would immediately drop sr0 from consideration, for example. Note that LVM volumes (such as vg-root above) would still appear in the output. I don't think that doing a substring search (if x != disk and disk in x in your code) is a reliable filter. It could be fooled if there are more than 26 physical disks: the 27th disk would be named sdaa. It might also be fooled by exceptionally tricky naming of LVM volumes.

Subprocess execution

Whenever practical, I recommend avoiding the shell when executing subprocesses. The shell introduces a set of potential security vulnerabilities — for example, shenanigans with the PATH environment variable. Best practice would be to run the command with a specific executable and pre-parsed command-line options:

run('/bin/lsblk -o name -n -s -l'.split(), stdout=PIPE)

Alternative solution

I actually wouldn't bother with parsing the output of lsblk at all. After all, lsblk is just a way to report the contents of the sysfs filesystem. You would be better off inspecting /sys directly.

from glob import glob
from os.path import basename, dirname
def physical_drives():
 drive_glob = '/sys/block/*/device'
 return [basename(dirname(d)) for d in glob(drive_glob)]
def partitions(disk):
 if disk.startswith('.') or '/' in disk:
 raise ValueError('Invalid disk name {0}'.format(disk))
 partition_glob = '/sys/block/{0}/*/start'.format(disk)
 return [basename(dirname(p)) for p in glob(partition_glob)]

I used the -s option because the output seemed easier to parse. I will definitelly check out your alternative solution. I didn't like using text parsing for my functions either, i was hoping for something like an API call but was unable to find one. Thank you.

Stack Exchange Network

Parsing the lsblk output

3 Answers 3

lsblk

Subprocess execution

Alternative solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parsing the lsblk output

3 Answers 3

lsblk

Subprocess execution

Alternative solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions