Code Review
  • since you are initializing a lot of slow_query and query_stats (also see the note about naming below) class instances on the fly, use __slots__ to improve memory usage and performance:

     class slow_query:
         __slots__ = ["operation", "stats", "timeout", "keyspace", "table", "is_cross_node"]
         # ...
    
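For illustration (the constructor below is a hypothetical sketch, not code from the original class), the saving comes from __slots__ removing the per-instance __dict__:

```python
class SlowQuery:
    __slots__ = ["operation", "stats", "timeout", "keyspace", "table", "is_cross_node"]

    def __init__(self, operation, timeout):
        self.operation = operation
        self.timeout = timeout

q = SlowQuery("SELECT * FROM t", 500)
# slotted instances have no per-instance __dict__, so each one is smaller
print(hasattr(q, "__dict__"))  # False
```

Note that all assigned attributes must be listed in __slots__, and the saving only materializes if no base class provides a __dict__.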
  • switching from json to ujson may dramatically improve the JSON parsing speed

  • or, you can try the PyPy and simplejson combination (ujson won't work on PyPy since it is written in C; simplejson is a fast pure-Python parser)
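
A common way to wire this in (an assumption about how you would integrate it, not code from the original) is a graceful import fallback, so the same script runs on both CPython and PyPy:

```python
try:
    import ujson as json  # fast C-based parser, CPython only
except ImportError:
    try:
        import simplejson as json  # fast pure-Python parser, works on PyPy
    except ImportError:
        import json  # stdlib fallback

data = json.loads('{"keyspace": "ks1", "table": "users"}')
print(data["table"])  # users
```

Since all three expose the same loads()/dumps() interface, the rest of the code does not need to change.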

  • think about the capturing groups in your regular expressions - you can avoid capturing more than you actually need. For example, in the "start" regular expression you have 2 capturing groups, but you actually use only the first one:

     r'DEBUG.*- (\d+) operations were slow in the last \d+ msecs:$'
     #                                                 ^^^ no group needed here
  • the wild card matches in the regular expressions can be non-greedy - .*? instead of .* (not sure if it will have a measurable impact on performance)
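
Putting both regex points together (the sample log line below is made up for illustration - the real format may differ slightly):

```python
import re

# a made-up sample line resembling the Cassandra debug.log format
line = "DEBUG [ScheduledTasks:1] - 2 operations were slow in the last 5000 msecs:"

# (?:...) groups alternatives without capturing; .*? is the non-greedy wildcard
start_re = re.compile(r'(?:DEBUG|TRACE).*?- (\d+) operations were slow in the last \d+ msecs:$')

match = start_re.search(line)
print(match.group(1))  # 2
```

With only one capturing group, group(1) is unambiguous and the regex engine does a little less bookkeeping per match.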

  • class names should use a "CamelCase" convention (PEP8 reference)

  • the .get_json_objects() method *can be static*
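
Something along these lines (the method body is a guess for illustration - the real extraction logic lives in the original code):

```python
import re

class QueryStats:
    @staticmethod
    def get_json_objects(line):
        """Pull the {...} JSON payloads out of a log line (sketch only)."""
        return re.findall(r'\{.*?\}', line)

# a static method needs no instance state and can be called on the class
print(QueryStats.get_json_objects('prefix {"a": 1} suffix'))  # ['{"a": 1}']
```

Marking it static documents that the method does not touch self, which makes it easier to test in isolation.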

  • for the CLI parameter parsing I would use the argparse module - you would avoid the boilerplate code you have in the main() and usage() functions
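
A minimal sketch (the flag names below are illustrative guesses - adapt them to the script's actual options):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Summarize slow queries from a debug.log")
    parser.add_argument("logfile", help="path to the log file to parse")
    parser.add_argument("-k", "--key", choices=["t", "at", "c"], default="t",
                        help="sort key: t=time, at=average time, c=count")
    parser.add_argument("-r", "--reverse", action="store_true",
                        help="reverse the sort order")
    return parser.parse_args(argv)

args = parse_args(["debug.log", "-k", "at", "-r"])
print(args.logfile, args.key, args.reverse)  # debug.log at True
```

argparse generates the usage/help text and the error handling for you, so the hand-rolled usage() function disappears entirely.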

  • use 2 spaces before the # for the inline comment (PEP8 reference)

  • fix typo "avergae" -> "average"

  • you can improve the readability of the sort_queries() method by introducing a mapping between the key and the sort attribute name, something along these lines:

     def sort_queries(self):
         """Sorts "queries" in place, default sort is "by time"."""
         sort_attributes = {
             't': 'time',
             'at': 'avg',
             'c': 'count'
         }
         sort_attribute = sort_attributes.get(self.key, 't')
         self.queries.sort(key=lambda x: getattr(x.stats, sort_attribute),
                           reverse=self.reverse)
    
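As a small extra tweak, operator.attrgetter can replace the lambda and follows dotted attribute paths directly (the tiny classes below are stand-ins for the real ones, just to make the snippet runnable):

```python
from operator import attrgetter

class Stats:
    def __init__(self, time, avg, count):
        self.time, self.avg, self.count = time, avg, count

class Query:
    def __init__(self, stats):
        self.stats = stats

queries = [Query(Stats(5, 1.0, 2)), Query(Stats(3, 2.0, 1))]
# attrgetter("stats.time") resolves the dotted path, like lambda x: x.stats.time
queries.sort(key=attrgetter("stats.time"))
print([q.stats.time for q in queries])  # [3, 5]
```
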

Note that this is what I can see by looking at the code. Of course, to really identify the bottleneck(s), you should profile the code properly on a large input.
