Code Review
  • since you are initializing a lot of slow_query and query_stats (also see the note about naming below) class instances on the fly, use __slots__ to improve memory usage and performance:

     class slow_query:
         __slots__ = ["operation", "stats", "timeout", "keyspace", "table", "is_cross_node"]
         # ...
    
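For illustration (the constructor below is a hypothetical sketch, not code from the original class), the saving comes from __slots__ removing the per-instance __dict__:

```python
class SlowQuery:
    __slots__ = ["operation", "stats", "timeout", "keyspace", "table", "is_cross_node"]

    def __init__(self, operation, timeout):
        self.operation = operation
        self.timeout = timeout

q = SlowQuery("SELECT * FROM t", 500)
# slotted instances have no per-instance __dict__, so each one is smaller
print(hasattr(q, "__dict__"))  # False
```

Note that all assigned attributes must be listed in __slots__, and the saving only materializes if no base class provides a __dict__.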
  • switching from json to ujson may dramatically improve the JSON parsing speed

  • or, you can try the PyPy and simplejson combination (ujson won't work on PyPy since it is written in C; simplejson is a fast pure-Python parser)
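
A common way to wire this in (an assumption about how you would integrate it, not code from the original) is a graceful import fallback, so the same script runs on both CPython and PyPy:

```python
try:
    import ujson as json  # fast C-based parser, CPython only
except ImportError:
    try:
        import simplejson as json  # fast pure-Python parser, works on PyPy
    except ImportError:
        import json  # stdlib fallback

data = json.loads('{"keyspace": "ks1", "table": "users"}')
print(data["table"])  # users
```

Since all three expose the same loads()/dumps() interface, the rest of the code does not need to change.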

  • think about the capturing groups in your regular expressions - you can avoid capturing more than you actually need. For example, in the "start" regular expression you have 2 capturing groups, but you actually use only the first one:

     r'DEBUG.*- (\d+) operations were slow in the last \d+ msecs:$'
     #                                                 ^^^ no group needed here
  • the wild card matches in the regular expressions can be non-greedy - .*? instead of .* (not sure if it will have a measurable impact on performance)
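
Putting both regex points together (the sample log line below is made up for illustration - the real format may differ slightly):

```python
import re

# a made-up sample line resembling the Cassandra debug.log format
line = "DEBUG [ScheduledTasks:1] - 2 operations were slow in the last 5000 msecs:"

# (?:...) groups alternatives without capturing; .*? is the non-greedy wildcard
start_re = re.compile(r'(?:DEBUG|TRACE).*?- (\d+) operations were slow in the last \d+ msecs:$')

match = start_re.search(line)
print(match.group(1))  # 2
```

With only one capturing group, group(1) is unambiguous and the regex engine does a little less bookkeeping per match.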

  • class names should use a "CamelCase" convention (PEP8 reference)

  • the .get_json_objects() method *can be static*
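
Something along these lines (the method body is a guess for illustration - the real extraction logic lives in the original code):

```python
import re

class QueryStats:
    @staticmethod
    def get_json_objects(line):
        """Pull the {...} JSON payloads out of a log line (sketch only)."""
        return re.findall(r'\{.*?\}', line)

# a static method needs no instance state and can be called on the class
print(QueryStats.get_json_objects('prefix {"a": 1} suffix'))  # ['{"a": 1}']
```

Marking it static documents that the method does not touch self, which makes it easier to test in isolation.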

  • for the CLI parameter parsing I would use the argparse module - you would avoid the boilerplate code you have in the main() and usage() functions
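
A minimal sketch (the flag names below are illustrative guesses - adapt them to the script's actual options):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Summarize slow queries from a debug.log")
    parser.add_argument("logfile", help="path to the log file to parse")
    parser.add_argument("-k", "--key", choices=["t", "at", "c"], default="t",
                        help="sort key: t=time, at=average time, c=count")
    parser.add_argument("-r", "--reverse", action="store_true",
                        help="reverse the sort order")
    return parser.parse_args(argv)

args = parse_args(["debug.log", "-k", "at", "-r"])
print(args.logfile, args.key, args.reverse)  # debug.log at True
```

argparse generates the usage/help text and the error handling for you, so the hand-rolled usage() function disappears entirely.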

  • use 2 spaces before the # for the inline comment (PEP8 reference)

  • fix typo "avergae" -> "average"

  • you can improve the readability of the sort_queries() method by introducing a mapping between the key and the sort attribute name, something along these lines:

     def sort_queries(self):
         """Sorts "queries" in place, default sort is "by time"."""
         sort_attributes = {
             't': 'time',
             'at': 'avg',
             'c': 'count'
         }
         sort_attribute = sort_attributes.get(self.key, 't')
         self.queries.sort(key=lambda x: getattr(x.stats, sort_attribute),
                           reverse=self.reverse)
    
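As a small extra tweak, operator.attrgetter can replace the lambda and follows dotted attribute paths directly (the tiny classes below are stand-ins for the real ones, just to make the snippet runnable):

```python
from operator import attrgetter

class Stats:
    def __init__(self, time, avg, count):
        self.time, self.avg, self.count = time, avg, count

class Query:
    def __init__(self, stats):
        self.stats = stats

queries = [Query(Stats(5, 1.0, 2)), Query(Stats(3, 2.0, 1))]
# attrgetter("stats.time") resolves the dotted path, like lambda x: x.stats.time
queries.sort(key=attrgetter("stats.time"))
print([q.stats.time for q in queries])  # [3, 5]
```
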

Note that this is what I can see by looking at the code. Of course, to really identify the bottleneck(s), you should profile the code properly on a large input.
