Better Regex and exception handling for this small code
Ganesh Pal
ganesh1pal at gmail.com
Tue Jul 18 12:52:28 EDT 2017
Thanks Cameron Simpson for you suggestion and reply quite helpful :)
On Wed, Jul 12, 2017 at 5:06 AM, Cameron Simpson <cs at zip.com.au> wrote:
> On 11Jul2017 22:01, Ganesh Pal <ganesh1pal at gmail.com> wrote:
>>> I am trying to open a file and check if there is a pattern has changed
>> after the task got completed?
>>>> file data:
>> ........................................................
>>>> #tail -f /file.txt
>> ..........................................
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = b6b20a65,
>> journal_crc = d2097b00
>> Note: Task completed successfully.
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = d976d35e,
>> journal_crc = a176af10
>>>>>> I have the below piece of code but would like to make this better more
>> pythonic , I found regex pattern and exception handling poor here , any
>> quick suggestion in your spare time is welcome.
>>>>>> #open the existing file if the flag is set and check if there is a match
>>>> log_file='/file.txt'
>> flag_is_on=1
>>>> Use "True" instead of "1". A flag is a Boolean thing, and should use a
> Boolean value. This lets you literally speak "true" and 'false" rather than
> imoplicitly saying that "0 means false and nonzero means true".
>> data = None
>>>> There is no need to initialise data here because you immediately overwrite
> it below.
>> with open(log_file, 'r') as f:
>> data = f.readlines()
>>>> if flag_is_on:
>>>> Oh yes. Just name this variable "flag". "_is_on" is kind of implicit.
>> logdata = '\n'.join(data)
>>>> Do other parts of your programme deal with the file data as lines? If not,
> there is little point to reading the file and breaking it up into lines
> above, then joining them together against here. Just go:
>> with open(log_file) as f:
> log_data = f.read()
>> reg = "initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc"
>>>> Normally we write regular expressions as "raw" python strings, thus:
>> reg = r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc'
>> because backslashes etc are punctuation inside normal strings. Within a
> "raw" string started with r' nothing is special until the closing '
> character. This makes writing regular expressions more reliable.
>> Also, why the character range "[\s\S]"? That says whitespace or
> nonwhitespace i.e. any character. If you want any character, just say ".".
>> crc = re.findall(re.compile(reg), logdata)
>>>> It is better to compile a regexp just the once, getting a Regexp object,
> and then you just use the compiled object.
>> if not crc:
>> raise Exception("Pattern not found in logfile")
>>>> ValueError would be a more appropriate exception here; plain old
> "Exception" is pretty vague.
>> checksumbefore = crc[0].strip()
>> checksumafter = crc[1].strip()
>>>> Your regexp cannot start or end with whitespace. Those .strip calls are
> not doing anything for you.
>> This reads like you expect there to be exactly 2 matches in the file. What
> if there are more or fewer?
>> logging.info("checksumbefore :%s and checksumafter:%s"
>> % (checksumbefore, checksumafter))
>>>> if checksumbefore == checksumafter:
>> raise Exception("checksum not macthing")
>>>> Don't you mean != here?
>> I wouldn't be raising exceptions in this code. Personally I would make
> this a function that returns True or False. Exceptions are a poor way of
> returning "status" or other values. They're really for "things that should
> not have happened", hence their name.
>> It looks like you're scanning a log file for multiple lines and wanting to
> know if successive ones change. Why not write a function like this
> (untested):
>> RE_CRC_LINE = re.compile(r'initiator_crc =(?P<ini_crc>[\s\S]*?),
> journal_crc')
>> def check_for_crc_changes(logfile):
> old_crc_text = ''
> with open(logfile) as f:
> for line in f:
> m = RE_CRC_LINE.match(line)
> if not m:
> # uninteresting line
> continue
> crc_text = m.group(0)
> if crc_text != old_crc_text:
> # found a change
> return True
> if old_crc_text == '':
> # if this is really an error, you might raise this exception
> # but maybe no such lines is just normal but boring
> raise ValueError("no CRC lines seen in logfile %r" % (logfile,))
> # found no changes
> return False
>> See that there is very little sanity checking. In an exception supporting
> language like Python you can often write code as if it will always succeed
> by using things which will raise exceptions if things go wrong. Then
> _outside_ the function you can catch any exceptions that occur (such as
> being unable to open the log file).
>> Cheers,
> Cameron Simpson <cs at zip.com.au>
>
More information about the Python-list
mailing list