2
\$\begingroup\$

I am a Python beginner, and I wrote a simple Python program, which does the following:

  • search the pattern in the lines of a file (contains messages)
  • pick the information from the lines and save it to disk
  • remove the messages which matches with the regex (regular expression)
  • saves remaining messages to another file,

Regexps are not available, so I pick one message -> create regex for it -> remove the matching messages and repeat the same with remaining messages.

# coding: utf-8
# In[50]:
import re
import csv
# ### Run this part only once in the starting. From here 
# In[2]:
# ### Change the directory to working folder and give the right filename (hdfcbk), 
# ### if unsure what to do go to your folder and right click and copy the filen here, it will look like /home/XYZ/.../Your_folder_name/hdfcbk
smsFile = open('hdfcbk', 'r')
data = smsFile.read()
data = data.split('\n')
main_data = data
regex_list = []
regl = []
# In[3]:
def regex_search(pattern, file_name):
 remove_arr = []
 res = []
 remain_sms = []
 for sms in file_name:
 j= re.match(pattern,sms)
 if j != None:
 res.append(j.groupdict())
 remove_arr.append(sms)
 else:
 remain_sms.append(sms)
 return res, remove_arr, remain_sms
# In[4]:
def write_to_csv(result,csv_name):
 keys = result[0][0].keys()
 with open(csv_name, 'wb') as output_file:
 dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
 dict_writer.writeheader()
 dict_writer.writerows(result[0])
# In[12]: 
# ### To here, now the repetitive run start
# ### Update this pattern file
# In[1]:
pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'
# In[8]:
A = regex_search(pat1,main_data)
# ### Updating main_data to remaining messages
# In[11]:
main_data = A[2]
# ### Writing remaining sms to a file, you don't need to change the file name as it will be updated everything as you run the script. Just look at the remaining sms and make new regex
# In[21]:
with open('remaining_sms.txt', 'w') as fp:
 fp.write('\n'.join('%s' % x for x in main_data))
# ### Update the csv file
# In[ ]:
write_to_csv(A, 'hdfc_test_3.csv')
# ### Keeping all the regexes in one list, update the index number in [i,pat1]
# In[52]:
regl.append([1,pat1])
# ### Wrting the regex index to csv, run this part in the end, or if you're unsure that you will make the mistake run this part and keep changing the output file name
# In[53]:
with open("output.csv", "wb") as f:
 writer = csv.writer(f)
 writer.writerows(regl)

I commented everything in the code. Now the thing is, I need to send this task to some people, who does not know anything in coding. That's why I commented so much.

Can you please review my code and suggest what else I can do to improve the code so that people can run the code without any hassle?

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Oct 28, 2016 at 10:30
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

10 Tips, Problems, etc.


  1. Don't begin comments with # ###
  2. Put the code that you had outside of functions in a main() function. This makes it so that if the module is imported, this code won't be run inadvertently. You can use this standard framework:

    def main():
     # put code here that you want to run when file is executed
    if __name__ == "__main__":
     main()
    
  3. Spell words in your comments correctly, and wrap them to a new line if they are getting too long. (After a few years of coding I have found that the vast majority of comments needn't be longer than one line.)

  4. Use with open("file.txt", 'r') as f: whenever working with files
  5. Always end your file with a new line
  6. When comparing to None use is and not instead of == and !=
  7. Actually catch the three return values of the regex_search method explicitly with a well-named variable
  8. No capital letters in variable names according to pep-8 guidelines
  9. I find it much easier to use print than write when writing to a file inside of a with open() statement:

    print(some_text, file=some_file, flush=True)
    
  10. Use string.format() instead of the deprecated old string formatting tools (%s...)


# coding: utf-8
import re
import csv
def regex_search(pattern, file_name):
 remove_arr = []
 res = []
 remain_sms = []
 for sms in file_name:
 j = re.match(pattern, sms)
 if j is not None:
 res.append(j.groupdict())
 remove_arr.append(sms)
 else:
 remain_sms.append(sms)
 return res, remove_arr, remain_sms
def write_to_csv(result, csv_name):
 keys = result[0][0].keys()
 with open(csv_name, 'wb') as output_file:
 dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
 dict_writer.writeheader()
 dict_writer.writerows(result[0])
def main():
 # Run this part only once in the starting. From here
 # change the directory to working folder and give the right filename (hdfcbk),
 # if unsure what to do go to your folder and right click and copy the file here,
 # it will look like /home/XYZ/.../Your_folder_name/hdfcbk
 with open('hdfcbk', 'r') as smsFile:
 data = smsFile.read()
 data = data.split('\n')
 main_data = data
 regl = []
 pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'
 # TODO - Use much more descriptive names...no idea what's going on here without searching for a while
 a, b, c = regex_search(pat1, main_data)
 # Updating main_data to remaining messages
 main_data = c
 # Writing remaining sms to a file, you don't need to change the file name as it will be updated
 # everything as you run the script. Just look at the remaining sms and make new regex.
 with open('remaining_sms.txt', 'w') as fp:
 fp.write('\n'.join('{}'.format(x) for x in main_data))
 # Update the csv file
 write_to_csv([a, b, c], 'hdfc_test_3.csv')
 # Keeping all the regexes in one list, update the index number in [i, pat1]
 regl.append([1, pat1])
 # Writing the regex index to csv, run this part in the end, or if you're unsure that you will
 # make the mistake run this part and keep changing the output file name.
 with open("output.csv", "wb") as f:
 writer = csv.writer(f)
 writer.writerows(regl)
if __name__ == "__main__":
 main()
answered Oct 28, 2016 at 12:06
\$\endgroup\$
4
  • \$\begingroup\$ BTW I didn't know what all of the # In[x] things were all about, so I removed those too. \$\endgroup\$ Commented Oct 28, 2016 at 12:15
  • \$\begingroup\$ I used jupyter notebook (ipython notebook), when I downloaded the notebook as .py file. It shows executed parts as # In[x]. \$\endgroup\$ Commented Oct 28, 2016 at 12:59
  • \$\begingroup\$ Can you explain the def main() part. I understood it will not execute while loading but then how I will execute the parts in the main method? Sorry for so many edits in comment. First time posting questions and comments. \$\endgroup\$ Commented Oct 28, 2016 at 13:13
  • \$\begingroup\$ main() gets called when the __name__ == "__main__" or when the file is being executed by itself. If you print out the __name__ from a file that is being executed it will be "__main__", but when it is imported as a module, the __name__ will be the actual name of the file in most cases. \$\endgroup\$ Commented Oct 28, 2016 at 14:56

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.