\$\begingroup\$

Case 1: rank1_naming

This function takes two arguments:

  • list_proteins_pattern_available
  • best_match_protein_name

Objective: Extract the three-letter pattern from both arguments. Match on that pattern and keep only the matching items. Extract the numbers from list_proteins_pattern_available and sort them. Take the maximum of the collected numbers and add 1 to get the next number.

Please let me know if you have any questions.

Can you point out ways to improve this script?

import re

def case_rank1_naming(list_proteins_pattern_available, best_match_protein_name):
    #This will store the list of numbers
    available_list_numbers = []
    #extract the three letter pattern
    protein_pattern = re.search(r"[A-Z]{1}[a-z]{2}", best_match_protein_name)
    protein_pattern = protein_pattern.group()
    #extract the numbers
    for name in list_proteins_pattern_available:
        pattern = re.search(r"[A-Z]{1}[a-z]{2}\d{1,3}", name)
        number = re.search(r"\d{1,3}", pattern.group())
        available_list_numbers.append(number.group())
    #Convert all the string numbers to integers
    available_list_numbers = [int(x) for x in available_list_numbers]
    #Sort the available number. Just realized I use two times sort function.
    available_list_numbers.sort()
    # Sort the available number, get the maximum number and add one to get next number
    # Example: result will be 50
    primary_number_prediction = int(max(sorted(available_list_numbers))) + 1
    #Add the protein pattern, the next predicted number and 'Aa1' at the suffix
    predicted_name = protein_pattern + str(primary_number_prediction) + 'Aa1'
    return predicted_name

list_proteins_pattern_available = ['Xpp1Aa1', 'Xpp2Aa1', 'Xpp35Aa1', 'Xpp35Ab1', 'Xpp35Ac1', 'Xpp35Ba1', 'Xpp36Aa1', 'Xpp49Aa1', 'Xpp49Ab1']
best_match_protein_name = 'Xpp35Ba1'
predicted_name = case_rank1_naming(list_proteins_pattern_available, best_match_protein_name)
print(predicted_name)
#Xpp50Aa1
dfhwze
asked Sep 17, 2019 at 15:01
\$\endgroup\$

1 Answer

\$\begingroup\$

I'll show an example implementation first, and then describe it:

from typing import Iterable
import re

def case_rank1_naming(proteins_available: Iterable[str], best_match_protein_name: str) -> str:
    # extract the three-letter pattern
    protein_pattern = re.search(r"[A-Z][a-z]{2}", best_match_protein_name).group()
    # extract the numbers
    best_number = max(
        int(re.search(r"[A-Z][a-z]{2}(\d{1,3})", name)[1])
        for name in proteins_available
    )
    # Add the protein pattern, the next predicted number and 'Aa1' at the suffix
    return f'{protein_pattern}{best_number + 1}Aa1'

def main():
    proteins_available = (
        'Xpp1Aa1', 'Xpp2Aa1', 'Xpp35Aa1', 'Xpp35Ab1', 'Xpp35Ac1',
        'Xpp35Ba1', 'Xpp36Aa1', 'Xpp49Aa1', 'Xpp49Ab1'
    )
    best_match_protein_name = 'Xpp35Ba1'
    predicted_name = case_rank1_naming(proteins_available, best_match_protein_name)
    assert predicted_name == 'Xpp50Aa1'

if __name__ == '__main__':
    main()
  • Add type hints to better-define your function signature
  • Don't write {1} in a regex - you can just drop it
  • Call max immediately on a generator rather than making and sorting a list
  • Shorten your variable names. Especially don't include the type of the variable in its name. Type hints and appropriate pluralization will cover you instead.
  • Use f-strings
  • Have a main function
  • In main, use a tuple for proteins_available instead of a list because it doesn't need to mutate
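The stated objective also mentions keeping only the names that match the three-letter pattern, and a name that does not match the regex at all would raise on the `None` result of `re.search`. A minimal sketch of one way to handle both follows. The name `next_rank1_name`, the compiled `NAME_RE`, the `ValueError`, and the choice to start numbering at 1 when nothing matches are all assumptions for illustration, not part of the original script. It also assumes the best-match name carries a number after its prefix, as in the example data.

```python
import re
from typing import Iterable

# Hypothetical helper pattern: three-letter prefix followed by 1-3 digits.
NAME_RE = re.compile(r"([A-Z][a-z]{2})(\d{1,3})")


def next_rank1_name(proteins_available: Iterable[str], best_match_protein_name: str) -> str:
    best = NAME_RE.search(best_match_protein_name)
    if best is None:
        # Assumption: an unparseable best match is treated as an error.
        raise ValueError(f"no protein pattern in {best_match_protein_name!r}")
    prefix = best[1]

    # Keep only names that parse and share the best match's prefix.
    numbers = [
        int(match[2])
        for match in map(NAME_RE.search, proteins_available)
        if match is not None and match[1] == prefix
    ]

    # Assumption: with no matching names, numbering starts at 1.
    next_number = max(numbers, default=0) + 1
    return f"{prefix}{next_number}Aa1"
```

With this version, names carrying a different prefix or no recognisable pattern are skipped rather than counted or crashing the call. Whether silently skipping them is the right behaviour depends on how clean the input list is expected to be.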
answered Sep 17, 2019 at 15:26
\$\endgroup\$
  • \$\begingroup\$ you are on a roll :) \$\endgroup\$ Commented Sep 17, 2019 at 15:32
  • \$\begingroup\$ Thank you. I will go through it and understand. \$\endgroup\$ Commented Sep 17, 2019 at 15:46
  • \$\begingroup\$ Why the step assert predicted_name == 'Xpp50Aa1'? In many cases we don't know the predicted name, right? Is this for testing? \$\endgroup\$ Commented Sep 17, 2019 at 17:20
  • \$\begingroup\$ Yes, it's only for testing. You'll definitely want to leave that out in general program use, and move it to a unit test. \$\endgroup\$ Commented Sep 17, 2019 at 17:22
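As a rough illustration of moving that check into a unit test, here is a sketch using the standard-library `unittest` module. The function body is copied from the answer so the block is self-contained; the test class and method names are made up for this example.

```python
import re
import unittest
from typing import Iterable


def case_rank1_naming(proteins_available: Iterable[str], best_match_protein_name: str) -> str:
    # Same logic as in the answer above.
    protein_pattern = re.search(r"[A-Z][a-z]{2}", best_match_protein_name).group()
    best_number = max(
        int(re.search(r"[A-Z][a-z]{2}(\d{1,3})", name)[1])
        for name in proteins_available
    )
    return f'{protein_pattern}{best_number + 1}Aa1'


class CaseRank1NamingTest(unittest.TestCase):
    # Hypothetical test case: a known input with a hand-checked expected name.
    def test_next_number_follows_maximum(self):
        proteins_available = (
            'Xpp1Aa1', 'Xpp2Aa1', 'Xpp35Aa1', 'Xpp35Ab1', 'Xpp35Ac1',
            'Xpp35Ba1', 'Xpp36Aa1', 'Xpp49Aa1', 'Xpp49Ab1'
        )
        self.assertEqual(
            case_rank1_naming(proteins_available, 'Xpp35Ba1'),
            'Xpp50Aa1',
        )
```

Saved in a file of its own, this can be run with `python -m unittest`, which keeps the known-answer check out of the program's normal execution path.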
