\$\begingroup\$

This is my code to generate a FASTA file containing multiple records of randomized DNA sequences, each of a given length. I am looking for feedback on how to write this script better.

"""
 Generates random fasta file.
 Number of records and length of each record is given by user.
"""
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from random import choices
DNA = ['A', 'T', 'C', 'G']
prob = [0.25, 0.25, 0.25, 0.25] #probability of each base
number_of_records = 100 #number of fasta records
length_of_records = 180 #length of dna sequence
with open("outputfile.fasta", "w") as output_handle:
    for i in range(number_of_records):
        dna_seq = choices(DNA, prob, k=length_of_records)
        dna_seq = ''.join(dna_seq)
        #Create a SeqRecord object
        record = SeqRecord(Seq(dna_seq),
                           id="Chowder_"+str(i+1),
                           name="Chow-Chow",
                           description="random sequence")
        SeqIO.write(record, output_handle, "fasta")
asked Oct 9, 2023 at 2:19
\$\endgroup\$

1 Answer

\$\begingroup\$

DNA should be immutable: make it a tuple, ('A', 'T', 'C', 'G'), rather than a list.

prob should go away, and you can just omit it from the choices call. The default non-weighted behaviour will be the same as what you've written.

Comments of the "do_thing();  # Do thing" variety, such as "#Create a SeqRecord object", are less helpful than having no comment at all, so you can just drop it.

"Chowder_"+str(i+1) is more easily written as f"Chowder_{i+1}".

It's not a great idea to re-assign a different type to the same variable dna_seq; just use a different name.
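
Putting these points together, the sequence-generation part might look roughly like the sketch below. The helper names random_sequence and record_id are my own, not from the original script, and Biopython is deliberately left out so the snippet runs with the standard library alone.

```python
from random import choices

# A tuple instead of a list: the alphabet is never meant to change.
DNA = ("A", "T", "C", "G")

number_of_records = 100
length_of_records = 180


def random_sequence(length: int) -> str:
    """Return a random DNA string (hypothetical helper, not in the original)."""
    # No weights argument: choices() then picks each base with equal probability.
    bases = choices(DNA, k=length)
    # A second name for the joined string, rather than re-assigning `bases`.
    return "".join(bases)


def record_id(index: int) -> str:
    """Build the record id with an f-string (hypothetical helper)."""
    return f"Chowder_{index + 1}"


ids_and_sequences = [
    (record_id(i), random_sequence(length_of_records))
    for i in range(number_of_records)
]
```

Each (id, sequence) pair can then be wrapped in SeqRecord(Seq(sequence), id=..., name="Chow-Chow", description="random sequence") and passed to SeqIO.write exactly as in your original loop.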

answered Oct 9, 2023 at 22:20
\$\endgroup\$
