This is my code to generate a FASTA file containing multiple records with randomized DNA sequences of a given length. I am looking for feedback on how to write this script better.
```python
"""
Generates random fasta file.
Number of records and length of each record is given by user.
"""
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from random import choices
DNA = ['A', 'T', 'C', 'G']
prob = [0.25, 0.25, 0.25, 0.25] #probability of each base
number_of_records = 100 #number of fasta records
length_of_records = 180 #length of dna sequence
with open("outputfile.fasta", "w") as output_handle:
    for i in range(number_of_records):
        dna_seq = choices(DNA, prob, k=length_of_records)
        dna_seq = ''.join(dna_seq)
        #Create a SeqRecord object
        record = SeqRecord(Seq(dna_seq),
                           id="Chowder_"+str(i+1),
                           name="Chow-Chow",
                           description="random sequence")
        SeqIO.write(record, output_handle, "fasta")
```
Answer
- `DNA` should be immutable, i.e. a tuple `()` instead of a list.
- `prob` should go away; you can just omit it from the `choices` call. The default unweighted behaviour is the same as what you've written, since every base already has equal weight.
- Comments of the `do_thing(); # Do thing` kind, such as `#Create a SeqRecord object`, only restate the code and are less helpful than having no comment at all, so you can just drop it.
- `"Chowder_"+str(i+1)` is more easily written as `f"Chowder_{i+1}"`.
- It's not a great idea to re-assign a value of a different type (first a list, then a string) to the same variable `dna_seq`; just use a different name.
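Putting the review points above together, a revised script might look roughly like this. It is a sketch under stated assumptions, not the answerer's code. The function names `random_dna`, `fasta_record` and `write_random_fasta` are hypothetical, introduced only so the pieces can be called and tested separately. Also, so that the snippet runs without Biopython installed, it swaps the `SeqRecord`/`SeqIO.write` pair for a few lines that write a `>id description` header and wrap the sequence at 60 characters per line, which is what Biopython's FASTA writer produces by default.

```python
"""
Generates a random FASTA file.
"""
from random import choices

DNA = ('A', 'T', 'C', 'G')  # tuple: the alphabet is never modified
LINE_WIDTH = 60  # assumption: mirrors the default 60-column wrap of Biopython's FASTA writer


def random_dna(length):
    """Return a random DNA string; choices() weights every base equally by default."""
    bases = choices(DNA, k=length)  # list of one-character strings
    return ''.join(bases)           # the str gets its own name (the return value)


def fasta_record(record_id, description, dna_seq, width=LINE_WIDTH):
    """Format one FASTA record: a '>id description' header, then the wrapped sequence."""
    lines = [f">{record_id} {description}"]
    lines += [dna_seq[start:start + width] for start in range(0, len(dna_seq), width)]
    return '\n'.join(lines) + '\n'


def write_random_fasta(path, number_of_records=100, length_of_records=180):
    """Write number_of_records random sequences of length_of_records bases to path."""
    with open(path, "w") as output_handle:
        for i in range(number_of_records):
            dna_seq = random_dna(length_of_records)
            output_handle.write(fasta_record(f"Chowder_{i + 1}", "random sequence", dna_seq))


if __name__ == "__main__":
    write_random_fasta("outputfile.fasta")
```

If you keep Biopython, which is the natural choice in the real script, only the body of the loop changes: `SeqIO.write(SeqRecord(Seq(dna_seq), id=f"Chowder_{i + 1}", name="Chow-Chow", description="random sequence"), output_handle, "fasta")`. The `name` field is not written to FASTA output, so leaving it out of the stdlib version does not change the file.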