module Biocaml_fasta:sig..end
# comment
# comment
...
>header
sequence
>header
sequence
...
where the sequence may span multiple lines, and a ';' may be used instead of '#' to start comments.
Header lines begin with the '>' character. It is often considered that all characters until the first whitespace define the name of the content, and any characters beyond that define additional information in a format specific to the file provider.
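The name/description convention for header lines can be illustrated with a short standalone sketch. The function split_header below is hypothetical and is not part of Biocaml_fasta; it only assumes that whitespace means a space or a tab.

```ocaml
(* Hypothetical helper, not part of Biocaml_fasta: split a header line
   into the name (up to the first whitespace) and the remaining text. *)
let split_header (line : string) : string * string =
  let n = String.length line in
  (* Drop the leading '>' if it is present. *)
  let body =
    if n > 0 && line.[0] = '>' then String.sub line 1 (n - 1) else line
  in
  let len = String.length body in
  (* Assumption: only space and tab count as whitespace here. *)
  let is_space c = c = ' ' || c = '\t' in
  let rec find i =
    if i >= len || is_space body.[i] then i else find (i + 1)
  in
  let stop = find 0 in
  let name = String.sub body 0 stop in
  let rest = String.trim (String.sub body stop (len - stop)) in
  (name, rest)
```

For a header such as ">chr1 Homo sapiens chromosome 1", this yields the name "chr1" and leaves the provider-specific text as the second component.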
Sequences are most often strings of characters denoting
nucleotides or amino acids. However, FASTA files sometimes provide
quality scores instead, either ASCII encoded, e.g. as supported by
modules Biocaml_phred_score and Biocaml_solexa_score, or as space-separated integers.
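As a rough illustration of the two quality-score encodings, the following standalone sketch decodes one sequence line each way. Both function names are hypothetical, the code does not use Biocaml_phred_score or Biocaml_solexa_score, and the default ASCII offset of 33 is an assumption; other offsets are in use.

```ocaml
(* Hypothetical helper: parse a line of space-separated integers.
   Raises Failure if a field is not an integer. *)
let int_seq_of_line (line : string) : int list =
  String.split_on_char ' ' line
  |> List.filter (fun s -> s <> "")   (* tolerate repeated spaces *)
  |> List.map int_of_string

(* Hypothetical helper: decode ASCII-encoded scores by subtracting an
   offset from each character code. The default of 33 is an assumption. *)
let scores_of_ascii ?(offset = 33) (line : string) : int list =
  List.init (String.length line) (fun i -> Char.code line.[i] - offset)
```

Either function produces a value that fits the int_seq type documented below.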
Thus, the FASTA format is really a family of formats with a fairly loose specification of the header and content formats. The only consistently followed meaning of the format is:
This module uses sequence to generically
mean either kind of data found in the sequence lines, char_seq
to mean specifically a sequence of characters, and int_seq to
mean specifically a sequence of integers.
Parsing functions throughout this module take the following optional arguments:
filename - used only for error messages when the data source
is not the file.
pedantic - if true, which is the default, report more
errors: Biocaml_transform.no_error lines, non standard
characters.
sharp_comments and semicolon_comments - if true, allow
comments beginning with a '#' or ';' character,
respectively. Setting both to true is okay, although it is not
recommended to have such files. Setting both to false implies that
comments are disallowed.
type char_seq = string
type int_seq = int list
type 'a item = {
  header : string;
  sequence : 'a;
}
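To show how the item type and the comment options fit together, here is a minimal standalone sketch of a FASTA reader over a list of lines. It redeclares the types locally and does not call Biocaml. The function parse_char_seq_items is hypothetical, the default values of the two comment flags are assumptions, and it simplifies the behaviour described above: empty lines and comment lines are skipped wherever they occur, and errors are reported with Failure rather than the module's Error exception.

```ocaml
(* Local copies of the documented types, so the sketch is self-contained. *)
type char_seq = string
type 'a item = { header : string; sequence : 'a }

(* Hypothetical helper: build char_seq items from FASTA text that has
   already been split into lines. Flag defaults are assumptions. *)
let parse_char_seq_items
    ?(sharp_comments = true) ?(semicolon_comments = false)
    (lines : string list) : char_seq item list =
  let is_comment l =
    l <> ""
    && ((sharp_comments && l.[0] = '#')
        || (semicolon_comments && l.[0] = ';'))
  in
  (* Finish the item being built, if any, and push it onto [acc]. *)
  let close current acc =
    match current with
    | None -> acc
    | Some (header, chunks) ->
      { header; sequence = String.concat "" (List.rev chunks) } :: acc
  in
  let rec loop current acc = function
    | [] -> List.rev (close current acc)
    | l :: rest when l = "" || is_comment l ->
      (* Simplification: skipped anywhere, not only before the first item. *)
      loop current acc rest
    | l :: rest when l.[0] = '>' ->
      let header = String.sub l 1 (String.length l - 1) in
      loop (Some (header, [])) (close current acc) rest
    | l :: rest ->
      (match current with
       | None -> failwith "sequence line before any header"
       | Some (h, chunks) -> loop (Some (h, l :: chunks)) acc rest)
  in
  loop None [] lines
```

A sequence that spans several lines is concatenated into a single string, so the two lines "ACGT" and "TTGA" under one header become the sequence "ACGTTTGA".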
module Error:sig..end
exception Error of Error.t
val in_channel_to_char_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> char_seq item Stream.t
Returns a stream of char_seq items. Initial comments are discarded.
Raises Error in case of any errors.
val in_channel_to_int_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> int_seq item Stream.t
Returns a stream of int_seq items. Initial comments are discarded.
Raises Error in case of any errors.
module Result:sig..end
module Transform:sig..end
val sexp_of_char_seq : char_seq -> Sexplib.Sexp.t
val char_seq_of_sexp : Sexplib.Sexp.t -> char_seq
val sexp_of_int_seq : int_seq -> Sexplib.Sexp.t
val int_seq_of_sexp : Sexplib.Sexp.t -> int_seq
val sexp_of_item : ('a -> Sexplib.Sexp.t) -> 'a item -> Sexplib.Sexp.t
val item_of_sexp : (Sexplib.Sexp.t -> 'a) -> Sexplib.Sexp.t -> 'a item