Data structures to represent sets of (possibly annotated) genomic regions
This module deals with sets of genomic regions. It provides set operations such as union, intersection, and difference, as well as membership tests. Specific data types are also provided for regions that are annotated with a value.
Genomic regions are represented as a pair formed by a range and an
abstract representation of a sequence/chromosome identifier. The
data structures implemented here are parameterized over this
abstract type. To obtain an implementation for the most common case
where chromosomes are identified with a string, simply apply the
functor Make to the String module.
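As a rough illustration of what applying Make to String looks like, the sketch below defines a toy functor with the same overall shape. The module type `Chromosome` shown here, the functor body, the record-based `range`, and the names `GenomeMap`, `cpg_island`, `other` and `same_chromosome` are all assumptions made for the example; they are not the library's definitions. With the library itself, the text above suggests one would write something like `module GenomeMap = Biocaml_genomeMap.Make(String)`.

```ocaml
(* Toy stand-in for the Chromosome signature: an ordered identifier type.
   This is an assumption; the real signature may require more. *)
module type Chromosome = sig
  type t
  val compare : t -> t -> int
end

(* Toy stand-in for the functor: it only reproduces the [location] type
   described in the text (a chromosome identifier paired with a range). *)
module Make (C : Chromosome) = struct
  type range = { lo : int; hi : int } (* hypothetical range representation *)
  type location = C.t * range

  (* Hypothetical helper: do two locations lie on the same chromosome? *)
  let same_chromosome ((c1, _) : location) ((c2, _) : location) =
    C.compare c1 c2 = 0
end

(* The common case: chromosomes identified by strings. *)
module GenomeMap = Make (String)

let cpg_island : GenomeMap.location =
  ("chr1", { GenomeMap.lo = 1000; hi = 1500 })

let other : GenomeMap.location =
  ("chr2", { GenomeMap.lo = 1000; hi = 1500 })
```

Because the data structures are parameterized over the identifier type, the same functor could equally be applied to a module of integers or of a custom chromosome type, provided it satisfies the expected signature.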
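The functor Make provides four data types, one for each of the following
variants: Selection, a collection of non-overlapping regions; Signal, a
partial function over the genome; LSet, a set of locations; and LMap, a set
of locations with a value attached to each of them.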
module Biocaml_genomeMap: sig
module Make: functor (Chromosome : Chromosome) -> sig
type range = Biocaml_range.t
type location = Biocaml_genomeMap.Chromosome.t * range
A collection of non-overlapping regions (e.g. a set of CpG islands)
module Selection: sig
type t
val intersects : t -> Biocaml_genomeMap.Make.location -> bool
intersects sel loc returns true if loc has a non-empty intersection with sel, and false otherwise.
val overlap : t -> Biocaml_genomeMap.Make.location -> int
val to_stream : t -> Biocaml_genomeMap.Make.location Stream.t
val of_stream : Biocaml_genomeMap.Make.location Stream.t -> t
of_stream e computes a selection (i.e. a set of non-overlapping locations) as the union of the locations contained in e.
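To make "the union of the locations" concrete, here is a minimal sketch of how overlapping locations can be collapsed into non-overlapping ones. It is not the library's implementation: it works on plain lists rather than Stream.t, identifies chromosomes by strings, and assumes ranges are inclusive integer pairs. The names `location` and `union_of_locations` are hypothetical.

```ocaml
(* A location: chromosome name and an inclusive (lo, hi) range.
   This layout is an assumption of the sketch. *)
type location = string * (int * int)

(* Union of locations as a sorted list of non-overlapping locations.
   Two ranges on the same chromosome are merged when they share at
   least one position. *)
let union_of_locations (locs : location list) : location list =
  (* Sort by chromosome, then by start position, then by end position. *)
  let sorted = List.sort compare locs in
  let merge acc (chr, (lo, hi)) =
    match acc with
    | (chr', (lo', hi')) :: rest when chr' = chr && lo <= hi' ->
        (* Overlaps the most recent region: extend it. *)
        (chr, (lo', max hi hi')) :: rest
    | _ -> (chr, (lo, hi)) :: acc
  in
  List.rev (List.fold_left merge [] sorted)
```

For example, under these assumptions, `("chr1", (10, 20))` and `("chr1", (15, 30))` collapse into `("chr1", (10, 30))`, while a region on another chromosome is left untouched.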
end
module type Signal = sig
type 'a t
val eval : 'a t -> default:'a -> Biocaml_genomeMap.Chromosome.t -> int -> 'a
Function evaluation at some point in the genome.
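The role of the default argument can be sketched as follows: since a signal is a partial function, positions it does not cover need a fallback value. This is only a model, under the assumptions that a signal is a list of constant intervals with inclusive integer bounds and that chromosomes are strings; the type `signal` and the value `conservation` are hypothetical.

```ocaml
(* A signal modelled as a list of constant intervals:
   ((chromosome, (lo, hi)), value). Inclusive bounds are assumed. *)
type 'a signal = ((string * (int * int)) * 'a) list

(* Evaluate the signal at position [pos] of chromosome [chr]; positions
   not covered by any interval yield [default]. *)
let eval (s : 'a signal) ~(default : 'a) (chr : string) (pos : int) : 'a =
  let covers ((c, (lo, hi)), _) = c = chr && lo <= pos && pos <= hi in
  match List.find_opt covers s with
  | Some (_, v) -> v
  | None -> default

(* Hypothetical example data. *)
let conservation : float signal =
  [ (("chr1", (100, 199)), 0.9); (("chr1", (200, 299)), 0.4) ]
```

A linear scan is used here only for brevity; a real implementation would likely rely on a search structure indexed by chromosome and position.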
val fold : 'a t -> init:'c -> f:('c -> Biocaml_genomeMap.Make.location -> 'b -> 'c) -> 'c
Folds on constant intervals of the function, in increasing order.
val to_stream : 'a t -> (Biocaml_genomeMap.Make.location * 'a) Stream.t
Stream over all constant intervals of the function, in increasing order.
val of_stream : ('a -> 'a -> 'a) -> (Biocaml_genomeMap.Make.location * 'a) Stream.t -> 'a t
of_stream f ls builds a signal from a collection of annotated locations. f is used when two locations intersect, to compute the annotation on their intersection. *Beware*, f should be associative and commutative since when many locations in ls intersect, there is no guarantee on the order followed to aggregate them and their annotation.
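The warning about f can be illustrated without the library at all. The sketch below simply combines the annotations of several locations that cover the same point, in two different orders; `aggregate`, `annotations` and `reordered` are hypothetical names, and the fold order used here is an arbitrary choice, not the one the library follows.

```ocaml
(* Combine the annotations of locations covering the same point,
   in the order in which they are given. *)
let aggregate f = function
  | [] -> None
  | x :: rest -> Some (List.fold_left f x rest)

let annotations = [ 5; 3; 1 ]
let reordered = [ 1; 3; 5 ]

(* Addition is associative and commutative: the order does not matter. *)
let sum_a = aggregate ( + ) annotations
let sum_b = aggregate ( + ) reordered

(* Subtraction is neither: the result depends on the order. *)
let diff_a = aggregate ( - ) annotations
let diff_b = aggregate ( - ) reordered
```

Here `sum_a` and `sum_b` agree, whereas `diff_a` and `diff_b` differ, which is why a function such as `( + )`, `max` or `min` is a safer choice for f than one whose result depends on argument order.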
end
Partial function over the genome (e.g. …)
A set of locations (e.g. a set of gene loci)
module LSet: sig
type t
val to_stream : t -> Biocaml_genomeMap.Make.location Stream.t
val of_stream : Biocaml_genomeMap.Make.location Stream.t -> t
val intersects : t -> Biocaml_genomeMap.Make.location -> bool
intersects lset loc returns true if loc has a non-empty intersection with one of the locations in lset, and returns false otherwise.
val closest : t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * int) option
closest lset loc returns the location in lset that is the closest to loc, along with the actual (minimal) distance. Returns None if there is no location in lset that comes from the same chromosome as loc.
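The behaviour described for closest can be sketched over a plain list, as below. This is a model only: the distance convention (0 for intersecting ranges, otherwise the gap between inclusive bounds), the string chromosome identifiers, and the names `location`, `range_distance` and `genes` are assumptions, and the library may measure distance differently.

```ocaml
(* A location: chromosome name and an inclusive (lo, hi) range (assumed). *)
type location = string * (int * int)

(* Distance between two inclusive ranges: 0 when they intersect,
   otherwise the gap between them. This convention is an assumption. *)
let range_distance (lo1, hi1) (lo2, hi2) =
  if hi1 < lo2 then lo2 - hi1
  else if hi2 < lo1 then lo1 - hi2
  else 0

(* Closest location of [lset] on the same chromosome as the query,
   with its distance; None if the chromosome has no location. *)
let closest (lset : location list) ((chr, r) : location)
    : (location * int) option =
  List.fold_left
    (fun best ((c, r') as loc) ->
      if c <> chr then best
      else
        let d = range_distance r r' in
        match best with
        | Some (_, d') when d' <= d -> best
        | _ -> Some (loc, d))
    None lset

(* Hypothetical example data. *)
let genes : location list =
  [ ("chr1", (100, 200)); ("chr1", (500, 600)); ("chr2", (10, 20)) ]
```

Note that locations on other chromosomes are skipped entirely, which is what makes the None case possible even when the set is not empty.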
val intersecting_elems : t -> Biocaml_genomeMap.Make.location -> Biocaml_genomeMap.Make.location Stream.t
intersecting_elems lset loc returns a stream of all locations in lset that intersect loc.
end
A set of locations with an attached value on each of them
module LMap: sig
type 'a t
val to_stream : 'a t -> (Biocaml_genomeMap.Make.location * 'a) Stream.t
val of_stream : (Biocaml_genomeMap.Make.location * 'a) Stream.t -> 'a t
val intersects : 'a t -> Biocaml_genomeMap.Make.location -> bool
intersects lmap loc returns true if loc has a non-empty intersection with one of the locations in lmap, and returns false otherwise.
val closest : 'a t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * 'a * int) option
closest lmap loc returns the location in lmap that is the closest to loc, along with its annotation and the actual (minimal) distance. Returns None if there is no location in lmap that comes from the same chromosome as loc.
val intersecting_elems : 'a t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * 'a) Stream.t
intersecting_elems lmap loc returns a stream of elements in lmap whose location intersects with loc.
end
end
end