gmod-schema — For discussion of GMOD schema development

From: Hilmar L. <hl...@gn...> - 2003-03-27 20:31:39
On Thursday, March 27, 2003, at 12:09 PM, Scott Cain wrote:
> On Thu, 2003-03-27 at 14:52, Hilmar Lapp wrote:
>>
>> BTW the fact that turning off X-windows helps means you don't have a
>> lot of memory on the box? What that would do is not give PostgreSQL
>> more memory, but more memory available to the kernel disk cache (which
>> is essentially what Postgres needs).
>
> Hilmar,
>
> "a lot of memory" is very subjective. When I ordered this laptop (a
> Dell Latitude C840) I thought a half a gig was a lot of memory. Is more
> memory added to the disk cache via /etc/sysctl.conf by changing
> (upping?) kernel.shmmax?
The kernel should grab whatever is available in free buffers. But you 
may indeed have to adjust the kernel parameters for shmmax before your 
shared_buffers setting will take full effect. I believe there is a doc 
on the Pg site that says how to do this. I don't have the link offhand 
and don't remember the commands anymore, but I can tell you that I did 
have to do this on Mac OS X.
	-hilmar
> That was a suggested optimization given at
> http://www.lyris.com/lm_help/6.0/tuning_postgresql.html.
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. ca...@cs...
> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
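For a sense of scale, the relationship between shared_buffers and kernel.shmmax discussed above can be worked out by hand. The sketch below is not from the thread: it assumes PostgreSQL's default 8 kB block size and counts only the buffer pool, ignoring the other shared structures the server allocates, so the result is a lower bound. The function names are made up for illustration.

```python
# Rough lower bound on the shared memory a shared_buffers setting needs.
# Assumption: the default 8 kB PostgreSQL block size. Locks and other
# shared structures add to this figure and are not modelled here.
BLOCK_SIZE_BYTES = 8192


def shared_buffers_bytes(shared_buffers: int) -> int:
    """Bytes occupied by the buffer pool alone."""
    return shared_buffers * BLOCK_SIZE_BYTES


def fits_in_shmmax(shared_buffers: int, shmmax_bytes: int) -> bool:
    """True if the buffer pool by itself fits under the kernel limit."""
    return shared_buffers_bytes(shared_buffers) <= shmmax_bytes


if __name__ == "__main__":
    # The shared_buffers values that appear in this thread.
    for setting in (64, 1000, 2000):
        mb = shared_buffers_bytes(setting) / (1024 * 1024)
        print(f"shared_buffers = {setting:5d} -> about {mb:.1f} MB")
```

If the computed figure exceeds the current kernel.shmmax, the limit would need raising before the larger shared_buffers value can take effect.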
From: Scott C. <ca...@cs...> - 2003-03-27 20:10:18
On Thu, 2003-03-27 at 14:52, Hilmar Lapp wrote:
> 
> BTW the fact that turning off X-windows helps means you don't have a 
> lot of memory on the box? What that would do is not give PostgreSQL 
> more memory, but more memory available to the kernel disk cache (which 
> is essentially what Postgres needs).
Hilmar,
"a lot of memory" is very subjective. When I ordered this laptop (a
Dell Latitude C840) I thought a half a gig was a lot of memory. Is more
memory added to the disk cache via /etc/sysctl.conf by changing
(upping?) kernel.shmmax? That was a suggested optimization given at
http://www.lyris.com/lm_help/6.0/tuning_postgresql.html.
Thanks,
Scott
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Hilmar L. <hl...@gn...> - 2003-03-27 19:52:33
On Thursday, March 27, 2003, at 10:17 AM, Scott Cain wrote:
> So, what did I miss?
>
Nothing at first sight. The most notable (and totally expected) result 
of the timings is that the range overlap query has huge variance and 
hence its performance is unreliable, although excellent if you happen 
to slice the data in a fortunate way. The geometric variant won't 
return lightning fast ever, but it also won't be terribly slow ever.
The reason the range overlap query is unreliable is that it depends so 
much on how you slice the index tree with your first two conditions 
(feature_id and max, assuming the index is (feature_id,max,min)). It is 
easily possible that you have to read half of the index from disk in 
order to filter for the third condition (min), which is going to be 
expensive (and given enough memory for disk cache even more expensive 
than a sequential table scan, because you need to read from the table 
anyway subsequently). With the geometric query it appears the size of 
the slice is much more consistent, although never as small as with the 
composite index in the lucky cases.
BTW the fact that turning off X-windows helps means you don't have a 
lot of memory on the box? What that would do is not give PostgreSQL 
more memory, but more memory available to the kernel disk cache (which 
is essentially what Postgres needs).
	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
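The slicing argument in this message can be made concrete with a toy model. The following is only a sketch under stated assumptions: it models a single srcfeature, represents the composite index as a sorted Python list of (max, min) pairs, and uses made-up feature coordinates. It is not how PostgreSQL implements index scans; it only counts how many index entries the first condition leaves to be filtered by the second.

```python
import bisect


def overlap_via_composite_index(index, q_start, q_end):
    """Emulate scanning an index ordered on (max, min) for one srcfeature.

    `index` is a list of (max, min) tuples sorted ascending. The condition
    max >= q_start selects a contiguous slice; min <= q_end can only be
    checked entry by entry inside that slice.
    Returns (matching entries, number of entries examined).
    """
    first = bisect.bisect_left(index, (q_start,))
    examined = index[first:]
    hits = [(mx, mn) for (mx, mn) in examined if mn <= q_end]
    return hits, len(examined)


if __name__ == "__main__":
    # 10,000 hypothetical features, each 500 bp long, spaced 1,000 bp apart.
    index = sorted((start + 500, start) for start in range(0, 10_000_000, 1000))
    for q_start in (9_990_000, 5_000):
        hits, examined = overlap_via_composite_index(index, q_start, q_start + 2000)
        print(f"query at {q_start}: {len(hits)} hits, {examined} entries examined")
```

Both queries return the same number of hits, but the one near the start of the sequence has to walk almost the whole index, which is the "unfortunate slice" case described above.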
From: Scott C. <ca...@cs...> - 2003-03-27 18:17:24
Attachments: benchmark.pl parse_log.pl
Hello all,
I've spent the last couple of days learning what I can about Postgres
tuning, though I'm not sure it did much good. Here is what I did to
benchmark comparisons between two queries, one using min/max columns in
featureloc (Query1), the other using an RTree index in featureloc
(Query2). I wrote a short Perl script to run several ranges on all of
the arms in gadfly (the specific build of gadfly-chado I used was 3b
without residues for the arms). It ran 720 queries for each Query,
covering ranges from 1000 to 500000 bp. Here is a table of my results
(all numbers are in seconds):
        Query1                           | Query2
        mean  stdev  var    min     max  | mean  stdev  var   min    max
--------------------------------------------------------------------------
native  2.47  8.27   68.47  0.0015  62.2 | 1.82  2.00   4.02  0.028  14.16
opt1    2.57  8.85   78.32  0.0002  69.3 | 1.76  1.91   3.66  0.013  12.74
opt2    2.57  8.64   74.58  0.0019  67.9 | 1.77  2.14   4.57  0.014  29.66
opt3    2.47  8.29   68.78  0.0004  62.9 | 1.77  1.93   3.71  0.019  12.64
opt4    2.16  8.51   72.49  0.0034  66.6 | 1.52  1.65   2.71  0.016  12.04
Now for boatloads of notes:
native: no optimizations done, only VACUUM ANALYZE before running
opt1: effective_cache_size = 2000 #default 1000
 sort_mem = 4096 #default 1000
 shared_buffers = 2000 #default 64
opt2: effective_cache_size = 2000 
 sort_mem = 4096 
 shared_buffers = 1000 
opt3: effective_cache_size = 2000 
 sort_mem = 2048 
 shared_buffers = 1000 
opt4: effective_cache_size = 2000 
 sort_mem = 2048 
 shared_buffers = 1000 
 XWindows OFF
The times are in wall clock seconds, as extracted from syslog. EXPLAIN
ANALYZE generally gives more optimistic numbers.
Query1:
select distinct f.name,fl.min,fl.max,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl 
where fl.srcfeature_id = ? and 
 f.feature_id = fl.feature_id and 
 fl.max >= ? and fl.min <= ?
Query2:
select distinct f.name,fl.min,fl.max,fl.strand,f.type_id,f.feature_id
from feature f, featureslice(?,?) fl 
where fl.srcfeature_id = ? and 
 f.feature_id = fl.feature_id
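To make the overlap condition in Query1 concrete, here is a minimal sketch that runs the same predicate against a tiny in-memory database. It uses SQLite from Python's standard library rather than PostgreSQL, the table definitions are cut down to the columns the query touches, and the three features are invented for illustration. Query2 is not reproduced because featureslice is specific to the poster's database.

```python
import sqlite3


def features_in_range(conn, srcfeature_id, start, end):
    """Return features whose [min, max] interval overlaps [start, end]."""
    sql = """
        SELECT DISTINCT f.name, fl.min, fl.max, fl.strand, f.type_id, f.feature_id
        FROM feature f, featureloc fl
        WHERE fl.srcfeature_id = ?
          AND f.feature_id = fl.feature_id
          AND fl.max >= ? AND fl.min <= ?
        ORDER BY fl.min
    """
    return conn.execute(sql, (srcfeature_id, start, end)).fetchall()


def demo_connection():
    """Build a throwaway database with three hypothetical features."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, type_id INTEGER);
        CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                                 min INTEGER, max INTEGER, strand INTEGER);
        INSERT INTO feature VALUES (1, 'left', 10), (2, 'spanning', 10), (3, 'right', 10);
        INSERT INTO featureloc VALUES (1, 6, 100, 200, 1),
                                      (2, 6, 150, 900, -1),
                                      (3, 6, 1000, 1200, 1);
    """)
    return conn


if __name__ == "__main__":
    for row in features_in_range(demo_connection(), 6, 180, 950):
        print(row)
```

A feature is returned when it ends at or after the start of the range and begins at or before its end, so partially overlapping features are included.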
Comments about the data:
These data present a mixed bag. Clearly, if all we look at is lump
statistics, Query2 wins. In every case, it is faster, and has smaller
standard deviations and variances. However, it is interesting to note
that Query1 nearly always easily wins in the minimum time column. So
why would it be so fast sometimes and so slow at others? While I don't
have a good answer to that, I looked at the raw data to look for
trends. Query1 and Query2 perform comparably for most of the test,
however, when Query1 gets some distance into srcfeature_id 6 (arm X), it
falls apart for every range size, with query times going over 60 seconds
consistently. (Let me explain "some distance" a little better: when it
starts with srcfeature_id 6, it does fine for 20 or so queries (times
less than 3 seconds), then abruptly query times go to about 60 seconds
and stay there.) That explains the very large variance for that data
set. As I recall from statistics, variance is a measure of how
spread out a data set is, and this data set is bimodal. I can tell you
that it is not because postgres suddenly decided to start using
seqscans. I interrupted one run while it was doing these slow queries
and ran EXPLAIN ANALYZE on a query and it was using appropriate indexes
for each table. The most time consuming step was the index scan on
feature_id in feature.
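The effect of a bimodal run on the summary statistics can be reproduced with invented numbers. The sketch below is an illustration only: the durations are made up, not taken from the benchmark logs, and it uses Python's statistics module (sample standard deviation and variance) rather than whatever parse_log.pl computes.

```python
import statistics


def summarize(times):
    """Return (mean, stdev, variance, min, max) for a list of durations."""
    return (
        statistics.mean(times),
        statistics.stdev(times),
        statistics.variance(times),
        min(times),
        max(times),
    )


if __name__ == "__main__":
    # Made-up durations: 720 queries, most about a second long.
    unimodal = [1.0] * 700 + [2.0] * 20
    # Same count, but a small group of queries takes about a minute.
    bimodal = [1.0] * 700 + [60.0] * 20
    for label, times in (("unimodal", unimodal), ("bimodal", bimodal)):
        mean, stdev, var, lo, hi = summarize(times)
        print(f"{label}: mean={mean:.2f} stdev={stdev:.2f} var={var:.2f} "
              f"min={lo} max={hi}")
```

A few very slow queries barely move the mean but push the standard deviation well above it, which is the pattern in the Query1 columns.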
As for the "optimizations," they mostly don't seem to matter; in fact,
for Query1, it made it worse as often as better. I believe this is
because I tried too hard to optimize, allocating more memory than I had
to give, causing disk swapping sometimes. The only "optimization" that
mattered noticeably was running with XWindows off, which is essentially
giving the database more memory. Go figure, give the database more
memory and it behaves better. I believe that the take home message is
this: while sometimes Query1 performs well, Query2 is generally safer
and probably ought to be used for ROI queries.
A few comments about methods:
I've attached the two Perl scripts I wrote to do this work. The first,
benchmark.pl, uses DBI to run through arms and ranges. The order in which
it does things is this: pick a query, pick an arm, pick a starting
point, pick a range. Putting range iteration on the inner loop was done
on purpose to simulate what you might expect to happen when using
gbrowse, and perhaps letting postgres take advantage of caching. The
second script parses /var/log/messages to get durations for each query
and perform statistics on the data. To get duration data to go to
syslog, I set log_pid = true, log_statement = true, log_duration = true
in postgresql.conf and restarted postgres. As I noted above, this puts
wall clock time in syslog, so it is actually a better representation of
real world performance. I used /usr/sbin/logrotate -f
/etc/logrotate.conf to force log rotation between runs.
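The attached parse_log.pl is not reproduced on this page. As a rough idea of what such a parser involves, here is a sketch in Python. The log line shape is an assumption (a syslog line containing "duration: N sec"); the exact text depends on the PostgreSQL version and syslog setup, so the regular expression would need checking against real log output.

```python
import re
import statistics

# Assumed line shape, for example:
#   "Mar 27 12:00:01 laptop postgres[1234]: [7] LOG:  duration: 0.001500 sec"
DURATION_RE = re.compile(r"duration:\s+([0-9]+(?:\.[0-9]+)?)\s+sec")


def extract_durations(lines):
    """Yield each logged duration, in seconds, from an iterable of log lines."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            yield float(match.group(1))


def summarize_log(lines):
    """Return (count, mean, min, max) of the durations found in the lines."""
    durations = list(extract_durations(lines))
    if not durations:
        return (0, None, None, None)
    return (len(durations), statistics.mean(durations),
            min(durations), max(durations))


if __name__ == "__main__":
    sample = [
        "Mar 27 12:00:01 laptop postgres[1234]: [7] LOG:  duration: 0.001500 sec",
        "Mar 27 12:00:01 laptop postgres[1234]: [8] LOG:  query: select 1",
        "Mar 27 12:01:03 laptop postgres[1234]: [9] LOG:  duration: 62.200000 sec",
    ]
    print(summarize_log(sample))
```

For a real run, the lines would come from the rotated /var/log/messages file instead of the hard-coded sample.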
So, what did I miss?
Scott
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Colin W. <cw...@fr...> - 2003-03-26 18:40:14
Just to be clear:
The feature graph is defined using the feature_relationship table. Hits 
have no parents and hsp's have one parent (the hit that it is part of). 
Locations, on the other hand, are specified in featureloc. In gadfly3b, 
hits have no entries and hsp's have two - one specifying the location of 
the hsp on the query sequence (the chromosome arm), which gets a rank of 
0, and one specifying the location of the hsp on the subject sequence 
(e.g. a protein sequence from another species), which gets a rank of 1.
Colin
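The structure Colin describes can be pictured with a small in-memory model. This is only an illustrative sketch: the class and field names are invented, and real chado tables carry more columns than shown. It shows a hit with no locations and one HSP child with two locations, rank 0 on the query sequence and rank 1 on the subject.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureLoc:
    srcfeature: str  # name of the sequence the location is on
    nbeg: int
    nend: int
    rank: int        # 0 = query (chromosome arm), 1 = subject


@dataclass
class Feature:
    name: str
    type: str
    parent: Optional["Feature"] = None                    # from feature_relationship
    locs: List[FeatureLoc] = field(default_factory=list)  # from featureloc


def loc_on_query(feature):
    """Return the rank-0 location, or None if the feature has none."""
    return next((loc for loc in feature.locs if loc.rank == 0), None)


if __name__ == "__main__":
    hit = Feature("hit1", "alignment_hit")  # no parent, no locations
    hsp = Feature(
        "hsp1", "alignment_hsp", parent=hit,
        locs=[FeatureLoc("2L", 1000, 1200, rank=0),           # on the arm
              FeatureLoc("some_protein", 10, 76, rank=1)],    # on the subject
    )
    print(loc_on_query(hit), loc_on_query(hsp))
```

Because the hit has no rank-0 location of its own, a range query on the arm can find the HSP but never the hit, which is the problem discussed in the rest of the thread.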
Scott Cain wrote:
>Sorry--I meant that hits have two parents, but that isn't relevant to
>this issue.
>
>On Tue, 2003-03-25 at 19:39, Chris Mungall wrote:
> 
>
>>Ok, I'm completely confused. When would an HSP have two parents?
>>
>>On 25 Mar 2003, Scott Cain wrote:
>>
>> 
>>
>>>Ahh, yes you've pointed out the problem in another way--HSPs have two
>>>parents. From gbrowse's perspective, the parent that matters is the
>>>scaffold/chromosome/arm/whatever.
>>>
>>>On Tue, 2003-03-25 at 17:09, Charles Hauser wrote:
>>> 
>>>
>>>>On Tue, 2003-03-25 at 17:00, Scott Cain wrote:
>>>> 
>>>>
>>>>>I think perhaps I didn't explain it clearly enough. There is a
>>>>>parent-child relationship, but the parent doesn't have any coordinates
>>>>>associated with it, so it can never show up in a ROI query. Now I am
>>>>>sure there are good reasons for not having coordinates associated with
>>>>>the parent hit: what if an unmasked repeat was blasted against the
>>>>>genome? The coordinates would range from one end of genome to the
>>>>>other, and then it would show up in EVERY ROI query, which is just as
>>>>>useless.
>>>>> 
>>>>>
>>>>
>>>>In the case of mapping ests wouldn't the parent be the FL EST seq w/
>>>>children represented by the exonic subseqs?
>>>>
>>>>
>>>>(P)alignment_hit: est[start..end] --> featureloc	[....]
>>>>(C)	alignment_hsp: exon1[start..end] --> featureloc	[....]
>>>>(C)	alignment_hsp: exon2[start..end] --> featureloc	[....]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>-------------------------------------------------------
>>>>This SF.net email is sponsored by:
>>>>The Definitive IT and Networking Event. Be There!
>>>>NetWorld+Interop Las Vegas 2003 -- Register today!
>>>>http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en
>>>>_______________________________________________
>>>>Gmod-schema mailing list
>>>>Gmo...@li...
>>>>https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>> 
>>>>
From: Scott C. <ca...@cs...> - 2003-03-26 03:18:14
Sorry--I meant that hits have two parents, but that isn't relevant to
this issue.
On Tue, 2003-03-25 at 19:39, Chris Mungall wrote:
> Ok, I'm completely confused. When would an HSP have two parents?
> 
> On 25 Mar 2003, Scott Cain wrote:
> 
> > Ahh, yes you've pointed out the problem in another way--HSPs have two
> > parents. From gbrowse's perspective, the parent that matters is the
> > scaffold/chromosome/arm/whatever.
> >
> > On Tue, 2003-03-25 at 17:09, Charles Hauser wrote:
> > > On Tue, 2003-03-25 at 17:00, Scott Cain wrote:
> > > > I think perhaps I didn't explain it clearly enough. There is a
> > > > parent-child relationship, but the parent doesn't have any coordinates
> > > > associated with it, so it can never show up in a ROI query. Now I am
> > > > sure there are good reasons for not having coordinates associated with
> > > > the parent hit: what if an unmasked repeat was blasted against the
> > > > genome? The coordinates would range from one end of genome to the
> > > > other, and then it would show up in EVERY ROI query, which is just as
> > > > useless.
> > >
> > >
> > >
> > > In the case of mapping ests wouldn't the parent be the FL EST seq w/
> > > children represented by the exonic subseqs?
> > >
> > >
> > > (P)alignment_hit: est[start..end] --> featureloc	[....]
> > > (C)	alignment_hsp: exon1[start..end] --> featureloc	[....]
> > > (C)	alignment_hsp: exon2[start..end] --> featureloc	[....]
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Chris M. <cj...@fr...> - 2003-03-26 01:07:45
On 25 Mar 2003, Scott Cain wrote:
> Hi Chris,
>
> I know it's not set in stone, which explains my reluctance to write
> special case code.
>
> I'm curious, did you try the code below? You said it could work. What
> if there are cases that fall outside your assumptions? Ignored
> (likely)? What if there are cases like I envisioned (an unmasked repeat
> gets in that then covers several megabases on one arm)?
looking at the two assumptions; ignore the mechanics of the query for the
moment and the issues of hits vs other features
* child featurelocs all have identical (rank=0) strands
I am pretty sure there is only one case of this being violated in the
db, that is the trans-spliced gene mod(mdg4). This is stored as transcripts
with mixed-strand exons. (this is not the plan for how to store
trans-spliced genes in general but dealing with all these pathological
cases in the conversion is difficult).
What should the nbeg, nend and strand be in this case? my SQL below would
choose the outermost boundaries and an arbitrary strand. In actual fact
the db uses the strand of the initial exon, which is partly logical,
partly arbitrary.
* child featurelocs all have identical (rank=0) srcfeature_ids
I don't think there are any cases of this being violated in your
instantiation, but this is perfectly acceptable. For instance, you may
have an alignment spanning two contigs (obviously you'd have to do some
processing to get this, rather than directly from a single blast report).
Under these circumstances, the SQL will populate the parent featureloc
with arbitrary nbeg, nend and srcfeature_id. This is bad. But then, what
is the correct thing to do here?
You could choose an arbitrary child srcfeature_id and use that. This is
better, but still bad.
You could have multiple featurelocs for the parent (incrementing locgroup
by one for every child featureloc).
Or you could simply choose not to instantiate a featureloc for non-leaf
features. This would have to be applied consistently. This has the
advantage of zero redundancy, but the disadvantage of slower range based
queries. This is why I favour a hybrid approach, depending on whether the
instantiation is for querying or managing data.
* big hits
there shouldn't be any cases of chromosome-spanning hits such as the
repeat example you mentioned. we process all our blasts before loading
them into the db. if someone does have a hit with hsps spanning the
chromosome, then it's logical to treat this the same as all the others.
This is fine I think.
* other assumptions
There are some other unstated assumptions - eg that at least one HSP will
have strand == -1 or +1, at least one will have nbeg, nend and
srcfeature_id set. The behaviour of the query should be completely
predictable. The declarative nature of SQL makes this easier to check and
actually prove what the query will do. However, I'm too lazy to actually
check that so make sure you have a backup before actually trying it,
otherwise all manner of awful things might happen.
(I just realised you also need to set the strand as well)
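Since the behaviour outside the two assumptions is described as arbitrary, it may be worth checking the data before populating anything. The following is a sketch of such a check, not code from the thread: the function name and the row layout (hit id, srcfeature id, strand, one row per rank-0 HSP location) are assumptions made for the example.

```python
from collections import defaultdict


def find_violations(hsp_rows):
    """Report hits whose HSPs break the single-strand or single-source assumption.

    hsp_rows: iterable of (hit_id, srcfeature_id, strand) tuples, one per
    rank-0 HSP location. Returns {hit_id: [reasons]} for offending hits only.
    """
    strands = defaultdict(set)
    sources = defaultdict(set)
    for hit_id, srcfeature_id, strand in hsp_rows:
        strands[hit_id].add(strand)
        sources[hit_id].add(srcfeature_id)

    problems = {}
    for hit_id in strands:
        reasons = []
        if len(strands[hit_id]) > 1:
            reasons.append("mixed strands")
        if len(sources[hit_id]) > 1:
            reasons.append("mixed srcfeature_ids")
        if reasons:
            problems[hit_id] = reasons
    return problems


if __name__ == "__main__":
    rows = [
        (1, 6, 1), (1, 6, 1),    # well-behaved hit
        (2, 6, 1), (2, 6, -1),   # mixed strands
        (3, 6, 1), (3, 7, 1),    # HSPs on two source features
    ]
    print(find_violations(rows))
```

Hits reported here are the ones for which a policy decision (arbitrary choice, multiple featurelocs, or no parent featureloc at all) would still be needed.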
> Thanks,
> Scott
>
> On Tue, 2003-03-25 at 17:20, Chris Mungall wrote:
> > Hi Scott
> >
> > Remember a lot of this is just artifacts of how the gadfly->chado copy was
> > generated. please don't take any of this as set in stone, succeeding chado
> > instantiations will not do this the same way. Clearly the policy as to
> > whether we instantiate locations for non-leaf features in the feature
> > composition graph must be consistent - either only leaf features will have
> > locations, or all locatable features will have locations. See my previous
> > email on this subject.
> >
> > If you want to populate hit featurelocs based on hsp featurelocs, this
> > piece of SQL might work. It should be generalisable to any fixed-level
> > feature composition graph. It's possibly really slow - you could do the
> > same thing imperatively in the application code.
> >
> > --- populates parent locations based on maximal extent of child
> > --- locations.
> > --- * assumes that no hits contain hsps on mixed strands * ---
> > --- * assumes all hsps are on the same feature * ---
> > --- note: min(hsploc1.srcfeature_id) is there because we need
> > --- an aggregate function; in fact all srcfeature_ids should be
> > --- the same
> > ---
> > --- this should really be split into a view part and an insert
> > --- part
> > ---
> > --- WARNING: type_id is hardcoded below - check your instantiation
> >
> > INSERT INTO featureloc (feature_id, nbeg, nend, srcfeature_id)
> > SELECT DISTINCT hit.feature_id,
> > min(hsploc1.nbeg), max(hsploc2.nend),
> > min(hsploc1.srcfeature_id)
> > FROM
> > feature AS hit
> > INNER JOIN
> > feature_relationship ON (hit.feature_id = objfeature_id)
> > INNER JOIN
> > featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
> > INNER JOIN
> > featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
> > WHERE hit.type_id = 28
> > AND hsploc1.strand = 1
> > AND hsploc1.rank = 0
> > AND hsploc2.rank = 0
> > GROUP BY hit.feature_id
> > UNION
> > SELECT DISTINCT hit.feature_id,
> > max(hsploc1.nbeg), min(hsploc2.nend),
> > min(hsploc1.srcfeature_id)
> > FROM
> > feature AS hit
> > INNER JOIN
> > feature_relationship ON (hit.feature_id = objfeature_id)
> > INNER JOIN
> > featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
> > INNER JOIN
> > featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
> > WHERE hit.type_id = 28
> > AND hsploc1.strand = -1
> > AND hsploc1.rank = 0
> > AND hsploc2.rank = 0
> > GROUP BY hit.feature_id;
> >
> >
> >
> > On 25 Mar 2003, Scott Cain wrote:
> >
> > > I think the problem is finding relationships between HSPs. The problem
> > > is that there are no featureloc entries for alignment hits (the parent
> > > of HSPs), so they never show up in a ROI query. That is a problem
> > > because GBrowse works by getting parent objects in a range, then finding
> > > out what children it has, so it can use that information to connect
> > > related HSPs. (This is the way it works for gene->transcript->exon.)
> > > Since HSPs don't work that way, I have to write special case code to
> > > first find the HSPs, then find the parent hits, then associate the
> > > parents with the children. It is a fairly straightforward problem, I
> > > just haven't taken the time to write special case code (I'm hoping the
> > > need will go away :-)
> > >
> > > Scott
> > >
> > > On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> > > > In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> > > > ca...@cs... writes:
> > > >
> > > > > Yep, that would work. The problem with HSP data is that they have
> > > > > more
> > > > > complicated relationships than other features (at least the way it
> > > > > is
> > > > > stored in the gadfly port), I just haven't written the SQL and
> > > > > related
> > > > > code to take it into account.
> > > > >
> > > > > Scott
> > > >
> > > >
> > > > Scott,
> > > >
> > > > HSPs are just features to a first approx. There is some
> > > > complexity around
> > > > the full HSP as something potentially linked both to source and
> > > > target, but you are
> > > > only Gbrowsing one of these at a time. They have a distinctive type
> > > > ("alignment hsp" in the version I have). Can't you just display them
> > > > as features of that type?
> > > >
> > > > Cheers, -Stan
> > >
>
From: Chris M. <cj...@fr...> - 2003-03-26 00:39:31
Ok, I'm completely confused. When would an HSP have two parents?
On 25 Mar 2003, Scott Cain wrote:
> Ahh, yes you've pointed out the problem in another way--HSPs have two
> parents. From gbrowse's perspective, the parent that matters is the
> scaffold/chromosome/arm/whatever.
>
> On Tue, 2003-03-25 at 17:09, Charles Hauser wrote:
> > On Tue, 2003-03-25 at 17:00, Scott Cain wrote:
> > > I think perhaps I didn't explain it clearly enough. There is a
> > > parent-child relationship, but the parent doesn't have any coordinates
> > > associated with it, so it can never show up in a ROI query. Now I am
> > > sure there are good reasons for not having coordinates associated with
> > > the parent hit: what if an unmasked repeat was blasted against the
> > > genome? The coordinates would range from one end of genome to the
> > > other, and then it would show up in EVERY ROI query, which is just as
> > > useless.
> >
> >
> >
> > In the case of mapping ests wouldn't the parent be the FL EST seq w/
> > children represented by the exonic subseqs?
> >
> >
> > (P)alignment_hit: est[start..end] --> featureloc	[....]
> > (C)	alignment_hsp: exon1[start..end] --> featureloc	[....]
> > (C)	alignment_hsp: exon2[start..end] --> featureloc	[....]
> >
> >
> >
> >
> >
> >
> >
>
From: Scott C. <ca...@cs...> - 2003-03-25 22:28:03
Hi Chris,
I know it's not set in stone, which explains my reluctance to write
special case code.
I'm curious, did you try the code below? You said it could work. What
if there are cases that fall outside your assumptions? Ignored
(likely)? What if there are cases like I envisioned (an unmasked repeat
gets in that then covers several megabases on one arm)?
Thanks,
Scott
On Tue, 2003-03-25 at 17:20, Chris Mungall wrote:
> Hi Scott
> 
> Remember a lot of this is just artifacts of how the gadfly->chado copy was
> generated. please don't take any of this as set in stone, succeeding chado
> instantiations will not do this the same way. Clearly the policy as to
> whether we instantiate locations for non-leaf features in the feature
> composition graph must be consistent - either only leaf features will have
> locations, or all locatable features will have locations. See my previous
> email on this subject.
> 
> If you want to populate hit featurelocs based on hsp featurelocs, this
> piece of SQL might work. It should be generalisable to any fixed-level
> feature composition graph. It's possibly really slow - you could do the
> same thing imperatively in the application code.
> 
> --- populates parent locations based on maximal extent of child
> --- locations.
> --- * assumes that no hits contain hsps on mixed strands * ---
> --- * assumes all hsps are on the same feature * ---
> --- note: min(hsploc1.srcfeature_id) is there because we need
> --- an aggregate function; in fact all srcfeature_ids should be
> --- the same
> ---
> --- this should really be split into a view part and an insert
> --- part
> ---
> --- WARNING: type_id is hardcoded below - check your instantiation
> 
> INSERT INTO featureloc (feature_id, nbeg, nend, srcfeature_id)
> SELECT DISTINCT hit.feature_id,
> min(hsploc1.nbeg), max(hsploc2.nend),
> min(hsploc1.srcfeature_id)
> FROM
> feature AS hit
> INNER JOIN
> feature_relationship ON (hit.feature_id = objfeature_id)
> INNER JOIN
> featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
> INNER JOIN
> featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
> WHERE hit.type_id = 28
> AND hsploc1.strand = 1
> AND hsploc1.rank = 0
> AND hsploc2.rank = 0
> GROUP BY hit.feature_id
> UNION
> SELECT DISTINCT hit.feature_id,
> max(hsploc1.nbeg), min(hsploc2.nend),
> min(hsploc1.srcfeature_id)
> FROM
> feature AS hit
> INNER JOIN
> feature_relationship ON (hit.feature_id = objfeature_id)
> INNER JOIN
> featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
> INNER JOIN
> featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
> WHERE hit.type_id = 28
> AND hsploc1.strand = -1
> AND hsploc1.rank = 0
> AND hsploc2.rank = 0
> GROUP BY hit.feature_id;
> 
> 
> 
> On 25 Mar 2003, Scott Cain wrote:
> 
> > I think the problem is finding relationships between HSPs. The problem
> > is that there are no featureloc entries for alignment hits (the parent
> > of HSPs), so they never show up in a ROI query. That is a problem
> > because GBrowse works by getting parent objects in a range, then finding
> > out what children it has, so it can use that information to connect
> > related HSPs. (This is the way it works for gene->transcript->exon.)
> > Since HSPs don't work that way, I have to write special case code to
> > first find the HSPs, then find the parent hits, then associate the
> > parents with the children. It is a fairly straightforward problem, I
> > just haven't taken the time to write special case code (I'm hoping the
> > need will go away :-)
> >
> > Scott
> >
> > On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> > > In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> > > ca...@cs... writes:
> > >
> > > > Yep, that would work. The problem with HSP data is that they have
> > > > more
> > > > complicated relationships than other features (at least the way it
> > > > is
> > > > stored in the gadfly port), I just haven't written the SQL and
> > > > related
> > > > code to take it into account.
> > > >
> > > > Scott
> > >
> > >
> > > Scott,
> > >
> > > HSPs are just features to a first approx. There is some
> > > complexity around
> > > the full HSP as something potentially linked both to source and
> > > target, but you are
> > > only Gbrowsing one of these at a time. They have a distinctive type
> > > ("alignment hsp" in the version I have). Can't you just display them
> > > as features of that type?
> > >
> > > Cheers, -Stan
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Chris M. <cj...@fr...> - 2003-03-25 22:20:28
Hi Scott
Remember a lot of this is just artifacts of how the gadfly->chado copy was
generated. please don't take any of this as set in stone, succeeding chado
instantiations will not do this the same way. Clearly the policy as to
whether we instantiate locations for non-leaf features in the feature
composition graph must be consistent - either only leaf features will have
locations, or all locatable features will have locations. See my previous
email on this subject.
If you want to populate hit featurelocs based on hsp featurelocs, this
piece of SQL might work. It should be generalisable to any fixed-level
feature composition graph. It's possibly really slow - you could do the
same thing imperatively in the application code.
--- populates parent locations based on maximal extent of child
--- locations.
--- * assumes that no hits contain hsps on mixed strands * ---
--- * assumes all hsps are on the same feature * ---
--- note: min(hsploc1.srcfeature_id) is there because we need
--- an aggregate function; in fact all srcfeature_ids should be
--- the same
---
--- this should really be split into a view part and an insert
--- part
---
--- WARNING: type_id is hardcoded below - check your instantiation
INSERT INTO featureloc (feature_id, nbeg, nend, srcfeature_id)
 SELECT DISTINCT hit.feature_id,
 min(hsploc1.nbeg), max(hsploc2.nend),
 min(hsploc1.srcfeature_id)
 FROM
 feature AS hit
 INNER JOIN
 feature_relationship ON (hit.feature_id = objfeature_id)
 INNER JOIN
 featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
 INNER JOIN
 featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
 WHERE hit.type_id = 28
 AND hsploc1.strand = 1
 AND hsploc1.rank = 0
 AND hsploc2.rank = 0
 GROUP BY hit.feature_id
 UNION
 SELECT DISTINCT hit.feature_id,
 max(hsploc1.nbeg), min(hsploc2.nend),
 min(hsploc1.srcfeature_id)
 FROM
 feature AS hit
 INNER JOIN
 feature_relationship ON (hit.feature_id = objfeature_id)
 INNER JOIN
 featureloc AS hsploc1 ON (hsploc1.feature_id = subjfeature_id)
 INNER JOIN
 featureloc AS hsploc2 ON (hsploc2.feature_id = subjfeature_id)
 WHERE hit.type_id = 28
 AND hsploc1.strand = -1
 AND hsploc1.rank = 0
 AND hsploc2.rank = 0
 GROUP BY hit.feature_id;
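The split into a view part and an insert part mentioned in the comments could look roughly like the following. This is only a sketch, not the schema's actual definition: it runs against an in-memory SQLite database rather than PostgreSQL, the view name hit_extent and the HSP type_id of 29 are made up for illustration, and it also copies strand and rank into the new row, which the statement above does not do. Like the original, it assumes no hit mixes strands.

```python
import sqlite3

HIT_TYPE_ID = 28  # hardcoded in the thread's SQL; check your own instantiation
HSP_TYPE_ID = 29  # hypothetical value, for the toy data only

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, type_id INTEGER);
CREATE TABLE feature_relationship (subjfeature_id INTEGER, objfeature_id INTEGER);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         nbeg INTEGER, nend INTEGER, strand INTEGER, rank INTEGER);

-- View part: one row per hit, spanning the maximal extent of its rank-0 HSPs.
-- On the reverse strand nbeg > nend, so min and max swap roles.
CREATE VIEW hit_extent AS
SELECT hit.feature_id            AS feature_id,
       min(hsploc.srcfeature_id) AS srcfeature_id,
       CASE hsploc.strand WHEN 1 THEN min(hsploc.nbeg) ELSE max(hsploc.nbeg) END AS nbeg,
       CASE hsploc.strand WHEN 1 THEN max(hsploc.nend) ELSE min(hsploc.nend) END AS nend,
       hsploc.strand             AS strand
FROM feature AS hit
JOIN feature_relationship AS fr ON fr.objfeature_id = hit.feature_id
JOIN featureloc AS hsploc ON hsploc.feature_id = fr.subjfeature_id
WHERE hit.type_id = {HIT_TYPE_ID}
  AND hsploc.rank = 0
  AND hsploc.strand IN (1, -1)
GROUP BY hit.feature_id, hsploc.strand;
""")

# Toy data: hit 100 has two forward-strand HSPs, hit 200 two reverse-strand HSPs.
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(100, HIT_TYPE_ID), (101, HSP_TYPE_ID), (102, HSP_TYPE_ID),
                  (200, HIT_TYPE_ID), (201, HSP_TYPE_ID), (202, HSP_TYPE_ID)])
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?)",
                 [(101, 100), (102, 100), (201, 200), (202, 200)])
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)",
                 [(101, 1, 1000, 1100, 1, 0), (102, 1, 1510, 1600, 1, 0),
                  (201, 1, 5300, 5200, -1, 0), (202, 1, 5100, 5000, -1, 0)])

# Insert part: materialise the view rows as featurelocs of the parent hits.
conn.execute("""
INSERT INTO featureloc (feature_id, srcfeature_id, nbeg, nend, strand, rank)
SELECT feature_id, srcfeature_id, nbeg, nend, strand, 0 FROM hit_extent
""")

hit_locs = conn.execute("""
SELECT feature_id, srcfeature_id, nbeg, nend FROM featureloc
WHERE feature_id IN (100, 200) ORDER BY feature_id
""").fetchall()
print(hit_locs)
```

Keeping the extent computation in a view means it can be inspected or queried on its own before anything is written to featureloc.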
On 25 Mar 2003, Scott Cain wrote:
> I think the problem is finding relationships between HSPs. The problem
> is that there are no featureloc entries for alignment hits (the parent
> of HSPs), so they never show up in a ROI query. That is a problem
> because GBrowse works by getting parent objects in a range, then finding
> out what children it has, so it can use that information to connect
> related HSPs. (This is the way it works for gene->transcript->exon.)
> Since HSPs don't work that way, I have to write special case code to
> first find the HSPs, then find the parent hits, then associate the
> parents with the children. It is a fairly straightforward problem; I
> just haven't taken the time to write special case code (I'm hoping the
> need will go away :-)
>
> Scott
>
> On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> > In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> > ca...@cs... writes:
> >
> > > Yep, that would work. The problem with HSP data is that they have
> > > more
> > > complicated relationships than other features (at least the way it
> > > is
> > > stored in the gadfly port), I just haven't written the SQL and
> > > related
> > > code to take it into account.
> > >
> > > Scott
> >
> >
> > Scott,
> >
> > HSPs are just features to a first approx. There is some
> > complexity around
> > the full HSP as something potentially linked both to source and
> > target, but you are
> > only Gbrowsing one of these at a time. They have a distinctive type
> > ("alignment hsp" in the version I have). Can't you just display them
> > as features of that type?
> >
> > Cheers, -Stan
>
From: Scott C. <ca...@cs...> - 2003-03-25 22:14:43
Ahh, yes you've pointed out the problem in another way--HSPs have two
parents. From gbrowse's perspective, the parent that matters is the
scaffold/chromosome/arm/whatever.
On Tue, 2003-03-25 at 17:09, Charles Hauser wrote:
> On Tue, 2003-03-25 at 17:00, Scott Cain wrote:
> > I think perhaps I didn't explain it clearly enough. There is a
> > parent-child relationship, but the parent doesn't have any coordinates
> > associated with it, so it can never show up in a ROI query. Now I am
> > sure there are good reasons for not having coordinates associated with
> > the parent hit: what if an unmasked repeat was blasted against the
> > genome? The coordinates would range from one end of genome to the
> > other, and then it would show up in EVERY ROI query, which is just as
> > useless.
> 
> 
> 
> In the case of mapping ests wouldn't the parent be the FL EST seq w/ 
> children represented by the exonic subseqs?
> 
> 
> (P)alignment_hit: est[start..end] --> featureloc	[....]		
> (C)	alignment_hsp: exon1[start..end] --> featureloc	[....]
> (C)	alignment_hsp: exon2[start..end] --> featureloc	[....]
> 		
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Charles H. <ch...@du...> - 2003-03-25 22:09:50
On Tue, 2003-03-25 at 17:00, Scott Cain wrote:
> I think perhaps I didn't explain it clearly enough. There is a
> parent-child relationship, but the parent doesn't have any coordinates
> associated with it, so it can never show up in a ROI query. Now I am
> sure there are good reasons for not having coordinates associated with
> the parent hit: what if an unmasked repeat was blasted against the
> genome? The coordinates would range from one end of genome to the
> other, and then it would show up in EVERY ROI query, which is just as
> useless.
In the case of mapping ests wouldn't the parent be the FL EST seq w/ 
children represented by the exonic subseqs?
(P)alignment_hit: est[start..end] --> featureloc	[....]		
(C)	alignment_hsp: exon1[start..end] --> featureloc	[....]
(C)	alignment_hsp: exon2[start..end] --> featureloc	[....]
		
From: Scott C. <ca...@cs...> - 2003-03-25 22:05:06
Yes, I think that would do it. Also make sure that there are part_of
relationships (and not 'contains' relationships) defined in
feature_relationship for exon->transcript->gene.
On Tue, 2003-03-25 at 16:59, Charles Hauser wrote:
> So, the way to load an EST masquerading as a gene (2 exon case) would be
> to:
> 
> table feature:
> 
> feature_id  name    residues    seqlen  type_id  cvterm
> 1           ESTxxx  \N          600     10       'gene'
> 2           -RA     GATC...GGA  600     11       'transcript'
> 3           :1      \N          100     13       'exon'
> 4           :2      \N          90      13       'exon'
> 
> table featureloc:
> 
> featureloc_id  f_id  srcf_id      nbeg  nend  locgroup  rank
> xxx            1     scaffold_id  1000  1600  0         0
> xxx            2     scaffold_id  1000  1600  0         0
> xxx            3     scaffold_id  1000  1100  0         0
> xxx            4     scaffold_id  1510  1600  0         0
> 
> 
> 
> On Tue, 2003-03-25 at 16:00, Scott Cain wrote:
> > I think the problem is finding relationships between HSPs. The problem
> > is that there are no featureloc entries for alignment hits (the parent
> > of HSPs), so they never show up in a ROI query. That is a problem
> > because GBrowse works by getting parent objects in a range, then finding
> > out what children it has, so it can use that information to connect
> > related HSPs. (This is the way it works for gene->transcript->exon.) 
> > Since HSPs don't work that way, I have to write special case code to
> > first find the HSPs, then find the parent hits, then associate the
> > parents with the children. It is a fairly straightforward problem; I
> > just haven't taken the time to write special case code (I'm hoping the
> > need will go away :-)
> > 
> > Scott
> > 
> > On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> > > In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> > > ca...@cs... writes:
> > > 
> > > > Yep, that would work. The problem with HSP data is that they have
> > > > more
> > > > complicated relationships than other features (at least the way it
> > > > is
> > > > stored in the gadfly port), I just haven't written the SQL and
> > > > related
> > > > code to take it into account.
> > > > 
> > > > Scott
> > > 
> > > 
> > > Scott, 
> > > 
> > > HSPs are just features to a first approx. There is some
> > > complexity around
> > > the full HSP as something potentially linked both to source and
> > > target, but you are
> > > only Gbrowsing one of these at a time. They have a distinctive type
> > > ("alignment hsp" in the version I have). Can't you just display them
> > > as features of that type?
> > > 
> > > Cheers, -Stan
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D. ca...@cs...
> > GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> > Cold Spring Harbor Laboratory
> > 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Scott C. <ca...@cs...> - 2003-03-25 22:01:07
I think perhaps I didn't explain it clearly enough. There is a
parent-child relationship, but the parent doesn't have any coordinates
associated with it, so it can never show up in a ROI query. Now I am
sure there are good reasons for not having coordinates associated with
the parent hit: what if an unmasked repeat was blasted against the
genome? The coordinates would range from one end of genome to the
other, and then it would show up in EVERY ROI query, which is just as
useless.
On Tue, 2003-03-25 at 16:53, SLe...@ao... wrote:
> In a message dated 3/25/2003 4:04:01 PM Eastern Standard Time,
> ca...@cs... writes:
> 
> > It is a fairly straightforward problem; I
> > just haven't taken the time to write special case code (I'm hoping
> > the need will go away :-)
> 
> 
> It should -- there should be a relationship between the HSPs and
> parent
> hits; this is probably a bug in Colin's migration application. 
> 
> 
> Cheers, -Stan
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Charles H. <ch...@du...> - 2003-03-25 21:59:41
So, the way to load an EST masquerading as a gene (2 exon case) would be
to:
table feature:
feature_id  name    residues    seqlen  type_id  cvterm
1           ESTxxx  \N          600     10       'gene'
2           -RA     GATC...GGA  600     11       'transcript'
3           :1      \N          100     13       'exon'
4           :2      \N          90      13       'exon'
table featureloc:
featureloc_id  f_id  srcf_id      nbeg  nend  locgroup  rank
xxx            1     scaffold_id  1000  1600  0         0
xxx            2     scaffold_id  1000  1600  0         0
xxx            3     scaffold_id  1000  1100  0         0
xxx            4     scaffold_id  1510  1600  0         0
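The load described above, together with the part_of rows in feature_relationship that make the gene->transcript->exon hierarchy walkable, might be sketched as follows. This is an illustration under assumptions, not a loader for a real chado instance: it uses an in-memory SQLite database, the part_of cvterm id (1) and the scaffold's feature_id (99) are invented, feature_relationship is given a type_id column that the thread does not show, and the elided residues string is kept as the literal placeholder from the table.

```python
import sqlite3

PART_OF = 1       # hypothetical cvterm id for the part_of relationship type
SCAFFOLD_ID = 99  # hypothetical feature_id standing in for scaffold_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT,
                      residues TEXT, seqlen INTEGER, type_id INTEGER);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         nbeg INTEGER, nend INTEGER,
                         locgroup INTEGER, rank INTEGER);
CREATE TABLE feature_relationship (subjfeature_id INTEGER,
                                   objfeature_id INTEGER, type_id INTEGER);
""")

# Rows from the feature table above (type_id 10 gene, 11 transcript, 13 exon).
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?)", [
    (1, "ESTxxx", None, 600, 10),
    (2, "-RA", "GATC...GGA", 600, 11),  # residues elided in the original
    (3, ":1", None, 100, 13),
    (4, ":2", None, 90, 13),
])

# Rows from the featureloc table above; locgroup and rank are both 0.
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, 0, 0)", [
    (1, SCAFFOLD_ID, 1000, 1600),
    (2, SCAFFOLD_ID, 1000, 1600),
    (3, SCAFFOLD_ID, 1000, 1100),
    (4, SCAFFOLD_ID, 1510, 1600),
])

# part_of relationships: the subject (child) is part_of the object (parent).
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", [
    (2, 1, PART_OF),  # transcript part_of gene
    (3, 2, PART_OF),  # exon 1 part_of transcript
    (4, 2, PART_OF),  # exon 2 part_of transcript
])

def parts_of(conn, root_id):
    """Return the feature_ids reachable from root_id via part_of, sorted."""
    rows = conn.execute("""
        WITH RECURSIVE parts(feature_id) AS (
            SELECT subjfeature_id FROM feature_relationship
            WHERE objfeature_id = ? AND type_id = ?
            UNION
            SELECT fr.subjfeature_id
            FROM feature_relationship AS fr
            JOIN parts ON fr.objfeature_id = parts.feature_id
            WHERE fr.type_id = ?
        )
        SELECT feature_id FROM parts ORDER BY feature_id
    """, (root_id, PART_OF, PART_OF)).fetchall()
    return [r[0] for r in rows]

print(parts_of(conn, 1))
```

Starting from the gene (feature 1), the walk reaches the transcript and both exons, which is the parent-to-children traversal a browser needs once the parent has been found in a range query.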
On Tue, 2003-03-25 at 16:00, Scott Cain wrote:
> I think the problem is finding relationships between HSPs. The problem
> is that there are no featureloc entries for alignment hits (the parent
> of HSPs), so they never show up in a ROI query. That is a problem
> because GBrowse works by getting parent objects in a range, then finding
> out what children it has, so it can use that information to connect
> related HSPs. (This is the way it works for gene->transcript->exon.) 
> Since HSPs don't work that way, I have to write special case code to
> first find the HSPs, then find the parent hits, then associate the
> parents with the children. It is a fairly straightforward problem; I
> just haven't taken the time to write special case code (I'm hoping the
> need will go away :-)
> 
> Scott
> 
> On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> > In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> > ca...@cs... writes:
> > 
> > > Yep, that would work. The problem with HSP data is that they have
> > > more
> > > complicated relationships than other features (at least the way it
> > > is
> > > stored in the gadfly port), I just haven't written the SQL and
> > > related
> > > code to take it into account.
> > > 
> > > Scott
> > 
> > 
> > Scott, 
> > 
> > HSPs are just features to a first approx. There is some
> > complexity around
> > the full HSP as something potentially linked both to source and
> > target, but you are
> > only Gbrowsing one of these at a time. They have a distinctive type
> > ("alignment hsp" in the version I have). Can't you just display them
> > as features of that type?
> > 
> > Cheers, -Stan
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. ca...@cs...
> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
> 
From: <SLe...@ao...> - 2003-03-25 21:54:54
In a message dated 3/25/2003 4:04:01 PM Eastern Standard Time, ca...@cs... 
writes:
> It is a fairly straightforward problem; I
> just haven't taken the time to write special case code (I'm hoping the
> need will go away :-)
> 
It should -- there should be a relationship between the HSPs and parent
hits; this is probably a bug in Colin's migration application. 
Cheers, -Stan
From: Lincoln S. <ls...@cs...> - 2003-03-25 21:10:37
Scott implemented the overlap stored procedure on the R-tree index and found
that it actually slowed down the query. This was on chado/gadfly. I've
asked him to try to reproduce this on the benchmark database.
Lincoln
On Tuesday 25 March 2003 02:03 pm, SLe...@ao... wrote:
> All,
>
> After the various expressions of enthusiasm and intent, did anyone
> actually take this on?
>
> Cheers, -Stan
>
> In a message dated 3/13/2003 9:26:34 AM Eastern Standard Time, SLetovsky
> writes:
> > Subj: Re: [Gmod-schema] Benchmarking range query
> > Date: 3/13/2003 9:26:34 AM Eastern Standard Time
> > From: SLetovsky
> > To: gmod-schema@lists.sourceforge.net
> > CC: hl...@gn..., ls...@cs...
> >
> > All,
> >
> > I am wondering how we get this thread to converge once and for all.
> > I think
> > what is needed is:
> >
> > *a "reference implementation" of an acceptably efficient indexing scheme
> > for positional queries,
> > checked into sourceforge
> > *ideally, a set of PostGreSQL table (from-clause-able) functions that
> > encapsulate those, with an API
> > along the following lines:
> > * contained_in(contig_id, start, end) -- most common
> > * overlaps(contig_id, start, end) -- not uncommon, especially when
> > panning
> > * contains(contig_id, start, end) -- not so common, but included
> > for completeness
> > These should return rows from featureloc, I would think; unless there is
> > some performance advantage
> > to doing the join to feature in the function.
> >
> > I expect that performance tuning will end up needing to be platform
> > specific, so the cross-platform
> > criterion should be held lightly -- i.e., the implementation should be
> > portable, but we can't
> > expect to verify or tune performance until someone actually does the
> > port. I wouldn't let the
> > presence or absence of R-trees in other platforms be an issue -- other
> > types of indexing
> > will deliver some performance, and the implementation of the API could be
> > rewritten for different
> > platforms if needed. The nice thing about using such functions is that if
> > applications never
> > write explicit geometric queries, but only use the API, then all the
> > performance tuning
> > would hopefully be encapsulated in the function definitions, and the
> > applications would
> > run without change on platform-tuned functions.
> >
> > The implementation would include the index definitions,
> > and perhaps a little documentation on appropriate use of analyze,
> > enable_seqscan, etc.
> > to keep things humming. In addition I would expect the design to
> > prescribe a solution
> > to the min/max vs nbeg/nend issue (or declare its indifference). I would
> > suggest that
> > anyone willing to do the above work has earned the right to define
> > featureloc as needed.
> >
> > What I am wondering is whether anyone is stepping up to deliver this or
> > are we hoping
> > that it gets done by accretion? It would be great if some person or
> > persons would say "we own this,
> > and we will deliver the implementation by <fill in date here>"... .
> > Probably this is already
> > more or less happening; but it would raise my comfort level if
> > responsibility was explicit.
> >
> > Cheers, -Stan
-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
ls...@cs...                                Cold Spring Harbor, NY
========================================================================
From: Scott C. <ca...@cs...> - 2003-03-25 21:00:58
I think the problem is finding relationships between HSPs. The problem
is that there are no featureloc entries for alignment hits (the parent
of HSPs), so they never show up in a ROI query. That is a problem
because GBrowse works by getting parent objects in a range, then finding
out what children it has, so it can use that information to connect
related HSPs. (This is the way it works for gene->transcript->exon.) 
Since HSPs don't work that way, I have to write special case code to
first find the HSPs, then find the parent hits, then associate the
parents with the children. It is a fairly straightforward problem; I
just haven't taken the time to write special case code (I'm hoping the
need will go away :-)
Scott
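The special case described above (find the HSPs in the region, then their parent hits, then associate parents with children) could be sketched roughly as below. It is not the adaptor's code: it runs on an in-memory SQLite database with toy data, the function name hits_in_region is made up, and it treats every feature_relationship row as a hit/HSP link instead of filtering by type.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature_relationship (subjfeature_id INTEGER, objfeature_id INTEGER);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         nbeg INTEGER, nend INTEGER);
""")

# Toy data: hit 10 has HSPs 11 and 12, hit 20 has HSP 21.
# Only the HSPs have featureloc rows; the hits have none.
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?)",
                 [(11, 10), (12, 10), (21, 20)])
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?)",
                 [(11, 1, 1000, 1100), (12, 1, 1510, 1600), (21, 1, 5000, 5100)])

def hits_in_region(conn, src_id, start, end):
    """Map each parent hit with an HSP overlapping [start, end] to all its HSPs."""
    rows = conn.execute("""
        SELECT fr.objfeature_id, fr.subjfeature_id
        FROM feature_relationship AS fr
        WHERE fr.objfeature_id IN (
            -- steps 1 and 2: HSPs overlapping the region, and their parents
            SELECT fr2.objfeature_id
            FROM featureloc AS loc
            JOIN feature_relationship AS fr2
              ON fr2.subjfeature_id = loc.feature_id
            WHERE loc.srcfeature_id = ?
              AND min(loc.nbeg, loc.nend) <= ?
              AND max(loc.nbeg, loc.nend) >= ?
        )
        ORDER BY fr.objfeature_id, fr.subjfeature_id
    """, (src_id, end, start)).fetchall()
    # step 3: associate each parent with its children
    grouped = defaultdict(list)
    for hit_id, hsp_id in rows:
        grouped[hit_id].append(hsp_id)
    return dict(grouped)

print(hits_in_region(conn, 1, 900, 1200))
```

With the toy data, only HSP 11 overlaps 900..1200, but the result lists both HSPs of hit 10 so that related HSPs can be connected even when some fall outside the region.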
On Tue, 2003-03-25 at 15:41, SLe...@ao... wrote:
> In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time,
> ca...@cs... writes:
> 
> > Yep, that would work. The problem with HSP data is that they have
> > more
> > complicated relationships than other features (at least the way it
> > is
> > stored in the gadfly port), I just haven't written the SQL and
> > related
> > code to take it into account.
> > 
> > Scott
> 
> 
> Scott, 
> 
> HSPs are just features to a first approx. There is some
> complexity around
> the full HSP as something potentially linked both to source and
> target, but you are
> only Gbrowsing one of these at a time. They have a distinctive type
> ("alignment hsp" in the version I have). Can't you just display them
> as features of that type?
> 
> Cheers, -Stan
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: <SLe...@ao...> - 2003-03-25 20:42:31
In a message dated 3/25/2003 3:21:17 PM Eastern Standard Time, ca...@cs... 
writes:
> Yep, that would work. The problem with HSP data is that they have more
> complicated relationships than other features (at least the way it is
> stored in the gadfly port), I just haven't written the SQL and related
> code to take it into account.
> 
> Scott
> 
Scott,
HSPs are just features to a first approx. There is some complexity around
the full HSP as something potentially linked both to source and target, but
you are only Gbrowsing one of these at a time. They have a distinctive type
("alignment hsp" in the version I have). Can't you just display them as
features of that type?
Cheers, -Stan
From: Scott C. <ca...@cs...> - 2003-03-25 20:18:12
Yep, that would work. The problem with HSP data is that they have more
complicated relationships than other features (at least the way it is
stored in the gadfly port); I just haven't written the SQL and related
code to take that into account.
Scott
On Tue, 2003-03-25 at 15:10, Charles Hauser wrote:
> Hi,
> 
> I'm populating the feature and featureloc tables with the hsps/exons predicted
> by exonerate (comparing EST::Genome), and plan to use GBrowse to display
> the data. 
> 
> The 5' and 3' predicted exons will contain UTR sequence as exonerate
> does not use start/stop codons to begin/end an exon.
> 
> What cvterm to use for these? 
> 	- alignment hsp 
> 
> Scott, as I recall you said that the only thing the chado-pg-gbrowse
> adaptor does not do is display HSP/alignment type data at this time -
> true?
> 
> I could just call them 'genes' for now and change the cvterm later?
> 
> Charles 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Charles H. <ch...@du...> - 2003-03-25 20:10:46
Hi,
I'm populating the feature and featureloc tables with the hsps/exons predicted
by exonerate (comparing EST::Genome), and plan to use GBrowse to display
the data. 
The 5' and 3' predicted exons will contain UTR sequence as exonerate
does not use start/stop codons to begin/end an exon.
What cvterm to use for these? 
	- alignment hsp 
Scott, as I recall you said that the only thing the chado-pg-gbrowse
adaptor does not do is display HSP/alignment type data at this time -
true?
I could just call them 'genes' for now and change the cvterm later?
Charles 
From: Scott C. <ca...@cs...> - 2003-03-25 19:45:09
I am going to try very hard to do it this week. I'll let you know...
Scott
On Tue, 2003-03-25 at 14:03, SLe...@ao... wrote:
> All,
> 
> After the various expressions of enthusiasm and intent, did
> anyone actually take this on?
> 
> Cheers, -Stan
> 
> In a message dated 3/13/2003 9:26:34 AM Eastern Standard Time,
> SLetovsky writes:
> 
> > Subj: Re: [Gmod-schema] Benchmarking range query 
> > Date: 3/13/2003 9:26:34 AM Eastern Standard Time
> > From: SLetovsky
> > To: gmo...@li...
> > CC: hl...@gn..., ls...@cs...
> > 
> > 
> > 
> > All,
> > 
> > I am wondering how we get this thread to converge once and for
> > all. I think
> > what is needed is:
> > 
> > *a "reference implementation" of an acceptably efficient indexing
> > scheme for positional queries,
> > checked into sourceforge
> > *ideally, a set of PostGreSQL table (from-clause-able) functions
> > that encapsulate those, with an API
> > along the following lines:
> > * contained_in(contig_id, start, end) -- most common
> > * overlaps(contig_id, start, end) -- not uncommon, especially
> > when panning
> > * contains(contig_id, start, end) -- not so common, but
> > included for completeness
> > These should return rows from featureloc, I would think; unless
> > there is some performance advantage
> > to doing the join to feature in the function.
> > 
> > I expect that performance tuning will end up needing to be platform
> > specific, so the cross-platform
> > criterion should be held lightly -- i.e., the implementation should
> > be portable, but we can't
> > expect to verify or tune performance until someone actually does the
> > port. I wouldn't let the
> > presence or absence of R-trees in other platforms be an issue --
> > other types of indexing
> > will deliver some performance, and the implementation of the API
> > could be rewritten for different
> > platforms if needed. The nice thing about using such functions is
> > that if applications never
> > write explicit geometric queries, but only use the API, then all the
> > performance tuning
> > would hopefully be encapsulated in the function definitions, and the
> > applications would
> > run without change on platform-tuned functions.
> > 
> > The implementation would include the index definitions,
> > and perhaps a little documentation on appropriate use of analyze,
> > enable_seqscan, etc.
> > to keep things humming. In addition I would expect the design to
> > prescribe a solution
> > to the min/max vs nbeg/nend issue (or declare its indifference). I
> > would suggest that
> > anyone willing to do the above work has earned the right to define
> > featureloc as needed.
> > 
> > What I am wondering is whether anyone is stepping up to deliver this
> > or are we hoping
> > that it gets done by accretion? It would be great if some person or
> > persons would say "we own this,
> > and we will deliver the implementation by <fill in date here>"... .
> > Probably this is already
> > more or less happening; but it would raise my comfort level if
> > responsibility was explicit.
> > 
> > Cheers, -Stan
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. ca...@cs...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From: Hilmar L. <hl...@gn...> - 2003-03-25 19:21:51
My statement of willingness stays the same, but that about time
constraints does too. I have to put out a couple of fires which will
take me until the end of this week if not middle of next week. If
anyone else wants to go ahead in the meantime, please do so...
	-hilmar
On Tuesday, March 25, 2003, at 11:03 AM, SLe...@ao... wrote:
> All,
>
> After the various expressions of enthusiasm and intent, did
> anyone actually take this on?
>
> Cheers, -Stan
>
> In a message dated 3/13/2003 9:26:34 AM Eastern Standard Time,
> SLetovsky writes:
>
> Subj: Re: [Gmod-schema] Benchmarking range query
> Date: 3/13/2003 9:26:34 AM Eastern Standard Time
> From: SLetovsky
> To: gmo...@li...
> CC: hl...@gn..., ls...@cs...
>
> All,
>
> I am wondering how we get this thread to converge once and for
> all. I think
> what is needed is:
>
> *a "reference implementation" of an acceptably efficient indexing
> scheme for positional queries,
> checked into sourceforge
> *ideally, a set of PostGreSQL table (from-clause-able) functions that
> encapsulate those, with an API
> along the following lines:
> * contained_in(contig_id, start, end) -- most common
> * overlaps(contig_id, start, end) -- not uncommon, especially
> when panning
> * contains(contig_id, start, end) -- not so common, but
> included for completeness
> These should return rows from featureloc, I would think; unless there
> is some performance advantage
> to doing the join to feature in the function.
>
> I expect that performance tuning will end up needing to be platform
> specific, so the cross-platform
> criterion should be held lightly -- i.e., the implementation should be
> portable, but we can't
> expect to verify or tune performance until someone actually does the
> port. I wouldn't let the
> presence or absence of R-trees in other platforms be an issue -- other
> types of indexing
> will deliver some performance, and the implementation of the API could
> be rewritten for different
> platforms if needed. The nice thing about using such functions is that
> if applications never
> write explicit geometric queries, but only use the API, then all the
> performance tuning
> would hopefully be encapsulated in the function definitions, and the
> applications would
> run without change on platform-tuned functions.
>
> The implementation would include the index definitions,
> and perhaps a little documentation on appropriate use of analyze,
> enable_seqscan, etc.
> to keep things humming. In addition I would expect the design to
> prescribe a solution
> to the min/max vs nbeg/nend issue (or declare its indifference). I
> would suggest that
> anyone willing to do the above work has earned the right to define
> featureloc as needed.
>
> What I am wondering is whether anyone is stepping up to deliver this
> or are we hoping
> that it gets done by accretion? It would be great if some person or
> persons would say "we own this,
> and we will deliver the implementation by <fill in date here>"... .
> Probably this is already
> more or less happening; but it would raise my comfort level if
> responsibility was explicit.
>
> Cheers, -Stan
>
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
From: <SLe...@ao...> - 2003-03-25 19:03:20
All,
 After the various expressions of enthusiasm and intent, did anyone 
actually take this on?
Cheers, -Stan
In a message dated 3/13/2003 9:26:34 AM Eastern Standard Time, SLetovsky 
writes:
> Subj: Re: [Gmod-schema] Benchmarking range query 
> Date: 3/13/2003 9:26:34 AM Eastern Standard Time
> From: SLetovsky
> To: gmo...@li...
> CC: hl...@gn..., ls...@cs...
> 
> 
> 
> All,
> 
> I am wondering how we get this thread to converge once and for all. I 
> think
> what is needed is:
> 
> *a "reference implementation" of an acceptably efficient indexing scheme 
> for positional queries,
> checked into sourceforge
> *ideally, a set of PostGreSQL table (from-clause-able) functions that 
> encapsulate those, with an API
> along the following lines:
> * contained_in(contig_id, start, end) -- most common
> * overlaps(contig_id, start, end) -- not uncommon, especially when 
> panning
> * contains(contig_id, start, end) -- not so common, but included for 
> completeness
> These should return rows from featureloc, I would think; unless there is 
> some performance advantage
> to doing the join to feature in the function.
> 
> I expect that performance tuning will end up needing to be platform 
> specific, so the cross-platform
> criterion should be held lightly -- i.e., the implementation should be 
> portable, but we can't
> expect to verify or tune performance until someone actually does the port. 
> I wouldn't let the
> presence or absence of R-trees in other platforms be an issue -- other 
> types of indexing
> will deliver some performance, and the implementation of the API could be 
> rewritten for different
> platforms if needed. The nice thing about using such functions is that if 
> applications never
> write explicit geometric queries, but only use the API, then all the 
> performance tuning
> would hopefully be encapsulated in the function definitions, and the 
> applications would
> run without change on platform-tuned functions.
> 
> The implementation would include the index definitions,
> and perhaps a little documentation on appropriate use of analyze, 
> enable_seqscan, etc.
> to keep things humming. In addition I would expect the design to prescribe 
> a solution
> to the min/max vs nbeg/nend issue (or declare its indifference). I would 
> suggest that
> anyone willing to do the above work has earned the right to define 
> featureloc as needed.
> 
> What I am wondering is whether anyone is stepping up to deliver this or are 
> we hoping
> that it gets done by accretion? It would be great if some person or persons 
> would say "we own this,
> and we will deliver the implementation by <fill in date here>"... . 
> Probably this is already
> more or less happening; but it would raise my comfort level if 
> responsibility was explicit.
> 
> Cheers, -Stan
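The three range functions proposed above could be prototyped along these lines. This is a sketch under assumptions, not the reference implementation the message asks for: it uses Python wrappers over an in-memory SQLite table instead of PostgreSQL table functions, a plain B-tree index instead of an R-tree, returns feature_ids where the proposal asks for featureloc rows, and sidesteps the min/max vs nbeg/nend question by taking min(nbeg, nend) and max(nbeg, nend) at query time. The semantics of each function are one reading of the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                                         nbeg INTEGER, nend INTEGER)""")
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?)", [
    (1, 1, 100, 200),
    (2, 1, 150, 400),
    (3, 1, 50, 500),
    (4, 1, 300, 250),  # reverse strand: nbeg > nend
    (5, 1, 600, 700),
    (6, 2, 100, 200),  # same coordinates on a different contig
])
# Plain B-tree index; a platform-tuned version might use something else.
conn.execute("CREATE INDEX featureloc_src_idx "
             "ON featureloc (srcfeature_id, nbeg, nend)")

LO = "min(nbeg, nend)"  # strand-independent low coordinate
HI = "max(nbeg, nend)"  # strand-independent high coordinate

def _query(cond, *params):
    sql = ("SELECT feature_id FROM featureloc "
           f"WHERE srcfeature_id = ? AND {cond} ORDER BY feature_id")
    return [row[0] for row in conn.execute(sql, params)]

def contained_in(contig_id, start, end):
    """Features lying entirely inside [start, end]."""
    return _query(f"{LO} >= ? AND {HI} <= ?", contig_id, start, end)

def overlaps(contig_id, start, end):
    """Features sharing at least one position with [start, end]."""
    return _query(f"{HI} >= ? AND {LO} <= ?", contig_id, start, end)

def contains(contig_id, start, end):
    """Features that span all of [start, end]."""
    return _query(f"{LO} <= ? AND {HI} >= ?", contig_id, start, end)

print(contained_in(1, 120, 350), overlaps(1, 120, 350), contains(1, 120, 350))
```

Because callers only see the three functions, the SQL inside _query could be swapped for a platform-specific indexing scheme without changing application code, which is the encapsulation argument made in the message.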
From: David E. <em...@mo...> - 2003-03-20 15:16:16
I've added feature_lc_name on cvs.
-Dave
From: SLe...@ao...
>> Subject: Re: [Gmod-schema] Re: slow query
>> To: ca...@cs...
>> CC: gmo...@li...
>> 
>> Scott,
>> 
>> Wicked pissah. Will do...
>> 
>> Cheers, -Stan
>> 
>> In a message dated 3/19/2003 1:33:07 PM Eastern Standard Time, ca...@cs... 
>> writes:
>> 
>> > gadfly=> create index feature_lc_name on feature (lower(name));
>> > 
>> > and
>> > 
>> > select name,feature_id,seqlen,type_id from feature
>> > where lower(name) = '3r.3';
>> > 
>> > and it is wicked fast (that's Boston-speak, right?). I would suggest
>> > adding this index to the chado schema.
>> > 
>> 
>> 
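The effect of the feature_lc_name index quoted above can be tried out on a small scale as follows. This is a sketch only: it uses SQLite (which supports indexes on expressions since version 3.9.0) in place of the PostgreSQL database in the thread, and the table contents are invented. The query plan is printed but its exact wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT,
                                      seqlen INTEGER, type_id INTEGER)""")
# Invented rows, for illustration only.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                 [(1, "3R.3", 1200, 5), (2, "X.1", 800, 5), (3, "3R.4", 900, 5)])

# Same index definition as in the message: an index on the lowercased name.
conn.execute("CREATE INDEX feature_lc_name ON feature (lower(name))")

QUERY = ("SELECT name, feature_id, seqlen, type_id FROM feature "
         "WHERE lower(name) = ?")

# Case-insensitive lookup written so that it matches the indexed expression.
rows = conn.execute(QUERY, ("3r.3",)).fetchall()
print(rows)

# Show how the engine plans the lookup.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("3r.3",)):
    print(step)

index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```

The WHERE clause has to use the same expression as the index, lower(name), for the index to be considered at all.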
