Showing posts with label publishing.

Thursday, November 20, 2014

Open access, ACM and the Gates Foundation.

Matt Cutts, in an article on the new Gates Foundation open access policy (ht +Fernando Pereira) says that
while the ACM continues to drag its heels, the Gates Foundation has made a big move to encourage Open Access...
Which got me thinking. Why can't the ACM use this policy as a guideline to encourage open access ? Specifically,

  • Announce that from now on, it will subsidize/support the open access fees paid by ACM members
  • (partially) eat the cost of publication in ACM publications (journals/conferences/etc)
  • Use the resulting clout to negotiate cheaper open access rates with various publishers in exchange for supporting open access fees paid to those journals.
Of course this would put the membership side of ACM at odds with its publication side, which maybe points to another problem with ACM having these dual roles.

Thursday, November 03, 2011

Life in a crowd-sourced research world

(Sung to the tune of "War", and with a Jackie Chan accent for bonus points)

Jo-ur-nals !
What are they good for !
Absolutely nothing !

There are popular tropes in current discussions on crowd-sourcing research. There's the "scientists as Mean Girls" view of the current state of affairs. There's the utopian "Let a thousand papers bloom in the open research garden". There's the anti-capitalist "Down with evil money-grubbing publishers", and there's of course the always popular "Everything tastes better with crowd-sourced reputation points and achievement badges".

But have we really thought through the implications of doing away with the current frameworks for "dissemination, verification and attention management" ?

Here's a tl;dr a la Cosma Shalizi:

A more open research environment, where all work is published before review and anyone is free to comment on any work in public without repercussions, is valuable, but it is also more chaotic and unpleasant than we might be ready for.


Consider a pure "publish-then-filter" world, in which you dumped your paper in a public repository that had commenting, reviewing, reputation features, achievement badges and whatever other technological goodies you wanted to throw in.

You'd be in a world not unlike the world that writers and musicians live in today. Since big music/book publishers (read "journals") take a big cut of the royalty revenues ("journal subscriptions") in exchange for promotion/marketing (read "stamps of authenticity"), many authors and musicians have developed smaller but successful brands by going on the road themselves, doing online promotions, cultivating their fan base with special material, downloads, T-shirts, event tickets and what not, and relying on underground word-of-mouth to establish a presence.
Are you ready to do the same ?
It's naive to think that merely putting papers on a repository and waiting for attention to appear will actually work to disseminate your work. Attention is probably the most valuable resource available to us in this connected era, and the one most fiercely fought over by everyone. No one is going to even be able to pay attention to your work unless you promote it extensively, OR unless there are external ways of signalling value.

If you think that reputation mechanisms will help, I will merely ask you to look at the attention garnered by the latest Ke$ha single compared to the attention given to <insert name of your favorite underground-not-selling-out-obscure-indie-band-that-will-set-the-world-on-fire here >.

Secondly, I think that as researchers, we would cringe at the kind of explicit promotion that authors/musicians have to indulge in. Would you really want to sell tickets for the "1.46-approximation to graphic TSP" paper tour ? How would you afford it ?

There's a third aspect to living in a crowd-sourced research world: a loss of mental space. While it should be clear to anyone who follows my blog/tweets/G+/comments on cstheory that I enjoy the modern networked world, it's also clear to me that actual research requires some distance.

In Anathem, Neal Stephenson describes a monastery of mathematics, where monks do their own research, and at regular intervals (1/10/100/1000 years) open their doors to the "seculars" to reveal their discoveries to the outside world.

Even with collaborations, Skype, shared documents and github, you still need time (and space) to think. And in a completely open research environment where everything you post can be commented on by anyone, I can assure you that you'll spend most of your time dealing with comment threads and slashdotting/redditing/HNing (if you're lucky). Are you ready to deploy a 24-hour rapid-response team to deal with the flaming your papers will get ?

Let me be very clear about something. I think there are many academic institutions (journals and conferences especially) that are in desperate need of overhauls and the Internet makes much of this possible. I think it's possible (but I'm less convinced) that we are on the cusp of a new paradigm for doing research, and debates like the ones we are having are extremely important to shape this new paradigm if it comes into being. In that context, I think that what Timothy Gowers (here and now here) and Noam Nisan (here and here) are trying to do is very constructive: not just complain about the current state of affairs or defend the status quo, but try to identify the key things that are good AND bad about our current system AND find a path to a desired destination.

But human nature doesn't change that quickly. New ways of disseminating, valuing and verifying research will definitely change what's valued and what's not, and can help open up the research enterprise to those who feel the current system isn't working (i.e. most of us). But when you replace one evaluation system with another, don't be too sure that the new system is fairer - it might merely change what gets valued (i.e. the peaks might change, but not the distribution itself).

Friday, March 25, 2011

Permanent record of work

In our hurry to tar and feather the ACM, the IEEE and other LargePubs, I'm not sure we are quite ready to face the universe that will result, and the amount of work we'll need to do on our own.

Consider:
  • I was recently asked if I had presentation slides for my paper on universal MDS. I managed to ferret them out from my collection of talks and sent them over. What I should have done was add them to the paper page as well, but I've been busy and haven't gotten around to it (I have other talks that I need to upload too)
  • This CS professor asks on reddit: "Where should I host code for the paper I just wrote ?". Good answers are provided, with github.com being the most popular choice.
Researchers are peripatetic: join company, leave company, join university, leave university for other university, leave university for company, rinse and repeat. The obvious way to keep a single fixed repository of your work is to maintain your own website with your own domain, and many researchers now do that.

But it is a pain in the neck. Granted, it's a necessary pain in the neck, but there was something to be said for being able to write a paper, ship it off somewhere, and have it be maintained by someone else.

The arxiv is doing a pretty good job of this for papers, as long as it can continue to receive funding, but what about talks, code fragments, and other related material that goes into research ?

Thursday, February 03, 2011

On the future of conferences

There's a superb article out today in the CACM by Jonathan Grudin (non-paywall version here, thanks to Alexandre Passos) on the future (and past) of our CS conference-driven model. I'm biased in favor because it takes a generally negative view of the current model, but that's not why I like it.

I like it because he systematically tracks back to the origins of the conference culture, discusses why it's so prevalent, and avoids the glib "let's do everything in journals" argument that even I've been guilty of in the past.

Some of the tl;dr points (but do read the whole thing):
  • Technology and a Professional Organization Drove the Shift to Conference Publication: not speed of development of the field, as is commonly stated. It was also a more American-centered phenomenon.
  • Formal archiving of conference proceedings made creating journal versions more difficult (because of the 30% rule and so on)
  • "When conferences became archival, it was natural to focus on quality and selectivity." so the focus of conferences became more gatekeeping and less community.
  • This in turn has an impact on community: when your papers are rejected, you don't go to the conference. For more on the impact of rejection, see Claire Mathieu's post.
  • A further consequence is that computer scientists do not develop the skills needed to navigate large, community-building conferences. This is so true ! As someone who frequents SODA, SoCG and occasionally FOCS/STOC, I often find myself gasping for breath at VLDB (600+ participants) or KDD (800+). It's overwhelming to keep track of everything. And it makes it harder for me to attend such conferences regularly, even though it's important for me to go.
His analysis of where we should be heading is also sound. Much like the Blu-ray/HD-DVD wars of a few years ago, the whole journal vs conference argument seems like an argument between two dinosaurs as the meteor arrives. We have many different ways of disseminating, commenting on, and reviewing works of research now, and it's time to think beyond the current models. Some of his ideas:
  • Accept many more papers at conferences, but designate some to be the 'best in show'
  • Center attention on the work, rather than the conference, by keeping wikipedia-like entries for pieces of work as they evolve. This is similar to Dan Wallach's idea.
p.s. For those of you who want to complain about ACM's closed policy on the CACM, and how you'll never read an article in the CACM because of the paywall, consider your opinion expressed.

Tuesday, January 11, 2011

Are open tech report sites taking off in CS ?

For a while now, the math and physics communities have amused themselves by wondering why the CS community is slow to adopt the arxiv. In the past year or so, I've noticed an uptick in postings on the arxiv (especially around conference deadlines).

Prompted by David Eppstein's review of 2010 in cs.DS, I decided to get some stats on publication counts at the arxiv and ECCC for the past four years. My method:
  1. Go to arxiv.org/list/FIELD/YY (thanks, David)
  2. Read off the total number of papers listed
For the ECCC, papers are numbered by YEAR-COUNT, so looking at the last paper published each year sufficed to get the count.

I did this for cs.{CC, DS, CG, LG} (LG is machine learning/learning theory)

Caveat: I ignored cross submissions, so there's some overcounting. I'm hoping that at least to determine trends this is not a major issue.
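In case anyone wants to redo or extend these counts, here's the kind of throwaway script I have in mind. It's a sketch only: it assumes the listing URL pattern arxiv.org/list/CATEGORY/YY and that the page contains a phrase like "total of N entries"; check the actual page markup (and the cross-listing caveat above) before trusting the numbers.

# Sketch: count arxiv listings per category per year.
# Assumptions (not verified against the live site): listings live at
# arxiv.org/list/<category>/<YY>, and the page reports its size in a
# phrase like "total of N entries". Adjust the URL and regex to match
# the real markup.
import re
import urllib.request

CATEGORIES = ["cs.CC", "cs.DS", "cs.CG", "cs.LG"]
YEARS = ["07", "08", "09", "10"]

def listing_count(category, yy):
    url = f"https://arxiv.org/list/{category}/{yy}"
    with urllib.request.urlopen(url) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    match = re.search(r"total of (\d+) entries", page, re.IGNORECASE)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    for category in CATEGORIES:
        print(category, [listing_count(category, yy) for yy in YEARS])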

Here are the results:

Overall, it's clear that arxiv submissions in theory CS are climbing (and rapidly in the case of cs.DS), which I'm quite pleased to see. The growth rates themselves seem quite steady, so it's not clear to me whether the fraction of papers going on the arxiv is itself increasing (there's good evidence that the total number of papers people are writing in general is increasing).

Friday, June 25, 2010

Bad Research As Spam

Jon Katz and Matt Welsh have both written recently about the problems of dealing with crap papers, mainly the pain of having to review them. In an unrelated event, I got into a discussion with some colleagues about the problems of "crap research", and ended up formulating a theory: viz,
bad research is like spam email
in the sense that

  1. There seems to be no way to stop it, and many economic incentives to continue it
  2. You can't stop it by diktat
  3. The only way to deal with it appears to be effective filtering. Like spam, bad research has less of an effect when it gains no attention.
There are other similarities:
  1. We can block spam by filtering certain domains. We also tend to ignore certain kinds of conferences
  2. We can block spam by blocking certain email addresses. We also might ignore certain researchers, or at least downweight their work after a series of bad experiences.
  3. More explicit spam blocking policies create a false-negative problem. False negatives are also a big problem in research.
But this analogy also suggests that we shouldn't be designing strategies to eliminate bad research. We should be designing better methods to focus attention on good research, via better filtering and highlighting mechanisms (social, authoritative or otherwise).

Personally, I think bad research is less of a problem than lots of cargo-cult research, where it looks a lot like good research is being done, but when you look closely, you realize that nothing of value has been added to the universe. Sadly, a lot of funded research is also like this.

PS: False negatives are a highly underrated problem in research. I think we vastly overestimate our ability to judge what kinds of research will have lasting value and what potential impact a paper can have. So it's important to err on the side of being more generous to papers, rather than less.

Thursday, June 17, 2010

Rebooting how we publish in CS.

Dan Wallach has a thought-provoking proposal on how to reboot the CS publication process from the ground up. Read the entire proposal here.

Here's an edited version of a response I sent to him (short version: I like it !)

I think the time is ripe for this: it seems that people are getting more and more used to using the arxiv/iacr/eccc for tech reports and DBLP as a de facto list of papers, and even regularly subscribing to arxiv RSS feeds to see what's new. Bibliography management systems like Mendeley/CiteULike would also really benefit from this.


While (like others) I'm concerned about facilitating ranking schemes too much (I personally think the h-index is an abomination, but that's a different discussion), I think that even if the only outcome of this was to have a centralized single repository for CS publications, that in itself would be a major benefit.

I'm less sure about attention/reputation mechanisms though. It's clear that one of the challenges for researchers today is the 'eyeballs problem': how to get attention to your work amidst the sea of publications. While one might argue that Google and page-rank have done a good job of this, I think that over time it's become more and more top heavy, with a few locations acquiring sticky reputation and sucking in attention, and while this might be ok for general news, it's not so for research, where more often than not, good ideas can come from less "well known" sources.

I don't think CSPub causes any additional problems in this regard - but it would seem like much more thought is needed to design *transparent* ranking schemes. While google can do what they want with their ranking scheme, and keep it as a trade secret, a public service such as CSPub should try to keep ranking methods as transparent as possible. (hack-proof ranking methods ? I know there's research on this !)

Friday, January 08, 2010

Guest Post: Question on posting to the arxiv

ed. note: this post is by Jeff Phillips. For another recent post on arxiv publishing issues, see Hal Daume on the arxiv, NLP and ML.


It seems that over the last few months, the number of papers posted to the arXiv has been noticeably increasing, especially in the categories of Computational Geometry and Data Structures and Algorithms.

I have posted several (but not all) of my papers on the arXiv. I still do not have a consistent set of rules under which I post papers. Here are a couple of circumstances under which I have posted a paper to the arXiv.

A: Along with Proceedings Version:
When the conference version does not have space for full proofs, post a full version to the arXiv in conjunction with the proceedings version. This serves as a placeholder for the full version until the journal version appears. Additionally, the arXiv paper can be updated when the final journal version appears, if it has changed.

Sometimes, I link to the arXiv version in the proceedings version. This makes it easy for a reader of the proceedings to find the full proofs.

If more conferences move to the SODA model, where proceedings versions can be much longer (~20 pages), then this may not be necessary as often.


B: Along with Submitted Version:
When you want to advertise a piece of work, but it has only been submitted, post a version to the arXiv. This is useful if you are giving talks on the work and want a documented time stamp so you can't get scooped, or if, say, you are applying for jobs and want to make your work widely available and public.

This is closer to the math philosophy, where many (most?) people submit a version of a paper to the arXiv as soon as they submit it to a journal. I think it would be great if CS adopted this practice, as it would make it a lot easier to track results. I have a friend who, as a math graduate student, would start every day by perusing the dozen or so new arXiv posts in his area and choosing one paper to read. He told me that almost every paper he read as a grad student was on the arXiv. Wouldn't a world like that be extremely convenient?

However, I have had an issue following this rule. Last year I submitted a paper to a conference and, concurrently, submitted a longer version to the arXiv. The paper was, unfortunately, not accepted to the conference. My coauthor and I extended the results to the point where it made sense to split the paper. Half was then submitted and accepted to another conference, and full proofs were made available through a tech report at my coauthor's institution, as he was required to do. The second half, which has also been extended, is now under submission.

I might like to post the (full) second half to the arXiv, but do not want to duplicate the part from the previous post. I am not sure it makes sense to merge the papers at this point either. And I would also like to note on the arXiv page that that version has been extended, and that part of it appears as a tech report.

What is the proper arXiv etiquette for this situation?

Wednesday, July 15, 2009

Consistent BibTeX formatting

I try not to write BibTeX by hand any more: it's too easy to introduce errors. So I usually use either DBLP or the ACM digital library to get BibTeX for papers. Sometimes the journal has BibTeX, or some format that can be converted. As an aside, IEEE is extremely lame: you have to log in to their digital library even to get a citation !

For the most part, I don't need to go beyond ACM or DBLP, which is great. But here's the problem: their formats are different ! I needed the BibTeX for a recent paper of mine, and found it on both sites. Here's what ACM gave me:
@inproceedings{1516372,
author = {Ahmadi, Babak and Hadjieleftheriou, Marios and Seidl, Thomas and Srivastava, Divesh and Venkatasubramanian, Suresh},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology},
year = {2009},
isbn = {978-1-60558-422-5},
pages = {84--95},
location = {Saint Petersburg, Russia},
doi = {http://doi.acm.org/10.1145/1516360.1516372},
publisher = {ACM},
address = {New York, NY, USA},
}

and here's what DBLP gave me:
@inproceedings{DBLP:conf/edbt/AhmadiHSSV09,
author = {Babak Ahmadi and
Marios Hadjieleftheriou and
Thomas Seidl and
Divesh Srivastava and
Suresh Venkatasubramanian},
title = {Type-based categorization of relational attributes},
booktitle = {EDBT},
year = {2009},
pages = {84-95},
ee = {http://doi.acm.org/10.1145/1516360.1516372},
crossref = {DBLP:conf/edbt/2009},
bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/edbt/2009,
editor = {Martin L. Kersten and
Boris Novikov and
Jens Teubner and
Vladimir Polutin and
Stefan Manegold},
title = {EDBT 2009, 12th International Conference on Extending Database
Technology, Saint Petersburg, Russia, March 24-26, 2009,
Proceedings},
booktitle = {EDBT},
publisher = {ACM},
series = {ACM International Conference Proceeding Series},
volume = {360},
year = {2009},
isbn = {978-1-60558-422-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
So, as you can see, we have a problem. The formats are not consistent, which means that if I need to get some references from DBLP and others from the ACM, my references file is going to look very irregular. (A rough sketch of a script that papers over some of these differences appears after the list below.)

Other critiques:
  • I have never understood why DBLP splits up the conference and the paper: with BibTeX, if you cite three or more papers that use the same crossref, the crossref entry is itself included as a reference, which is just strange.
  • Unless you use double curly braces, capitalization inside a string gets lowercased by many bibliography styles, which is mucho annoying: it's "Riemannian", not "riemannian".
  • The DBLP name for the conference is too cryptic: who'd even know what EDBT is outside the database community ? On the other hand, the ACM citation is clunky, and is a page-length disaster waiting to happen.
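For what it's worth, here's the kind of hack I end up wanting: a script that papers over some of these differences. This is a sketch, not a tool I actually use. It assumes the third-party Python package bibtexparser (the v1 loads/dumps API, with entries as dicts keyed by lowercase field names plus ID and ENTRYTYPE); it only inlines DBLP-style crossrefs and braces capitalized words in titles, and it ignores nested braces, LaTeX commands and the ee-vs-doi mismatch.

# Rough sketch of a BibTeX normalizer. Assumes the bibtexparser package
# (v1 API). Inlines DBLP-style @proceedings crossrefs so each entry is
# self-contained, and braces capitalized words in titles so styles
# don't lowercase them.
import re
import bibtexparser

def normalize(bibtex_src):
    db = bibtexparser.loads(bibtex_src)
    by_id = {entry["ID"]: entry for entry in db.entries}

    for entry in db.entries:
        # Pull booktitle/publisher/etc. out of a crossref'd @proceedings
        # entry (the DBLP pattern) into the citing entry.
        ref = by_id.get(entry.pop("crossref", None))
        if ref:
            for field in ("booktitle", "publisher", "series", "year", "isbn"):
                if field in ref and field not in entry:
                    entry[field] = ref[field]

        # Protect capitalized words: "Riemannian" should stay "Riemannian".
        if "title" in entry:
            entry["title"] = re.sub(r"\b(\w*[A-Z]\w+)\b", r"{\1}", entry["title"])

    # Once inlined, the standalone @proceedings entries can go.
    db.entries = [e for e in db.entries if e["ENTRYTYPE"] != "proceedings"]
    return bibtexparser.dumps(db)

Fed the two DBLP blocks above, it would fold the @proceedings entry into the @inproceedings one, which at least gets both sources producing self-contained entries; making the field conventions and booktitle strings actually match is a separate (and harder) fight.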
Thoughts ?

Monday, May 11, 2009

Physicists understand the web better than us, part 28596

A new physics review site, via the baconmeister:
Physicists are drowning in a flood of research papers in their own fields and coping with an even larger deluge in other areas of physics. The Physical Review journals alone published over 18,000 papers last year. How can an active researcher stay informed about the most important developments in physics?

Physics highlights exceptional papers from the Physical Review journals. To accomplish this, Physics features expert commentaries written by active researchers who are asked to explain the results to physicists in other subfields. These commissioned articles are edited for clarity and readability across fields and are accompanied by explanatory illustrations.

Each week, editors from each of the Physical Review journals choose papers that merit this treatment, aided by referee comments and internal discussion. We select commentary authors from around the world who are known for their expertise and communication skills and we devote much effort to editing these commentaries for broad accessibility.

Physics features three kinds of articles: Viewpoints are essays of approximately 1000–1500 words that focus on a single Physical Review paper or PRL letter and put this work into broader context. Trends are concise review articles (3000–4000 words in length) that survey a particular area and look for interesting developments in that field. Synopses (200 words) are staff-written distillations of interesting and important papers each week. In addition, we intend to publish selected Letters to the Editor to allow readers a chance to comment on the commentaries and summaries.

Physics provides a much-needed guide to the best in physics, and we welcome your comments (physics@aps.org).

What an excellent idea !

Thursday, April 30, 2009

New open access proceedings archive

Via Luca Aceto, news of a new effort to create an open-access electronic proceedings series. The idea is that you organize your conference/workshop and apply for inclusion of your proceedings in the EPTCS: you handle the reviewing, they take the papers and manage the long-term archiving (via the arxiv).

I'm imagining this will only be useful for small conferences/workshops, but it's a good way of making sure you don't have a dead-link problem a few years later, when the person who hosted the website for the conference leaves their institution and forgets to hand over maintenance to someone else. Also, the arxiv imprimatur will make sure the papers get more visibility than they would otherwise.

Sunday, March 22, 2009

(ab)use of wikipedia ?

From IHE:
Recently, a small journal entitled RNA Biology announced that it will now require all authors to also create Wikipedia pages about their discoveries.

Specifically, the journal says:
At least one stub article (essentially an extended abstract) for the paper should be added to either an author's userspace at Wikipedia (preferred route) or added directly to the main Wikipedia space (be sure to add literature references to avoid speedy deletion). This article will be reviewed alongside the manuscript and may require revision before acceptance. Upon acceptance the former articles can easily be exported to the main Wikipedia space. See below for guidelines on how to do this. Existing articles can be updated in accordance with the latest published results.

I'm not a Wikipedia expert (hello 0xDE), but isn't this a violation of the Wikipedia policies on copyrighted material and (non)-publishing of original research ?

Update: As 0xDE points out, Wikipedia is already on top of this.

Tuesday, February 17, 2009

Maybe I need to reconsider this open access biz

I was lukewarm to Joachim's proposal for an open access journal, but I'm changing my mind after seeing the latest shenanigans being perpetrated by the scientific publishers in collusion with Congress. There's a new bill making its way through the House that would overturn the NIH open access policy (all papers should be placed on a public site within 12 months of publication), as well as prohibit the government from obtaining a license to post such works on the internet.

Time to call your congressman, or use the (in)famous Obama outreach program.

Sunday, February 15, 2009

A new open access CG journal

The last few years have seen many attempts by researchers to break free from the shackles of (commercial) journal publishers. There's the wholesale exodus that produced the ACM Transactions on Algorithms, as well as other journals, and the technology-aided creation of open-access free journals like the Theory of Computing. There's also a movement to create an open access computational linguistics journal, spearheaded by my colleague Hal Daume here at the U. of Utah.

Joachim Gudmundsson and Pat Morin have been investigating the feasibility of creating such a journal for Comp. Geom., motivated by costs and copyright issues with current journals. Here are posts one, two and three on the topic.

They've worked out most of the logistical issues involved in creating such a journal, and are now trying to reach out to the community to see what kind of interest there is. After all, the main currency of a journal is its reputation, and that comes from community participation (and then perception). So if you have any opinion on the matter, hop over to Dense Outliers, take the poll and post a comment (don't post comments here).

My personal view: I think open access is a great idea in principle, but I'm not seeing a pressing need within CG itself for such a journal at this point in time. (Disclaimer: I'm involved with the International Journal of Computational Geometry and Applications.)

Wednesday, January 28, 2009

Levels of hell (heaven?) when writing practically motivated theoretical papers

I was going to post this as a comment on Michael's post, but it started getting longer and longer.

The main question being discussed there is: how do you balance the theory and practical sides of your work effectively, from the point of view of getting and keeping faculty jobs ? It's good to remember that not every problem is amenable to a joyous merge of theory and practice: there are levels of hell (heaven?) involved here, which go something like this:

  • Prove a fundamental new result, and this leads to a breakthrough implementation for a problem people couldn't solve (this usually happens in an area relatively untouched by theory thus far, and can really make you famous; I'd imagine RSA/Diffie-Hellman fall in this category)
  • Brand new result: leads to improvements in efficiency AND accuracy of known methods by orders of magnitude
  • Brand new result: improves on efficiency OR accuracy of known methods, by orders of magnitude.

Below this line, you're unlikely to get a theory publication out of the contribution:

  • Observation that known approaches (or derivations thereof) lead to improvements in efficiency and/or accuracy by orders of magnitude
  • New theory result, some improvements in efficiency AND accuracy

And here's where it gets positively hellish:
  • Mildly new theory result, reasonable improvement in efficiency and accuracy, but not orders of magnitude, and you go up against an entrenched, highly optimized heuristic (k-means, anyone ?)

At this point you really have to choose which you care about, the problem or the theory, and then branch out accordingly. Papers in this last realm are really difficult to publish anywhere, even when they nontrivially improve the state of the art.

Tuesday, August 05, 2008

Math != calculation, part 537...

From the NYT, on scoring the i-can't-believe-it's-not-a-sport sport of gymnastics:
The new system is heavy on math and employs two sets of judges, an A panel and a B panel, to do the computations. Two A-panel judges determine the difficulty and technical content of each routine. Six B-panel judges score routines for execution, artistry, composition and technique.

The A-panel judges’ scorecards start at zero, and points are added to give credit for requirements, individual skills and skills performed in succession.

The A panel counts only the gymnast’s 10 most difficult skills, which are ranked from easiest to most difficult (from A to G for women and from A to F for men). An A-level skill, like a back handspring in the floor exercise, is worth one-tenth of a point. The value increases by one-tenth of a point for each subsequent level, meaning a B-level skill is worth two-tenths and an F-level is worth six-tenths.

Required elements add a maximum of 2.5 points to the score. Extra points, either one-tenth or two-tenths, are given for stringing skills together.

Each judge adds the marks, then the two reach a consensus. Elite gymnasts usually have a difficulty score in the 6’s; the toughest routines generally have difficulty scores in the high 6’s or 7’s.

[...]
The system rewards difficulty. But the mistakes are also more costly.

Which is where the judges on the B panel come in. They rate the execution, artistry and technique of a routine, starting at a score of 10.0 and deducting for errors.

This score, called an execution score, is where the perfect 10.0 still exists. But reaching it is unlikely.

A slightly bent knee can be a deduction of one-tenth of a point. A more drastically bent knee can cost three-tenths. In this system, the deductions jump from one-tenth to three-tenths to five-tenths. A fall costs a gymnast eight-tenths. In the old system, a fall was a five-tenths deduction.

The highest and the lowest of the judges’ scores are thrown out. The remaining four scores are averaged to obtain the final B-panel score.

On the scoreboard, the final score appears in big numbers, just above the gymnast’s marks for difficulty and execution.
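To make the arithmetic concrete, here's a toy calculation in the spirit of the rules above. The routine and the judges' marks are invented, and I'm assuming the final score is just difficulty plus execution, which the article implies but doesn't quite spell out.

# Toy scoring example; all numbers below are invented.
# A-panel: each skill level is worth 0.1 more than the previous one
# (A = 0.1), only the 10 hardest skills count, plus up to 2.5 for
# required elements and small connection bonuses.
# B-panel: start from 10.0 and deduct; drop the highest and lowest of
# the six judges' scores and average the remaining four.
SKILL_VALUE = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4,
               "E": 0.5, "F": 0.6, "G": 0.7}

def a_panel(skills, required=2.5, connections=0.0):
    hardest_ten = sorted((SKILL_VALUE[s] for s in skills), reverse=True)[:10]
    return sum(hardest_ten) + required + connections

def b_panel(scores):
    kept = sorted(scores)[1:-1]   # drop one highest and one lowest score
    return sum(kept) / len(kept)

# A hypothetical routine: two E skills, three D, three C, two B,
# all requirements met, 0.3 in connection bonuses.
difficulty = a_panel("EEDDDCCCBB", required=2.5, connections=0.3)
execution = b_panel([9.0, 9.1, 8.9, 9.0, 9.2, 8.8])
print(round(difficulty, 2), round(execution, 2), round(difficulty + execution, 2))
# difficulty 6.3, execution 9.0, total 15.3

A difficulty score in the low 6's for a decent elite routine matches the article's claim that "elite gymnasts usually have a difficulty score in the 6's".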

Apart from my grumble about the level of 'math', it's an interesting way of doing the scoring.

I wonder if this could work for conferences: replace 'degree of difficulty' by 'hardness of problem, general hotness of the area, etc', and then you could deduct points for things like
  • More than two digits after the decimal point in the approximation ratio
  • exponent of running time requires the \frac environment
  • More than two parameters in the running time
  • Gratuitous use of O() notation to hide dependencies you don't like (yes I'm talking to you, high dimensional clustering folk)
  • Requiring your winged pigs to be large in dimension, have extra glitter on the wing tips, and carry golden harps in order to make the horses take off (Hi there, complexity denizens)

Monday, June 30, 2008

An open letter to journals not mirrored by ACM and DBLP

Dear journal,
If it isn't too much of a problem, could you possibly attach a BibTeX link to each article that you list ? For the sake of my cruelly abused fingers ? Could you ? PLEASE ?

Sunday, April 20, 2008

Abstracts booklets

The ICDE proceedings is all digital. You get one CD with all the papers from the conference, and another with all the papers from all the workshops. This comes with a very nifty overview PDF that has clickable indices by name and session, as well as a search feature, and links to the PDFs for each paper. Along with this comes an abstracts booklet organized in order of schedule, with one section for titles, and another one for abstracts (the first is handy for planning your schedule).

I've been wishing that we could do something like this for a while now. The good news is that next year, SODA will do the same thing. Personally, I found the abstracts booklet more useful than one might think, and only rarely felt that I needed to look at the paper in detail.

A new (to CS) model for publishing

One of the things I like about the database community is their willingness to play with new ideas in the space of conference publishing. SIGMOD and VLDB have been experimenting with the idea of semi-persistent reviews, where reviews from SIGMOD get passed on to VLDB for papers deemed on the border; SIGMOD went to double-blind mode, over some opposition, and there's been some interesting back-and-forth since on the effectiveness of this (read this, and then this). There's also a weak rebuttal mechanism (where authors can, in a limited way, respond to reviewer comments in the submission process).

An even more radical idea, which from the sound of it is nearing approval, is described in detail by Panos Ipeirotis. The main points are these:
  • A new journal called the Journal for Database Management Research will be created.
  • It will have a fast review process and turnaround time (akin to the biological journals - 3 months or so)
  • Submission deadlines will be rolling: i.e. you submit a paper when it's ready.
  • SIGMOD, VLDB and other database conferences will convert to a by-invitation model, where the conferences choose a sample of the published works in the journal (over the last 12 months I imagine) to be "presented" at the conference.
  • To discourage frivolous submissions, papers rejected from the journal will have to undergo a year-long cooling off before they can be resubmitted.
It's a radical approach, and approximates to a degree the prevailing practice in journal-based publication environments. It does raise some questions (some raised by Panos in the original post):
  • The year-long cooling off seems excessive punishment for what will still by necessity be a less than perfect review process
  • How will this new journal interact with other database journals ?
  • Can one journal hope to keep up with the volume of papers being produced ? Just SIGMOD, VLDB and ICDE take in over 2000 submissions between the three of them. That's close to 6 submissions EACH DAY.
  • What happens when you get into areas that overlap with DB ? For example, what about KDD, and other data mining conferences ?
  • What about all the years and years spent arguing with tenure committees about the primacy of conferences in computer science ? "Oops ! we changed our mind" ?
  • What kind of prestige will now be attached to giving presentations at the "new" conferences ? More importantly, since it was JUST ESTABLISHED that double blind submission helps remove bias at conferences like SIGMOD, isn't this a step backward in terms of which papers are chosen for presentations at conferences ? I can't imagine the process of getting invited for a talk at such a conference getting easier with this process. Are we heading towards (again) the bio model of big plenary talks by bigwigs, and lots of posters, or the math model where anyone who wants to give a talk can ?
Separating the idea of publication and dissemination is dear to my heart (I have always felt that conferences in CS fail by needing to serve both these masters at the same time), and so I'm bullish on proposals like this. But I do see problems in the details, and am curious to see how things pan out over time.

Tuesday, April 15, 2008

Deadlines (tax, conference, ...)

For the first time, I had to rush on the last day to get taxes out in time. It was a rather intense few hours, and I couldn't help but think that most of this would have been impossible in the pre-electronic era: I was able to download missing documents, get forms online, and even download tax software, all in a matter of a few clicks.

I'm also somewhat ashamed to admit that I rather enjoyed the adrenaline rush of getting to the post office with 10 minutes to spare and sending everything off. It reminded me of conference submission deadlines (and I imagine that in the days before electronic submission, people literally had to rush to the post office to send copies of a submission in).

But then I got to wondering. In this hyper-competitive era, with submissions increasing and acceptance rates dropping, is it slowly becoming less likely that papers submitted at the last minute can compete with more polished submissions ?

Now I like data as much as anyone else, and was wondering what kinds of statistics we could glean from information already available. For example, every year PC chairs trot out statistics about acceptance rate as a function of when papers were submitted. Because of the almost hyper-exponential rate of submissions as the deadline approaches, these numbers are necessarily unreliable, but to the extent that one believes in such statistics, there's always a kind of maximum point somewhat away from the actual deadline.

I don't know if this tells me what I need to know though: does a drift of the maximum point away from the deadline prove my point ? Lacking any actual information about which papers are more "polished", i.e. maybe re-submissions, or papers prepared earlier, one has to work off the assumption that papers that are ready earlier are submitted earlier, and I'm not willing to buy that argument without more data.

So maybe I should rely on anecdotal evidence. How about it, readers ? In your personal experience (I emphasize this point: I want to distinguish your opinions from what you've encountered in your own work), do you think that it's become less likely that a last-minute paper gets into a conference, especially the top ones ? Feel free to generalize this question to other deadline-based submission fora (even proposals, if you wish).

p.s. A blog post without links ! Scandalous...