Friday, 26 March 2010
Usage Statistics parsing and querying with redis and python
I have a distributable workflow, loosely coordinated using Redis and Supervisord. Redis is used in two ways: firstly, its lists act as queues, buffering the communication between the workers; secondly, it acts as a store, counting usage and associating it with the items and the metadata entities (people, subjects, etc) of those items.
I have written a very small python logger that pushes log lines directly onto a redis list, giving me live updates as well as the ability to parse log files manually. This is currently switched on for testing in the live repository.
Current code base is here: http://github.com/benosteen/UsageLogAnalysis - it has a good number of things hardcoded to the peculiarities of my log files and repository. However, as part of the PIRUS 2 project, I am turning this into an easily reusable codebase, adding in the ability to push out OpenURLs to PIRUS statistics gatherers.
Overview:
Loglines -- lpush'd to 'q:loglines'
workers - 'debot.py' - pull lines from this queue and parse them, separating them into 4 categories:
- Any hit by a recognised Bot or spider
- Any view or download made by a real person on an item in the repository
- Any 404, etc
- And anything else
The lines are then moved onto four (five) queues respectively: q:bothits, q:objectviews (and q:count simultaneously), q:fof, and q:other. I use prefixes as a convention when working with Redis keys - "q:" will almost always be a queue of some sort. These four queues are consumed by loggers, which commit the logs to disc, segregated into their categories.
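Roughly, such a queue worker might look like the sketch below in redis-py. This is not the actual debot.py - the bot set name and the regex are placeholders standing in for my real bot-list sets and URL patterns:

import re
import redis

r = redis.Redis(decode_responses=True)

BOT_IPS = "botlist_ips"                 # placeholder: one set standing in for the several bot-list sets
VIEW_RE = re.compile(r'GET /objects/')  # placeholder for the real record-view/download regexes

def classify(line):
    """Very rough stand-in for debot.py's categorisation."""
    ip = line.split(" ", 1)[0]
    if r.sismember(BOT_IPS, ip):
        return ["q:bothits"]
    if ' 404 ' in line:
        return ["q:fof"]
    if VIEW_RE.search(line):
        return ["q:objectviews", "q:count"]
    return ["q:other"]

while True:
    _, line = r.brpop("q:loglines")     # blocks until a log line has been LPUSH'd
    for queue in classify(line):
        r.lpush(queue, line)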
The q:count queue is consumed by a further worker, count.py. This does a number of jobs and is the part that actually does the analysis.
For each repository item logged event, it finds the ID of the item and also whether this was a download of an item's files. With my repository, both these facts are deducible from the URL itself.
Given the ID, it checks redis to see if this item has had its metadata analysed before. If it hasn't, it grabs the metadata for the item from the repository's index (hosted by an instance of Apache Solr) and starts to add connections between metadata entity and ID to the redis index:
eg say item "pid:1" has the simple metadata of author_name='Ben' and subjects='foo, bar'
Create unique IDs from the text by hashing it and prefixing the hash with the type of field it came from:
Prefixes:
- name => "n:"
- institution => "i:"
- faculty => "f:"
- subjects => "s:"
- keyphrases => "k:"
- content type => "type:"
- collection => "col:"
- thesis type => "tt:"
eg
>>> from hashlib import md5
>>> md5("Ben").hexdigest()
'092f2ba9f39fbc2876e64d12cd662f72'
So, the hashkey of the 'name' 'Ben' is 'n:092f2ba9f39fbc2876e64d12cd662f72'
Now to make the connections in Redis:
- Add ID to the set 'objectitems' - to keep track of all the IDs (SADD objectitems {ID})
- Set 'n:092f2....' to 'Ben' (so we can keep a reverse mapping)
- Add 'n:092f2...' to 'names' set (to make it clearer. KEYS n:* should return an equivalent set)
- Add 'n:092f2...' to 'e:{id}' eg "e:pid:1" - (e -> prefix for collections of entities. e:{id} is a set of all entities that occur in id)
- Add 'e:pid:1' to 'e:n:092f2....' (gathers the set of item ids in which this entity 'Ben' occurs)
Repeat for any entity you wish to track.
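In redis-py terms, those steps might look something like this (a sketch of the convention above, not the exact code in the repo):

from hashlib import md5
import redis

r = redis.Redis(decode_responses=True)

def index_entity(item_id, prefix, set_name, text):
    """Record the connections between one item and one metadata entity."""
    hashkey = prefix + md5(text.encode('utf-8')).hexdigest()   # eg 'n:092f2...'
    r.sadd('objectitems', item_id)          # keep track of all the item IDs
    r.set(hashkey, text)                    # reverse mapping: hashkey -> original text
    r.sadd(set_name, hashkey)               # eg add to the 'names' set
    r.sadd('e:' + item_id, hashkey)         # all entities that occur in this item
    r.sadd('e:' + hashkey, 'e:' + item_id)  # all items in which this entity occurs
    return hashkey

# eg for "pid:1" with author_name='Ben' and subjects='foo, bar':
index_entity('pid:1', 'n:', 'names', 'Ben')
index_entity('pid:1', 's:', 'subjects', 'foo')
index_entity('pid:1', 's:', 'subjects', 'bar')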
To make this more truth-manageable, you should include the id of the record with the text when you generate the hashkey. That way, 'Ben' appearing in one record will have a different key than 'Ben' occurring in another. The assertion that these two entities are the same can easily take place in a different set (I'm using b: as the prefix for these bundles of asserted equivalence)
Once you have made these assertions, you can set about counting :)
Conventions for tracking hits:
d[v|d|o]:{id} - set of the dates on which {id} was viewed (v), downloaded from (d) or any other page action (o)
eg dv:pid:1 -> set of dates on which pid:1 had page views.
YYYY-MM-DD:{id}:[v|d|o] - set of IP clients that accessed a particular item on a given day - v,d,o as above
eg 2010年02月03日:pid:1:d - set of IP clients that downloaded a file from pid:1 on 2010年02月03日
t:views:{hashkey}, t:dls:{hashkey}, t:other:{hashkey}
Grand totals of views, downloads or other accesses on a given entity or id. Good for quick lookups.
Let's walk through an example: consider that a client of IP 1.2.3.4 visits the record page for this 'pid:1' on 2010年01月01日:
ID = pid:1
Add the User Agent string ("mozilla... etc") to the 'ua:{IP}' set, to keep track of the fingerprints of the visitors.
Try to add the IP address to the set - in this case "2010-01-01:pid:1:v"
If the IP isn't already in this set (the client hasn't accessed this page already today) then:
- make sure that "2010-01-01" is a part of the 'dv:pid:1' set
- go through all the entities that are part of pid:1 (n:092... etc) and increment their totals by one.
- INCR t:views:n:092...
- INCR t:views:pid:1
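That walkthrough, sketched in redis-py (assuming the entity sets from above are already populated):

import redis

r = redis.Redis(decode_responses=True)

def record_view(item_id, ip, user_agent, date):     # date as 'YYYY-MM-DD'
    r.sadd('ua:' + ip, user_agent)                   # fingerprint the visitor
    # SADD returns 1 only if the IP wasn't already in today's set for this item
    if r.sadd('%s:%s:v' % (date, item_id), ip):
        r.sadd('dv:' + item_id, date)                # note the item had page views on this date
        for entity in r.smembers('e:' + item_id):    # n:092f2..., s:..., etc
            r.incr('t:views:' + entity)
        r.incr('t:views:' + item_id)

record_view('pid:1', '1.2.3.4', 'Mozilla/5.0 (...)', '2010-01-01')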
Now, what about querying?
Say we wish to look up the activity on a given entity, say for 'Ben'?
First, find the equivalent hashkey(s) - either directly from the simple md5 hash, or by checking which bundles exist for this entity.
You can get the grand totals by simply querying "t:views:key", "t:dls..." for each key and summing them together.
You can get more refined answers by getting the set of IDs that this entity is associated with, querying that to gather all the daily IP sets for them, and summing the results. This gives me a nice way to generate data suitable for a daily activity sparkline.
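Something like the following sketch - note that the exact layout of the b: equivalence bundles isn't spelled out above, so that lookup is my assumption:

from hashlib import md5
import redis

r = redis.Redis(decode_responses=True)

def total_views(text, prefix='n:'):
    """Grand total of views for an entity, summed over any asserted-equivalent keys."""
    key = prefix + md5(text.encode('utf-8')).hexdigest()
    keys = r.smembers('b:' + key) or {key}     # the b: bundle layout is an assumption here
    return sum(int(r.get('t:views:' + k) or 0) for k in keys)

def daily_views(entity_key):
    """Unique IPs per day across every item this entity appears in - sparkline data."""
    counts = {}
    for member in r.smembers('e:' + entity_key):      # members look like 'e:pid:1'
        item_id = member[2:]                          # strip the leading 'e:'
        for date in r.smembers('dv:' + item_id):
            counts[date] = counts.get(date, 0) + r.scard('%s:%s:v' % (date, item_id))
    return counts

print(total_views('Ben'))
print(daily_views('n:' + md5('Ben'.encode('utf-8')).hexdigest()))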
I have also added another set of keys to the store, of the form 'geocode:{IP}', mapping IP addresses to country codes, which gives me a nice way to plot country-breakdown graphs using the Google Chart API.
Python logging to Redis
This functionality is mainly in one file in the github repo: redislogger.py
As you can see, most of that file is taken up with a demonstration of how to invoke it! The file that holds the logging configuration which this demo uses is in logging.conf.example.
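The idea is roughly the following minimal sketch - not the actual redislogger.py, just a logging.Handler that LPUSHes each formatted record onto the 'q:loglines' list:

import logging
import redis

class RedisListHandler(logging.Handler):
    """Push each formatted log record onto a Redis list, eg 'q:loglines'."""
    def __init__(self, key='q:loglines', **redis_kwargs):
        logging.Handler.__init__(self)
        self.key = key
        self.client = redis.Redis(**redis_kwargs)

    def emit(self, record):
        try:
            self.client.lpush(self.key, self.format(record))
        except Exception:
            self.handleError(record)

log = logging.getLogger('usage')
log.setLevel(logging.INFO)
log.addHandler(RedisListHandler())
log.info('1.2.3.4 - - [01/Jan/2010:00:00:01 +0000] "GET /pid:1 HTTP/1.1" 200 1234')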
NB the usage analysis code and UI are very much a WIP, but I just wanted to post a quick, rough overview of how it is set up and working.
Thursday, 25 March 2010
Curating content from one repository to put into another
Start a python commandline:
Thursday, 11 February 2010
My swiss army toolkit for distributed/multiprocessing systems
- Redis - data structure server, providing atomic operations on integers, lists, sets, and sorted lists.
- RabbitMQ - messaging server, based on the AMQP spec. IMO Much cleaner, easier to manage, more flexible and more reliable than all the JMS systems I've used.
- Supervisor - a battle-tested process manager that can be operated via XML-RPC or HTTP. Enables live control and status of your processes.
- a process to communicate with the OAI-PMH service to get the list of identifiers for the items in the repository (with the ability to update itself at a later time), including the ability to find the serialised form of the full metadata for an item if it cannot be obtained from the OAI-PMH service (eg Eprints3 XML isn't often included in the OAI-PMH service, but can be retrieved from the Export function),
- a process that simply downloads files to a point on the disc,
- and a service that allows process one to queue jobs for process 2 to download - in this case Redis.
- sudo apt-get install build-essential python-dev python-setuptools [make sure you can build and use easy_install - here shown for debian/ubuntu/etc]
- sudo easy_install supervisor
- mkdir oaipmh_directory # A directory to contain all the bits you need
- cd oaipmh_directory
[program:oaipmhgrabber]
autorestart = false
numprocs = 1
autostart = false
redirect_stderr = True
stopwaitsecs = 10
startsecs = 10
priority = 10
command = python harvest.py
startretries = 3
stdout_logfile = workerlogs/harvest.log

[program:downloader]
autorestart = true
numprocs = 1
autostart = false
redirect_stderr = True
stopwaitsecs = 10
startsecs = 10
priority = 999
command = oaipmh_file_downloader q:download_list
startretries = 3
stdout_logfile = workerlogs/download.log

[program:redis]
autorestart = true
numprocs = 1
autostart = true
redirect_stderr = True
stopwaitsecs = 10
startsecs = 10
priority = 999
command = path/to/the/redis-server
startretries = 3
stdout_logfile = workerlogs/redis.log

[unix_http_server]
file = /tmp/supervisor.sock

[supervisord]
minfds = 1024
minprocs = 200
loglevel = info
logfile = /tmp/supervisord.log
logfile_maxbytes = 50MB
nodaemon = false
pidfile = /tmp/supervisord.pid
logfile_backups = 10

[supervisorctl]
serverurl = unix:///tmp/supervisor.sock

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[inet_http_server]
username = guest
password = mypassword
port = 127.0.0.1:9001
#!/usr/bin/env python
from oaipmhscraper import Eprints3Harvester

o = Eprints3Harvester("repo", base_oai_url="http://eprints.maths.ox.ac.uk/cgi/oai2/")
o.getRecords(metadataPrefix="XML",
             template="%(pid)s/%(prefix)s/mieprints-eprint-%(pid)s.xml")
Click on the 'redis' name to see the logfile that this is generating - you'll want to see lines like:
INFO:CombineHarvester File downloader:Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/370/XML/mieprints-eprint-370.xml) to object oai:generic.eprints.org:370
2010年02月11日 13:43:51,284 - CombineHarvester File downloader - INFO - Download completed in 0 seconds
INFO:CombineHarvester File downloader:Download completed in 0 seconds
2010年02月11日 13:43:51,285 - CombineHarvester File downloader - INFO - Saving to Silo repo
INFO:CombineHarvester File downloader:Saving to Silo repo
2010年02月11日 13:43:51,287 - CombineHarvester File downloader - INFO - Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/371/XML/mieprints-eprint-371.xml) to object oai:generic.eprints.org:371
INFO:CombineHarvester File downloader:Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/371/XML/mieprints-eprint-371.xml) to object oai:generic.eprints.org:371
#!/usr/bin/env python
from oaipmhscraper import Eprints3Harvester

o = Eprints3Harvester("repo", base_oai_url="http://eprints.maths.ox.ac.uk/cgi/oai2/")
o.reprocessRecords()
[program:queuefilesfordownload]
autorestart = false
numprocs = 1
autostart = false
redirect_stderr = True
stopwaitsecs = 10
startsecs = 10
priority = 999
command = python download_files.py
startretries = 3
stdout_logfile = workerlogs/download_files.log
Now, switch on the reprocess record worker and tail -f the downloader if you want to watch it work :)
Monday, 18 January 2010
Usage stats and Redis
Recently, Redis has let me cut through access-log munging like a hot knife through butter, all with multiprocessing goodness.
Key things:
Using sets to manage botlists:
>>> from redis import Redis
>>> r = Redis()
>>> for bot in r.smembers("botlist"):
... print bot
...
lycos.txt
non_engines.txt
inktomi.txt
misc.txt
askjeeves.txt
oucs_bots
wisenut.txt
altavista.txt
msn.txt
googlebotlist.txt
>>> total = 0
>>> for bot in r.smembers("botlist"):
... total = total + r.scard(bot)
...
>>> total
3882
So, I have 3882 different IP addresses that I have built up that I consider bots.
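The loading code isn't shown here, but the shape of the data suggests something like this sketch - one set per bot-list file, plus the set's name registered in the master 'botlist' set:

import redis

r = redis.Redis(decode_responses=True)

def load_botlist(path):
    """Put every IP in a bot-list file into a set named after that file,
    and register the set's name in the master 'botlist' set."""
    with open(path) as f:
        for line in f:
            ip = line.strip()
            if ip:
                r.sadd(path, ip)
    r.sadd('botlist', path)

for source in ('googlebotlist.txt', 'msn.txt', 'askjeeves.txt'):
    load_botlist(source)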
Keeping counts and avoiding race-conditions
By using the Redis INCR command, it's easy to write little workers that run in their own process but which atomically increment counts of hits.
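A tiny illustration of why this matters:

import redis

r = redis.Redis(decode_responses=True)

# Racy read-modify-write: two workers can both read 41 and both write back 42.
count = int(r.get('u:objectviews') or 0)
r.set('u:objectviews', count + 1)

# Atomic: Redis serialises INCRs server-side, so concurrent workers never lose a hit.
r.incr('u:objectviews')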
What does the stat system look like?
I am treating each line of the Apache-style log as a message that I am passing through a number of workers.
Queues
All in the same AMQP exchange: ("stats")
Queue "loglines" - msg's = A single log line in the Apache format. Can be sourced from either local logs or from the live service.
loglines is listened to by a debot.py worker, just one at the moment. This worker feeds three queues:
Queue "bothits" - log lines from a request that matches a bot IP
Queue "objectviews" - log lines from a request that was a record page view or item download
Queue "other" - log lines that I am presently not so interested in.
[These three queues are consumed by 3 loggers and these maintain a copy of the logs, pre-separated. These are designed to be temporary parts of the workflow, to be discarded once we know what we want from the logs.]
objectviews is subscribed to by a count.py worker which does the heavy crunching as shown below.
Debot.py
The first worker is 'debot.py' - this does the broad separation and checking of a logged event. In essence, it uses the Redis SISMEMBER command to see if the IP address is in the blacklists and, if not, applies a few regexes to see if it is a record view and/or a download, or something else.
Broad Logging
There are three logger workers that debot.py feeds for "bothits", "objectviews", and "other" - these workers just sit and listen on the relevant queue for an apache log line and append it to the logfile they have open. This saves me having to open/close logger objects or pass anything around.
The logfiles are purely a record of the processing, so I can skip redoing it if I want to do any further analysis, like tracking individuals, etc.
The loggers also INCR a key in Redis for each line they see - u:objectviews, u:bothits, and u:other as appropriate - these give me a rough idea of how the processing is going.
(You can generate pretty charts from these counters too - at one point during the processing the totals stood at roughly 10 million bot hits vs 360k object views/downloads.)
Counting hits (metadata and time based)
Most of the heavy lifting is in count.py - this is fed from the object views/downloads stream coming from the debot.py worker. For the metadata it runs through a number of procedural steps (sketched in code after this list):
- Get metadata from ORA's Solr endpoint (as JSON)
- Specifically, get the 'authors' (names), subjects/keyphrases, institutions, content types, and collections things appear in.
- These fields correspond to certain keys in Redis. Eg names = 'number:names' = number of unique names, 'n:...' = hits to a given name, etc
- For each view/dl:
- INCR 'ids:XXXXX' where XXXXX is 'names', 'subjects', etc. It'll return the new value for this, eg 142
- SET X:142 to be equal to the text for this new entity, where X is the prefix for the field.
- SADD this id (eg X:142) to the relevant set for it, like 'names', 'subjects', etc - This is so we can have an accurate idea of the entities in use even after removing/merging them.
- Reverse lookup: Hash the text for the entity (eg md5("John F. Smith")) and SET r:X:{hash} to be equal to "X:142"
- SET X:views:142 to be equal to 1 to get the ball rolling (or X:dl:142 for downloads)
- If the name is not new:
- Hash the text and lookup r:{hash} to get the id (eg n:132)
- INCR the item's counter (eg INCR n:views:132)
- Time-based and other counts:
- INCR t:{object id} (total hits on that repository object since logs began)
- INCR t:MMYY (total 'proper' hits for that month)
- INCR t:MMYY:{object id} (total 'proper' hits for that repo item that month)
- INCR t:MMYY:{entity id} (Total hits for an entity, say 'n:132' that month)
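Roughly, in redis-py terms (a sketch of the steps above, not the exact count.py code):

from hashlib import md5
import redis

r = redis.Redis(decode_responses=True)

def count_entity(field, prefix, text, kind='views'):
    """Allocate (or look up) a numeric id for an entity and bump its counter.
    field is eg 'names', prefix is eg 'n:', kind is 'views' or 'dl'."""
    rkey = 'r:' + prefix + md5(text.encode('utf-8')).hexdigest()
    entity_id = r.get(rkey)                      # eg 'n:132' if we've seen this text before
    if entity_id is None:
        n = r.incr('ids:' + field)               # eg INCR ids:names -> 142
        entity_id = prefix + str(n)              # eg 'n:142'
        r.set(entity_id, text)                   # id -> text
        r.sadd(field, entity_id)                 # register it in the 'names'/'subjects'/... set
        r.set(rkey, entity_id)                   # reverse lookup: hash of text -> id
        r.set('%s%s:%s' % (prefix, kind, n), 1)  # eg n:views:142 = 1
    else:
        n = entity_id.split(':', 1)[1]
        r.incr('%s%s:%s' % (prefix, kind, n))    # eg INCR n:views:132
    return entity_id

def count_time(object_id, month, entity_ids):    # month as 'MMYY'
    r.incr('t:' + object_id)                     # total hits on that item since logs began
    r.incr('t:' + month)                         # total 'proper' hits that month
    r.incr('t:%s:%s' % (month, object_id))       # that item's hits that month
    for entity_id in entity_ids:
        r.incr('t:%s:%s' % (month, entity_id))   # eg total hits for 'n:132' that month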
A lot of pressure is put on Redis by count.py but it seems to be coping fine. A note for anyone else thinking about this: Redis keeps its datastore in RAM - running out of RAM is a Bad Thing(tm).
I know that I could also just use the md5 hashes as ids, rather than using a second id - I'm still developing this section and this outline just states it how it is now!
Also, it's worth noting that if I needed to, I could put remote redis 'shards' on other machines and they could just pull log lines from the main objectview queue to process. (They'd still need to create the id <-> entity name mapping on the main store, or a slave of it, though.)
But why did I do this?
I thought it would let me handle both legacy logs and live data with a framework I could point at other systems, while writing less code and ending up with a more reliable system.
So far, I still think this is the case. If people are interested, I'll abstract out a class or two (eg the metadata lookup function, etc) and stick it on google code. It's not really a lot of code so far, I think even this outline post is longer....
Thursday, 15 October 2009
Python in a Pairtree
"Pairtree? huh, what's that?" - in a nutshell it's 'just enough veneer on top of a conventional filesystem' for it to be able to store objects sensibly; a way of storing objects by id on a normal hierarchical filesystem in a pragmatic fashion. You could just have one directory that holds all the objects, but this would unbalance the filesystem and due to how most are implemented, would result in a less-than-efficient store. Filesystems just don't deal well with thousands or hundreds of thousands of directories in the same level.
Pairtree provides enough convention and fanning out of hierarchical directories to both spread the load of storing high numbers of objects, while retaining the ability to treat each object distinctly.
The Pairtree specification is a compromise between fanning out too much and too little, and assumes that the ids used are opaque: that they have no meaning and are to all intents and purposes 'random'. If your ids are not - if, for example, they are human-readable words - then you will have to tweak how the ids are split into directories to get good performance.
[I'll copy&paste some examples from the specifications to illustrate what it does]
For example, to store objects that have identifiers like the following URI - http://n2t.info/ark:/13030/xt2{some string}
eg:
http://n2t.info/ark:/13030/xt2aacd
http://n2t.info/ark:/13030/xt2aaab
http://n2t.info/ark:/13030/xt2aaac
This works out to look like this on the filesystem:
current_directory/
| pairtree_version0_1 [which version of pairtree]
| ( This directory conforms to Pairtree Version 0.1. Updated spec: )
| ( http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html )
|
| pairtree_prefix
| ( http://n2t.info/ark:/13030/xt2 )
|
\--- pairtree_root/
|--- aa/
| |--- cd/
| | |--- foo/
| | | | README.txt
| | | | thumbnail.gif
| | ...
| |--- ab/ ...
| |--- af/ ...
| |--- ag/ ...
| ...
|--- ab/ ...
...
\--- zz/ ...
| ...
Here the object http://n2t.info/ark:/13030/xt2aacd contains a directory 'foo', which itself contains a README and a thumbnail gif.
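The fan-out itself is just the identifier chopped into two-character pairs. A toy illustration of the mapping (the real spec and library also escape awkward characters, which this skips):

def pairtree_path(identifier, prefix='http://n2t.info/ark:/13030/xt2'):
    """Map an id to its pairtree directory path (ignoring the character-encoding rules)."""
    short_id = identifier[len(prefix):] if identifier.startswith(prefix) else identifier
    pairs = [short_id[i:i + 2] for i in range(0, len(short_id), 2)]
    return 'pairtree_root/' + '/'.join(pairs)

print(pairtree_path('http://n2t.info/ark:/13030/xt2aacd'))
# pairtree_root/aa/cd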
Creating this structure by hand is tedious, and luckily for you, you don't have to (if you use python, that is).
To get the pairtree library that I've written, you can either install it from the Pypi site http://pypi.python.org/pypi/Pairtree or if python-setuptools/easy_install is on your system, you can just
sudo easy_install pairtree
You can find API documentation and a quick start here.
The quick start should get you up and running in no time at all, but let's look at how we might store Fedora-like objects on disk using pairtree. (I don't mean how to replicate how Fedora stores objects on disk, I mean how to make an object store that gives us the basic framework of 'objects are bags of stuff')
>>> from pairtree import *
>>> f = PairtreeStorageFactory()
>>> fedora = f.get_store(store_dir="objects", uri_base="info:fedora/")
Right, that's the basic framework done, let's add some content:
>>> obj = fedora.create_object('changeme:1')
>>> with open('somefileofdublincore.xml', 'r') as dc:
... obj.add_bytestream('DC', dc)
>>> with open('somearticle.pdf', 'rb') as pdf:
... obj.add_bytestream('PDF', pdf)
>>> obj.add_bytestream('RELS-EXT', """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rel="info:fedora/fedora-system:def/relations-external#">
<rdf:Description rdf:about="info:fedora/changeme:1">
<rel:isMemberOf rdf:resource="info:fedora/type:article"/>
</rdf:Description>
</rdf:RDF>""")
The add_bytestream method is adaptive - if you pass it something that supports a read() method, it will attempt to stream out the content in chunks to avoid reading the whole item into memory at once. If not, it will just write the content out as is.
I hope this gives people some idea of what is possible with a conventional filesystem; after all, filesystem code is pretty well tested in the majority of cases, so why not make use of it?
(NB the with statement is a nice way of dealing with file-like objects, made part of the core in python ~2.6 I think. It makes sure that the file is closed at the end of the block, equivalent to a "temp = open(foo) - do stuff - temp.close()".)
Friday, 19 June 2009
What is a book if you can print one in 5 minutes?
This excites me a lot. Yes, that does imply I am a geek, but whatever.
So, what would I want to do with one? Well, printing books that already exist is fun, but it doesn't grasp the potential. If you can print a book in 5 minutes, how long must that book have been in existence before you press print? Why can't we start talking about repurposing correctly licensed or public domain content?
Well, what I need (and am keen to get going with) is the following:
1) PDF generator -> pass it an RSS feed of items and it will do its best to generate page content from these.
- blogs/etc: grab the RSS/Atom feed and parse out the useful content
- Include option to use blog comments or to gather comments/backlinks/tweets from the internet
- PDFs - simply concatenate the PDF as is into the final PDF
- Books/other digital items with ORE -> interleave these
- offer similar comment/backlink option as above
- ie the book can be laid out 'normally', with the internet-derived comments on the page facing the book/excerpt they actually refer to; or the discussion can be mirrored, with the comments in order and threaded and the relevant page excerpts attached to them. Or why not both?
- Automated indexes of URLs, dates and commenters can be generated without too much trouble on demand.
- Full-text indexes will be more demanding to generate, but I am sure that a little money and a crowd-sourced solution can be found.
2) Ability to (onsite) print these PDFs into a single, (highly sexy) bound volume using a machine such as can be found in many Blackwell's bookshops today.
3) A little capital to run competitions, targeting various levels in the university, asking the simple question "If you could print anything you want as a bound book in 5 minutes, what's the most interesting thing you can think of to print out?"
Why?
People like books. They work, they don't need batteries and people who can read can intuitively 'work' a book. But books are not very dynamic. You have to have editors, drafters, publishers, and so on and so forth, and the germination of a book has to be measured in years... right?
Print on demand smashes that and breaks down conceptions of what a book is. Is it a sacred tome that needs to be safeguarded and lent only to the most worthy? Or is it a snapshot of an ongoing teaching/research process? Or can it simply be a way to print out a notebook with page numbers as you would like them? Can a book be an alive and young collation of works, useful now, but maybe not as critical in a few years?
Giving people the ability to make and generate their own books offers more potential - what books are they creating? Which generated books garner the most reuse, comments and excitement? Would the comments about the generated works be worth studying and printing in due course? Will people break through the pen-barrier, that taboo of taking pen to a page? Or will we just see people printing wikitravel guides and their flickr account?
Use-cases to give a taste of the possibilities:
- Print and share a discussion about an author, with excerpts ordered and surrounded by the chronologically ordered and threaded comments made by a research group, a teaching group or even just a book club.
- Library 'cafe' - the library can subsidise the printing of existing works for use in the cafe, as long as the books stay in the cafe. Spillages and crumbs are not an issue for these facsimile books.
- Ability to record and store your term's or year's worth of notes in a single volume for posterity. At £5 a go, many students will want this.
- Test print a Thesis/Dissertation, without the expense of consulting a book binder.
- Archive in paper a snapshot of a digital labbook implemented on drupal or wordpress.
- Lecturer's notes from a given term, to avoid the looseleaf A4 overload spillage that often occurs.
- Printing of personalised or domain specific notebooks. (ie. a PDF with purposed fields, named columns and uniquely identified pages for recording data in the field - who says a printed book has to be full of info?)
- Maths sheets/tests/etc
- Past Papers
I am humbled by the work done by Russell Davies, Ben Terrett and friends in this area and I can pinpoint the time at which I started to think more about these things to BookCamp sponsored by Penguin UK and run by Jeremy Ettinghausen (blog)
Please, please see:
http://tinyurl.com/9qfoyt - Things Our Friends Have Written On The Internet 2008
Russell Davies UnNotebook: http://russelldavies.typepad.com/planning/2009/02/unnotebook.html
(http://tinyurl.com/cpdllw )
Friday, 15 May 2009
RDF + UI + Fedora for object metadata (RDF) editing
Requirements:
For the Web UI:
Using jQuery and 3 plugins: jEditable, autocomplete and rdfquery.
- jeditable: http://www.appelsiini.net/projects/jeditable
- jeditable live demo: http://www.appelsiini.net/projects/jeditable/default.html <-- see this to understand what it gives.
- http://jquery.bassistance.de/autocomplete/demo/ <-- example autocomplete demo
- http://code.google.com/p/rdfquery/ from Jeni Tennison, for reading RDFa information from the DOM of an HTML page using javascript.
- create a new session (specifically, a delta of the RDF expressed in iand's ChangeSet schema http://vocab.org/changeset/schema ): POST /{object-id}/{RDF}/session/new -> HTTP 201 - session url (includes object id root)
- POST triples to /{session-url}/update to add to the 'add' and/or 'delete' portions
- A POST to /{session-url}/commit applies the changes, or just DELETE /{session-url} to abandon them
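From a client's point of view, the flow might look like this sketch using the requests library - the host, endpoint paths and form fields are just my shorthand for the description above, not a published API:

import requests

BASE = 'http://repo.example.org'            # hypothetical repository host
obj, rdf = 'changeme:1', 'RELS-EXT'

# 1. open an editing session against the object's RDF datastream
resp = requests.post('%s/%s/%s/session/new' % (BASE, obj, rdf))
session_url = resp.headers['Location']      # 201 Created, with the session url

# 2. post a ChangeSet-style delta: one triple to delete, one to add
requests.post(session_url + '/update', data={
    'delete': '<info:fedora/changeme:1> <http://purl.org/dc/terms/title> "Old title" .',
    'add':    '<info:fedora/changeme:1> <http://purl.org/dc/terms/title> "New title" .',
})

# 3. commit the delta ...
requests.post(session_url + '/commit')
# ... or abandon it instead:  requests.delete(session_url)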
Workflow:
- Template grabs RDF info from object, and then displays it in the typical manner (substituting labels for uris when relevant), but also encodes the values with RDFa.
- If the user is auth'd to edit, each of these values has a css class added so that the inline editing for jeditable can act on it.
- It then reads, for the given type of object, the cardinality of the fields present (eg from an OWL markup for the class) and also the other predicates that can be applied to this object. For multivalued predicates, an 'add another' type link is appended below. For unused predicates, it's up to the template to suggest these - currently, all the objects in the repo can have type-specific templates, but for this example, I am considering generics.
- For predicates which have usefully typed ranges (eg foaf:knows in our system points to a URI rather than a string), autocomplete is used to hook into our index - or maybe another's - of known labels for URIs to suggest correct values. For example, if an author were going to indicate their affiliation to a department here at Oxford (BRII project), it would be handy if a correct list of department labels was used. A choice from the list would display the label, but represent the URI in the page.
- When the user clicks on it to change the value, a session is created if none exists, stamped with the start time of the edit and the last modified date of the RDF datastream, along with details of the editor, etc.
- rdfquery is used to pull the triple from the RDFa in the edited field. When the user submits a change, the rdfa triple is posted to the session url as a 'delete' triple and the new one is encoded as an 'add' triple.
- A simple addition would just post to the session with no 'delete' parameter.
- The UI should then reflect that the session is live and should be committed when the user is happy with the changes.
- On commit, the session would save the changeset to the object being edited and update the RDF file in question (so we keep a record of each change). rdfquery would then update the RDFa in the page to the new values, upon a 200/204 reply.
- On cancel, the values would be restored, and the session deleted.
If the last-modified date on the datastream is different from the one marked on the session (ie a possible conflict), the page information is updated to the most recent version and the session is reapplied in the browser, highlighting the conflicts, and a warning is given to the user.
I am thinking of increasing the feedback using a messaging system, while keeping the same optimistic edit model - you can see the status of an item, and that someone else has a session open on it. The degree of feedback is something I am still thinking about - should the UI highlight or even reflect the values that the other user(s) is editing in real time? Is that useful?