Cassandra Data Modeling Workshop
Matthew F. Dennis // @mdennis
Overview
● Hopefully interactive
● Use cases submitted via Google Moderator, email, IRC, etc
● Interesting and/or common requests in the slides to get us started
● Bring up others if you have them!
Data Modeling Goals
● Keep data that is queried together, together on disk
● More generally, think about the efficiency of querying your data and work backward from there to a model in Cassandra
● Don't try to normalize your data (contrary to many use cases in relational databases)
● It is usually better to keep a record that something happened than to change a value (not always advisable or possible)
ClickStream Data (use case #1)
● A ClickStream (in this context) is the sequence of actions a user of an application performs
● Usually this refers to clicking links in a WebApp
● Useful for ad selection, error recording, UI/UX improvement, A/B testing, debugging, et cetera
● Not a lot of detail in the Google Moderator request on what the purpose of collecting the ClickStream data was, so I made some up
ClickStream Data Defined
● Record the actions of a user within a session, for debugging purposes, in case the app/browser/page/server crashes
Recording Sessions
● CF for the sessions a user has had
  ● Row Key is the user name/id
  ● Column Name is the session id (TimeUUID)
  ● Column Value is empty (or the length of the session, or some aggregated details about the session after it ended)
● CF for the actual sessions
  ● Row Key is the TimeUUID session id
  ● Column Name is the timestamp/TimeUUID of each click
  ● Column Value is details about that click (serialized)
UserSessions Column Family

  Row Key:       userId
  Column Names:  Session_01 (TimeUUID) | Session_02 (TimeUUID) | Session_03 (TimeUUID)
  Column Values: (empty/agg)           | (empty/agg)           | (empty/agg)

Queries supported:
● Most recent session
● All sessions for a given time period
Sessions Column Family

  Row Key:       SessionId (TimeUUID)
  Column Names:  timestamp_01            | timestamp_02            | timestamp_03
  Column Values: ClickData (json/xml/etc) | ClickData (json/xml/etc) | ClickData (json/xml/etc)

Queries supported:
● Retrieve an entire session's ClickStream (one row)
● Order of clicks/events is preserved
● Retrieve the ClickStream for a slice of time within the session
● First action taken in a session
● Most recent action taken in a session
● Why JSON/XML/etc?
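To make the two column families concrete, here is a minimal sketch using the Thrift-era pycassa client. The keyspace and CF names (ClickStream, UserSessions, Sessions), the user id, and the JSON payload are illustrative assumptions, not from the slides:

```python
import time
import uuid

import pycassa
from pycassa.util import convert_time_to_uuid

pool = pycassa.ConnectionPool('ClickStream')          # hypothetical keyspace
user_sessions = pycassa.ColumnFamily(pool, 'UserSessions')
sessions = pycassa.ColumnFamily(pool, 'Sessions')

# Start a session: record the TimeUUID session id under the user's row.
session_id = uuid.uuid1()
user_sessions.insert('user42', {session_id: ''})

# Record a click in the session row; TimeUUID column names keep click order.
sessions.insert(session_id, {uuid.uuid1(): '{"url": "/home", "elem": "signup"}'})

# Entire session's ClickStream, in click order:
clicks = sessions.get(session_id, column_count=10000)

# Slice of time within the session (e.g. the last five minutes):
start = convert_time_to_uuid(time.time() - 300, lowest_val=True)
recent = sessions.get(session_id, column_start=start, column_count=10000)

# Most recent action: reverse the slice and take a single column.
last = sessions.get(session_id, column_reversed=True, column_count=1)
```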
Alternatives?
Of Course (depends on what you want to do)
● Secondary indexes
● All sessions in one row
● Track by time of activity instead of session
Secondary Indexes Applied
● Drop the UserSessions CF and use secondary indexes instead
● Use a "well known" column to record the user in the session row; a secondary index is created on that column (sketched below)
● Doesn't work so well when storing aggregates about sessions in the UserSessions CF
● Better when you want to retrieve all sessions a user has had
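A hedged sketch of the secondary-index variant in pycassa, assuming each session row carries an indexed user_id column; all names here are illustrative:

```python
import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('ClickStream')
sessions = pycassa.ColumnFamily(pool, 'Sessions')

# Find every session row whose 'user_id' column equals 'user42'.
# EQ is the default operator for create_index_expression.
expr = create_index_expression('user_id', 'user42')
clause = create_index_clause([expr], count=1000)

for session_key, columns in sessions.get_indexed_slices(clause):
    print(session_key, len(columns))
```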
All Sessions In One Row Applied
● Row Key is userId
● Column Name is a composite of timestamp and sessionId
● Can efficiently request activity of a user across all sessions within a specific time range
● Rows could potentially grow quite large; be careful
● Reads will almost always require at least two seeks on disk
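A sketch of the composite-column layout, assuming the CF was created with a CompositeType(LongType, TimeUUIDType) comparator so that pycassa can pack Python tuples as column names; the CF name and millisecond timestamps are assumptions:

```python
import time
import uuid

import pycassa

pool = pycassa.ConnectionPool('ClickStream')
activity = pycassa.ColumnFamily(pool, 'UserActivity')  # hypothetical CF

# Column name is a (timestamp, session_id) composite; pycassa packs the
# tuple when the CF comparator is a CompositeType.
now_ms = int(time.time() * 1000)
activity.insert('user42', {(now_ms, uuid.uuid1()): 'serialized click'})

# Activity across all sessions in the last hour: slice on the first
# composite component only (partial prefixes are allowed).
last_hour = activity.get('user42',
                         column_start=(now_ms - 3600 * 1000,),
                         column_finish=(now_ms,),
                         column_count=10000)
```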
Time Period Partitioning Applied
● Row Key is a composite of userId and a time "bucket"
  ● e.g. jan_2011 or jan_01_2011 for month or day buckets respectively
● Column Name is the TimeUUID of the click
● Column Value is the serialized click data
● Avoids always requiring multiple seeks when the user has old data but only recent data is requested
● Easy to lazily aggregate old activity
● Can still efficiently request activity of a user across all sessions within a specific time range
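A sketch of day buckets in pycassa; the userId:YYYYMMDD key format and the CF name are illustrative assumptions:

```python
import uuid
from datetime import datetime, timedelta

import pycassa

pool = pycassa.ConnectionPool('ClickStream')
clicks = pycassa.ColumnFamily(pool, 'ClicksByDay')     # hypothetical CF

def bucket_key(user_id, dt):
    return '%s:%s' % (user_id, dt.strftime('%Y%m%d'))  # e.g. user42:20110523

# Write a click into today's bucket.
clicks.insert(bucket_key('user42', datetime.utcnow()),
              {uuid.uuid1(): 'serialized click data'})

# Read the last two days: one get per bucket (a multiget also works).
now = datetime.utcnow()
for day in (now - timedelta(days=1), now):
    try:
        print(clicks.get(bucket_key('user42', day), column_count=10000))
    except pycassa.NotFoundException:
        pass  # no activity recorded for that day
```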
Rolling Time Window Of Data Points (use case #2)
● RRDTool was the example given of something similar
● Essentially, store a series of data points within a rolling window
● A common request from Cassandra users for this and/or similar
Data Points Defined
● Each data point has a value (or multiple values)
● Each data point corresponds to a specific point in time or an interval/bucket (e.g. the 5th minute of the 17th hour on some date)
Time Window Model

  Row Key:       System7:RenderTime (s7:rt)
  Column Names:  TimeUUID0 | TimeUUID1 | TimeUUID2
  Column Values: 0.051     | 0.014     | 0.173
  (e.g. some request took 0.014 seconds to render)

● Row Key is the id of the time window data you are tracking (e.g. server7:render_time)
● Column Name is the timestamp (or TimeUUID) the event occurred at
● Column Value is the value of the event (e.g. 0.051)
The Details
● Cassandra TTL values are key here
● When you insert each data point, set the TTL to the max time range you will ever request; there is very little overhead to expiring columns
● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call
● Consider partitioning the rows by a known time period (e.g. "year") if you plan on keeping a long history of data (NB: requires slightly more complex logic in the app if a time range spans such a period)
● Very efficient queries for any window of time
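The TTL-plus-TimeUUID-slice pattern above, sketched in pycassa; the keyspace/CF names and the 30-day maximum window are assumptions:

```python
import time
import uuid

import pycassa
from pycassa.util import convert_time_to_uuid

pool = pycassa.ConnectionPool('Metrics')               # hypothetical keyspace
data = pycassa.ColumnFamily(pool, 'TimeWindow')

MAX_WINDOW = 30 * 24 * 3600  # largest range we will ever query, in seconds

# Insert a data point; the TTL makes Cassandra expire it for us.
data.insert('server7:render_time', {uuid.uuid1(): '0.051'}, ttl=MAX_WINDOW)

# Query an arbitrary window: build TimeUUID bounds and slice between them.
end = time.time()
start = end - 3600  # e.g. the last hour
points = data.get('server7:render_time',
                  column_start=convert_time_to_uuid(start, lowest_val=True),
                  column_finish=convert_time_to_uuid(end, lowest_val=False),
                  column_count=100000)
```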
Rolling Window Of Counters (use case #3)
● "How to model rolling time window that contains counters with time buckets of monthly (12 months), weekly (4 weeks), daily (7 days), hourly (24 hours)? Example would be: how many times user logged into a system in last 24 hours, last 7 days ..."
● Timezones and the "rolling window" are what make this interesting
Rolling Time Window Details
● One row for every granularity you want to track (e.g. day, hour)
● Row Key consists of the granularity, metric, user and system
● Column Name is a "fixed" time bucket on UTC time
● Column Values are counts of the logins in that bucket
● get_slice calls return multiple counters, which are then summed up
Rolling Time Window Counter Model

  Row Key:       user3:system5:logins:by_day (U3:S5:L:D)
  Column Names:  20110107 | ... | 20110523
  Column Values: 2        | ... | 7
  (2 logins on Jan 7th 2011 and 7 logins on May 23rd 2011, for user 3 on system 5)

  Row Key:       user3:system5:logins:by_hour (U3:S5:L:H)
  Column Names:  2011010710 | ... | 2011052316
  Column Values: 1          | ... | 2
  (1 login in the 10th hour of Jan 7th 2011 and 2 logins in the 16th hour of May 23rd 2011, for user 3 on system 5)
Rolling Time Window Queries
● The time window is rolling and there are other timezones besides UTC
  ● one get_slice for the "middle" counts
  ● one get_slice for the "left end"
  ● one get_slice for the "right end"
Example: logins for the past 7 days (sketched below)
● Determine the date/time boundaries
● Determine the UTC days that are wholly contained within your boundaries; select and sum those counters
● Select and sum the counters for the remaining hours on either side of those UTC days
● O(1) queries (3 in this case), which can be requested from C* in parallel
● NB: some timezones are annoying (e.g. 15 or 30 minute offsets); I try to ignore them
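A hedged sketch of the three-slice query (the slides issue them in parallel; this version is sequential for clarity). The row-key and bucket formats are assumptions carried over from the model above:

```python
from datetime import datetime, timedelta

import pycassa

pool = pycassa.ConnectionPool('Counters')              # hypothetical keyspace
counters = pycassa.ColumnFamily(pool, 'LoginCounts')

def slice_sum(row, start_col, finish_col):
    """Sum counter columns in [start_col, finish_col]; 0 if none exist."""
    try:
        cols = counters.get(row, column_start=start_col,
                            column_finish=finish_col, column_count=10000)
        return sum(cols.values())
    except pycassa.NotFoundException:
        return 0

def logins_last_7_days(user, system, now=None):
    now = now or datetime.utcnow()
    start = now - timedelta(days=7)

    # "Middle": whole UTC days strictly inside the window.
    first_day = (start + timedelta(days=1)).date()
    last_day = (now - timedelta(days=1)).date()
    total = slice_sum('%s:%s:logins:by_day' % (user, system),
                      first_day.strftime('%Y%m%d'),
                      last_day.strftime('%Y%m%d'))

    # "Left" and "right" ends: the partial days, summed from hour buckets.
    hour_row = '%s:%s:logins:by_hour' % (user, system)
    total += slice_sum(hour_row, start.strftime('%Y%m%d%H'),
                       (first_day - timedelta(days=1)).strftime('%Y%m%d') + '23')
    total += slice_sum(hour_row,
                       (last_day + timedelta(days=1)).strftime('%Y%m%d') + '00',
                       now.strftime('%Y%m%d%H'))
    return total
```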
Alternatives? (of course)
● If you're counting logins and each user doesn't log in hundreds of times a day, just have one row per user with a TimeUUID column name for the time each login occurred
● Supports any timezone/range/granularity easily
● More expensive for large ranges (e.g. a year) regardless of granularity, so cache results (in C*) lazily
● NB: caching results for rolling windows is not usually helpful (because, well, it's rolling and always changes)
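This alternative is pleasantly small in code; a sketch with assumed keyspace/CF names:

```python
import time
import uuid

import pycassa
from pycassa.util import convert_time_to_uuid

pool = pycassa.ConnectionPool('Counters')
logins = pycassa.ColumnFamily(pool, 'LoginsByUser')    # hypothetical CF

# Record a login as an empty-valued TimeUUID column.
logins.insert('user3:system5', {uuid.uuid1(): ''})

# Logins in the last 7 days, for any timezone/granularity: count the slice.
week_ago = convert_time_to_uuid(time.time() - 7 * 24 * 3600, lowest_val=True)
count = logins.get_count('user3:system5', column_start=week_ago)
```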
Eventually Atomic (use case #4)
● "When there are many to many or one to many relations involved how to model that and also keep it atomic? for eg: one user can upload many pictures and those pictures can somehow be related to other users as well."
● Attempting full ACID compliance in distributed systems is a bad idea (and impossible in the general sense)
● However, consistency is important and can certainly be achieved in C*
● Many approaches / alternatives
● I like the transaction log approach, especially in the context of C*
Transaction Logs (in this context)
● Record what is going to be performed before it is actually performed
● Perform the actions that need to be atomic (in the indivisible sense, not the all-at-once sense)
● Mark that the actions were performed
In Cassandra (sketched below)
● Serialize all actions that need to be performed into a single column – JSON, XML, YAML (yuck!), cPickle, et cetera
● Row Key = randomly chosen C* node token
● Column Name = TimeUUID
● Perform the actions
● Delete the column
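A hedged sketch of that flow; the keyspace name, the hardcoded token list (in practice you would fetch the ring's tokens, e.g. via describe_ring), and the JSON serialization are assumptions. The slides suggest QUORUM/LOCAL_QUORUM writes:

```python
import json
import random
import uuid

import pycassa

pool = pycassa.ConnectionPool('Atomic')                # hypothetical keyspace
xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')

# Illustrative node tokens; real code would discover these from the ring.
NODE_TOKENS = ['0', '42535295865117307932921825928971026432',
               '85070591730234615865843651857942052864']

def atomically(actions, perform):
    """Log the actions, perform them, then delete the log entry."""
    row = random.choice(NODE_TOKENS)   # spread log rows across the ring
    col = uuid.uuid1()                 # TimeUUID column name

    # 1) Record intent durably before doing anything.
    xact_log.insert(row, {col: json.dumps(actions)},
                    write_consistency_level=pycassa.ConsistencyLevel.QUORUM)

    # 2) Perform the (idempotent!) actions.
    perform(actions)

    # 3) Mark done by deleting the log column.
    xact_log.remove(row, columns=[col])
```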
Configuration Details
● Short GC_Grace on the XACT_LOG Column Family (e.g. 1 hour)
● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability (if it fails with an UnavailableException, pick a different node token and/or node and try again; same semantics as a traditional relational DB)
● 1M memtable ops, 1 hour memtable flush time
Failures
● Before insert into the XACT_LOG
● After insert, before actions
● After insert, in the middle of actions
● After insert, after actions, before delete
● After insert, after actions, after delete
Recovery
● Each C* node has a cron job, offset from every other by some time period
● Each job runs the same code: multiget_slice for all node tokens, for all columns older than some time period
● Any columns found need to be replayed in their entirety and are deleted after replay (normally there are no columns, because normally things are working normally)
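A sketch of the recovery job, reusing the XACT_LOG names from the earlier sketch; the 1-hour staleness threshold is an assumption:

```python
import json
import time

import pycassa
from pycassa.util import convert_time_to_uuid

pool = pycassa.ConnectionPool('Atomic')
xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')

NODE_TOKENS = ['0', '42535295865117307932921825928971026432',
               '85070591730234615865843651857942052864']

def recover(replay, stale_after=3600):
    # Only columns older than the threshold are candidates for replay;
    # younger ones may belong to transactions still in flight.
    cutoff = convert_time_to_uuid(time.time() - stale_after, lowest_val=False)
    rows = xact_log.multiget(NODE_TOKENS, column_finish=cutoff,
                             column_count=1000)
    for row, columns in rows.items():
        for col, value in columns.items():
            replay(json.loads(value))            # replay the whole entry...
            xact_log.remove(row, columns=[col])  # ...then delete it
```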
XACT_LOG Comments
● Idempotent writes are awesome (that's why this works so well)
● Doesn't work so well for counters (they're not idempotent)
● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)
● Could use a reliable queuing service (e.g. SQS) instead of polling – push to SQS first, then the XACT log
Q?
Cassandra Data Modeling Workshop
Matthew F. Dennis // @mdennis