AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect
Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue
Projected Mobile App Revenue
[Chart: projected revenue by year, 2011 to 2017, split into Ads, IAP, and Paid. Source: Gartner]
Winning at Free to Play
• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit
Analyze What?
Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment
Behaviors
• Hours played day/week
• Number of sessions/day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent/week
Example: Level Progression (One Metric)
[Chart: Tries / Level, showing # of Tries for levels L1 to L10]
Example: Level Progression (Two Metrics)
[Chart: Tries / Level, showing # of Tries and % Highest Level for levels L1 to L10]
Key Takeaways
• Multiple data sources
• Correlate variables
• Deltas vs absolutes
• Settle on terminology (game vs level)
• Time matters
Events & Metrics
• Event = Moment in Time
  – Login/quit
  – Game start/end
  – Level up
  – In-app purchase
• Metrics = What to Measure
  – KISS
  – Numbers
  – Booleans
  – Strings (Enums)
• Always Include (ALWAYS)
  – User
  – Action
  – Session (context-dependent)
  – Timestamp in ISO8601: 2014-03-16T16:28:26
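A minimal sketch of the "always include" fields as one CSV event line. The function name `format_event` and the field order (timestamp, user, session, action) are assumptions modeled on the sample event lines in this deck; fields are assumed not to contain commas.

```python
from datetime import datetime, timezone

def format_event(user: str, session: str, action: str, when: datetime) -> str:
    """Return one CSV event line: timestamp,user,session,action."""
    # isoformat() on a timezone-aware datetime yields ISO 8601 with a UTC offset.
    stamp = when.replace(microsecond=0).isoformat()
    # No CSV escaping here: fields are assumed to be simple identifiers.
    return ",".join([stamp, user, session, action])

when = datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc)
print(format_event("nateware", "e4df", "login", when))
```

Carrying an explicit offset on every timestamp avoids guessing the device's time zone later.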
Off The Shelf Analytics
• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In
Ok, A Real Business Plan
• Ingest: HTTP PUT, Kafka, Kinesis, Scribe
• Store: S3, DynamoDB, HDFS, Redshift
• Process: EMR (Hadoop), Spark, Storm
• Analyze: Tableau, Pentaho, Jaspersoft
Start Simple
• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
    2014-01-24,nateware,e4df,login
    2014-01-24,nateware,e4df,gamestart
    2014-01-24,nateware,e4df,gameend
    2014-01-25,nateware,a88c,login
    2014-01-25,nateware,a88c,friendlist
    2014-01-25,nateware,a88c,gamestart
Profit!
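The first two steps could look roughly like this on the client side. This is only a sketch: `flush_events`, `object_key`, and the `events/<day>/...` key layout are hypothetical, and the upload itself is left as a comment because it depends on the SDK and credentials in use.

```python
import os
import tempfile

def flush_events(lines, directory, batch_name):
    """Write buffered event lines to a local batch file and return its path."""
    path = os.path.join(directory, batch_name + ".csv")
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    return path

def object_key(user, day, batch_name):
    """Hypothetical S3 key layout: one prefix per day, one object per batch."""
    return f"events/{day}/{user}-{batch_name}.csv"

buffered = [
    "2014-01-24,nateware,e4df,login",
    "2014-01-24,nateware,e4df,gamestart",
    "2014-01-24,nateware,e4df,gameend",
]
with tempfile.TemporaryDirectory() as tmp:
    path = flush_events(buffered, tmp, "batch-0001")
    key = object_key("nateware", "2014-01-24", "batch-0001")
    # Upload omitted: an AWS SDK call would send the file at `path` to `key`.
    print(path, "->", key)
```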
Redshift at a Glance
[Diagram: SQL Clients/BI Tools connect over JDBC/ODBC to a Leader Node; Compute Nodes (128GB RAM, 16TB disk, 16 cores each) are linked by 10 GigE (HPC); Ingestion, Backup, and Restore go through Amazon S3/DynamoDB]
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Columnar table storage
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
Tableau + Redshift
Plumbing
1. Create S3 bucket ("mygame-analytics-events")
2. Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
3. Upload data from your users' devices
4. Run a scheduled copy to Redshift
5. Set up Tableau to access Redshift
6. Go to the Beach
Loading Redshift from S3
    copy events
    from 's3://mygame-analytics-events'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
    delimiter ',';
Scheduled Redshift Load using Data Pipeline: http://aws.amazon.com/articles/1143507459230804
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
Logrotate to S3
    /var/log/apache2/*.log {
        sharedscripts
        postrotate
            sudo /usr/sbin/apache2ctl graceful
            s3cmd sync /var/log/*.gz s3://mygame-logs/
        endscript
    }
Blog Entry on Log Rotation: http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales > PB's
• Transient
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
Loading Redshift from DynamoDB
    copy games
    from 'dynamodb://games'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

    copy events
    from 's3://mygame-analytics-events'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
    delimiter ',';
Funnel Cake
Back To Basics
    2014-01-24,nateware,e4df,login
    2014-01-24,nateware,e4df,gamestart
    2014-01-24,nateware,e4df,gameend
    2014-01-25,nateware,a88c,login
    2014-01-25,nateware,a88c,friendlist
    2014-01-25,nateware,a88c,gamestart
Measure Retention: Repeated Plays
    create view events_by_user_by_month as
    select user_id,
           date_trunc('month', event_date) as month_active,
           count(*) as total_events
    from events
    group by user_id, month_active;
First-Pass Retention – Too Noisy
[Chart: # Play Sessions / Month for users nateware, Lazyd0g, AK187, 3strikes]
Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
  – Month game installed
  – Demographics
• Run analysis by cohort
  – Join with metrics
• Use Redshift as it's SQL
  – Example of where SQL is a good fit
Creating Cohorts with Redshift
    create view cohort_by_first_event_date as
    select user_id,
           date_trunc('month', min(event_date)) as first_month
    from events
    group by user_id;
http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
Retention by Cohort – Join Events with Cohort
[Chart: # Sessions / Week, by week since first event, for cohorts 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]
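The join itself is not shown in the deck, so here is one way it might look. This sketch runs on SQLite through Python rather than on Redshift: SQLite has no `date_trunc`, so `strftime('%Y-%m', ...)` stands in for it, and the `lazyd0g` rows and February/March dates are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        ("2014-01-24", "nateware", "e4df", "login"),
        ("2014-01-24", "nateware", "e4df", "gamestart"),
        ("2014-02-02", "nateware", "a88c", "login"),
        ("2014-02-10", "lazyd0g", "77aa", "login"),  # made-up rows
        ("2014-03-01", "lazyd0g", "91bc", "login"),
    ],
)
# Same two views as the slides, with strftime replacing Redshift's date_trunc.
conn.executescript("""
create view cohort_by_first_event_date as
  select user_id, strftime('%Y-%m', min(event_date)) as first_month
  from events group by user_id;
create view events_by_user_by_month as
  select user_id, strftime('%Y-%m', event_date) as month_active,
         count(*) as total_events
  from events group by user_id, month_active;
""")
# Join the metric (events per user per month) with the cohort (first month seen).
rows = conn.execute("""
  select c.first_month, e.month_active,
         count(distinct e.user_id) as active_users,
         sum(e.total_events) as total_events
  from events_by_user_by_month e
  join cohort_by_first_event_date c on c.user_id = e.user_id
  group by c.first_month, e.month_active
  order by c.first_month, e.month_active
""").fetchall()
for row in rows:
    print(row)
```

Each output row is (cohort month, active month, users active, events), which is the shape a cohort retention chart needs.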
Moar Cohorts
• Define multiple cohorts
  – By activity, time, demographics
  – As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
  – Retention by date
  – Retention by demographic
  – Retention by average plays/month quartile
Example Event Stream
    2014-03-17T09:52:08-07:00,nateware,e4b5,login
    2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
    2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
    2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
    2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
    2014-03-17T09:55:14-07:00,nateware,30a4,gameend
    2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
    2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
    2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
    2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
Cohorts by Type of Activity
    create view cohort_by_first_play_date as
    select user_id,
           date_trunc('month', min(event_date)) as first_month
    from events
    where action = 'gamestart'
    group by user_id;
Post-Match Heatmaps
Real-Time Analytics
Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?
Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different?
• How many people are playing today? Variance?
Why Real-Time Analytics?
• 30x in 24 hours
• What if you ran a promo?
Real-Time Tools
Spark
• High-Performance Hadoop Alternative
• Berkeley.edu
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in memory)
• Runs on EMR
Kinesis
• Amazon fully-managed streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data up to 1MB/sec, 1000 TPS
• Data stored for 24 hours
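The per-shard limits above make stream sizing a small arithmetic exercise: a stream needs enough shards to stay under both the byte limit and the record limit. A rough sketch, where `shards_needed` is a hypothetical helper, "1MB" is taken as 10**6 bytes (an assumption), and the limits are the ones quoted on the slide, which may differ from current service limits:

```python
import math

SHARD_BYTES_PER_SEC = 1_000_000  # "1MB/sec" from the slide, read as 10**6 bytes (assumption)
SHARD_RECORDS_PER_SEC = 1000     # "1000 TPS" from the slide

def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Smallest shard count that satisfies both per-shard ingest limits."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)

print(shards_needed(5000, 100))   # many small events: record count is the limit
print(shards_needed(500, 8000))   # fewer, larger events: bytes are the limit
```

Small game events tend to hit the record-count limit first, so batching several events per record is one way to reduce shard count.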
Back To Basics [Dubstep Remix]
• Always Batch Due to S3
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
Introducing Amazon Kinesis
Service for Real-Time Big Data Ingestion
[Diagram: Data Sources send to an AWS Endpoint; a stream of Shard 1, Shard 2, ... Shard N spans multiple Availability Zones; App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], and App.4 [Machine Learning] consume the stream, with outputs to S3, DynamoDB, and Redshift]
Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
[Diagram: many Producers writing to Kinesis Shards 1 through n]
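To illustrate how a partition key spreads PUTs across shards: Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below assumes the shards own equal, contiguous slices of that range, which is a simplification; `shard_for_key` is a hypothetical helper, not a Kinesis API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the 0-based index of the shard that would receive this key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Assumption: shard i owns the i-th equal slice of the hash space.
    return hash_value * shard_count // HASH_SPACE

for user in ["nateware", "Lazyd0g", "AK187", "3strikes"]:
    print(user, "-> shard", shard_for_key(user, 4))
```

Records with the same partition key always land on the same shard, so keying by user or game ID keeps related events together, while a low-cardinality key can concentrate traffic on a few shards.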
Writing to a Kinesis Stream
    POST / HTTP/1.1
    Host: kinesis.<region>.<domain>
    x-amz-Date: <Date>
    Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
    User-Agent: <UserAgentString>
    Content-Type: application/x-amz-json-1.1
    Content-Length: <PayloadSizeBytes>
    Connection: Keep-Alive
    X-Amz-Target: Kinesis_20131202.PutRecord

    {
      "StreamName": "exampleStreamName",
      "Data": "XzxkYXRhPl8x",
      "PartitionKey": "partitionKey"
    }
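The "Data" value in the request above is Base64: "XzxkYXRhPl8x" decodes to `_<data>_1`. A small sketch of building that JSON body follows; `put_record_body` is a hypothetical helper, and request signing (the Authorization header) is deliberately left out.

```python
import base64
import json

def put_record_body(stream_name: str, data: bytes, partition_key: str) -> str:
    """Build the JSON payload for a PutRecord call; the blob is Base64-encoded."""
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })

body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
```

In practice an AWS SDK handles both the encoding and the signing, so hand-built requests like this are mainly useful for understanding the wire format.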
Kinesis + Spark
http://aws.amazon.com/articles/4926593393724923
Death in Real-Time
    PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
    PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
    PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
    PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
    PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
    PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
    PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
    PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
    PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
    PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
    PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
    PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
Real-Time Heatmaps
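One way a stream consumer might turn "kills" records into heatmap data is to bin coordinates into grid cells and count per cell. In this sketch the `coord` string is assumed to be "x,y,z", the 300-unit cell size is arbitrary, and `heatmap` is a hypothetical helper; the records are the Boston events from the previous slide.

```python
import json
from collections import Counter

CELL = 300  # grid cell size in map units (arbitrary choice for this sketch)

def heatmap(records):
    """Count kills per (map, cell_x, cell_y) grid cell."""
    cells = Counter()
    for raw in records:
        kill = json.loads(raw)
        # Assumption: coord is "x,y,z"; z is ignored for a 2D heatmap.
        x, y, _z = (int(part) for part in kill["coord"].split(","))
        cells[(kill["map"], x // CELL, y // CELL)] += 1
    return cells

records = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}',
]
for cell, count in sorted(heatmap(records).items()):
    print(cell, count)
```

Because the counts are additive, per-batch results from several consumers can be merged by summing their counters.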
Put A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming
Fallback Plan
Cheers – Nate Wiger @nateware