Issue 120 — September 2, 2016
Featured
MyRocks: A Space- and Write-Optimized MySQL Storage Engine — Facebook’s Yoshinori Matsunobu looks at how deploying MyRocks (a storage engine built on RocksDB, the key/value store forked from Google’s LevelDB) within a MySQL installation enabled a 50% reduction in storage requirements.
Facebook Code story
A Comparison of 10 Open Source Time Series Databases — The author looks at and ranks DalmatinerDB, InfluxDB, Prometheus, Riak TS, OpenTSDB, KairosDB, Elasticsearch, Druid, Blueflood and Graphite.
Steven Acreman
Tracking The Ever-Shifting Big Data Bottleneck — "Bottlenecks are a fact of life in IT. No matter how fast you build something, somebody will find a way to max it out." And it seems newer in-memory frameworks may make CPU shared caches the next bottleneck for big data systems.
Alex Woodie
Forget Technical Debt - Build Technical Wealth — First Round Review recently sat down with Andrea Goulet, CEO of Corgibytes, to discuss how "software remodeling" can help you pay down tech debt and tackle even the most difficult legacy codebases.
Corgibytes sponsored
Build a Raspberry Pi Hadoop Cluster to Run Spark on YARN — The Raspberry Pi isn’t the most natural platform for Hadoop or Spark work but it can provide a fun way to learn and this is a thorough walkthrough.
DQYDJ tutorial
Building A Recommendation Engine with AWS Data Pipeline, Elastic MapReduce and Spark — From Google’s advertisements to Amazon’s product suggestions, recommendation engines are everywhere.
Hubba tutorial
Apache Spark at Scale: A 60 TB+ Production Use Case — Facebook is a heavy user of analytics for data-driven decision making and this post looks at how they use Spark for doing real-time entity ranking at scale.
Facebook Code story
Jobs
Get 5+ Engineering Job Offers in 1 Week — With Hired, companies apply to hire you. Get salary and equity offers before you interview from companies like Facebook, Postmates, and Square.
Hired.com
In brief
Duo Labs news
Microsoft news
Daniel Bartholomew news
Installing DreamFactory from source on Ubuntu 14.04 LTS — DreamFactory auto-generates a rich API platform from any database. Here’s a quick tutorial for a common Linux flavor.
DreamFactory sponsored tutorial
Dan Robinson tutorial
Fred de Villamil tutorial
Periscope tutorial
Omar Bohsali tutorial
How A Japanese Cucumber Farmer Is Using Deep Learning and TensorFlow — Uses of machine learning and deep learning are only limited by our imaginations: A farmer can use deep learning to sort cucumbers. See how.
Google Developers story
How Mail.Ru Uses Tarantool to Help Scale Its Anti-Spam Service — Tarantool is a NoSQL database running in a Lua application server.
High Scalability story
Datanami story
Channel 9 video
Your Relational Database Management System Is Underutilized — If you’re just storing data, you’re missing out, says the author.
Jason Porritt opinion
Spark Comparison: AWS vs. Google Cloud Platform — Assessing cost, performance, and run time of a typical Spark workload.
Michael Li and Ariel M'ndange-Pfupfu opinion
Email API from SendGrid — Reliably deliver your emails with a quick and simple API or SMTP integration. Try for Free
SendGrid sponsored tools