#260 — June 28, 2019
Database Weekly
Using AWK and R to Parse 25TB of Data — This is a fun, practical look at several approaches taken to process a large data set, including the dead ends and lessons learned before settling on a reasonably ‘rustic’ solution.
Nick Strayer
MongoDB 4.2 Previewed at MongoDB World — The 4.2 release of the popular document-oriented database will be packed with new features. Here’s a look at four of them, including distributed transactions and client-side field-level encryption. (Via our MongoDB-focused newsletter.)
Dj Walker-Morgan (MongoDB)
Studio 3T Makes SQL Migration to MongoDB, Powerfully Simple — Now you can import an entire SQL database to MongoDB using Studio 3T and its new SQL Migration feature.
Studio 3T sponsor
The Major Features Coming in PostgreSQL 12 — The much-esteemed Postgres expert Bruce Momjian has released a simple slide deck highlighting the most significant improvements and features coming to PostgreSQL 12, including JIT compilation and REINDEX CONCURRENTLY. It’ll only take you a minute to scan.
Bruce Momjian
The Cloud is Now the Default Platform for Databases, Gartner Says — The overall database market is growing, says Gartner, and cloud deployments are responsible for 68% of the growth, principally on AWS and Azure. While cloud deployments are rapidly becoming the ‘default’, the cloud accounts for only 23% of the market’s revenue, so there’s still a long way to go.
Datanami
A New Redis Benchmark Hits 200 Million Ops/Sec — Redis Labs has pushed the enterprise version of Redis, the data structure store, up to 200 million operations per second with under 1 ms latency on 40 EC2 instances, a 4x improvement on a similar test last year.
Redis Labs
Analyze BigQuery Data with Kaggle Kernels Notebooks — Kaggle, a sort of social network and sharing platform for data scientists that Google acquired in 2017, is now integrated with BigQuery, enabling BigQuery users to use Kaggle’s neat analysis tools.
Google Cloud
IN BRIEF:
- Redis 6 is under development and should be released this December.
- MongoDB has unveiled Atlas Data Lake, a service for querying data stored in Amazon S3 buckets using MongoDB's query language.
- EnterpriseDB has been acquired by a private equity firm.
- MongoDB's CEO was on CNBC explaining to Jim Cramer why MongoDB has the edge over Oracle.
- Oracle has launched Autonomous Database Dedicated, an enterprise-focused service that provides autonomous database infrastructure management.
💻 Jobs
Senior Software Engineer (Santa Barbara or Remote) — Join a team where everyone is striving to constantly improve their knowledge of software development tools, practices, and processes.
Invoca
Find a DB Job on Vettery — Vettery specializes in tech roles and is completely free for job seekers.
Vettery
📒 Tutorials and Stories
▶ Advanced NoSQL Data Modeling with Amazon DynamoDB — The latest in a series of YouTube videos that dig deep into using DynamoDB, Amazon’s scalable NoSQL database. DynamoDB works quite differently from most databases, so content like this is valuable if you plan to use it.
Amazon Web Services
An Introduction to Hypothetical Indexes in PostgreSQL — Why would you want to create imaginary indexes for Postgres’s optimizer to chew over? It’s a way to find out if an index would be useful before you endure the expense of creating a real one. SQL Server and Oracle can do this too.
Avinash Vallarapu
Building a Data Stream for IoT with NiFi & InfluxDB — Combining NiFi & InfluxDB results in secure, accessible, and usable IoT data streams. This solution provides a single data view across all facilities, enabling proactive maintenance, failure detection, and more.
InfluxData sponsor
Analyzing the Performance and Cost of Large-Scale Data Processing with AWS Lambda — A serverless approach isn’t suited to every data analytics use case, but AWS Lambda has a lot going for it thanks to its low TCO and flexibility.
Amazon Web Services
SQLsmith: Randomized SQL Testing in CockroachDB — Randomized testing lets you automate the discovery of interesting test cases that would be difficult to come up with on your own, and CockroachDB has adopted the idea for its ultra-resilient SQL database.
Matt Jibson (Cockroach Labs)
MongoDB's Plan to Stop Breaches With Dead Simple Database Encryption — MongoDB has been working on a new encryption scheme that should help keep customers’ data more secure.
Lily Hay Newman (Wired)
Spring Cleaning at OverOps: How (and Why) We Changed Our DB Cleaning Strategy — "after years of writing and executing code, our DB’s free disk space started to run out..." Here’s how they addressed the issue.
Aviv Danziger
RedisTimeSeries: A Redis Module for Working with Time Series Data — Redis is already a great fit for time series work, but this module introduces some cool new features like timed retention policies on streams, downsampling, and integration with other tools.
Redis Labs