#344 — March 5, 2021
Database Weekly
sq: A General Swiss-Army Knife for Data — There are a lot of ‘Swiss Army knife’-esque data ‘multitools’ around (such as xsv for CSV or jq for JSON) – think of this one as being like jq but for all sorts of formats and systems. You can import an Excel sheet into a Postgres table, move query results from SQL Server to SQLite, or export an entire database to CSV, among other things. Many formats are supported, and Postgres, SQLite, SQL Server and MySQL are the supported RDBMSes so far.
Neil O'Toole
Clickhouse as an Alternative to ElasticSearch and MySQL — Clickhouse is an open-source, column-oriented OLAP system, and the author explains how and, importantly, why his team is using it for log storage and analytics in their project.
Anton Sidashin
[Guide] How to Calculate The True Cost of a Database — Use this guide to add up your license costs, operational overhead costs, infrastructure costs and everything in between, so that you have a clear picture of what you're spending (and where you can spend less).
CockroachDB sponsor
Google Cloud Memorystore: It's Managed memcached — memcached is a long-standing in-memory object caching system that was originally created at LiveJournal(!). Google Cloud Platform has its own managed memcached service called Memorystore, which is now generally available.
Google Cloud
A Look Back at 2011 and the Emergence of Hadoop — Datanami is a source we often link to as a strong news site in the data science and analytics space, and they’re celebrating their 10th birthday with a look back at how things were in 2011. They say Hadoop is almost a ‘dirty word’ today, but back in 2011 it was the cutting edge of ‘big data coolness’.
Datanami
Quick Bits
- Tumblr has given a little look behind the scenes at how they store post content.
- MongoDB and Google Cloud are continuing to work together to make MongoDB Atlas easier to deploy via the Google Cloud Marketplace.
- MySQL 5.6 has reached its end of life.
How to Use a Machine Learning Model from a Google Sheet using BigQuery ML — Spreadsheets aren’t going away any time soon, so any project that involves bringing modern data science practices to spreadsheets immediately gets my interest.
Karl Weinmeister (Google Cloud)
Building a Recommendation Engine Inside Postgres with Python and Pandas — Learn how you can leverage Python and Pandas (a popular data analysis tool) from directly inside Postgres to build your own recommendation engine.
Craig Kerstiens
Creating Amazon Timestream Interpolated Views using Amazon Kinesis Data Analytics for Apache Flink — Sounds like a bit of buzzword bingo in the title, eh? The idea here is building a streaming data pipeline where aggregations are generated during ingestion to enable faster eventual querying by a dashboard (QuickSight, in this case).
Will Taff and John Gray (AWS)
Migrating to Aurora: Easy, Except The Bill — "Migrating our production database from Postgres to Aurora was easy, until we noticed that our daily database costs more than doubled." Luckily they went on to mitigate this, though they seem mildly ambivalent about the move.
Kimberly Nicholls
How to Efficiently Choose the Right Database for Your Apps — Note that this is on the blog of PingCAP (the creators of TiDB, an open source distributed SQL database), so be aware of the bias. Nonetheless, some reasonable questions are asked here, and the author uses several databases in their systems (including MySQL, Redis, and Couchbase).
Leitao Guo
AWS Claims 'Better Performance for Less' versus Azure for SQL Server — Is this AWS tooting their own horn? Yes, it is. But it’s just another small step in their ongoing rivalry with Microsoft over running SQL Server workloads. It would be fun to see Azure’s comeback to this.
Fred Wurden (AWS)
Delivering an Even Better Redis Experience on Azure — We don’t want to be seen as biased, so let’s let Azure do a little cheerleading too – this post covers the new Enterprise-level offering of Azure Cache for Redis (Azure’s managed Redis service).
Kyle Teegarden (Microsoft)
Building an Inventory Management System with Google BigQuery and Cloud Run
Aja Hammerly (Google Cloud)
🛠 Projects and Tools
OrbitDB: Peer to Peer Databases for the Decentralized Web — A serverless, distributed, peer-to-peer database that uses IPFS for storage and automatically syncs across peers. It’s limited to Node and browser use cases for now, and an introductory guide is available.
OrbitDB Community
Import a SQL Database to MongoDB in 5 Steps with Studio 3T Enterprise — The easiest way to import an entire SQL database to MongoDB is with Studio 3T and its innovative SQL Migration feature.
Studio 3T sponsor
TerminusDB 4.2: Open Source Graph Database and Document Store — TerminusDB is aimed at ‘knowledge base’ type use cases where things like immutability, data lineage and collaboration are important. It’s not your typical graph database, especially as it’s built in Prolog!
Luke Feeney
ScalarDB 3.0: A Java Library That Makes Non-ACID Distributed Databases ACID-Compliant — ScalarDB is not a database in its own right but a Java library that extends the functionality of other data stores you might be using (such as Cassandra).
Scalar
💻 Jobs
DevOps Engineer at X-Team (Remote) — Join the most energizing community for developers and work on projects for Riot Games, FOX, Sony, Coinbase, and more.
X-Team