#342 — February 19, 2021
Database Weekly
Synthetic Data: Generating Realistic User Timestamps in SQL — How to generate synthetic timestamps in SQL that follow realistic user behavior patterns rather than picking times purely at random.
Cedric Dussud
How Spotify Optimized the 'Largest Dataflow Job Ever' for Wrapped 2020 — Each year the music service Spotify puts together a customized report for every user that highlights the music they listened to most often, the genres they liked, and similar tidbits. This involves a lot of data, and for Wrapped 2020 they used the Sort Merge Bucket (SMB) join technique to hugely reduce costs compared to previous years. Here’s how they pulled it off.
Spotify Engineering
An Operations-Free, Scalable & Flexible Postgres Alternative — Fauna combines the operational integrity and relational modeling of Postgres with an interface and architecture that fits better with modern app development in the cloud. The goodness of Postgres without its operational bottlenecks - Learn more about Fauna.
Fauna sponsor
European Patent Office Considers an Extension to SQL Patent-Eligible — This is quite a legalistic piece on how the European Patent Office has considered an SQL language extension for modifying collection-valued and scalar-valued columns in a single statement to be ‘technical’.
Bastian Best
How Materialize and Other Databases Optimize SQL Subqueries — Not something you commonly think about while using subqueries (in my experience) but useful to know what helps database systems handle them better.
Jamie Brandon
Quick Bits
- TigerGraph has raised $105M in Series C funding, which ZDNet sees as positive for the graph database space generally.
- Google has announced Google Cloud Spanner support for the Django ORM.
- Databricks apps can now be run on Google Cloud.
- Cloud ETL provider Matillion has raised $100M in a Series D round.
- The Apache Spark Connector for SQL Server and Azure SQL is now compatible with Spark 3.0.
💻 Jobs
Backend Developer - Remote or in Beautiful Norway — Do you have a passion for GraphQL, NodeJS, and message-driven distributed architectures? Join our remote-first engineering team.
Crystallize
✅ And more..
Understanding SQL JOIN — A mental model to understand SQL’s JOIN.
Edward Loveall
Querying JSON Data in PostgreSQL — I’ve been querying JSON directly in Postgres a lot lately and it’s been.. rather pleasant! Nothing mindblowing in this post but it covers the basics if you haven’t dipped a toe into the JSONB column world yet.
Aaron Bos
Many Small Queries Are Efficient In SQLite — Making the point that while 200 SQL queries to put together a single Web page could be considered inefficient with many database systems, it’s nothing for SQLite.
SQLite Team
Best-Practices on How to Speed Up Your Postgres Queries. Free eBook — Learn how companies like Atlassian and CounterPath are able to speed up their queries by orders of magnitude. In this ebook, we share our best practices for optimizing Postgres performance.
pganalyze sponsor
The 20 Most-Visited Amazon DynamoDB Documentation Pages — It’s interesting to see AWS reveal information like this. Popular topics include how to work with DynamoDB Local, using global secondary indexes, and working with expressions.
Craig Liebendorfer
Simulating Latency with SQL / JDBC — "I’ve run across a fun little trick to simulate latency in your development environments when testing some SQL queries.."
Lukas Eder
Adminer: Database Management in a Single PHP File — A mature project that’s an alternative to phpMyAdmin for managing Postgres, MySQL, SQL Server, and other databases via a Web interface. The main selling point is it’s distributed as a single PHP file. v4.8.0 just dropped. GitHub repo.
Jakub Vrána