The Netflix SQL Project is a comprehensive data analysis project on Netflix’s movies and TV shows dataset. The project uses PostgreSQL to explore patterns, trends, and insights across various aspects of Netflix content.
Key highlights:
- Analyze content distribution: Movies vs TV Shows
 - Explore popular genres, ratings, actors, and directors
 - Identify top producing countries and content trends
 - Investigate content release trends over the last 5–10 years
 - Categorize content as family-friendly or violent using keyword analysis
 
This project demonstrates practical SQL skills including aggregate functions, string and date manipulation, conditional logic, and analytical queries.
- Content Analysis – Understand distribution by type, genre, and country
 - Popularity Analysis – Find common ratings, top actors, and directors
 - Trend Analysis – Explore content added over the years and release patterns
 - Content Classification – Categorize movies and TV shows based on description keywords
 
- Database: PostgreSQL
 - SQL Concepts Used:
- Aggregate functions (
COUNT,MAX,AVG) - Conditional queries (
CASE WHEN) - String manipulation (
STRING_TO_ARRAY,UNNEST,TRIM,SPLIT_PART) - Date handling (
TO_DATE,EXTRACT) - Ranking & window functions (
RANK() OVER(PARTITION BY ...)) - Filtering with 
WHERE,ILIKE,AND,OR 
 - Aggregate functions (
 
The dataset contains information about movies and TV shows on Netflix.
| Column Name | Description | 
|---|---|
show_id | 
Unique ID of the content | 
type | 
Movie or TV Show | 
title | 
Name of the content | 
director | 
Director name | 
cast | 
Actors in the content | 
country | 
Country of production | 
date_added | 
Date added to Netflix | 
release_year | 
Year of release | 
rating | 
Content rating (PG, R, etc.) | 
duration | 
Movie duration or number of seasons | 
listed_in | 
Genres | 
description | 
Brief summary of content | 
Dataset Download Link: Netflix Shows Dataset on Kaggle
Here’s an interactive summary of the queries and insights generated:
SELECT type, COUNT(*) AS number_of_movies FROM netflix GROUP BY type;
✅ Insight: Understand the distribution of movies and TV shows on Netflix.
SELECT type, rating FROM ( SELECT type, rating, COUNT(*), RANK() OVER(PARTITION BY type ORDER BY COUNT(*) DESC) AS ranking FROM netflix GROUP BY type, rating ) AS t1 WHERE ranking = 1;
✅ Insight: Find the most popular ratings (PG, TV-MA, etc.) in movies vs TV shows.
SELECT title FROM netflix WHERE type = 'Movie' AND release_year = 2020;
✅ Insight: Filter movies released in a specific year.
SELECT TRIM(UNNEST(STRING_TO_ARRAY(country, ','))) AS country, COUNT(show_id) AS total_content FROM netflix GROUP BY 1 ORDER BY total_content DESC LIMIT 5;
✅ Insight: Identify which countries produce the most Netflix content (handles multiple countries per row).
SELECT * FROM netflix WHERE type = 'Movie' AND duration = ( SELECT MAX(duration) FROM netflix WHERE type = 'Movie' );
✅ Insight: Find the movie with the longest runtime.
SELECT * FROM netflix WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';
✅ Insight: Retrieve content that was added to Netflix in the last 5 years.
SELECT * FROM netflix WHERE director ILIKE '%Rajiv Chilaka%';
✅ Insight: List all content (movies & TV shows) directed by Rajiv Chilaka.
SELECT * FROM netflix WHERE type = 'TV Show' AND SPLIT_PART(TRIM(duration), ' ', 1)::numeric > 5;
✅ Insight: List TV shows having more than 5 seasons (uses TRIM to avoid spacing issues).
SELECT TRIM(UNNEST(STRING_TO_ARRAY(listed_in, ','))) AS genre, COUNT(show_id) AS total_count FROM netflix GROUP BY 1 ORDER BY total_count DESC;
✅ Insight: Get how many items belong to each genre (splits multi-genre rows).
SELECT EXTRACT(YEAR FROM TO_DATE(date_added, 'Month DD, YYYY')) AS year, COUNT(*) AS yearly_content, ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM netflix WHERE country ILIKE '%India%'), 2) AS percentage_content FROM netflix WHERE country ILIKE '%India%' AND date_added IS NOT NULL GROUP BY 1 ORDER BY 1;
✅ Insight: For each year, show total Indian content and that year’s percentage share of all Indian content.
SELECT * FROM netflix WHERE listed_in ILIKE '%documentaries%';
✅ Insight: Find movies that belong to documentary genre(s).
SELECT * FROM netflix WHERE director IS NULL;
✅ Insight: Detect rows with missing director information (NULL or empty string).
SELECT * FROM netflix WHERE casts ILIKE '%Salman Khan%' AND type = 'Movie' AND release_year >= EXTRACT(YEAR FROM CURRENT_DATE) - 10;
✅ Insight: Count/list movies that include Salman Khan released in the last 10 years (inclusive).
SELECT TRIM(UNNEST(STRING_TO_ARRAY(casts, ','))) AS actor, COUNT(show_id) AS movies FROM netflix WHERE country ILIKE '%India%' GROUP BY 1 ORDER BY movies DESC LIMIT 10;
✅ Insight: Identify actors who appear most frequently in Indian movies on Netflix.
SELECT CASE WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad_Content' ELSE 'Good_Content' END AS category, COUNT(*) AS total_count FROM netflix GROUP BY category;
✅ Insight: Classify content by presence of violent keywords in descriptions and count per category.
- Writing complex SQL queries with subqueries, aggregates, and window functions
 - Performing data cleaning and transformation using string and date functions
 - Categorizing data dynamically using CASE expressions
 - Handling multi-valued fields via 
STRING_TO_ARRAY+UNNEST(andTRIM) - Using PostgreSQL features like 
RANK(),TO_DATE(), and window functions 
This project demonstrates practical SQL-based data analysis on a real-world dataset (Netflix).
It is suitable for:
- Data analysts learning SQL
 - Candidates who want project examples for portfolios or GitHub
 - Anyone interested in exploring streaming-platform data using SQL
 
You can copy these queries into your PostgreSQL environment to reproduce results and refine them further for presentations, dashboards, or visualizations.