Commit f52d0f5

Update 2. json_to_pandas_df.ipynb

1 parent 57748d3 commit f52d0f5

1 file changed: +133 -6 lines changed

`Module 2 - Python for Data Analysis/15. Working with JSON Data/2. Loading JSON to DF /2. json_to_pandas_df.ipynb`

Original file line number	Diff line number	Diff line change
`@@ -9,18 +9,138 @@`
`9`	`9`	`"\n",`
`10`	`10`	`"### What's covered in this notebook?\n",`
`11`	`11`	`"\n",`
`12`		`- "1. Converting Flat JSON to DataFrame\n",`
	`12`	`+ "1. JSON Structures\n",`
	`13`	`+ " - Flat JSON\n",`
	`14`	`+ " - Nested JSON (Hierarchical JSON)\n",`
	`15`	`+ " - Multi-Level JSON (Deeply Nested JSON)\n",`
	`16`	`+ "2. Converting Flat JSON to DataFrame\n",`
`13`	`17`	`" - Using pd.DataFrame()\n",`
`14`	`18`	`" - Using pd.read_json()\n",`
`15`		`- "2. Handling Deeply Nested JSON Structures\n",`
	`19`	`+ "3. Handling Deeply Nested JSON Structures\n",`
`16`	`20`	`" - Normalizing Nested JSON Structures\n",`
`17`	`21`	`" - Normalizing Multi-Level JSON\n",`
`18`		`- "3. Few Examples\n",`
	`22`	`+ "4. Few Examples\n",`
`19`	`23`	`"\t- Example 1: Parse Students Data to Identify the Top Skill\n",`
`20`	`24`	`"\t- Example 2: Parse Customer Transactions from JSON and Generate Insights\n",`
`21`	`25`	`"\t- Example 3: Parse a Sample E-Commerce Order Data for Analysis"`
`22`	`26`	`]`
`23`	`27`	`},`
	`28`	`+ {`
	`29`	`+ "cell_type": "markdown",`
	`30`	`+ "id": "12be0c9f-4d62-4a0e-8cc0-af7dc75a18ab",`
	`31`	`+ "metadata": {},`
	`32`	`+ "source": [`
	`33`	`+ "## JSON Structures\n",`
	`34`	`+ "\n",`
	`35`	`+ "JSON structures vary depending on how the data is organized. This section explains the differences between flat, nested, and multi-level JSON structures.\n",`
	`36`	`+ "\n",`
	`37`	`+ "### Flat JSON\n",`
	`38`	`+ "- Simple structure with no nesting.\n",`
	`39`	`+ "- Each key directly maps to a value.\n",`
	`40`	`+ "- Easy to parse and analyze in tabular formats (like DataFrames).\n",`
	`41`	`+ "- Well suited to tabular storage (SQL tables, CSV files).\n",`
	`42`	`+ "- How to Parse? - We can directly parse it using pd.DataFrame() or pd.read_json() in pandas.\n",`
	`43`	+ "```json\n",
	`44`	`+ "{\n",`
	`45`	`+ " \"student_id\": 101,\n",`
	`46`	`+ " \"name\": \"Alice Johnson\",\n",`
	`47`	`+ " \"age\": 21,\n",`
	`48`	`+ " \"is_active\": true,\n",`
	`49`	`+ " \"address_street\": \"123 Elm St\",\n",`
	`50`	`+ " \"address_city\": \"New York\",\n",`
	`51`	`+ " \"address_country\": \"USA\",\n",`
	`52`	`+ " \"skills\": \"Python, SQL, Machine Learning\"\n",`
	`53`	`+ "}\n",`
	`54`	`+ "\n",`
	`55`	+ "```\n",
	`56`	`+ "\n",`
	`57`	`+ "### Nested JSON (Hierarchical JSON)\n",`
	`58`	`+ "- Data is stored in hierarchical format.\n",`
	`59`	`+ "- Values can be objects or arrays instead of primitive types.\n",`
	`60`	`+ "- More suitable for NoSQL databases (like MongoDB).\n",`
	`61`	`+ "- How to Parse? - We can parse them using pd.json_normalize(data, sep=\"_\").\n",`
	`62`	+ "```json\n",
	`63`	`+ "{\n",`
	`64`	`+ " \"student_id\": 101,\n",`
	`65`	`+ " \"name\": \"Alice Johnson\",\n",`
	`66`	`+ " \"age\": 21,\n",`
	`67`	`+ " \"is_active\": true,\n",`
	`68`	`+ " \"address\": {\n",`
	`69`	`+ " \"street\": \"123 Elm St\",\n",`
	`70`	`+ " \"city\": \"New York\",\n",`
	`71`	`+ " \"country\": \"USA\"\n",`
	`72`	`+ " },\n",`
	`73`	`+ " \"skills\": [\"Python\", \"SQL\", \"Machine Learning\"], \n",`
	`74`	`+ " \"enrollment\": {\n",`
	`75`	`+ " \"course\": \"Data Science\",\n",`
	`76`	`+ " \"batch\": \"Spring 2025\",\n",`
	`77`	`+ " \"grades\": {\n",`
	`78`	`+ " \"assignments\": 92,\n",`
	`79`	`+ " \"quizzes\": 88,\n",`
	`80`	`+ " \"final_exam\": 95\n",`
	`81`	`+ " }\n",`
	`82`	`+ " }\n",`
	`83`	`+ "}\n",`
	`84`	`+ "\n",`
	`85`	+ "```\n",
	`86`	`+ "\n",`
	`87`	`+ "### Multi-Level JSON (Deeply Nested JSON)\n",`
	`88`	`+ "- Extends nested JSON by adding multiple levels of complexity.\n",`
	`89`	`+ "- Includes deeply nested objects and arrays within arrays.\n",`
	`90`	`+ "- More complex to parse and query.\n",`
	`91`	`+ "- How to Parse? - We can parse them using pd.json_normalize(data, record_path, meta).\n",`
	`92`	`+ "\n",`
	`93`	`+ "\n",`
	`94`	+ "```json\n",
	`95`	`+ "{\n",`
	`96`	`+ " \"student_id\": 101,\n",`
	`97`	`+ " \"name\": \"Alice Johnson\",\n",`
	`98`	`+ " \"age\": 21,\n",`
	`99`	`+ " \"is_active\": true,\n",`
	`100`	`+ " \"contact\": {\n",`
	`101`	`+ " \"email\": \"alice@example.com\",\n",`
	`102`	`+ " \"phone\": \"+1-234-567-8901\"\n",`
	`103`	`+ " },\n",`
	`104`	`+ " \"addresses\": [\n",`
	`105`	`+ " {\n",`
	`106`	`+ " \"type\": \"Home\",\n",`
	`107`	`+ " \"street\": \"123 Elm St\",\n",`
	`108`	`+ " \"city\": \"New York\",\n",`
	`109`	`+ " \"country\": \"USA\"\n",`
	`110`	`+ " },\n",`
	`111`	`+ " {\n",`
	`112`	`+ " \"type\": \"Temporary\",\n",`
	`113`	`+ " \"street\": \"456 Oak Ave\",\n",`
	`114`	`+ " \"city\": \"Los Angeles\",\n",`
	`115`	`+ " \"country\": \"USA\"\n",`
	`116`	`+ " }\n",`
	`117`	`+ " ],\n",`
	`118`	`+ " \"skills\": [\n",`
	`119`	`+ " {\n",`
	`120`	`+ " \"name\": \"Python\",\n",`
	`121`	`+ " \"level\": \"Advanced\",\n",`
	`122`	`+ " \"certified\": true\n",`
	`123`	`+ " },\n",`
	`124`	`+ " {\n",`
	`125`	`+ " \"name\": \"SQL\",\n",`
	`126`	`+ " \"level\": \"Intermediate\",\n",`
	`127`	`+ " \"certified\": false\n",`
	`128`	`+ " }\n",`
	`129`	`+ " ],\n",`
	`130`	`+ " \"enrollment\": {\n",`
	`131`	`+ " \"course\": \"Data Science\",\n",`
	`132`	`+ " \"batch\": \"Spring 2025\",\n",`
	`133`	`+ " \"grades\": {\n",`
	`134`	`+ " \"assignments\": 92,\n",`
	`135`	`+ " \"quizzes\": 88,\n",`
	`136`	`+ " \"final_exam\": 95\n",`
	`137`	`+ " }\n",`
	`138`	`+ " }\n",`
	`139`	`+ "}\n",`
	`140`	`+ "\n",`
	`141`	+ "```"
	`142`	`+ ]`
	`143`	`+ },`
`24`	`144`	`{`
`25`	`145`	`"cell_type": "markdown",`
`26`	`146`	`"id": "8eee98a1-0ceb-4d94-8ca0-03fdf7e5c670",`
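
Review note on the hunk above: the added markdown cell says flat JSON can be loaded with `pd.DataFrame()` and nested JSON with `pd.json_normalize(data, sep="_")`. A minimal sketch of that contrast follows, using trimmed-down copies of the student records from the cell. The variable names `flat`, `nested`, `df_flat`, and `df_nested` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Trimmed versions of the sample records shown in the markdown cell.
flat = {
    "student_id": 101,
    "name": "Alice Johnson",
    "address_city": "New York",
}
nested = {
    "student_id": 101,
    "name": "Alice Johnson",
    "address": {"street": "123 Elm St", "city": "New York", "country": "USA"},
    "enrollment": {
        "course": "Data Science",
        "grades": {"assignments": 92, "quizzes": 88, "final_exam": 95},
    },
}

# Flat JSON: wrap the single record in a list so it becomes one row.
df_flat = pd.DataFrame([flat])

# Nested JSON: json_normalize turns nested objects into columns and
# joins each key path with the chosen separator.
df_nested = pd.json_normalize(nested, sep="_")

print(df_flat.columns.tolist())
print(df_nested.columns.tolist())
```

With `sep="_"`, the nested `address` object yields columns such as `address_city`, which matches the hand-flattened keys in the flat example.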
`@@ -486,9 +606,16 @@`
`486`	`606`	`"source": [`
`487`	`607`	`"### Normalizing Multi-Level JSON\n",`
`488`	`608`	`"\n",`
`489`		`- "pd.json_normalize(data, record_path, meta_fields) expands nested lists into rows.\n",`
	`609`	`+ "pd.json_normalize(data, record_path, meta) expands nested lists into rows. This is particularly useful when dealing with deeply nested JSON data.\n",`
	`610`	`+ "\n",`
	`611`	`+ "Syntax\n",`
	`612`	+ "```python\n",
	`613`	`+ "pd.json_normalize(data, record_path, meta)\n",`
	`614`	+ "```\n",
`490`	`615`	`"\n",`
`491`		`- "Note: \"record_path\" must be a list or null."`
	`616`	`+ "- \"data\" - The JSON-like object (a dict or a list of dicts).\n",`
	`617`	`+ "- \"record_path\" - A string or list of strings giving the path to the list of records; defaults to None.\n",`
	`618`	`+ "- \"meta\" - A list of keys (or key paths) whose values are repeated on every record row as metadata."`
`492`	`619`	`]`
`493`	`620`	`},`
`494`	`621`	`{`
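
Review note on the hunk above: a short sketch of how `record_path` and `meta` interact, assuming a reduced version of the multi-level student record from the earlier cell. The `record_prefix="skill_"` argument is an addition for this sketch. Without it, the record field `name` would collide with the metadata field `name` and pandas would raise a `ValueError`.

```python
import pandas as pd

# Reduced copy of the multi-level student record (illustrative only).
student = {
    "student_id": 101,
    "name": "Alice Johnson",
    "skills": [
        {"name": "Python", "level": "Advanced", "certified": True},
        {"name": "SQL", "level": "Intermediate", "certified": False},
    ],
    "enrollment": {"course": "Data Science", "batch": "Spring 2025"},
}

df_skills = pd.json_normalize(
    student,
    record_path=["skills"],  # one output row per entry in "skills"
    # Top-level keys plus one nested key path, repeated on each row.
    meta=["student_id", "name", ["enrollment", "course"]],
    record_prefix="skill_",  # avoids the clash between the two "name" keys
)

print(df_skills)
```

Nested metadata paths are joined with the default separator `.`, so the course ends up in a column named `enrollment.course`.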
`@@ -680,7 +807,7 @@`
`680`	`807`	`],`
`681`	`808`	`"source": [`
`682`	`809`	`"# Normalize orders into a separate DataFrame\n",`
`683`		`- "df_orders = pd.json_normalize(data, record_path=\"orders\", meta=[\"id\", \"name\"])\n",`
	`810`	`+ "df_orders = pd.json_normalize(data, record_path=[\"orders\"], meta=[\"id\", \"name\"])\n",`
`684`	`811`	`"\n",`
`685`	`812`	`"df_orders"`
`686`	`813`	`]`
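
Review note on the hunk above: the change from `record_path="orders"` to `record_path=["orders"]` should not alter the result, since pandas accepts either a string or a list of strings here. The sketch below checks this on made-up data. The notebook's actual `data` is not shown in this diff, so the records and the `order_id` and `amount` fields are assumptions.

```python
import pandas as pd

# Hypothetical customers with nested orders; not the notebook's data.
data = [
    {
        "id": 1,
        "name": "Alice",
        "orders": [
            {"order_id": "A1", "amount": 250},
            {"order_id": "A2", "amount": 120},
        ],
    },
    {"id": 2, "name": "Bob", "orders": [{"order_id": "B1", "amount": 75}]},
]

# String form (before the commit) and list form (after the commit).
df_str = pd.json_normalize(data, record_path="orders", meta=["id", "name"])
df_list = pd.json_normalize(data, record_path=["orders"], meta=["id", "name"])

print(df_str.equals(df_list))
```

The list form becomes necessary only when the records sit more than one level deep, where each list element names one step of the path.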

0 commit comments