|
9 | 9 | "\n",
|
10 | 10 | "### **What's covered in this notebook?**\n",
|
11 | 11 | "\n",
|
12 | | - "1. Converting Flat JSON to DataFrame\n", |
| 12 | + "1. JSON Structures\n", |
| 13 | + " - Flat JSON\n", |
| 14 | + " - Nested JSON (Hierarchical JSON)\n", |
| 15 | + " - Multi-Level JSON (Deeply Nested JSON)\n", |
| 16 | + "2. Converting Flat JSON to DataFrame\n", |
13 | 17 | " - Using pd.DataFrame()\n",
|
14 | 18 | " - Using pd.read_json()\n",
|
15 | | - "2. Handling Deeply Nested JSON Structures\n", |
| 19 | + "3. Handling Deeply Nested JSON Structures\n", |
16 | 20 | " - Normalizing Nested JSON Structures\n",
|
17 | 21 | " - Normalizing Multi-Level JSON\n",
|
18 | | - "3. Few Examples\n", |
| 22 | + "4. Few Examples\n", |
19 | 23 | "\t- Example 1: Parse Students Data to Identify the Top Skill\n",
|
20 | 24 | "\t- Example 2: Parse Customer Transactions from JSON and Generate Insights\n",
|
21 | 25 | "\t- Example 3: Parse a Sample E-Commerce Order Data for Analysis"
|
22 | 26 | ]
|
23 | 27 | },
|
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "id": "12be0c9f-4d62-4a0e-8cc0-af7dc75a18ab", |
| 31 | + "metadata": {}, |
| 32 | + "source": [ |
| 33 | + "## **JSON Structures**\n", |
| 34 | + "\n", |
| 35 | + "JSON structures vary depending on how the data is organized. This section compares flat, nested, and multi-level JSON structures.\n", |
| 36 | + "\n", |
| 37 | + "### **Flat JSON**\n", |
| 38 | + "- Simple structure with no nesting.\n", |
| 39 | + "- Each key directly maps to a value.\n", |
| 40 | + "- Easy to parse and analyze in tabular formats (like DataFrames).\n", |
| 41 | + "- Maps naturally onto tabular storage (SQL tables, CSV files).\n", |
| 42 | + "- **How to Parse?** - We can directly parse it using pd.DataFrame() or pd.read_json() in pandas.\n", |
| 43 | + "```json\n", |
| 44 | + "{\n", |
| 45 | + " \"student_id\": 101,\n", |
| 46 | + " \"name\": \"Alice Johnson\",\n", |
| 47 | + " \"age\": 21,\n", |
| 48 | + " \"is_active\": true,\n", |
| 49 | + " \"address_street\": \"123 Elm St\",\n", |
| 50 | + " \"address_city\": \"New York\",\n", |
| 51 | + " \"address_country\": \"USA\",\n", |
| 52 | + " \"skills\": \"Python, SQL, Machine Learning\"\n", |
| 53 | + "}\n", |
| 54 | + "\n", |
| 55 | + "```\n", |
| 56 | + "\n", |
| 57 | + "### **Nested JSON (Hierarchical JSON)**\n", |
| 58 | + "- Data is stored in hierarchical format.\n", |
| 59 | + "- Values can be objects or arrays instead of primitive types.\n", |
| 60 | + "- More suitable for NoSQL databases (like MongoDB).\n", |
| 61 | + "- **How to Parse?** - We can parse them using pd.json_normalize(data, sep=\"_\").\n", |
| 62 | + "```json\n", |
| 63 | + "{\n", |
| 64 | + " \"student_id\": 101,\n", |
| 65 | + " \"name\": \"Alice Johnson\",\n", |
| 66 | + " \"age\": 21,\n", |
| 67 | + " \"is_active\": true,\n", |
| 68 | + " \"address\": {\n", |
| 69 | + " \"street\": \"123 Elm St\",\n", |
| 70 | + " \"city\": \"New York\",\n", |
| 71 | + " \"country\": \"USA\"\n", |
| 72 | + " },\n", |
| 73 | + " \"skills\": [\"Python\", \"SQL\", \"Machine Learning\"], \n", |
| 74 | + " \"enrollment\": {\n", |
| 75 | + " \"course\": \"Data Science\",\n", |
| 76 | + " \"batch\": \"Spring 2025\",\n", |
| 77 | + " \"grades\": {\n", |
| 78 | + " \"assignments\": 92,\n", |
| 79 | + " \"quizzes\": 88,\n", |
| 80 | + " \"final_exam\": 95\n", |
| 81 | + " }\n", |
| 82 | + " }\n", |
| 83 | + "}\n", |
| 84 | + "\n", |
| 85 | + "```\n", |
| 86 | + "\n", |
| 87 | + "### **Multi-Level JSON (Deeply Nested JSON)**\n", |
| 88 | + "- Extends nested JSON by adding multiple levels of complexity.\n", |
| 89 | + "- Includes deeply nested objects and arrays within arrays.\n", |
| 90 | + "- More complex to parse and query.\n", |
| 91 | + "- **How to Parse?** - We can parse them using pd.json_normalize(data, record_path, meta).\n", |
| 92 | + "\n", |
| 93 | + "\n", |
| 94 | + "```json\n", |
| 95 | + "{\n", |
| 96 | + " \"student_id\": 101,\n", |
| 97 | + " \"name\": \"Alice Johnson\",\n", |
| 98 | + " \"age\": 21,\n", |
| 99 | + " \"is_active\": true,\n", |
| 100 | + " \"contact\": {\n", |
| 101 | + " \"email\": \"alice@example.com\",\n", |
| 102 | + " \"phone\": \"+1-234-567-8901\"\n", |
| 103 | + " },\n", |
| 104 | + " \"addresses\": [\n", |
| 105 | + " {\n", |
| 106 | + " \"type\": \"Home\",\n", |
| 107 | + " \"street\": \"123 Elm St\",\n", |
| 108 | + " \"city\": \"New York\",\n", |
| 109 | + " \"country\": \"USA\"\n", |
| 110 | + " },\n", |
| 111 | + " {\n", |
| 112 | + " \"type\": \"Temporary\",\n", |
| 113 | + " \"street\": \"456 Oak Ave\",\n", |
| 114 | + " \"city\": \"Los Angeles\",\n", |
| 115 | + " \"country\": \"USA\"\n", |
| 116 | + " }\n", |
| 117 | + " ],\n", |
| 118 | + " \"skills\": [\n", |
| 119 | + " {\n", |
| 120 | + " \"name\": \"Python\",\n", |
| 121 | + " \"level\": \"Advanced\",\n", |
| 122 | + " \"certified\": true\n", |
| 123 | + " },\n", |
| 124 | + " {\n", |
| 125 | + " \"name\": \"SQL\",\n", |
| 126 | + " \"level\": \"Intermediate\",\n", |
| 127 | + " \"certified\": false\n", |
| 128 | + " }\n", |
| 129 | + " ],\n", |
| 130 | + " \"enrollment\": {\n", |
| 131 | + " \"course\": \"Data Science\",\n", |
| 132 | + " \"batch\": \"Spring 2025\",\n", |
| 133 | + " \"grades\": {\n", |
| 134 | + " \"assignments\": 92,\n", |
| 135 | + " \"quizzes\": 88,\n", |
| 136 | + " \"final_exam\": 95\n", |
| 137 | + " }\n", |
| 138 | + " }\n", |
| 139 | + "}\n", |
| 140 | + "\n", |
| 141 | + "```" |
| 142 | + ] |
| 143 | + }, |
24 | 144 | {
|
25 | 145 | "cell_type": "markdown",
|
26 | 146 | "id": "8eee98a1-0ceb-4d94-8ca0-03fdf7e5c670",
|
|
486 | 606 | "source": [
|
487 | 607 | "### **Normalizing Multi-Level JSON**\n",
|
488 | 608 | "\n",
|
489 | | - "**pd.json_normalize(data, record_path, meta_fields)** expands nested lists into rows.\n", |
| 609 | + "**pd.json_normalize(data, record_path, meta)** expands nested lists into rows. This is particularly useful when dealing with deeply nested JSON data.\n", |
| 610 | + "\n", |
| 611 | + "**Syntax**\n", |
| 612 | + "```python\n", |
| 613 | + "pd.json_normalize(data, record_path, meta)\n", |
| 614 | + "```\n", |
490 | 615 | "\n",
|
491 | | - "**Note:** \"record_path\" must be a list or null." |
| 616 | + "- \"data\" - The JSON-like object to flatten (a dict or a list of dicts).\n", |
| 617 | + "- \"record_path\" - A string or a list of strings giving the path to the nested list of records; defaults to None.\n", |
| 618 | + "- \"meta\" - A list of keys whose values are repeated on each output row as metadata." |
492 | 619 | ]
|
493 | 620 | },
|
494 | 621 | {
|
|
680 | 807 | ],
|
681 | 808 | "source": [
|
682 | 809 | "# Normalize orders into a separate DataFrame\n",
|
683 | | - "df_orders = pd.json_normalize(data, record_path=\"orders\", meta=[\"id\", \"name\"])\n", |
| 810 | + "df_orders = pd.json_normalize(data, record_path=[\"orders\"], meta=[\"id\", \"name\"])\n", |
684 | 811 | "\n",
|
685 | 812 | "df_orders"
|
686 | 813 | ]
|
|
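The three parsing approaches named in the new "JSON Structures" cell can be sketched side by side. This is a minimal illustration, not the notebook's own code: the sample dicts are trimmed versions of the student records shown in the diff, and the variable names (`flat`, `nested`, `multi`, `df_skills`) and the `record_prefix="skill_"` choice are assumptions made here for the example.

```python
import pandas as pd

# Flat JSON: every key maps directly to a scalar value.
flat = {"student_id": 101, "name": "Alice Johnson", "address_city": "New York"}
df_flat = pd.DataFrame([flat])  # wrap the dict in a list to get a single row

# Nested JSON: values may themselves be objects.
nested = {
    "student_id": 101,
    "name": "Alice Johnson",
    "address": {"city": "New York", "country": "USA"},
    "enrollment": {"course": "Data Science", "grades": {"final_exam": 95}},
}
# sep="_" joins nested keys, e.g. address -> city becomes "address_city".
df_nested = pd.json_normalize(nested, sep="_")

# Multi-level JSON: an array of objects that should become one row each.
multi = {
    "student_id": 101,
    "name": "Alice Johnson",
    "skills": [
        {"name": "Python", "level": "Advanced"},
        {"name": "SQL", "level": "Intermediate"},
    ],
}
# record_path points at the list to expand; meta lists the parent keys to
# repeat on every row. Both the records and the parent have a "name" key,
# so record_prefix (an assumption for this sketch) keeps the columns distinct.
df_skills = pd.json_normalize(
    multi,
    record_path=["skills"],
    meta=["student_id", "name"],
    record_prefix="skill_",
)

# record_path given as a plain string yields the same frame as the list form.
df_skills_str = pd.json_normalize(
    multi,
    record_path="skills",
    meta=["student_id", "name"],
    record_prefix="skill_",
)

print(df_flat)
print(df_nested)
print(df_skills)
```

Without `record_prefix` (or `meta_prefix`), pandas raises a `ValueError` here because the record key `name` collides with the metadata key `name`, which is worth keeping in mind for the multi-level student example.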