Commit acd0117

initialize structure and set up extract

1 parent 8bb56d1 commit acd0117

File tree

6 files changed: +179 −0 lines changed


.gitignore

Lines changed: 6 additions & 0 deletions

```
.idea
.venv
__pycache__
logs
airflow*
webserver_config.py
```

dags/contest_ranking_dag.py

Lines changed: 32 additions & 0 deletions

```python
import os
import sys
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Put the repo root on the import path so `operators` resolves (fixes ModuleNotFoundError)
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from operators.contest_ranking_ops import extract_contest_ranking

default_args = {
    "owner": "minhduc29",
    "depends_on_past": False,
    "start_date": datetime(2025, 1, 15)
}

# Initialize DAG
dag = DAG(
    "contest_ranking_pipeline",
    default_args=default_args,
    schedule_interval=None,  # no schedule: runs only when triggered manually
    catchup=False
)

# Extract raw data directly from the API and store it in local/cloud storage
extract = PythonOperator(
    task_id="extract_contest_ranking",
    python_callable=extract_contest_ranking,
    op_args=[4],  # num_pages
    dag=dag
)
```
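The `sys.path.append` line in the DAG file works, but the nested `os.path.dirname` calls are easy to misread. As a sketch only (not part of the commit), the same fix written with `pathlib` reads more directly:

```python
# Equivalent of the DAG file's sys.path fix, written with pathlib: it puts the
# repository root (the parent of dags/) on the import path so that
# `operators.contest_ranking_ops` can be imported when Airflow parses the file.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parent.parent
if str(repo_root) not in sys.path:  # avoid duplicate entries on repeated DAG parses
    sys.path.append(str(repo_root))
```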

operators/contest_ranking_ops.py

Lines changed: 17 additions & 0 deletions

```python
import pandas as pd
import requests

from utils.queries import contest_ranking_query
from utils.constants import URL, OUTPUT_PATH


def extract_contest_ranking(num_pages):
    """Extract raw ranking data across all pages."""
    responses = []
    for i in range(num_pages):
        # Get the ranking nodes for each page (pages are 1-indexed)
        response = requests.post(URL, json=contest_ranking_query(i + 1)).json()["data"]["globalRanking"]["rankingNodes"]
        responses.extend(response)
    file_path = f"{OUTPUT_PATH}/raw/sample_contest_ranking.csv"  # Local file path for sample output data
    pd.DataFrame(responses).to_csv(file_path, index=False)
    return file_path
```
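The operator imports `contest_ranking_query` and `URL` from `utils` modules that this commit also adds but the page does not render. As a rough illustration only, a builder for one page of a paginated `globalRanking` GraphQL query might look like the following; the endpoint fields are assumptions, not the commit's actual `utils/queries.py`:

```python
# Hypothetical sketch of contest_ranking_query: it returns the JSON payload
# for one page of a paginated `globalRanking` GraphQL query. The selected
# fields are illustrative; the real utils/queries.py is not shown in the diff.
def contest_ranking_query(page):
    return {
        "query": """
            query globalRanking($page: Int!) {
                globalRanking(page: $page) {
                    rankingNodes {
                        currentRating
                        currentGlobalRanking
                        user { username }
                    }
                }
            }
        """,
        "variables": {"page": page},
    }

# Mirrors op_args=[4] in the DAG: four 1-indexed pages
payloads = [contest_ranking_query(p + 1) for p in range(4)]
```

Because the extract loop calls `contest_ranking_query(i + 1)`, pages are 1-indexed on the wire.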

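To see how the per-page responses flatten into one CSV-ready table, here is a self-contained rerun of the extraction loop against a stubbed HTTP layer. The fake URL, node fields, and values are made up; only the `data.globalRanking.rankingNodes` response shape is taken from the code above:

```python
# Stubbed version of the extract loop: each "page" returns two ranking nodes,
# and the nodes from all pages are flattened into a single DataFrame.
import pandas as pd

class FakeResponse:
    def __init__(self, page):
        self.page = page

    def json(self):
        # Two nodes per page, with rankings continuing across pages
        return {"data": {"globalRanking": {"rankingNodes": [
            {"ranking": (self.page - 1) * 2 + n + 1,
             "rating": 3000 - 10 * ((self.page - 1) * 2 + n)}
            for n in range(2)
        ]}}}

def fake_post(url, json=None):
    return FakeResponse(json["variables"]["page"])

responses = []
for i in range(2):  # num_pages = 2
    nodes = fake_post("https://example.invalid/graphql",
                      json={"variables": {"page": i + 1}}).json()["data"]["globalRanking"]["rankingNodes"]
    responses.extend(nodes)

df = pd.DataFrame(responses)
print(df["ranking"].tolist())  # [1, 2, 3, 4]
```

The real operator then writes this DataFrame out with `to_csv` instead of printing it.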