Commit acd0117

initialize structure and set up extract

1 parent 8bb56d1 commit acd0117

File tree

6 files changed: +179 −0 lines changed


.gitignore

Lines changed: 6 additions & 0 deletions

```
.idea
.venv
__pycache__
logs
airflow*
webserver_config.py
```

dags/contest_ranking_dag.py

Lines changed: 32 additions & 0 deletions

```python
import os
import sys
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Put the repo root on the import path so `operators` resolves (fixes ModuleNotFoundError)
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from operators.contest_ranking_ops import extract_contest_ranking

default_args = {
    "owner": "minhduc29",
    "depends_on_past": False,
    "start_date": datetime(2025, 1, 15)
}

# Initialize DAG
dag = DAG(
    "contest_ranking_pipeline",
    default_args=default_args,
    schedule_interval=None,  # no schedule: runs only when triggered manually
    catchup=False
)

# Extract raw data directly from the API and store it in local/cloud storage
extract = PythonOperator(
    task_id="extract_contest_ranking",
    python_callable=extract_contest_ranking,
    op_args=[4],  # num_pages
    dag=dag
)
```
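The `sys.path.append` line in the DAG file works, but the nested `os.path.dirname` calls are easy to misread. As a sketch only (not part of the commit), the same fix written with `pathlib` reads more directly:

```python
# Equivalent of the DAG file's sys.path fix, written with pathlib: it puts the
# repository root (the parent of dags/) on the import path so that
# `operators.contest_ranking_ops` can be imported when Airflow parses the file.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parent.parent
if str(repo_root) not in sys.path:  # avoid duplicate entries on repeated DAG parses
    sys.path.append(str(repo_root))
```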

operators/contest_ranking_ops.py

Lines changed: 17 additions & 0 deletions

```python
import pandas as pd
import requests

from utils.queries import contest_ranking_query
from utils.constants import URL, OUTPUT_PATH


def extract_contest_ranking(num_pages):
    """Extract raw ranking data across all pages."""
    responses = []
    for i in range(num_pages):
        # Get the ranking nodes for each page (pages are 1-indexed)
        response = requests.post(URL, json=contest_ranking_query(i + 1)).json()["data"]["globalRanking"]["rankingNodes"]
        responses.extend(response)
    file_path = f"{OUTPUT_PATH}/raw/sample_contest_ranking.csv"  # Local file path for sample output data
    pd.DataFrame(responses).to_csv(file_path, index=False)
    return file_path
```
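The operator imports `contest_ranking_query` and `URL` from `utils` modules that this commit also adds but the page does not render. As a rough illustration only, a builder for one page of a paginated `globalRanking` GraphQL query might look like the following; the endpoint fields are assumptions, not the commit's actual `utils/queries.py`:

```python
# Hypothetical sketch of contest_ranking_query: it returns the JSON payload
# for one page of a paginated `globalRanking` GraphQL query. The selected
# fields are illustrative; the real utils/queries.py is not shown in the diff.
def contest_ranking_query(page):
    return {
        "query": """
            query globalRanking($page: Int!) {
                globalRanking(page: $page) {
                    rankingNodes {
                        currentRating
                        currentGlobalRanking
                        user { username }
                    }
                }
            }
        """,
        "variables": {"page": page},
    }

# Mirrors op_args=[4] in the DAG: four 1-indexed pages
payloads = [contest_ranking_query(p + 1) for p in range(4)]
```

Because the extract loop calls `contest_ranking_query(i + 1)`, pages are 1-indexed on the wire.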

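To see how the per-page responses flatten into one CSV-ready table, here is a self-contained rerun of the extraction loop against a stubbed HTTP layer. The fake URL, node fields, and values are made up; only the `data.globalRanking.rankingNodes` response shape is taken from the code above:

```python
# Stubbed version of the extract loop: each "page" returns two ranking nodes,
# and the nodes from all pages are flattened into a single DataFrame.
import pandas as pd

class FakeResponse:
    def __init__(self, page):
        self.page = page

    def json(self):
        # Two nodes per page, with rankings continuing across pages
        return {"data": {"globalRanking": {"rankingNodes": [
            {"ranking": (self.page - 1) * 2 + n + 1,
             "rating": 3000 - 10 * ((self.page - 1) * 2 + n)}
            for n in range(2)
        ]}}}

def fake_post(url, json=None):
    return FakeResponse(json["variables"]["page"])

responses = []
for i in range(2):  # num_pages = 2
    nodes = fake_post("https://example.invalid/graphql",
                      json={"variables": {"page": i + 1}}).json()["data"]["globalRanking"]["rankingNodes"]
    responses.extend(nodes)

df = pd.DataFrame(responses)
print(df["ranking"].tolist())  # [1, 2, 3, 4]
```

The real operator then writes this DataFrame out with `to_csv` instead of printing it.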