I am building a CI/CD-style tool, which is actually a web app. The stack is django-rest-framework on the backend and React (Vite) on the frontend.
Currently the pipeline runs and the actions from its stages all work: the jobs are created, run, and deleted without issues.
This all happens through Google Cloud Run jobs.
For the app to make sense, I need to be able to get real-time logs from that specific job while it is running. Currently I have a crude temporary implementation where I am using the Google Cloud Logging API with filters to periodically get the logs for that job.
This is the crude implementation mentioned above:
import time

from google.cloud import logging_v2

from apps.docker_app.utils.google_credentials import get_google_credentials


def _stream_job_logs(logging_client, log_filter):
    seen_entries = set()

    while True:
        # Poll the Logging API and deduplicate entries across polls by insert_id.
        entries = logging_client.list_entries(filter_=log_filter)
        for entry in entries:
            if entry.insert_id not in seen_entries:
                seen_entries.add(entry.insert_id)
                yield entry
        time.sleep(1)


class GoogleLogsClient:
    def __init__(self):
        storage_credentials = get_google_credentials()
        self.client = logging_v2.Client(credentials=storage_credentials)

    def stream_job_logs(self, log_filter):
        for entry in _stream_job_logs(
            logging_client=self.client, log_filter=log_filter
        ):
            # Stop streaming once the job reports completion.
            if "All stages completed successfully" in entry.payload:
                break

            timestamp = entry.timestamp.isoformat()
            print("* {}: {}".format(timestamp, entry.payload))
While this does work, I am limited to 60 read requests per minute against the Google Cloud Logging API, and there is probably a general log quota as well.
As you can probably already guess, this will never work for longer-running jobs, let alone scale to multiple users running multiple pipelines (jobs) at once.
Is there a better way to implement this using some other GCP service, or a way to circumvent the rate limit in an efficient and smart way?
Thank you everyone for any help on this at all -- this is kind of a blocker right now for the rest of the app.
- You have live tail log with client libraries, but Python seems not yet available. cloud.google.com/logging/docs/view/… – guillaume blaquiere, Feb 23, 2025 at 19:54
- I have seen that; however, the limitation there is 10 live tail sessions, which also doesn't work in my case, since it would mean only 10 pipelines could be run at a time -- another scalability issue. – 4bs3nt, Feb 23, 2025 at 20:29
1 Answer
Try setting up a Log Sink to route logs to Pub/Sub. The sink forwards the relevant logs (filtered by job name) to a Pub/Sub topic, and your Django backend can subscribe to that topic and process entries in real time as they are pushed, with no polling.
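A minimal sketch of that flow, assuming a project my-project, a topic job-logs, a subscription job-logs-sub, and a job my-job (all names here are illustrative, not from your setup). The sink is created once with the google-cloud-logging client; the backend then consumes entries with a google-cloud-pubsub streaming pull:

import json

from google.cloud import logging_v2, pubsub_v1

# One-time setup: route this job's Cloud Run logs to a Pub/Sub topic.
logging_client = logging_v2.Client()
sink = logging_client.sink(
    "job-logs-sink",
    filter_='resource.type="cloud_run_job" AND resource.labels.job_name="my-job"',
    destination="pubsub.googleapis.com/projects/my-project/topics/job-logs",
)
if not sink.exists():
    # The sink gets a writer identity; grant it the Pub/Sub Publisher
    # role on the topic so it can actually deliver entries.
    sink.create()

# Streaming pull: entries arrive as they are written, so there is no
# polling loop and no Logging read quota involved.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "job-logs-sub")

def callback(message):
    entry = json.loads(message.data.decode("utf-8"))  # LogEntry as JSON
    print(entry.get("timestamp"), entry.get("textPayload"))
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
future.result()  # blocks; in Django you would run this in a worker process

From there you could fan the entries out to the browser, for example over WebSockets with Django Channels, keyed by the job's execution name.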
Alternatively, you can use a Log Sink to route logs to BigQuery for structured storage and analysis. Both approaches avoid the Logging API read quota, because entries are pushed to the sink's destination instead of being polled.
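For the BigQuery route, a sink pointed at a dataset writes each exported log name into its own table. A sketch of reading one job's lines back, assuming a dataset job_logs and a table run_googleapis_com_stdout (the actual table and column names depend on the log entries your sink exports, so treat these as placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Table name follows the exported log name (here run.googleapis.com/stdout).
query = """
    SELECT timestamp, textPayload
    FROM `my-project.job_logs.run_googleapis_com_stdout`
    WHERE resource.labels.job_name = @job_name
    ORDER BY timestamp
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("job_name", "STRING", "my-job"),
    ]
)
for row in client.query(query, job_config=job_config):
    print(row["timestamp"], row["textPayload"])

BigQuery is better suited to historical search and analysis than to live tailing, so for the real-time view Pub/Sub is the more natural fit of the two.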