
I am building a CI/CD-style tool as a web app. The stack is Django REST Framework on the backend and React (Vite) on the frontend.

The pipeline itself works: stages run their actions, and jobs are created, executed, and deleted correctly.

All of this runs on Google Cloud Run jobs.

For the app to make sense, I need real-time logs from a specific job while it is running. Currently I have a crude temporary implementation: I poll the Google Cloud Logging API with filters to periodically fetch the logs for that job.

This is the mentioned crude implementation:

import time

from google.cloud import logging_v2

from apps.docker_app.utils.google_credentials import get_google_credentials


def _stream_job_logs(logging_client, log_filter):
    seen_entries = set()
    while True:
        entries = logging_client.list_entries(filter_=log_filter)
        for entry in entries:
            if entry.insert_id not in seen_entries:
                seen_entries.add(entry.insert_id)
                yield entry
        time.sleep(1)


class GoogleLogsClient:
    def __init__(self):
        storage_credentials = get_google_credentials()
        self.client = logging_v2.Client(credentials=storage_credentials)

    def stream_job_logs(self, log_filter):
        for entry in _stream_job_logs(
            logging_client=self.client, log_filter=log_filter
        ):
            if "All stages completed successfully" in entry.payload:
                break
            timestamp = entry.timestamp.isoformat()
            print("* {}: {}".format(timestamp, entry.payload))

While this does work, I am limited to 60 read requests per minute against the Google Cloud Logging API, and there is presumably an overall quota as well.

As you can already probably assume, this will never work on longer running jobs, let alone be scalable for multiple users running multiple pipelines (jobs).

Is there a better way to implement this using some other GCP service or a way to circumvent the rate limit in an efficient and smart way?

Thank you everyone for any help on this at all -- this is kind of a blocker right now for the rest of the app.

asked Feb 23, 2025 at 19:04
  • Live tail of logs is available via the client libraries, but Python does not seem to be supported yet. cloud.google.com/logging/docs/view/… Commented Feb 23, 2025 at 19:54
  • I have seen that; however, live tail is limited to 10 concurrent sessions, which also doesn't work in my case, since it would mean only 10 pipelines could be run at a time -- another scalability issue. Commented Feb 23, 2025 at 20:29

1 Answer


Try setting up a Log Sink to route logs to Pub/Sub. This will forward relevant logs (filtered by job name) to a Pub/Sub topic, allowing your Django backend to subscribe and process logs in real-time.

Alternatively, you can use a Log Sink to route logs to BigQuery for structured storage and analysis. Because a sink pushes log entries to you instead of you polling the API, both approaches avoid the Logging read-request rate limit.
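A minimal sketch of the Pub/Sub route, assuming a sink already forwards the Cloud Run job logs to a topic. The project, subscription, and helper names here are my own inventions; a log sink delivers each LogEntry as JSON in the Pub/Sub message body, which is what the parsing helper relies on:

```python
import json


def extract_job_log(message_data, job_name):
    """Parse a LogEntry exported by a log sink (JSON in the Pub/Sub
    message body) and return (timestamp, text) if the entry belongs
    to the given Cloud Run job, else None."""
    entry = json.loads(message_data.decode("utf-8"))
    labels = entry.get("resource", {}).get("labels", {})
    if labels.get("job_name") != job_name:
        return None
    return entry.get("timestamp"), entry.get("textPayload", "")


def stream_job_logs(project_id, subscription_id, job_name):
    # Lazy import so the pure parsing helper above has no GCP dependency.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message):
        parsed = extract_job_log(message.data, job_name)
        if parsed is not None:
            timestamp, text = parsed
            print(f"* {timestamp}: {text}")  # or push to a WebSocket/SSE channel
        message.ack()  # ack everything so other jobs' entries don't redeliver

    future = subscriber.subscribe(subscription_path, callback=callback)
    try:
        future.result()  # blocks until cancelled or the stream fails
    except KeyboardInterrupt:
        future.cancel()
```

Note that if several backend processes share one subscription, each message is delivered to only one of them, so in practice you would either run one subscription per consumer or fan entries out inside the backend after receiving them.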

answered Feb 23, 2025 at 19:55

6 Comments

I am trying to set up the Log Sink to route logs to Pub/Sub and then subscribe and read them. I have it routing all logs for Cloud Run jobs named pipeline-job-* to a topic, and I can subscribe and read them -- that all works. My question is: how should I go about streaming only the logs for a certain pipeline to my Django backend? The sink's filter rules are set up once, and currently they route everything.
Or would I have to dynamically create a Log Sink, topic and subscription on each pipeline run to only get those logs? If so, would I hit any noticeable rate limits in that case?
Nevermind, I can just filter by the job_name that I get from the message received from the topic I subscribed to. Increasing the timeout a bit so I receive all the messages correctly. This all seems to work now and answered my question. Thank you!
Good idea and good answer. Keep in mind that Pub/Sub does not guarantee message order.
Is there a way to guarantee message order without doing something like manually ordering by the timestamps? I gotta comprehensively read the docs on this after my PoC implementation it seems.
