Codeberg-CI/feedback

Add limits to builds #22

Open
opened 2021-10-25 15:45:17 +02:00 by davidak · 9 comments
Member

The builds on the community CI should have some limits, so the resources are distributed in a fair way.

There could be a queue system: when there is only one job, it could have all resources, but when there are more, each would get only a fraction of the total resources.

Such a system should also work when we add more build machines, including ones of different types (e.g. ARM).

Not sure if upstream would implement it or if we need to build isolation around it ourselves. I think it would be good if the CI coordinated this in a centralized way across all build agents that may exist.
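
(Purely to illustrate the idea, not an existing feature of Woodpecker or the Codeberg CI: a minimal Python sketch of such an even split among running jobs.)

```python
# Illustrative sketch only: evenly split a machine's resources among running jobs.
# Nothing here exists in Woodpecker; it just shows the fair-share idea.

def fair_share(total_cpus: int, total_mem_gb: int, running_jobs: int) -> tuple[float, float]:
    """Return the (cpu, memory) share each running job would get."""
    jobs = max(running_jobs, 1)          # a single job gets the whole machine
    return total_cpus / jobs, total_mem_gb / jobs

print(fair_share(16, 64, 1))   # (16.0, 64.0) -> one job may use everything
print(fair_share(16, 64, 4))   # (4.0, 16.0)  -> four jobs each get a quarter
```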

Owner

> There could be a queue system: when there is only one job, it could have all resources, but when there are more, each would get only a fraction of the total resources.

This won't really work, as you don't want one build to reserve "all" resources when, just seconds later, another build could come in needing XY resources.

Ideally, each pipeline defines its requests (and limits) within the step definition. This is by now possible in the k8s backend, but CB is using the docker backend. Again, ideally, one could hard-code the limits part on the admin side and force users to define "requests". But even if that were possible, handling this dynamically for different namespaces (e.g. specific orgs) is so hard/complex that I don't think we'll ever get there.

But again, this first needs upstream work to allow setting resources in the first place for the docker backend. I'd vote to close here and first focus on the requests part for the backend before looking further.
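
(For reference, a sketch of what "requests" and "limits" boil down to on the Kubernetes side, using the official Python client. The values are hypothetical; presumably the k8s backend sets something equivalent on the pod it creates for a pipeline step.)

```python
# Sketch of the Kubernetes resource model that "requests" and "limits" refer to.
# All values are hypothetical example numbers.
from kubernetes import client

resources = client.V1ResourceRequirements(
    requests={"cpu": "500m", "memory": "512Mi"},  # what the scheduler reserves
    limits={"cpu": "2", "memory": "2Gi"},         # hard cap enforced by the kernel
)

step_container = client.V1Container(
    name="build-step",
    image="alpine:3",
    command=["sh", "-c", "make -j2"],
    resources=resources,
)
```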


Docker tends to allow a container to grab all CPUs, so if a build is parallel (and tries to grab all CPUs), it is up to the kernel to determine what to do.

A single container would get all CPUs. Two containers trying to grab all CPUs would get them, but now compete with each other per CPU.

You can limit individual containers: https://docs.docker.com/config/containers/resource_constraints/ - but it's not clear that this would actually be fairer than what the kernel does.
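
(For the record, those per-container constraints can also be applied programmatically; a sketch using the Docker SDK for Python, with placeholder image, command, and limits.)

```python
# Sketch: start a build container with explicit CPU and memory caps via the
# Docker SDK for Python. Image, command and the chosen limits are placeholders.
import docker

client = docker.from_env()
container = client.containers.run(
    "alpine:3",
    "sh -c 'make -j4'",
    mem_limit="2g",              # corresponds to `docker run --memory 2g`
    nano_cpus=2_000_000_000,     # 2 CPUs, corresponds to `docker run --cpus 2`
    detach=True,
)
print(container.id)
```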

What is unfair (of sorts) is when, of two builds, one is parallel and grabs all CPUs while the other is a sequential build. Here, the parallel build wins. But you can't force the sequential build to use more CPUs either, so this really isn't something that can be fixed.

The TL;DR is likely to keep letting the OS sort it out.

Member

> What is unfair (of sorts) is when, of two builds, one is parallel and grabs all CPUs while the other is a sequential build. Here, the parallel build wins. But you can't force the sequential build to use more CPUs either, so this really isn't something that can be fixed.

What do you mean by the parallel build winning? Will it likely get more CPU time overall? Yes. But due to fair scheduling in the Linux kernel, the process group that only has a single process should usually get about a full core (essentially as much as it can use), compared to the parallel build, which will essentially get one core plus the leftovers.

The gist is that both process groups essentially have a right to the same amount of CPU time, and the process that has used less of its time will be preferred during scheduling. Since a single process can usually utilise less CPU time than multiple processes (unless they do a lot of blocking I/O), the kernel will always prefer it.

That's also why you can do something like a Borg backup even on a relatively weak machine without desktop use of your system becoming laggy.

I fully concur with your TL;DR though. I was just confused by your conclusion on the winning process.


I was just trying to address the OP's concern with this phrasing of "winning". 🤷

Owner

Process scheduling is its own huge topic, in the end. Tons of approaches/schedulers exist aiming "to make it better".

Using "raw docker" for such distributed applications is certainly an issue at some point. Less for CPU, more for memory. Part of the motivation to use k8s in the end for larger environments, which is designed to provide a solution against resource overflow for isolated containers.

Telling users to "not use a lot of resources" and to "hope for the best" won't get the job done forever.
The idea of using a 'decently' sized VM for (parallel) jobs, which anyone can trigger no matter their size, is not something that will work forever without issues.
Being able to set limits on the admin side would certainly help any VM and its scheduler. E.g. right now, 6 builds are allowed to run in parallel (no matter their resource needs). At the very least, admin-side limits should ensure that TOTAL_MEMORY/6 is the maximum memory that can possibly be requested; otherwise the machine will go down in the worst case.

(I am not necessarily talking about defining requests, as that concept is primarily targeted at k8s and its scheduling.)
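
(Purely as a back-of-the-envelope illustration of the TOTAL_MEMORY/6 argument above; the machine size is a made-up example number.)

```python
# Back-of-the-envelope check of the TOTAL_MEMORY / parallel-builds argument.
# The machine size below is a made-up example; the parallelism matches the
# "6 builds in parallel" mentioned above.
total_memory_gb = 48          # hypothetical size of the build machine
max_parallel_builds = 6       # builds allowed to run at the same time

per_build_cap_gb = total_memory_gb / max_parallel_builds
print(f"max safe per-build memory limit: {per_build_cap_gb:.1f} GiB")
# -> 8.0 GiB; anything above that risks overcommitting the host when all
#    six slots are busy.
```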


Also with docker, you can set memory limits. 🤷 The only thing k8s adds over docker is scaling across many machines. As long as your set of machines is limited, docker (+ swarm + compose) will work just fine. Mostly I mention this because k8s can quickly become a time sink.

All I want to stress is that the best resource scheduler is in the kernel.

This is also the case for memory. Containers don't allocate memory; the processes within them do. That means if they can't allocate anything, they'll OOM.

Sure, it can be that a process in container A exits because container B hogs so much memory and just happened to ask for it a bit sooner. But the flip side is that with a TOTAL_MEMORY/&lt;parallel builds&gt; setting, there is no chance for a single build to use more, even if it's running entirely alone. It's more predictable, but also more limiting.

That said, I don't really care what people choose here, since I'm not running the infrastructure. I'll unfollow this issue, you'll figure out what you want to do without me.

Owner

@jfinkhaeuser Your input is appreciated, I hope I/we didn't scare you away. Such discussions are important (and sorry for the ping again). In the end, everyone here just tries to make things better in their free time. It's not about who is right or wrong.

Yes, you can set limits for containers, which is what I also suggested earlier. Yet one cannot expect all users to set them themselves unless we require the setting for builds to start in the first place. And even if you set limits, you still have no guarantee of not exceeding the VM's resources when multiple builds are running. That is what k8s accounts for with `requests` but the kernel cannot, simply because it doesn't look out for a potential overflow of resources in the `limit` section (which is also partly what you said). And this is where kernel-based scheduling is limited.

Everything we discuss here has always been an issue for SaaS services providing free resources. And when things get bigger, you cannot rely on kernel scheduling alone anymore to prevent outages. That's all I am saying. CI works as is if everyone looks out a bit for their resource usage. But just imagine what some people are doing on GitHub Actions - if you were the backend engineer having to ensure that everything "stays alive", kernel scheduling is not an option anymore.


@pat-s You didn't quite scare me away, no. I just don't see much value in continuing an argument I have little stake in.

Consider that I've been running SaaS since before it was called that, in a devops kind of role before it was called devops. So my perspective is coloured by how all of those things you now do with k8s were done before (and they all were). In fact, k8s only sets kernel limits; they're all "kernel scheduling", and have been for decades (aside: cgroups are a new-ish invention if you ignore what Solaris was doing, and they help manage these things in groups rather than for individual processes - that's a great win for manageability, but not a win in the functionality of limiting processes). The conclusion that follows is that any tooling should be able to provide much the same functionality (and in fact, docker does, or near enough). (There are also differences between what docker and k8s offer, but they mostly relate to having to manage many machines, not so much many containers, where "many" is enough that you don't give them names any longer.)
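
(To make the "k8s only sets kernel limits" point concrete: both Docker and Kubernetes ultimately write cgroup settings like the ones below. A rough sketch only, assuming a cgroup v2 host and root privileges; the group name is made up.)

```python
# Rough sketch: the cgroup v2 knobs that Docker/k8s limits ultimately map to.
# Assumes a cgroup v2 unified hierarchy, root privileges, and that the cpu and
# memory controllers are enabled in the parent's cgroup.subtree_control.
# The group name "ci-build-demo" is made up for illustration.
import os
from pathlib import Path

cg = Path("/sys/fs/cgroup/ci-build-demo")
cg.mkdir(exist_ok=True)

(cg / "memory.max").write_text("2147483648\n")   # 2 GiB hard memory limit
(cg / "cpu.max").write_text("200000 100000\n")   # 200ms quota per 100ms period = 2 CPUs

# Move the current process (and its future children) into the group.
(cg / "cgroup.procs").write_text(str(os.getpid()) + "\n")
```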

But there's not much point in pushing in this direction if I'm not supposed to maintain the infrastructure. The main consideration there is what the people who manage it can do with the least amount of effort. It may be that k8s is the best choice. For me, it wouldn't be... I'm starting to tell customers that they should explore lighter choices, or choices that fit their use case better.

So, yeah, it makes more sense for me to leave this discussion to you guys.

Owner

I thought about this again.

While you can set limits on both k8s and docker (just to pick this thought up again), the main point of setting limits on k8s is to make life easier for the scheduler in terms of placing the job.
This is especially important when running multiple (small) machines, as otherwise the tasks run out of memory and also possibly affect other pods running on the same instance.

Now for Codeberg the situation is a bit different: there's only one (big) machine processing the builds. Hence, the job distribution part is not important here (right now).
The helpful part of setting resources would apply when so many jobs get started at the same time that they demand more resources than are available (especially in terms of memory); the others would then wait in the queue.
Right now, all of them are started and compete against each other, regardless of whether CB can actually take the resource load.

Additionally, enabling this would require all users to set resources. Yet, it is not possible to enforce this right now in Woodpecker.

Resource limiting is partially addressed by the limit on jobs the agent is allowed to take in parallel (currently 9). This assumes that 9 parallel jobs will not exceed the memory limit, which holds true most of the time, but only if most of these jobs are "low resource" jobs.

I think for now (and the foreseeable short- to mid-term future) we're good here, and I wouldn't invest too much time (yet) into this, also because Woodpecker itself only supports setting resources for the k8s backend, not for the docker backend right now.
But once the load becomes larger and CB eventually needs to add additional agents, we will need to think about this again.
