
I'm not sure if this is the right forum for this--if not, please point me in the right direction. This will be a little long-winded due to the specific nature of my question, so I apologize in advance.

The system I work on includes both code and a large number of binary files (e.g.: .exe, .whl, .iso, etc.). We have a CI/CD release pipeline that packages the code, downloads the binaries from S3 into specific folders, zips everything together, and uploads it as one large package. The customer then downloads this package for their use. My team opted to store the binaries externally in S3 as opposed to in the repo with Git LFS.

We have a single file in the root of the repo (I'll call it "binary tracker") with a list of all necessary binaries & the local folder/destination where those files should be stored in the final package. Our CI/CD pipeline uses this file to download the appropriate binaries and insert where needed.
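For illustration, here's a simplified, hypothetical sketch of what an entry in a tracker like ours might look like (the element and attribute names here are invented, not our actual schema):

```xml
<!-- Hypothetical example only: element/attribute names are invented
     to illustrate mapping S3 objects to destination folders. -->
<binaries>
  <binary source="s3://release-bucket/installers/app/setup.exe"
          destination="package/tools/app/" />
  <binary source="s3://release-bucket/media/network.iso"
          destination="package/media/" />
</binaries>
```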

Since there are so many binaries to track (hundreds), certain collections of folders/binaries have been grouped into individual ISO files. That way a single ISO can be added to the binary tracker instead of the dozens or hundreds of files inside it.

Throughout development, engineers may need to update, replace, or modify some of the binaries. For example, an engineer developing an automated install of some software needs to add its setup.exe to the media (this is a disconnected environment, so we can't download at the customer site). They make their code changes, ensure the setup.exe is uploaded to S3, and include in their PR an update to the binary tracker referencing their setup.exe.

A problem this has created is that when an engineer needs to update a single setup.exe, they have to download the whole ISO containing dozens of files, add their setup.exe, rebuild the ISO, and upload it back to S3.

Another problem we run into is that multiple engineers, working on features simultaneously, need to update the same ISO file.

To solve this, my team has decided to decompose those ISOs into individual folders, zip each folder, add a datetime to each filename (for versioning), and upload all the zips to S3 instead of one larger ISO file. The idea is that instead of downloading and repackaging the whole ISO, an engineer can download just the specific zip they need and update that.

I asked why we can't just keep all the files directly in S3 instead of packaging them into zips, and the reason given was that, with hundreds of files, it would make our binary tracker too complicated to update and track.

I think this is a terrible idea. For one, S3 already has object versioning, so putting the date in the filename is a poor way to track versions. Additionally, managing all these zips doesn't really fix the issue, just alleviates it a bit. This feels like an unnecessary change that won't fix an already broken system.
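If we do keep per-folder zips in S3, the tracker could pin an exact S3 object version instead of encoding a datetime in the filename. A hypothetical sketch (element/attribute names and the versionId value are made up for illustration):

```xml
<!-- Hypothetical: rely on S3 bucket versioning and pin a specific
     versionId rather than datetime-stamped filenames. -->
<binary source="s3://release-bucket/media/drivers.zip"
        versionId="example-object-version-id"
        destination="package/media/drivers/" />
```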

Has anyone worked with a system like this? How do you track binaries that need to be delivered with the product? Does anyone have recommendations I can provide to my team as to how we can improve this system?

Edit to answer some of the questions:

  • Why can’t maven handle this?: We've never looked into Maven--can Maven support downloading from S3, REST APIs, running custom scripts (PowerShell, etc.)?
  • why you have all these versioned binaries that arent just compiled code / Are these binaries generated by your code, or externally?: The end product is a single large ZIP package which is then used to build out an IT datacenter. So the binaries include things like VMware VCSA ISO, Windows, RHEL, etc. OS ISOs, Cisco IOS binaries, and various other software that's used within this datacenter. These binaries are not code we developed & compiled, it's software that we're installing at the end location as part of a datacenter deployment. The piece we develop is various scripts & workflows to automate the deployment & configuration of all this software throughout the datacenter.
  • How do you handle conflicts between engineers updating the same binaries?: It's a problem; we run into conflicts all the time. Usually we just have the developers coordinate with each other to make sure all of their different files are included in the same ISO.
  • Does it have to be XML? Why not something human-friendly like JSON or YAML?: We are currently using XML for the binary tracker. I would much rather use JSON/YAML, but we've been using the same system for a while, and so many people on the team already know it that changes like that are challenging. That said, it's one change I'll be proposing since we're already changing the way things work.
asked Jan 2 at 19:27
  • Is there any convention for where the files are uploaded in S3 or the CI pipelines that produce them that you could use to automatically generate a list of artifacts? Commented Jan 2 at 19:53
  • No not really. I proposed having the S3 bucket structured exactly like the destination folder structure and use something like aws s3 sync. But the problem with that is we have multiple variants of the system which may reference the same S3 object but in a different destination location. Commented Jan 2 at 20:08
  • "I have asked why can't we just keep all the files directly in S3 instead of packaging into zips and the reason given was that since there are hundreds, it would make our binary tracker too complicated to update/track." - I think you have already asked the right question and should ask it again: what - precisely - is so complicated about it? And when you get a serious explanation, find a solution to work around those complications. Maybe your binary tracker file needs to be changed to have a hierarchical structure inside (some XML, for example). ... Commented Jan 3 at 13:50
  • ... or maybe using a single file is the problem - you could use individual files or scripts, each one describing the assembly of a group of files or a whole folder. But I think it's quite hard to answer without seeing the real system. Commented Jan 3 at 13:52
  • @candied_orange: I wrote "XML, for example", didn't you notice? JSON or XML is a matter of taste (and by the way, I would prefer a format which allows comments inside; JSON does not), but the complexity problem is one which neither format can solve better than the other - so let's focus on the question, not open a totally irrelevant discussion. Commented Jan 3 at 22:49

1 Answer


We've never looked into Maven--can Maven support downloading from S3,

Yes.

The CloudStorage Maven plugin helps you with using various cloud buckets as a private Maven repository. Recently, CloudStorageMaven for S3 got a huge upgrade, and you can use it to download or upload files from S3 by using it as a plugin.

dzone.com - upload and download files to s3 using maven
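As a rough sketch, wiring that S3 wagon into a pom.xml might look something like the following. The coordinates and version here are from memory of the plugin's documentation, and the bucket URL is invented - verify both before use:

```xml
<!-- Sketch only: verify the extension coordinates and latest version
     against the s3-storage-wagon documentation before relying on this. -->
<project>
  <build>
    <extensions>
      <extension>
        <groupId>com.gkatzioura.maven.cloud</groupId>
        <artifactId>s3-storage-wagon</artifactId>
        <version>2.3</version>
      </extension>
    </extensions>
  </build>
  <repositories>
    <repository>
      <id>s3-release-repo</id>
      <url>s3://my-release-bucket/release</url>
    </repository>
  </repositories>
</project>
```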


REST APIs,

It seems so.

stackoverflow.com - maven plugin to call or invoke a rest web service
stackoverflow.com - maven dependencies for rest api jersey glassfish or not


running custom scripts (PowerShell,

Affirmative.

stackoverflow.com - how to run powershell script in maven


etc.)?

Well I assume yes, but don't do that unless it's here.

c2.com - EtcLanguage


The end product is a single large ZIP

Maven can handle creating zips as well.

stackoverflow.com - create a zip archive with maven
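For example, the standard maven-assembly-plugin can produce a zip during the package phase. A minimal sketch (the descriptor path and plugin version are illustrative):

```xml
<!-- Sketch: maven-assembly-plugin bound to the package phase, building
     a zip from a custom assembly descriptor. Version is illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.6.0</version>
  <configuration>
    <descriptors>
      <descriptor>src/assembly/release-zip.xml</descriptor>
    </descriptors>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The referenced descriptor would declare `<format>zip</format>` and list the files and folders to include.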


We've made a big fuss about Maven, but there are other build tools as well. What you really need is someone who is skilled in DevOps. That whole field exists to solve exactly these kinds of problems. They are skilled at more than the needed tools: they know how to work with the people who are doing this stuff manually and create processes that will automate the bulk of this work.

answered Jan 8 at 1:10
