I have a root folder Products with a bunch of sub-folders inside it, and each of those sub-folders currently contains a bunch of files. Just for simplicity I named the sub-folders folder{number} and the files files{number}.json, but in general they have different names.
In general I have 20 different sub-folders inside the root folder and each sub-folder has around 30 files at most.
(figure 1)
Products
├── folder1
│   ├── files1.json
│   ├── files2.json
│   └── files3.json
├── folder2
│   ├── files4.json
│   ├── files5.json
│   └── files6.json
└── folder3
    ├── files10.json
    ├── files7.json
    ├── files8.json
    └── files9.json
Now I am compressing all of this into a tar.gz file by running the command below:
tar cvzf ./products.tgz Products
Question:
I got a new design, shown below, where each sub-folder inside the Products root folder has three environment folders in it: dev, stage and prod.
(figure 2)
Products
├── folder1
│   ├── dev
│   │   └── files1.json
│   ├── files1.json
│   ├── files2.json
│   ├── files3.json
│   ├── prod
│   │   └── files1.json
│   └── stage
│       └── files1.json
├── folder2
│   ├── dev
│   │   └── files5.json
│   ├── files4.json
│   ├── files5.json
│   ├── files6.json
│   ├── prod
│   │   └── files5.json
│   └── stage
│       └── files5.json
└── folder3
    ├── files10.json
    ├── files7.json
    ├── files8.json
    └── files9.json
For example, inside the folder1 sub-folder there are three more sub-folders, dev, stage and prod, and exactly the same goes for the other sub-folders folder2 and folder3. Each dev, stage and prod sub-folder inside a folder{number} sub-folder holds the files that are overridden for that environment.
I now need to generate three different tar.gz files from the above structure: one each for dev, stage and prod.
- Whatever files I have inside dev, stage and prod override the files of the same name in their parent sub-folder (folder1, folder2 or folder3), if present there too.
- So if files1.json is present in the folder1 sub-folder and the same file is also present inside any of dev, stage or prod, then while packaging I need to use whatever is in that environment folder, overriding the sub-folder's file; otherwise I just use whatever is present in the sub-folder(s).
At the end I will have three different structures like this, one for dev, one for stage and one for prod, where folder1 (or 2 and 3) contains the files from its environment folder as first preference, since those are overridden, plus the other files which are not overridden.
(figure 3)
Products
├── folder1
│   ├── files1.json
│   ├── files2.json
│   └── files3.json
├── folder2
│   ├── files4.json
│   ├── files5.json
│   └── files6.json
└── folder3
    ├── files10.json
    ├── files7.json
    ├── files8.json
    └── files9.json
And I need to generate products-dev.gz, products-stage.gz and products-prod.gz from figure 2, each holding data like figure 3 but specific to one environment. The only difference is that each sub-folder folder1 (2 or 3) contains the files overridden for that environment as first preference, taken from its environment folder, while the rest come from the sub-folder itself.
Is this possible to do with some Linux commands? The only confusion I have is how to override the environment-specific files inside each particular sub-folder and then generate three different tar.gz files from that.
Update:
Also consider cases like the one below:
Products
├── folder1
│   ├── dev
│   │   ├── files1.json
│   │   └── files5.json
│   ├── files1.json
│   ├── files2.json
│   ├── files3.json
│   ├── prod
│   │   ├── files10.json
│   │   └── files1.json
│   └── stage
│       └── files1.json
├── folder2
│   ├── dev
│   ├── prod
│   └── stage
└── folder3
    ├── dev
    ├── prod
    └── stage
As you can see, folder2 and folder3 have environment override folders but no files in them; in that case I want to generate empty folder2 and folder3 directories as well in each environment-specific tar.gz file.
- Do you have to use that structure? Because it seems that having Production, Dev, Stage roots and then, inside each one, the Product hierarchy you had up until now (with just the needed files) would make everything a lot easier to deal with. – Eduardo Trápani, Aug 11, 2020 at 15:56
- Yeah, that could make things simpler, but then I would have to keep 3 different copies. Here I have a concept of default and overridden files for each environment. And also my team wants it this way. – cs98, Aug 11, 2020 at 16:34
3 Answers
There can be plenty of ways, though all require some complexity in order to handle the override case.
As a one-liner, though a bit long, you could do something like this for one iteration, i.e. one "environments" directory:
(r=Products; e=stage; (find -- "$r" -regextype posix-extended -maxdepth 2 \( -regex '^[^/]+(/[^/]+)?' -o ! -type d \) -print0; find -- "$r" -mindepth 1 -path "$r/*/$e/*" -print0) | tar --null --no-recursion -czf "$r-$e.tgz" -T- --transform=s'%^\(\([^/]\{1,\}/\)\{2\}\)[^/]\{1,\}/%1円%')
broken down for easier reading:
(
r=Products; e=stage
(
find -- "$r" -regextype posix-extended -maxdepth 2 \( -regex '^[^/]+(/[^/]+)?' -o ! -type d \) -print0
find -- "$r" -mindepth 1 -path "$r/*/$e/*" -print0
) \
| tar --null --no-recursion -czf "$r-$e.tgz" -T- \
--transform=s'%^\(\([^/]\{1,\}/\)\{2\}\)[^/]\{1,\}/%1円%'
)
Things to note:
- it shows GNU tools' syntax. For BSD find you must replace -regextype posix-extended with just -E, and for BSD tar you must replace --no-recursion with just -n, as well as --transform=s (note the final s) with just -s
- for simplicity of demonstration the snippet assumes it is run from the directory containing Products, and uses the custom $e variable for the name of the "environments" directory to archive, while $r is just a short-named helper variable holding the Products name
- it is enclosed within parentheses, making it a subshell, just so as not to pollute your shell with $r and $e should you run it from the command line
- it does not copy nor link/refer to the original files, it handles any valid filename, it has no memory constraints, and it can handle any number of names; the only assumption is about the first two levels of the directory hierarchy, in that any directory directly below the first level is considered an "environments" directory and thus ignored (except the one indicated in $e)
You could simply enclose that snippet in a for e in dev prod stage; do ...; done shell loop and just go (possibly taking away the outermost parentheses and instead surrounding the entire for loop with them).
The upside is that it is quite short and relatively simple after all.
The downside is that it always also archives all the overridden files (i.e. the base ones); the trick is just that the two find commands feed tar with the to-be-overridden files first, so that during extraction they get overwritten by the overriding files (i.e. the environment-specific ones). This leads to a bigger archive taking more time both during creation and during extraction, which may or may not be acceptable depending on whether such "overhead" is negligible for you.
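A minimal sketch of that overwrite-on-extract behavior, using hypothetical file names in a scratch directory (GNU tar assumed):

```shell
# Archive the base file first, then the stage file renamed over it;
# on plain extraction the later entry overwrites the earlier one.
set -e
d=$(mktemp -d); cd "$d"
mkdir -p Products/folder1/stage
echo base > Products/folder1/files1.json
echo stage-override > Products/folder1/stage/files1.json
printf '%s\n' Products/folder1/files1.json Products/folder1/stage/files1.json \
 | tar --no-recursion -czf demo.tgz -T - \
 --transform='s%^Products/folder1/stage/%Products/folder1/%'
mkdir out && tar -xzf demo.tgz -C out
cat out/Products/folder1/files1.json # the stage content wins
```

The archive genuinely contains two entries under the same name, which is why feeding the base files first matters: listing it with tar -tzf demo.tgz shows files1.json twice.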
That pipeline described in prose:
- (besides the outermost parentheses and the helper variables)
- the first find command produces the list of non-specific files (and leading directories, as per your update) only, while the second find produces the list of all environment-specific files only
- the two find commands are within parentheses by themselves so that both their outputs feed the pipe to tar in sequence
- tar reads that pipe in order to get the names of the files, and puts those files in the archive while also --transform-ing their names by eliminating the "environments" component (if present) from the path-name of each file
- the two find commands are kept separate instead of being just one, and run one after the other, so that the non-specific files are produced (for tar to consume) before the environment-specific files, which enables the trick described earlier
To avoid the overhead of always including all the files, we need additional complexity in order to truly purge the overridden files. One way might be like below:
# still a pipeline, but this time I won't even pretend it to be a one-liner
(
r=Products; e=stage; LC_ALL=C
find -- "$r" -regextype posix-extended \( -path "$r/*/$e/*" -o \( -regex '^([^/]+/){2}[^/]+' ! -type d \) -o -regex '^[^/]+(/[^/]+)?' \) -print0 \
| sed -zE '\%^(([^/]+/){2})([^/]+/)%s%%0/3円1円%;t;s%^%1//%' \
| sort -zt/ -k 3 -k 1,1n \
| sort -zut/ -k 3 \
| sed -zE 's%^[01]/(([^/]+/)|/)(([^/]+/?){2})%3円2円%' \
| tar --null --no-recursion -czf "$r-$e.tgz" -T- \
--transform=s'%^\(\([^/]\{1,\}/\)\{2\}\)[^/]\{1,\}/%1円%'
)
Several things to note:
- everything said earlier regarding GNU and BSD syntaxes for find and tar applies here as well
- like the previous solution, it has no constraints whatsoever besides the assumption about the first two levels of the directory hierarchy
- GNU sed is used here in order to deal with nul-delimited I/O (option -z), but you could easily replace those two sed commands with e.g. a while read ... shell loop (Bash version 3 or greater required) or another language you feel confident with, the only recommendation being that the tool is able to handle nul-delimited I/O (e.g. GNU gawk can do it); see below for a replacement using Bash loops
- one single find is used here, as it does not rely on any implied behavior from tar
- the sed commands manipulate the list of names, paving the way for the sort commands
- specifically, the first sed moves the "environments" name to the beginning of the path, also prefixing it with a helper 0 just to make it sort before the non-environment files, which get prefixed with a leading 1 for the purpose of sorting
- that preparation normalizes the list of names in the "eyes" of the sort commands, so that all names, with or without an "environments" component, have the same number of slash-delimited fields at the beginning, which is important for sort's key definitions
- the first sort sorts first on the files' names, thus putting equal names adjacent to each other, and then by the numeric 0 or 1 marked previously by the sed command, thus guaranteeing that any environment-specific file, when present, comes before its non-specific counterpart
- the second sort coalesces (option -u) on the files' names, leaving only the first of any duplicate names, which due to the previous reordering is always the environment-specific file when present
- finally, the second sed undoes what the first one did, reshaping the file names for tar to archive
If you are curious to explore the intermediate stages of that long pipeline, keep in mind that they all work with nul-delimited names, and hence do not display well on screen. You can pipe any of the intermediate outputs (i.e. taking away at least the tar) through a courtesy tr '0円' '\n' to get a human-friendly output; just remember that filenames containing newlines will span two lines on screen.
Several improvements could be done, certainly by making it a fully parameterized function/script, or for instance by detecting automatically any arbitrary name for "environments" directories, like below:
Important: pay attention to the comments as they may not be well accepted by an interactive shell
(
export r=Products LC_ALL=C
cd -- "$r/.." || exit
# make arguments out of all directories lying at the second level of the hierarchy
set -- "$r"/*/*/
# then expand all such paths found, take their basenames only, uniquify them, and pass them along xargs down to a Bash pipeline the same as above
printf '%s0円' "${@#*/*/}" \
| sort -zu \
| xargs -0I{} sh -c '
e="${1%/}"
echo --- "$e" ---
find -- "$r" -regextype posix-extended \( -path "$r/*/$e/*" -o \( -regex '\''^([^/]+/){2}[^/]+'\'' ! -type d \) -o -regex '\''^[^/]+(/[^/]+)?'\'' \) -print0 \
| sed -zE '\''\%^(([^/]+/){2})([^/]+/)%s%%0/3円1円%;t;s%^%1//%'\'' \
| sort -zt/ -k 3 -k 1,1n \
| sort -zut/ -k 3 \
| sed -zE '\''s%^[01]/(([^/]+/)|/)(([^/]+/?){2})%3円2円%'\'' \
| tar --null --no-recursion -czf "$r-$e.tgz" -T- \
--transform=s'\''%^\(\([^/]\{1,\}/\)\{2\}\)[^/]\{1,\}/%1円%'\''
' packetizer {}
)
Example replacement for the first sed command with a Bash loop:
(IFS=/; while read -r -d '' -a parts; do
  if [ "${#parts[@]}" -gt 3 ]; then
    env="${parts[2]}"; unset 'parts[2]'
    printf '0/%s/%s0円' "$env" "${parts[*]}"
  else
    printf '1//%s0円' "${parts[*]}"
  fi
done)
For the second sed command:
(IFS=/; while read -r -d '' -a parts; do
  printf %s "${parts[*]:2:2}" "/${parts[1]:+${parts[1]}/}" "${parts[*]:4}"
  printf '0円'
done)
Both snippets require the surrounding parentheses in order to be drop-in replacements for their respective sed commands within the pipeline above, and of course the sh -c piece after xargs needs to be turned into bash -c.
- @alecxs Unfortunately there simply is no POSIX tool that can deal with nul-terminated input, nor can BSD's awk and sed. However, I've now turned those shell loops into a couple of sed -z commands. Probably even more cryptic than before, but at least more compact. – LL3, Aug 13, 2020 at 15:25
- @LL3 thanks, it works fine now. One last question: is there any way to print the total number of files in each subfolder, like folder1, folder2, etc.? It doesn't need to be in the same one-liner script you had, so I am OK doing it on a separate line. – cs98, Aug 17, 2020 at 16:22
- @cs98 The extended pipeline already performs the trickiest operations needed to let you count the names going into the archive. It may be a matter of inserting a cut/sed towards a final uniq -c. Try researching U&L about counting names, as there are plenty of excellent answers on that very common task. Else it may make material for another good question. BTW: please consider upvoting and/or accepting answer(s) you found useful; it is a concrete way to say thank you and gives first-glance hints to future readers with similar problems. – LL3, Aug 17, 2020 at 21:13
General solution
- Make a copy of the directory tree. Hardlink the files to save space.
- Modify the copy. (In case of hardlinks, you need to know what you can do safely. See below.)
- Archive the copy.
- Remove the copy.
- Repeat (modifying differently) if needed.
Example
Limitations:
- this example uses non-POSIX options (tested on Debian 10),
- it makes some assumptions about the directory tree,
- it can fail if there are too many files.
Treat it as a proof of concept, adjust it to your needs.
Making a copy
cd to the parent directory of Products. This directory, Products and everything within it should belong to a single filesystem. Make a temporary directory and recreate Products there:
mkdir -p tmp
cp -la Products/ tmp/
Modifying the copy
Files in the two directory trees are hardlinked. If you modify their content then you will alter the original data. Operations that modify information held by directories are safe, they will not alter the original data if performed in the other tree. These are:
- removing files,
- renaming files,
- moving files around (this includes moving a file over another file with mv),
- creating totally independent files.
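A small illustration of that distinction, with hypothetical names in a scratch directory: renaming or removing entries in the hardlinked copy leaves the original alone, while writing through the copy does not.

```shell
set -e
d=$(mktemp -d); cd "$d"
mkdir -p Products/folder1
echo original > Products/folder1/files1.json
mkdir tmp && cp -la Products tmp/ # hardlink copy (GNU cp)
# Directory-level operations in the copy are safe:
mv tmp/Products/folder1/files1.json tmp/Products/folder1/renamed.json
cat Products/folder1/files1.json # still just "original"
# Writing through the copy alters the shared inode, i.e. the original too:
echo changed >> tmp/Products/folder1/renamed.json
wc -l < Products/folder1/files1.json # now 2 lines
```

This is why the recipe below only ever moves and removes files inside tmp, never edits them.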
In your case, for every directory named dev at the right depth, move its contents one level up:
cd tmp/Products
dname=dev
find . -mindepth 2 -maxdepth 2 -type d -name "$dname" -exec sh -c 'cd "1ドル" && mv -f -- * ../' sh {} \;
Notes:
- mv -- * ../ is prone to "argument list too long",
- by default * does not match dotfiles.
Then remove directories:
find . -mindepth 2 -maxdepth 2 -type d -exec rm -rf {} +
Note this removes the now-empty dev and the unneeded prod and stage, and any other directory at this depth.
Archiving the copy
# still in tmp/Products because of the previous step
cd ..
tar cvzf "products-$dname.tgz" Products
Removing the copy
# now in tmp because of the previous step
rm -rf Products
Repeating
Go back to the right directory and start over, this time with dname=stage; and so on.
Example script (quick and dirty)
#!/bin/bash
dir=Products
[ -d "$dir" ] || exit 1
mkdir -p tmp
for dname in dev prod stage; do
(
cp -la "$dir" tmp/
cd "tmp/$dir"
[ "$?" -eq 0 ] || exit 1
find . -mindepth 2 -maxdepth 2 -type d -name "$dname" -exec sh -c 'cd "1ドル" && mv -f -- * ../' sh {} \;
find . -mindepth 2 -maxdepth 2 -type d -exec rm -rf {} +
cd ..
[ "$?" -eq 0 ] || exit 1
tar cvzf "${dir,,}-$dname.tgz" "$dir"
rm -rf "$dir" || exit 1
) || exit "$?"
done
I made that a bit more generic and working on non-trivial file names, without actually changing the source directories.
Products is given as an argument. The keywords dev prod stage are hard-coded inside the script (but can easily be changed).
Note: this is GNU-specific (--transform, -print0 and the -z extension).
Run the script:
./script Products
#!/bin/sh
# environment
subdirs="dev prod stage"
# script requires arguments
[ -n "1ドル" ] || exit 1
# remove trailing /
while [ ${i:-0} -lt $# ]
do
i=$((i+1))
dir="1ドル"
while [ "${dir#"${dir%?}"}" = "/" ]
do
dir="${dir%/}"
done
set -- "$@" "$dir"
shift
done
# search string
for sub in $subdirs
do
[ -n "$search" ] && search="$search -o -name $sub" || search="( -name $sub"
done
search="$search )"
# GNU specific zero terminated handling for non-trivial directory names
excludes="$excludes $(find -L "$@" -type d $search -print0 | sed -z 's,[^/]*/,*/,g' | sort -z | uniq -z | xargs -0 printf '--exclude=%s\n')"
# for each argument
for dir in "$@"
do
# for each environment
[ -e "$dir" ] || continue
for sub in $subdirs
do
# exclude other subdirs
exclude=$(echo "$excludes" | grep -v "$sub")
# # exclude files that exist in subdir (at least stable against newlines and spaces in file names)
# include=$(echo "$excludes" | grep "$sub" | cut -d= -f2)
# [ -n "$include" ] && files=$(find $include -mindepth 1 -maxdepth 1 -print0 | tr '\n[[:space:]]' '?' | sed -z "s,/$sub/,/," | xargs -0 printf '--exclude=%s\n')
# exclude="$exclude $files"
# create tarball archive
archive="${dir##*/}-${sub}.tgz"
[ -f "$archive" ] && echo "WARNING: '$archive' is overwritten"
tar --transform "s,/$sub,,ドル" --transform "s,/$sub/,/," $exclude -czhf "$archive" "$dir"
done
done
You might notice duplicates inside the archive. tar will recursively descend directories; on restore, the deeper files will overwrite the files in the parent directory.
However, that needs some more testing for consistent behavior (not sure about it). The proper way would be to exclude files1.json + files5.json; unfortunately -X doesn't work with --null.
If you don't trust that behavior or don't want duplicate files in the archives, you can add some excludes for simple file names: uncomment the code above tar. Newlines and whitespace are allowed in file names but get replaced with the wildcard ? in the exclude pattern, which could in theory exclude more files than expected (if there are similar files matching that pattern).
You can place an echo before tar and you will see the script generates the following commands:
tar --transform 's,/dev,,ドル' --transform 's,/dev/,/,' --exclude=*/*/prod --exclude=*/*/stage -czhf Products-dev.tgz Products
tar --transform 's,/prod,,ドル' --transform 's,/prod/,/,' --exclude=*/*/dev --exclude=*/*/stage -czhf Products-prod.tgz Products
tar --transform 's,/stage,,ドル' --transform 's,/stage/,/,' --exclude=*/*/dev --exclude=*/*/prod -czhf Products-stage.tgz Products
- If you uncomment the exclude-files block you might get "Argument list too long" for too many files; pass the excludes via a -X index file in that case. – alecxs, Aug 12, 2020 at 20:17