Create hierarchy of multiple levels where each node has a random number of children

Question 1

I need to create some test data that involves a hierarchy. I could make it easy and do a couple of CROSS JOINs, but that would give me a structure that is completely uniform / without any variation. That not only seems dull, but lack of variation in test data sometimes masks problems that would otherwise be found. So, I am wanting to generate a non-uniform hierarchy that follows these rules:

3 levels deep
- Level 1 is randomly 5 - 20 nodes
- Level 2 is 1 - 10 nodes, random per each node of Level 1
- Level 3 is 1 - 5 nodes, random per each node of Level 2
All branches will be 3 levels deep. Uniformity in depth is ok at this point.
There can be overlap in names of child nodes on any given level (i.e. names of child nodes do not need to be unique across all nodes on the same level).
The term "random" is defined here as being pseudo-random, not uniquely random. This needs to be mentioned since the term "random" is often used to mean "random ordering of a given set that does not produce duplicates". I accept that random = random and if the number of children per each node of Level 1 is only 4, 7, and 8, even across 20 nodes on Level 1 that has a potential spread of 1 - 10 children per each of those nodes, then that is fine, because that is what random is.
Even though this can be done quite easily with nested WHILE loops, the preference is to find a set-based approach. Generally speaking, generating test data does not have the requirements for efficiency that Production code would have, but shooting for a set-based approach will likely be more educational and help in the future with finding set-based approaches to problems. So WHILE loops are not ruled-out, but can only be used if no set-based approach is possible.
Set-based = ideally a single query, regardless of CTEs, APPLYs, etc. So using an existing or inline numbers table is fine. Using a WHILE / CURSOR / procedural approach will not work. I suppose staging portions of the data into temp tables or table variables fine, just so long as the operations are all set-based, no loops. However, that being said, a single-query approach will probably be favored over multiple queries, unless it can be shown that the multi-query approach is actually better. Please also keep in mind that what constitutes "better" is typically subjective ;-). Please also keep in mind that the usage of "typically" in the prior sentence is also subjective.
Any version and edition of SQL Server (2005 and newer, I suppose) will do.
Only pure T-SQL: none of that silly SQLCLR stuff!! At least in terms of generating the data. Creating the directories and files will be done using SQLCLR. But here I am just focusing on generating the values of what to create.
T-SQL Multi-statement TVF are considered procedural, not set-based, even though on the outside they mask the procedural approach in a set. There are times when that is absolutely appropriate. This is not one of those time. Along those same lines, T-SQL Scalar functions are also not allowed, not only because they are also procedural, but the Query Optimizer sometimes caches their value and repeats it such that the output is not as expected.
T-SQL Inline TVFs (a.k.a. iTVFs) are okey-dokey as they are set-based, and effectively the same as using [ CROSS | OUTER ] APPLY, which was stated above as being ok.
Repeated executions of the query(ies) should produce mostly different result from the prior run.
Clarification Update 1: The final result set should be expressed as having one row for each distinct node of Level3, having the full path starting at Level1. This means that the Level1 and Level2 values will necessarily repeat across one or more rows, except in cases of there being only a single Level2 node containing only a single Level3 node.
Clarification Update 2: There is a very strong preference for each node having a name or label, and not just a number. This will allow for the resulting test data being more meaningful and realistic.

I am not sure if this additional info matters, but just in case it helps to have some context, the test data relates to my answer on this question:

Import XML files into SQL Server 2012

While not relevant at this point, the end goal of generating this hierarchy is to create a directory structure to test recursive file system methods. Levels 1 and 2 will be directories, and Level 3 will end up being the file name. I have searched around (both here and via the Googles) and have only found one reference to generating a random hierarchy:

Linux: create random directory/file hierarchy

That question (on StackOverflow) is actually quite close in terms of desired result since that also seeks to create a directory structure for testing. But that question (and the answers) are focused on Linux/Unix shell scripting and not so much the set-based world that we live in.

Now, I know how to generate random data, and am doing so already to create the contents of the files so that they can also show variations. The tricky part here is that the number of elements within each set is random, not a particular field. And, the number of elements within each node needs to be random from other nodes on that same levels.

Example Hierarchy

 Level 1
 Level 3
|---- A
| |-- 1
| | |--- I
| |
| |-- 2
| |--- III
| |--- VI
| |--- VII
| |--- IX
|
|---- B
| |-- 87
| |--- AAA
| |--- DDD
|
|---- C
 |-- ASDF
 | |--- 11
 | |--- 22
 | |--- 33
 |
 |-- QWERTY
 | |--- beft
 |
 |-- ROYGBP
 |--- Poi
 |--- Moi
 |--- Soy
 |--- Joy
 |--- Roy

Example Result Set Describing the Hierarchy Above

Level 1 Level 2 Level 3
A 1 I
A 2 III
A 2 VI
A 2 VII
A 2 IX
B 87 AAA
B 87 DDD
C ASDF 11
C ASDF 22
C ASDF 33
C QWERTY beft
C ROYGBP Poi
C ROYGBP Moi
C ROYGBP Soy
C ROYGBP Joy
C ROYGBP Roy

Question 2

^{(O.P.'s note: preferred solution is the 4th / final code block)}

XML seems to me to be the obvious choice of data structure to use here.

with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select top(5 + abs(checksum(newid())) % 15)
 N1.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 10)
 N2.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 5)
 N3.N as '@Value'
 from N as N3
 where N2.N > 0
 for xml path('Level3'), type
 )
 from N as N2
 where N1.N > 0
 for xml path('Level2'), type
 )
from N as N1
for xml path('Level1'), root('Root');

The trick to make SQL Server use different values for top() for each node is to make the sub-queries correlated. N1.N > 0 and N2.N > 0.

Flatteing the XML:

declare @X xml;
with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select @X = (
 select top(5 + abs(checksum(newid())) % 15)
 N1.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 10)
 N2.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 5)
 N3.N as '@Value'
 from N as N3
 where N2.N > 0
 for xml path('Level3'), type
 )
 from N as N2
 where N1.N > 0
 for xml path('Level2'), type
 )
 from N as N1
 for xml path('Level1')
 );
select L1.X.value('@Value', 'varchar(10)')+'\'+
 L2.X.value('@Value', 'varchar(10)')+'\'+
 L3.X.value('@Value', 'varchar(10)')
from @X.nodes('/Level1') as L1(X)
 cross apply L1.X.nodes('Level2') as L2(X)
 cross apply L2.X.nodes('Level3') as L3(X);

And a version totally void of XML.

with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select cast(N1.N as varchar(10))+'\'+
 cast(N2.N as varchar(10))+'\'+
 cast(N3.N as varchar(10))
from (
 select top(5 + abs(checksum(newid())) % 15)
 N.N
 from N
 ) as N1
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 10)
 N.N
 from N
 where N1.N > 0
 ) as N2
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 5)
 N.N
 from N
 where N2.N > 0
 ) as N3;

Correlation N1.N > 0 and N2.N > 0 is still important.

A version using a table with 20 names to be used instead of just integers.

declare @Elements table
(
 Name nvarchar(50) not null
);
insert into @Elements(Name)
select top(20) C.name 
from sys.columns as C
group by C.name;
select N1.Name + N'\' + N2.Name + N'\' + N3.Name
from (
 select top(5 + abs(checksum(newid())) % 15)
 E.Name
 from @Elements as E
 ) as N1
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 10)
 E.Name
 from @Elements as E
 where N1.Name > ''
 ) as N2
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 5)
 E.Name
 from @Elements as E
 where N2.Name > ''
 ) as N3;

Question 3

I like the new version better. It is almost the same thing I came up with in my first attempt but for some reason I couldn't get the TOP(n) to work correctly in the 2 CROSS APPLYs. Not sure what I did differently / incorrectly since I got rid of that code once I got something else working. I will post that soon, now that you have provided this update. And I have cleaned up most of my comments above.

Question 4

I just posted my version. The main differences are: 1) since I couldn't get the TOP(n) to work, I went with getting n elements via a WHERE condition, and 2) I have the name component which is more controlled than randomizing directory and/or file names.

Question 5

Sorry to be away for so long, but I've been crazy busy. Still, I've been thinking about this and can't decide between my answer and your non-XML version. I like the simplicity & flexibility of yours, but need the ability to return names to use to create a folder structure, which mine has. Then I realized I had Vlad update his to have a lookup table and JOIN to it to give the ideal output. So, if not inappropriate to ask, could you please update yours to include that same lookup? Then all 3 answers would give equivalent output (ideal for comparing all 3), and I would accept yours. Is that ok?

Question 6

@srutzky I have updated the answer. Was a while ago so hope I got it right and what you where looking for. You can of course add a level column to @Elemets to get a different set of names for each level to choose from.

Question 7

@srutzky no worries. I'm glad the answer was helpful for you.

Question 8

That was interesting.

My aim was to generate given number of levels with random number of child rows per each level in a properly linked hierarchical structure. Once this structure is ready it is easy to add extra info into it like file and folder names.

So, I wanted to generate a classic table for storing a tree:

ID int NOT NULL
ParentID int NULL
Lvl int NOT NULL

Since we are dealing with recursion, recursive CTE seems a natural choice.

I will need a table of numbers. Numbers in the table should start from 1. There should be at least 20 numbers in the table: MAX(LvlMax).

CREATE TABLE [dbo].[Numbers](
 [Number] [int] NOT NULL,
CONSTRAINT [PK_Numbers] PRIMARY KEY CLUSTERED 
(
 [Number] ASC
));
INSERT INTO Numbers(Number)
SELECT TOP(1000)
 ROW_NUMBER() OVER(ORDER BY S.object_id) AS Number
FROM
 sys.all_objects AS S
ORDER BY Number;

Parameters for data generation should be stored in a table:

DECLARE @Intervals TABLE (Lvl int, LvlMin int, LvlMax int);
INSERT INTO @Intervals (Lvl, LvlMin, LvlMax) VALUES
(1, 5, 20),
(2, 1, 10),
(3, 1, 5);

Note, that the query is pretty flexible and all parameters are separated into one place. You can add more levels if needed, just add an extra row of parameters.

To make such dynamic generation possible I had to remember the random number of rows for the next level, so I have an extra column ChildRowCount.

Generating unique IDs is also somewhat tricky. I hard-coded the limit of 100 child rows per 1 parent row to guarantee that IDs don't repeat. This is what that POWER(100, CTE.Lvl) is about. As a result there are large gaps in IDs. That number could be a MAX(LvlMax), but I put constant 100 in the query for simplicity. The number of levels is not hard-coded, but is determined by @Intervals.

This formula

CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5

generates a random floating point number in the range [0..1), which is then scaled to the required interval.

The query logic is simple. It is recursive. First step generates a set of rows of the first level. Number of rows is determined by random number in TOP. Also, for each row there is a separate random number of child rows stored in ChildRowCount.

Recursive part uses CROSS APPLY to generate given number of child rows per each parent row. I had to use WHERE Numbers.Number <= CTE.ChildRowCount instead of TOP(CTE.ChildRowCount), because TOP is not allowed in recursive part of CTE. Didn't know about this limitation of SQL Server before.

WHERE CTE.ChildRowCount IS NOT NULL stops the recursion.

SQL Fiddle

WITH
CTE
AS
(
 SELECT 
 TOP(CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = 1)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 1)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 1)
 AS int))
 Numbers.Number AS ID
 ,NULL AS ParentID
 ,1 AS Lvl
 ,CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = 2)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 2)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 2)
 AS int) AS ChildRowCount
 FROM Numbers
 ORDER BY Numbers.Number
 UNION ALL
 SELECT
 CA.Number + CTE.ID * POWER(100, CTE.Lvl) AS ID
 ,CTE.ID AS ParentID
 ,CTE.Lvl + 1 AS Lvl
 ,CA.ChildRowCount
 FROM
 CTE
 CROSS APPLY
 (
 SELECT
 Numbers.Number
 ,CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 AS int) AS ChildRowCount
 FROM Numbers
 WHERE Numbers.Number <= CTE.ChildRowCount
 ) AS CA
 WHERE
 CTE.ChildRowCount IS NOT NULL
)
SELECT *
FROM CTE
ORDER BY Lvl, ParentID, ID;

Result (there can be up to 20 + 20*10 + 200*5 = 1220 rows if you are lucky)

+---------+----------+-----+-------------------+
| ID | ParentID | Lvl | ChildRowCount |
+---------+----------+-----+-------------------+
| 1 | NULL | 1 | 3 |
| 2 | NULL | 1 | 1 |
| 3 | NULL | 1 | 6 |
| 4 | NULL | 1 | 5 |
| 5 | NULL | 1 | 3 |
| 6 | NULL | 1 | 7 |
| 7 | NULL | 1 | 1 |
| 8 | NULL | 1 | 6 |
| 101 | 1 | 2 | 3 |
| 102 | 1 | 2 | 5 |
| 103 | 1 | 2 | 1 |
| 201 | 2 | 2 | 5 |
| 301 | 3 | 2 | 4 |
| 302 | 3 | 2 | 5 |
| 303 | 3 | 2 | 1 |
| 304 | 3 | 2 | 2 |
| 305 | 3 | 2 | 4 |
| 306 | 3 | 2 | 3 |
| 401 | 4 | 2 | 3 |
| 402 | 4 | 2 | 1 |
| 403 | 4 | 2 | 2 |
| 404 | 4 | 2 | 2 |
| 405 | 4 | 2 | 4 |
| 501 | 5 | 2 | 1 |
| 502 | 5 | 2 | 3 |
| 503 | 5 | 2 | 5 |
| 601 | 6 | 2 | 2 |
| 602 | 6 | 2 | 5 |
| 603 | 6 | 2 | 3 |
| 604 | 6 | 2 | 3 |
| 605 | 6 | 2 | 4 |
| 606 | 6 | 2 | 5 |
| 607 | 6 | 2 | 4 |
| 701 | 7 | 2 | 2 |
| 801 | 8 | 2 | 2 |
| 802 | 8 | 2 | 3 |
| 803 | 8 | 2 | 3 |
| 804 | 8 | 2 | 3 |
| 805 | 8 | 2 | 5 |
| 806 | 8 | 2 | 2 |
| 1010001 | 101 | 3 | NULL |
| 1010002 | 101 | 3 | NULL |
| 1010003 | 101 | 3 | NULL |
| 1020001 | 102 | 3 | NULL |
| 1020002 | 102 | 3 | NULL |
| 1020003 | 102 | 3 | NULL |
| 1020004 | 102 | 3 | NULL |
| 1020005 | 102 | 3 | NULL |
| 1030001 | 103 | 3 | NULL |
| 2010001 | 201 | 3 | NULL |
| 2010002 | 201 | 3 | NULL |
| 2010003 | 201 | 3 | NULL |
| 2010004 | 201 | 3 | NULL |
| 2010005 | 201 | 3 | NULL |
| 3010001 | 301 | 3 | NULL |
| 3010002 | 301 | 3 | NULL |
| 3010003 | 301 | 3 | NULL |
| 3010004 | 301 | 3 | NULL |
| 3020001 | 302 | 3 | NULL |
| 3020002 | 302 | 3 | NULL |
| 3020003 | 302 | 3 | NULL |
| 3020004 | 302 | 3 | NULL |
| 3020005 | 302 | 3 | NULL |
| 3030001 | 303 | 3 | NULL |
| 3040001 | 304 | 3 | NULL |
| 3040002 | 304 | 3 | NULL |
| 3050001 | 305 | 3 | NULL |
| 3050002 | 305 | 3 | NULL |
| 3050003 | 305 | 3 | NULL |
| 3050004 | 305 | 3 | NULL |
| 3060001 | 306 | 3 | NULL |
| 3060002 | 306 | 3 | NULL |
| 3060003 | 306 | 3 | NULL |
| 4010001 | 401 | 3 | NULL |
| 4010002 | 401 | 3 | NULL |
| 4010003 | 401 | 3 | NULL |
| 4020001 | 402 | 3 | NULL |
| 4030001 | 403 | 3 | NULL |
| 4030002 | 403 | 3 | NULL |
| 4040001 | 404 | 3 | NULL |
| 4040002 | 404 | 3 | NULL |
| 4050001 | 405 | 3 | NULL |
| 4050002 | 405 | 3 | NULL |
| 4050003 | 405 | 3 | NULL |
| 4050004 | 405 | 3 | NULL |
| 5010001 | 501 | 3 | NULL |
| 5020001 | 502 | 3 | NULL |
| 5020002 | 502 | 3 | NULL |
| 5020003 | 502 | 3 | NULL |
| 5030001 | 503 | 3 | NULL |
| 5030002 | 503 | 3 | NULL |
| 5030003 | 503 | 3 | NULL |
| 5030004 | 503 | 3 | NULL |
| 5030005 | 503 | 3 | NULL |
| 6010001 | 601 | 3 | NULL |
| 6010002 | 601 | 3 | NULL |
| 6020001 | 602 | 3 | NULL |
| 6020002 | 602 | 3 | NULL |
| 6020003 | 602 | 3 | NULL |
| 6020004 | 602 | 3 | NULL |
| 6020005 | 602 | 3 | NULL |
| 6030001 | 603 | 3 | NULL |
| 6030002 | 603 | 3 | NULL |
| 6030003 | 603 | 3 | NULL |
| 6040001 | 604 | 3 | NULL |
| 6040002 | 604 | 3 | NULL |
| 6040003 | 604 | 3 | NULL |
| 6050001 | 605 | 3 | NULL |
| 6050002 | 605 | 3 | NULL |
| 6050003 | 605 | 3 | NULL |
| 6050004 | 605 | 3 | NULL |
| 6060001 | 606 | 3 | NULL |
| 6060002 | 606 | 3 | NULL |
| 6060003 | 606 | 3 | NULL |
| 6060004 | 606 | 3 | NULL |
| 6060005 | 606 | 3 | NULL |
| 6070001 | 607 | 3 | NULL |
| 6070002 | 607 | 3 | NULL |
| 6070003 | 607 | 3 | NULL |
| 6070004 | 607 | 3 | NULL |
| 7010001 | 701 | 3 | NULL |
| 7010002 | 701 | 3 | NULL |
| 8010001 | 801 | 3 | NULL |
| 8010002 | 801 | 3 | NULL |
| 8020001 | 802 | 3 | NULL |
| 8020002 | 802 | 3 | NULL |
| 8020003 | 802 | 3 | NULL |
| 8030001 | 803 | 3 | NULL |
| 8030002 | 803 | 3 | NULL |
| 8030003 | 803 | 3 | NULL |
| 8040001 | 804 | 3 | NULL |
| 8040002 | 804 | 3 | NULL |
| 8040003 | 804 | 3 | NULL |
| 8050001 | 805 | 3 | NULL |
| 8050002 | 805 | 3 | NULL |
| 8050003 | 805 | 3 | NULL |
| 8050004 | 805 | 3 | NULL |
| 8050005 | 805 | 3 | NULL |
| 8060001 | 806 | 3 | NULL |
| 8060002 | 806 | 3 | NULL |
+---------+----------+-----+-------------------+

Generating full path instead of linked hierarchy

If we are interested only in the full path N levels deep, we can omit ID and ParentID from the CTE. If we have a list of possible names in the supplementary table Names, it is easy to pick them from this table in CTE. The Names table should have enough rows for each level: 20 for level 1, 10 for level 2, 5 for level 3; 20+10+5 = 35 in total. It not necessary to have different sets of rows for each level, but it is easy to set it up properly, so I did it.

DECLARE @Names TABLE (Lvl int, Name nvarchar(4000), SeqNumber int);
-- First level: AAA, BBB, CCC, etc.
INSERT INTO @Names (Lvl, Name, SeqNumber)
SELECT 1, REPLICATE(CHAR(Number+64), 3) AS Name, Number AS SeqNumber
FROM Numbers
WHERE Number <= 20;
-- Second level: 001, 002, 003, etc.
INSERT INTO @Names (Lvl, Name, SeqNumber)
SELECT 2, REPLACE(STR(Number, 3), ' ', '0') AS Name, Number AS SeqNumber
FROM Numbers
WHERE Number <= 10;
-- Third level: I, II, III, IV, V
INSERT INTO @Names (Lvl, Name, SeqNumber) VALUES
(3, 'I', 1),
(3, 'II', 2),
(3, 'III', 3),
(3, 'IV', 4),
(3, 'V', 5);

SQL Fiddle Here is the final query. I split the FullPath into FilePath and FileName.

WITH
CTE
AS
(
 SELECT 
 TOP(CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = 1)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 1)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 1)
 AS int))
 1 AS Lvl
 ,CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = 2)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 2)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = 2)
 AS int) AS ChildRowCount
 ,N.Name AS FullPath
 ,N.Name AS [FilePath]
 ,CAST(N'' AS nvarchar(4000)) AS [FileName]
 FROM
 Numbers
 INNER JOIN @Names AS N ON 
 N.SeqNumber = Numbers.Number AND N.Lvl = 1
 ORDER BY Numbers.Number
 UNION ALL
 SELECT
 CTE.Lvl + 1 AS Lvl
 ,CA.ChildRowCount
 ,CTE.FullPath + '\' + CA.Name AS FullPath
 ,CASE WHEN CA.ChildRowCount IS NOT NULL 
 THEN CTE.FullPath + '\' + CA.Name
 ELSE CTE.FullPath END AS [FilePath]
 ,CASE WHEN CA.ChildRowCount IS NULL 
 THEN CA.Name
 ELSE N'' END AS [FileName]
 FROM
 CTE
 CROSS APPLY
 (
 SELECT
 Numbers.Number
 ,CAST(
 (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) * 
 (
 1 + (SELECT I.LvlMax FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 - (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 )
 + (SELECT I.LvlMin FROM @Intervals AS I WHERE I.Lvl = CTE.Lvl + 2)
 AS int) AS ChildRowCount
 ,N.Name
 FROM
 Numbers
 INNER JOIN @Names AS N ON 
 N.SeqNumber = Numbers.Number AND N.Lvl = CTE.Lvl + 1
 WHERE Numbers.Number <= CTE.ChildRowCount
 ) AS CA
 WHERE
 CTE.ChildRowCount IS NOT NULL
)
SELECT
 CTE.FullPath
 ,CTE.[FilePath]
 ,CTE.[FileName]
FROM CTE
WHERE CTE.ChildRowCount IS NULL
ORDER BY FullPath;

Result

+-------------+----------+----------+
| FullPath | FilePath | FileName |
+-------------+----------+----------+
| AAA001円\I | AAA001円 | I |
| AAA001円\II | AAA001円 | II |
| AAA002円\I | AAA002円 | I |
| AAA002円\II | AAA002円 | II |
| AAA002円\III | AAA002円 | III |
| AAA002円\IV | AAA002円 | IV |
| AAA002円\V | AAA002円 | V |
| AAA003円\I | AAA003円 | I |
| AAA003円\II | AAA003円 | II |
| AAA003円\III | AAA003円 | III |
| AAA004円\I | AAA004円 | I |
| AAA004円\II | AAA004円 | II |
| AAA004円\III | AAA004円 | III |
| AAA004円\IV | AAA004円 | IV |
| BBB001円\I | BBB001円 | I |
| BBB001円\II | BBB001円 | II |
| CCC001円\I | CCC001円 | I |
| CCC001円\II | CCC001円 | II |
| CCC001円\III | CCC001円 | III |
| CCC001円\IV | CCC001円 | IV |
| CCC001円\V | CCC001円 | V |
| CCC002円\I | CCC002円 | I |
| CCC003円\I | CCC003円 | I |
| CCC003円\II | CCC003円 | II |
| CCC004円\I | CCC004円 | I |
| CCC004円\II | CCC004円 | II |
| CCC005円\I | CCC005円 | I |
| CCC005円\II | CCC005円 | II |
| CCC005円\III | CCC005円 | III |
| CCC006円\I | CCC006円 | I |
| CCC006円\II | CCC006円 | II |
| CCC006円\III | CCC006円 | III |
| CCC006円\IV | CCC006円 | IV |
| CCC007円\I | CCC007円 | I |
| CCC007円\II | CCC007円 | II |
| CCC007円\III | CCC007円 | III |
| CCC007円\IV | CCC007円 | IV |
| CCC008円\I | CCC008円 | I |
| CCC008円\II | CCC008円 | II |
| CCC008円\III | CCC008円 | III |
| CCC009円\I | CCC009円 | I |
| CCC009円\II | CCC009円 | II |
| CCC009円\III | CCC009円 | III |
| CCC009円\IV | CCC009円 | IV |
| CCC010円\I | CCC010円 | I |
| CCC010円\II | CCC010円 | II |
| CCC010円\III | CCC010円 | III |
| DDD001円\I | DDD001円 | I |
| DDD001円\II | DDD001円 | II |
| DDD001円\III | DDD001円 | III |
| DDD001円\IV | DDD001円 | IV |
| DDD002円\I | DDD002円 | I |
| DDD003円\I | DDD003円 | I |
| DDD003円\II | DDD003円 | II |
| DDD003円\III | DDD003円 | III |
| DDD003円\IV | DDD003円 | IV |
| DDD004円\I | DDD004円 | I |
| DDD004円\II | DDD004円 | II |
| DDD004円\III | DDD004円 | III |
| DDD005円\I | DDD005円 | I |
| DDD006円\I | DDD006円 | I |
| DDD006円\II | DDD006円 | II |
| DDD006円\III | DDD006円 | III |
| DDD007円\I | DDD007円 | I |
| DDD007円\II | DDD007円 | II |
| DDD008円\I | DDD008円 | I |
| DDD008円\II | DDD008円 | II |
| DDD008円\III | DDD008円 | III |
| DDD009円\I | DDD009円 | I |
| DDD009円\II | DDD009円 | II |
| DDD010円\I | DDD010円 | I |
| DDD010円\II | DDD010円 | II |
| DDD010円\III | DDD010円 | III |
| DDD010円\IV | DDD010円 | IV |
| DDD010円\V | DDD010円 | V |
| EEE001円\I | EEE001円 | I |
| EEE001円\II | EEE001円 | II |
| FFF001円\I | FFF001円 | I |
| FFF002円\I | FFF002円 | I |
| FFF002円\II | FFF002円 | II |
| FFF003円\I | FFF003円 | I |
| FFF003円\II | FFF003円 | II |
| FFF003円\III | FFF003円 | III |
| FFF003円\IV | FFF003円 | IV |
| FFF003円\V | FFF003円 | V |
| FFF004円\I | FFF004円 | I |
| FFF004円\II | FFF004円 | II |
| FFF004円\III | FFF004円 | III |
| FFF004円\IV | FFF004円 | IV |
| FFF005円\I | FFF005円 | I |
| FFF006円\I | FFF006円 | I |
| FFF007円\I | FFF007円 | I |
| FFF007円\II | FFF007円 | II |
| FFF007円\III | FFF007円 | III |
| GGG001円\I | GGG001円 | I |
| GGG001円\II | GGG001円 | II |
| GGG001円\III | GGG001円 | III |
| GGG002円\I | GGG002円 | I |
| GGG003円\I | GGG003円 | I |
| GGG003円\II | GGG003円 | II |
| GGG003円\III | GGG003円 | III |
| GGG004円\I | GGG004円 | I |
| GGG004円\II | GGG004円 | II |
| HHH001円\I | HHH001円 | I |
| HHH001円\II | HHH001円 | II |
| HHH001円\III | HHH001円 | III |
| HHH002円\I | HHH002円 | I |
| HHH002円\II | HHH002円 | II |
| HHH002円\III | HHH002円 | III |
| HHH002円\IV | HHH002円 | IV |
| HHH002円\V | HHH002円 | V |
| HHH003円\I | HHH003円 | I |
| HHH003円\II | HHH003円 | II |
| HHH003円\III | HHH003円 | III |
| HHH003円\IV | HHH003円 | IV |
| HHH003円\V | HHH003円 | V |
| HHH004円\I | HHH004円 | I |
| HHH004円\II | HHH004円 | II |
| HHH004円\III | HHH004円 | III |
| HHH004円\IV | HHH004円 | IV |
| HHH004円\V | HHH004円 | V |
| HHH005円\I | HHH005円 | I |
| HHH005円\II | HHH005円 | II |
| HHH005円\III | HHH005円 | III |
| HHH005円\IV | HHH005円 | IV |
| HHH005円\V | HHH005円 | V |
| HHH006円\I | HHH006円 | I |
| HHH007円\I | HHH007円 | I |
| HHH007円\II | HHH007円 | II |
| HHH007円\III | HHH007円 | III |
| HHH008円\I | HHH008円 | I |
| HHH008円\II | HHH008円 | II |
| HHH008円\III | HHH008円 | III |
| HHH008円\IV | HHH008円 | IV |
| HHH008円\V | HHH008円 | V |
+-------------+----------+----------+

Question 9

Interesting approach :). I like it. For the sake of completeness, can you please add the query to populate the Numbers table (from the SQL Fiddle), or just include that inline as part of the CTE? Then it is easier for someone to just copy and paste. For this answer, can the final output be expressed as each row being a complete path from Level1 down to Level3 for all Level3 values? I think that would take just 2 INNER JOINs in the final SELECT. Finally, can names / labels be assigned to each node so they aren't just numbers? I will update the question to clarify both of these points.

Question 10

Where do these names / labels come from? Should I have a 'Names' table, which has 20 rows and choose a name from it? Thus the same set of names would appear on each level. Or should each level has its own separate set of names?

Question 11

I think that the names can come from either a table (temp, real, or variable) or inline as part of the CTE. I originally put them in the CTE but then moved them to a local temp table so that the main part of the query would be more readable here. I think with the structure that you have, it would be easy enough to have separate per level. But if it was just one set of 20 that would also suffice, it just provide's slightly less variation in the test data. The only true requirement is that no name repeats within a node as that would error when trying to create the directories or files :).

Question 12

@srutzky, I added a second variant.

Question 13

@srutzky, I split FullPath into FilePath and FileName.

Question 14

So here is what I came up with. With the goal of creating a directory structure, I was looking for usable "names" for the directories and files. Because I was unable to get the TOP(n) working in the CROSS APPLYs (I think I attempted to correlate the queries by using a value from the parent as the n in the TOP(n) but then it wasn't random), I decided to create a type of "numbers" table that would allow an INNER JOIN or WHERE condition to produce a set of n elements simply by randomizing a number and specifying it as WHERE table.Level = random_number. The trick is that there is only 1 row for Level1, 2 rows for Level2, 3 rows for Level3, and so on. Hence, using WHERE LevelID = 3 will get me 3 rows, and each row has a value that I can use as a directory name.

SETUP

This part was originally specified inline, as part of the CTE. But for the sake of readability (so that you don't need to scroll through lots of INSERT statements to get to the few lines of the real query), I broke it out into a local temporary table.

IF (OBJECT_ID(N'tempdb..#Elements') IS NULL)
BEGIN
 PRINT 'Creating #Elements table...';
 CREATE TABLE #Elements (
 ElementLevel TINYINT NOT NULL,
 LevelName NVARCHAR(50) NOT NULL
 );
 PRINT 'Populating #Elements table...';
 INSERT INTO #Elements (ElementLevel, LevelName)
 SELECT tmp.[Level], tmp.[Name]
 FROM (
 SELECT 1, N'Ella'
 UNION ALL SELECT 2, N'Itchy'
 UNION ALL SELECT 2, N'Scratchy'
 UNION ALL SELECT 3, N'Moe'
 UNION ALL SELECT 3, N'Larry'
 UNION ALL SELECT 3, N'Curly'
 UNION ALL SELECT 4, N'Ian'
 UNION ALL SELECT 4, N'Stephen'
 UNION ALL SELECT 4, N'Peter'
 UNION ALL SELECT 4, N'Bernard'
 UNION ALL SELECT 5, N'Michigan'
 UNION ALL SELECT 5, N'Erie'
 UNION ALL SELECT 5, N'Huron'
 UNION ALL SELECT 5, N'Ontario'
 UNION ALL SELECT 5, N'Superior'
 UNION ALL SELECT 6, N'White'
 UNION ALL SELECT 6, N'Orange'
 UNION ALL SELECT 6, N'Blonde'
 UNION ALL SELECT 6, N'Pink'
 UNION ALL SELECT 6, N'Blue'
 UNION ALL SELECT 6, N'Brown'
 UNION ALL SELECT 7, N'Asia'
 UNION ALL SELECT 7, N'Africa'
 UNION ALL SELECT 7, N'North America'
 UNION ALL SELECT 7, N'South America'
 UNION ALL SELECT 7, N'Antarctica'
 UNION ALL SELECT 7, N'Europe'
 UNION ALL SELECT 7, N'Australia'
 UNION ALL SELECT 8, N'AA'
 UNION ALL SELECT 8, N'BB'
 UNION ALL SELECT 8, N'CC'
 UNION ALL SELECT 8, N'DD'
 UNION ALL SELECT 8, N'EE'
 UNION ALL SELECT 8, N'FF'
 UNION ALL SELECT 8, N'GG'
 UNION ALL SELECT 8, N'HH'
 UNION ALL SELECT 9, N'I'
 UNION ALL SELECT 9, N'II'
 UNION ALL SELECT 9, N'III'
 UNION ALL SELECT 9, N'IV'
 UNION ALL SELECT 9, N'V'
 UNION ALL SELECT 9, N'VI'
 UNION ALL SELECT 9, N'VII'
 UNION ALL SELECT 9, N'VIII'
 UNION ALL SELECT 9, N'IX'
 UNION ALL SELECT 10, N'Million'
 UNION ALL SELECT 10, N'Billion'
 UNION ALL SELECT 10, N'Trillion'
 UNION ALL SELECT 10, N'Quadrillion'
 UNION ALL SELECT 10, N'Quintillion'
 UNION ALL SELECT 10, N'Sestillion'
 UNION ALL SELECT 10, N'Sextillion'
 UNION ALL SELECT 10, N'Octillion'
 UNION ALL SELECT 10, N'Nonillion'
 UNION ALL SELECT 10, N'Decillion'
 ) tmp([Level], [Name]);
END;

MAIN QUERY

For Level 1 I just grabbed [name] values out of sys.objects since there are always plenty of rows there. But, if I needed more control over the names, I could just expand the #Elements table to contain additional levels.

;WITH topdir(Level1, Randy) AS
(
 SELECT TOP ( (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 20) + 5 ) so.[name],
 ( (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 10) + 1 )
 FROM sys.objects so
 ORDER BY CRYPT_GEN_RANDOM(8) ASC
)
SELECT td.Level1, tmp1.Level2, tmp2.Level3
FROM topdir td
CROSS APPLY (SELECT help.LevelName, (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 5) + 1
 FROM #Elements help
 WHERE help.ElementLevel = td.Randy
 ) tmp1 (Level2, Bandy)
CROSS APPLY (SELECT help.LevelName
 FROM #Elements help
 WHERE help.ElementLevel = tmp1.Bandy
 ) tmp2 (Level3);

QUERY ADAPTED TO PRODUCE EACH FILE'S PATH, NAME, and CONTENTS

In order to generate the full paths for the files and the file contents, I made the main SELECT of the CTE just another CTE and added a new main SELECT that gave the proper outputs that simply need to go into files.

DECLARE @Template NVARCHAR(4000);
SET @Template = N'<?xml version="1.0" encoding="ISO-8859-1"?>
<ns0:P4131 xmlns:ns0="http://switching/xi">
<R000000>
 <R00000010>R000000</R00000010>
 <R00000020>I</R00000020>
 <R00000030>{{Tag30}}</R00000030>
 <R00000040>{{Tag40}}</R00000040>
 <R00000050>{{Tag50}}</R00000050>
 <R00000060>2</R00000060>
</R000000>
</ns0:P4131>
';
;WITH topdir(Level1, Thing1) AS
(
 SELECT TOP ( (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 20) + 5 ) so.[name],
 ( (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 10) + 1 )
 FROM sys.objects so
 ORDER BY CRYPT_GEN_RANDOM(8) ASC
), main AS
(
 SELECT td.Level1, tmp1.Level2, tmp2.Level3,
 td.Level1 + N'\' + tmp1.Level2 AS [FullPath],
 RIGHT('000' + CONVERT(VARCHAR(10),
 (CONVERT(INT, CRYPT_GEN_RANDOM(2)) % 9999) + 1), 4) AS [R30],
 RIGHT('000' + CONVERT(VARCHAR(10),
 (CONVERT(INT, CRYPT_GEN_RANDOM(2)) % 500) + 100), 4) AS [R50],
 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [RowNum]
 FROM topdir td
 CROSS APPLY (SELECT help.LevelName, (CONVERT(INT, CRYPT_GEN_RANDOM(1)) % 5) + 1
 FROM #Elements help
 WHERE help.ElementLevel = td.Thing1
 ) tmp1 (Level2, Thing2)
 CROSS APPLY (SELECT help.LevelName
 FROM #Elements help
 WHERE help.ElementLevel = tmp1.Thing2
 ) tmp2 (Level3)
)
SELECT mn.FullPath,
 mn.Level3 + N'.xml' AS [FileName],
 REPLACE(
 REPLACE(
 REPLACE(
 @Template,
 N'{{Tag30}}',
 mn.R30),
 N'{{Tag40}}',
 mn.RowNum),
 N'{{Tag50}}',
 mn.R50) AS [Contents]
FROM main mn;

EXTRA CREDIT

While not part of the requirements stated in the question, the goal (which was mentioned) was to create files to test recursive File System functions with. So how do we take this result set of path names, file names, and file contents and do something with it? We just need two SQLCLR functions: one to create the folders and one to create the files.

In order to make this data functional, I modified the main SELECT of the CTE shown directly above as follows:

SELECT SQL#.File_CreateDirectory(
 N'C:\Stuff\TestXmlFiles\' + mn.FullPath) AS [CreateTheDirectory],
 SQL#.File_WriteFile(
 N'C:\Stuff\TestXmlFiles\' + mn.FullPath + N'\' + mn.Level3 + N'.xml',
 REPLACE(
 REPLACE(
 REPLACE(
 @Template,
 N'{{Tag30}}',
 mn.R30),
 N'{{Tag40}}',
 mn.RowNum),
 N'{{Tag50}}',
 mn.R50), -- @FileData
 0, -- @AppendData
 '' -- @FileEncoding
 ) AS [WriteTheFile]
FROM main mn;

score 9 · Accepted Answer · 2015-10-09 08:06:44Z

^{(O.P.'s note: preferred solution is the 4th / final code block)}

XML seems to me to be the obvious choice of data structure to use here.

with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select top(5 + abs(checksum(newid())) % 15)
 N1.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 10)
 N2.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 5)
 N3.N as '@Value'
 from N as N3
 where N2.N > 0
 for xml path('Level3'), type
 )
 from N as N2
 where N1.N > 0
 for xml path('Level2'), type
 )
from N as N1
for xml path('Level1'), root('Root');

The trick to make SQL Server use different values for top() for each node is to make the sub-queries correlated. N1.N > 0 and N2.N > 0.

Flatteing the XML:

declare @X xml;
with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select @X = (
 select top(5 + abs(checksum(newid())) % 15)
 N1.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 10)
 N2.N as '@Value',
 (
 select top(1 + abs(checksum(newid())) % 5)
 N3.N as '@Value'
 from N as N3
 where N2.N > 0
 for xml path('Level3'), type
 )
 from N as N2
 where N1.N > 0
 for xml path('Level2'), type
 )
 from N as N1
 for xml path('Level1')
 );
select L1.X.value('@Value', 'varchar(10)')+'\'+
 L2.X.value('@Value', 'varchar(10)')+'\'+
 L3.X.value('@Value', 'varchar(10)')
from @X.nodes('/Level1') as L1(X)
 cross apply L1.X.nodes('Level2') as L2(X)
 cross apply L2.X.nodes('Level3') as L3(X);

And a version totally void of XML.

with N as
(
 select T.N
 from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
 (12),(13),(14),(15),(16),(17),(18),(19),(20)) as T(N)
)
select cast(N1.N as varchar(10))+'\'+
 cast(N2.N as varchar(10))+'\'+
 cast(N3.N as varchar(10))
from (
 select top(5 + abs(checksum(newid())) % 15)
 N.N
 from N
 ) as N1
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 10)
 N.N
 from N
 where N1.N > 0
 ) as N2
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 5)
 N.N
 from N
 where N2.N > 0
 ) as N3;

Correlation N1.N > 0 and N2.N > 0 is still important.

A version using a table with 20 names to be used instead of just integers.

declare @Elements table
(
 Name nvarchar(50) not null
);
insert into @Elements(Name)
select top(20) C.name 
from sys.columns as C
group by C.name;
select N1.Name + N'\' + N2.Name + N'\' + N3.Name
from (
 select top(5 + abs(checksum(newid())) % 15)
 E.Name
 from @Elements as E
 ) as N1
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 10)
 E.Name
 from @Elements as E
 where N1.Name > ''
 ) as N2
 cross apply
 (
 select top(1 + abs(checksum(newid())) % 5)
 E.Name
 from @Elements as E
 where N2.Name > ''
 ) as N3;

I like the new version better. It is almost the same thing I came up with in my first attempt but for some reason I couldn't get the TOP(n) to work correctly in the 2 CROSS APPLYs. Not sure what I did differently / incorrectly since I got rid of that code once I got something else working. I will post that soon, now that you have provided this update. And I have cleaned up most of my comments above.
I just posted my version. The main differences are: 1) since I couldn't get the TOP(n) to work, I went with getting n elements via a WHERE condition, and 2) I have the name component which is more controlled than randomizing directory and/or file names.
Sorry to be away for so long, but I've been crazy busy. Still, I've been thinking about this and can't decide between my answer and your non-XML version. I like the simplicity & flexibility of yours, but need the ability to return names to use to create a folder structure, which mine has. Then I realized I had Vlad update his to have a lookup table and JOIN to it to give the ideal output. So, if not inappropriate to ask, could you please update yours to include that same lookup? Then all 3 answers would give equivalent output (ideal for comparing all 3), and I would accept yours. Is that ok?
@srutzky I have updated the answer. Was a while ago so hope I got it right and what you where looking for. You can of course add a level column to @Elemets to get a different set of names for each level to choose from.
@srutzky no worries. I'm glad the answer was helpful for you.

Stack Exchange Network

Create hierarchy of multiple levels where each node has a random number of children

3 Answers 3

Generating full path instead of linked hierarchy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Create hierarchy of multiple levels where each node has a random number of children

3 Answers 3

Generating full path instead of linked hierarchy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions