Using Renderling to render a big, complex, dynamic world. #140
-
Let me describe what I'm doing, and ask if Renderling can get me there.
Here's some video from my Sharpview metaverse viewer. This is an all-new, all Rust client for Second Life and Open Simulator.
Four years of work so far. The graphics stack is Rend3 -> EGUI -> WGPU -> Vulkan.
Rend3 has been abandoned. I'm maintaining a copy, but it doesn't have much lighting, and has no environmental reflections. There are serious performance problems. Thus my interest in Renderling.
Renderling has much of what I need in a renderer. Here's what I need:
Features
- Basic PBR (albedo, roughness, metallic, emissive) and normals.
- High dynamic range rendering.
- Full sun shadows.
- A small number of additional lights with shadows.
- Large numbers of non-shadow-casting lights.
- Environmental reflections.
- Mirrors would be nice, but only one mirror at a time needs to be fully functional.
- Support for Windows, Linux, and MacOS. Do not need Android or WebAssembly.
Performance
- Goal: 60 FPS on a scene 512 meters x 512 meters with about 25,000 meshes.
- Ability to load new content from other threads while rendering is in progress. This is a key requirement for a metaverse viewer. The Sharpview side of this is working now, with many threads pulling in assets from asset servers at several hundred megabits per second. But WGPU has only one queue to the GPU, and the main thread owns it. So asset loading severely impacts frame rate. The effect is that frame rate is good when the camera is stationary and content loading has caught up. But move around fast and the frame rate drops. Since you can get on a motorcycle and drive around at 100kph in the virtual world, this is a big problem.
- Ability to replace textures dynamically while rendering. All the textures will not fit in an 8GB GPU at the same time. So there are background threads busily replacing textures with larger or smaller textures as the camera moves. (This is not mip-mapping; that's different.) If you watch the video, this is happening constantly, but it's barely noticeable. The way Renderling currently does bindless might not work for this.
Can Renderling potentially do this? If so, how far in the future? Thanks.
-
Here's an image from Sharpview. That's the kind of content I'm dealing with. You can walk into that shop and closely examine the objects on sale.
-
Tried loading the famous Bistro scene, from the Dropbox linked at https://www.reddit.com/r/godot/comments/gxccc9/been_playing_with_an_optimized_version_of_the/
Result:
ERROR example: gltf loading error: Cannot pack textures.
-
Yes, I haven't tackled the requirements of the Bistro scene. It's going to require increasing the size of the atlas, increasing the number of layers in the atlas and possibly using block compression, which is on the roadmap.
-
Thanks for your interest @John-Nagle.
Though I've been working on renderling for a while, it's not ready for prime time. I would say that it's still in an alpha state, at best. I'm still getting the features working; I haven't yet moved on to getting them right, and further off still is making them fast. That said, I can answer your bullet points. Here's your list, with the features renderling has currently finished marked as complete:
- Basic PBR (albedo, roughness, metallic, emissive) and normals.
- High dynamic range rendering.
- Full sun shadows.
- A small number of additional lights with shadows.
- Large numbers of non-shadow-casting lights.
- Environmental reflections.
- Mirrors would be nice, but only one mirror at a time needs to be fully functional.
- Support for Windows, Linux, and MacOS. Do not need Android or WebAssembly.
Shadow mapping is in the works; I should have an initial implementation by the end of the year. Reflections/mirrors aren't on the roadmap yet, but of course I'd like to have them.
As far as your performance requirements - who knows?
- 25,000 meshes
I think it would depend on the size of the geometry of each mesh.

- Ability to load new content from other threads while rendering is in progress

This is what the Stage object's main focus has been. You can load content from any thread and it will get synchronized to the GPU every frame tick.

- Ability to replace textures dynamically while rendering

The Atlas object is fully capable of doing this. You might have to hack around a bit though, because adding a new image to or removing one from the Atlas requires repacking all the images and generating a new texture, which is expensive. I think you'd probably want to reserve an image's worth of space and then manage that "texture" yourself.
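For that reserve-a-slot approach, here's a rough sketch of overwriting a reserved atlas region in place with wgpu. The function and the slot coordinates are hypothetical, not part of renderling's API, and wgpu has since renamed these copy types:

```rust
/// Overwrite a reserved region of the atlas texture in place, avoiding a
/// repack. A sketch only: `slot_origin` is assumed to be the reserved
/// frame's top-left corner and layer, and the pixels are assumed RGBA8.
fn overwrite_atlas_slot(
    queue: &wgpu::Queue,
    atlas_texture: &wgpu::Texture,
    slot_origin: wgpu::Origin3d,
    rgba_pixels: &[u8],
    width: u32,
    height: u32,
) {
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture: atlas_texture,
            mip_level: 0,
            origin: slot_origin,
            aspect: wgpu::TextureAspect::All,
        },
        rgba_pixels,
        wgpu::ImageDataLayout {
            offset: 0,
            bytes_per_row: Some(4 * width),
            rows_per_image: Some(height),
        },
        wgpu::Extent3d {
            width,
            height,
            depth_or_array_layers: 1,
        },
    );
}
```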
I'm glad you're digging in!
-
See WGPU: Bindless tracking issue
I wrote up how I think that should work. Comments appreciated. Probably more useful to comment at that issue.
I update content from separate threads, not the rendering thread. So that kind of buffer separation and interlocking is needed. The big engines, like UE5, do this.
-
I update content from separate threads, not the rendering thread. So that kind of buffer separation and interlocking is needed.
Yes, Stage is capable of doing this. Stage derefs to SlabAllocator, which is a CPU/GPU slab allocator. All clones of Stage or SlabAllocator point to the same underlying data.
You can allocate values and arrays using SlabAllocator::new_value and SlabAllocator::new_array. This can be done from any thread. The data is synchronized to the GPU once per frame. Values can be evicted from the CPU and reside only on the GPU, or they can be hybrid. When values are dropped, the allocator reclaims their allocations; slots are recycled and defragmented.
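For illustration, a rough sketch of what that looks like (assuming Stage is Send and the method names above; exact signatures may differ):

```rust
use std::thread;

// A sketch: `stage` is a renderling Stage created on the main thread.
// Clones of Stage (and of the SlabAllocator it derefs to) all point at
// the same underlying slab, so allocation can happen on any thread.
fn spawn_loader(stage: Stage) {
    thread::spawn(move || {
        // Stage derefs to SlabAllocator, so new_value/new_array work here.
        let scale = stage.new_value(2.0f32);
        let indices = stage.new_array([0u32, 1, 2]);
        // The data is synchronized to the GPU once per frame tick; dropping
        // `scale` or `indices` lets the allocator reclaim their slots.
        drop((scale, indices));
    });
}
```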
-
OK. So if we can get WGPU to support bindless, is there anything else needed to support textures of multiple sizes?
What do you need from WGPU?
This is looking encouraging.
(Incidentally, my textures always have power-of-2 dimensions, ranging from 1x1 to 2048x2048 pixels. The most common size is 256x256. Textures are sometimes not square; 256x512 is often used.)
-
Well, the Atlas type abstracts over the lack of arrays-of-textures by providing an API that allows you to add, drop, and swap images, etc. So at the moment, having textures of multiple sizes works just fine. The difference is that when writing a shader, you bind the entire atlas and then work with AtlasTexture to sample the layer and frame that correspond to your specific image, instead of binding each image as its own texture. See the impl of crates/renderling/src/pbr.rs:texture_color for an example.
Long story short - renderling has this "bindless" textures feature already, but when wgpu gains arrays of textures the entire Atlas type can be refactored away. I do await that moment anxiously, though - it can't happen fast enough.
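For a feel of the remapping involved, here's a toy sketch (a hypothetical helper, not renderling's actual shader code - see texture_color for the real thing):

```rust
use glam::{UVec2, Vec2};

/// Remap a mesh UV (0..1 within its own image) into atlas coordinates,
/// given the frame's placement from its `AtlasTexture`. The layer index
/// then selects which array layer of the atlas to sample.
fn atlas_uv(offset_px: UVec2, size_px: UVec2, atlas_size_px: UVec2, uv: Vec2) -> Vec2 {
    (offset_px.as_vec2() + uv * size_px.as_vec2()) / atlas_size_px.as_vec2()
}
```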
-
Ah. Almost everything I have is bigger than Bistro, which, as noted above, won't currently fit into Renderling. Something needs a scale-up. Agree that we need to get WGPU to implement more bindless support. Or port to Vulkano instead of WGPU.
-
Something needs a scale-up
Yes. You can increase the size and depth of the atlas's internal texture from Context. Context is the first thing you create in a renderling program.
Apart from that, I do have BCn compression on the roadmap, which will help. But yes, you're correct, renderling can't currently run Bistro by default - as I stated in the README, it's alpha software! :) It will get there in time, though :)
-
How are things going? Is this project still active?
-
Very much so! I'm currently working on light tiling, and then finishing up any loose ends from this past year's NLnet funding work. After that I'll be figuring out what to work on next, depending on the state of this upcoming year's funding (or lack thereof).
My focus for this next year will be on creating a path for migrating from Rend3 to Renderling, global illumination and texture improvements.
-
Great to hear!
One thing I've been thinking about is how to handle many shadow-casting lights. There needs to be some culling phase where the set of objects that can occlude each light is computed at a coarse level - roughly, something like the sketch below, per light. For, say, a small indoor light, that might cut 50,000 objects down to 10 or so. Then the cost per light is reasonable. Have you looked at that problem?
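A sketch, with made-up types:

```rust
use glam::Vec3;

struct BoundingSphere {
    center: Vec3,
    radius: f32,
}

/// Coarse cull: keep only the objects whose bounding spheres intersect a
/// point light's sphere of influence. For a small indoor light this should
/// cut tens of thousands of candidates down to a handful of occluders.
fn occluders_for_light(
    light_pos: Vec3,
    light_radius: f32,
    objects: &[BoundingSphere],
) -> Vec<usize> {
    objects
        .iter()
        .enumerate()
        .filter(|(_, b)| b.center.distance(light_pos) <= light_radius + b.radius)
        .map(|(i, _)| i)
        .collect()
}
```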
-
Indeed I have been thinking about that, and working on it! Light tiling does address this, as well as occlusion culling (which is still a work in progress). After the light tiling work is done you should be able to query the GPU for a list of objects that each light illuminates, which can then be used to filter the objects used to update the shadow maps from the CPU. If light tiling is enabled I think it might be possible to reuse the light tiling buffer to update shadow maps without having to do that round trip. Internally on the GPU those same lists/buffers are used to trim the number of lights that might be illuminating the object subject to the render call.
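In sketch form, that CPU-side filtering might look like this (hypothetical types; the real lists would come from the light tiling buffers):

```rust
use std::collections::HashMap;

type LightId = u32;
type ObjectId = u32;

/// Update each shadow-casting light's map using only the objects the GPU
/// reported as illuminated by that light, instead of the whole scene.
fn update_shadow_maps(
    shadow_lights: &[LightId],
    illuminated: &HashMap<LightId, Vec<ObjectId>>, // read back from the GPU
    mut render_shadow_map: impl FnMut(LightId, &[ObjectId]),
) {
    for &light in shadow_lights {
        if let Some(objects) = illuminated.get(&light) {
            render_shadow_map(light, objects);
        }
    }
}
```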
-
Any interest in switching from WGPU to Vulkano? Vulkano can do bindless.
There's an egui integration for Vulkano now, so that's covered.
MacOS support has been made to work for Vulkano, using MoltenVK.
Vulkano has some spatial filtering functions to help with lights vs. objects culling. WGPU seems to lack that.
You give up Android and web browser support, of course. But you can't do game-speed, high-detail 3D graphics on those platforms anyway.
-
Reading through the code. Some comments.
Renderling notes
John Nagle
November 20, 2024
Renderling is a new renderer in Rust, sitting above WGPU.
It looks promising.
Looking at the basic types.
These are the main data structures. Most renderers have something roughly similar.
Vertex
https://github.com/schell/renderling/blob/main/crates/renderling/src/stage.rs#L99
```rust
pub struct Vertex {
    pub position: Vec3,
    pub color: Vec4,
    pub uv0: Vec2,
    pub uv1: Vec2,
    pub normal: Vec3,
    pub tangent: Vec4,
    /// Indices that point to this vertex's 'joint' transforms.
    pub joints: [u32; 4],
    /// The weights of influence that each joint has over this vertex.
    pub weights: [f32; 4],
}
```
The usual fields.
72 bytes of basics (position 12 + color 16 + uv0 8 + uv1 8 + normal 12 + tangent 16), plus 32 bytes of skinning data (joints 16 + weights 16): 104 bytes per vertex. This wastes a lot of space for non-rigged meshes.
The "joints" array seems to hold indices into an array of joint transforms.
This corresponds to Vertex in Rend3.
Renderlet
crates/renderling/src/stage.rs, line 200 at commit 61eecac:
```rust
pub struct Renderlet {
    pub visible: bool,
    pub vertices_array: Array<Vertex>,
    /// Bounding sphere of the entire renderlet, in local space.
    pub bounds: BoundingSphere,
    pub indices_array: Array<u32>,
    pub camera_id: Id<Camera>,
    pub transform_id: Id<Transform>,
    pub material_id: Id<Material>,
    pub skin_id: Id<Skin>,
    pub morph_targets: Array<Array<MorphTarget>>,
    pub morph_weights: Array<f32>,
    pub pbr_config_id: Id<PbrConfig>,
}
```
This corresponds to Object in Rend3. As with Rend3, dropping the Renderlet removes the item from the visible scene. Or so say the comments. Can't find a drop function for Renderlet, so that may not be implemented yet.
Material
https://github.com/schell/renderling/blob/main/crates/renderling/src/pbr.rs#L32
```rust
pub struct Material {
    pub emissive_factor: Vec3,
    pub emissive_strength_multiplier: f32,
    pub albedo_factor: Vec4,
    pub metallic_factor: f32,
    pub roughness_factor: f32,
    pub albedo_texture_id: Id<AtlasTexture>,
    pub metallic_roughness_texture_id: Id<AtlasTexture>,
    pub normal_texture_id: Id<AtlasTexture>,
    pub ao_texture_id: Id<AtlasTexture>,
    pub emissive_texture_id: Id<AtlasTexture>,
    pub albedo_tex_coord: u32,
    pub metallic_roughness_tex_coord: u32,
    pub normal_tex_coord: u32,
    pub ao_tex_coord: u32,
    pub emissive_tex_coord: u32,
    pub has_lighting: bool,
    pub ao_strength: f32,
}
```
This corresponds to Material in Rend3.
AtlasTexture
https://github.com/schell/renderling/blob/main/crates/renderling/src/atlas.rs#L38
```rust
pub struct AtlasTexture {
    /// The top left offset of the texture in the atlas.
    pub offset_px: UVec2,
    /// The size of the texture in the atlas.
    pub size_px: UVec2,
    /// The index of the layer within the atlas that this `AtlasTexture` belongs to.
    pub layer_index: u32,
    /// The index of this frame within the layer.
    pub frame_index: u32,
    /// Various toggles of texture modes.
    pub modes: TextureModes,
}
```
This corresponds to Texture in Rend3, but instead of being a separate texture, it's a region within one giant atlas texture. That needs to change for proper bindless operation.
NestedTransform
crates/renderling/src/stage/cpu.rs, line 1024 at commit 61eecac:
```rust
pub struct NestedTransform {
    global_transform: Gpu<Transform>,
    local_transform: Arc<RwLock<Transform>>,
    children: Arc<RwLock<Vec<NestedTransform>>>,
    parent: Arc<RwLock<Option<NestedTransform>>>,
}
```
There's a hierarchy of transforms. Nodes contain an ID into an array of these.
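A sketch of how such a hierarchy could resolve a global transform, assuming Transform is Copy and composes by multiplication (presumably the cached result is what global_transform holds):

```rust
fn global_of(node: &NestedTransform) -> Transform {
    let local = *node.local_transform.read().unwrap();
    match node.parent.read().unwrap().as_ref() {
        Some(parent) => global_of(parent) * local,
        None => local,
    }
}
```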
Misc. questions and notes
- Is Z up, or is Y up?
- There are many arrays which have to be allocated and kept in sync in this system. All this index stuff means that Rust's built-in checks don't help much.
- Interlocked drop functions are needed at several places.
- Updating is mostly allocating Vecs for each type of structure and doing pushes while loading initial content.
- Dynamic scene updating has never been done with this code and there is little sign of support for it. Adding it will require resolving some interlocking problems.
- The general approach seems to be to copy giant arrays from CPU memory to GPU memory on every frame. This may be ill-suited to a dynamic world.
- There's promise here, but a lot of work remains between where this is now and usability for complex, frequently-updated scenes.
-
Hi @John-Nagle, it's great to see you digging into Renderling! I'll try to address your comments as best I can.
Vertex
104 bytes per vertex. This wastes a lot of space for non-rigged meshes.
Yes, it's not the most compact representation. There's a trade-off between ergonomics and size here. There's also the fact that these all live in one buffer, so either the weights live in the vertex or a pointer to the weights lives in the vertex, and the pointer would still mean another 8 bytes per vertex of data in the buffer, total.
But I see what you're getting at. If there are no weights this is indeed wasted space.
Renderlet
This corresponds to Object in Rend3. As with Rend3, dropping the Renderlet removes the item from the visible scene. Or so say the comments. Can't find a drop function for Renderlet, so that may not be implemented yet.
I haven't used Rend3 - I didn't know this concept exists elsewhere! Makes sense!
About dropping - on the CPU side you'll very rarely be working with these structs "in the raw", so to speak. Everything is staged on the GPU using a SlabAllocator, which wraps these structs in Hybrid or Gpu wrappers. The former is a value that lives on both the CPU and the GPU; the latter lives only on the GPU (meaning the data has been evicted from the CPU cache). The Stage maintains a weak reference to these Hybrid renderlets and recycles the ones that have been dropped. That's why there's no drop function. The upkeep is all handled by Stage.
Misc. questions
Is Z up, or is Y up?
Typically Y is up, but that's my convention and it depends on your matrices.
There are many arrays which have to be allocated and kept in sync in this system. All this index stuff means that Rust's built-in checks don't help much.
There's really only one storage buffer and it's managed by Stage. You can find more information in the docs/code for Stage, SlabAllocator and also the library crabslab. The shaders use the indices to read from the storage buffer and the shaders are really the only place where raw structs are handled.
But yes, if you were to implement your own model loading or scene generation, you'd have to use the Stage to place the structs on the GPU and then make sure they point to the correct places, but it's not hard in practice. crabslab handles most of it for you.
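The underlying idea in miniature (a toy sketch, not crabslab's actual API, which generalizes this over whole structs with a SlabItem trait):

```rust
use core::marker::PhantomData;

/// A typed pointer into the slab: just an offset, counted in u32s.
struct Id<T>(u32, PhantomData<T>);

/// The slab is one flat buffer of u32s; every staged struct serializes
/// into some contiguous run of them.
struct Slab(Vec<u32>);

impl Slab {
    /// Read back an `f32` stored at `id`.
    fn read_f32(&self, id: Id<f32>) -> f32 {
        f32::from_bits(self.0[id.0 as usize])
    }
}
```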
Interlocked drop functions are needed at several places.
Not sure about this.
Updating is mostly allocating Vecs for each type of structure and doing pushes while loading initial content.
This shouldn't be necessary, as Hybrid will do this for you. Typically you would do something like this:
```rust
let stage = context.new_stage();
let hybrid_struct: Hybrid<MyStruct> = stage.new_value(MyStruct { .. });
// Then later you can update it, and the change is synchronized to the GPU
// in the upkeep phase of `Stage::render`.
hybrid_struct.modify(|ms| { ms.field0 = 42; });
```
There's promise here, but a lot of work remains between where this is now and usability for complex, frequently-updated scenes.
Thank you! And definitely! I'm still very much in the early stages.
-
Thanks.
I'm trying to figure out how to get to true bindless. See my note on GPU buffer allocation vs. safety boundary.
I'd appreciate comments on that.
It's easy to design yourself into a corner where rendering is slow and no backwards-compatible fix will work. I'm starting to suspect that Vulkano and WGPU both did that.
-
True bindless likely won't happen in wgpu for a good while - at least a year plus, in my estimation (I hope I'm wrong). Web is a necessity for me, which is why I'm targeting wgpu and have used this atlas workaround.
I'd appreciate comments on that.
I mean, it seems you know what these pitfalls are - have you tried writing your own renderer with ash?
I'm sure eventually wgpu will be ready in a way that it satisfies your requirements, but that won't be for a while, it seems. I'm not sure my opinions and comments will help move your cause forward, as my concerns are orthogonal to yours, I think.
Let me be clear that I do want to be able to render complex and dynamic scenes - we are aligned there. It's just that I'm fine working within the current boundaries and creating temporary work-arounds until the underlying tech is ready. I'm personally not equipped (yet) to dive deeper into the graphics stack than wgpu, nor would I want to, as I'm not done learning at my current level, and diving deeper would distract me from meeting my current goals to deliver renderling to 1.0.
-
Google just today announced their plans for WebGPU in Chrome. WGPU has been sort of "Vulkan lite", and Google is preparing to add more mainstream Vulkan features, including two major performance fixes.
Please take a look at that update. Eventually, WGPU will have to support that.
-
Hot dog! Funny how top-of-mind it is for all of us. I look forward to that. Thanks for the heads up!
-
Thanks. There's a complexity ceiling on how much you can render in all the current Rust renderers, and it's lower than the hardware can support. You see this when main thread time hits 100% of one CPU and the GPU is maybe 25% busy. Profiling shows too much time going into bookkeeping for binding and depth sorting.
I think a key idea is 1) to update the descriptor table and buffer allocation from the same level, and 2) to use that to handle most buffer locking. If this is sorted out properly, you should be able to write to a texture buffer that is currently not mapped to the GPU, independently of anything else the GPU is doing.
In my own system I have about a dozen threads busily loading content from the network, mostly textures. But getting them into the GPU takes locks that stall rendering during the data transfer. Vulkan doesn't require that, but the WGPU and Rend3 layers impose sequencing constraints through locks and queues. So the frame rate drops badly (like 60 FPS to 10 FPS) during content updates, and content updates happen constantly when the player moves around.
I'm thinking that bindless buffer allocation, as I've outlined, could simplify the synchronization problem. If the unit of locking is one texture buffer, that maps well to concurrent texture loading.
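A sketch of what I mean, with made-up types - the point is that the lock granularity is a single texture:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Mutex,
};

/// One texture's staging slot. Loader threads replacing texture A never
/// block the render thread, which only touches A's data during `flush`.
struct TextureSlot {
    staging: Mutex<Option<Vec<u8>>>, // replacement pixels, if any
    dirty: AtomicBool,               // set by loaders, cleared by the renderer
}

impl TextureSlot {
    /// Called from any loader thread.
    fn replace(&self, pixels: Vec<u8>) {
        *self.staging.lock().unwrap() = Some(pixels);
        self.dirty.store(true, Ordering::Release);
    }

    /// Called once per frame on the render thread; uploads only if dirty.
    fn flush(&self, upload: impl FnOnce(&[u8])) {
        if self.dirty.swap(false, Ordering::Acquire) {
            if let Some(pixels) = self.staging.lock().unwrap().take() {
                upload(&pixels);
            }
        }
    }
}
```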
I'm trying to avoid working below the renderer level. Below that, you have to have lots of target machines for testing, or a team of testers. That's a full time job in itself. Right now, the same code runs on Windows, Linux, and Mac, using cross-platform stuff developed by others.
-
WGPU seems to support bindless mode now. I think. At least on non-web platforms. Or so the WGPU people say. What's your take on that?
I have a test program called render-bench which I use to test Rend3/WGPU. It basically generates a dummy city, then adds and removes buildings to see how that impacts timing. Is Renderling far enough along to try doing that?
I'd like to have a shim which converts the Rend3 API to the Renderling API. Once all objects deallocate cleanly on drop, and textures can be any size, that should work. How far off is that? Thanks.
-
WGPU seems to support bindless mode now. I think. At least on non-web platforms. Or so the WGPU people say. What's your take on that?
I haven't read about it yet; I'll have to check. As far as I know, the only thing lacking was support for arrays of textures.
I have a test program called render-bench which I use to test Rend3/WGPU. It basically generates a dummy city, then adds and removes buildings to see how that impacts timing. Is Renderling far enough along to try doing that?
I'd be interested to see how far you can get with Renderling, but its API is still very much in flux and so I would worry that you might put in wasted effort, only for Renderling's setup to change drastically.
Once all objects deallocate cleanly on drop, and textures can be any size, that should work. How far off is that?
Those things should be supported right now. Hybrid<T> and HybridArray<T> (and respectively Gpu<T> and GpuArray<T>) should deallocate cleanly on drop, though for the dropped GPU buffer memory to become available you have to first call Stage::tick. And a texture can be any size, up to the limits determined by wgpu::Adapter, which is runtime specific.
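In other words, something like this (a sketch; exact signatures may differ):

```rust
let value = stage.new_value(42.0f32); // allocates a slot on the slab
drop(value);                          // the slot is now reclaimable...
stage.tick();                         // ...and upkeep makes the GPU memory available again
```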
-
The WGPU situation for bindless is ... complicated. See the WGPU bindless tracking issue.
I'd be interested to see how far you can get with Renderling, but its API is still very much in flux and so I would worry that you might put in wasted effort, only for Renderling's setup to change drastically.
OK. Then I will hold off for now. A rough API spec would help. There aren't that many calls needed, really. Look at the Rend3 API; it's pretty simple. Three.js is more widely used and has a more complex API, but it's mostly the same thing underneath, plus lots of convenience functions such as draw_sphere, draw_torus, etc. You're already pretty close to the Rend3 API.
-
Thanks @John-Nagle, I've created an issue to explore writing a migration guide #155.