Using Renderling to render a big, complex, dynamic world. #140
-
Let me describe what I'm doing, and ask if Renderling can get me there.
Here's some video from my Sharpview metaverse viewer. This is an all-new, all Rust client for Second Life and Open Simulator.
Four years of work so far. The graphics stack is Rend3 -> EGUI -> WGPU -> Vulkan.
Rend3 has been abandoned. I'm maintaining a copy, but it doesn't have much lighting, and has no environmental reflections. There are serious performance problems. Thus my interest in Renderling.
Renderling has much of what I need in a renderer. Here's what I need:
Features
- Basic PBR (albedo, roughness, metallic, emissive) and normals.
- High dynamic range rendering.
- Full sun shadows.
- A small number of additional lights with shadows.
- Large numbers of non-shadow-casting lights.
- Environmental reflections.
- Mirrors would be nice, but only one mirror at a time needs to be fully functional.
- Support for Windows, Linux, and MacOS. Do not need Android or WebAssembly.
Performance
- Goal: 60 FPS on a scene 512 meters x 512 meters with about 25,000 meshes.
- Ability to load new content from other threads while rendering is in progress. This is a key requirement for a metaverse viewer. The Sharpview side of this is working now, with many threads pulling in assets from asset servers at several hundred megabits per second. But WGPU has only one queue to the GPU, and the main thread owns it. So asset loading severely impacts frame rate. The effect is that frame rate is good when the camera is stationary and content loading has caught up. But move around fast and the frame rate drops. Since you can get on a motorcycle and drive around at 100kph in the virtual world, this is a big problem.
- Ability to replace textures dynamically while rendering. All the textures will not fit in an 8GB GPU at the same time. So there are background threads busily replacing textures with larger or smaller textures as the camera moves. (This is not mip-mapping; that's different.) If you watch the video, this is happening constantly, but it's barely noticeable. The way Renderling currently does bindless might not work for this.
Can Renderling potentially do this? If so, how far in the future? Thanks.
-
Here's an image from Sharpview. That's the kind of content I'm dealing with. You can walk into that shop and closely examine the objects on sale.
-
Tried loading the famous Bistro scene, from the Dropbox linked at https://www.reddit.com/r/godot/comments/gxccc9/been_playing_with_an_optimized_version_of_the/
Result:
ERROR example: gltf loading error: Cannot pack textures.
-
Yes, I haven't tackled the requirements of the Bistro scene. It's going to require increasing the size of the atlas, increasing the number of layers in the atlas and possibly using block compression, which is on the roadmap.
-
Thanks for your interest @John-Nagle.
Though I've been working on renderling for a while, it's not ready for prime time. I would say that it's still in an alpha state, at best. I'm still getting the features working; I haven't yet moved on to getting them right, and further off still is making them fast. That said, I can answer your bullet points. Here's your list, with the features renderling has currently finished marked as complete:
- Basic PBR (albedo, roughness, metallic, emissive) and normals.
- High dynamic range rendering.
- Full sun shadows.
- A small number of additional lights with shadows.
- Large numbers of non-shadow-casting lights.
- Environmental reflections.
- Mirrors would be nice, but only one mirror at a time needs to be fully functional.
- Support for Windows, Linux, and MacOS. Do not need Android or WebAssembly.
Shadow mapping is in the works; I should have an initial implementation by the end of the year. Reflections/mirrors aren't on the roadmap yet, but of course I'd like to have them.
As far as your performance requirements - who knows?
- 25,000 meshes
I think it would depend on the size of the geometry of each mesh.

- Ability to load new content from other threads while rendering is in progress

This is what the Stage object's main focus has been. You can load content from any thread and it will get synchronized to the GPU every frame tick.

- Ability to replace textures dynamically while rendering

The Atlas object is fully capable of doing this. You might have to hack around a bit though, because adding a new image to or removing one from the Atlas requires repacking all the images and generating a new texture, which is expensive. I think you'd probably want to reserve an image's worth of space and then manage that "texture" yourself.
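For that reserve-a-slot approach, here's a rough sketch of overwriting a reserved atlas region in place with wgpu. The function and the slot coordinates are hypothetical, not part of renderling's API, and wgpu has since renamed these copy types:

```rust
/// Overwrite a reserved region of the atlas texture in place, avoiding a
/// repack. A sketch only: `slot_origin` is assumed to be the reserved
/// frame's top-left corner and layer, and the pixels are assumed RGBA8.
fn overwrite_atlas_slot(
    queue: &wgpu::Queue,
    atlas_texture: &wgpu::Texture,
    slot_origin: wgpu::Origin3d,
    rgba_pixels: &[u8],
    width: u32,
    height: u32,
) {
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture: atlas_texture,
            mip_level: 0,
            origin: slot_origin,
            aspect: wgpu::TextureAspect::All,
        },
        rgba_pixels,
        wgpu::ImageDataLayout {
            offset: 0,
            bytes_per_row: Some(4 * width),
            rows_per_image: Some(height),
        },
        wgpu::Extent3d {
            width,
            height,
            depth_or_array_layers: 1,
        },
    );
}
```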
I'm glad you're digging in!
-
See WGPU: Bindless tracking issue
I wrote up how I think that should work. Comments appreciated. Probably more useful to comment at that issue.
I update content from separate threads, not the rendering thread. So that kind of buffer separation and interlocking is needed. The big engines, like UE5, do this.
-
I update content from separate threads, not the rendering thread. So that kind of buffer separation and interlocking is needed.
Yes, Stage is capable of doing this. Stage derefs to SlabAllocator, which is a CPU/GPU slab allocator. All clones of Stage or SlabAllocator point to the same underlying data.
You can allocate values and arrays using SlabAllocator::new_value and SlabAllocator::new_array. This can be done from any thread. The data is synchronized to the GPU once per frame. Values can be evicted from the CPU and reside only on the GPU, or they can be hybrid. When values are dropped, the allocator reclaims their allocations; slots are recycled and defragmented.
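For illustration, a rough sketch of what that looks like (assuming Stage is Send and the method names above; exact signatures may differ):

```rust
use std::thread;

// A sketch: `stage` is a renderling Stage created on the main thread.
// Clones of Stage (and of the SlabAllocator it derefs to) all point at
// the same underlying slab, so allocation can happen on any thread.
fn spawn_loader(stage: Stage) {
    thread::spawn(move || {
        // Stage derefs to SlabAllocator, so new_value/new_array work here.
        let scale = stage.new_value(2.0f32);
        let indices = stage.new_array([0u32, 1, 2]);
        // The data is synchronized to the GPU once per frame tick; dropping
        // `scale` or `indices` lets the allocator reclaim their slots.
        drop((scale, indices));
    });
}
```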
-
OK. So if we can get WGPU to support bindless, is there anything else needed to support textures of multiple sizes?
What do you need from WGPU?
This is looking encouraging.
(Incidentally, my textures always have power-of-2 dimensions, ranging from 1x1 to 2048x2048 pixels. The most common size is 256x256. Textures are sometimes not square; 256x512 is often used.)
-
Well, the Atlas type abstracts over the lack of arrays-of-textures by providing an API that allows you to add, drop, and swap images, etc. So at the moment, having textures of multiple sizes works just fine. The difference is that when writing a shader, you bind the entire atlas and then work with AtlasTexture to sample the layer and frame that correspond to your specific image, instead of binding each image as its own texture. See the impl of crates/renderling/src/pbr.rs:texture_color for an example.
Long story short - renderling has this "bindless" textures feature already, but when wgpu gains arrays of textures the entire Atlas type can be refactored away. I do await that moment anxiously, though - it can't happen fast enough.
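For a feel of the remapping involved, here's a toy sketch (a hypothetical helper, not renderling's actual shader code - see texture_color for the real thing):

```rust
use glam::{UVec2, Vec2};

/// Remap a mesh UV (0..1 within its own image) into atlas coordinates,
/// given the frame's placement from its `AtlasTexture`. The layer index
/// then selects which array layer of the atlas to sample.
fn atlas_uv(offset_px: UVec2, size_px: UVec2, atlas_size_px: UVec2, uv: Vec2) -> Vec2 {
    (offset_px.as_vec2() + uv * size_px.as_vec2()) / atlas_size_px.as_vec2()
}
```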
-
Ah. Almost everything I have is bigger than Bistro, which, as noted above, won't currently fit into Renderling. Something needs a scale-up. Agree that we need to get WGPU to implement more bindless support. Or port to Vulkano instead of WGPU.
-
Something needs a scale-up
Yes. You can increase the size and depth of the atlas's internal texture from Context. Context is the first thing you create in a renderling program.
Apart from that, I do have BCn compression on the roadmap, which will help. But yes, you're correct, renderling can't currently run Bistro by default - as I stated in the README, it's alpha software! :) It will get there in time, though :)
-
How are things going? Is this project still active?
-
Very much so! I'm currently working on light tiling, and then finishing up any loose ends from this past year's NLnet funding work. After that I'll be figuring out what to work on next, depending on the state of this upcoming year's funding (or lack thereof).
My focus for this next year will be on creating a path for migrating from Rend3 to Renderling, global illumination and texture improvements.
-
Great to hear!
One thing I've been thinking about is how to handle many shadow-casting lights. There needs to be some culling phase where the set of objects that can occlude each light is computed at a coarse level - roughly, something like the sketch below, per light. For, say, a small indoor light, that might cut 50,000 objects down to 10 or so. Then the cost per light is reasonable. Have you looked at that problem?
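A sketch, with made-up types:

```rust
use glam::Vec3;

struct BoundingSphere {
    center: Vec3,
    radius: f32,
}

/// Coarse cull: keep only the objects whose bounding spheres intersect a
/// point light's sphere of influence. For a small indoor light this should
/// cut tens of thousands of candidates down to a handful of occluders.
fn occluders_for_light(
    light_pos: Vec3,
    light_radius: f32,
    objects: &[BoundingSphere],
) -> Vec<usize> {
    objects
        .iter()
        .enumerate()
        .filter(|(_, b)| b.center.distance(light_pos) <= light_radius + b.radius)
        .map(|(i, _)| i)
        .collect()
}
```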
-
Indeed I have been thinking about that, and working on it! Light tiling does address this, as well as occlusion culling (which is still a work in progress). After the light tiling work is done you should be able to query the GPU for a list of objects that each light illuminates, which can then be used to filter the objects used to update the shadow maps from the CPU. If light tiling is enabled I think it might be possible to reuse the light tiling buffer to update shadow maps without having to do that round trip. Internally on the GPU those same lists/buffers are used to trim the number of lights that might be illuminating the object subject to the render call.
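In sketch form, that CPU-side filtering might look like this (hypothetical types; the real lists would come from the light tiling buffers):

```rust
use std::collections::HashMap;

type LightId = u32;
type ObjectId = u32;

/// Update each shadow-casting light's map using only the objects the GPU
/// reported as illuminated by that light, instead of the whole scene.
fn update_shadow_maps(
    shadow_lights: &[LightId],
    illuminated: &HashMap<LightId, Vec<ObjectId>>, // read back from the GPU
    mut render_shadow_map: impl FnMut(LightId, &[ObjectId]),
) {
    for &light in shadow_lights {
        if let Some(objects) = illuminated.get(&light) {
            render_shadow_map(light, objects);
        }
    }
}
```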
-
Any interest in switching from WGPU to Vulkano? Vulkano can do bindless.
There's an egui integration for Vulkano now, so that's covered.
MacOS support has been made to work for Vulkano, using MoltenVK.
Vulkano has some spatial filtering functions to help with lights vs. objects culling. WGPU seems to lack that.
You give up Android and web browser support, of course. But you can't do game-speed, high-detail 3D graphics on those platforms anyway.
-
Reading through the code. Some comments.
Renderling notes
John Nagle
November 20, 2024
Renderling is a new renderer in Rust, sitting above WGPU.
It looks promising.
Looking at the basic types.
These are the main data structures. Most renderers have something roughly similar.
Vertex
https://github.com/schell/renderling/blob/main/crates/renderling/src/stage.rs#L99
```rust
pub struct Vertex {
    pub position: Vec3,
    pub color: Vec4,
    pub uv0: Vec2,
    pub uv1: Vec2,
    pub normal: Vec3,
    pub tangent: Vec4,
    /// Indices that point to this vertex's 'joint' transforms.
    pub joints: [u32; 4],
    /// The weights of influence that each joint has over this vertex.
    pub weights: [f32; 4],
}
```
The usual fields.
72 bytes of basics (position 12 + color 16 + uv0 8 + uv1 8 + normal 12 + tangent 16), plus 32 bytes of skinning data (joints 16 + weights 16): 104 bytes per vertex. This wastes a lot of space for non-rigged meshes.
The "joints" array seems to hold indices into an array of joint transforms.
This corresponds to Vertex in Rend3.
Renderlet
crates/renderling/src/stage.rs, line 200 at commit 61eecac:
```rust
pub struct Renderlet {
    pub visible: bool,
    pub vertices_array: Array<Vertex>,
    /// Bounding sphere of the entire renderlet, in local space.
    pub bounds: BoundingSphere,
    pub indices_array: Array<u32>,
    pub camera_id: Id<Camera>,
    pub transform_id: Id<Transform>,
    pub material_id: Id<Material>,
    pub skin_id: Id<Skin>,
    pub morph_targets: Array<Array<MorphTarget>>,
    pub morph_weights: Array<f32>,
    pub pbr_config_id: Id<PbrConfig>,
}
```
This corresponds to Object in Rend3. As with Rend3, dropping the Renderlet removes the item from the visible scene. Or so say the comments. Can't find a drop function for Renderlet, so that may not be implemented yet.
Material
https://github.com/schell/renderling/blob/main/crates/renderling/src/pbr.rs#L32
```rust
pub struct Material {
    pub emissive_factor: Vec3,
    pub emissive_strength_multiplier: f32,
    pub albedo_factor: Vec4,
    pub metallic_factor: f32,
    pub roughness_factor: f32,
    pub albedo_texture_id: Id<AtlasTexture>,
    pub metallic_roughness_texture_id: Id<AtlasTexture>,
    pub normal_texture_id: Id<AtlasTexture>,
    pub ao_texture_id: Id<AtlasTexture>,
    pub emissive_texture_id: Id<AtlasTexture>,
    pub albedo_tex_coord: u32,
    pub metallic_roughness_tex_coord: u32,
    pub normal_tex_coord: u32,
    pub ao_tex_coord: u32,
    pub emissive_tex_coord: u32,
    pub has_lighting: bool,
    pub ao_strength: f32,
}
```
This corresponds to Material in Rend3.
AtlasTexture
https://github.com/schell/renderling/blob/main/crates/renderling/src/atlas.rs#L38
```rust
pub struct AtlasTexture {
    /// The top left offset of the texture in the atlas.
    pub offset_px: UVec2,
    /// The size of the texture in the atlas.
    pub size_px: UVec2,
    /// The index of the layer within the atlas that this `AtlasTexture` belongs to.
    pub layer_index: u32,
    /// The index of this frame within the layer.
    pub frame_index: u32,
    /// Various toggles of texture modes.
    pub modes: TextureModes,
}
```
This corresponds to Texture in Rend3, but instead of being a separate texture, it's a region within one giant atlas texture. That needs to change for proper bindless operation.
NestedTransform
crates/renderling/src/stage/cpu.rs, line 1024 at commit 61eecac:
```rust
pub struct NestedTransform {
    global_transform: Gpu<Transform>,
    local_transform: Arc<RwLock<Transform>>,
    children: Arc<RwLock<Vec<NestedTransform>>>,
    parent: Arc<RwLock<Option<NestedTransform>>>,
}
```
There's a hierarchy of transforms. Nodes contain an ID into an array of these.
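A sketch of how such a hierarchy could resolve a global transform, assuming Transform is Copy and composes by multiplication (presumably the cached result is what global_transform holds):

```rust
fn global_of(node: &NestedTransform) -> Transform {
    let local = *node.local_transform.read().unwrap();
    match node.parent.read().unwrap().as_ref() {
        Some(parent) => global_of(parent) * local,
        None => local,
    }
}
```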
Misc. questions and notes
- Is Z up, or is Y up?
- There are many arrays which have to be allocated and kept in sync in this system. All this index stuff means that Rust's built-in checks don't help much.
- Interlocked drop functions are needed at several places.
- Updating is mostly allocating Vecs for each type of structure and doing pushes while loading initial content.
- Dynamic scene updating has never been done with this code and there is little sign of support for it. Adding it will require resolving some interlocking problems.
- The general approach seems to be to copy giant arrays from CPU memory to GPU memory on every frame. This may be ill-suited to a dynamic world.
- There's promise here, but a lot of work remains between where this is now and usability for complex, frequently-updated scenes.
-
Hi @John-Nagle, it's great to see you digging into Renderling! I'll try to address your comments as best I can.
Vertex
104 bytes per vertex. This wastes a lot of space for non-rigged meshes.
Yes, it's not the most compact representation. There's a trade-off between ergonomics and size here. There's also the fact that these all live in one buffer, so either the weights live in the vertex or a pointer to the weights lives in the vertex, and the pointer would still mean another 8 bytes per vertex of data in the buffer, total.
But I see what you're getting at. If there are no weights this is indeed wasted space.
Renderlet
This corresponds to Object in Rend3. As with Rend3, dropping the Renderlet removes the item from the visible scene. Or so say the comments. Can't find a drop function for Renderlet, so that may not be implemented yet.
I haven't used Rend3 - I didn't know this concept exists elsewhere! Makes sense!
About dropping - on the CPU side you'll very rarely be working with these structs "in the raw", so to speak. Everything is staged on the GPU using a SlabAllocator, which wraps these structs in Hybrid or Gpu wrappers. The former is a value that lives on both the CPU and the GPU; the latter lives only on the GPU (meaning the data has been evicted from the CPU cache). The Stage maintains a weak reference to these Hybrid renderlets and recycles the ones that have been dropped. That's why there's no drop function. The upkeep is all handled by Stage.
Misc. questions
Is Z up, or is Y up?
Typically Y is up, but that's my convention and it depends on your matrices.
There are many arrays which have to be allocated and kept in sync in this system. All this index stuff means that Rust's built-in checks don't help much.
There's really only one storage buffer and it's managed by Stage. You can find more information in the docs/code for Stage, SlabAllocator and also the library crabslab. The shaders use the indices to read from the storage buffer and the shaders are really the only place where raw structs are handled.
But yes, if you were to implement your own model loading or scene generation, you'd have to use the Stage to place the structs on the GPU and then make sure they point to the correct places, but it's not hard in practice. crabslab handles most of it for you.
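The underlying idea in miniature (a toy sketch, not crabslab's actual API, which generalizes this over whole structs with a SlabItem trait):

```rust
use core::marker::PhantomData;

/// A typed pointer into the slab: just an offset, counted in u32s.
struct Id<T>(u32, PhantomData<T>);

/// The slab is one flat buffer of u32s; every staged struct serializes
/// into some contiguous run of them.
struct Slab(Vec<u32>);

impl Slab {
    /// Read back an `f32` stored at `id`.
    fn read_f32(&self, id: Id<f32>) -> f32 {
        f32::from_bits(self.0[id.0 as usize])
    }
}
```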
Interlocked drop functions are needed at several places.
Not sure about this.
Updating is mostly allocating Vecs for each type of structure and doing pushes while loading initial content.
This shouldn't be necessary, as Hybrid will do this for you. Typically you would do something like this:
```rust
let stage = context.new_stage();
let hybrid_struct: Hybrid<MyStruct> = stage.new_value(MyStruct { .. });
// Then later you can update it, and the change is synchronized to the GPU
// in the upkeep phase of `Stage::render`.
hybrid_struct.modify(|ms| { ms.field0 = 42; });
```
There's promise here, but a lot of work remains between where this is now and usability for complex, frequently-updated scenes.
Thank you! And definitely! I'm still very much in the early stages.
-
Thanks.
I'm trying to figure out how to get to true bindless. See my note on GPU buffer allocation vs. safety boundary.
I'd appreciate comments on that.
It's easy to design yourself into a corner where rendering is slow and no backwards-compatible fix will work. I'm starting to suspect that Vulkano and WGPU both did that.
-
True bindless likely won't happen in wgpu for a good while - at least a year plus, in my estimation (I hope I'm wrong). Web is a necessity for me, which is why I'm targeting wgpu and have used this atlas workaround.
I'd appreciate comments on that.
I mean, it seems you know what these pitfalls are - have you tried writing your own renderer with ash?
I'm sure eventually wgpu will be ready in a way that it satisfies your requirements, but that won't be for a while, it seems. I'm not sure my opinions and comments will help move your cause forward, as my concerns are orthogonal to yours, I think.
Let me be clear that I do want to be able to render complex and dynamic scenes - we are aligned there. It's just that I'm fine working within the current boundaries and creating temporary work-arounds until the underlying tech is ready. I'm personally not equipped (yet) to dive deeper into the graphics stack than wgpu, nor would I want to, as I'm not done learning at my current level, and diving deeper would distract me from meeting my current goals to deliver renderling to 1.0.
-
Google just today announced their plans for WebGPU in Chrome. WGPU has been sort of "Vulkan lite", and Google is preparing to add more mainstream Vulkan features, including two major performance fixes.
Please take a look at that update. Eventually, WGPU will have to support that.
-
Hot dog! Funny how top-of-mind it is for all of us. I look forward to that. Thanks for the heads up!
-
Thanks. There's a complexity ceiling on how much you can render in all the current Rust renderers, and it's lower than the hardware can support. You see this when main thread time hits 100% of one CPU and the GPU is maybe 25% busy. Profiling shows too much time going into bookkeeping for binding and depth sorting.
I think a key idea is 1) to update the descriptor table and buffer allocation from the same level, and 2) to use that to handle most buffer locking. If this is sorted out properly, you should be able to write to a texture buffer that is currently not mapped to the GPU, independently of anything else the GPU is doing.
In my own system I have about a dozen threads busily loading content from the network, mostly textures. But getting them into the GPU takes locks that stall rendering during the data transfer. Vulkan doesn't require that, but the WGPU and Rend3 layers impose sequencing constraints through locks and queues. So the frame rate drops badly (like 60 FPS to 10 FPS) during content updates, and content updates happen constantly when the player moves around.
I'm thinking that bindless buffer allocation, as I've outlined, could simplify the synchronization problem. If the unit of locking is one texture buffer, that maps well to concurrent texture loading.
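A sketch of what I mean, with made-up types - the point is that the lock granularity is a single texture:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Mutex,
};

/// One texture's staging slot. Loader threads replacing texture A never
/// block the render thread, which only touches A's data during `flush`.
struct TextureSlot {
    staging: Mutex<Option<Vec<u8>>>, // replacement pixels, if any
    dirty: AtomicBool,               // set by loaders, cleared by the renderer
}

impl TextureSlot {
    /// Called from any loader thread.
    fn replace(&self, pixels: Vec<u8>) {
        *self.staging.lock().unwrap() = Some(pixels);
        self.dirty.store(true, Ordering::Release);
    }

    /// Called once per frame on the render thread; uploads only if dirty.
    fn flush(&self, upload: impl FnOnce(&[u8])) {
        if self.dirty.swap(false, Ordering::Acquire) {
            if let Some(pixels) = self.staging.lock().unwrap().take() {
                upload(&pixels);
            }
        }
    }
}
```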
I'm trying to avoid working below the renderer level. Below that, you have to have lots of target machines for testing, or a team of testers. That's a full time job in itself. Right now, the same code runs on Windows, Linux, and Mac, using cross-platform stuff developed by others.
-
WGPU seems to support bindless mode now. I think. At least on non-web platforms. Or so the WGPU people say. What's your take on that?
I have a test program called render-bench which I use to test Rend3/WGPU. It basically generates a dummy city, then adds and removes buildings to see how that impacts timing. Is Renderling far enough along to try doing that?
I'd like to have a shim which converts the Rend3 API to the Renderling API. Once all objects deallocate cleanly on drop, and textures can be any size, that should work. How far off is that? Thanks.
-
WGPU seems to support bindless mode now. I think. At least on non-web platforms. Or so the WGPU people say. What's your take on that?
I haven't read about it yet; I'll have to check. As far as I know, the only thing lacking was support for arrays of textures.
I have a test program called render-bench which I use to test Rend3/WGPU. It basically generates a dummy city, then adds and removes buildings to see how that impacts timing. Is Renderling far enough along to try doing that?
I'd be interested to see how far you can get with Renderling, but its API is still very much in flux and so I would worry that you might put in wasted effort, only for Renderling's setup to change drastically.
Once all objects deallocate cleanly on drop, and textures can be any size, that should work. How far off is that?
Those things should be supported right now. Hybrid<T> and HybridArray<T> (and respectively Gpu<T> and GpuArray<T>) should deallocate cleanly on drop, though for the dropped GPU buffer memory to become available you have to first call Stage::tick. And a texture can be any size, up to the limits determined by wgpu::Adapter, which is runtime specific.
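In other words, something like this (a sketch; exact signatures may differ):

```rust
let value = stage.new_value(42.0f32); // allocates a slot on the slab
drop(value);                          // the slot is now reclaimable...
stage.tick();                         // ...and upkeep makes the GPU memory available again
```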
-
The WGPU situation for bindless is ... complicated. See the WGPU bindless tracking issue.
I'd be interested to see how far you can get with Renderling, but its API is still very much in flux and so I would worry that you might put in wasted effort, only for Renderling's setup to change drastically.
OK. Then I will hold off for now. A rough API spec would help. There aren't that many calls needed, really. Look at the Rend3 API; it's pretty simple. Three.js is more widely used and has a more complex API, but it's mostly the same thing underneath, plus lots of convenience functions such as draw_sphere, draw_torus, etc. You're already pretty close to the Rend3 API.
-
Thanks @John-Nagle, I've created an issue to explore writing a migration guide #155.