For a raytracer I’ve been writing with a classmate, we use a custom scene definition format that allows specifying shapes, composite shapes, materials, lights, cameras and transform and render commands.
The assets of a scene (i.e. shapes, composite shapes, materials, lights and cameras) are stored in the data transfer object Scene
. The task of this parser is to take a text file with the scene definition, parse its rules and transform them into objects which populate a Scene
object.
A few pieces of the code I left out of simplicity, because I want the review to focus on the parsing strategy. Here is a list of these things and what they do.
Logger
: A simple logger class (a singleton) that prints messages when in debug mode. It’s also possible to generate a log file of the currently stored messagesload_file(file_path)
: Takes a path as a string, loads the file and returns its content as a stringsanitize_inputs(file_string)
: Removes leading/trailing spaces, consecutive spaces, comments (starting with#
, anywhere in the line) and empty linessplit(file_string, delimiter)
: Takes a string and splits it by a character. Returns a vector of stringsorder_rule_vector(rule_vector)
: Some rules in the scene definition format depend on other rules. To allow the user to write them out in any order, the vector of rules is partitioned first so that rules with dependencies come after their dependants
Also I left out the code for parsing lights, cameras, transform commands and render commands, because the strategy is identical to what’s shown in the code below.
Scene Definition Format
Our raytracer will implement the Phong reflection model. Currently there are regular spheres and axis-aligned boxes.
# name, ambient, diffuse, specular terms, shininess
define material red 1 0 0 1 0 0 1 0 0 1
# name, min, max positions, material name
define shape box b1 -100 -80 -20 1002 80 -100 red
# name, center position, radius, material name
define shape sphere s1 0 0 -100 50 blue
# name, shape(s)
define shape composite c1 b1 s1
The Parser
std::shared_ptr<Scene> load_scene(std::string const& file_path) {
std::shared_ptr<Scene> scene = std::make_shared<Scene>();
auto logger = Logger::instance();
logger->log("-- Start. Parsing scene: " + file_path);
std::string file_string = load_file(file_path);
sanitize_inputs(file_string);
std::vector<std::string> rule_vector = split(file_string, '\n');
order_rule_vector(rule_vector);
std::istringstream iss{ file_string };
for (auto&& rule : rule_vector) {
std::istringstream rule_stream{ rule };
std::string command;
rule_stream >> command;
if (command == "define") {
std::string define_target;
rule_stream >> define_target;
if (define_target == "material") {
std::string mat_name;
rule_stream >> mat_name;
if (scene->materials.find(mat_name) != scene->materials.end()) {
logger->log("Warning: Duplicate material '" + mat_name + "' was skipped.");
continue;
}
Color ambient, diffuse, specular;
double shininess;
rule_stream >> ambient.r;
rule_stream >> ambient.g;
rule_stream >> ambient.b;
rule_stream >> diffuse.r;
rule_stream >> diffuse.g;
rule_stream >> diffuse.b;
rule_stream >> specular.r;
rule_stream >> specular.g;
rule_stream >> specular.b;
rule_stream >> shininess;
logger->log("--- Adding material: " + mat_name);
auto material = std::make_shared<Material>(
mat_name, ambient, diffuse, specular, shininess
);
if (!scene->materials.insert({ mat_name, material }).second) {
logger->log("Error: Material '" + mat_name
+ "' wasn't added to the scene object.");
}
} else if (define_target == "shape") {
std::string shape_type, shape_name;
rule_stream >> shape_type;
rule_stream >> shape_name;
std::shared_ptr<Shape> shape;
if(shape_type == "sphere") {
glm::vec3 center;
double radius = 1.0;
rule_stream >> center.x;
rule_stream >> center.y;
rule_stream >> center.z;
rule_stream >> radius;
std::string mat_name;
rule_stream >> mat_name;
std::shared_ptr<Material> material;
if(scene->materials.find(mat_name) == scene->materials.end()) {
logger->log("--- Warning: Material '" + mat_name
+ "' not found. Using default material instead.");
material = std::make_shared<Material>();
} else {
material = scene->materials[mat_name];
}
shape = std::make_shared<Sphere>(shape_name, material, center, radius);
} else if(shape_type == "box") {
glm::vec3 min, max;
rule_stream >> min.x;
rule_stream >> min.y;
rule_stream >> min.z;
rule_stream >> max.x;
rule_stream >> max.y;
rule_stream >> max.z;
std::string mat_name;
rule_stream >> mat_name;
std::shared_ptr<Material> material;
if(scene->materials.find(mat_name) == scene->materials.end()) {
logger->log("--- Warning: Material '" + mat_name
+ "' not found. Using default material instead.");
material = std::make_shared<Material>();
} else {
material = scene->materials[mat_name];
}
shape = std::make_shared<Box>(shape_name, material, min, max);
} else if (shape_type == "composite") {
std::string child_token;
Composite comp{ shape_name };
while (std::getline(rule_stream, child_token, ' ')) {
// Leading/trailing spaces result in empty strings
// being part of the tokens
if (child_token == "") {
continue;
}
auto search_it = scene->shapes.find(child_token);
if (search_it == scene->shapes.end()) {
logger->log("Error: Shape '" + child_token + "' for composite "
+ comp.name() + " not found.");
continue;
}
if (!comp.add(search_it->second)) {
logger->log("Error: Shape '" + child_token
+ "' was not added to composite " + comp.name());
continue;
}
}
} else {
logger->log("--- Warning: Skipping unknown shape type " + shape_type);
continue;
}
logger->log("--- Adding shape " + shape_name);
if (!scene->shapes.insert({ shape_name, shape }).second) {
logger->log("Error: Shape " + shape_name
+ " wasn't added to the scene object.");
}
} else if (define_target == "light") {
// Parse lights
} else if (define_target == "camera") {
// Parse cameras
} else {
logger->log("--- Warning: Skipping unknown define target " + define_target);
continue;
}
} else if (command == "transform") {
// Parse transform commands
} else if (command == "render") {
// Parse render commands
} else {
logger->log("--- Skipping unknown command " + command);
continue;
}
}
logger->log("-- Success. Scene parsed completely.");
return scene;
}
I don’t like that code. It’s a branch nightmare. I write rule_stream >>
a lot. Also there are especially redundant pieces in there which I have to refactor.
An example would be parsing the shapes, more precisely which materials they use. I check a map for the materials name and use the pointer that’s stored there. The same code appears twice (once for spheres, once for boxes). That’s bad. One idea to solve this is to put the material name right after the shapes name. Then I can check for existance of the material before branching into the sphere and box cases.
Are there better approaches to parse such simple, regular languages like our scene definition format? What else can I improve?
1 Answer 1
Getting rid of the branches
I don’t like that code. It’s a branch nightmare.
Yes, let's get rid of the branches. Consider that all the branches are in the form of:
if (command == "foo") {
// parse rule_stream further
...
}
When you have a sequence of homogenous if
/else
statements like that, there are alternatives to consider. One would be a switch
-statement, but those unfortunately only work with integers and enum
s. Another option would be to create a map from strings to std::functions:
using ParseFunction = std::function<void(std::istream&, Scene*)>;
static const std::unordered_map<std::string, ParseFunction> commands = {
{"define", parse_define},
{"transform", parse_transform},
{"render", parse_render},
};
Then inside load_scene()
, you would write something like:
std::string command;
rule_stream >> command;
auto it = commands.find(command);
if (it != commands.end()) {
auto parse_function = it->second;
parse_function(rule_stream, scene->get());
} else {
logger->log("--- Skipping unknown command " + command);
continue;
}
Then you can define a function for each command, for example:
void parse_define(std::istream& rule_stream, Scene* scene) {
std::string define_target;
rule_stream >> define_target;
...
scene->materials.insert(...);
...
}
That shows how to get rid of the outer if
s. You can also do the same for targets and shapes.
Avoiding writing rule_stream >>
I write
rule_stream >>
a lot.
This is because you are reading each individual value. You can write your own overloads of operator>>()
for the types you are using. For example:
std::istream& operator>>(std::istream& is, glm::vec3 &v) {
return is >> v.x >> v.y >> v.z;
}
And then you can write:
glm::vec3 min, max;
rule_stream >> min >> max;
Split up large functions
Another reason your code looks bad is because everything is in one huge function. It is easy to lose overview. The techniques I've shown above already introduce new functions that in turn simplify your load_scene()
. But even without those you could have split up the function yourself:
static void parse_define(std::istream& rule_stream, Scene* scene) {
...
}
...
static void parse_rule(const std::string& rule, Scene* scene) {
std::istringstream rule_stream{ rule };
std::string command;
rule_stream >> command;
if (command == "define") {
parse_define(rule_stream, scene):
} else if (command == "transform") {
...
} ...
};
std::shared_ptr<Scene> load_scene(std::string const& file_path) {
...
for (auto&& rule : rule_vector) {
parse_rule(rule, scene->get());
}
return scene;
}
Are the std::shared_ptr
s necessary?
I see you use std::shared_ptr
s a lot. But are they really necessary? You could have load_scene()
return a Scene
directly; even without C++17's guaranteed copy elision, if Scene
has a move constructor, it could move all its contents efficiently to the caller. If you do need a smart pointer because of polymorphism, consider whether you can just use astd::unique_ptr
instead of the more expensive std::shared_ptr
.
Let's have a look at scene->materials
. This looks like a std::map
or std::unordered_map
. These containers will already do memory management for you, and pointers to values stored in the map are guaranteed to be stable. So if it was defined as:
struct Scene {
...
std::unordered_map<std::string, Material> materials;
...
};
The code in your parser could be simplified a lot:
std::string mat_name;
Color ambient, diffuse, specular;
float shininess;
rule_stream >> mat_name >> ambient >> diffuse >> specular >> shininess;
auto result = scene->materials.try_emplace(
mat_name, mat_name, ambient, diffuse, specular, shininess
);
if (!result.second) {
logger->log("Error: Material '" + mat_name + "' wasn't added to the scene object.");
}
(I used C++17's try_emplace()
here, you can achieve the same in other ways in earlier versions of C++, but less elegantly).
If you ensure any material used by shapes is in scene->materials
before the shapes using them are added, then you can just store a reference to the Material
in a Shape
instead of using a std::shared_ptr
.
If you have a default constructor and an operator>>
overload for Material
, then you could also consider writing:
Material mat;
rule_stream >> mat;
if (scene->materials.try_emplace(mat.name(), mat) {
logger->log("Error: Material '" + mat.name() + "' wasn't added to the scene object.");
}
Do you really need to order the rules first?
You are ordering the rules before parsing them. That means you have to read the whole file in memory first. But do you really need that? I see you are already handling missing materials, by assigning a default constructed Material
to a Shape
. But why not add that default Material
under the given name to scene->materials
, and then when you later do encounter the rule defining that Material
, just update it in place?
if (define_target == "shape") {
if (shape_type == "sphere") {
...
auto& material = scene->materials[mat_name];
shape = std::make_unique<Sphere>(shape_name, material /* reference */, center, radius);
}
...
} else if (define_target == "material") {
...
auto& material = scene->materials[mat_name];
material = {mat_name, ambient, diffuse, specular, shininess};
} ...
You can then do error checking afterwards: any material in scene->materials
that is still just default constructed was added by a "shape"
target, but there was never a matching "material"
target.
About parser generators
Dan Oberlam suggested Boost Spirit. This allows defining a grammar for a language completely inside C++. There are also external tools that can generate a parser written in C++ for you, like GNU Bison. These can be very helpful, especially for more complex grammars. The one you are trying to parse looks simple enough though that it probably is easier to write your own parser in C++.
class Parser
with all needed parameters and create a series ofclass ParserMaterial
...class ParserShape
inherited fromclass Parser
? \$\endgroup\$