I have a function which gets called very often, for this test 30,720,000 times. It has to run in real-time, so performance is very important. Are there possible improvements?
// This function looks up in the vertice table wheter a vertice at that position already exists
// - If yes, return the vertice index
// - If no, create a new vertice and return the created vertice index
//
// @param cfg: Pointer to a chunkConfig struct
// @param x: x position of the vertice
// @param y: y position of the vertice
inline VERTICE_INDEX GetOrCreateVertice(float x, float y, ChunkGenerator::chunkConfig *cfg)
{
int x_pos = int((x*POSARR_ADJ_F) + 0.5f);
int y_pos = int((y*POSARR_ADJ_F) + 0.5f);
int dict_index = x_pos + (y_pos * cfg->subdivisions_adj);
VERTICE_INDEX dict_entry = cfg->vertice_pool.vertice_dict[dict_index];
VERTICE_INDEX current_index = cfg->vertice_pool.current_vertice_index;
if (dict_entry >= 0 && dict_entry < 65535 && current_index > 0)
return dict_entry;
LVecBase3f* offset = cfg->base_position;
LVecBase2f* dim = cfg->dimensions;
LVecBase2f* tex_offset = cfg->texture_offset;
LVecBase2f* tex_scale = cfg->texture_scale;
int pool_index = ((int)current_index) * 5;
float base_scale = 1.0 / (cfg->subdivisions_f-1.0);
float x_scaled = x * base_scale;
float y_scaled = y * base_scale;
cfg->vertice_pool.vertice_array[pool_index+0] = offset->get_x() + (x_scaled * dim->get_x());
cfg->vertice_pool.vertice_array[pool_index+1] = offset->get_y() + (y_scaled * dim->get_y());
cfg->vertice_pool.vertice_array[pool_index+2] = offset->get_z();
cfg->vertice_pool.vertice_array[pool_index+3] = tex_offset->get_x() + (x_scaled * tex_scale->get_x());
cfg->vertice_pool.vertice_array[pool_index+4] = tex_offset->get_y() + (y_scaled * tex_scale->get_y());
cfg->vertice_pool.vertice_dict[dict_index] = current_index;
cfg->vertice_pool.current_vertice_index++;
return current_index;
}
LVecBase3f
and LVecBase2f
are vector-types provided by the graphics-engine I use. VERTICE_INDEX
is a unsigned short
, POSARR_ADJ
and POSARR_ADJ_F
is constant 2
.
This is the chunkConfig struct:
struct chunkConfig {
int subdivisions;
int subdivisions_adj;
float subdivisions_f;
LVecBase3f *base_position;
LVecBase2f *dimensions;
LVecBase2f *texture_offset;
LVecBase2f *texture_scale;
verticePool vertice_pool;
};
struct verticePool {
VERTICE_INDEX current_vertice_index;
VERTICE_INDEX current_primitive_index;
VERTICE_INDEX * vertice_dict;
float *vertice_array;
VERTICE_INDEX *primitive_array;
};
Performance result measured by very-sleepy: link
Version based on the suggestions made in the comments: very-sleepy, AMD CodeAnalyst, and the assembler for the slow-line: generated assembler, I also made some arrays global, and renamed "vertice" to "vertex".
1 Answer 1
It looks to me like your function is actually doing 2 things:
- It's checking to see if an entry exists in the dictionary, and if so return it
- If it doesn't exist, then it creates a new one
So I would break it into 2 functions:
VERTICE_INDEX FindVerticeInDict(const float x, const float y,
const ChunkGenerator::chunkConfig& cfg, VERTICE_INDEX& current_index);
VERTICE_INDEX GetOrCreateVertice(const float x, const float y,
ChunkGenerator::chunkConfig *cfg);
Note that I made x
and y
constants because they aren't changed by either function. Also, in the Find
method, I made the cfg
argument const
because it's not being changed there either. It is modified in GetOrCreate
though, so I left it as a pointer. (It could also be a non-const reference, if you prefer that style.)
I would create a named constant for the "entry not found" value:
const VERTICE_INDEX ENTRY_NOT_FOUND = 0xFFFF;
So it would look something like this:
VERTICE_INDEX FindVerticeInDict(const float x, const float y,
const ChunkGenerator::chunkConfig& cfg, VERTICE_INDEX& current_index)
{
int x_pos = int((x*POSARR_ADJ_F) + 0.5f);
int y_pos = int((y*POSARR_ADJ_F) + 0.5f);
int dict_index = x_pos + (y_pos * cfg.subdivisions_adj);
VERTICE_INDEX dict_entry = cfg.vertice_pool.vertice_dict[dict_index];
current_index = cfg.vertice_pool.current_vertice_index;
if (dict_entry >= 0 && dict_entry < ENTRY_NOT_FOUND && current_index > 0)
return dict_entry;
return ENTRY_NOT_FOUND;
}
For your vertex array, you really should create a struct that contains the members you're going to use rather than looking them up in a 1D array of float
s. It's error prone, hard-to-read, and hard-to-maintain the way you're doing it now. I'd create a struct like this:
typedef struct Vertex {
Point3D vertex_coord; // This might be the same as LVecBase3f?
Point2D texture_coord; // This might be the same as LVecBase2f?
} Vertex;
Where Point3D
is just:
typedef struct Point3D {
float x;
float y;
float z;
} Point3D;
and Point2D
just has an x
and y
float values. Doing this will allow you to remove the magical pool_index = ((int)current_index) * 5;
, and have names for the values you're writing.
Also, you don't need to use the inline
directive. The compiler can decide to ignore it, and it can decide to make something you didn't mark as inline inline. So there's not a lot of reason to use it.
You can reduce the verbosity of your code, and possibly reduce the number of pointer dereferences the compiler produces by getting the address of the vertex you want and assigning all its members through that one pointer.
Given the above, your function would now look like this:
VERTICE_INDEX GetOrCreateVertice(const float x, const float y,
ChunkGenerator::chunkConfig *cfg)
{
VERTICE_INDEX current_index = 0;
VERTICE_INDEX dict_entry = FindVerticeInDict(x, y, *cfg, current_index);
if (dict_entry != ENTRY_NOT_FOUND)
return dict_entry;
LVecBase3f* offset = cfg->base_position;
LVecBase2f* dim = cfg->dimensions;
LVecBase2f* tex_offset = cfg->texture_offset;
LVecBase2f* tex_scale = cfg->texture_scale;
float base_scale = 1.0 / (cfg->subdivisions_f-1.0);
float x_scaled = x * base_scale;
float y_scaled = y * base_scale;
Vertex* vertex = &cfg->vertice_pool.vertice_arry [ current_index ];
vertex->vertex_coord.x = offset->get_x() + (x_scaled * dim->get_x());
vertex->vertex_coord.y = offset->get_y() + (y_scaled * dim->get_y());
vertex->vertex_coord.z = offset->get_z();
vertex->texture_coord.x = tex_offset->get_x() + (x_scaled * tex_scale->get_x());
vertex->texture_coord.y = tex_offset->get_y() + (y_scaled * tex_scale->get_y());
cfg->vertice_pool.vertice_dict[dict_index] = current_index;
cfg->vertice_pool.current_vertice_index++;
return current_index;
}
dict_index
takes a whole second, assumingsubdivisions
andPOSARR_ADJ
are integers. \$\endgroup\$