On the limitations of the id Tech 3 renderer
Posted: Mon Jul 14, 2008 3:05 pm
As per Tourist-tam's request.
I mainly conducted practical experiments with a lot of objects being rendered simultaneously.
The test system is a 3 GHz Pentium 4 on an Intel i915G chipset-based mainboard with 1.5 GB of DDR400 RAM and a Galaxy GeForce 8600 GT with 256 MB of DDR RAM. It can pull off Crysis on High Quality settings at an average of 30 FPS.
I created maps consisting of a single small room filled with multiple objects (either detail brushes or entities, to ensure no impact on VIS) that generated the required number of triangles and vertices. I deliberately used single-stage shaders to minimize the workload. The results were rather disappointing: once the number of triangles rendered exceeded about 50,000, the FPS started to drop dramatically, falling to 24 at a triangle count of 125,000. The performance of both engines was almost identical.
Now for the theoretical part. First, a bit of glossary. In OpenGL terminology, the client is the software that uses OpenGL, together with its environment (i.e. the CPU and system RAM), while the server is the 3D video device with its environment (the GPU and the video RAM).
The id Tech 3 engine uses vertex arrays to render its geometry. This technique was one of three available back in 1999, the others being immediate mode and display lists (you can read about them at the aforementioned website, too); vertex buffer objects, which tackle these performance problems, were invented later. That made vertex arrays effectively the only way to go. Display lists were out of the question because they are static (i.e. their vertex data cannot be changed once they are compiled), and immediate mode is extremely inefficient, as it creates a tremendous amount of overhead by requiring an individual function call for each piece of vertex data. Why do function calls create overhead? Because each one costs at least one additional CPU tick. Multiply those ticks by the number of vertex data calls (one for the vertex position, one for the texture coordinates and one for the colour data, so at least three per vertex, and there are situations where it takes more) and you end up losing a considerable amount of precious microseconds.
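To illustrate, here is roughly what drawing a textured, vertex-coloured batch looks like in immediate mode (a generic OpenGL 1.x sketch with illustrative array names, not the engine's actual code) - note the three function calls per vertex:

    glBegin(GL_TRIANGLES);
    for (i = 0; i < numVerts; i++) {
        glTexCoord2f(st[i][0], st[i][1]);  /* call #1: texture coords */
        glColor4ubv(colors[i]);            /* call #2: vertex colour  */
        glVertex3fv(xyz[i]);               /* call #3: position       */
    }
    glEnd();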
But back to vertex arrays. The whole point of this technique is that while all the geometry data (not the textures, just the vertices and triangle indices) remains in client memory (system RAM), the number of calls is reduced: you give the server a couple of pointers to arrays of the relevant data and then tell it which ranges of that data to draw. This lets the application keep modifying the vertex data on the fly while cutting the number of function calls, often down to a single glDrawElements() call.
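In code, the vertex array path looks roughly like this (again a generic OpenGL sketch with illustrative array names): the pointers are handed over once, then a single call draws the whole batch out of client memory:

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    glVertexPointer(3, GL_FLOAT, 0, xyz);           /* data stays in system RAM */
    glTexCoordPointer(2, GL_FLOAT, 0, st);
    glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);

    /* one call submits the whole batch */
    glDrawElements(GL_TRIANGLES, numIndexes, GL_UNSIGNED_INT, indexes);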
And the ability to modify that data is the key to the whole shader system. All those nifty texture and vertex deformations change either the texture coordinates or the vertex positions - in more or less sophisticated ways, of course, but that's all there is to it.
I should also mention that all those shader modifier calculations are done by the CPU. So, on modern systems a complex scene (with, say, more than 100k triangles) can put quite a load on your CPU while the GPU sits almost idle.
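For example, something like a "tcMod scroll" boils down to a loop like this, run by the CPU every frame for every surface using such a shader (a simplified sketch; the real engine code differs in details):

    /* shift the texture coordinates of a freshly-copied batch;
       the offsets are recomputed from the current time each frame */
    void ScrollTexCoords(float st[][2], int numVerts,
                         float sSpeed, float tSpeed, float time)
    {
        float sOff = sSpeed * time;
        float tOff = tSpeed * time;
        int   i;

        for (i = 0; i < numVerts; i++) {
            st[i][0] += sOff;   /* every vertex, touched by the CPU... */
            st[i][1] += tOff;   /* ...again and again, frame after frame */
        }
    }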
Another thing is the way multi-stage shaders are handled. I don't know if you were aware of this, but for each shader stage, every element in the scene bearing that shader must be drawn again. Yes, you read that right: if you have, let's say, a weapon with a 4-stage shader, it will be drawn 4 times. So: 4 times the memory consumption, 4 times the calculation time. See, power comes at the cost of performance. Of course, the engine tries to collapse multi-stage shaders into single-stage ones, but that isn't always possible.
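In pseudo-C it amounts to something like this (the function and field names are made up for illustration, but this is the gist of the stage iteration):

    /* the same batch of indices is submitted once per shader stage */
    for (stage = 0; stage < shader->numStages; stage++) {
        stage_t *s = &shader->stages[stage];

        BindTexture(s->texture);      /* this stage's texture          */
        SetBlendState(s->stateBits);  /* this stage's blend/depth mode */
        ComputeTexCoords(s);          /* CPU work, repeated per stage  */
        ComputeColors(s);

        glDrawElements(GL_TRIANGLES, numIndexes, GL_UNSIGNED_INT, indexes);
    }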
And last but not least, there is a HORRIBLE duplication of data. When, say, a player model is to be rendered into the scene, all of its triangle and vertex data is copied from a memory pool holding the data loaded from the model file into the vertex array used for rendering. And it's like that with everything - world brushes, patch meshes, static models, everything - and it's done every single frame. This 1) is time consuming and 2) eats ridiculously large amounts of memory. Once again, the shader system is to blame: if the world were static (i.e. had no dynamic vertex data), it could be turned into a display list compiled at map load time, speeding things up by several orders of magnitude.
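Schematically, every surface goes through a copy like this before it can be drawn (a sketch with made-up names; in id Tech 3 the destination is its "tess" structure):

    /* copy the surface's vertices from the loaded data into
       the frame's shared vertex array - every single frame */
    for (i = 0; i < surf->numVerts; i++) {
        drawVerts[numDrawVerts + i] = surf->verts[i];
    }
    numDrawVerts += surf->numVerts;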
As I said, all of this was quite acceptable back in 1999, but on today's hardware it doesn't perform very well on high-complexity scenes. Of course, there is a way to modernize the renderer: rewrite it to use vertex buffer objects and vertex programs (a.k.a. vertex shaders). The latter could be used to emulate most, if not all, of the Quake shader features. However, this would drastically raise the hardware requirements (relative to the original game), as the technology in question requires hardware a few generations newer than what was available in 2002. And there are still pitfalls, as you can see in the performance of the XreaL engine (indisputably the most advanced and up-to-date of all the id Tech 3-derived renderers): while superior to all the other id Tech 3 solutions at rendering high-complexity scenes, it is still unsatisfactory for the large outdoor environments that modern game engines can pull off (it starts choking at 250k triangles).
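For comparison, the VBO approach (core since OpenGL 1.5) uploads static geometry into server memory once, at load time, and afterwards merely references it - no per-frame streaming from system RAM (a minimal sketch, assuming a hypothetical vertex_t holding just a position):

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, numVerts * sizeof(vertex_t),
                 verts, GL_STATIC_DRAW);

    /* at draw time, the "pointer" is just an offset into the buffer */
    glVertexPointer(3, GL_FLOAT, sizeof(vertex_t), (void *)0);
    glDrawArrays(GL_TRIANGLES, 0, numVerts);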
With all of this said, modernizing the renderer is a job well over my head. I lack the necessary experience and knowledge, there is no one else to do the work, and the time expense would not be justified IMHO. That's all I have to say on the subject.