If I were to write a game engine today ... the rendering system would

These are some thoughts on what a modern rendering system would require.


As this engine would not be ready in time for current-gen architecture, it should target next-gen hardware - namely, a microtriangle approach.



The rendering interface

The latest DirectX and Vulkan cover the vast majority of platforms. Apple devices require Metal, though MoltenVK can translate Vulkan to Metal.
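A thin abstraction over these back ends keeps the rest of the engine API-agnostic. A minimal sketch of what that interface might look like (the names are illustrative, not an existing API):

	#include <cstddef>
	#include <cstdint>
	#include <memory>

	struct BufferHandle { uint64_t id; };

	// Hypothetical back-end-agnostic device; D3D12, Vulkan and Metal back ends
	// would each implement this interface.
	class RenderDevice {
	public:
		virtual ~RenderDevice() = default;
		virtual BufferHandle createBuffer(const void* data, size_t bytes) = 0;
		virtual void destroyBuffer(BufferHandle buffer) = 0;
		virtual void drawIndexed(BufferHandle vertices, BufferHandle indices, uint32_t indexCount) = 0;
	};

	// Chosen once at start-up based on the platform.
	std::unique_ptr<RenderDevice> createRenderDevice();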

The game layer

Procedural generation of meshes on the client would help reduce the client install size. Currently, meshes are authored by whatever means and then exported as a bag of triangles. This takes up a massive amount of space, and it would be much more efficient to generate the mesh on the client from a seed. The same applies to textures; a runtime version of Substance could massively reduce the amount of disk space needed (if textures are still needed at all). Generating the textures on first run, as part of the install process, or on demand are all viable options with trade-offs. The generation is quick enough to be done on demand, especially if the anti-aliased shape generation process were improved.
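As a rough sketch of the seed idea - ship only the seed and the generator, and rebuild identical geometry on every client (the hash and parameters below are placeholders rather than a specific algorithm):

	#include <cstdint>
	#include <vector>

	struct Vertex { float position[3]; };

	// Small deterministic hash so the same seed always produces the same mesh.
	static float hashToUnitFloat(uint32_t x) {
		x ^= x >> 16; x *= 0x7feb352du;
		x ^= x >> 15; x *= 0x846ca68bu;
		x ^= x >> 16;
		return float(x >> 8) * (1.0f / 16777216.0f); // [0, 1)
	}

	// Builds a grid of vertices whose heights derive purely from the seed.
	// Indices for a regular grid can be generated on the fly in the same way.
	std::vector<Vertex> generateTerrainPatch(uint32_t seed, int resolution, float size) {
		std::vector<Vertex> vertices;
		vertices.reserve(size_t(resolution) * resolution);
		for (int y = 0; y < resolution; ++y) {
			for (int x = 0; x < resolution; ++x) {
				float height = hashToUnitFloat(seed ^ uint32_t(x) * 73856093u ^ uint32_t(y) * 19349663u);
				vertices.push_back({{ x * size / resolution, height, y * size / resolution }});
			}
		}
		return vertices;
	}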

The microtriangle approach builds on the GPEG mantra of 'the polygon is the new pixel' and takes it to its logical conclusion. Basically, the triangles are so small that each one covers roughly a pixel of screen space, and the pixel color is determined by the color of the triangle. This means textures are repurposed, procedural LOD becomes much more practical, sub-pixel antialiasing becomes practical, the normal map (and ambient occlusion?) is implicit, and the rendering pipeline is much simpler.
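A minimal sketch of the 'triangle color becomes pixel color' step, assuming an earlier pass has already written a visibility buffer containing one triangle ID per pixel (the names and layout are assumptions):

	#include <cstdint>
	#include <vector>

	// Per-triangle color, stored once per microtriangle instead of in a texture.
	struct TriangleAttributes { uint32_t rgba; };

	// Resolve pass: each pixel takes the color of the triangle that won the
	// depth test, so no texture sampling or UV interpolation is needed.
	void resolveColor(const std::vector<uint32_t>& visibilityBuffer,      // triangle ID per pixel
	                  const std::vector<TriangleAttributes>& triangles,
	                  std::vector<uint32_t>& frameBuffer) {
		const uint32_t kNoTriangle = 0xffffffffu;
		for (size_t i = 0; i < visibilityBuffer.size(); ++i) {
			uint32_t id = visibilityBuffer[i];
			frameBuffer[i] = (id == kNoTriangle) ? 0u : triangles[id].rgba;
		}
	}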

An additional level of hierarchy is required to handle the massive number of triangles and vertices - clusters. These help with culling (an entire cluster can be rejected at once), split the mesh into manageable chunks for streaming, and make it easier for the GPU to issue its own draw calls - a massive performance boon.
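A cluster could be little more than a bounding volume plus ranges into the shared vertex and index buffers; a rough sketch (the field choices are assumptions, not a fixed layout):

	#include <cstdint>

	// One cluster of, say, 128 microtriangles. The bounding sphere is tested
	// against the view frustum and the whole cluster is culled, streamed, or
	// drawn as a unit; on the GPU a compute shader would run this per cluster
	// and append survivors to an indirect draw buffer.
	struct Cluster {
		float    boundsCenter[3];
		float    boundsRadius;
		uint32_t firstIndex;    // offset into the shared index buffer
		uint32_t indexCount;    // 3 * triangle count for this cluster
		uint32_t firstVertex;   // offset into the shared vertex buffer
		uint32_t lodLevel;      // which LOD this cluster belongs to
	};

	// Sphere-vs-plane rejection test used to frustum-cull a whole cluster.
	bool isOutsidePlane(const float plane[4], const Cluster& c) {
		float d = plane[0] * c.boundsCenter[0] + plane[1] * c.boundsCenter[1]
		        + plane[2] * c.boundsCenter[2] + plane[3];
		return d < -c.boundsRadius;
	}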

Let's do a back-of-the-envelope comparison of a 50k-triangle textured object vs a 1M-microtriangle object and a 4M-microtriangle object. (Run-time texture sizes below include a full mip chain, hence the extra third over the base image.)

For the textured object, given an uncompressed vertex being:

	float Coordinates[3];               // 12 bytes
	float Normals[3][3];                // 36 bytes
	float TextureCoordinates[2][2];     // 16 bytes: two UV sets
	                                    // = 64 bytes per vertex
... and a compressed vertex being:
	float    Coordinates[3];            // 12 bytes
	uint8_t  PackedNormals[2][4];       //  8 bytes
	uint16_t TextureCoordinates[2][2];  //  8 bytes: two UV sets as half floats
	                                    // = 28 bytes per vertex

Item              Count  Uncompressed  Compression       Run-time size
Vertices          50k    3.2M          Packed (above)    1.4M
Indices/Clusters  150k   600k          Array of shorts   300k
Diffuse map       4k     67M           BC1 - 8:1         11.2M
Normal map        4k     67M           BC5 - 4:1         22.3M
ORM map           2k     16.8M         BC1 - 8:1         2.8M
For the microtriangle objects, given an uncompressed vertex being:
	float Coordinates[3];            // 12 bytes
	float Normals[3][3];             // 36 bytes
	                                 // = 48 bytes per vertex
... and an uncompressed triangle being:
	int VertexIndices[3];
	int RGBA;
	int ORM;
... and a compressed vertex being:
	float   Coordinates[3];          // 12 bytes
	uint8_t PackedNormals[2][4];     //  8 bytes
	                                 // = 20 bytes per vertex
... and a compressed triangle being:
	int VertexIndices[3];
	[BC3 diffuse texture]            // per-triangle color moves into a texture
	[BC1 ORM texture]

The 1M-microtriangle object:
Item              Count  Uncompressed  Compression       Run-time size
Vertices          1M     48M           Packed (above)    20M
Indices/Clusters  3M     12M           Array of ints     12M
Triangles         1M     4M            -                 4M
Diffuse           1k     4M            BC3 - 4:1         1.33M
ORM               1k     4M            BC1 - 8:1         666k

The 4M-microtriangle object:
Item              Count  Uncompressed  Compression       Run-time size
Vertices          4M     192M          Packed (above)    80M
Indices/Clusters  12M    48M           Array of ints     48M
Triangles         4M     16M           -                 16M
Diffuse           2k     16M           BC3 - 4:1         5.33M
ORM               2k     16M           BC1 - 8:1         2.67M

Totalling the run-time sizes, the 50k textured object and the 1M-microtriangle object both come out at roughly 38M, while the 4M-microtriangle object is around 150M. This brings up several open questions:

  • Is the ambient occlusion map an approximation for local shadowing? If this is the case, would it be needed in the microtriangle approach?
  • What is the quality crossover point? 1M triangles may not be enough.
  • Can the indices be shared among multiple meshes? Can they be procedurally generated?
  • How can the vertex memory be reduced?
  • If ambient occlusion isn't needed, which other value could go in its channel, and what could go in the 4th channel? Fresnel can be calculated from the refraction indices, but pre-calculating it would probably be wise (see the sketch after this list). Likewise, reflection can be derived, but should probably be calculated in advance. Emissive could be a single value if the light color was the same as the triangle color; could a red triangle emit a blue light? What about specular?
  • Some blue-sky thinking:
      • Could the vertex coordinates be stored in a BC6H texture (interpolated half-float accuracy)? Apart from the 4:1 compression, could the mips then be used for LODs? The indices would be a problem here.
      • If the vertices were stored in textures, could the LODs from the mips be sampled anisotropically, allowing multiple LODs to be rendered concurrently?
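On the Fresnel point above: the pre-calculated value would just be the normal-incidence reflectance (F0) derived from the two refraction indices, so it is cheap to bake per material or per triangle:

	// Normal-incidence reflectance (F0) from the refraction indices of the two
	// media, e.g. air (1.0) and the surface material. Example: air to glass
	// (IOR 1.5) gives F0 = 0.04.
	float fresnelF0(float ior1, float ior2) {
		float r = (ior1 - ior2) / (ior1 + ior2);
		return r * r;
	}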