If I were to write a game engine today ... the rendering system would

These are some thoughts on what a modern rendering system would require.

Introduction

This engine would not be ready in time for current-gen hardware, so it should target the next generation - namely a microtriangle approach.

Implementation

Components

The rendering interface

The latest DirectX and Vulkan cover the vast majority of platforms. Apple devices need Metal, either natively or through a translation layer such as MoltenVK.

The game layer

Procedural generation of meshes on the client would help reduce client install size. Currently, meshes are authored by whatever means and then exported as a bag of triangles. This takes up a massive amount of storage, and it would be much more efficient to generate the mesh on the client from a seed. The same applies to textures (if they're still needed at all): a run time version of Substance could massively reduce the disk space they take up. Generating the textures on first run, as part of the install process, or on demand are all viable options with trade-offs. The generation is quick enough to be on demand, especially if the anti-aliased shape generation process were improved.
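As a rough illustration of the seed idea, the Python sketch below rebuilds a small mesh from nothing but a seed and a segment count. The function name `generate_rock_outline`, the fan-shaped mesh, and the jitter range are all made up for this example; a real generator would be far more elaborate.

```python
import math
import random

def generate_rock_outline(seed, segments):
    """Rebuild a jittered triangle-fan mesh from a seed (hypothetical generator)."""
    rng = random.Random(seed)  # private generator: output depends only on the seed
    vertices = [(0.0, 0.0, 0.0)]  # fan center
    for k in range(segments):
        angle = 2.0 * math.pi * k / segments
        radius = rng.uniform(0.8, 1.2)  # assumed jitter range
        vertices.append((radius * math.cos(angle), 0.0, radius * math.sin(angle)))
    # Each triangle joins the center to two neighbouring rim vertices.
    triangles = [(0, k + 1, (k + 1) % segments + 1) for k in range(segments)]
    return vertices, triangles

vertices, triangles = generate_rock_outline(seed=1234, segments=64)
shipped_bytes = 4 + 4  # assumed: a 32-bit seed plus a 32-bit segment count
generated_bytes = len(vertices) * 3 * 4 + len(triangles) * 3 * 4  # float32 positions, int32 indices
print(shipped_bytes, generated_bytes)
```

The same seed always reproduces the same mesh, so only the seed and the parameters need to ship with the install.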

The microtriangle approach builds on the GPEG mantra of 'the polygon is the new pixel' and takes it to its logical conclusion. Basically, the triangles are so small that each one covers roughly a pixel of screen space, and the pixel color is determined by the color of the triangle. This means textures are repurposed, procedural LOD becomes much more practical, sub-pixel antialiasing becomes practical, the normal map (and ambient occlusion?) is implicit, and the rendering pipeline is much simpler.

An additional level of hierarchy is required to handle the massive number of triangles and vertices - this would be clusters. Clusters help with culling (cull an entire cluster at once), split the mesh into manageable chunks for streaming, and make it easier for the GPU to issue its own draw calls - a massive performance boon.
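To make the culling benefit concrete, here is a minimal CPU-side sketch, assuming fixed-size clusters with bounding spheres and planes given as (nx, ny, nz, d) with the visible side satisfying n.p + d >= 0. The names `build_clusters` and `visible_clusters`, and the cluster size, are illustrative choices rather than anything prescribed above.

```python
def build_clusters(vertices, triangles, cluster_size=128):
    """Split the triangle list into fixed-size clusters, each with a bounding sphere."""
    clusters = []
    for start in range(0, len(triangles), cluster_size):
        tris = triangles[start:start + cluster_size]
        points = [vertices[i] for tri in tris for i in tri]
        center = tuple(sum(p[a] for p in points) / len(points) for a in range(3))
        radius = max(
            sum((p[a] - center[a]) ** 2 for a in range(3)) ** 0.5 for p in points
        )
        clusters.append({"triangles": tris, "center": center, "radius": radius})
    return clusters

def visible_clusters(clusters, planes):
    """Keep clusters whose sphere is not entirely behind any plane."""
    result = []
    for c in clusters:
        cx, cy, cz = c["center"]
        if all(nx * cx + ny * cy + nz * cz + d >= -c["radius"]
               for nx, ny, nz, d in planes):
            result.append(c)
    return result

# A strip of 20 triangles along the x axis, split into clusters of 4.
vertices = [(float(x), float(y), 0.0) for x in range(11) for y in range(2)]
triangles = []
for x in range(10):
    a = 2 * x
    triangles += [(a, a + 1, a + 2), (a + 1, a + 3, a + 2)]
clusters = build_clusters(vertices, triangles, cluster_size=4)
# One clipping plane that keeps the half-space x >= 5.
visible = visible_clusters(clusters, planes=[(1.0, 0.0, 0.0, -5.0)])
print(len(clusters), len(visible))
```

One sphere test per cluster rejects all of its triangles at once, which is the point of the extra level of hierarchy.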

Let's do a back-of-the-envelope comparison of a 50k triangle textured object against a 1M microtriangle object and a 4M microtriangle object.

For the 50k triangle textured object, given an uncompressed vertex being:

	float[3] Coordinates;
	float[3][3] Normals;
	float[2][2] TextureCoordinates;

... and a compressed vertex being:

	float[3] Coordinates;
	byte[2][4] PackedNormals;
	float16[2][2] TextureCoordinates;

Item	Count	Uncompressed	Compression	Run time size
Vertices	50k	3.2M		1.4M
Indices/Clusters	150k	600k	Array of shorts	300k
Diffuse map	4k	67M	BC1 - 8:1	11.2M
Normal map	4k	67M	BC5 - 4:1	22.3M
ORM map	2k	16M	BC1 - 8:1	2.8M
Total				38M
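The figures in this table can be reproduced with a few lines of Python. This is a sketch under two assumptions that are inferred from the numbers rather than stated: uncompressed textures are 4 bytes per texel, and the run time texture sizes include a full mip chain (roughly a third extra).

```python
MIP_CHAIN = 4 / 3  # assumed: a full mip chain adds about one third

def texture_bytes(resolution, ratio=1, mips=False, bytes_per_texel=4):
    """Size of a square texture, optionally block compressed and mipped."""
    size = resolution * resolution * bytes_per_texel / ratio
    return size * MIP_CHAIN if mips else size

# 3 position floats + 3x3 normal floats + 2x2 texture coordinate floats.
uncompressed_vertex = (3 + 3 * 3 + 2 * 2) * 4
# 3 position floats + 2x4 packed normal bytes + 2x2 half-float coordinates.
compressed_vertex = 3 * 4 + 2 * 4 * 1 + 2 * 2 * 2

budget = {
    "vertices": 50_000 * compressed_vertex,
    "indices": 150_000 * 2,  # 16-bit indices
    "diffuse": texture_bytes(4096, ratio=8, mips=True),  # BC1
    "normal": texture_bytes(4096, ratio=4, mips=True),   # BC5
    "orm": texture_bytes(2048, ratio=8, mips=True),      # BC1
}
total = sum(budget.values())
for name, size in budget.items():
    print(f"{name:10s}{size / 1e6:8.1f}M")
print(f"{'total':10s}{total / 1e6:8.1f}M")
```

The two 4k textures account for most of the budget; the geometry itself is under 2M.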

For the microtriangle objects, given an uncompressed vertex being:

	float[3] Coordinates;
	float[3][3] Normals;

... and an uncompressed triangle being:

	int[3] VertexIndices;
	int RGBA;
	int ORM;

... and a compressed vertex being:

	float[3] Coordinates;
	byte[2][4] PackedNormals;

... and a compressed triangle being:

	int[3] VertexIndices;
	[BC3 diffuse texture]
	[BC1 ORM texture]

Item	Count	Uncompressed	Compression	Run time size
Vertices	1M	48M		20M
Indices/Clusters	3M	12M	Array of ints	12M
Triangles	1M	4M		4M
Diffuse	1k	4M	BC3 - 4:1	1.33M
ORM	1k	4M	BC1 - 8:1	666k
Total				38M

The same layout at 4M microtriangles:

Item	Count	Uncompressed	Compression	Run time size
Vertices	4M	192M		80M
Indices/Clusters	12M	48M	Array of ints	48M
Triangles	4M	16M		16M
Diffuse	2k	16M	BC3 - 4:1	5.33M
ORM	2k	16M	BC1 - 8:1	2.67M
Total				152M
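The microtriangle budgets scale almost linearly with triangle count, which a small Python sketch makes easy to explore. It follows the tables' own assumptions (one vertex per triangle, 32-bit indices, a 4-byte per-triangle record, one texel per triangle) and additionally assumes the texture sizes include a full mip chain; the function name is hypothetical.

```python
MIP_CHAIN = 4 / 3  # assumed: a full mip chain adds about one third

def microtriangle_bytes(triangle_count, texture_resolution):
    """Run time size of a microtriangle object under the tables' assumptions."""
    vertex_size = 3 * 4 + 2 * 4                 # float positions + packed normals
    vertices = triangle_count * vertex_size     # one vertex per triangle, as tabulated
    indices = triangle_count * 3 * 4            # 32-bit indices
    per_triangle = triangle_count * 4           # the 4-byte 'Triangles' row
    raw_texture = texture_resolution ** 2 * 4   # RGBA8, one texel per triangle
    diffuse = raw_texture / 4 * MIP_CHAIN       # BC3, 4:1
    orm = raw_texture / 8 * MIP_CHAIN           # BC1, 8:1
    return vertices + indices + per_triangle + diffuse + orm

one_million = microtriangle_bytes(1_000_000, 1024)
four_million = microtriangle_bytes(4_000_000, 2048)
print(f"1M: {one_million / 1e6:.1f}M, 4M: {four_million / 1e6:.1f}M")
```

Under these assumptions the script gives about 38M and 152M, with vertices and indices making up well over 80% of the total in both cases.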

This brings up several open questions:

Is the ambient occlusion map an approximation for local shadowing? If this is the case, would it be needed in the microtriangle approach?

What is the quality crossover point? 1M triangles may not be enough.

Can the indices be shared among multiple meshes? Can they be procedurally generated?
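For one restricted case the answer is yes: if the vertices are laid out as a regular grid, the three indices of any triangle can be computed from its id, so no index buffer is stored and every mesh with the same grid dimensions shares the same function. The sketch below assumes that layout; `triangle_indices` is a hypothetical helper, not an existing API.

```python
def triangle_indices(triangle_id, columns):
    """Vertex indices of a triangle in a grid with `columns` vertices per row."""
    cell, half = divmod(triangle_id, 2)       # two triangles per grid cell
    row, col = divmod(cell, columns - 1)      # columns - 1 cells per row
    i = row * columns + col                   # top-left vertex of the cell
    if half == 0:
        return (i, i + columns, i + 1)
    return (i + 1, i + columns, i + columns + 1)

# A 3x3 vertex grid has 2x2 cells, so 8 triangles.
all_triangles = [triangle_indices(t, columns=3) for t in range(8)]
print(all_triangles)
```

Arbitrary topology would still need stored indices, so this only covers meshes that can be parameterised onto a grid.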

How can the vertex memory be reduced?
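One candidate, offered only as a sketch: quantize each coordinate to 16 bits relative to the bounding box of its cluster, which halves the 12-byte position to 6 bytes. The helper names and the 16-bit choice are assumptions, and the box is assumed to have non-zero extent on every axis.

```python
def quantize(position, box_min, box_max, bits=16):
    """Map a position inside the box to integer coordinates of `bits` bits each."""
    levels = (1 << bits) - 1
    return tuple(
        round((p - lo) / (hi - lo) * levels)
        for p, lo, hi in zip(position, box_min, box_max)
    )

def dequantize(quantized, box_min, box_max, bits=16):
    """Recover an approximate position from its quantized coordinates."""
    levels = (1 << bits) - 1
    return tuple(
        lo + q / levels * (hi - lo)
        for q, lo, hi in zip(quantized, box_min, box_max)
    )

box_min, box_max = (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)
original = (0.5, 1.0, 1.999)
packed = quantize(original, box_min, box_max)
restored = dequantize(packed, box_min, box_max)
worst_error = max(abs(a - b) for a, b in zip(original, restored))
print(packed, worst_error)
```

The error is bounded by half a quantization step of the box size, so small clusters get proportionally finer precision.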

If ambient occlusion isn't needed, what could take its channel, and what can we put in the 4th channel? Fresnel can be calculated from the refraction indices, but pre-calculating this would probably be wise. Likewise reflection can be derived, but should probably be calculated in advance. Emissive could be a single value if the light color was the same as the triangle color; could a red triangle emit a blue light? What about specular?
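The Fresnel pre-calculation could look something like the sketch below: the reflectance at normal incidence (F0) is derived once from the two refraction indices and packed into a byte channel, and the angle-dependent term is then evaluated at run time with Schlick's approximation. The function names and the 8-bit packing are assumptions for illustration.

```python
def f0_from_ior(n_material, n_outside=1.0):
    """Reflectance at normal incidence from the two refraction indices."""
    return ((n_material - n_outside) / (n_material + n_outside)) ** 2

def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation of Fresnel reflectance at a given view angle."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def pack_unorm8(value):
    """Store a 0..1 value in one byte, as a per-triangle channel would."""
    return max(0, min(255, round(value * 255.0)))

glass_f0 = f0_from_ior(1.5)        # glass against air
packed = pack_unorm8(glass_f0)
grazing = schlick_fresnel(glass_f0, cos_theta=0.0)
head_on = schlick_fresnel(glass_f0, cos_theta=1.0)
print(glass_f0, packed, grazing, head_on)
```

Most dielectrics have an F0 of only a few percent, so a linear 8-bit encoding is coarse at that end; a remapped range might be worth considering.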

Some blue sky thinking:

Could the vertex coordinates be stored in a BC6H texture (interpolated, half-float accuracy)? Apart from the compression (BC6H uses one byte per texel, about 6:1 against three half floats), could the mips then be used for LODs? The indices would be a problem here.

If the vertices were stored in textures, could the mip-based LODs be sampled anisotropically so that multiple LODs are rendered concurrently?
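To see what mips-as-LODs would mean, the sketch below treats a square grid of vertex positions as a texture and builds its mip chain with a plain 2x2 box filter, ignoring BC6H compression entirely. The grid layout and function names are assumptions; each mip level is simply a coarser grid of averaged positions.

```python
def next_mip(grid):
    """Average each 2x2 block of positions, as a texture mip generator would."""
    half = len(grid) // 2
    return [
        [
            tuple(
                (grid[2 * r][2 * c][a] + grid[2 * r][2 * c + 1][a]
                 + grid[2 * r + 1][2 * c][a] + grid[2 * r + 1][2 * c + 1][a]) / 4.0
                for a in range(3)
            )
            for c in range(half)
        ]
        for r in range(half)
    ]

def mip_chain(grid):
    """Build every level from the full grid down to a single vertex."""
    chain = [grid]
    while len(chain[-1]) > 1:
        chain.append(next_mip(chain[-1]))
    return chain

size = 8  # power-of-two grid of vertex positions
base = [[(float(x), 0.0, float(z)) for x in range(size)] for z in range(size)]
lods = mip_chain(base)
vertex_counts = [len(level) ** 2 for level in lods]
print(vertex_counts)
```

Each level has a quarter of the vertices of the one before it. Averaging pulls the border vertices inward, so neighbouring patches would no longer meet exactly, and a grid layout like this is also what would let the indices be implied rather than stored.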