search:   
Journal of YsaneyaBy Ysaneya      
Main project:

Infinity, a space-based MMOG

Forums

Sunday, November 2, 2008
Normal maps on the GPU

In the last journal, I was explaining that one of the main benefits of working on the GPU instead of the CPU is the ability to create normal maps. Of course, it would be technically possible to generate them on the CPU too, but it would be far too costly, even with multithreading ( which the engine already supports ).

How does it work ? Well, each time the terrain quadtree gets subdivided ( as the viewer zooms in ), I'm rendering to a temporary F32 2D texture the heightmap for the new terrain patch. The shader generates, for each pixel of this heightmap, procedural height values that are a function of the XYZ position on the spherical surface. Then for each terrain node, I create a unique RGBA8 texture and render the normal map into it by deriving the heightmap of the previous step's temporary buffer.

The resolution of the per-patch normal map is depending on the depth level of the patch in the quadtree. At the top-most level, the map is 1024^2.  For levels 1 and 2, it uses 512^2. Finally, for all other levels, the resolution is 256^2.

To maximize precision, I wanted to use lighting in tangent space, but due to various lighting artifacts and the tangent space not matching on the planet's cube faces, I ended up using object-space normal maps / lighting. It looks pretty good except on the very last depth levels ( it looks very noisy ), but I'm not sure yet if the bad lighting is coming from lack of precision in the XYZ inputs of the planet ( remember: it's using relatistic scales, and on the GPU I cannot use fp64 ), the lack of precision in the procedural heights generation, or the lack of precision in the object-space normals. Or maybe a combination of the 3... 

In the coming week(s) I'll continue to try to increase the precision in the last depth levels, and reimplement diffuse texturing ( still lacking in the following pictures ).

Craters

I've finally been able to produce a decent crater noise function, based on the cell noise ( see previous journal ). Each crater has a perfectly circular shape but then is deformed by a few octaves of fBm, to make it more natural. In the following pics, I'm combining some fractals for the land base and 3 octaves of crater noise. I'm still not 100% happy with it though, as I think the craters's distribution is too uniform, so I will probably continue to play with it a bit.

For the fun, here's the final procedural function that generates all the pics in this journal:

float height(in vec3 world)
{
    float land = gpuFbm3D(6, world * 2.0, 2.0, 0.7);
    float land2 = gpuMultiFractal3D(16, world * 5.18, 0.8, 2.0, 0.05) * 0.5 - 2.5;
    land = (land + land2 * 0.5) * 4.0;

    float n0 = gpuFbm3D(10, world * 8.0, 2.5, 0.5) * 0.05;
    vec2 c0 = gpuCellCrater3DB((world + n0 * 2.0) * (4.0 + n0 * 16.0) * vec3(1, 1, 1), 3.0, 0.5);
    land += c0.x * 3.0;

    c0 = gpuCellCrater3DB((world + n0 * 1.0) * (16.0 + n0 * 1.0) * vec3(1, 1, 1), 3.0, 0.5);
    land += c0.x * 2.0;

    c0 = gpuCellCrater3DB((world + n0 * 0.5) * (64.0 + n0 * 1.0) * vec3(1, 1, 1), 3.0, 0.5);
    land += c0.x * 1.0;
    return land + 1.0;
}
 


Pictures

crater6_med.jpg

crater7_med.jpg

 crater8_med.jpg

crater9_med.jpg

crater10_med.jpg

Comments: 6 - Leave a Comment

Link



Saturday, October 25, 2008
GPU Planetary Generation

Motivation

Until now, the planetary generation algorithm was running on the CPU synchronously. This means that each time the camera zoomed in on the surface of the planet, each terrain node was getting split into 4 children, and a heightmap was generated synchronously for each child.

Synchronous generation means that rendering is paused until the data is generated for each child node. We're talking of 10-20 milliseconds here, so it's not that slow; but since 4 childs are generated at a time, those numbers are always multiplied by 4. So the cost is around 40-80 ms per node that is getting split. Unfortunately, splits happen in cascade, so it's not rare to have no split at all during one second, and suddenly 2 or 3 nodes get split, resulting in a pause of hundreds of milliseconds in the rendering.

I've addressed this issue by adding asynchronous CPU terrain generation: a thread is used to generate data at its own rythm, and the rendering isn't affected too harshly anymore. This required to introduce new concepts and new interfaces ( like a data generation interface ) to the engine, which took many weeks.

After that, I prepared a new data generation interface that uses the GPU instead of the CPU. To make it short, I encountered a lot of practical issues with it, like PBOs ( pixel buffer objects ) not behaving as expected on some video cards, or the lack of synchronization extension on ATI cards ( I ended up using occlusion queries with an empty query to know when a texture has been rendered ), but now it's more or less working.

Benefits

There are a lot of advantages to generating data on the GPU instead of the CPU: the main one is that, thanks to the higher performance, I will now be able to generate normal maps for the terrain, which was too slow before. This will increase lighting and texturing accuracy, and make planets ( especially when seen from orbit ) much nicer. Until now, planets seen from space weren't looking too good due to per-vertex texturing; noise and clouds helped to hide the problem a bit, but if you look carefully at the old screenshots, you'll see what I mean.

The second advantage is that I can increase the complexity of the generation algorithm itself, and introduce new noise basis types, in particular the cell noise ( see Voronoi diagrams on wikipedia ).

Another advantage is debug time. Previously, playing with planetary algorithms and parameters was taking a lot of time: changing some parameters, recompiling, launching the client, spawning a planet, moving the camera around the planet to see how it looks, rinse and repeat. Now I can just change the shader code, it gets automatically reloaded by the engine and the planet updates on-the-fly: no need to quit the client and recompile. It's a lot easier to play with new planets, experiment, change parameters, etc..

I'm not generating normal maps yet ( I will probably work on that next week ), and there's no texturing; in the coming pictures, all the planet pictures you will see only show the heightmap ( grayscale ) shaded with atmospheric scattering, and set to blue below the water threshold. As incredible as it sounds, normal mapping or diffuse/specular textures are not in yet.

Cell noise

.. aka Voronoi diagrams. The standard implementation on the cpu uses a precomputed table containing N points, and when sampling a 3D coordinate, checking the 1 or 2 closest distances to each of the N points. The brute-force implementation is quite slow, but it's possible to optimize it by adding a lookup grid. Now, doing all of that on the GPU isn't easy, but fortunately there's a simpler alternative: procedurally generating the sample points on-the-fly.

The only thing needed is a 2D texture that contains random values from 0 to 1 in the red/green/blue/alpha channels; nothing else. We can then use a randomization function that takes 3D integer coordinates and returns a 4D random vector:

vec4 gpuGetCell3D(const in int x, const in int y, const in int z)
{
	float u = (x + y * 31) / 256.0;
	float v = (z - x * 3) / 256.0;
	return(texture2D(cellRandTex, vec2(u, v)));
}


The cellNoise function then samples the 27 adjacent cells around the sample point, generate a cell position in 3D given the cell coordinates, and get the distance to the sample point. Note that distances are squared until the last moment to save calculations:

vec2 gpuCellNoise3D(const in vec3 xyz)
{
	int xi = int(floor(xyz.x));
	int yi = int(floor(xyz.y));
	int zi = int(floor(xyz.z));

	float xf = xyz.x - float(xi);
	float yf = xyz.y - float(yi);
	float zf = xyz.z - float(zi);

	float dist1 = 9999999.0;
	float dist2 = 9999999.0;
	vec3 cell;

	for (int z = -1; z <= 1; z++)
	{
		for (int y = -1; y <= 1; y++)
		{
			for (int x = -1; x <= 1; x++)
			{
				cell = gpuGetCell3D(xi + x, yi + y, zi + z).xyz;
				cell.x += (float(x) - xf);
				cell.y += (float(y) - yf);
				cell.z += (float(z) - zf);
				float dist = dot(cell, cell);
				if (dist < dist1)
				{
					dist2 = dist1;
					dist1 = dist;
				}
				else if (dist < dist2)
				{
					dist2 = dist;
				}
			}
		}
	}
	return vec2(sqrt(dist1), sqrt(dist2));
}


The two closest distances are returned, so you can use F1 and F2 functions ( ex.: F2 = value.y - value.x ). It's in 3D, which is perfect for planets, so seams won't be visible between planetary faces:

gpugen4_med.jpg
 
New planetary features

Using the cell noise and the GPU terrain generation, I'm now able to create new interesting planetary shapes and features. Have a look yourself:

Rivers

"Fake" rivers I'm afraid, as it's only using the ocean-level threshold and they don't flow from high altitudes to low altitudes, but it's better than nothing. When seen from orbit, there is some aliasing, so not all pixels of a river can be seen.

It's simply some cell noise with the input displaced by a fractal ( 4 octaves ):

gpugen1_med.jpg

gpugen2_med.jpg

gpugen3_med.jpg

Craters

I've started to experiment on craters. It's a variation of cell noise, with 2 differences: extinction ( a density value is passed to the function, which is used to kill a certain number of cells ), and instead of returning the distance, return a function of the distance. This function of distance is modeled to generate a circular, crater-like look.

Here's a quick experiment with 90% extinction. The inputs are also displaced with a small fractal:

gpugen6_med.jpg

And here's the result with a stronger displacement:
 
gpugen5_med.jpg

The next step is to add more octaves of crater noise:

gpugen7_med.jpg

It doesn't look too good yet, mostly because the craters at different octaves are just additively added and not combined properly. More experiments on that later.

Planetary experiments

When adding more octaves and combining different functions together, then adding back atmosphere scattering and the ocean threshold, the results start to look interesting. Keep in mind that all the following pictures are just the grayscale heightmap, and nothing else: no normal mapping or no texturing yet !

gpugen8_med.jpg

gpugen9_med.jpg

gpugen10_med.jpg

gpugen11_med.jpg

gpugen12_med.jpg

gpugen13_med.jpg

gpugen14_med.jpg

gpugen15_med.jpg

Comments: 12 - Leave a Comment

Link


All times are ET (US)

 
S
M
T
W
T
F
S
1
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

OPTIONS
Track this Journal

 RSS 

ARCHIVES
November, 2008
October, 2008
July, 2008
June, 2008
May, 2008
April, 2008
March, 2008
January, 2008
December, 2007
November, 2007
October, 2007
September, 2007
August, 2007
July, 2007
June, 2007
May, 2007
April, 2007
March, 2007
February, 2007
January, 2007
December, 2006
November, 2006
October, 2006
September, 2006
August, 2006
July, 2006
June, 2006
May, 2006
April, 2006
March, 2006
February, 2006
January, 2006
December, 2005
November, 2005
October, 2005
September, 2005
August, 2005
July, 2005
June, 2005
May, 2005
April, 2005
March, 2005
February, 2005
January, 2005
December, 2004
October, 2004
September, 2004
August, 2004