Thursday, September 16, 2010

Optimizing Vertex Formats

A vertex size of 32 bytes is a good fit for the hardware vertex cache.

We can safely pack normals and tangents into a uint32 each (one byte per component).

Also, I mainly use the second UV set for unique mappings / lightmaps, so the coords are guaranteed to lie within [0,1]. This allows using D3DDECLTYPE_SHORT2N (the unnormalized SHORT2 type is not supported on my trusty old ATi x700 mobility..). To convert your float UVs, just multiply them by 32767.0f.
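The UV conversion could look roughly like the sketch below. The helper names packUnitFloatToShortN and unpackShortN are made up for illustration, and rounding to the nearest integer (instead of a plain truncating cast) is my own assumption, not something the original exporter necessarily does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantizes one UV coordinate in [0,1] to the signed
// 16-bit value stored in a D3DDECLTYPE_SHORT2N component.
inline int16_t packUnitFloatToShortN(float v) {
    assert(v >= 0.0f && v <= 1.0f);  // only valid for unique mappings / lightmaps
    // Assumption: round to nearest to halve the worst-case quantization error.
    return static_cast<int16_t>(std::lround(v * 32767.0f));
}

// CPU-side mirror of the SHORT2N expansion (value / 32767), handy for checking
// how much precision survives the conversion.
inline float unpackShortN(int16_t s) {
    return static_cast<float>(s) / 32767.0f;
}
```

For a vertex v this would be used as v.uv2[0] = packUnitFloatToShortN(u); v.uv2[1] = packUnitFloatToShortN(w); with u and w being the float lightmap coordinates.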

As a result we get a nice vertex format like this:

// size = 32 bytes
struct Vertex {
 FVec3 pos;
 VecU32 nrm;
 FVec2 uv;
 short uv2[2];
 VecU32 tan;
};

D3DVERTEXELEMENT9 declExt[] = {
 // stream, offset, type, method, usage, usageIndex
 { 0, 0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
 { 0, 12, D3DDECLTYPE_UBYTE4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL, 0 },
 // 2d uv
 { 0, 16, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
 { 0, 24, D3DDECLTYPE_SHORT2N, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 1 },
 // tangent
 { 0, 28, D3DDECLTYPE_UBYTE4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 2 },
 D3DDECL_END()
};
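Since the offsets in the declaration are hard-coded, it may be worth letting the compiler verify that they match the struct. The following is only a sketch: FVec3, FVec2 and VecU32 are replaced by minimal stand-in types that I assume have the same size as the real ones.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in types (assumptions): the engine's own FVec3, FVec2 and VecU32
// are expected to have the same sizes and no extra padding.
struct FVec3  { float x, y, z; };
struct FVec2  { float x, y; };
struct VecU32 { uint8_t dir[4]; };

struct Vertex {
    FVec3  pos;
    VecU32 nrm;
    FVec2  uv;
    short  uv2[2];
    VecU32 tan;
};

// Compile-time checks against the offsets used in the vertex declaration.
static_assert(sizeof(Vertex) == 32,        "Vertex must be 32 bytes");
static_assert(offsetof(Vertex, pos) == 0,  "pos offset");
static_assert(offsetof(Vertex, nrm) == 12, "nrm offset");
static_assert(offsetof(Vertex, uv)  == 16, "uv offset");
static_assert(offsetof(Vertex, uv2) == 24, "uv2 offset");
static_assert(offsetof(Vertex, tan) == 28, "tan offset");
```

If a member is added or reordered later, the build fails instead of the GPU silently reading garbage.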

In your shader just use:

struct VertexInput {
 float4 position : POSITION;
 float4 norm : NORMAL;  // compressed uint32
 float2 uv : TEXCOORD0;
 float2 uv2 : TEXCOORD1;
 float4 tangent : TEXCOORD2; // compressed uint32
};

VertexOutput main(VertexInput IN) {
 ...
 // decompress normal & tangent
 float3 N = 2.0f*IN.norm.xyz/255.0f-1.0f;
 float4 T = 2.0f*IN.tangent/255.0f-1.0f;
 ...
}

Here is some C++ code to de-/compress your normals:

class VecU32 {
public:
 union {
  u8 dir[4];
  u32 vec32;
 };

 inline void compress(const FVec3& nrm) {
  dir[0] = (u8)((nrm.x * 0.5f + 0.5f) * 255.0f);
  dir[1] = (u8)((nrm.y * 0.5f + 0.5f) * 255.0f);
  dir[2] = (u8)((nrm.z * 0.5f + 0.5f) * 255.0f);
  dir[3] = 255;
 }

 inline void compress(const FVec4& nrm) {
  dir[0] = (u8)((nrm.x * 0.5f + 0.5f) * 255.0f);
  dir[1] = (u8)((nrm.y * 0.5f + 0.5f) * 255.0f);
  dir[2] = (u8)((nrm.z * 0.5f + 0.5f) * 255.0f);
  dir[3] = (u8)((nrm.w * 0.5f + 0.5f) * 255.0f);
 }

 inline FVec4 decompress() {
  return FVec4(
   2.0f*dir[0]/255.0f-1.0f,
   2.0f*dir[1]/255.0f-1.0f,
   2.0f*dir[2]/255.0f-1.0f,
   2.0f*dir[3]/255.0f-1.0f
   );
 }
};
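To get a feeling for the precision loss, here is a small stand-alone sketch that mirrors the per-component math of compress() and decompress() above. The free functions and the roundTripError helper are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Same math as one component of VecU32::compress: [-1,1] -> [0,255],
// truncating towards zero in the cast.
inline uint8_t compressComponent(float v) {
    assert(v >= -1.0f && v <= 1.0f);  // normalized input expected
    return static_cast<uint8_t>((v * 0.5f + 0.5f) * 255.0f);
}

// Same math as one component of VecU32::decompress: [0,255] -> [-1,1].
inline float decompressComponent(uint8_t b) {
    return 2.0f * b / 255.0f - 1.0f;
}

// Hypothetical helper: absolute error after one compress/decompress round trip.
inline float roundTripError(float v) {
    return std::fabs(decompressComponent(compressComponent(v)) - v);
}
```

Because the cast truncates, the error per component stays below 2/255 (about 0.008). Note that 0.0f maps to 127 and comes back as roughly -0.004, so an exact zero is not representable with this scheme.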

Wednesday, September 8, 2010

Light Pre-Pass Rendering

This is my implementation of Light Pre-Pass Rendering. 


It's a deferred lighting technique (similar to deferred shading, but without the need for big G-Buffers). Basically, in the first pass you render depth and normals. In the second pass you accumulate the lighting values of all lights into a light buffer rendertarget, using the normals stored in the first pass. Finally, in the third pass you combine the lighting info with your material info (textures, etc.) during forward rendering.

You should check the presentations by Wolfgang Engel and others for more background info. The method is currently very popular on PS3/Xbox (Uncharted, Resistance 2, LBP, Blur, to name a few).

Some notes on the code:
  • the ZN pre-pass stores depth and view-space normals packed in a single ARGB8888 rendertarget using 2x8-bit components each
  • to encode the normals I use the old N.z = sqrt(1-dot(N.xy, N.xy)) trick, which is not 100% correct (the sign of N.z is lost), but works OK for now
  • to en/decode the depth values I came up with the following formulas; I haven't checked yet whether there are better solutions (Note: farZ = cameraFarPlane / 256.0):
float2 EncodeFloat16(float v) { 
    float fac = v / 256.0f; 
    float fra = frac(fac); 
    return float2((fac-fra) / farZ, fra); 
} 

float DecodeFloat16(float2 v) { 
    return (v.x * farZ + v.y) * 256.0f; 
} 
  • compositing the lighting info in the second (forward) geometry pass is done in linear color space: textures are converted from gamma2 to gamma1 during texture fetch, and the final result is converted back to gamma2 for display
  • the ZN ARGB8888 packing of course causes some artifacts, but it's still acceptable in my current test scenes
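
The depth packing from the notes above can be tried out on the CPU. The following C++ sketch ports EncodeFloat16/DecodeFloat16 and adds the 8-bit quantization of the rendertarget. The names Packed16, toUnorm8 and fromUnorm8 are made up for this example, and round-to-nearest on write is an assumption about how the hardware converts floats to 8-bit channels.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Two 8-bit channels of the ARGB8888 target (hypothetical helper type).
struct Packed16 { uint8_t hi, lo; };

// Assumption: the rendertarget write rounds [0,1] to the nearest 8-bit value.
inline uint8_t toUnorm8(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return static_cast<uint8_t>(std::lround(v * 255.0f));
}

inline float fromUnorm8(uint8_t b) { return b / 255.0f; }

// Port of EncodeFloat16: v is the depth, farZ = cameraFarPlane / 256.
inline Packed16 encodeDepth(float v, float farZ) {
    float fac = v / 256.0f;
    float fra = fac - std::floor(fac);  // equivalent of HLSL frac()
    return { toUnorm8((fac - fra) / farZ), toUnorm8(fra) };
}

// Port of DecodeFloat16, reading back the quantized channels.
inline float decodeDepth(Packed16 p, float farZ) {
    return (fromUnorm8(p.hi) * farZ + fromUnorm8(p.lo)) * 256.0f;
}
```

With a far plane of 1000 (farZ = 1000 / 256), a depth of 300 decodes to roughly 299.07 under these assumptions, so the error is about one unit. That is in line with the packing artifacts mentioned above.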

Monday, August 30, 2010

Books

Two books I enjoyed reading recently are:

Programming with POSIX Threads by David R. Butenhof (Addison-Wesley)

Very well written, concise and nicely laid out. In addition to discussing the POSIX functions in detail, the second half of the book contains further material on worker thread pools, synchronization techniques, etc. Highly recommended.

Hacker's Delight by Henry S. Warren (Addison-Wesley, 2003)

Bit-twiddling to the max! This has come in handy several times already when I needed to squeeze some additional CPU cycles out of inner loops.
See also: http://www.hackersdelight.org/

Check them out, you might also like them...

Tuesday, August 24, 2010

Assembly 2010

Some of the best examples of creativity and skilled GPU effects programming could be found at Assembly this year. You should definitely check out the released demos and intros.

ASD's "Happiness is around the bend" took 1st place in the competition.

Gamescom

My personal Gamescom highlight was clearly Enslaved by Ninja Theory. I just love the scenario, atmosphere and graphics! I hope the final gameplay will also be improved over Heavenly Sword, but let's see..

Here are some links and pictures:
http://uk.ps3.ign.com/articles/107/1079980p1.html
http://www.joystiq.com/tag/enslaved

The second highlight was for sure the FMVs of Star Wars The Old Republic - wow, they actually seemed higher quality than the films' CG (except for the super-obvious matte paintings in the background during the first minutes)..
http://www.swtor.com/media/trailers/hope-cinematic-trailer

The game itself is not my cup of tea, though, I fear...

The rest of the show? Lost in a blur of decibels, Kinect hype, 2h waiting queues and hordes of people..

Wednesday, August 4, 2010

Ambient occlusion tests


Testing various features of the engine/3dsmax exporter: multiple uv-channels, ambient occlusion texture baking, etc.