Rendering Instanced Crowds
July 14, 2022
In short: store the animation data in a texture, then compute the animation skinning in the shader's vertex function.
Baking animations into a texture
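A texture in GL_RGBA32F format can hold the animation data; each pixel stores four 32-bit floats. This format performs poorly on mobile platforms, where GL_RGBA with the GL_UNSIGNED_BYTE storage type is an alternative. With GL_UNSIGNED_BYTE, each color component is limited to 256 unique values. These values are normalized when sampled and are returned in the range 0 to 1.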
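To make the precision trade-off concrete, here is a minimal sketch of how a value could be packed into one 8-bit normalized channel and recovered afterwards. The helpers `EncodeUnorm8` and `DecodeUnorm8`, and the idea of remapping from a known `[minValue, maxValue]` range, are assumptions made for illustration; the original notes do not describe a packing scheme.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a value from [minValue, maxValue] onto one 8-bit channel.
inline std::uint8_t EncodeUnorm8(float value, float minValue, float maxValue) {
    assert(maxValue > minValue);
    float t = (value - minValue) / (maxValue - minValue);  // normalize to [0, 1]
    t = std::fmin(std::fmax(t, 0.0f), 1.0f);               // clamp out-of-range input
    return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

// Hypothetical helper: undo the mapping. The division by 255 mirrors the
// normalization the GPU applies when the texel is sampled.
inline float DecodeUnorm8(std::uint8_t stored, float minValue, float maxValue) {
    float t = static_cast<float>(stored) / 255.0f;
    return minValue + t * (maxValue - minValue);
}
```

For a unit quaternion component, which lies in [-1, 1], the round trip is accurate to about 1/255. Positions usually need a tighter known range or more bits to survive this kind of quantization.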
Small textures can also be merged into a larger one; in other words, all animations can be baked into a single large texture.
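One possible way to organize such a combined texture is sketched below. The layout (clips packed left to right along the frame axis, all sharing the same joint rows) and the names `ClipRange` and `PackClips` are assumptions for illustration; the notes do not say how the clips are arranged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: where one clip lives inside the combined texture.
struct ClipRange {
    unsigned int firstColumn;  // x of the clip's first baked frame
    unsigned int frameCount;   // number of columns the clip occupies
};

// Pack clips side by side along x and return each clip's column range.
inline std::vector<ClipRange> PackClips(const std::vector<unsigned int>& frameCounts) {
    std::vector<ClipRange> ranges;
    unsigned int column = 0;
    for (unsigned int count : frameCounts) {
        assert(count > 0);
        ranges.push_back({column, count});
        column += count;  // the next clip starts right after this one
    }
    return ranges;
}
```

With this layout, a shader would add `firstColumn` to the clip-local frame index before fetching, and the frame after a clip's last column would have to wrap back to `firstColumn` rather than to 0.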
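The texture's x-axis is the frame index and its y-axis holds the joint data. Each joint contributes three values (position, rotation, and scale), and each value occupies one pixel. Schematically, column x holds frame x, and rows 3j, 3j + 1, and 3j + 2 hold the position, rotation, and scale of joint j.

Each cell is one pixel. In pseudocode: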
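for (unsigned int x = 0; x < texWidth; ++x)
{
    // Map column x to a time within the clip; cast first to avoid integer division.
    float time = ((float)x / (float)texWidth) * clip.GetDuration();
    clip.Sample(pose, time);
    // Each joint occupies three consecutive rows: position, rotation, scale.
    for (unsigned int y = 0; y < pose.Size() * 3; y += 3)
    {
        Transform node = pose.GetGlobalTransform(y / 3);
        outTex.SetTexel(x, y + 0, node.position);
        outTex.SetTexel(x, y + 1, node.rotation);
        outTex.SetTexel(x, y + 2, node.scale);
    }
}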
Here x is the frame and y is the joint. Because each joint takes up three rows, the joint count is multiplied by 3.
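The row arithmetic can be checked with a small CPU-side stand-in for the texture. `AnimTextureSketch` and `Texel` are hypothetical names; this only mimics the `SetTexel(x, y, value)` addressing used above and is not the actual texture class behind `outTex`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Texel { float x, y, z, w; };  // one RGBA32F pixel

// Hypothetical stand-in: frames along x, three rows per joint along y.
class AnimTextureSketch {
public:
    AnimTextureSketch(unsigned int frames, unsigned int joints)
        : width_(frames), height_(joints * 3),
          data_(static_cast<std::size_t>(frames) * joints * 3) {}

    // component: 0 = position, 1 = rotation, 2 = scale
    static unsigned int Row(unsigned int joint, unsigned int component) {
        assert(component < 3);
        return joint * 3 + component;
    }
    void SetTexel(unsigned int x, unsigned int y, const Texel& value) {
        assert(x < width_ && y < height_);
        data_[static_cast<std::size_t>(y) * width_ + x] = value;
    }
    const Texel& GetTexel(unsigned int x, unsigned int y) const {
        assert(x < width_ && y < height_);
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }
    unsigned int Width() const { return width_; }
    unsigned int Height() const { return height_; }

private:
    unsigned int width_;
    unsigned int height_;
    std::vector<Texel> data_;  // row-major: one row per joint component
};
```

For example, the rotation of joint 1 at frame 3 lives at texel (3, 4), since `Row(1, 1)` is 1 * 3 + 1.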
Sampling animation textures in a vertex shader
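The CPU has to track which frame each instance's animation has reached and pass it to the shader as a uniform. The next frame is needed as well, so that the two frames can be interpolated. In the vertex function, the animation texture is sampled directly by joint and playback frame to read back the joint's data: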
mat4 GetPose(int joint, int instance)
{
    int x_now = frames[instance].x;
    int x_next = frames[instance].y;
    int y_pos = joint * 3;

    // Sample the current frame's position, rotation, and scale from the animation texture:
    vec4 pos0 = texelFetch(animTex, ivec2(x_now, (y_pos + 0)), 0);
    vec4 rot0 = texelFetch(animTex, ivec2(x_now, (y_pos + 1)), 0);
    vec4 scl0 = texelFetch(animTex, ivec2(x_now, (y_pos + 2)), 0);

    // Sample the next frame's position, rotation, and scale from the animation texture:
    vec4 pos1 = texelFetch(animTex, ivec2(x_next, (y_pos + 0)), 0);
    vec4 rot1 = texelFetch(animTex, ivec2(x_next, (y_pos + 1)), 0);
    vec4 scl1 = texelFetch(animTex, ivec2(x_next, (y_pos + 2)), 0);

    // Interpolate between the transforms of both frames
    // (flip rot1 first so the quaternions blend along the shorter arc):
    if (dot(rot0, rot1) < 0.0) { rot1 *= -1.0; }
    vec4 position = mix(pos0, pos1, time[instance]);
    vec4 rotation = normalize(mix(rot0, rot1, time[instance]));
    vec4 scale = mix(scl0, scl1, time[instance]);

    // Use the interpolated position, rotation, and scale to return a 4x4 matrix:
    vec3 xBasis = QMulV(rotation, vec3(scale.x, 0, 0));
    vec3 yBasis = QMulV(rotation, vec3(0, scale.y, 0));
    vec3 zBasis = QMulV(rotation, vec3(0, 0, scale.z));
    return mat4(
        xBasis.x, xBasis.y, xBasis.z, 0.0,
        yBasis.x, yBasis.y, yBasis.z, 0.0,
        zBasis.x, zBasis.y, zBasis.z, 0.0,
        position.x, position.y, position.z, 1.0
    );
}
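On the CPU side, the values read through `frames[instance]` and `time[instance]` could be derived roughly as follows. This is a sketch under assumptions: the clip loops, playback position is given as a normalized time, and column x was baked at x / texWidth of the clip duration. `FrameSample` and `ComputeFrameSample` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-instance data mirroring frames[instance].x / .y and time[instance].
struct FrameSample {
    int now;      // column of the current frame
    int next;     // column of the following frame (wraps around for a looping clip)
    float blend;  // interpolation factor between the two columns, in [0, 1]
};

// normalizedTime: playback position as a fraction of the clip; texWidth: baked frame count.
inline FrameSample ComputeFrameSample(float normalizedTime, int texWidth) {
    assert(texWidth > 0);
    float wrapped = normalizedTime - std::floor(normalizedTime);  // keep within one loop
    float framePos = wrapped * static_cast<float>(texWidth);
    int now = static_cast<int>(framePos);
    if (now >= texWidth) { now = texWidth - 1; }  // guard against float rounding
    FrameSample sample;
    sample.now = now;
    sample.next = (now + 1) % texWidth;
    sample.blend = framePos - static_cast<float>(now);
    return sample;
}
```

The application would then upload `now` and `next` into the per-instance frame uniform and `blend` into the per-instance time uniform once per frame.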
Vertex Skinning
Skinning is implemented in the vertex function:
void main()
{
    // Find all four animated pose matrices,
    // as well as the model matrix for the current actor in the crowd:
    mat4 pose0 = GetPose(joints.x, gl_InstanceID);
    mat4 pose1 = GetPose(joints.y, gl_InstanceID);
    mat4 pose2 = GetPose(joints.z, gl_InstanceID);
    mat4 pose3 = GetPose(joints.w, gl_InstanceID);
    mat4 model = GetModel(gl_InstanceID);

    // Build the skin matrix for the vertex as the weighted sum
    // of each influencing joint's (pose * inverse bind pose):
    mat4 skin = (pose0 * invBindPose[joints.x]) * weights.x;
    skin += (pose1 * invBindPose[joints.y]) * weights.y;
    skin += (pose2 * invBindPose[joints.z]) * weights.z;
    skin += (pose3 * invBindPose[joints.w]) * weights.w;

    // Put the position and normal through
    // the skinned vertex's transformation pipeline:
    gl_Position = projection * view * model * skin * vec4(position, 1.0);
    fragPos = vec3(model * skin * vec4(position, 1.0));
    norm = vec3(model * skin * vec4(normal, 0.0));
    uv = texCoord;
}
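The weighted sum that builds `skin` can be reproduced on the CPU as a reference when debugging the shader. The sketch below assumes column-major 4x4 matrices (as in GLSL) and weights that sum to 1; `Mat4`, `Mul`, `SkinMatrix`, and `TransformPoint` are hypothetical helpers, not part of the original code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major, like a GLSL mat4

inline Mat4 Identity() {
    return {{1.0f, 0.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f, 0.0f,
             0.0f, 0.0f, 1.0f, 0.0f,  0.0f, 0.0f, 0.0f, 1.0f}};
}

inline Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[12] = x; m[13] = y; m[14] = z;  // fourth column holds the translation
    return m;
}

// Matrix product a * b; element (row, col) is stored at index col * 4 + row.
inline Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

// Weighted sum of (pose[i] * invBind[i]) over the four influences.
inline Mat4 SkinMatrix(const std::array<Mat4, 4>& pose,
                       const std::array<Mat4, 4>& invBind,
                       const std::array<float, 4>& weights) {
    assert(std::fabs(weights[0] + weights[1] + weights[2] + weights[3] - 1.0f) < 1e-3f);
    Mat4 skin{};
    for (int i = 0; i < 4; ++i) {
        Mat4 m = Mul(pose[i], invBind[i]);
        for (int e = 0; e < 16; ++e) { skin[e] += m[e] * weights[i]; }
    }
    return skin;
}

// Transform a point (w = 1) by m.
inline std::array<float, 3> TransformPoint(const Mat4& m, float x, float y, float z) {
    return {{m[0] * x + m[4] * y + m[8] * z + m[12],
             m[1] * x + m[5] * y + m[9] * z + m[13],
             m[2] * x + m[6] * y + m[10] * z + m[14]}};
}
```

For instance, a vertex at the origin influenced equally by one joint translated by (2, 0, 0) and another translated by (0, 4, 0), with identity inverse bind poses, ends up at (1, 2, 0).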
Blending animations
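Blending two animations in the vertex shader is not recommended, for two main reasons:
- The number of texture samples doubles.
- The vertex shader interpolates linearly in world space, because the baked transforms are global. The result is not necessarily the same as interpolating the joints in local space.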
Optimizing the crowd system
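When a crowd is rendered through the vertex shader, each vertex still evaluates four joint poses, and every GetPose call shown above performs six texel fetches, so the cost is 24 fetches per vertex. This is very expensive.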
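A rough cost model helps compare the options discussed in this article. The function below is only a back-of-the-envelope sketch with a hypothetical name, `TexelFetchesPerVertex`; it counts fetches and ignores caching and everything else that affects real GPU cost.

```cpp
#include <cassert>

// Fetches per vertex = joint influences * texels per joint * frames sampled * clips blended.
inline int TexelFetchesPerVertex(int influences, int componentsPerJoint,
                                 bool interpolate, int blendedClips) {
    assert(influences >= 0 && componentsPerJoint >= 0 && blendedClips >= 1);
    int framesSampled = interpolate ? 2 : 1;  // current frame, plus the next when interpolating
    return influences * componentsPerJoint * framesSampled * blendedClips;
}
```

The baseline shader corresponds to `TexelFetchesPerVertex(4, 3, true, 1)`, which is 24. Blending two clips doubles that to 48, while two influences, rotation-only data, and no interpolation bring it down to 2.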
Limiting influences
The simplest approach is to decide, based on the weight, whether the texture needs to be sampled at all:
mat4 pose0 = (weights.x < 0.0001) ? mat4(1.0) : GetPose(joints.x, instance);
mat4 pose1 = (weights.y < 0.0001) ? mat4(1.0) : GetPose(joints.y, instance);
mat4 pose2 = (weights.z < 0.0001) ? mat4(1.0) : GetPose(joints.z, instance);
mat4 pose3 = (weights.w < 0.0001) ? mat4(1.0) : GetPose(joints.w, instance);
The number of joints influencing each vertex can also be reduced to one or two.
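Reducing the influence count is typically an offline step on the mesh data. A minimal sketch, assuming four joint/weight slots per vertex as in the shader: keep the largest weights, zero the rest, and renormalize. `Influences` and `LimitInfluences` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical per-vertex skinning data: four joint indices and their weights.
struct Influences {
    std::array<int, 4> joints;
    std::array<float, 4> weights;
};

// Keep the maxInfluences largest weights, zero the others, and renormalize to sum to 1.
inline Influences LimitInfluences(Influences in, int maxInfluences) {
    assert(maxInfluences >= 1 && maxInfluences <= 4);
    // Order the slot indices by descending weight.
    std::array<int, 4> order = {{0, 1, 2, 3}};
    std::sort(order.begin(), order.end(),
              [&in](int a, int b) { return in.weights[a] > in.weights[b]; });
    float kept = 0.0f;
    for (int i = 0; i < 4; ++i) {
        if (i < maxInfluences) { kept += in.weights[order[i]]; }
        else { in.weights[order[i]] = 0.0f; }  // dropped slot: the shader can skip it
    }
    if (kept > 0.0f) {
        for (float& w : in.weights) { w /= kept; }
    }
    return in;
}
```

The zeroed slots then fall under the 0.0001 threshold in the shader snippet above, so their texture fetches are skipped.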
Limiting animated components
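Limit the animated components, that is, keep the animation as simple as possible. For example, if the joints are animated only by rotation, with no position or scale, then only the rotation needs to be stored in the texture. This also shrinks the animation texture.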
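The saving is easy to estimate. The sketch below assumes a GL_RGBA32F texture (four floats per texel) and uses a hypothetical helper, `AnimTextureBytes`; the frame and joint counts in the usage note are made-up example numbers.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of an RGBA32F animation texture with the given layout.
// componentsPerJoint: 3 for position + rotation + scale, 1 for rotation only.
inline std::size_t AnimTextureBytes(std::size_t frames, std::size_t joints,
                                    std::size_t componentsPerJoint) {
    assert(componentsPerJoint >= 1 && componentsPerJoint <= 3);
    const std::size_t bytesPerTexel = 4 * sizeof(float);  // four 32-bit channels
    return frames * joints * componentsPerJoint * bytesPerTexel;
}
```

With 256 frames and 60 joints, the full layout takes `AnimTextureBytes(256, 60, 3)` bytes and the rotation-only layout takes a third of that. In a rotation-only texture the row of a joint would simply be `joint` rather than `joint * 3`.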
Not interpolating
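Interpolation between two frames can be skipped altogether, which reduces both the number of texture samples and the amount of per-instance data passed to the shader.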
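Without interpolation, the CPU only needs to pick a single column per instance. A minimal sketch, assuming a looping clip and a normalized playback time; `NearestFrame` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Pick the baked column closest to the playback position.
// normalizedTime: playback position as a fraction of the clip; texWidth: baked frame count.
inline int NearestFrame(float normalizedTime, int texWidth) {
    assert(texWidth > 0);
    float wrapped = normalizedTime - std::floor(normalizedTime);  // keep within one loop
    int frame = static_cast<int>(std::lround(wrapped * static_cast<float>(texWidth)));
    return frame % texWidth;  // rounding past the last column wraps to the first
}
```

The shader then needs one frame index per instance instead of two indices and a blend factor. The trade-off is that motion can look choppy when the baked frame rate is low.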