Rendering Instanced Crowds

July 14, 2022 (about 3 min read)

In short: store the animation data in a texture, then compute the skinning in the vertex shader.

Baking animations into a texture

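A texture in GL_RGBA32F format can be used to store the animation data; each texel then holds four 32-bit floats. This format does not work well on mobile platforms, where GL_RGBA with the GL_UNSIGNED_BYTE storage type can be considered instead. With GL_UNSIGNED_BYTE, each color component is limited to 256 unique values. These values are normalized when sampled and returned in the range 0 to 1, so the animation data has to be remapped into that range and loses precision.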
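To make the precision trade-off concrete, here is a minimal sketch of what storing a value in an 8-bit normalized channel involves. The helper names `EncodeUnorm8` and `DecodeUnorm8` and the value range are assumptions made for illustration; they are not part of the original system.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: remap a value from [minV, maxV] into an 8-bit
// normalized channel, as data would be stored with GL_UNSIGNED_BYTE.
inline std::uint8_t EncodeUnorm8(float value, float minV, float maxV) {
    float t = (value - minV) / (maxV - minV);
    t = std::min(std::max(t, 0.0f), 1.0f);  // clamp to [0, 1]
    return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

// Hypothetical helper: the sampled value arrives in [0, 1] and has to be
// expanded back to the original range.
inline float DecodeUnorm8(std::uint8_t stored, float minV, float maxV) {
    float t = static_cast<float>(stored) / 255.0f;
    return minV + t * (maxV - minV);
}
```

With a range of [-10, 10], neighboring codes are 20 / 255 (about 0.078) units apart, so a decoded position can be off by up to half of that step.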

Small textures can also be merged into a larger one: all animations can be baked into a single large texture.
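One possible way to lay out several clips in a single texture is to place them side by side along the frame axis. The sketch below assumes every clip uses the same skeleton (so the rows line up); `ClipRegion` and `PackClips` are hypothetical names used only for this example.

```cpp
#include <vector>

// Hypothetical record of where one clip lives inside the shared texture.
struct ClipRegion {
    unsigned int firstColumn;  // x offset of the clip's first frame
    unsigned int frameCount;   // number of baked frames (columns)
};

// Place clips next to each other along x and return each clip's region.
inline std::vector<ClipRegion> PackClips(const std::vector<unsigned int>& frameCounts) {
    std::vector<ClipRegion> regions;
    unsigned int next = 0;
    for (unsigned int count : frameCounts) {
        regions.push_back({next, count});
        next += count;
    }
    return regions;
}
```

An instance playing clip `i` would then add `regions[i].firstColumn` to its frame index before sampling.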

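The x-axis of the texture is the frame index and the y-axis holds the joint data. Each joint stores three components: position, rotation, and scale. Each component takes one texel, so a joint occupies three consecutive rows.

Each cell of this grid is one texel. In pseudocode: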

for (unsigned int x = 0; x < texWidth; ++x)
{
    // Normalized time of this column; cast to float to avoid integer division.
    float t = (float)x / (float)texWidth;
    float time = t * clip.GetDuration();
    clip.Sample(pose, time);
    // Each joint takes three rows: position, rotation, scale.
    for (unsigned int y = 0; y < pose.Size() * 3; y += 3)
    {
        Transform node = pose.GetGlobalTransform(y / 3);
        outTex.SetTexel(x, y + 0, node.position);
        outTex.SetTexel(x, y + 1, node.rotation);
        outTex.SetTexel(x, y + 2, node.scale);
    }
}

Here x is the frame and y is the joint row. Each joint occupies three rows, which is why the joint count is multiplied by 3.
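The addressing scheme can be sketched on the CPU as follows. `AnimTexture` and `JointRow` are hypothetical stand-ins for whatever texture class the engine provides; the sketch assumes four floats per texel stored row by row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side float texture: 4 floats (RGBA) per texel, row-major.
struct AnimTexture {
    unsigned int width;   // number of baked frames (x axis)
    unsigned int height;  // joint count * 3 (y axis)
    std::vector<float> data;

    AnimTexture(unsigned int frames, unsigned int joints)
        : width(frames), height(joints * 3),
          data(static_cast<std::size_t>(frames) * joints * 3 * 4, 0.0f) {}

    // Index of the first float of texel (x, y).
    std::size_t TexelIndex(unsigned int x, unsigned int y) const {
        return (static_cast<std::size_t>(y) * width + x) * 4;
    }

    void SetTexel(unsigned int x, unsigned int y, float r, float g, float b, float a) {
        std::size_t i = TexelIndex(x, y);
        data[i + 0] = r; data[i + 1] = g; data[i + 2] = b; data[i + 3] = a;
    }
};

// Row holding one component of a joint: 0 = position, 1 = rotation, 2 = scale.
inline unsigned int JointRow(unsigned int joint, unsigned int component) {
    return joint * 3 + component;
}
```

For example, the rotation of joint 1 at frame 3 lives at texel `(3, JointRow(1, 1))`, which is row 4.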

Sampling animation textures in a vertex shader

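The CPU has to track which frame each instance is currently playing and pass it to the shader as a uniform. The next frame is needed as well, so that the two frames can be interpolated. In the vertex shader, the animation texture is sampled directly by joint and frame to read the joint's transform: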
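The CPU-side update might look like the sketch below. It assumes a looping clip in which column x holds the pose at time (x / frameCount) * duration; `FrameCursor` and `ComputeFrameCursor` are hypothetical names, not part of the original code.

```cpp
#include <cmath>

// Hypothetical per-instance playback data matching the shader inputs:
// current frame, next frame, and the blend factor between them.
struct FrameCursor {
    int now;
    int next;
    float blend;
};

// Assumes a looping clip baked into `frameCount` columns.
inline FrameCursor ComputeFrameCursor(float playbackTime, float duration, int frameCount) {
    float t = std::fmod(playbackTime, duration) / duration;  // normalized time
    if (t < 0.0f) { t += 1.0f; }                             // wrap negative times
    float framePos = t * static_cast<float>(frameCount);
    int now = static_cast<int>(framePos) % frameCount;
    FrameCursor cursor;
    cursor.now = now;
    cursor.next = (now + 1) % frameCount;  // wrap to the first frame when looping
    cursor.blend = framePos - std::floor(framePos);
    return cursor;
}
```

The three values would be uploaded per instance each frame, for example as the `frames` and `time` uniform arrays that the shader reads.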

// frames, time, and animTex are uniforms; QMulV rotates a vector by a quaternion.
mat4 GetPose(int joint, int instance)
{
    int x_now = frames[instance].x;
    int x_next = frames[instance].y;
    int y_pos = joint * 3;

    // Sample the current frame's position, rotation, and scale from the animation texture:
    vec4 pos0 = texelFetch(animTex, ivec2(x_now, (y_pos + 0)), 0);
    vec4 rot0 = texelFetch(animTex, ivec2(x_now, (y_pos + 1)), 0);
    vec4 scl0 = texelFetch(animTex, ivec2(x_now, (y_pos + 2)), 0);

    // Sample the next frame's position, rotation, and scale from the animation texture:
    vec4 pos1 = texelFetch(animTex, ivec2(x_next, (y_pos + 0)), 0);
    vec4 rot1 = texelFetch(animTex, ivec2(x_next, (y_pos + 1)), 0);
    vec4 scl1 = texelFetch(animTex, ivec2(x_next, (y_pos + 2)), 0);

    // Neighborhood check: flip rot1 so the interpolation takes the shorter arc.
    if (dot(rot0, rot1) < 0.0) { rot1 *= -1.0; }

    // Interpolate between the transforms of both frames:
    vec4 position = mix(pos0, pos1, time[instance]);
    vec4 rotation = normalize(mix(rot0, rot1, time[instance]));
    vec4 scale = mix(scl0, scl1, time[instance]);

    // Use the interpolated position, rotation, and scale to return a 4x4 matrix:
    vec3 xBasis = QMulV(rotation, vec3(scale.x, 0, 0));
    vec3 yBasis = QMulV(rotation, vec3(0, scale.y, 0));
    vec3 zBasis = QMulV(rotation, vec3(0, 0, scale.z));

    return mat4(
        xBasis.x, xBasis.y, xBasis.z, 0.0,
        yBasis.x, yBasis.y, yBasis.z, 0.0,
        zBasis.x, zBasis.y, zBasis.z, 0.0,
        position.x, position.y, position.z, 1.0
    );
}

Vertex Skinning

Skinning is implemented in the vertex shader's main function:

void main()
{
    // Find the four animated pose matrices that influence this vertex,
    // as well as the model matrix for the current actor in the crowd:
    mat4 pose0 = GetPose(joints.x, gl_InstanceID);
    mat4 pose1 = GetPose(joints.y, gl_InstanceID);
    mat4 pose2 = GetPose(joints.z, gl_InstanceID);
    mat4 pose3 = GetPose(joints.w, gl_InstanceID);

    mat4 model = GetModel(gl_InstanceID);

    // Build the skin matrix as the weighted sum of (pose * inverse bind pose):
    mat4 skin = (pose0 * invBindPose[joints.x]) * weights.x;
    skin += (pose1 * invBindPose[joints.y]) * weights.y;
    skin += (pose2 * invBindPose[joints.z]) * weights.z;
    skin += (pose3 * invBindPose[joints.w]) * weights.w;

    // Put the position and normal through the skinned vertex's
    // transformation pipeline:
    gl_Position = projection * view * model * skin * vec4(position, 1.0);
    fragPos = vec3(model * skin * vec4(position, 1.0));
    norm = vec3(model * skin * vec4(normal, 0.0));
    uv = texCoord;
}
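The weighted sum that forms the skin matrix can be checked on the CPU with a small reference. This is only a sketch: `Mat4`, `Translation`, `BlendSkin`, and `TransformPoint` are hypothetical helpers, and each per-joint matrix is assumed to be the already multiplied `pose * invBindPose`.

```cpp
#include <array>

// Column-major 4x4 matrix, same element order as GLSL's mat4.
using Mat4 = std::array<float, 16>;
struct Vec3 { float x, y, z; };

inline Mat4 Translation(float x, float y, float z) {
    return {{ 1.0f, 0.0f, 0.0f, 0.0f,
              0.0f, 1.0f, 0.0f, 0.0f,
              0.0f, 0.0f, 1.0f, 0.0f,
              x,    y,    z,    1.0f }};
}

// Weighted sum of four per-joint skin matrices (pose * invBindPose).
inline Mat4 BlendSkin(const std::array<Mat4, 4>& jointSkin,
                      const std::array<float, 4>& weights) {
    Mat4 skin{};  // zero-initialized
    for (int j = 0; j < 4; ++j) {
        for (int i = 0; i < 16; ++i) {
            skin[i] += jointSkin[j][i] * weights[j];
        }
    }
    return skin;
}

// Transform a point (w = 1) by a column-major matrix.
inline Vec3 TransformPoint(const Mat4& m, const Vec3& p) {
    return { m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
             m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
             m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] };
}
```

With two joints weighted 0.5 each, one translating by (2, 0, 0) and the other by (0, 4, 0), a vertex ends up translated by (1, 2, 0), the weighted average of the two.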

Blending animations

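Blending two animations in the vertex shader is not recommended, for two main reasons:

It doubles the number of texture fetches.
The linear interpolation in the vertex shader operates on the baked global joint transforms, so the result is not necessarily the same as blending the joints in local space.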

Optimizing the crowd system

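When a crowd is rendered this way, every vertex still evaluates four joint poses, and each GetPose call performs six texel fetches, so a single vertex can cost up to 24 fetches. This is very expensive.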

Limiting influences

The simplest approach is to skip the texture fetches for a joint whose weight is negligible:

mat4 pose0 = (weights.x < 0.0001) ? mat4(1.0) : GetPose(joints.x, instance);
mat4 pose1 = (weights.y < 0.0001) ? mat4(1.0) : GetPose(joints.y, instance);
mat4 pose2 = (weights.z < 0.0001) ? mat4(1.0) : GetPose(joints.z, instance);
mat4 pose3 = (weights.w < 0.0001) ? mat4(1.0) : GetPose(joints.w, instance);

The number of joints that influence each vertex can then be reduced to one or two.

Limiting animated components

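Limit the animated components, that is, keep the animation as simple as possible: for example, joints that only rotate and have no translation or scale. Then only the rotation needs to be stored in the texture, which also reduces the size of the animation texture.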
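The saving is easy to estimate. The sketch below assumes a GL_RGBA32F texture (four floats per texel); `AnimTextureBytes` is a hypothetical helper and the frame and joint counts in the note are made up for illustration.

```cpp
#include <cstddef>

// Size in bytes of a float RGBA animation texture.
// rowsPerJoint is 3 for position + rotation + scale, 1 for rotation only.
inline std::size_t AnimTextureBytes(std::size_t frames, std::size_t joints,
                                    std::size_t rowsPerJoint) {
    return frames * joints * rowsPerJoint * 4 * sizeof(float);
}
```

For 256 frames and 64 joints, storing all three components takes three times as many bytes as storing rotation alone, and the shader fetches a third as many texels per pose.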

Not interpolating

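You can also choose not to interpolate between two frames. This halves the number of texel fetches per pose and reduces the data passed to the shader, since the next-frame index and the blend time are no longer needed.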
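Without interpolation the CPU only has to pick one frame per instance. A minimal sketch, assuming a looping clip with column x baked at time (x / frameCount) * duration; `NearestFrame` is a hypothetical name:

```cpp
#include <cmath>

// Pick the baked frame closest to the current playback time.
inline int NearestFrame(float playbackTime, float duration, int frameCount) {
    float t = std::fmod(playbackTime, duration) / duration;  // normalized time
    if (t < 0.0f) { t += 1.0f; }                             // wrap negative times
    int frame = static_cast<int>(std::lround(t * static_cast<float>(frameCount)));
    return frame % frameCount;  // rounding up past the last frame wraps to 0
}
```

The trade-off is visible stepping when the baked frame rate is low, so this suits distant crowd members better than close-up ones.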