【UE】Unreal的实例化渲染

NoiseFloor · 发表于 2022-12-3 12:55

UE提供了两套用于实现实例化渲染的方案，分别是ISM（Instanced Static Mesh）与HISM(Hierarchical ISM)，下面对这两套方案的实现原理做一个大致的介绍。
ISM

ISM是Instanced Static Mesh Rendering的缩写，这个方案的做法是，将大量（甚至一个场景内所有）的相同实例合并成一个DrawCall进行绘制，这个方案存在如下的一些问题：

HISM

HISM是Hierarchical ISM的缩写，这是基于Cluster聚类的N叉树（称之为Cluster Tree），树上各个节点之间存在层级关系（方便降低剔除消耗），最终提交渲染的只有叶子节点，其他节点的存在则是为了方便剔除。
原理

这个方案的大致原理是将需要渲染的实例用一棵N叉树（ClusterTree）来表示，在渲染的时候，我们会有一个InstanceBuffer，其中存储了待渲染物件的实例信息，在HISM下，可以为每个叶子节点分别提供一个InstanceBuffer，也可以更简单一点，借助图形API可以指定Buffer偏移与长度的能力，使用同一个InstanceBuffer[1]

这棵树每个节点中存储了此节点中对应的实例的起始InstanceID与终止InstanceID，基于双面的InstanceBuffer共用逻辑，我们可以非常方便的实现Cluster（节点）的剔除与渲染数据的提交。

对应到UE的数据结构，这里的整棵树用ClusterTree来表示，其中最关键的FClusterNode代表了树的一个节点：
struct FClusterTree{ TArray<FClusterNode> Nodes; TArray<int32> SortedInstances; TArray<int32> InstanceReorderTable; int32 OutOcclusionLayerNum = 0;};struct FClusterNode{ // 当前节点的BoundingBox信息 FVector3f BoundMin; FVector3f BoundMax; //当前节点的子节点在ClusterTree的Nodes数组中的索引范围 int32 FirstChild; int32 LastChild; // 当前节点所包含的Instance在PerInstanceSMData的索引范围 int32 FirstInstance; int32 LastInstance; FVector3f MinInstanceScale; FVector3f MaxInstanceScale;}//截取部分class ENGINE_API UInstancedStaticMeshComponent : public UStaticMeshComponent, public ISMInstanceManager{ /** Array of instances, bulk serialized. */ TArray<FInstancedStaticMeshInstanceData> PerInstanceSMData;}
针对这个方案，还有一些细节需要澄清，总的来说，有两个问题需要回答：

1. N叉树的构造

给定一套物件实例，我们需要构建出对应的树，这个计算逻辑在UHierarchicalInstancedStaticMeshComponent::BuildTree中可以找到。

按照自顶向下的逻辑，这些实例数据可以组成我们整棵树的根节点，我们首先需要判断当前的实例数据是否需要划分出子节点，如果需要，该怎么划分。

一个节点是否需要划分，在UE中由一个叫做BranchingFactor的参数控制，这个参数表明了一个节点的最大子节点数目（在第一步实例划分的时候，这个数值表明叶子节点中可以装载的实例数目，在后续步骤自底向上构建ClusterTree的时候，这个就对应于父节点下子节点的最大数目），在实例划分（构建叶子节点）的时候，这个数值等于叶子节点的最大顶点数目（每批顶点数目）除以实例（LOD0）顶点数，并Clamp到[1, 1024]：
int32 UHierarchicalInstancedStaticMeshComponent::DesiredInstancesPerLeaf(){ int32 LOD0Verts = GetVertsForLOD(0); int32 VertsToSplit = CVarMinVertsToSplitNode.GetValueOnAnyThread(); if (LOD0Verts) { return FMath::Clamp(VertsToSplit / LOD0Verts, 1, 1024); } return 16;}
如果某个节点包含的实例数目大于BranchingFactor，那么就要进行二分（这里的二分还不是建树，而是在进行聚类，因此跟我们前面说的N叉树不是一个事情），这里二分是基于所有实例的位置进行的，统计所有实例的XYZ三个坐标轴的跨度，取跨度最大的轴，按照实例数目均分为两个cluster（如果是离线的计算，等分逻辑是否还可以优化一下，比如基于PCA？）：
void Split(int32 Start, int32 End){ int32 NumRange = 1 + End - Start; FBox ClusterBounds(ForceInit); // 计算所有实例组成的boundingbox，每个实例用一个点表示，数据存于SortPoints for (int32 Index = Start; Index <= End; Index++) { ClusterBounds += SortPoints[SortIndex[Index]]; } //判断是否需要继续细分，不需要就直接返回 if (NumRange <= BranchingFactor) { Clusters.Add(FRunPair(Start, NumRange)); return; } // 寻找实例分布的最长跨度轴，根据boundingbox来判断 SortPairs.Reset(); int32 BestAxis = -1; float BestAxisValue = -1.0f; for (int32 Axis = 0; Axis < 3; Axis++) { float ThisAxisValue = ClusterBounds.Max[Axis] - ClusterBounds.Min[Axis]; if (!Axis || ThisAxisValue > BestAxisValue) { BestAxis = Axis; BestAxisValue = ThisAxisValue; } } // 根据最长轴进行实例排序 for (int32 Index = Start; Index <= End; Index++) { FSortPair Pair; Pair.Index = SortIndex[Index]; Pair.d = SortPoints[Pair.Index][BestAxis]; SortPairs.Add(Pair); } SortPairs.Sort(); // 为二分做准备 for (int32 Index = Start; Index <= End; Index++) { SortIndex[Index] = SortPairs[Index - Start].Index; } int32 Half = NumRange / 2; int32 EndLeft = Start + Half - 1; int32 StartRight = 1 + End - Half; if (NumRange & 1) { if (SortPairs[Half].d - SortPairs[Half - 1].d < SortPairs[Half + 1].d - SortPairs[Half].d) { EndLeft++; } else { StartRight--; } } // 递归对二分后的实例Cluster进行进一步细分处理 Split(Start, EndLeft); Split(StartRight, End);}
经过上面的递归拆分之后，我们得到了若干个子节点，但同时我们也注意到，在需要二分的时候，实际上在当前函数中并没有添加新的节点，也就是说，经过上述拆分得到的节点（存储于Clusters中）相互之间并没有层级关系，为什么不在拆分的同时把层级关系一并建立起来呢？

这是因为聚类（不论是聚类出叶子节点还是父节点）是按照（空间）二分方式进行的，如果在聚类出叶子节点的同时就计算出对应的父节点，就会存在如下两个问题：

为了构建出这棵树，我们还需要基于这些已有的节点做一次层级的聚类，这个逻辑在FClusterBuilder::BuildTree接口中完成，整体分为如下几步:

for (int32 Index = 0; Index < NumRoots; Index++){ FClusterNode& Node = Result->Nodes[Index]; Node.FirstInstance = Clusters[Index].Start; Node.LastInstance = Clusters[Index].Start + Clusters[Index].Num - 1; FBox NodeBox(ForceInit); for (int32 InstanceIndex = Node.FirstInstance; InstanceIndex <= Node.LastInstance; InstanceIndex++) { const FMatrix& ThisInstTrans = Transforms[SortedInstances[InstanceIndex]]; FBox ThisInstBox = InstBox.TransformBy(ThisInstTrans); NodeBox += ThisInstBox; if (GenerateInstanceScalingRange) { FVector3f CurrentScale(ThisInstTrans.GetScaleVector()); Node.MinInstanceScale = Node.MinInstanceScale.ComponentMin(CurrentScale); Node.MaxInstanceScale = Node.MaxInstanceScale.ComponentMax(CurrentScale); } } Node.BoundMin = (FVector3f)NodeBox.Min; Node.BoundMax = (FVector3f)NodeBox.Max;}

// 自底向上，逐层处理while (NumRoots > 1){ // 将当前层的每个节点（boundingbox的中心）当成一个点（实例） SortIndex.Reset(); SortPoints.Reset(); SortIndex.AddUninitialized(NumRoots); SortPoints.AddUninitialized(NumRoots); for (int32 Index = 0; Index < NumRoots; Index++) { SortIndex[Index] = Index; FClusterNode& Node = Result->Nodes[Index]; SortPoints[Index] = (FVector)(Node.BoundMin + Node.BoundMax) * 0.5f; } // 对上面的点（实例），重复此前划分（聚类）逻辑，不过这里每个cluster的实例数目从MaxInstancesPerLeaf变成了InternalNodeBranchingFactor，这个决定了分支的数目，比如当前层有100个节点，而InternalNodeBranchingFactor = 20，那么我们就得到了5个父节点，也就是说，每个父节点有20个子节点，那么这棵树就是20叉树 BranchingFactor = InternalNodeBranchingFactor; if (BranchingFactor > 2 && OcclusionLayerTarget && NumRoots / BranchingFactor <= OcclusionLayerTarget) { BranchingFactor = FMath::Max<int32>(2, (NumRoots + OcclusionLayerTarget - 1) / OcclusionLayerTarget); OcclusionLayerTarget = 0; bIsOcclusionLayer = true; } Split(NumRoots);//NumRoots是当前层的节点数目 if (bIsOcclusionLayer) { Result->OutOcclusionLayerNum = Clusters.Num(); bIsOcclusionLayer = false; } InverseSortIndex.Reset(); InverseSortIndex.AddUninitialized(NumRoots); for (int32 Index = 0; Index < NumRoots; Index++) { InverseSortIndex[SortIndex[Index]] = Index; } // 对instances进行重排，从而在此前节点顺序调整后，实现数据匹配，达到一个InstanceBuffer实现任意drawcall的目的 { RemapSortIndex.Reset(); RemapSortIndex.AddUninitialized(Num); int32 OutIndex = 0; for (int32 Index = 0; Index < NumRoots; Index++) { FClusterNode& Node = Result->Nodes[SortIndex[Index]]; for (int32 InstanceIndex = Node.FirstInstance; InstanceIndex <= Node.LastInstance; InstanceIndex++) { RemapSortIndex[OutIndex++] = InstanceIndex; } } InverseInstanceIndex.Reset(); InverseInstanceIndex.AddUninitialized(Num); for (int32 Index = 0; Index < Num; Index++) { InverseInstanceIndex[RemapSortIndex[Index]] = Index; } for (int32 Index = 0; Index < Result->Nodes.Num(); Index++) { FClusterNode& Node = Result->Nodes[Index]; Node.FirstInstance = InverseInstanceIndex[Node.FirstInstance]; Node.LastInstance = InverseInstanceIndex[Node.LastInstance]; } OldInstanceIndex.Reset(); Swap(OldInstanceIndex, SortedInstances); SortedInstances.AddUninitialized(Num); for (int32 Index = 0; Index < Num; Index++) { SortedInstances[Index] = OldInstanceIndex[RemapSortIndex[Index]]; } } //同样，对节点进行重排 { RemapSortIndex.Reset(); int32 NewNum = Result->Nodes.Num() + Clusters.Num(); // RemapSortIndex[new index] == old index RemapSortIndex.AddUninitialized(NewNum); LevelStarts.Reset(); LevelStarts.Add(Clusters.Num()); for (int32 Index = 0; Index < NodesPerLevel.Num() - 1; Index++) { LevelStarts.Add(LevelStarts[Index] + NodesPerLevel[Index]); } for (int32 Index = 0; Index < NumRoots; Index++) { FClusterNode& Node = Result->Nodes[SortIndex[Index]]; RemapSortIndex[LevelStarts[0]++] = SortIndex[Index]; int32 LeftIndex = Node.FirstChild; int32 RightIndex = Node.LastChild; int32 LevelIndex = 1; while (RightIndex >= 0) { int32 NextLeftIndex = MAX_int32; int32 NextRightIndex = -1; for (int32 ChildIndex = LeftIndex; ChildIndex <= RightIndex; ChildIndex++) { RemapSortIndex[LevelStarts[LevelIndex]++] = ChildIndex; int32 LeftChild = Result->Nodes[ChildIndex].FirstChild; int32 RightChild = Result->Nodes[ChildIndex].LastChild; if (LeftChild >= 0 && LeftChild < NextLeftIndex) { NextLeftIndex = LeftChild; } if (RightChild >= 0 && RightChild > NextRightIndex) { NextRightIndex = RightChild; } } LeftIndex = NextLeftIndex; RightIndex = NextRightIndex; LevelIndex++; } } checkSlow(LevelStarts[LevelStarts.Num() - 1] == NewNum); InverseChildIndex.Reset(); // InverseChildIndex[old index] == new index InverseChildIndex.AddUninitialized(NewNum); for (int32 Index = Clusters.Num(); Index < NewNum; Index++) { InverseChildIndex[RemapSortIndex[Index]] = Index; } for (int32 Index = 0; Index < Result->Nodes.Num(); Index++) { FClusterNode& Node = Result->Nodes[Index]; if (Node.FirstChild >= 0) { Node.FirstChild = InverseChildIndex[Node.FirstChild]; Node.LastChild = InverseChildIndex[Node.LastChild]; } } { Swap(OldNodes, Result->Nodes); Result->Nodes.Empty(NewNum); for (int32 Index = 0; Index < Clusters.Num(); Index++) { Result->Nodes.Add(FClusterNode()); } Result->Nodes.AddUninitialized(OldNodes.Num()); for (int32 Index = 0; Index < OldNodes.Num(); Index++) { Result->Nodes[InverseChildIndex[Index]] = OldNodes[Index]; } } int32 OldIndex = Clusters.Num(); int32 InstanceTracker = 0; for (int32 Index = 0; Index < Clusters.Num(); Index++) { FClusterNode& Node = Result->Nodes[Index]; Node.FirstChild = OldIndex; OldIndex += Clusters[Index].Num; Node.LastChild = OldIndex - 1; Node.FirstInstance = Result->Nodes[Node.FirstChild].FirstInstance; checkSlow(Node.FirstInstance == InstanceTracker); Node.LastInstance = Result->Nodes[Node.LastChild].LastInstance; InstanceTracker = Node.LastInstance + 1; checkSlow(InstanceTracker <= Num); FBox NodeBox(ForceInit); for (int32 ChildIndex = Node.FirstChild; ChildIndex <= Node.LastChild; ChildIndex++) { FClusterNode& ChildNode = Result->Nodes[ChildIndex]; NodeBox += (FVector)ChildNode.BoundMin; NodeBox += (FVector)ChildNode.BoundMax; if (GenerateInstanceScalingRange) { Node.MinInstanceScale = Node.MinInstanceScale.ComponentMin(ChildNode.MinInstanceScale); Node.MaxInstanceScale = Node.MaxInstanceScale.ComponentMax(ChildNode.MaxInstanceScale); } } Node.BoundMin = (FVector3f)NodeBox.Min; Node.BoundMax = (FVector3f)NodeBox.Max; } NumRoots = Clusters.Num(); NodesPerLevel.Insert(NumRoots, 0); } }2. HISM运行时管理

HISM的运行时管理介绍的是HISM中cluster或者节点的剔除与渲染。

由于HISM为一棵N叉树，因此剔除的时候可以自顶向下，从粗到精进行，主要逻辑存放在FHierarchicalStaticMeshSceneProxy::GetDynamicMeshElements接口中：

template<bool TUseVector>void FHierarchicalStaticMeshSceneProxy::Traverse(const FFoliageCullInstanceParams& Params, int32 Index, int32 MinLOD, int32 MaxLOD, bool bFullyContained) const{ const FClusterNode& Node = Params.Tree[Index]; if (!bFullyContained) // 判断当前节点是否被剔除（视锥剔除） if (CullNode<TUseVector>(Params, Node, bFullyContained)) return; // 更新当前节点的LOD if (MinLOD != MaxLOD) { // 根据boundingbox的距离以及width（尺寸）判断当前节点的最小（高精）最大（低精）LOD CalcLOD(MinLOD, MaxLOD, (FVector)Node.BoundMin, (FVector)Node.BoundMax, Params.ViewOriginInLocalZero, Params.ViewOriginInLocalOne, Params.LODPlanesMin, Params.LODPlanesMax); // 超出设定的最高LOD，不予显示？ if (MinLOD >= Params.LODs) return; } // 遮挡剔除判断 if (Index >= Params.FirstOcclusionNode && Index <= Params.LastOcclusionNode) { check(Params.OcclusionResults != NULL); const TArray<bool>& OcclusionResultsArray = *Params.OcclusionResults; if (OcclusionResultsArray[Params.OcclusionResultsStart + Index - Params.FirstOcclusionNode]) { INC_DWORD_STAT_BY(STAT_OcclusionCulledFoliageInstances, 1 + Node.LastInstance - Node.FirstInstance); return; } } // 节点不可拆分 bool bShouldGroup = Node.FirstChild < 0 || ((Node.LastInstance - Node.FirstInstance + 1) < Params.MinInstancesToSplit[MinLOD] && CanGroup((FVector)Node.BoundMin, (FVector)Node.BoundMax, Params.ViewOriginInLocalZero, Params.ViewOriginInLocalOne, Params.LODPlanesMax[Params.LODs - 1])); bool bSplit = (!bFullyContained || MinLOD < MaxLOD || Index < Params.FirstOcclusionNode) && !bShouldGroup; if (!bSplit) { MaxLOD = FMath::Min(MaxLOD, Params.LODs - 1); // 连续Index的Cluster节点会在AddRun后进行合并，比如[0, 31], [32, 50]合并成[0, 50] Params.AddRun(MinLOD, MaxLOD, Node); return; } // 对子节点进行递归处理 for (int32 ChildIndex = Node.FirstChild; ChildIndex <= Node.LastChild; ChildIndex++) { Traverse<TUseVector>(Params, ChildIndex, MinLOD, MaxLOD, bFullyContained); }}

特点

参考

[1]. [ue4] HISM 大规模植被渲染解决方案

[2]. (UE4 4.27) UHierarchicalInstancedStaticMesh(HISM)原理分析

[3]. UE4的HISM深入探究

		自动登录	找回密码
密码			立即注册

【UE】Unreal的实例化渲染

本帖子中包含更多资源

浏览过的版块