Occlusion culling reduces the rendering load on the graphics system by removing objects that are hidden from view from the pipeline. The idea is to find geometry that has no effect on the final contents of the frame buffer because it lies behind other geometry (it is occluded), and to remove (cull) that geometry from the rendering process at an early stage, so that fewer GPU and CPU resources are wasted.
Level optimization is a necessary part of a good engine. Different optimization techniques are relevant to different types of levels. Indoor settings rely on the use of occlusion culling, while large outdoor environments may use level-of-detail (LOD) optimizations. A complete engine will make use of both methods to handle either situation.
Occlusion culling is ideal for complex scenes where only a small portion of the scene is visible, such as indoor environments or a city walkthrough. In such scenes, a large number of interior nodes of the spatial hierarchy are occluded in each frame. Nodes can contain multiple objects, so when we skip over them, we save on geometry, draw calls and overhead.
View Frustum Culling
View frustum culling is the coarsest test of an object's visibility, and is arguably the most important optimization. If an object lies entirely outside any plane of the view frustum, we know it will not be visible and we remove it from the rendering pipeline. View frustum culling is the first phase in designing a robust occlusion culling system.
Every world object should have a bounding volume that represents its physical boundaries. Commonly used implementations include the axis-aligned bounding box (AABB), as well as bounding spheres or ellipsoids. In view frustum culling, we look at each object in the world, mark those within the view frustum as visible, and add them to the render queue. Objects that are not visible are ignored, and we can safely assume that their child nodes are not visible either.
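As a rough illustration of the test described above, the following Python sketch checks an AABB against a set of frustum planes. The `Plane`, `AABB`, and `is_visible` names, the inward-facing plane convention, and the box-shaped example "frustum" are assumptions made for this example and do not come from any particular engine.

```python
from dataclasses import dataclass

@dataclass
class Plane:
    # Assumed convention: points p with dot(normal, p) + d >= 0 are on the inner side.
    normal: tuple
    d: float

@dataclass
class AABB:
    lo: tuple  # minimum corner (x, y, z)
    hi: tuple  # maximum corner (x, y, z)

def aabb_outside_plane(box, plane):
    # Pick the box corner that lies farthest along the plane normal.
    # If even that corner is behind the plane, the whole box is behind it.
    corner = tuple(h if n >= 0 else l
                   for l, h, n in zip(box.lo, box.hi, plane.normal))
    dist = sum(n * c for n, c in zip(plane.normal, corner)) + plane.d
    return dist < 0

def is_visible(box, frustum_planes):
    # Conservative test: the box is culled only if it is fully behind one plane.
    return not any(aabb_outside_plane(box, p) for p in frustum_planes)

# Hypothetical view volume for the example: |x| <= 1, |y| <= 1, 1 <= z <= 10.
example_planes = [
    Plane((1, 0, 0), 1), Plane((-1, 0, 0), 1),
    Plane((0, 1, 0), 1), Plane((0, -1, 0), 1),
    Plane((0, 0, 1), -1), Plane((0, 0, -1), 10),
]

inside = AABB((-0.5, -0.5, 2), (0.5, 0.5, 3))
straddling = AABB((0.5, 0, 2), (2, 1, 3))
outside = AABB((5, 0, 2), (6, 1, 3))

render_queue = [b for b in (inside, straddling, outside)
                if is_visible(b, example_planes)]
```

The test is deliberately conservative: a box that straddles a plane stays in the render queue, and a box near a frustum corner may occasionally be kept even though it is not visible. That costs some extra drawing but never removes a visible object.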
With portal-based occlusion, we divide the scene into separate cells, and maintain a list of the objects in each one. Portal culling is a more sophisticated way of spatial organization, and defines each division as an enclosed space, typically a room, which provides a connection (portal) of limited visibility to other cells, such as a doorway or window.
An object needs to be visible in at least one cell (also called a sector) for portal culling to be meaningful. Cells may overlap, and an object may exist in more than one cell at a given time. This does not create any problems, since the process only sets a visibility flag.
Portal culling has two shortcomings. To be efficient, it relies on a largely static scene in which visibility does not change, so it is not well suited to dynamic and interactive applications. In addition, because a cell must be analogous to a room, its use is largely restricted to indoor environments.
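The cell-and-portal idea can be sketched as a graph traversal. This Python example is a simplification under stated assumptions: the `Cell` class, the `collect_visible` function, and the `portal_in_view` predicate are hypothetical names, and a real implementation would typically narrow the view frustum through each portal rather than use a flat yes/no test.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    objects: list = field(default_factory=list)
    # Each entry is (portal_name, neighbouring Cell).
    portals: list = field(default_factory=list)

def collect_visible(start_cell, portal_in_view):
    """Return the set of objects flagged visible, starting from the viewer's cell."""
    visible = set()
    visited = set()
    stack = [start_cell]
    while stack:
        cell = stack.pop()
        if cell.name in visited:
            continue  # guards against cycles between connected cells
        visited.add(cell.name)
        # An object listed in several cells is simply flagged once.
        visible.update(cell.objects)
        for portal_name, neighbour in cell.portals:
            if portal_in_view(portal_name):
                stack.append(neighbour)
    return visible

# Hypothetical scene: the rug lies across the doorway, so it is in two cells.
hall = Cell("hall", ["table", "rug"])
kitchen = Cell("kitchen", ["stove", "rug"])
cellar = Cell("cellar", ["barrel"])
hall.portals.append(("door_a", kitchen))
kitchen.portals.append(("door_a", hall))
kitchen.portals.append(("hatch", cellar))

# Assume the viewer stands in the hall and only the doorway is in view.
visible = collect_visible(hall, lambda name: name == "door_a")
```

Because the result is a set of flags, the rug being registered in both the hall and the kitchen causes no duplicate work, and the cellar is never visited because its hatch is not in view.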
Performing visibility tests on the graphics hardware lets us reduce computation overhead and avoid expensive CPU processing. In particular, GPUs are far better at rasterization than the CPU. Occlusion queries let us request the number of fragments that pass the depth test and use it to decide whether a given object should be rendered. To test a complex object for occlusion, we use these steps (Wimmer and Piringer, 2005):
1. Initiate an occlusion query.
2. Turn off writing to the frame and depth buffer, and disable any superfluous state. Modern graphics hardware is thus able to rasterize at a much higher speed (NVIDIA 2004).
3. Render a simple but conservative approximation of the complex object, usually a bounding box. The GPU counts the number of fragments that would actually have passed the depth test.
4. Terminate the occlusion query.
5. Ask for the result of the query (that is, the number of visible pixels of the approximate geometry).
6. If the number of pixels drawn is greater than some threshold (typically zero), render the complex object.
This method can work well with complex objects. The drawback arises in step 5, where we must wait for the result of the query to become available. Waiting stalls the pipeline and can make rendering slower than simply drawing all objects directly, because we must issue a query and wait for its result for every object.
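To make the counting in the query steps concrete, here is a small CPU-side simulation in Python. It is only a sketch of what the hardware counts, not GPU code: the software depth buffer, the screen-space rectangle standing in for the rendered bounding box, and the function names are all assumptions made for this example.

```python
def query_visible_fragments(depth_buffer, rect, depth):
    """Count the fragments of a screen-space rectangle that pass a
    'less than' depth test. Nothing is written to the buffer, which
    mirrors disabling frame and depth writes during the query."""
    x0, y0, x1, y1 = rect  # pixel bounds, x1 and y1 exclusive
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if depth < depth_buffer[y][x]:
                count += 1
    return count

def should_render(depth_buffer, bounding_rect, nearest_depth, threshold=0):
    # Using the nearest depth of the bounding volume keeps the test conservative.
    passed = query_visible_fragments(depth_buffer, bounding_rect, nearest_depth)
    return passed > threshold

# Hypothetical 4x4 depth buffer: a wall at depth 0.3 covers the two left
# columns, and the rest is still at the far plane (depth 1.0).
depth_buffer = [[0.3 if x < 2 else 1.0 for x in range(4)] for y in range(4)]

hidden_count = query_visible_fragments(depth_buffer, (0, 0, 2, 2), 0.6)
partial_count = query_visible_fragments(depth_buffer, (1, 0, 3, 2), 0.6)
draw_hidden = should_render(depth_buffer, (0, 0, 2, 2), 0.6)
draw_partial = should_render(depth_buffer, (1, 0, 3, 2), 0.6)
```

In this simulation the result is available immediately. On real hardware the count arrives only after the GPU has processed the query, which is where the stall described above comes from.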
Spatial hierarchies group objects together so that we can treat them as a single object for the occlusion query. Space partitioning data structures, such as k-dimensional (k-d) trees, BSP trees, and bounding volume hierarchies, subdivide the scene until the cells of the partition are small enough according to some criterion. In these tree structures, interior nodes group other nodes, and leaves contain the actual geometry.
The major advantage of using hierarchies for occlusion culling is that we can test interior nodes, which contain more geometry than the individual objects themselves. This lets us eliminate a very large number of objects at once when they are grouped together into a single node. If an occlusion query shows that such a node is occluded, we skip the tests for all the objects inside it.
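The saving can be sketched with a small tree traversal in Python. The `Node` class, the `cull` function, and the `is_occluded` predicate are hypothetical. The predicate stands in for whatever occlusion test is used, such as a hardware occlusion query on the node's bounding volume.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)  # geometry, held by leaves

def cull(node, is_occluded):
    """Return (objects to draw, number of occlusion tests issued)."""
    if is_occluded(node):
        # One test rejects the whole subtree, so no descendant is tested.
        return [], 1
    drawn = list(node.objects)
    tests = 1
    for child in node.children:
        child_drawn, child_tests = cull(child, is_occluded)
        drawn += child_drawn
        tests += child_tests
    return drawn, tests

# Hypothetical scene: two wings, each grouping two rooms.
east = Node("east_wing", children=[
    Node("room_a", objects=["chair", "desk"]),
    Node("room_b", objects=["lamp"]),
])
west = Node("west_wing", children=[
    Node("room_c", objects=["sofa", "tv", "rug"]),
    Node("room_d", objects=["shelf", "bed"]),
])
root = Node("root", children=[east, west])

# Assume the test reports the entire west wing as occluded.
drawn, tests = cull(root, lambda n: n.name == "west_wing")
```

Here a single test on the west wing removes five objects and two rooms from consideration. The more geometry a rejected interior node contains, the larger the saving.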
Wimmer, M., Piringer, H. (2005). Hardware Occlusion Queries Made Useful. GPU Gems 2, chapter 6.
GECK. (2011). Occlusion culling. GECK.
Kreuzer, J. (2006, July 09). 3d programming weekly.