## Homework 3: Pathtracer

### Colin Steidtmann

## Overview

In my last post, I explored how 3D objects can be represented and rendered via triangle meshes. This post goes beyond that and attempts to create realistic photos and 3D models. A big part of creating a realistic feel is having correct lighting, shadows, and depth perception. A common way of achieving this is ray tracing. In addition to our 3D objects built with triangle meshes, we include virtual light sources and simulate a camera perspective by tracing rays that leave the light sources, bounce around the objects in our scene, and finally reach the camera lens. Implementing this presents numerous challenges. First, we must define how rays are generated from the camera and how they intersect with the geometric primitives (like triangles) that make up our scene. Next, we'll need to speed things up: for scenes with hundreds of thousands of triangles, we want to quickly find which triangles a ray intersects, and we can do this with a Bounding Volume Hierarchy (BVH) data structure that accelerates ray-object intersection queries. After that, we'll play around with how materials reflect light, such as making them reflect it uniformly in all directions. We'll need this to trace light rays bouncing off of them. Finally, we'll work on the actual ray tracing part, which is necessary for indirect illumination, and we'll make it even faster with adaptive sampling.

This assignment once again challenged my geometry skills and understanding of the physical world, something that most CS classes don't, which makes this course special. I ran into many issues, from unbearably slow rendering times on my PC, forcing me to render on machines in the cloud, to scenes just not looking right, but that's what's required to learn! 🙂

## Vocab to be aware of

Why Use Triangles? We only need 3 points to define one, all three points of a triangle always lie on the same plane, modern GPUs are specifically designed to process triangles efficiently, and any 3D shape, no matter how intricate, can be approximated by combining numerous triangles.

Ray Tracing: an image-order algorithm for making renderings of 3D scenes. Ray tracing mimics real-world light paths to generate photorealistic images by tracing rays from your eye to objects in the scene and simulating their reflections, refractions, and interactions.

Spatial Partition: Partitions space into non-overlapping regions. Objects can be contained in multiple regions.

Object Partition: Partitions the set of objects into disjoint subsets. Bounding boxes for each subset may overlap in space.

Bounding Volume Hierarchy (BVH): A tree structure that speeds up ray-tracing by efficiently discarding empty regions of the scene.

Illuminance: The luminous flux (visible light power) arriving at a surface per unit area.

Irradiance: The radiant power arriving at a surface per unit area, integrated over all incoming directions.

Radiance: The amount of light power emitted, reflected, or transmitted from a surface per unit area and solid angle.

Exposure: The amount of light captured by a camera sensor.

Unbiased Estimator: An algorithm that, with more samples, converges to the true solution statistically.

Monte Carlo Sampling: A technique using random sampling to solve problems with probabilistic outcomes.

Importance Sampling: Focusing samples on areas that contribute more to the final image.

Adaptive Sampling: Adjusting the number of samples based on the complexity of a scene area.

Direct Lighting: Illumination directly from light sources.

Indirect Lighting: Light that bounces off other surfaces before reaching a point.

Global Illumination: Simulating the interaction of light with all surfaces in a scene to create realistic lighting effects.

Russian Roulette Rendering: A technique that probabilistically terminates ray paths to improve rendering efficiency without biasing the result.

BRDF: Bidirectional Reflectance Distribution Function, describes how light reflects off a surface.

Lambertian Reflectance: An ideal reflection model where light is scattered equally in all directions.

## Section I: Ray Generation and Scene Intersection

### Part 1: Ray Generation and Scene Intersection

The first step in ray tracing is moving from the 2D realm to the 3D realm. Even though our ultimate image will be 2D, all the objects in our scene exist in 3D space. Moreover, we aim to replicate a camera or focal point that records the scene. To achieve this, we must map every 2D pixel position (from our output image) to a corresponding 3D location. Then, we generate rays that originate from the camera and extend to these new 3D pixel locations. Finally, we determine which objects intersect with our rays at those locations.

Our camera has a couple of properties that help us transition from image space to camera space:

hFov (horizontal field of view): The horizontal angle that the camera can "see" in degrees.

vFov (vertical field of view): The vertical angle that the camera can "see" in degrees.

nclip (near clipping plane): The minimum distance from the camera at which objects are considered visible. Anything closer than this distance is clipped and not rendered.

fclip (far clipping plane): This defines the maximum distance from the camera at which objects are considered visible. Anything beyond this distance is clipped and not rendered.

The "sensor," representing the image area in 3D space, has its width and height calculated using the camera's horizontal field of view (hFov) and vertical field of view (vFov). We then map each (x,y) coordinate of the final image to a corresponding location on the sensor. From the camera's origin, a virtual ray is created that extends towards this determined sensor location. Additionally, properties are assigned to the ray: min_t is set to nclip and max_t is set to fclip. These settings help us determine the starting and ending points for ray-object intersection checks in our scene.

Before checking how rays hit objects, we colored pixels based on ray direction. Red pixels show the horizontal direction (x-component) played a big role, while green pixels show the vertical direction (y-component) mattered more for their color.

To assign pixel colors based on the objects in our scene, it boils down to 4 steps:

Ray Setup: We "shoot" rays from the camera towards each pixel in the image. This is like tracing the path of light through the scene.

Estimating Radiance: Instead of directly assigning colors, we estimate the light energy (radiance) at each pixel by randomly sampling multiple points within each pixel. We check if these points intersect any objects in the scene.

Intersection and Material Properties: If a ray hits an object, we use the Möller-Trumbore algorithm to find the exact location and normal vector of the intersection point on the triangle mesh. We also store the material properties of the object at this point, described by the Bidirectional Scattering Distribution Function (BSDF), which determines how light scatters and reflects, ultimately influencing the final pixel color.

Final Color Calculation: Using the accumulated light energy (radiance) and material properties from the intersection points, we can finally calculate and assign a color to each pixel in the image.
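The Möller-Trumbore step can be illustrated with a self-contained sketch. The vector types and helper names here are assumptions for illustration; on a hit it reports the ray parameter t plus the barycentric coordinates (u, v) of the hit point on the triangle:

```cpp
#include <cmath>

struct V3 { double x, y, z; };
static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore: intersect ray (origin o, direction d) with triangle
// (p0, p1, p2). Returns true on a hit and writes t, u, v.
bool moller_trumbore(V3 o, V3 d, V3 p0, V3 p1, V3 p2,
                     double& t, double& u, double& v) {
    V3 e1 = sub(p1, p0), e2 = sub(p2, p0);      // triangle edge vectors
    V3 pvec = cross(d, e2);
    double det = dot(e1, pvec);
    if (fabs(det) < 1e-12) return false;        // ray parallel to triangle
    double invDet = 1.0 / det;
    V3 tvec = sub(o, p0);
    u = dot(tvec, pvec) * invDet;
    if (u < 0.0 || u > 1.0) return false;       // outside edge p0-p2
    V3 qvec = cross(tvec, e1);
    v = dot(d, qvec) * invDet;
    if (v < 0.0 || u + v > 1.0) return false;   // outside the triangle
    t = dot(e2, qvec) * invDet;                 // distance along the ray
    return t > 0.0;
}
```

The same barycentric (u, v) pair that rejects misses is also what we use to interpolate per-vertex normals at the hit point.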

### Part 2: Bounding Volume Hierarchy

In addition to looking great, we also want our scenes to render fast. This is especially important for scenes with lots of large objects. Our current ray-object intersection algorithm is O(n): the runtime is proportional to the number of objects in our scene, even if the ray is nowhere close to intersecting most of them. This is quite slow; rendering this cow, made up of lots of triangles, took 1 minute, 43 seconds on my machine.

To accelerate ray-object intersection checks, we implemented a Bounding Volume Hierarchy (BVH), a tree structure that organizes objects in our scene into bounding-boxed volumes. This allows us to traverse objects in our scene in O(log n) time, rather than O(n). The concept is straightforward: if a ray doesn't intersect with a bounding volume, there's no need to investigate that volume further.

Our BVH construction starts with a large bounding volume encompassing the entire scene. This volume is then recursively subdivided into smaller volumes based on a calculated split point. To determine the split point, we evaluate the cost of splitting along the median of the x, y, or z-axis. The cost is the sum of the bounding box areas of the objects to the left and right of the median. The goal is to minimize the bounding box areas, making them as tight as possible around the objects, which reduces the time spent checking ray intersections with empty space.

Recursion continues until a bounding box contains fewer than a heuristic threshold number of objects (we used 4), at which point we stop subdividing. This approach significantly speeds up ray-object intersection checks by efficiently reducing the number of objects to be checked for intersection with a ray.
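The construction above can be sketched as a short recursive routine. This simplified version (the `Prim`, `BBox`, and `Node` types are stand-ins, and it splits at the median centroid along the widest axis rather than evaluating the full cost across all three axes) shows the shape of the recursion and the leaf cutoff of 4:

```cpp
#include <algorithm>
#include <vector>

struct BBox {
    double lo[3] = { 1e30,  1e30,  1e30};
    double hi[3] = {-1e30, -1e30, -1e30};
    void expand(const double c[3]) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], c[a]);
            hi[a] = std::max(hi[a], c[a]);
        }
    }
    int widest_axis() const {
        int best = 0;
        for (int a = 1; a < 3; ++a)
            if (hi[a] - lo[a] > hi[best] - lo[best]) best = a;
        return best;
    }
};

struct Prim { double centroid[3]; };   // stand-in for a triangle

struct Node {
    BBox box;
    Node* left = nullptr;
    Node* right = nullptr;
    std::vector<Prim> prims;           // non-empty only at leaves
};

Node* build_bvh(std::vector<Prim> prims, size_t max_leaf_size = 4) {
    Node* node = new Node;
    for (auto& p : prims) node->box.expand(p.centroid);
    if (prims.size() <= max_leaf_size) {        // small enough: make a leaf
        node->prims = std::move(prims);
        return node;
    }
    int axis = node->box.widest_axis();
    size_t mid = prims.size() / 2;
    // Partition around the median centroid along the chosen axis.
    std::nth_element(prims.begin(), prims.begin() + mid, prims.end(),
        [axis](const Prim& a, const Prim& b) {
            return a.centroid[axis] < b.centroid[axis];
        });
    node->left  = build_bvh({prims.begin(), prims.begin() + mid}, max_leaf_size);
    node->right = build_bvh({prims.begin() + mid, prims.end()}, max_leaf_size);
    return node;
}
```

During traversal, a ray that misses `node->box` skips the entire subtree, which is where the O(log n) behavior comes from.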

The speedup achieved by using BVH for ray intersection checks is remarkable. Previously, rendering the cow model took 1 minute and 43 seconds, but with BVH, it now takes only 0.416 seconds. The speedup for other models was even more impressive.

Rendering a model of the famous theoretical physicist Max Planck with 50,801 triangles took 21 minutes without BVH, but only 1 second with BVH.

Similarly, rendering a dragon model with 100,012 triangles took 36 minutes without BVH, but only 2 seconds with BVH.

The rendering process was so computationally intensive that my computer became unresponsive to keyboard input by the time it finished rendering. I had to take pictures of my terminal using my phone and then restart my computer to regain keyboard functionality.

### Part 3: Direct Illumination

The images rendered above clearly aren't realistic, as the pixel colors are simply set to the normal vector of the surface. So in this part, our goal is to simulate lighting effects like direct illumination and shading by calculating the amount of light reaching a point from various sources.

Our first step involved implementing a diffuse material. This material type scatters incoming light equally in all directions, mimicking the way rough surfaces like walls or unpolished wood reflect light. The implementation was straightforward: regardless of the input or output ray directions, we simply calculated the albedo, which represents the proportion of light reflected by the surface, at the point of contact. We then divided this value by π and returned it, reflecting light uniformly for all rays.
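The diffuse BSDF evaluation is short enough to show in full. This is a sketch with an assumed `Spectrum` type; the key point is that the returned value ignores the ray directions entirely, and the division by π keeps the material energy-conserving:

```cpp
#include <cmath>

struct Spectrum { double r, g, b; };

// Lambertian (diffuse) BSDF: reflected radiance per unit solid angle is the
// albedo over pi, independent of incoming and outgoing directions.
Spectrum diffuse_f(const Spectrum& albedo) {
    return { albedo.r / M_PI, albedo.g / M_PI, albedo.b / M_PI };
}
```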

Next we implemented zero-bounce illumination, which refers to light reaching the camera directly from the light source without any reflections. Our function takes a Ray object and an Intersection object, then checks if the intersected object is a light source and returns its emitted light for direct illumination.

One-bounce illumination builds on zero-bounce by simulating how light reflects off objects. For each point where a ray hits an object, we sample directions across a hemisphere and shoot rays in those directions. If a sample ray hits a light source, we consider that light reaching the original point indirectly (one bounce). By summing these contributions and accounting for sampling probability, we calculate the total light arriving at the point, resulting in more realistic lighting effects like soft shadows and variations in brightness across surfaces. The math underlying our implementation is the Monte Carlo estimate of the reflection equation:

Lo(p, wo) ≈ (1/N) · Σ [ Li(p, wi) · fr(p, wi, wo) · cos(θ) / p(wi) ]

Li(p, wi): This represents the radiance arriving at a point p on the surface from a direction wi. Radiance is a measure of light power per unit area per unit of solid angle.

fr(p, wi, wo): This term denotes the bidirectional reflectance distribution function (BRDF) of the material at point p. The BRDF describes how light is reflected at a point based on the incoming direction wi and outgoing direction wo.

Cos(θ): This represents the cosine of the angle between the incoming direction wi and the surface normal n at point p. It accounts for the fact that light arriving at a more glancing angle deposits less energy per unit area.

p(wi): The probability density function (PDF) of the incoming light direction, aka the probability of sampling that direction (we're using a uniform hemisphere model, so it's 1/(2π)).

N: The number of samples

In simple terms: Reflected light = Incoming radiance * Material's reflectance * How "head-on" the light is reflected, normalized by the number of samples and probability of sampling from each direction.
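That estimator can be sketched as a small loop. This version is hedged: `incoming` is a caller-supplied stand-in for "cast a ray in direction wi and return the radiance it brings back," and directions live in a local frame whose z-axis is the surface normal; these are assumptions for illustration, not the assignment's actual interface:

```cpp
#include <cmath>
#include <functional>
#include <random>

struct Dir { double x, y, z; };  // local frame: z is the surface-normal axis

// Monte Carlo estimate of one-bounce reflected radiance using uniform
// hemisphere sampling. brdf is the material's fr value (albedo/pi for our
// diffuse case); incoming(wi) stands in for the radiance arriving from wi.
double estimate_one_bounce(int N, double brdf,
                           const std::function<double(const Dir&)>& incoming,
                           std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const double pdf = 1.0 / (2.0 * M_PI);     // uniform over the hemisphere
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        // Uniformly sample a direction on the upper hemisphere.
        double z = uni(rng);                   // cos(theta)
        double phi = 2.0 * M_PI * uni(rng);
        double r = sqrt(std::max(0.0, 1.0 - z * z));
        Dir wi = { r * cos(phi), r * sin(phi), z };
        double cosTheta = wi.z;                // angle with the normal
        sum += incoming(wi) * brdf * cosTheta / pdf;
    }
    return sum / N;                            // average of the N samples
}
```

A sanity check: if radiance 1 arrives from every direction and the BRDF is the diffuse 1/π, the estimate should converge to 1, since the integral of cos(θ)/π over the hemisphere is exactly 1.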

Improving our rendering above, we can implement importance sampling. Before, we were sampling uniform directions in a hemisphere, and this has two drawbacks:

Noise: Uniform hemisphere sampling casts rays in random directions, which can be inefficient. Many of these rays might not hit a light source, leading to noisy results that require a high number of samples to converge.

Point Lights: Uniform hemisphere sampling struggles with point lights because the probability of a random direction hitting an infinitely small point is very low. This makes rendering scenes lit only by point lights difficult.

Importance sampling overcomes these issues by directly sampling the light sources themselves. Instead of random hemisphere directions, the algorithm samples directions specifically towards each light source in the scene. If a ray cast in this direction hits the light source and there are no objects in between, we know that light is reaching the point of intersection. This approach improves the quality and efficiency of direct lighting calculations, especially for scenes with point lights. The image below highlights this with a less grainy image of the bunny than before, along with softer shadows.

Focusing solely on shadows and using light importance sampling, we can observe how increasing the number of light samples per pixel progressively reduces graininess in shadows and refines the softness of shadows.

### Part 4: Global Illumination

Our next challenge is implementing global illumination. Global illumination simulates how light bounces off surfaces to create realistic indirect lighting effects in 3D scenes. This means lighting up areas where there isn't a direct path to the light source, mimicking real-world lighting effects such as soft shadows, color bleeding, and ambient occlusion.

Previously, we implemented a basic lighting model that only considered direct illumination from light sources. However, for truly realistic rendering, we need to account for the way light interacts with objects in the scene – how it bounces off surfaces and illuminates indirect areas. This is where global illumination comes in.

The core idea behind global illumination is to trace the paths that light takes as it bounces around the scene. This is often referred to as n-bounce illumination, where n represents the maximum number of times a ray can bounce off surfaces before reaching the camera. The first bounce of light is typically the harshest, resulting in flat lighting (surfaces are either fully lit or completely dark) and harsh shadows. However, upon examining the 2nd and 3rd bounces of illumination in the images below, you can observe refined details (such as the bunny's texture), indirect lighting (the ceiling is illuminated), soft shadows, and subtle color bleeding (the blue wall bleeds onto the edges and corners).

To achieve this, we need to extend the capabilities of our materials beyond simply reflecting light in all directions with equal probability. In Part 3, we implemented a function for our diffuse material, which returned a constant value for the amount of light reflected in any direction. Now, we need to modify the diffuse material to account for the possibility of light arriving from various directions due to bounces off other objects.

When calculating the illumination for a point on a surface, we simulate the way light interacts with the scene by tracing its path as it bounces off objects. We follow a ray for a certain number of bounces (n-bounces) before it reaches that point and accumulate the illumination from each bounce point. This accumulated value represents the total indirect light affecting the final color of the pixel. But a key question arises: how many times, or "bounces," should we follow a ray as it bounces off surfaces?

One approach is to set a maximum ray depth (max_ray_depth) and stop tracing after that many bounces. However, this approach has limitations. A high enough max_ray_depth to capture all possible light paths would be computationally expensive. Additionally, simply stopping at a fixed depth introduces bias into the lighting calculations.

Instead, we can use a technique called Russian Roulette Monte Carlo. We essentially "flip a coin" and if the outcome is "heads", the ray tracing continues to the next bounce. However, if the outcome is "tails," the ray tracing stops for that particular path. This approach is unbiased because even though we terminate some paths early, we weight the contribution of those paths based on the probability of them continuing. This probabilistic approach allows us to achieve accurate lighting calculations while maintaining computational efficiency.
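The coin-flip step and the reweighting that keeps it unbiased fit in a few lines. This is a minimal sketch where `bounce_radiance` stands in for the recursive estimate of the next bounce:

```cpp
#include <random>

// Russian roulette: continue the path with probability p_continue, and divide
// the surviving contribution by p_continue so the expected value is unchanged.
double russian_roulette_step(double bounce_radiance, double p_continue,
                             std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    if (uni(rng) < p_continue) {
        // Path survives: reweight to compensate for the paths we killed.
        return bounce_radiance / p_continue;
    }
    return 0.0;   // path terminated early
}
```

Averaged over many samples, the result equals `bounce_radiance` exactly: a fraction p_continue of paths survive, each weighted by 1/p_continue, so the expectation is p_continue · (bounce_radiance / p_continue).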

In addition to tinkering with the number of bounces we use for ray tracing, we can also change the samples-per-pixel rate. This refers to the number of times a ray is traced through a single pixel in a scene during path tracing. It essentially determines the level of detail and the amount of noise present in the final rendered image. It also affects the time it takes to render images, with higher sample rates taking longer.

### Part 5: Adaptive Sampling

As previously mentioned, while some of the images above look amazing - the ones with the high sample rate - they took a loooong time to render (a couple hours on my machine). To make the rendering more efficient while preserving quality and keeping noise in the image at a minimum, we can use a technique called adaptive sampling. Adaptive sampling works by sampling more in parts of the image that take a longer time to converge, and less in parts that converge quickly. Now, what does it mean "to converge?"

Each time a ray is traced through a pixel, it contributes a color value (illuminance) to that pixel. Initially, the pixel's illuminance will vary significantly as you take more samples, because you haven't captured the full picture yet. As you take more and more samples, the illuminance values start to cluster around an average value. Adaptive sampling defines a stopping point based on how much the illuminance values are expected to vary around the average.

Here's how we implemented it:

For each pixel initialize the mean and standard deviation to zero.

For each new sample:

Add the sample's illuminance to the running total.

Add the squared illuminance to another running total.

Update the mean and standard deviation based on the current number of samples.

Check the convergence criterion.

Define: I = 1.96⋅σ/√n

Converged if: I ≤ maxTolerance⋅μ

If it is met, stop tracing samples for this pixel. Otherwise, continue to the next sample.

maxTolerance is a parameter to play around with; we set it equal to 0.05. The 0.05 value defines how wide this confidence interval can be relative to the mean illuminance. A smaller value (like 0.05) creates a tighter tolerance, meaning the average illuminance needs to be very stable (low standard deviation) to be considered converged. Using 1.96 allows us to define a range around the mean illuminance (calculated from the samples) that we are 95% confident captures the true average illuminance of the pixel.
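The convergence check can be written directly from the two running totals. A minimal sketch, assuming s1 is the running sum of illuminance values, s2 the running sum of their squares, and n the number of samples taken so far:

```cpp
#include <cmath>

// Adaptive-sampling convergence test: stop sampling this pixel once the 95%
// confidence half-width I = 1.96 * sigma / sqrt(n) falls within
// maxTolerance * mean.
bool pixel_converged(double s1, double s2, int n, double maxTolerance = 0.05) {
    if (n < 2) return false;                       // need variance estimate
    double mu = s1 / n;
    double var = (s2 - s1 * s1 / n) / (n - 1);     // unbiased sample variance
    double sigma = sqrt(std::max(0.0, var));       // guard tiny negatives
    double I = 1.96 * sigma / sqrt((double)n);     // confidence half-width
    return I <= maxTolerance * mu;
}
```

In practice this check is only run every batch of samples (e.g. every 32) rather than after every single one, since recomputing it per sample adds overhead for no benefit.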

Here, we can see how areas directly in the path of light rays (shaded in blue) – including the ceiling light, floor, and walls – require significantly fewer samples to converge compared to areas that are blocked from direct light rays. These areas, such as the underside of the dragon and bunny or the ceiling surrounding the light source, rely on illumination bounces from their surroundings.

### Conclusion

This assignment taught me so much about all the work that goes into rendering realistic-looking computer graphics. It makes me admire and respect Pixar films like Toy Story much more. I worked on this project with my friend Ian Dong. It was great having such a hardworking and smart friend to do a school assignment with. I was extremely busy during the week this homework was assigned (3 midterms in 1 week 😰), so I honestly don't think I could have completed it without him. 🙌🚲