15618 Project Proposal -- Ray Tracing in CUDA

Jiaqi Song (jiaqison@andrew.cmu.edu)

Xinping Luo (xinpingl@andrew.cmu.edu)

URL: Ray Tracing in CUDA (raytracingcuda.github.io)

Summary

We aim to implement a GPU-based ray tracer inspired by the "Ray Tracing in One Weekend" series. We will use the CPU code that the book provides as our starting code and performance baseline, and add basic OpenMP parallelism to it. Our main focus will be porting the code to the GPU using CUDA and comparing the performance of the single-core CPU code, the OpenMP multi-core version, and the CUDA implementation.

Background

Background on Rendering Techniques: Rasterization vs. Ray Tracing

Rendering is a critical process in computer graphics, converting 3D models into 2D images, and two mainstream techniques dominate: rasterization and ray tracing. Rasterization, commonly used in real-time applications like video games, involves projecting vertices of 3D models onto a 2D plane and filling in pixels to represent the object. It’s efficient and fast but has limitations in simulating complex light interactions such as reflections and shadows.

Ray tracing, on the other hand, offers more photorealistic results by simulating how rays of light interact with objects in a scene. This technique calculates each pixel by casting rays from the viewer's perspective into the scene, tracing their paths as they reflect, refract, or are absorbed by materials. As a result, ray tracing achieves realistic lighting, shadows, and reflections, making it the preferred method for high-quality rendering in film and visual effects.

Basics of Ray Tracing: Reflection, Refraction, and Light Interactions

In ray tracing, rays are cast from a camera through each pixel on the screen into the scene, following a path that simulates how light travels in the real world. When a ray hits an object, it can reflect (bounce off the surface), refract (pass through a transparent object), or terminate if it is absorbed by the surface. Reflection is calculated from the surface's material and the angle of incidence, producing mirror-like or glossy reflections. Refraction occurs in transparent materials, bending rays as they pass through surfaces like glass or water. Tracing rays through multiple reflections and refractions enables realistic rendering of complex light interactions, which is what gives ray tracing its high visual quality.
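
The two direction changes described above can be written in a few lines. The following is a minimal C++ sketch, assuming unit-length direction and normal vectors and a normal that points against the incoming ray. The names `Vec3`, `reflect`, and `refract` are illustrative choices rather than the exact code we will ship, and `eta_ratio` is assumed to be the refractive index on the incoming side divided by the index on the outgoing side (Snell's law).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror reflection of incoming direction v about unit normal n.
inline Vec3 reflect(Vec3 v, Vec3 n) { return v - 2.0 * dot(v, n) * n; }

// Refraction of unit direction uv at a surface with unit normal n.
// Returns false on total internal reflection, in which case `out` is untouched.
inline bool refract(Vec3 uv, Vec3 n, double eta_ratio, Vec3& out) {
    assert(eta_ratio > 0.0);
    const double cos_theta = std::fmin(dot(-1.0 * uv, n), 1.0);
    // Snell's law: sin(theta_out) = eta_ratio * sin(theta_in).
    const double sin2_out = eta_ratio * eta_ratio * (1.0 - cos_theta * cos_theta);
    if (sin2_out > 1.0) return false;
    const Vec3 perp = eta_ratio * (uv + cos_theta * n);       // component along the surface
    const Vec3 parallel = -std::sqrt(1.0 - sin2_out) * n;     // component along the normal
    out = perp + parallel;
    return true;
}
```

For example, a ray travelling along (1, -1, 0) that hits a floor with normal (0, 1, 0) reflects to (1, 1, 0), and a ray hitting a surface head-on with `eta_ratio` equal to 1 passes through unchanged.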

Accelerating Ray Tracing: Monte Carlo Sampling and Bounding Volume Hierarchies (BVH)

Ray tracing can be computationally intensive, as each pixel may require casting numerous rays and tracing them across complex scenes. Two methods that accelerate this process are Monte Carlo sampling and BVH.

Monte Carlo sampling reduces the number of rays needed to simulate realistic lighting by randomly sampling paths and averaging results, approximating the overall lighting effect with fewer calculations. This technique is particularly useful in scenes with complex light interactions, as it achieves realistic lighting without casting rays for every possible light path.
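
As a rough illustration of the averaging step, here is a minimal C++ sketch that estimates one pixel by averaging randomly jittered samples. It assumes a caller-supplied function `f(u, v)` that returns the brightness seen at continuous image coordinates. Both `sample_pixel` and `f` are hypothetical names, and a real tracer would return a color rather than a single `double`.

```cpp
#include <cassert>
#include <functional>
#include <random>

// Estimate the value of pixel (px, py) by averaging `samples` evaluations of f
// at uniformly random positions inside the pixel's unit square.
double sample_pixel(int px, int py, int samples, std::mt19937& rng,
                    const std::function<double(double, double)>& f) {
    assert(samples > 0);
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    double sum = 0.0;
    for (int s = 0; s < samples; ++s) {
        const double u = px + jitter(rng);  // random horizontal position in the pixel
        const double v = py + jitter(rng);  // random vertical position in the pixel
        sum += f(u, v);
    }
    return sum / samples;  // Monte Carlo estimate of the pixel average
}
```

With a function that is bright on the left half of a pixel and dark on the right half, the estimate approaches 0.5 as the sample count grows, which is the softened edge we want. The sample count is the knob that trades noise for run time.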

Bounding Volume Hierarchies (BVH) optimize the ray-object intersection checks by organizing objects into a hierarchical tree structure of bounding volumes, such as boxes. When a ray enters the scene, the BVH allows it to quickly bypass large areas without objects, only performing detailed intersection checks when necessary. This hierarchy significantly reduces computation time, especially in scenes with many objects.
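
The pruning described above can be sketched in C++ as follows. This is a minimal illustration under several assumptions: nodes live in a flat array with the root at index 0, a leaf is marked by `left == -1`, and the traversal uses a small explicit stack instead of recursion because that is the form a GPU port is likely to need. The names `Ray`, `Node`, `hit_box`, and `traverse` are illustrative, and the fixed stack capacity of 64 is an assumed bound on tree depth.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Ray  { double origin[3]; double dir[3]; };
struct Node {
    double lo[3];   // minimum corner of the bounding box
    double hi[3];   // maximum corner of the bounding box
    int left;       // index of left child, or -1 for a leaf
    int right;      // index of right child, or -1 for a leaf
    int primitive;  // primitive id stored in a leaf, else -1
};

// Slab test: intersect the ray's parameter interval with each axis-aligned slab.
// Relies on IEEE 754 infinities when a direction component is zero.
bool hit_box(const Node& n, const Ray& r, double t_min, double t_max) {
    for (int axis = 0; axis < 3; ++axis) {
        const double inv_d = 1.0 / r.dir[axis];
        double t0 = (n.lo[axis] - r.origin[axis]) * inv_d;
        double t1 = (n.hi[axis] - r.origin[axis]) * inv_d;
        if (inv_d < 0.0) std::swap(t0, t1);
        t_min = std::max(t_min, t0);
        t_max = std::min(t_max, t1);
        if (t_max <= t_min) return false;  // interval is empty: the ray misses the box
    }
    return true;
}

// Returns the ids of leaf primitives whose boxes the ray enters.
std::vector<int> traverse(const std::vector<Node>& nodes, const Ray& r) {
    std::vector<int> hits;
    if (nodes.empty()) return hits;
    int stack[64];
    int top = 0;
    stack[top++] = 0;  // start at the root
    while (top > 0) {
        const Node& n = nodes[stack[--top]];
        if (!hit_box(n, r, 0.0, 1e30)) continue;  // prune this whole subtree
        if (n.left < 0) { hits.push_back(n.primitive); continue; }
        assert(top + 2 <= 64);
        stack[top++] = n.right;
        stack[top++] = n.left;  // pushed last, so visited first
    }
    return hits;
}
```

A real tracer would run the exact ray-object intersection on each returned primitive. The point of the sketch is that a single failed box test skips every object beneath that node.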

Ray Tracing on the GPU

The GPU's parallel architecture is well-suited for ray tracing, as it can simultaneously compute the paths of thousands of rays, each of which is independent of the others. This parallelism aligns with ray tracing's design, where each pixel's color is determined by individual rays cast into the scene. GPUs can handle large numbers of these calculations efficiently, significantly speeding up the process. However, implementing ray tracing on a GPU requires adapting traditional ray tracing techniques to fit the GPU's architecture, such as rewriting recursive functions iteratively and handling memory access patterns efficiently for structures like the BVH. With these adaptations, GPU-based ray tracing is becoming increasingly feasible for real-time applications, bringing high-quality rendering into interactive contexts like games and virtual reality.
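
To make the recursive-to-iterative rewrite concrete, here is a minimal C++ sketch on a toy model. It assumes a hypothetical `trace_once` function standing in for the scene query: the ray hits a surface that keeps half of the light on each of the first three bounces and then escapes to a sky of brightness 1. The recursive version mirrors the usual CPU structure, and the iterative version carries a running `throughput` instead of relying on the call stack.

```cpp
#include <cassert>

struct Bounce { bool hit; double attenuation; };

// Hypothetical stand-in for intersecting the scene on a given bounce.
Bounce trace_once(int bounce_index) { return {bounce_index < 3, 0.5}; }

// CPU-style recursion: each bounce multiplies the result of the next one.
double color_recursive(int bounce_index, int max_depth) {
    assert(max_depth >= 0);
    if (max_depth == 0) return 0.0;  // bounce budget exhausted: no light gathered
    const Bounce b = trace_once(bounce_index);
    if (!b.hit) return 1.0;          // ray escaped: sky brightness
    return b.attenuation * color_recursive(bounce_index + 1, max_depth - 1);
}

// GPU-friendly loop: the product of attenuations is accumulated as we go.
double color_iterative(int max_depth) {
    assert(max_depth >= 0);
    double throughput = 1.0;
    for (int i = 0; i < max_depth; ++i) {
        const Bounce b = trace_once(i);
        if (!b.hit) return throughput * 1.0;  // sky brightness scaled by what survived
        throughput *= b.attenuation;
    }
    return 0.0;  // bounce budget exhausted
}
```

Both functions return the same value for every bounce budget, for example 0.125 with a budget of 10. The loop form needs only constant per-thread state, which is why it maps more naturally to a CUDA kernel than deep recursion does.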

Challenge

There are three main challenges.

  1. Where to parallelize: Identifying the parts of the ray tracing process that benefit most from parallelization is crucial. We will first test our ideas on the CPU version using OpenMP and then implement the GPU version. For instance, ray casting itself, where rays are generated for each pixel and intersected with objects in the scene, lends itself well to parallelism because each ray calculation is independent. However, more complex components, like handling reflections, lighting, or recursive rays, need careful partitioning to maintain efficiency. The granularity of parallelism is another problem: if we parallelize everything that is parallelizable, we may introduce too much overhead. Deciding where and how to introduce parallelism is key to achieving good performance.

  2. Adapt code to GPU: While there is a complete CPU codebase for us to reference and several blogs describing the migration, moving a CPU-based implementation to a GPU still requires significant adaptation to exploit the GPU architecture. With some necessary steps left on the CPU, we need to identify the heaviest parts of the workload, move them to the GPU, and make them run across thousands of threads. Additionally, any sequential dependencies in the CPU code must be minimized or removed to allow for concurrent execution, which may involve rethinking how certain calculations or data flows are handled.

    GPU threads excel at executing independent, flat loops, but CPU ray tracing often involves recursion, particularly for calculating reflections, refractions, and light bounces. To address this, recursive code must be converted into iterative code, typically using explicit stacks or queues that GPU threads can manage. Implementing a Bounding Volume Hierarchy (BVH) to optimize intersection checks introduces another layer of complexity. On the GPU, a BVH must be traversed without deep recursion, often using an iterative approach that keeps track of traversal order to preserve performance and memory efficiency.

  3. Locality and work balance: GPUs perform best with high memory locality and balanced workloads. Ray tracing often involves accessing large amounts of scene data, so careful memory management and data structure optimization (such as using spatial partitioning or bounding volume hierarchies) are essential to minimize latency. Furthermore, balancing workloads across GPU threads to avoid idling and keep all GPU cores busy is a challenging yet essential task. The cost of tracing a pixel can vary a lot due to branching, and partitioning pixels into blocks, as we did in Assignment 2, can potentially improve both locality and work balance. Effective work distribution and memory locality will directly impact the performance and scalability of the ray tracer.
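
The first and third challenges can be explored together on the CPU before any CUDA code is written. The following is a minimal C++ sketch, assuming square tiles of pixels as the unit of parallel work and a hypothetical `shade_pixel` function standing in for the actual ray tracing. The OpenMP pragma is guarded so the code also builds without OpenMP, and the tile-to-pixel arithmetic mirrors the block and thread indexing we expect to use in the CUDA version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tracing one pixel; a real tracer would cast rays here.
double shade_pixel(int x, int y) { return x + 1000.0 * y; }

// Renders the image tile by tile; each tile is one unit of parallel work.
std::vector<double> render_tiled(int width, int height, int tile) {
    assert(width > 0 && height > 0 && tile > 0);
    const int tiles_x = (width + tile - 1) / tile;   // ceiling division
    const int tiles_y = (height + tile - 1) / tile;
    std::vector<double> image(static_cast<std::size_t>(width) * height);

    // Dynamic scheduling hands the next tile to whichever thread is idle,
    // which helps when some tiles are more expensive than others.
#if defined(_OPENMP)
    #pragma omp parallel for schedule(dynamic)
#endif
    for (int t = 0; t < tiles_x * tiles_y; ++t) {
        const int x0 = (t % tiles_x) * tile;  // top-left pixel of this tile
        const int y0 = (t / tiles_x) * tile;
        // The extra bounds checks handle partial tiles at the right and bottom edges.
        for (int y = y0; y < y0 + tile && y < height; ++y)
            for (int x = x0; x < x0 + tile && x < width; ++x)
                image[static_cast<std::size_t>(y) * width + x] = shade_pixel(x, y);
    }
    return image;
}
```

Varying `tile` is one way to study granularity: very small tiles raise scheduling overhead, while very large tiles leave threads idle near the end of the frame. Each pixel is written by exactly one tile, so no synchronization is needed inside the loop.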

Resources

We will use GHC machines as our primary development platform.

Our starting code will be the CPU implementation from the Ray Tracing in One Weekend series, which will also serve as a foundational guide into the realm of ray tracing.

There are also several CUDA ray tracing tutorials, such as NVIDIA's technical blog, Accelerated Ray Tracing in One Weekend in CUDA. One of our biggest challenges will be implementing tree structures on the GPU, for which another NVIDIA blog, Thinking Parallel, Part II: Tree Traversal on the GPU, may provide useful insights.

Most importantly, we have friends who have completed the 15-668 course, and as we embark on this journey, they will be our closest allies.

Goals and Deliverables

Plan to achieve

Hope to achieve

Deliverable Result

Platform Choice

We aim for our ray tracing application to run efficiently on personal computers with NVIDIA GPUs. To this end, we will primarily use GHC machines with RTX 2080 GPUs, which provide a realistic approximation of typical computational power. Additionally, GHC machines offer OpenMP support, making them a reliable platform for performance testing. We will also test the application on a personal laptop equipped with an RTX 3060 GPU to assess its performance on newer consumer-grade hardware.

Schedule