So, a couple of days ago, I was feelin' pretty suave getting the ray tracing going in pure Swifty goodness. But I could not resist the allure of the GPU. I decided to stalk the elusive beast known as Metal. And that's kinda what it felt like. I did not jump right in and start coding it up, since I have but a modest idea of what's going on in Metal, apart from aping Apple's example code and transliterating some OpenCL code earlier in the year. For me the troubles were about how to paint on a texture and how to understand the ins and outs of the Metal Shading Language.
For those just wanting to know if it was worth it: the same 1600x800 scene, with over 480 reflective objects, went from 702 seconds in the pure Swift version down to 14 seconds! Roughly a 50x speed improvement. That is racing a 12-core Xeon with hyperthreading against a D700 graphics card. Well worth the time invested to compare.
MSL (Metal Shading Language) is a stripped-down version of C++, a language I use only when I am forced to, especially now that Swift is around. Whenever I see two colons next to each other, I usually avert my eyes. I decided to start by grabbing a project I knew worked, the really excellent Metalbrot GitHub project by Jacob Bandes-Storch. I made a copy of it and then stripped it down, taking out all the beautiful complex math Jacob put in there, until all it did was paint a color on each pixel. I also moved it out of a playground and into a Mac project, so that it was easier to debug.
Not knowing how complicated I was allowed to make things in a Metal kernel source file, I started very cautiously, bringing tiny bits of my ray tracer into the kernel. And also changing all the Swift syntax into MSL syntax. Mostly this means getting rid of lets and vars, moving types to the front of the line, and adding lots and lots of semicolons, even more semicolons than Objective-C likes. I didn't pass in any information from the host code. I just ran the old code, stopped it in the debugger, and copied the value of every important constant, vector and color, and then just put those values verbatim into the Metal kernel to fake a real setup. By the end of the day yesterday, I had a very simple 4 object scene rendering, but looking not right. Lots of ugly artifacts remained, but it was very fast, which was enough to keep me encouraged.
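To give a feel for that Swift-to-MSL translation, here is a small sketch. It is plain C++ standing in for MSL so that it compiles anywhere; in a real kernel the `Vec3` stand-in would be MSL's built-in `float3`. The `Ray` struct and the `pointOnRay` helper are made-up names for illustration, not code from my ray tracer, and the Swift version sits in the comment for comparison.

```cpp
// Stand-in for MSL's built-in float3, so this compiles as ordinary C++.
struct Vec3 {
    float x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Hypothetical ray type, for illustration only.
struct Ray {
    Vec3 origin;
    Vec3 direction;
};

// The Swift version might have looked like this:
//
//     func point(on ray: Ray, at t: Float) -> Vec3 {
//         let offset = ray.direction * t
//         return ray.origin + offset
//     }
//
// The C++/MSL-style version: no `let`, the type moves to the front,
// and every statement ends in a semicolon.
Vec3 pointOnRay(Ray ray, float t) {
    Vec3 offset = ray.direction * t;
    return ray.origin + offset;
}
```

Nothing clever going on here; the point is just how mechanical most of the conversion is.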
There were a number of troubles that I feel are worth jotting down for others, in case someone has a better suggestion, or if you want to hear my ways around them. The second day saw the proper passing-in of data from the host, including moving the objects from being made directly in the kernel to being made on the host, and the proper use of random numbers to get the realistic look to the scene. Some things I learned along the way:
There is no real way to debug a kernel that's in motion. Unlike OpenCL, there's no print statement (and even OpenCL's was anemic). So I set up a buffer of floats that I could use to pass back data from the kernel to the host, and then print it out in the host code. Really feels like old-school coding, but does the job.
There is no random number function built-in. Graphics people are aware of this, and I kinda thought it might not be there, but it's definitely not there. I wasted hours shopping for pseudo-random number generators, and found some good ones, but couldn't figure out how to make them work. Then it hit me to just generate all my random numbers in Swift and pass a huge array of them to the kernel. It turns out you can happily pass megabytes of such arrays to kernels without causing any memory problems. That said, because the code actually needs a couple of orders of magnitude more random numbers than I passed in, you can't simply reuse them without producing visual artifacts that give away that something is wrong under the covers. I was able to figure out a way to scramble them up so that the eye is pretty well fooled, but the code would certainly raise an eyebrow or two. (A possibly much better way was suggested to me that I may put in place later, using a CIRandomGenerator to make a white noise texture.)
There is no inheritance allowed in MSL (as mentioned on page 2 of the Metal Shading Language Guide), so you have to flatten out your classes and/or use C++-like templates. I have no idea how to do the latter, so I just flattened things out. I don't have many classes, and there was limited use of inheritance in my code, so it was not too bad. Just be aware.
Once I did start sending data in from the host, I had to mirror the structs and classes on both sides so that everyone knew where to put the bits when the references are handed over. That seems not so bad, but there were many weird data alignment troubles that had to be solved, and I also ran into a case where the value of an enum, which was in a struct, which was in another struct, was getting lost in translation. Ultimately I simply ditched the enum, and then also flattened the struct-within-a-struct into a single struct, and then both sides happily agreed on what the data looked like. I still wonder if I couldn't have made it work, but it's a question for another project.
If you die in the dream, you die in real life. This is not at all true. I've died in a dream. (Maybe this is the afterlife, however.) But it's the phrase that comes to mind with GPU programming, and is true of OpenCL, etc., because if you make a normally innocuous mistake in your kernel code you can easily and immediately bring down your whole machine. I crashed my machine 6 times today debugging my Metal code. No kernel panic, no mouse movement, everything on the screen just freezes as soon as you say build-n-run. Hard reset required. Memory mistakes caused 2 of them, and 4 were caused by a while loop that I didn't realize could get stuck. (Lesson on while loops: always add a counter that absolutely limits the number of times it can go around.) Fortunately, OS X pops back up within seconds and no files are lost, and all your windows are where you left them. I'm very impressed at how smoothly the OS handles catastrophes these days.
My initial tests on that first day were showing that I could render the 4-object scene in 0.001 seconds, but without any of the code handling random numbers that makes the whole thing look realistic. Adding that in, along with increasing the blending iterations up to 75 and setting my light bounce limit to 50, brought the render time up to about 0.1 seconds. Adding all the 480+ objects brought that up to about 1.5 seconds. Making the image jumbo-sized brings it to the final render time of about 14 seconds. Way way better than ~700 seconds, but still far away from some of the great live rendering speeds I see other people get on the Internet.
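On reusing a too-small table of host-generated random numbers: one way to hide the repetition is to hash the pixel index and a per-pixel draw counter into a table slot, so neighboring pixels don't walk the table in lockstep. The sketch below is one possible approach and not the scrambling described above (that code isn't shown here). It is plain C++ standing in for MSL. The mixing function is a widely circulated integer hash, often called the Wang hash; the names `mixBits`, `scrambledRandom`, `pixelIndex` and `drawCount`, and the two multiplier constants, are arbitrary choices for illustration.

```cpp
#include <cstdint>

// Integer bit mixer (the commonly circulated "Wang hash").
inline uint32_t mixBits(uint32_t v) {
    v = (v ^ 61u) ^ (v >> 16);
    v *= 9u;
    v ^= v >> 4;
    v *= 0x27d4eb2du;
    v ^= v >> 15;
    return v;
}

// Pick one value from a table of random floats generated on the host.
// pixelIndex: which pixel is being shaded (hypothetical parameter).
// drawCount:  how many random numbers this pixel has used so far.
// tableCount must be greater than zero.
inline float scrambledRandom(const float* table, uint32_t tableCount,
                             uint32_t pixelIndex, uint32_t drawCount) {
    // Unsigned arithmetic wraps on overflow, which is fine for hashing.
    uint32_t key = pixelIndex * 9781u + drawCount * 6271u;
    uint32_t slot = mixBits(key) % tableCount;
    return table[slot];
}
```

The values are still only as good as the table, so this hides patterns rather than adding randomness. A proper per-thread generator, or the white noise texture idea, would be the more respectable fix.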
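On the struct mirroring and alignment troubles: in MSL a `float3` takes up 16 bytes and is 16-byte aligned, not 12, so a host-side struct that packs three floats tightly will disagree with the kernel about where every later field lives. A cheap guard is to spell out the layout you expect and have the compiler check it. The sketch below is plain C++ with a `Float3` stand-in for the MSL type. The `Sphere` struct and its fields are hypothetical, not my actual scene data; it just shows the flattened shape (no nested struct, a plain integer where an enum used to be).

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for MSL's float3: three floats padded out to 16 bytes.
struct alignas(16) Float3 {
    float x, y, z;
};

// Hypothetical flattened scene object shared by host and kernel.
struct Sphere {
    Float3   center;    // bytes 0..15
    float    radius;    // bytes 16..19
    uint32_t material;  // bytes 20..23, a plain integer instead of an enum
    Float3   albedo;    // bytes 32..47, after 8 bytes of padding
};

// If either side's idea of the layout drifts, the build fails
// instead of the render quietly going wrong.
static_assert(sizeof(Float3) == 16, "Float3 must match MSL float3 size");
static_assert(offsetof(Sphere, radius) == 16, "radius offset");
static_assert(offsetof(Sphere, material) == 20, "material offset");
static_assert(offsetof(Sphere, albedo) == 32, "albedo offset");
static_assert(sizeof(Sphere) == 48, "Sphere stride");
```

On the Swift side the equivalent check would be to print `MemoryLayout<Sphere>.stride` and the field offsets and compare them with these numbers.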
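The while loop lesson, sketched out. This is not the loop that froze my machine, just a typical example of the kind: rejection sampling a point inside the unit sphere, which normally exits after a try or two but has no guaranteed upper bound. The hard cap makes sure it ends no matter what the inputs look like. It is plain C++ standing in for MSL, and `randomInUnitSphere`, `start` and `maxTries` are names made up for this sketch. The table is assumed to hold floats in the range 0 to 1 and `tableCount` must be greater than zero.

```cpp
struct Vec3 {
    float x, y, z;
};

// Draw candidate points in the cube [-1, 1]^3 from a table of random
// floats and keep the first one that lands inside the unit sphere.
// Gives up after maxTries and returns the origin as a safe fallback.
inline Vec3 randomInUnitSphere(const float* table, unsigned tableCount,
                               unsigned start, unsigned maxTries) {
    Vec3 result = {0.0f, 0.0f, 0.0f};
    unsigned cursor = start;
    unsigned tries = 0;
    while (tries < maxTries) {          // the counter is the safety net
        float x = 2.0f * table[cursor % tableCount] - 1.0f;
        float y = 2.0f * table[(cursor + 1) % tableCount] - 1.0f;
        float z = 2.0f * table[(cursor + 2) % tableCount] - 1.0f;
        cursor += 3;
        ++tries;
        if (x * x + y * y + z * z < 1.0f) {
            result = Vec3{x, y, z};
            break;
        }
    }
    return result;
}
```

With a degenerate table (say, every entry equal to 1.0) the uncapped version of this loop would spin forever; the capped one gives up after `maxTries` and returns the origin.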
But I learned a heck of a lot about the details of Metal's MSL, and I'm ready for another challenge. Thanks also to everyone in the Cocoa3D Slack domain.