Scientific computing and the cloud

This year I’ve had a chance to experiment with tools for compute-intensive applications, in particular tools that harness the profusion of inexpensive CPU/GPU cycles available: OpenMP for multi-threading on a single machine so that multiple cores can be leveraged; MPI to distribute compute load over clusters of machines; OpenCL for handing general-purpose computation off to a graphics processor. And then on top of these tools, NumPy and SciPy for scripting and visualization from Python. The amount of excellent computational software now available is amazing; these capabilities would have cost immeasurable amounts of money just a decade ago. And the first time I tied together a cluster of machines or yoked up a GPU and did a massive computation, and then displayed the animated results using Python: what a great feeling! The ability to attack really hard, really large problems is better than it has ever been.

But what a nightmare of housekeeping. Breaking up computation into threads and spreading it across multiple cores with shared memory and file system is tedious and error-prone; every hand-off between threads is an opportunity for errors. The work to break up and manage the computation load across multiple machines is even more mind-numbing and error-prone, and now the lack of shared memory and files is an additional complication. Using graphics processors is even more abstruse, with their funky fractured memory spaces and architectures and limited language support. And getting all the software piece parts running in the first place takes a long time: working through all the dependencies, mixing and matching distributions and libraries and tools, and then getting it all right on multiple machines. And then you get to maintain all this as new versions of libs and runtimes are released.
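
A minimal sketch of the split, compute, combine bookkeeping described above, written in Python so it sits with the NumPy/SciPy scripting layer. The function names (`chunk_bounds`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are hypothetical and the sum of squares is only a stand-in workload. Each worker gets its own slice and returns a value, so there is no shared mutable state to hand off. CPython threads will not actually speed up pure-Python arithmetic because of the global interpreter lock; the point is only the shape of the housekeeping, which is roughly what an OpenMP reduction or an MPI scatter/reduce takes off your hands.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(bounds):
    """Work for one chunk; reads its own slice and touches no shared state."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Fan the chunks out to a thread pool, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunk_bounds(n, workers)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

Even in this toy version, the uneven-remainder handling in `chunk_bounds` is exactly the kind of detail that is easy to get wrong by one, and it only gets worse once the chunks live on different machines.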

But again the results can be stunning. Just look around the web at what people are doing in engineering (“YouTube video”:http://www.youtube.com/watch?v=4z1STnnA3aM), life sciences (“Science Mag article”:http://www.sciencemag.org/content/331/6019/848.full#F3), or any of a dozen other areas. Harnessing multiple cheap processors to perform complicated modeling or visualization can have a huge payoff in financial services, bioinformatics, engineering analysis, climate modeling, actuarial analysis, targeting analysis, and so many other areas.

However, it is just too darn hard to wield all these tools. The space is crying out for a cloud solution. I want someone else to figure out all the dependencies and library requirements and spin up the correctly configured virtual machines with all the necessary componentry. And keep that up to date as new libraries and components are developed. I want someone else to figure out the clustering and let me elastically spin up 1, 10, 100 machines as I need to, and manage all the housekeeping between these machines. I want someone else to buy all the machines and run them, and let me share them with other users, because my use is very episodic, and I don’t want to pay for 100 or 1000 or 10000 machines all the time, when I only need the machines for a week here and there. Maybe I want to run all my code in the cloud, or maybe I want to have all the VMs and clustering info delivered to my data center, but I want someone else to solve the housekeeping and configuration issues, and let me get to work on my problems.

Amazon is doing some great work in AWS with their HPC support (“AWS HPC support”:http://aws.amazon.com/hpc-applications/#HPCEC2).
Microsoft has made a commitment to provide scientific computing resources in the cloud (“NYT article”:http://www.nytimes.com/2010/02/05/science/05cloud.html). There is a lot of great academic work happening (“ScienceCloud2011”:http://datasys.cs.iit.edu/events/ScienceCloud2011/). But the opportunity is out there to do a lot more.

Playing around with GPU programming

Been spending a lot of time playing around with GPU programming for scientific computing the last couple of weeks. Fascinating stuff; GPUs are computational beasts. Some observations:

* If you want to get into it, “GPGPU.org”:http://gpgpu.org/ has boatloads of great info — news, tools, definitions, primers, etc etc etc. The place to start.
* There is a good chance you’ll end up using OpenCL as the device- and platform-independent interface to GPUs. “Khronos.org”:http://www.khronos.org/ has tons of great info and in particular, the “OpenCL Reference Card”:http://www.khronos.org/files/opencl-quick-reference-card.pdf. Good stuff.
* The OSX platform has awesome support for OpenCL within Xcode. Very easy to get up and going. Great sample code up at the “Apple Developer web site”:http://developer.apple.com/library/mac/search/?q=opencl.
* Also tons of samples from “Nvidia”:http://developer.download.nvidia.com/compute/opencl/sdk/website/samples.html.
* However…you may quickly hit a dead end on OSX because only the most expensive Mac Pros come with GPUs which will support double precision, and double precision is kind of necessary for scientific computing. Info on which Nvidia processors support double precision “here”:http://en.wikipedia.org/wiki/CUDA. I could go whack around and build my own double precision math libraries for unsupported GPUs but what a pain that would be.
* So onto a PC. I happen to have one with an ATI HD 57xx, which will support double precision. It is WAY harder to get OpenCL code working on a Windows PC, though. After much wandering around, the “AMD SDK”:http://developer.amd.com/gpu/AMDAPPSDK/Pages/default.aspx seems to be the best way to get buildable, working OpenCL sample code. It has the most freaking obtuse makefiles ever, though; I am ripping them apart. But if you start with one of the sample code bases and duplicate it for your use, it works. (C++, by the way.)
* However, I am now blocked by limitations in the trig function implementations. Some discussion online suggests that they are “single precision only”:http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=137564. And even the single precision results seem to have crappy precision. I will definitely have to build my own.
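
To make the precision point concrete, here is a rough sketch of what "build my own" sine could look like, prototyped in Python rather than OpenCL C. It is an assumption-laden illustration, not what the AMD runtime or any real math library does: it uses plain range reduction plus a Taylor series, where production libraries use more careful reduction and tuned polynomials. The names `my_sin`, `to_single`, and `max_error` are hypothetical. The script also measures how much accuracy is lost just by rounding a correct result to single precision.

```python
import math
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def my_sin(x, terms=12):
    """Sine via range reduction to [-pi/2, pi/2] plus a Taylor series.

    Assumption: moderate |x|; the naive reduction below loses accuracy
    for very large arguments.
    """
    # Reduce to [-pi, pi] using the IEEE remainder.
    x = math.remainder(x, 2.0 * math.pi)
    # Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x).
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    # Taylor series: x - x^3/3! + x^5/5! - ...
    term = x
    total = x
    for k in range(1, terms):
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


def max_error(approx, samples):
    """Largest absolute deviation from math.sin over the samples."""
    return max(abs(approx(x) - math.sin(x)) for x in samples)


samples = [i * 0.01 for i in range(-1000, 1001)]
double_err = max_error(my_sin, samples)
single_err = max_error(lambda x: to_single(math.sin(x)), samples)

print("homemade double-precision sine, max error:", double_err)
print("correct sine rounded to single, max error:", single_err)
```

The gap between the two printed errors is the best case for single precision, since it starts from a correctly computed value; a sloppy single-precision implementation can only do worse. Porting something like `my_sin` into a kernel would still depend on the device supporting double-precision arithmetic in the first place.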

UPDATE: a friend points out that Amazon also offers an “EC2-based instance with GPU capabilities”:http://aws.amazon.com/ec2/hpc-applications/. Worth a look.