January 14, 2010

Google.CN


I am sure that anyone reading this is following the story with baited breath, but here's a link roundup anyway:

Google exit from China could change face of Internet

As a side note, this isn't the first time China has gone after Northrop Grumman:

December 13, 2009

Using Subversion with Dropbox

During my last assignment for my CS class, I experimented with using Subversion and Dropbox together for synchronization. I have to admit that at first glance it's a pretty sweet deal. My entire team didn't use the Subversion repository, but we did share code via Dropbox. I would manually merge changes on a machine, and then copy the code to Dropbox so everyone would have access to the merged source. I'd then commit the merged source to Subversion (also stored on Dropbox, but not publicly shared).

I'm considering expanding this to allow other team members access to the Subversion repository. I haven't yet decided if I'm going to migrate to a real subversion host, like Google Code, or if I'm going to keep the source private for now. It's nothing ground-breaking that we're working on--in fact, I'm pretty sure we couldn't make any money off of it to begin with. Regardless, it's a project we're all keen on working on, and I need a way to make life easier on myself... which involves the other team members committing their own code.

There is a nasty race condition, however, with using Subversion and Dropbox. Dropbox continuously syncs files from the desktop to the shared folder. This is normally very convenient. However, when synchronizing with a Subversion repository, you need the ability to lock the repo while transmitting your update. When I tried committing the first version of the project to Subversion, I got an error message from svn saying that my local copy wasn't in sync with the repository. Grumble-worthy.

The not-so-elegant work-around is to kill Dropbox when committing to the repo. This is annoying, to say the least. It has its good side, though. Rather than making many spurious commits, this requires you to take an extra step before fooling around with the repo. This also ensures that other users of the repository don't have Subversion repositories in garbled states while you're committing your files.

Of course, there's also another nasty side effect. If I'm working on the project and someone else has restarted their Dropbox process, the files are downloaded to my machine in a piecemeal fashion, leaving my local copy of the Subversion repository out of sync while Dropbox is updating--meaning I cannot touch the repository until Dropbox has finished.

There's also the rather annoying bit of needing to coordinate check-ins with teammates. Your copy of the repository must be the latest version before you go committing your files. In essence, this takes away any added benefit that dropbox gives us--save the benefit of not having to find paid hosting for a Subversion repository.

In the end, I think I'm going to just add a new Subversion repository to my Dreamhost account, and have people use it. It's frustrating, but the management is easy. I'm working on figuring out a bug-tracking system now. Trac would be fantastic, but getting Trac to work on Dreamhost is annoying and painful. I'm experimenting with dreamy-trac at the moment, but to little avail.

November 30, 2009

A Break from CUDA and on to... AI

Our final project in CS 2 (you know the class, intro to data structures), is Quantum Tic-Tac-Toe. If you've never experimented with it, take a gander. It's fairly intuitive, and the corresponding paper is a fascinating read. There are a number of programming challenges involved:
  • Creating a working, efficient, competent AI
  • Representing the state of the game
  • Presenting the game to the user
It's an interesting introduction into encapsulation, game theory, and artificial intelligence. We were tasked with using game trees and optionally alpha-beta pruning and transposition tables for the AI. Presentation should be graphical, and management of the game state is entirely left up to our own design. This is all done in Java.

Our team divided the project into AI, GUI and Game State modules. I was primarily in charge of the AI, and my two additional team members took the Game State and GUI. Working in randomly-assigned teams is troubling, because not everyone shares the same level of commitment to the course or their work therein. This is compounded by an inability to assign milestone deadlines, because there are no real repercussions to missing those deadlines. My attempt at project management failed miserably.

For the AI, I opted out of doing a full-blown game tree for the QTTT game as whole, instead favoring sub-trees for the game's classical ensemble (i.e. the set of classical TTT games derived from the states of the quantum game). The AI evaluates all of the game trees using minimax with alpha-beta pruning and transposition tables. It's much faster to go this route, than say: determining all possible outcomes of a game of QTTT. Perhaps trivially so, but classical TTT is a solved game and easily so. I also found it much easier to implement the AI more robustly for CTTT as opposed to QTTT. I have considered extending the AI to consider the possible quantum outcomes of a subset of the possible QTTT moves it could take (based solely on evaluation of the CTTT game states). I have yet to determine if this is a worthwhile endeavor.

The game state is composed of a semi-complex network of objects:
  • the GameState
  • ClassicalBoards and a QuantumBoard
  • Player objects
  • the QuantumBoard stores QuantumMoves which in turn relate two QuantumMarks and the squares they are made in
I found this to be the most intuitive and natural way to describe a game with equitable division of duties and separation of powers between the classes. The interface to the GameState was made public to my GUI writer and he and I agreed upon it before he started coding. While I originally had a simpler design for the GameState class, I ended up modifying it to include the aforementioned component classes without modifying the public interface. He thanked me. Though, unfortunately, some of the public interface must change as operations previously considered simple have been proven to be require more input from the user or the AI (e.g. collapsing a quantum state requires selecting not just a position, but also a specific move).

The most complicated operation within the game state involves the QuantumBoard. At certain points in time during gameplay, two or more QuantumMoves become entangled. This entanglement is the direct result of cyclic superpositions of the QuantumMarks involved. Move 1 relies on the placement of Move 2, etc. At this point, the state of the game is collapsed and the squares involved essentially devolve to squares in a game of CTTT. This realization made things somewhat simpler, but I haven't yet found a completely graceful way to represent this state transition.

The most interesting challenge so far has been the transposition table. This table is barely useful during a single game, but over time (i.e. through multiple games), it will be tuned to recognize the most common board states. Serialization and storage of the transposition table will be interesting as I have yet to work with storing data on disk with Java. I've used serialization in Python, so I'm familiar with the techniques--and I've heard of serialization libraries such as JSON and Hibernate, but I think those will likely be a little too robust for me so I'm going to investigate any inherent capability that Java has (e.g. Object.serialize() and writing to a binary file).

It's been very exciting and stressful, and a somewhat refreshing break from CUDA and linear algebra libraries.

October 22, 2009

CUDA and Fortran and C, oh my!

Today, I did the unthinkable: I got all of my tools to cooperate.

The application we're working on is primarily driven by computation done in Fortran, with some bulky operations parallelized with CUDA in C. Getting it to work was not exactly the easiest thing ever. There are a number of things to take into consideration:

  • Fortran symbol mangling
  • Data type equivalency
  • Fortran passes by reference
  • Function or Subroutine?
  • Row-major vs. column-major arrays

There are a number of other things to consider, but those were the main stumbling blocks that I came across.

Fortran Symbol Mangling
Fortran isn't case sensitive like C, so the Fortran compiler forces all symbol names into lowercase. This can be changed with a compiler option, but I am perfectly okay with lowercase C function names. If you absolutely must have camelCase function names, take a look at the -Mupcase option for the Fortran compiler. Fortran also adds an underscore to all function and subroutine names. So if you want to call a function in Fortran (e.g. add()), the function in your C source should have an underscore appended (add_()).

Data type equivalency
This is going to be specific for each Fortran implementation. HP's Fortran has different data type equivalency than the PGI Fortran. If you take the time to make sure that you use equivalent data types, you will have a much easier time of getting Fortran and C to play together. Refer to your Fortran documentation--they should include a table that indicates which data types you should use.

Fortran passes by reference
This was a point of contention with the professor I work for. He speaks Fortran, and I speak C (though, I speak a little Fortran now). When he saw all of my pointers in the source, he was very confused, and almost ditched the idea of cross-language interoperation altogether. I was a little flummoxed by this, but discovered something key: hide your C from your Fortran-speaking professors! :) Seriously, though, the implications of pass by reference are that all of your function arguments should be pointers.

Functions or Sub-routines
I had a hell of a time here. Theoretically, you should be able to return values from the C functions that you're calling from Fortran. This didn't work for me in practice. This may very well be my fault, but I ended up treating all of my C functions as sub-routines, and I'm perfectly okay with the outcome. This means that in your Fortran code, you will need to call the subroutine and pass the variable in which you wish to store the return as an argument.

Row-major vs. column-major arrays
Fortran stores its array members in column-major order instead of row-major order. This means that if you want to access element (1, 2, 3) in a Fortran array, you would access it as array[2][1][3] in C. Since we're being passed a reference to the beginning of the array, we'll be doing a little bit of simple math to find the address of the element we want to access:

index = z*dim*dim + x*dim + y


Whereas in C we would normally calculate this as:

index = z*dim*dim + y*dim + x

To get an idea of how all this comes together, take a look at the following examples. Let's say you want to perform some simple addition in C instead of in Fortran. First, you need to write your C (remember what we've already discussed about references, name mangling, etc):

int add_(int *a, int *b, int *result); {
*result = *a + *b;
}

Then compile with gcc -c add.c. This will give you add.o--the object file you'll be linking into your Fortran application. From here we write our Fortran.

    integer x, y, sum
x=1
y=1
add(x, y, sum)
print *, sum

Now compile this and link it with your add.o object file. We use PGI Fortran, but the syntax is similar pgf90 add.o add.f90. Now just run ./a.out.:

$ ./a.out
2
$

Obviously a contrived and very simple example, but it should be enough to get you going.

October 12, 2009

An Exciting Venture

Last night, I received an invitation to try out Google Wave. I had been waiting for that moment for some time, and when it finally came, I was overjoyed! I spent an hour installing gadgets, playing cooperative Sudoku with a friend, and having discussions about maps with a drunken friend of mine in Atlanta who was attempting to convince me to make the trip this weekend. It was great fun.

Today, however, I had an extraordinary conversation with another one of my friends in Wave, and it was one of the most exciting things ever. We started our wave as the rest of my waves have started: discussing how the UI was somewhat lacking, but how it was a great start, etc. Then it morphed into a discussion of the Computer Science GRE. That's when things started to get interesting. We started a discussion of graph theory, quickly Googling our way to a Wave Gadget that we could use as a sketch pad.

What was so interesting was how over the period of an hour or so, we came to figure out a way to use Wave that was both helpful and interesting and much more enjoyable than the standard chat. If you'd like to try it sometime, feel free. Edit the same text area in Wave and then type from the inside out. All of the relevant conversational data was near the center of our field of vision and it would be fairly trivial to reconstruct the conversation by watching the wave play out. Overall, I'm very, very excited about Google Wave. I hope that this does not go the way of previous genius Google ventures and languish in obscurity (e.g. Google Notebook). This really is a whole new paradigm for collaborative communication, and I hope it catches on like wildfire.

In other news, I'm working on my applications for ANL and ORNL. I'm so excited about these internship opportunities. Either would provide a logical extension for the work I'm doing at Auburn in computational physics, and honestly, I'm having a really difficult time deciding which is my first choice. Sadly, I have to make that decision, because all of the applications go through the DOE, and each lab screens candidates in order based on whether or not they chose that lab first. Crazy, I know, but hey... what can I do? I sent out requests for letters of recommendation this week, and I'm hoping things go well. Wish me luck!

September 6, 2009

The Perils of Parallel GPU Computing

Well, I did something very silly today. I should begin by saying that I have finally compiled and run my first CUDA application! It's very simple, but it's the first step toward the completion of my current project. This morning I was doing some load testing--to see just how big my vectors could be for a simple multiplication/summation algorithm.

I started by re-tooling the way I was calling my CUDA kernel. Previously, I was testing with very small vectors (N=6 to 32). I realize that this is like buying a F430 Spider and then refusing to drive it faster than 45 kph. I had good reason, though... we'll get to that later.

So I went from this:

Kernel<<<1, N>>>(vec1, vec2, result);

To this:


dim3 dimBlock(256);
dim3 dimGrid(N / 256);
Kernel<<<dimGrid, dimBlock>>>(vec1, vec2, result);

Annnnddd.. I sort of shot N up to 65536. I realize that this is still not a lot of flops, but ... and this is the big but. I am running all of my tests on an active video card at the moment. I realize that this isn't recommended, but for some basic, simple development... it shouldn't hurt. Well, when you start doing millions of operations (I'm actually doing something a little bit more complicated that vector multiplication)... and taking up a sizeable chunk of video memory... apparently, things can go wrong.

I started to notice it in my gaim window first. Odd, random pixelation was occurring and the screen wasn't re-drawing like it should. It was really funny. I must have somehow malloc'd part of the video card's frame buffer or something. The display seized for a couple of seconds while my program chugged away. I didn't get the expected results from my program either. So yeah... don't try this at home! I'm going to be moving all of my development over to my test machine without a display. It was just very funny to me. :)

September 4, 2009

Matrix Operations and CUDA

nVidia's CUDA has been a blessing for me as of late. I've been continuing work on my matrix library, and have started looking into CUDA. When you have non-trivially sized matrices, these operations tend to become tedious and take forever to complete. However, matrix operations can be broken down into discrete operations of multiplication and addition that can be parallelized.

School is demanding at the moment, but the holiday weekend approacheth. I'm hoping to have something to post next week that gives an example of the power of CUDA alongside the SIMD-leanings of something like elementary matrix operations.

In the mean time, here are a few excellent CUDA articles that I've been reading:

CUDA, Supercomputing for the Masses: Part 1

Farber claims performance increases in multiple "orders of magnitudes." I'd have to say that this is rather far-fetched. You should not expect your software to run 100, 1,000, or 10,000 times faster than it currently does. Most of the performance gains I've heard of have been in the range of 20-230x faster. Sure... 230x is "orders" of magnitudes, but come on... I think we're misusing the phrase here.

I would also argue that CUDA is not super-computing for the masses. Yes, CUDA provides you with a wealth of processing power that is pure, delicious threading bliss, but at the same time the entry curve is a little steep, and in order to take full advantage of your GPU's power (i.e. when you stop worrying about processing and start worrying about bandwidth across the PCI-express bus and the amount of DRAM your GPU has), you need some serious, bit-twiddly understanding of C and C++. That being said, CUDA does bring serious computing capability to the desktop and the common man.

The GPU's multiprocessing cores are capable of thread management beyond anything which your CPU can dream. We're talking about each core being able to handle hundreds of threads, simultaneously, without an insane number of cache misses or expensive context switching. CUDA manages to make all of this seamless and easy to use. Bravo nVidia.

And, from nVidia themselves:

Efficient Sparse Matrix-Vector Multiplication on CUDA

This was, as you can imagine, of immense interest to me--as I mentioned non-trivial matrix dimensions earlier. I haven't digested this one yet, but the first couple of pages have some good CUDA background information for the recently initiated.