How Chromium Works

This document is intended to give an overview of how the various pieces of Chromium work and what's going on behind the scenes when you do one of these runs.  It will probably be a grab-bag of useful nuggets of information in no particular order.  Hopefully this will be useful if someone wants to experiment with changing some part of the core architecture instead of just extending it.

The Mothership

The mothership is usually the first thing that you will run for any Chromium session.  We've already described in detail how the scripts themselves work, so let's focus on the mothership itself.  The guts of the mothership are invoked by the "Go" method of the CR object created in a configuration script.  The last line of a Chromium configuration script will almost always be:

cr.Go()

This function optionally takes a single argument: the port to listen on.  The default port is 10000, but it can sometimes be useful to make the port an argument to the script if, for example, sockets take a minute to shut down properly (as they can on Linux).  The Go method will create a socket on the specified port and accept connections from (possibly multiple simultaneous) clients.

The mothership then calls ProcessRequest, which will read a single line of data from the connected client.  This line is split into words: the first word is considered to be a command, and the rest of the words are arguments to that command.  So, for example, you can connect to the mothership (just telnet to port 10000) and send the following line:

spu 10

This will be broken by the mothership into the words "spu" and "10".  Then, the Python script will build an on-the-fly reference to a method of the CR class, using Python's reflection API.  This is done by the following lines of ProcessRequest in mothership/server/mothership.py:

command = string.lower( words[0] )
print "command = " + command
try:
    fn = getattr(self, 'do_%s' % command )
except AttributeError:
    self.ClientError( sock_wrapper, SockWrapper.UNKNOWNCOMMAND, "Unknown command: %s" % command )
    return
fn( sock_wrapper, string.join( words[1:] ) )

The "getattr" function tries to find a method called, in our example "do_spu".  If the method is not found, an error is returned to the calling client.  Errors are reported with a three-digit numeric code and a descriptive string.  In this case, the code SockWrapper.UNKNOWNCOMMAND happens to be 402.  This is a lot like the way HTTP works -- a successful reply actually has the ID 200.

If the function is found, the variable "fn" becomes a handle to it, and the rest of the words are joined into a single string and passed to the function as its argument.

To continue this example, let's look at the do_spu method:

def do_spu( self, sock, args ):
    try:
        spuid = int(args)
    except:
        self.ClientError( sock, SockWrapper.UNKNOWNSPU, "Bogus SPU name: %s" % args )
        return
    if not allSPUs.has_key( spuid ):
        self.ClientError( sock, SockWrapper.UNKNOWNSPU, "Never heard of SPU %d" % spuid )
        return
    sock.SPUid = spuid
    sock.Success( "Hello, SPU!" )

This method tries to convert the argument string (in our case, "10") into an integer.  If the conversion fails, a "Bogus SPU name" error is returned.  This can happen if the client says "spu 10.5" or "spu 10 foo" or "spu clodburger".  Next, the system tries to find the SPU with the given ID.  Each SPU is assigned a unique numeric identifier by the mothership when it is added to a node; these identifiers are communicated to the application faker or the server when the SPU is loaded.  The ID is then passed as the first argument to the SPU's SPUInit function.

If the SPU is found, the ID is stored along with the socket, so that subsequent requests on the same socket refer to that particular SPU.  This way, to get many configuration settings for a particular SPU, you set the "current" SPU once and then send a series of "spuparam <param_name>" commands.
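
For concreteness, a telnet session with the mothership might look something like this.  The reply formatting shown here is an assumption based on the code-plus-string convention described above, and the spuparam reply is only a placeholder; "frobnicate" is just a made-up command to show the error path:

spu 10
200 Hello, SPU!
spuparam <param_name>
200 <value of that parameter for SPU 10>
frobnicate
402 Unknown command: frobnicate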

The mothership also has some rudimentary facilities for brokering out-of-band connections between components, although this is not completely implemented yet.  In fact, the Myrinet implementation of the Chromium networking abstraction uses what does exist of this facility to establish its connections.  Because Myrinet GM is a completely connectionless API, WireGL had to have a TCP/IP based handshake occur first in which the two computers agreed to use Myrinet and exchanged some information related to the connection.  In Chromium, that information is exchanged via the mothership.  This paragraph is deliberately vague because the exact mechanism is likely to change in the near future.

The Server

The next portion of the system likely to get run is the Chromium server.  The first thing that the server tries to do is figure out where the mothership is.  This can be specified with the "-mothership" command line option.  If this option is omitted, NULL will be passed to crMothershipConnect, which will cause the mothership library to look for the CRMOTHERSHIP environment variable, and, if that fails, to default to "localhost:10000".  Notice that the format for specifying the location of the mothership is "<host>:port", although the port can be omitted.  Eventually, the mothership will remotely invoke the server (and the application faker, which has a similar argument), so this will be a non-issue.  For now, the most likely scenario is that the mothership will run on the same machine all the time, so you can set the CRMOTHERSHIP environment variable to just the name of that machine.  The mothership requires TCP/IP to work -- it cannot work over GM.
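
In C, the lookup order just described amounts to something like the following.  This is a self-contained sketch, not the actual code in the mothership client library:

#include <stdlib.h>

/* Resolve the mothership address: command line first, then the
   CRMOTHERSHIP environment variable, then the built-in default. */
const char *MothershipAddress( const char *cmdline_option )
{
    const char *where = cmdline_option;        /* from "-mothership <host>:port" */
    if (!where)
        where = getenv( "CRMOTHERSHIP" );      /* "<host>:port"; port may be omitted */
    if (!where)
        where = "localhost:10000";             /* built-in default */
    return where;
}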

The server next initializes the state tracking and networking subsystems, and then connects to the mothership to configure itself.  It turns out that the call:

crServerGatherConfiguration( mothership );

is actually where the server waits for clients to connect to it.  Because the mothership knows the connectivity of the node graph, the server knows how many clients to wait for and what network protocol they will be using, obviating the need for the WireGL-style handshake.

The other interesting thing that happens at the configuration step is that the server pretends to be one of its clients for a while.  The server needs to know how big the entire logical output space is (to properly handle glViewport and glScissor calls), but the tile configuration for the server only tells it about its local pieces.  So the server finds out the SPU ID of one of its clients, "poses" as that SPU to the mothership, and asks about all of the other servers in the current run and their tilings.  It uses this information to find the most extreme tile boundaries and compute the width and height of the entire logical output space.
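
The extent computation itself is simple.  A sketch using hypothetical data structures (the real code gets the tile lists from the mothership while posing as a client SPU):

/* Hypothetical tile description: position and size within the mural. */
typedef struct { int x, y, width, height; } Tile;

/* Compute the size of the entire logical output space from every tile
   on every server in the run. */
void ComputeMuralSize( const Tile *tiles, int numTiles, int *muralW, int *muralH )
{
    int i;
    *muralW = 0;
    *muralH = 0;
    for (i = 0; i < numTiles; i++)
    {
        if (tiles[i].x + tiles[i].width  > *muralW) *muralW = tiles[i].x + tiles[i].width;
        if (tiles[i].y + tiles[i].height > *muralH) *muralH = tiles[i].y + tiles[i].height;
    }
}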

Back in the server's main function, a "base projection" is then computed, which allows the server to place a translate-and-scale matrix at the top of the projection matrix stack.  This is necessary because any screen tiling is handled by the server, not by an individual SPU.
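
One plausible way to derive such a translate-and-scale is sketched below.  This is illustrative only -- the conventions actually used by crserver may differ -- and it assumes tile coordinates are measured from the lower-left corner of the mural:

/* Given the whole mural (muralW x muralH) and this server's tile
   (tileX, tileY, tileW, tileH), compute a scale and translate such that
   Translate(tx, ty, 0) * Scale(sx, sy, 1), placed at the top of the
   projection stack, maps mural clip coordinates so that exactly this
   tile's portion of the mural fills the server's viewport. */
void ComputeBaseProjection( int muralW, int muralH,
                            int tileX, int tileY, int tileW, int tileH,
                            float *sx, float *sy, float *tx, float *ty )
{
    *sx = (float) muralW / (float) tileW;
    *sy = (float) muralH / (float) tileH;
    *tx = (float) (muralW - 2 * tileX - tileW) / (float) tileW;
    *ty = (float) (muralH - 2 * tileY - tileH) / (float) tileH;
}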

Next, the server's dispatch table is built (analogous to, but not exactly the same as, the seethrough SPU's seethroughspuBuildFunctionTable function).  The state tracker is told to use this function pointer table when it computes differences for context switching.

Finally, the server enters its main loop:

crServerSerializeRemoteStreams();

This function will loop forever and dispatch remote blocks of work to the SPU chain that it loaded at configuration time.

crServerSerializeRemoteStreams is actually a pretty simple function (it's in crserver/server_stream.c).  It gets a client off the run queue, makes current to that client's context (causing a context difference to be evaluated), and executes blocks of work for that client until the client blocks.  The code is fairly self-explanatory.
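
In outline, the loop looks like this.  All of the helper names here are hypothetical; the real loop is in crserver/server_stream.c:

for (;;)
{
    /* round-robin: pick the next client with queued work */
    CRClient *client = NextClientOnRunQueue();          /* hypothetical */

    /* make the server's state current to that client's context;
       this is where a context difference gets evaluated */
    ServerMakeCurrent( client );                         /* hypothetical */

    /* execute that client's queued blocks of work until it blocks */
    while (!ClientIsBlocked( client ))
        ExecuteNextWorkBlock( client );                  /* hypothetical */
}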

The function at the very end of crserver/server_stream.c merits explanation:

int crServerRecv( CRConnection *conn, void *buf, unsigned int len )
{
    (void) conn;
    (void) buf;
    (void) len;
    return 0; // Never handle anything -- let packets queue up per-client instead.
}

When the networking library is initialized (in main), it is passed this function as a "handler" for incoming packets.  When a packet is received by the networking library, it makes sure that it is a valid message in the Chromium protocol, and then passes it immediately to the provided handler function.  If the handler function does something with it, the function should return 1, and the message will be discarded.  If, however, the handler function does not handle the message, it is passed to the default message handler.

The default message handler takes care of flow-control messages, reassembling fragmented packets, and queueing actual blocks of OpenGL work.  Since the server's receive function always returns 0, any work blocks that arrive at the server are queued up on a linked list stored inside the CRConnection data structure. So the implementation of crNetGetMessage (called from crServerSerializeRemoteStreams) simply checks this queue, and if it is empty it grabs blocks from the networking library until a block of work arrives on the requested queue.  In practice, this scheduling algorithm has proved to work well, although certainly more sophisticated schemes would be possible.
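
For comparison, a receive function that did want to intercept some messages would follow this contract.  The sketch below is hypothetical and is not taken from the Chromium sources:

int exampleRecv( CRConnection *conn, void *buf, unsigned int len )
{
    if ( ThisIsAMessageIWantToHandleMyself( buf, len ) )   /* hypothetical test */
    {
        HandleTheMessage( conn, buf, len );                /* hypothetical */
        return 1;   /* handled; the networking library discards the message */
    }
    return 0;       /* not handled; the default handler queues it per-client */
}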

The Application Faker

The application faker, or "crappfaker", is one of the ugly system-dependent muddy-voodoo pieces of the system that is probably best left alone.  It predates WireGL all the way back to the early Pomegranate simulations.  The job of crappfaker is to launch a child process in such a way that it will find the Chromium OpenGL DLL. 

On Windows, this is accomplished by creating a temporary directory, copying the executable there, copying crfaker.dll to that directory and renaming it as opengl32.dll, spawning the executable as a child, and deleting the directory when the child exits.  Of course, if the child crashes, the directory will not be cleaned, so beware of thousands of copies of things lying around in temporary directories.

On UNIX, crappfaker is slightly less invasive.  It creates a temporary directory and fills it with appropriately named symbolic links to libcrfaker.so.  It then prepends that temporary directory to the LD_LIBRARY_PATH environment variable.  Then the executable is spawned and the directory cleaned up (again, unless the executable crashes).

On Darwin, the process is about the same as on UNIX, but with a few extra steps.  Due to the nature of frameworks on Darwin, the entirety of OpenGL.framework has to be 'created' temporarily in order to properly fool the dynamic linker.  The OpenGL framework is a series of folders and symbolic links that contain all the OpenGL headers and binaries.  The faker creates a temporary framework that mirrors the real one, substituting the faker library where needed.  It then prepends that framework directory to the DYLD_FRAMEWORK_PATH environment variable before spawning the executable.  See make_temp_framework in app_faker/app_faker.c for the details.

crappfaker can also be told where the mothership is on the command line, and the location of the faker DLL can be specified explicitly as well.

crappfaker will set an environment variable called CR_APPLICATION_ID_NUMBER, which is used by the OpenGL faker DLL to disambiguate itself from other fakers that might be running on the same machine (which can happen when debugging parallel programs on a uniprocessor).

crappfaker also sets an environment variable called __CR_LAUNCHED_FROM_APP_FAKER, which SPUs can use to tell whether they were loaded manually or by the app faker.  This can be useful if the SPU wants to behave in a slightly different way, or work around a bug.  See spu/render/renderspu_window.c for an example.
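
A minimal version of such a check might look like the following; the environment variable name comes from the text above, and the surrounding code is purely illustrative:

#include <stdlib.h>

/* Somewhere in a SPU's initialization: */
if (getenv( "__CR_LAUNCHED_FROM_APP_FAKER" ) != NULL)
{
    /* We were launched by crappfaker; behave slightly differently
       or apply the workaround here. */
}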

The OpenGL Faker Library

This library, called crfaker.dll (or libcrfaker.so on UNIX and libcrfaker.dylib on Darwin), exports the OpenGL API to an application.  When a context is created, the mothership is contacted, and a chain of SPUs is loaded.  This all happens in the function StubInit in opengl_stub/load.c.  Once the SPUs are loaded, the dispatch table from the head SPU is copied into global variables called "__glim_FuncName" (e.g., __glim_Color3f).

These variables are used to dispatch the actual OpenGL API to SPU functions.  The dispatch method varies from platform to platform; see opengl_stub/Linux_exports.py for the most complicated one.
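
Conceptually, each exported entry point just calls through its __glim_ pointer, which StubInit fills in from the head SPU's dispatch table.  A simplified sketch (the real, generated code differs per platform):

typedef float GLfloat;   /* as in GL/gl.h */

void (*__glim_Color3f)( GLfloat, GLfloat, GLfloat );   /* filled in by StubInit */

void glColor3f( GLfloat red, GLfloat green, GLfloat blue )
{
    __glim_Color3f( red, green, blue );                 /* head SPU's Color3f */
}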

The SPU Loader

The SPU Loader, located in spu_loader/, is responsible for reading a SPU DLL from disk and building a dispatch table for it.  It can also load a chain of SPUs.

Loading a single SPU is pretty straightforward.  The SPU DLL is opened explicitly using crDLLOpen.  Then, the SPU's single entry point, called SPULoad, is extracted and called.  This returns several pieces of information to the loader, including the name of the SPU's SuperSPU (if any) and its SPUInit and SPUSelf entry points, all of which are used as described below.

The loader will load the SuperSPU first with a recursive call to itself.  Note that the loader will default to loading the error SPU if no SuperSPU is provided (unless, of course, the SPU being loaded is the error SPU).

Then, the SPUInit function is called.  This function is passed the unique SPU ID given to this SPU by the mothership, a pointer to the (already built) dispatch table for the SPU immediately following the one being loaded in the chain (the "child" SPU), a pointer to its own SPU structure (from which the already-loaded SuperSPU can be accessed through the ->superSPU member), and two more (currently unused) arguments.

Based on the named function table returned (see "Writing a new SPU" and "Automatically generating code for a SPU"), the dispatch table is built by the function __buildDispatch, implemented in spu_loader/dispatch.c (which is generated by spu_loader/dispatch.py).  This function will search for named functions through a chain of SuperSPUs.

The built dispatch table is then passed back to the SPU through the SPUSelf function.  Currently, no SPUs actually use this, although this is a convenient way to get access to your own built dispatch table (including your parent's functions, where appropriate) without actually declaring all of your own functions as "extern" and calling them explicitly.  This would be an improvement over the ugly "extern" function declarations used in the vertex array implementation in "Automatically generating code for a SPU".

To load a SPU chain, SPUs are simply loaded in reverse order, so we can pass the built dispatch table for a child to the upstream SPU.
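
A sketch of that reverse-order walk, with hypothetical helper and variable names (the real code lives in spu_loader/):

SPU *child = NULL;
int i;

/* spuNames[], spuIDs[], and numSPUs describe the chain, head first. */
for (i = numSPUs - 1; i >= 0; i--)
{
    /* Each SPU is handed the already-built dispatch table of the SPU
       that follows it in the chain (its "child"). */
    child = LoadSingleSPU( spuNames[i], spuIDs[i], child );   /* hypothetical */
}
/* child now points to the head of the chain. */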

The Packer

The library in packer/ turns calls to the OpenGL API into buffers suitable for sending over a network.  It is fairly straightforward, and is described in some detail in our Supercomputing 2000 paper.  However, Chromium's packer has some interesting quirks that are worth mentioning.

The Unpacker

The unpacking library is considerably simpler than the packing library.  It simply walks over a packed buffer, calling the functions of a SPUDispatchTable.  The API to be used is passed to the crUnpack function -- see crserver/server_stream.c for an example.

The server also uses features of the unpacker to extract "network pointers".  Network pointers are simply memory addresses that reside on another machine.  Although they're not useful to the server itself, when the client wants the server to send it some information, it can put its local memory address in a packet, and that address will get sent back along with the response.  The networking layer will then take care of writing the payload data to the specified address.
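
On the client side the idea looks roughly like this.  The structure and function names are hypothetical and purely illustrative:

struct GetRequest { int opcode; void *writeback_ptr; };   /* hypothetical */

int result;                                   /* lives in the client's memory */
struct GetRequest request;

request.writeback_ptr = &result;              /* a "network pointer": an address
                                                 meaningful only on this client */
SendRequestToServer( &request );              /* hypothetical */
/* The server echoes writeback_ptr, unchanged, in its reply; when the reply
   arrives, the client's networking layer writes the payload straight to
   &result, so the client never matches replies to requests by hand. */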

See spu/pack/packspu_net.c for an example of a non-trivial receive function (unlike the server's, which always returns 0 to let packets queue up) and usage of network pointers.

The State Tracker

We saw the state tracker in action in the "Automatically generating code for a SPU" section.  The state tracker is much too complex to describe in full detail here.  For a description of how it all works, read our Eurographics/SIGGRAPH Hardware Workshop 2000 paper.

The best way to figure out how state tracking works is to actually step through some of the code.  Load an application in the debugger using the Chromium OpenGL DLL, as described in "Debugging a SPU".  Once the SPUs have been loaded, set breakpoints in various state calls that you think will happen, and see what they do.  In particular, observing the behavior of crStateDiff in the tilesort SPU is very illuminating for understanding the lazy state update process.