This document is intended to give an overview of how the various pieces of Chromium work and what's going on behind the scenes when you do one of these runs. It will probably be a grab-bag of useful nuggets of information in no particular order. Hopefully this will be useful if someone wants to experiment with changing some part of the core architecture instead of just extending it.
The mothership is usually the first thing that you will run for any
Chromium session. We've already described in detail how the scripts
themselves work, so let's focus on the mothership itself. The guts
of the mothership are invoked by the "Go" method of the CR object
created in a configuration script. The last line of a Chromium
configuration script will almost always be:

cr.Go()
This function takes a single optional argument: the port to listen
on. The default port is 10000, but it can sometimes be useful to
make the port an argument to the script if, for example, sockets take
a minute to shut down properly (as they can on Linux). The Go
method will create a socket on the specified port and accept
connections from (possibly multiple simultaneous) clients.
The mothership then calls ProcessRequest, which will read a single
line of data from the connected client. This line is split into
words: the first word is considered to be a command, and the rest of
the words are arguments to that command. So, for example, you can
connect to the mothership (just telnet to port 10000) and send the
following line:
spu 10
This will be broken by the mothership into the words "spu" and
"10". Then, the Python script will build an on-the-fly reference to
a method of the CR class, using Python's reflection API. This is
done by the following lines of ProcessRequest in
mothership/server/mothership.py:
command = string.lower( words[0] )
print "command = " + command
try:
    fn = getattr(self, 'do_%s' % command )
except AttributeError:
    self.ClientError( sock_wrapper, SockWrapper.UNKNOWNCOMMAND,
                      "Unknown command: %s" % command )
    return
fn( sock_wrapper, string.join( words[1:] ) )
The "getattr" function tries to find a method called, in our
example, "do_spu". If the method is not found, an error is returned
to the calling client. Errors are reported with a three-digit
numeric code and a descriptive string; in this case, the code
SockWrapper.UNKNOWNCOMMAND happens to be 402. This is a lot like
the way HTTP works -- a successful reply has the code 200.
If the function is found, the variable "fn" becomes a handle to it,
and the remaining words are joined into a single string and passed
as the function's argument.
To continue this example, let's look at the do_spu method:
def do_spu( self, sock, args ):
    try:
        spuid = int(args)
    except:
        self.ClientError( sock, SockWrapper.UNKNOWNSPU,
                          "Bogus SPU name: %s" % args )
        return
    if not allSPUs.has_key( spuid ):
        self.ClientError( sock, SockWrapper.UNKNOWNSPU,
                          "Never heard of SPU %d" % spuid )
        return
    sock.SPUid = spuid
    sock.Success( "Hello, SPU!" )
This method tries to convert the arguments (in our case, the single
string "10") into an integer. If the conversion fails, a "Bogus SPU
name" error is returned. This can happen if the client says
"spu 10.5", "spu 10 foo", or "spu clodburger". Next, the system
tries to find the SPU with the given ID. Each SPU is assigned a
unique numeric identifier by the mothership when it is added to a
node; these identifiers are communicated to the application faker or
the server when the SPU is loaded. The ID is then passed as the
first argument to the SPU's SPUInit function.
If the SPU is found, the ID is stored along with the socket, so that
subsequent requests on the same socket refer to that particular
SPU. This way, to get many configuration settings for a particular
SPU, you set the "current" SPU once and then send many
"spuparam <param_name>" commands in succession.
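The client side of this exchange is easy to sketch. Below is an illustrative Python client (plus a stand-in server, so the sketch runs on its own) speaking the one-line-command, coded-reply protocol described above; the exact reply strings are assumptions based on the examples in this section:

```python
# Sketch of a mothership client: send "command args" lines, read back
# "NNN message" replies. The fake_mothership here is a stand-in so the
# example is self-contained; a real session would connect to port 10000.
import socket
import threading

def parse_reply(line):
    """Split a reply line into (three-digit numeric code, descriptive string)."""
    code, _, message = line.partition(" ")
    return int(code), message

def send_command(sock, command):
    """Send one command line and return the parsed one-line reply."""
    sock.sendall((command + "\n").encode("ascii"))
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    return parse_reply(reply.decode("ascii").strip())

def fake_mothership(server_sock):
    """Stand-in for the real mothership's ProcessRequest loop."""
    conn, _ = server_sock.accept()
    with conn:
        for line in conn.makefile("r"):
            words = line.split()
            if len(words) == 2 and words[0] == "spu" and words[1].isdigit():
                conn.sendall(b"200 Hello, SPU!\n")
            else:
                conn.sendall(b"402 Unknown command\n")

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port; the real default is 10000
server.listen(1)
threading.Thread(target=fake_mothership, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
code, message = send_command(client, "spu 10")
bad_code, _ = send_command(client, "clodburger")
client.close()
```

Just as with the telnet session described above, each request is a single line and each reply leads with its numeric status code.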
The mothership also has some rudimentary facilities for brokering out-of-band connections between components, although this is not completely implemented yet. In fact, the Myrinet implementation of the Chromium networking abstraction uses what does exist of this facility to establish its connections. Because Myrinet GM is a completely connectionless API, WireGL had to have a TCP/IP based handshake occur first in which the two computers agreed to use Myrinet and exchanged some information related to the connection. In Chromium, that information is exchanged via the mothership. This paragraph is deliberately vague because the exact mechanism is likely to change in the near future.
The next portion of the system likely to get run is the Chromium
server. The first thing that the server tries to do is figure out
where the mothership is. This can be specified with the
"-mothership" command line option. If this option is omitted, NULL
will be passed to crMothershipConnect, which will cause the
mothership library to look for the CRMOTHERSHIP environment variable
and, failing that, to default to "localhost:10000". Notice that
the format for specifying the location of the mothership is
"<host>:port", although the port can be omitted.
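That lookup order can be sketched in a few lines; the helper name is made up for illustration, but the precedence (explicit argument, then CRMOTHERSHIP, then "localhost:10000") is as described above:

```python
# Sketch of resolving the mothership's location. resolve_mothership is an
# illustrative name, not a real Chromium function.
import os

DEFAULT_PORT = 10000

def resolve_mothership(arg=None, env=os.environ):
    # Precedence: command-line argument, CRMOTHERSHIP, then the default.
    location = arg or env.get("CRMOTHERSHIP") or "localhost:%d" % DEFAULT_PORT
    # Format is "<host>:port"; the port may be omitted.
    host, sep, port = location.partition(":")
    return host, int(port) if sep else DEFAULT_PORT
```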
Eventually, the mothership will remotely invoke the server (and the
application faker, which has a similar argument), so this will be a
non-issue. For now, the most likely scenario is that the mothership
will run on the same machine all the time, so you can set the
CRMOTHERSHIP environment variable to just the name of that machine.
The mothership requires TCP/IP to work -- it cannot work over GM.
The server next initializes the state tracking and networking subsystems, and then connects to the mothership to configure itself. It turns out that the call:

crServerGatherConfiguration( mothership );

is actually where the server waits for clients to connect to it. Because the mothership knows the connectivity of the node graph, the server knows how many clients to wait for and what network protocol they will be using, obviating the need for the WireGL-style handshake.
The other interesting thing that happens at the configuration step is
that the server pretends to be one of its clients for a while. The
server needs to know how big the entire logical output space is (to
properly handle glViewport and glScissor calls), but the tile
configuration for the server only tells it about its local pieces.
So the server finds out the SPU ID of one of its clients, "poses" as
that SPU to the mothership, and asks about all of the other servers
in the current run and their tilings. It uses this information to
find the most extremal tile boundaries across all servers and compute
the width and height of the entire logical output space.
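The extents computation amounts to taking a maximum over tile corners. A sketch, with an illustrative data layout rather than Chromium's actual structures:

```python
# Sketch of computing the logical output space from every server's tiling.
# Each server contributes a list of (x, y, width, height) tiles in mural
# coordinates; the mural size is the largest extent any tile reaches.
def mural_size(tiles_per_server):
    width = height = 0
    for tiles in tiles_per_server:
        for (x, y, w, h) in tiles:
            width = max(width, x + w)
            height = max(height, y + h)
    return width, height
```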
Back in the server's main, a "base projection" is then computed,
which allows the server to place a translate-and-scale matrix at the
top of the projection matrix stack. This is because any screen
tiling is handled by the server, not by an individual SPU.
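The translate-and-scale arithmetic can be sketched as follows. The matrix layout is illustrative (the real code may use a different convention), but the idea is that a tile at (x, y) with size (w, h) inside a (W, H) mural determines a scale and offset that make the tile fill the server's own normalized device coordinate range:

```python
# Sketch of a "base projection": map mural-wide normalized device
# coordinates so that this server's tile fills its local [-1, 1] range.
def base_projection(x, y, w, h, W, H):
    sx, sy = float(W) / w, float(H) / h
    tx = (W - 2 * x - w) / float(w)
    ty = (H - 2 * y - h) / float(h)
    # Row-major 4x4: scale then translate in x and y; z is untouched.
    return [[sx, 0.0, 0.0, tx],
            [0.0, sy, 0.0, ty],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
```

For a tile covering the whole mural this is the identity, as it should be; for the left half of the mural the x axis is scaled by 2 and shifted so that the mural's left edge and center land on -1 and +1.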
Next, the server's dispatch table is built (analogous to, but not
exactly the same as, the seethrough SPU's
seethroughspuBuildFunctionTable function). The state tracker is
told to use this function pointer table when it computes differences
for context switching.
Finally, the server enters its main loop:
crServerSerializeRemoteStreams();
This function will loop forever and dispatch remote blocks of work to the SPU chain that it loaded at configuration time.
crServerSerializeRemoteStreams is actually a pretty simple function
(it's in crserver/server_stream.c). It gets a client off the run
queue, makes current to that client's context (causing a context
difference to be evaluated), and executes blocks of work for that
client until the client blocks. The code is fairly self-explanatory.
The function at the very end of crserver/server_stream.c merits explanation:
int crServerRecv( CRConnection *conn, void *buf, unsigned int len )
{
    (void) conn;
    (void) buf;
    (void) len;
    /* Never handle anything -- let packets queue up per-client instead. */
    return 0;
}
When the networking library is initialized (in main), it is passed
this function as a "handler" for incoming packets. When a packet is
received by the networking library, it makes sure that it is a valid
message in the Chromium protocol, and then passes it immediately to
the provided handler function. If the handler function does
something with it, the function should return 1, and the message will
be discarded. If, however, the handler function does not handle the
message, it is passed to the default message handler.
The default message handler takes care of flow-control messages,
reassembling fragmented packets, and queueing actual blocks of OpenGL
work. Since the server's receive function always returns 0, any
work blocks that arrive at the server are queued up on a linked list
stored inside the CRConnection data structure.
So the implementation of crNetGetMessage (called from
crServerSerializeRemoteStreams) simply checks this queue, and if it
is empty, grabs blocks from the networking library until a block of
work arrives on the requested queue. In practice, this scheduling
algorithm has proved to work well, although certainly more
sophisticated schemes would be possible.
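The queueing scheme can be sketched in a few lines. The names and structures below are illustrative stand-ins for CRConnection, crServerRecv, and crNetGetMessage:

```python
# Sketch of per-connection message queueing: a receive callback that
# returns 0 lets the default handler queue each work block on its own
# connection, and a GetMessage-style call pumps the "network" until the
# requested connection's queue is non-empty.
from collections import deque

class Connection:
    def __init__(self):
        self.queue = deque()

def server_recv(conn, msg):
    return 0  # never handle anything; let work blocks queue up per-client

def default_handler(conn, msg):
    # Flow control and fragment reassembly are elided in this sketch.
    conn.queue.append(msg)

def net_get_message(conn, pump):
    """Pump the (fake) network until conn's queue has a work block."""
    while not conn.queue:
        src, msg = pump()  # stand-in for reading the real network
        if server_recv(src, msg) == 0:
            default_handler(src, msg)
    return conn.queue.popleft()
```

Note how a block that arrives for a different client is simply parked on that client's queue while we keep pumping, which is exactly the behavior the server relies on.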
The application faker, or "crappfaker", is one of the ugly system-dependent muddy-voodoo pieces of the system that is probably best left alone. It predates WireGL all the way back to the early Pomegranate simulations. The job of crappfaker is to launch a child process in such a way that it will find the Chromium OpenGL DLL.
On Windows, this is accomplished by creating a temporary directory, copying the executable there, copying crfaker.dll to that directory and renaming it as opengl32.dll, spawning the executable as a child, and deleting the directory when the child exits. Of course, if the child crashes, the directory will not be cleaned, so beware of thousands of copies of things lying around in temporary directories.
On UNIX, crappfaker is slightly less invasive. It creates a temporary directory and fills it with appropriately named symbolic links to libcrfaker.so. It then prepends that temporary directory to the LD_LIBRARY_PATH environment variable. Then the executable is spawned and the directory cleaned up (again, unless the executable crashes).
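The UNIX trick can be sketched as follows. The symlink names here are illustrative (the real faker decides which library names it must fake), but the mechanics (temporary directory, symlinks, LD_LIBRARY_PATH, spawn, clean up) mirror the description above:

```python
# Sketch of the UNIX app-faker technique: symlink a faker library under
# the names the dynamic linker will look for, prepend the directory to
# LD_LIBRARY_PATH, and spawn the child. Library names are illustrative.
import os
import shutil
import subprocess
import sys
import tempfile

def spawn_with_fake_gl(argv, faker_path, link_names=("libGL.so", "libGL.so.1")):
    tmpdir = tempfile.mkdtemp(prefix="crfaker-")
    try:
        for name in link_names:
            os.symlink(faker_path, os.path.join(tmpdir, name))
        env = dict(os.environ)
        old = env.get("LD_LIBRARY_PATH", "")
        env["LD_LIBRARY_PATH"] = tmpdir + (os.pathsep + old if old else "")
        return subprocess.call(argv, env=env)
    finally:
        # As the text warns, a hard crash of the parent can leave this
        # directory behind; a normal exit cleans it up.
        shutil.rmtree(tmpdir)
```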
On Darwin, the process is about the same as on UNIX, but with a few
extra steps. Due to the nature of frameworks on Darwin, the entirety
of OpenGL.framework has to be 'created' temporarily in order to
properly fool the dynamic linker. The OpenGL framework is a series of
folders and symbolic links that contain all the OpenGL headers and
binaries. The faker creates one similar to the actual OpenGL
framework, putting the faker library in where needed. The faker then
prepends the framework directory to the DYLD_FRAMEWORK_PATH
environment variable before spawning the executable. See
make_temp_framework in app_faker/app_faker.c for the details.
crappfaker can also be told where the mothership is on the command line, and the path to the faker DLL can be specified explicitly.
crappfaker will set an environment variable called CR_APPLICATION_ID_NUMBER, which is used by the OpenGL faker DLL to disambiguate itself from other fakers that might be running on the same machine (which can happen when debugging parallel programs on a uniprocessor).
crappfaker also sets an environment variable called __CR_LAUNCHED_FROM_APP_FAKER, which SPU's can use to tell if they were loaded manually or by the app faker. This can be useful if the SPU wants to behave in a slightly different way, or work around a bug. See spu/render/renderspu_window.c for an example.
This library, called crfaker.dll (or libcrfaker.so on UNIX and
libcrfaker.dylib on Darwin), exports the OpenGL API to an
application. When a context is created, the mothership is
contacted, and a chain of SPU's is loaded. This all happens in the
function StubInit in opengl_stub/load.c. Once the SPU's are loaded,
the dispatch table from the head SPU is copied into global variables
called "__glim_FuncName" (e.g., __glim_Color3f).
These variables are used to dispatch the actual OpenGL API to SPU functions. The dispatch method varies from platform to platform; see opengl_stub/Linux_exports.py for the most complicated one.
The SPU Loader, located in spu_loader/, is responsible for reading a SPU DLL from disk and building a dispatch table for it. It can also load a chain of SPU's.
Loading a single SPU is pretty straightforward. The SPU DLL is
opened explicitly using crDLLOpen. Then, the SPU's single entry
point, called SPULoad, is extracted and called. This returns
several pieces of information to the loader:

- the SPUInit function
- the SPUSelf function
- the SPUCleanup function (currently unused)
The loader will load the SuperSPU first with a recursive call to itself. Note that the loader will default to loading the error SPU if no SuperSPU is provided (unless, of course, the SPU being loaded is the error SPU).
Then, the SPUInit function is called. This function is passed the
unique SPU ID given to this SPU by the mothership, a pointer to the
(already built) dispatch table for the SPU immediately following the
one being loaded in the chain (the "child" SPU), a pointer to its own
SPU structure (from which the already-loaded SuperSPU can be accessed
through the ->superSPU member), and two more (currently unused)
arguments.
Based on the named function table returned (see "Writing a new SPU"
and "Automatically generating code for a SPU"), the dispatch table is
built by the function __buildDispatch, implemented in
spu_loader/dispatch.c (which is generated by spu_loader/dispatch.py).
This function will search for named functions through a chain of
SuperSPU's.
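The name search can be sketched as a walk up the SuperSPU chain, falling back to the child SPU's dispatch table for anything the chain doesn't provide. The structures below are illustrative, not Chromium's:

```python
# Sketch of __buildDispatch's search: for each entry point, look in the
# SPU's own named-function table, then its SuperSPU's, and so on; if no
# SPU in the chain implements it, pass straight through to the child.
class SPU:
    def __init__(self, functions, super_spu=None):
        self.functions = functions  # name -> callable
        self.super_spu = super_spu

def build_dispatch(spu, names, child_dispatch):
    table = {}
    for name in names:
        node = spu
        while node is not None and name not in node.functions:
            node = node.super_spu
        if node is not None:
            table[name] = node.functions[name]
        else:
            table[name] = child_dispatch[name]  # unimplemented: defer to child
    return table
```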
The built dispatch table is then passed back to the SPU through the
SPUSelf function. Currently, no SPUs actually use this, although it
is a convenient way to get access to your own built dispatch table
(including your parent's functions, where appropriate) without
declaring all of your own functions as "extern" and calling them
explicitly. This would be an improvement over the ugly "extern"
function declarations used in the vertex array implementation in
"Automatically generating code for a SPU".
To load a SPU chain, SPUs are simply loaded in reverse order, so we can pass the built dispatch table for a child to the upstream SPU.
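That reverse-order load can be sketched with a made-up load_chain helper standing in for the real loader:

```python
# Sketch of loading a SPU chain tail-first: each SPU's init receives its
# child's already-built dispatch table and returns its own, so by the
# time we reach the head of the chain everything downstream is ready.
def load_chain(spu_inits, terminal_dispatch):
    """spu_inits is ordered head-to-tail; returns the head's dispatch table."""
    child = terminal_dispatch
    for init in reversed(spu_inits):
        child = init(child)
    return child
```

Loading tail-first means every init already has a complete child dispatch table in hand, which is exactly why the chain is walked in reverse.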
The library in packer/ turns calls to the OpenGL API into buffers suitable for sending over a network. It is fairly straightforward, and described in some detail in our Supercomputing 2000 paper. However, Chromium's packer has some interesting quirks that are worth mentioning:
- For packing functions that need to consult OpenGL state
  (glArrayElement and glTexImage are good examples), the caller of the
  pack function must provide the relevant state structure. This means
  that the burden of properly dispatching things to the state tracker
  lies with the calling SPU, which is a slightly cleaner design. See
  the functions in spu/pack/packspu_pixel.c for an example of this
  usage.
- The packer's current destination buffer can be swapped in and out
  with crPackSetBuffer and crPackGetBuffer.
- The basic vertex packing functions (e.g., crPackVertex3f) just pack
  the vertex, and nothing more. There are BBOX packing functions
  (e.g., crPackVertex3fBBOX) that update a geometry bounding box as
  they pack. This logic is in the packer instead of in the tilesorter
  for two reasons. First, it would be inefficient to use an extra
  function call to keep track of the bounding box on the side.
  Second, other SPU's may want to track bounding box information.
  See the initialization of the bucketInfo variable in
  spu/tilesort/tilesortspu_bucket.c for an example of how to get at
  this data. Finally, functions are available that track the bounding
  box and count the number of vertices (e.g.,
  crPackVertex3fBBOX_COUNT). This is used in the tilesort SPU when
  the geometry packing buffer fills up in the middle of a primitive,
  since we need to know how many vertices to save away for re-issuing
  in a future buffer. See spu/tilesort/tilesortspu_pinch.c (but
  shield your eyes; this code is complicated).
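The three packing flavors can be sketched as follows. The buffer format and record layout are made up (the real packer writes opcode-tagged binary data), but the bounding-box and vertex-count bookkeeping mirror the description above:

```python
# Sketch of plain, bounding-box, and bounding-box-plus-count packing.
import struct

class PackState:
    def __init__(self):
        self.buffer = bytearray()
        # min x, y, z followed by max x, y, z
        self.bbox = [float("inf")] * 3 + [float("-inf")] * 3
        self.vertex_count = 0

def pack_vertex3f(ps, x, y, z):
    # Plain flavor: just append the vertex data, nothing more.
    ps.buffer += struct.pack("<3f", x, y, z)

def pack_vertex3f_bbox(ps, x, y, z):
    # BBOX flavor: also grow the geometry bounding box as we pack.
    pack_vertex3f(ps, x, y, z)
    for i, v in enumerate((x, y, z)):
        ps.bbox[i] = min(ps.bbox[i], v)
        ps.bbox[3 + i] = max(ps.bbox[3 + i], v)

def pack_vertex3f_bbox_count(ps, x, y, z):
    # BBOX_COUNT flavor: additionally count vertices, so a flush that
    # lands mid-primitive knows how many vertices to re-issue later.
    pack_vertex3f_bbox(ps, x, y, z)
    ps.vertex_count += 1
```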
- When a packing buffer fills up, the packer invokes a flush
  callback. This callback is registered with crPackFlushFunc, and it
  takes a "void *" argument that can be set with crPackFlushArg.
The unpacking library is considerably simpler than the packing
library. It simply walks over a packed buffer, calling the functions
of a SPUDispatchTable. The API to be used is passed to the crUnpack
function -- see crserver/server_stream.c for an example.
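A toy version of such a walk, with a made-up wire format (one opcode byte followed by a fixed-size payload) driving a dispatch table of Python callables:

```python
# Sketch of an unpacker's main loop: decode each (opcode, payload)
# record in the buffer and call the corresponding dispatch entry, the
# way crUnpack drives a SPUDispatchTable. The wire format is invented.
import struct

OP_COLOR3F, OP_VERTEX3F = 1, 2
PAYLOAD_BYTES = {OP_COLOR3F: 12, OP_VERTEX3F: 12}  # three floats each

def unpack(buf, dispatch):
    offset = 0
    while offset < len(buf):
        opcode = buf[offset]
        offset += 1
        size = PAYLOAD_BYTES[opcode]
        args = struct.unpack("<3f", buf[offset:offset + size])
        dispatch[opcode](*args)  # hand the decoded call to the API table
        offset += size
```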
The server also uses features of the unpacker to extract "network pointers". Network pointers are simply memory addresses that reside on another machine. Although they're not useful to the server itself, when the client wants the server to send it some information, it can put its local memory address in a packet, and that address will be sent back along with the response. The networking layer will then take care of writing the payload data to the specified address.
See spu/pack/packspu_net.c for an example of a non-trivial receive function (unlike the server's, which always returns 0 to let packets queue up) and usage of network pointers.
We saw the state tracker in action in the "Automatically generating code for a SPU" section. The state tracker is much too complex to describe every detail here. For a description of how it all works, read our Eurographics/SIGGRAPH Hardware Workshop 2000 paper.
The best way to figure out how state tracking works is to actually
step through some of the code. Load an application in the debugger
using the Chromium OpenGL DLL, as described in "Debugging a SPU".
Once the SPU's have been loaded, set breakpoints in various state
calls that you think will happen, and see what they do. In
particular, observing the behavior of crStateDiff in the tilesort SPU
is very illuminating for the lazy state update process.