The following questions and their corresponding answers were gleaned from
the chromium-devel and chromium-user mailing lists over the last couple
of years.
A: Chromium has primarily been developed and debugged on Linux and other UNIX platforms. While it does work on Windows and Darwin (OS X), there are known problems.
Most notably, display lists are problematic. If at all possible, try
using Linux instead.
If you're experiencing trouble, be sure to compile Chromium with the RELEASE option in options.mk turned off. This will enable extra debugging information that can be useful in times of trouble. Chromium can be extremely verbose in this mode, but sometimes the warning messages will tell you something helpful. Additionally, setting the CR_DEBUG environment variable causes some extra warnings to be printed to standard error.
If your application uses OpenGL selection or feedback, be sure to add the feedback SPU to your network; that SPU implements those features.
Try inserting the print SPU into your network at various points. As mentioned in the "Useful SPUs" section the print SPU writes a human-readable version of the OpenGL stream to a file. This can be valuable if you aren't entirely sure what's in the stream. More than one print SPU can help determine how the stream is being processed through the pipeline.
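For example, here is a minimal sketch of chaining a print SPU in front of a tilesort SPU on the application node; the hostname is a placeholder and the surrounding mothership setup is assumed to match your existing config:
appnode = CRApplicationNode('myhost')
print_spu = SPU('print')        # writes a human-readable copy of the stream
tilesort_spu = SPU('tilesort')
# SPUs run in the order they are added, so the stream is dumped before
# it reaches the tilesorter.
appnode.AddSPU(print_spu)
appnode.AddSPU(tilesort_spu)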
The problem might have to do with an interaction with your vendor's OpenGL driver. Try using Mesa instead of the system's OpenGL, and set the MESA_DEBUG environment variable to print warnings to standard error.
Try increasing the size of the global Chromium MTU. This has implications for the network layers of Chromium. The optimum setting depends on your switch fabric, and tuning it is a bit of black magic, but sometimes it can "magically" fix problems with tilesort:
cr = CR()
cr.MTU(1024*1024)
If you're having problems with the tilesorter, try setting it to "broadcast" mode. In this mode, all geometry is sent to all tiles, regardless of layout:
tilesort = SPU("tilesort")
tilesort.Conf("bucket_mode", "broadcast")
A: You may have to set the display_string parameter for the render SPU in your config file.
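For example (':0' is just a placeholder for whichever X display the render node should open):
render_spu = SPU('render')
render_spu.Conf('display_string', ':0')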
Was it ever established that the 5336 drivers worked with Cr-1.5 or 1.6 under RH9 with e.g. the NV 980XGL or 3000's? I'm seeing nearly immediate segfaults (floating point exception), booted either SMP or uni. Cr is fine with the 4496 drivers.
A: The problem depends on which driver you have installed.
I found that the 53.xx drivers always crashed when a render (or readback) SPU was used on an application node (ala sort-last). But the driver worked fine when the render SPU was on a network node (ala sort-first). Others report crashes no matter which configuration they used.
The thread local storage issue (RH9) is another dimension to the problem.
I'm still on RH8 and haven't seen it first hand, but setting CR_SYSTEM_GL_PATH to /usr/lib/tls fixes some problems on RH9.
NVIDIA is aware of the problem and it should be fixed in a future driver
release.
A: It appears that any configuration that uses the render SPU on a crappfaker node leads to a segfault inside the NVIDIA glXMakeCurrent function. In particular, sort-last configurations aren't working.
I've been digging into this today but I still haven't figured out the root of the problem. I'd like to narrow it down a bit further before reporting it to NVIDIA. In the meantime, it might be best to stick with the 44.96 drivers.
(gigabit-eth, nvidia), but there seems to be a performance problem. All the applications we tested run at a framerate of 3-6 fps. There have been mails on this list regarding this problem, but no solution was posted. Can anyone give me a hint how to solve this performance problem?
A: There are LOTS of dependencies here. If you are reading back in RGB mode, Nvidia and ATI boards will only get you ~150MB/s. RGBA is a little faster on Nvidia boards and is a LOT faster on ATI boards (600+MB/s). Depth readback on both boards is very slow, ~80MB/s. Now, the network transmit doesn't overlap the readback, so things start to get really slow. Basically, if you are doing RGB readback and transmit of 1024x1024 images over GigE, you will only get ~20fps in the best case. This assumes that rendering in your application is free. However, if your app runs at ~20fps, then the total cost of rendering + readback + network transmit will yield about 10fps. And I'm of course discounting the drawpixels cost I'm assuming you are doing on each frame, but this tends to be fast.
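As a rough back-of-the-envelope check of the numbers above (assuming ~150 MB/s readback, ~125 MB/s usable GigE bandwidth, and no overlap between the two):
frame_bytes = 1024 * 1024 * 3      # one 1024x1024 RGB image
readback_rate = 150e6              # approx. RGB readback, bytes/sec
network_rate = 125e6               # GigE upper bound, bytes/sec
frame_time = frame_bytes / readback_rate + frame_bytes / network_rate
print(1.0 / frame_time)            # roughly 20 fps, before any rendering cost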
Faster readback will help: switch to RGBA readback, avoid depth readback, or use different hardware. A faster network will also help. I've yet to see PCI Express boards that deliver better readback performance, but it is promised from both vendors.
The readback SPU also supports bounding box reads if you can supply them. I believe it can do both object space as well as screen space bounding boxes. This can help limit the cost of readbacks to only the pixels that actually matter.
In short, the things to try are: switch to RGBA readback and avoid depth readback, use different hardware or a faster network, and supply bounding boxes to the readback SPU.
Q: I'm getting: "An operation on socket was not performed because system lacks sufficient buffer space or a queue was full" followed by "Bad juju: 0 ... on sock 728".
I am getting this on the crserver side, when I try to run an application through Chromium using crdemo.conf. I start crappfaker, the application tries to execute (its initial window appears), then it crashes with the above message.
I attempted to make the MTU smaller and larger, but this did not affect the results. I believe it is an MTU size issue. Is that right, or are there other things that would cause this error?
One thing I do not understand is that the same application, running with crdemo.conf, works fine on a machine with an old nVidia card. My problems occur when running on a different machine with an ATI card, as well as with a 3000G card. I am running Windows on each of these machines.
The application I am trying to run is a simple texture map of a single bitmap. I have 2 different bitmaps, one is 28.6 MB, the other is 151 MB.
I can run the application using my 28.6 MB .bmp through Chromium just fine. The 151 MB .bmp is the one that bails out. Both cases run fine without Chromium. Thus, I believe it must be an issue with Chromium's handling of such a large image.
It is not a video card issue, as I first thought. The large image doesn't run through Chromium on either of my setups.
Is there a known limit to the image sizes Chromium can handle? I am also wondering whether it has to do with the Windows OS.
A: gluBuild2DMipmaps will scale your image down to the maximum supported texture size. Otherwise, passing images that are too large to glTexImage2D will generate a GL error.
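As an illustration of that point, here is a hedged sketch in PyOpenGL (the original application is presumably C; the helper function is hypothetical and assumes a current GL context):
from OpenGL.GL import (glGetIntegerv, glTexImage2D, GL_MAX_TEXTURE_SIZE,
                       GL_TEXTURE_2D, GL_RGB, GL_UNSIGNED_BYTE)
from OpenGL.GLU import gluBuild2DMipmaps

def upload_texture(width, height, pixels):
    max_size = glGetIntegerv(GL_MAX_TEXTURE_SIZE)
    if width > max_size or height > max_size:
        # gluBuild2DMipmaps rescales oversized images to a supported size.
        gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, width, height,
                          GL_RGB, GL_UNSIGNED_BYTE, pixels)
    else:
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels)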
A: The best way to profile Chromium is to use OProfile. Read the docs carefully before proceeding. Will Cohen at Red Hat has very kindly put up copies of his OProfile tutorials at http://people.redhat.com/wcohen/.
There are several things to remember when using OProfile; see Will's FAQ for the details.
When I run crdemo.conf on a Fedora Core 2 system running Xorg, the default crdemo.conf puts up two render SPU windows + the app window. Under Fedora Core 1 running XFree86, the normal app window + one single render SPU window comes up. If I run crappfaker on the FC1 system and the server on the FC2 system, I get the app window on the FC1 system and a single render SPU window on the FC2 system. However, if I run crappfaker on the FC2 system and crserver on the FC1 system, I get the app window on the FC2 system and two render SPU windows on the FC1 machine. This gets really funky with tilesort SPUs and multiple hosts, so I thought I would start with the simple case.
Has anyone else seen this behavior? Any ideas on how to fix it? I'm currently running ATI boards with 3.7.6 drivers on all systems involved.
A: Say NO to FreeGLUT! Everything is working again with RH9's glut package. What the heck is wrong with how freeglut works?... grumble, grumble.
Looks like the second window/context comes from the FreeGLUT menu code. The original GLUT used Xlib to draw its pop-up menus, but FreeGLUT uses OpenGL. A fix was applied on June 27, 2004.
The next problem I've run into is that Chromium really doesn't like closing down over Myrinet. Whenever I exit a program (so far only the bundled test programs), all the cluster machines freeze and I have to ssh to each of them and kill all user tasks by hand. At one point there were two computers that I couldn't even log into and had to reboot manually. Over TCP/IP, all I did to close the program was exit through the GUI, or hit Ctrl+C on the Python script and the app faker, and all was well. Is there something I'm doing wrong when closing the program? Surely using the simple "exit" command from the program itself shouldn't be causing this big a problem. It's getting to be a pain to manually go into each server and end all tasks every time I do a test run. What could be going wrong?
A: Alas, this is "normal behavior". The main issue is that the GM layer can't figure out for sure when a connection has been dropped, the way TCP/IP can; GM is a "connectionless" layer. Eventually the boards will time out on their receives and the apps will exit; at least they do on my cluster after about 15 seconds. I have not found a clean way to handle bailing out. The problem is that all the nodes end up waiting, blocked or spinning, on a receive, and there is no easy way to send them a TCP/IP packet to get them to bail out. One easy fix is to set up autostart in the conf file along with an autokill when the mothership goes away. This is how we kill things off on our cluster.
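For the autostart half of that suggestion, a minimal sketch of what the conf file might contain (the ssh command, host name, and crserver path are assumptions; the corresponding autokill setup is not shown):
node = CRNetworkNode('tile0')
node.AddSPU(SPU('render'))
# Let the mothership launch crserver itself so it can also tear it down,
# instead of leaving crserver blocked on a GM receive after the app exits.
node.AutoStart(['/usr/bin/ssh', 'tile0', '/usr/local/bin/crserver'])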
There are several options if you want to move away from the GM layer. You can run IP over GM using the IP support in the driver and then use the tcpip layer. Or, you can try running the new SDP layer with Sockets GM, which is also in the (new?) driver.
The main issue with the GM layer is that very few people currently use it and the documentation from Myricom on their hardware is really bad. There are a few people trying to work out MPI layers for Chromium, so that might solve our problems with high-speed layers in general, since they tend to ship with solid MPI support.
The head node of our Chromium cluster (display wall) is a node inside a Beowulf cluster. This way we can run code and visualize it at the same time. We just upgraded our Beowulf cluster from Athlons to Opterons. The Chromium cluster, except for the head node, is still running P4s. I compiled Chromium 1.7 on the Opteron box, running Fedora Core 2 for x86_64, and tried to run the demos on the Chromium cluster, with some interesting results. The "X" that represents the mouse on the X displays of the cluster disappears once I run Atlantis, but no image is displayed on the wall. On the head node I see the Atlantis demo running and I get fps stats from Atlantis.
I'm wondering if it might just be that the head node, being 64 bit, is not playing nice with the 32 bit cluster behind it. Any suggestions of fixes that I might be able to make?
A: We had to change our python script with Chromium 1.7. We have something like the following:
render_spu = SPU('render')
render_spu.Conf('display_string', 'catacomb00:0')
render_spu.Conf('window_geometry', [0, 0, 1600, 2400])
render_spu.Conf('system_gl_path', '/usr/lib/tls')
Are you setting the display string with each render spu?
Has anyone been able to compile Chromium under Fedora Core 2 on an Opteron box? I think the problem might be that gcc is trying to link against the 32 bit libraries instead of the 64 bit ones.
A: Yep. Worked for me once I installed freeglut and set the CR_SYSTEM_GL_PATH environment variable to /usr/lib64/tls. I've had strange problems with Fedora 2 and the nVidia drivers, though... after installing the NVIDIA patches from minion.de, I get hardware accelerated 3D (i.e., NVIDIA reported as the vendor by glxinfo and reasonable speeds for OpenGL applications) but horrible 2D performance. The only exception seems to be Chromium, which is as slow as most 2D apps (14 fps on atlantis). I guess I'll have to wait for nVidia's next driver release because they don't officially support 2.6 kernels on the AMD64 yet. But if anyone has suggestions...
Hi. I have just installed Chromium 1.6 (due to segmentation faults with 1.7) and am attempting to run an OpenGL application with a simple vertex shader across a mural. My experience with Chromium is minimal, so please excuse my naivety.
A: Try running glxinfo. Are you sure the right extensions are available on your cluster?
Erm... You're right. The cluster machines don't support any vertex_program extensions. (I'm sorry to not have known this) This makes sense -- I've been compiling on a machine that does support vertex programs, but when the program is propagated by the mothership out to the cluster, it's just rendering normally. That is, without the vertex shader.
A: Look at simplemural.conf and make sure that you have:
node.Conf('vertprog_projection_param', 'ModelViewProj')
I'm trying to bring up Chromium on a RedHat 9 system, and apparently they have gone from glibc version 2.2 to 2.3 in this transition. Apparently thread-local storage (TLS) is handled differently at the new revision, and this is causing serious confusion for the fit between Chromium and libGLcore for NVIDIA support. I've appended a chunk of the NVIDIA README text on the subject. Has anyone successfully run Chromium with NVIDIA drivers under RH 9?
A: Set your CR_SYSTEM_GL_PATH to /usr/lib/tls.
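If you would rather keep this in the config script than in the environment, the render SPU's system_gl_path option (used in the Opteron mural example elsewhere in this FAQ) points at the same directory:
render_spu = SPU('render')
render_spu.Conf('system_gl_path', '/usr/lib/tls')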
Has anybody any experience compiling and using Chromium on an SGI Altix? I am here trying to do so now on SGI ProPack Linux 3 [Red Hat Advanced Server 2.1] and am getting the following problems:
Linking ../built/crunpacker/Linux/libcrunpacker.so
/usr/bin/ld: ../built/crunpacker/Linux/unpack.o: @gprel relocation against
dynamic symbol cr_unpackData
A: This appears only on IA64 systems.
It usually means that an object not compiled with -fPIC is linked into a shared library.
Richard Henderson's rationale for this is as follows:
"Such relocations cannot be resolved by the dynamic linker,since it doesn't know what the GP base value is. Also, there is the implicit intent that the symbol was _supposed_ to be resolved locally."
Add -fPIC to CFLAGS in config/Linux.mk and see if that works.
So before I plunge into trying to port Chromium over to Mac OS X, I thought it would be prudent to ask if anyone has done this already. Or maybe the Chromium team has plans to release a Mac OS X version?
A: It's on the list of things to do. Much
of the port has already been done.
I was wondering if anyone can help. I have been experimenting with a number of different configurations, and with some of them I get a strange flickering when my image gets rebuilt. I have created a sort-last config, and it would seem that every other frame is white, so the image flickers. I hacked about with reassemble.conf to mirror my config and the same happens there.
A: Use Barriers to synchronize the application instances. See progs/psubmit/psubmit.c
If you are trying to use sort-last under Windows, this is a known problem. Something fishy happens with the SwapBuffers call under Windows, so Chromium tries to composite twice. This behavior happens with both the readback and binaryswap SPUs. If you run a 'Chromium' app like psubmit, everything seems fine. The issue appears to be related to the application putting up a window as well as the render SPU. Nobody has really tracked this bug down since people mostly use Linux. Under Linux everything should be fine; we run a volume renderer in this setup on a regular basis. It is an unmodified GLUT app at the moment. Atlantis works for me as well.
A: Chromium supports Infiniband both with the native (Verbs) interface and SDP. When you call AddServer() in your Python configuration script, you can specify "ib" or "sdp" as the protocol, respectively. You must have first compiled Chromium with InfiniBand/SDP support. See the top-level options.mk file for more information.
Socket Direct Protocol (SDP) uses InfiniBand through ordinary Unix network sockets. The advantage of using SDP over the Verbs interface is that connections between Chromium nodes should close cleanly upon exit. With the "ib" protocol, you'll typically have to manually kill crservers, etc. after you exit because the Verbs interface is bad at detecting broken connections.
If you're using SDP, Chromium will need to address hosts by their "SDP hostname" which is typically the hostname suffixed with something like "sdp" or "ib". Chromium defaults to using "sdp". A different suffix can be specified with the CR_SDP_SUFFIX environment variable. Check your /etc/hosts file to see what your system uses.
Make sure your InfiniBand/SDP environment is working properly before trying to use it with Chromium. InfiniBand vendors typically have test programs for this.
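For example, a minimal sketch of selecting the protocol in a config script (node names and port are placeholders):
appnode = CRApplicationNode('client1')
servernode = CRNetworkNode('server1')
# Use 'sdp' here for Socket Direct Protocol, or 'ib' for the native Verbs
# interface; with SDP, Chromium resolves the host's SDP name via CR_SDP_SUFFIX.
appnode.AddServer(servernode, protocol='sdp', port=7000)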
A: Usually, this is a problem with your /etc/hosts files. If you run the 'hostname' program and don't get a fully-qualified domain name, this is the likely problem. Make sure your /etc/hosts file(s) looks like this:
127.0.0.1    localhost.localdomain localhost
192.168.0.1  foobar.baz.com foobar
Alternatively, you might try editing the mothership/server/mothership.py script and adding your hostnames and domains to the HostPrefixPairs list.