Alpaca: Interactive Debugging Tools

Motivations

Systems are expected to exhibit more faults and errors as their size or complexity increases. This happens not only because of the larger number of constituent parts, but also because of the many more interactions between different parts of the system: for a system of N mutually interconnected components, the number of interactions grows at least as N². Errors which have very low probabilities in a small application, and can therefore remain virtually undetected for a long time, are triggered at much higher event rates in a large system. This is particularly true for errors related to interactions between the components of the system, so such errors are expected to dominate in large systems. To address this entirely new class of errors, system engineers and researchers will need appropriate high-level debugging tools.
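Counting only pairwise interactions already gives the quadratic growth; interactions involving three or more components add further terms, which is why N² is a lower bound:

```latex
% Number of distinct pairs among N components
I(N) = \binom{N}{2} = \frac{N(N-1)}{2},
\qquad I(10) = 45, \qquad I(1000) = 499\,500 .
```

A hundredfold increase in the number of components thus multiplies the number of pairwise interactions by roughly 10^4.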

There are several reasons why low-level debuggers are not well suited to this task. First of all, the application to be debugged needs to be compiled with debugging flags, and with optimization reduced or turned off. This alters the application itself, and in a parallel environment can lead to completely different behaviour. The problem becomes even more apparent at large scales: some undesired behaviours only appear at such scales and are very sensitive to the interconnect speed. Secondly, debugging with tools such as TotalView requires submitting an interactive job. On some machines, debugging queues for interactive jobs are limited to only a few nodes, so a large-scale error cannot be reproduced. Finally, general-purpose tools don't take into account the specifics of the problem at hand. A flexible open-source tool, which can be embedded directly into the application without hampering its performance, is essential for the successful development of large-scale parallel programs.

Description of the approach within Cactus/Carpet framework

Alpaca debugging tools are based on a custom-built secure web server, the thorn HTTPS, and its extension, the thorn Visualisation. Together they allow a remote user to connect to a running parallel simulation from a browser through a secure SSH connection, and to interact with it in various ways.

The thorn HTTPS implements a secure web server and can be run with X.509 certificate authorization. If this security option is turned on, unauthorized users will not be able to access or interfere with the simulation. The prerequisites and the procedure for installing X.509 certificates are described below in the section "X.509 authorization".

Schedule Tree

Cactus contains a rule-based scheduler, which determines which functions of which thorns should be called and in what order. The calling schedule consists of a few basic scheduling bins, outlining the general structure of a simulation (startup, parameter check, initialization, evolution, analysis, etc.). Thorns can register new schedule groups and schedule their functions to be called in specific time bins or groups. This nested structure is called the schedule tree.

The schedule tree is visualized by the HTTPS thorn. The image on the left shows an example of the schedule tree, currently paused at the function HTTPS_Work, which is scheduled in the CCTK_POSTSTEP time bin. The image also shows a breakpoint set on the function CarpetIOHDF5_CloseFiles. The user can single-step through individual function calls and set breakpoints to pause the simulation before a specific call. This makes it possible to monitor the simulation at a fine-grained level and to identify the specific function responsible for undesired behaviour.

Installation

Basic installation of Alpaca tools is described in detail in the Installation section. The debugging interface is implemented in the thorns HTTPS and Visualisation, which can be downloaded from the Subversion repository:

 svn co --username cvs_anon https://svn.cct.lsu.edu/repos/cactus/Alpaca/HTTPS
 svn co --username cvs_anon https://svn.cct.lsu.edu/repos/cactus/Alpaca/Visualisation

The thorn HTTPS depends on the Socket interface, which is implemented, for example, in CactusConnect/Socket. The latter requires an SSL library to be installed on the system (which is already the case on most systems).
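For a source tree laid out in the usual Cactus way, the checkout and thornlist steps can be scripted roughly as below. This is a sketch under assumptions: it is run from the top-level Cactus directory, the thorns go into arrangements/Alpaca, and the thornlist file name is your own choice. CactusConnect/Socket has to be present in your arrangements as well; the script only lists it.

```shell
#!/bin/sh
# Sketch: fetch the Alpaca debugging thorns and register them in a thornlist.
# Assumptions: run from the top-level Cactus directory; thorns live under
# arrangements/<arrangement>/<thorn>; the thornlist file name is up to you.

# Check out HTTPS and Visualisation into arrangements/Alpaca.
checkout_alpaca() {
    mkdir -p arrangements/Alpaca || return 1
    # Subshell, so the caller's working directory is left unchanged.
    (
        cd arrangements/Alpaca || exit 1
        for thorn in HTTPS Visualisation; do
            svn co --username cvs_anon \
                "https://svn.cct.lsu.edu/repos/cactus/Alpaca/$thorn" || exit 1
        done
    )
}

# Append the debugging thorns and the Socket thorn they rely on to a
# thornlist, skipping entries that are already present (whole-line match).
add_thorns() {
    thornlist="$1"
    touch "$thornlist" || return 1
    for thorn in Alpaca/HTTPS Alpaca/Visualisation CactusConnect/Socket; do
        grep -qxF "$thorn" "$thornlist" || printf '%s\n' "$thorn" >> "$thornlist"
    done
}
```

For example, `checkout_alpaca && add_thorns alpaca.th`, after which the configuration is built with your usual Cactus build command. Running `add_thorns` twice leaves the thornlist unchanged.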

This is all that is necessary to install the Alpaca debugging tools. If you are using X.509 authorization and your certificate authority (CA) is not in the list of builtin CAs, you will need to copy your CA's certificate to HTTPS/src/builtin-trusted-CAs and recompile your Cactus executable.
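The builtin files are named like OpenSSL subject hashes (e.g. dd4b34ea.0). A minimal sketch that copies a CA certificate under such a name, assuming the certificate is in PEM format and that HTTPS reads every file in the directory regardless of its name (the hash naming only follows the existing convention):

```shell
#!/bin/sh
# Sketch: copy a PEM CA certificate into the HTTPS trusted-CA directory
# under its OpenSSL subject-hash name, like the builtin files.
# Assumption: the hash name is a convention, not a requirement of HTTPS.
install_ca() {
    ca_pem="$1"
    dest_dir="${2:-HTTPS/src/builtin-trusted-CAs}"
    hash=$(openssl x509 -in "$ca_pem" -noout -hash) || return 1
    n=0
    # Try <hash>.0, <hash>.1, ... (the suffix resolves hash collisions).
    while [ -e "$dest_dir/$hash.$n" ]; do
        if cmp -s "$ca_pem" "$dest_dir/$hash.$n"; then
            # Identical certificate already installed: nothing to do.
            echo "$dest_dir/$hash.$n"
            return 0
        fi
        n=$((n + 1))
    done
    cp "$ca_pem" "$dest_dir/$hash.$n" || return 1
    echo "$dest_dir/$hash.$n"
}
```

The function prints the path it installed (or found); recompile the Cactus executable afterwards so that HTTPS picks up the new CA.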

Usage
Connecting to a webserver

A simulation with the thorn HTTPS activated announces the address (host and port) it uses for incoming HTTPS requests on standard output:

INFO (HTTPS): HTTPS web server started on:

        https://numrel09.cct.lsu.edu:5555

INFO (HTTPS): No PBS_O_HOST or SGE_O_HOST environment variable is set.  Assuming direct connection to the internet, no HTTPS proxy server will be launched.
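When standard output goes to a file (for example a batch-job output file), the announced host and port can be picked out of it with a few small helpers. A sketch, assuming the URL appears in the `https://host:port` form shown above; the output file name is whatever your job uses:

```shell
#!/bin/sh
# Sketch: extract the web server address announced by HTTPS from a saved
# copy of the simulation's standard output.

# First "https://host:port" found in the file.
https_url() {
    grep -Eo 'https://[A-Za-z0-9.-]+:[0-9]+' "$1" | head -n 1
}

# Host part only.
https_host() {
    https_url "$1" | sed -E 's#^https://([^:]+):.*#\1#'
}

# Port part only.
https_port() {
    https_url "$1" | sed -E 's#.*:([0-9]+)$#\1#'
}
```

With the output above saved in a file such as sim.out, `https_host sim.out` prints numrel09.cct.lsu.edu and `https_port sim.out` prints 5555, which are the values an SSH tunnel needs.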
Simulation Home Page

If you cannot directly access the location announced by HTTPS, you will need to create an SSH tunnel (see below). For example, if you have created an SSH tunnel from localhost:5555 to the announced location, simply enter https://localhost:5555/ in your browser's address bar.

You will be presented with a simulation home page, showing general statistics and information about your simulation. On the left, there is a user menu with different options.


Creating an SSH tunnel

Most supercomputing centers don't allow direct HTTP connections to their cluster nodes from outside. Therefore, in order to connect to a running simulation, you will first need to establish an SSH tunnel between your local machine and the root node of your simulation.

On a UNIX/Linux machine, the following command creates an SSH tunnel mapping local port 5555 to port 5555 on the remote machine (-f sends ssh to the background, -N tells it not to run a remote command):

ssh -f -N -L 5555:localhost:5555 user@remote.host.org
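The command above forwards to localhost on the machine you log into, which works when the web server runs there. If it runs on a compute node behind a login node, the tunnel can instead point at the announced host. A sketch, with the login address, node name and ports as placeholders; it assumes the login node can reach the compute node's port, which is common but site-dependent:

```shell
#!/bin/sh
# Sketch: tunnel from your workstation, through a login node, to the compute
# node running the HTTPS web server. Login, node and ports are placeholders.

# tunnel_spec NODE PORT [LOCAL_PORT]  ->  LOCAL_PORT:NODE:PORT
tunnel_spec() {
    node="$1"
    port="$2"
    local_port="${3:-$2}"
    printf '%s:%s:%s\n' "$local_port" "$node" "$port"
}

# open_tunnel LOGIN NODE PORT [LOCAL_PORT]
open_tunnel() {
    login="$1"
    shift
    # -f: background after authentication; -N: no remote command.
    # NODE in the -L argument is resolved on the login node, not locally.
    ssh -f -N -L "$(tunnel_spec "$@")" "$login"
}
```

For example, `open_tunnel user@remote.host.org numrel09.cct.lsu.edu 5555`, then open https://localhost:5555/ in the browser.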
X.509 authorization

To use X.509 certificate authorization, you will first have to obtain a certificate. The thorn HTTPS contains a few self-signed certificates of root CAs (certificate authorities), located in HTTPS/src/builtin-trusted-CAs. All certificate files in this subdirectory are assumed to be trustworthy, i.e. they will not be checked for validity during the SSL handshake.

Builtin root CA certificates

Currently, the thorn HTTPS contains the following certificates by default:

dd4b34ea.0
The GermanGrid CA certificate, for user certificates held by physicists at AEI. The CA is operated by Forschungszentrum Karlsruhe, which acts as a root CA. More information on this CA can be found at http://grid.fzk.de/ca .
a3bf9f3c.0
The LONI CA certificate, for user certificates held by physicists at CCT/LSU; these are issued by the Louisiana Optical Network Initiative. Information on how to request a LONI certificate can be found at https://docs.loni.org/wiki/Requesting_a_LONI_Grid_Certificate .
617ff41b.0
The KEK Grid CA certificate, for user certificates held by physicists at the University of Tokyo. Information on how to request a KEK Grid certificate can be found at https://gridca.kek.jp/ .

You can view the full information contained in a certificate file with:

openssl x509 -in <certificate-file> -text -noout
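To get a short overview of all trusted CAs at once instead of the full dump, a loop over the directory can print just the subject, issuer and expiry date. A minimal sketch; the directory default is the one named above, and the hash check is an extra assumption-free sanity test of the file naming:

```shell
#!/bin/sh
# Sketch: summarise every certificate in the HTTPS trusted-CA directory
# (subject, issuer, expiry date) and flag files whose name does not match
# the subject hash computed by the installed openssl.
list_cas() {
    dir="${1:-HTTPS/src/builtin-trusted-CAs}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        echo "== $name"
        # Skip files that are not PEM certificates.
        if ! openssl x509 -in "$f" -noout -subject -issuer -enddate 2>/dev/null; then
            echo "   not a PEM certificate"
            continue
        fi
        hash=$(openssl x509 -in "$f" -noout -hash)
        case "$name" in
            "$hash".*) ;;
            *) echo "   note: name differs from subject hash $hash" ;;
        esac
    done
}
```

A name mismatch is not necessarily an error: OpenSSL 1.0.0 changed the subject-hash algorithm, so older files may be named after the value printed by `openssl x509 -subject_hash_old` instead.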
Converting your certificate files to PKCS12 format

The certificate you requested may come in PEM format (supplied as two files, usercert.pem and userkey.pem). Some browsers, such as Firefox, cannot import certificates in PEM format, so you will need to convert your certificate into PKCS12 format (.p12). The conversion requires OpenSSL; you can either install it on your machine, or copy the usercert.pem and userkey.pem files to a machine which already has it.
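Before converting, it can be worth confirming that usercert.pem and userkey.pem belong together, since the export typically fails with an error for a mismatched pair. A minimal sketch comparing the public key in the certificate with the one derived from the private key; openssl asks for the key's pass phrase if userkey.pem is encrypted:

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the certificate and the private
# key contain the same public key. Usage: keys_match CERT.pem KEY.pem
keys_match() {
    # Public key embedded in the certificate (PEM SubjectPublicKeyInfo).
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    # Public key derived from the private key, in the same PEM form.
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

For example: `cd ~/.globus && keys_match usercert.pem userkey.pem && echo "key and certificate match"`.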

The following command can be used to convert .pem into .p12:

cd ~/.globus
openssl pkcs12 -export \
    -in usercert.pem -inkey userkey.pem \
    -out usercert.p12 \
    -name "LONI Cert"    # Choose a name for your certificate

You will be asked for your grid certificate pass phrase. You will also have to choose another password, the "export password", which you will need when importing the certificate into a browser:

Enter pass phrase for userkey.pem: *******
Enter Export Password: *****
Verifying - Enter Export Password: *****

This will produce the usercert.p12 file.
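Before importing, you can check that usercert.p12 opens with the export password and contains the expected certificate. A sketch: the password is normally typed at the prompt, and the P12_PASS environment variable is a hypothetical convenience for non-interactive use only:

```shell
#!/bin/sh
# Sketch: print the subject of the client certificate stored in a PKCS12
# file. If the (hypothetical) variable P12_PASS is set, it is used as the
# export password; otherwise openssl prompts for it.
p12_subject() {
    if [ -n "${P12_PASS:-}" ]; then
        openssl pkcs12 -in "$1" -clcerts -nokeys -passin env:P12_PASS
    else
        openssl pkcs12 -in "$1" -clcerts -nokeys
    fi | openssl x509 -noout -subject
}
```

For example, `p12_subject ~/.globus/usercert.p12` should print the same subject as `openssl x509 -in ~/.globus/usercert.pem -noout -subject`.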

Importing X.509 certificates in Firefox (v.2.0.0)

To access a website protected by X.509 authorization, your browser needs your personal certificate (with its private key) and the certificate of the CA that issued it.

Firefox preferences dialog

To import your certificate into Firefox, open the menu Edit > Preferences > Advanced > Encryption > View Certificates and choose the tab Your Certificates.


Certificates manager in Firefox

Click Import and select your file usercert.p12. You will be prompted for the export password you created when converting your certificate files into PKCS12 format.


Adding new CA certificate

To import your CA's certificate, open the same dialog and choose the tab Authorities. Click Import and select your CA's certificate file. The same certificate file must also be in the list of CAs trusted by the HTTPS web server; if it is not already there (see the list of builtin trusted CAs above), copy it to HTTPS/src/builtin-trusted-CAs/ and recompile your Cactus executable.
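Before trusting a CA certificate in the browser and in HTTPS, you can confirm that it really is the CA that signed your user certificate. A minimal sketch with placeholder file names; it assumes a single-level CA (an intermediate CA would also need the root certificate for the chain to verify), and `openssl verify` also rejects expired certificates:

```shell
#!/bin/sh
# Sketch: succeed only if USER_CERT verifies against CA_CERT.
# Usage: signed_by USER_CERT.pem CA_CERT.pem
signed_by() {
    openssl verify -CAfile "$2" "$1" >/dev/null 2>&1
}
```

For example: `signed_by ~/.globus/usercert.pem my-ca.pem && echo "issued by this CA"`, where my-ca.pem stands for your CA's certificate file.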


After this, whenever a web server requests X.509 authentication, your browser should let you choose your certificate from a list and present it.
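To check client-certificate authentication without a browser, curl can present the PEM files directly, for example through the SSH tunnel described earlier. A sketch, assuming the certificate and key are in ~/.globus (override with the hypothetical CERT and KEY variables); `-k` skips verification of the server's own certificate, which is only acceptable for a quick test through your own tunnel:

```shell
#!/bin/sh
# Sketch: request a page with an X.509 client certificate and print the
# HTTP status code (000 means no HTTP response was received at all).
probe_with_cert() {
    url="${1:-https://localhost:5555/}"
    cert="${CERT:-$HOME/.globus/usercert.pem}"
    key="${KEY:-$HOME/.globus/userkey.pem}"
    curl -sS -k -o /dev/null -w '%{http_code}\n' \
        --cert "$cert" --key "$key" "$url"
}
```

A status such as 200 means the server answered; a rejected handshake typically shows up as 000 together with a curl error message. If userkey.pem is encrypted, curl's `--pass` option supplies its pass phrase.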

Troubleshooting

TODO