[Developers] Timing Inaccuracies

John Shalf jshalf at lbl.gov
Thu Aug 12 12:45:50 CDT 2004


OK,
here is an example of crummy timing on the Cray X1.  Included in this  
message is the execution profile collected by the hardware performance  
counters.  This is followed by the Cactus timers.

The profile, collected by pat_hwpc (hardware performance counters), is  
completely inconsistent with the data collected by the Cactus timers.   
The PAT profile indicates that the boundary condition calculations are  
taking 15% of the execution time whereas Cactus reports that it is only  
taking < 2% of the evolution time.  A more detailed profile indicates  
that most the cactus timing information for events that sum to <  
10seconds is completely inconsistent with the hardware performance  
counters.

The inconsistencies were far worse on the Earth Simulator, but I don't  
have the ftrace profiles available to show you the inconsistencies  
because the ES is not connected to the network and the SX6a I've been  
using for testing is no longer available.  (hopefully the Stuttgart  
accounts will arrive soon)  The SX-6 timing is inaccurate in the  
opposite direction as the X1.  It reports that boundary condition  
calculations take 50% of the runtime on the SX6a whereas they actually  
took about 20-30% of the runtime.

As a sanity check (for minimum capability), the sum of times collected  
for individually timed events should match the total simulation time.   
Cactus timers flunk these tests consistently on faster machines like  
the X1 and the Earth Simulator.  I find that it is largely accurate on  
slower systems, but once you start running on systems that hit  
4Gigaflops/processor (actual performance for BSSN), you run into  
serious timing inaccuracies.

With this, I would like to close any debate as to whether  
gettimeofday() provides accurate timing -- it does not.  The use of  
gettimeofday() reasonable given its portability (its nice to have a  
baseline implementation), but it is important to support some more  
contemporary approaches to accurate timing. Otherwise, the timing  
information collected by Cactus cannot be trusted consistently across  
platforms.

  I don't advocate eliminating the gettimeofday() approach.  Rather, I'm  
asking that the implementation of the timers be more modular so that  
you can use a number of more accurate timer implementations where  
available.  The API for the cactus timers needn't change, but it must  
be easier to select from a menu of different implementations for the  
guts.  However, the timer API should report (to the best of its  
ability) the granularity of the clock so that you can automatically  
determine whether the timers were used to measure events that are too  
short to time accurately.  The timers should be able to perform such a  
consistency check internally.

-john

----------PAT_HWPC Hardware timer profile-----------------

Table 1:  -d samples%,cum_samples%,samples,counters
           -b pe,ssp,function

   Samp% | Cum.Samp% |  Samp |PE=0
                             |SSP=0
                             |Function

  100.0% |    100.0% | 10745 |Total
|-------------------------------------
|  44.9% |     44.9% |  4825 |MoL_ICNAdd
|  29.8% |     74.7% |  3204 |adm_bssn_sources_
|  15.8% |     90.6% |  1702 |ApplyBndRadiative
|   1.9% |     92.5% |   205 |adm_bssn_init_
|   1.9% |     94.4% |   202 |%__exp
|   0.9% |     95.3% |    98 |gethostname

|  Truncated because cumulative % of Samp exceeds 95.


---------------Cactus Timers--------------------------------------------

======================================================================== 
===========================
Thorn           | Scheduled routine in time bin           |  
gettimeofday [secs] | getrusage [secs]
======================================================================== 
===========================
BSSN_MoL        | Register provided slicings              |           
0.00054500 |       0.00020700
CartGrid3D      | Register GH Extension for GridSymmetry  |           
0.00027000 |       0.00007600
CoordBase       | Register a GH extension to store the coo|           
0.00028900 |       0.00012800
PUGH            | Startup routine                         |           
0.00028100 |       0.00010300
IOUtil          | Startup routine                         |           
0.00034000 |       0.00013900
MoL             | Startup banner                          |           
0.00033100 |       0.00012000
IOBasic         | Startup routine                         |           
0.00029000 |       0.00011300
PUGHInterp      | PUGHInterp startup routine              |           
0.00042200 |       0.00026700
PUGHReduce      | Startup routine                         |           
0.00175100 |       0.00134000
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_STARTUP             |           
0.00451900 |       0.00249300
------------------------------------------------------------------------ 
---------------------------
CartGrid3D      | Register coordinates for the Cartesian g|           
0.00315000 |       0.00232500
MoL             | Set up the MoL bookkeeping index arrays |           
0.00183200 |       0.00050000
BSSN_MoL        | Register evolution variables for BSSN   |           
0.02648300 |       0.02166700
MoL             | Report how many of each type of variable|           
0.01363000 |       0.00185900
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_WRAGH               |           
0.04509500 |       0.02635100
------------------------------------------------------------------------ 
---------------------------
ADMBase         | Check consistency of parameters         |           
0.00103300 |       0.00075500
ADMConstraints  | Check that we can deal with this metric_|           
0.00263600 |       0.00137400
BSSN_MoL        | Check the metric type and conformal stat|           
0.04588400 |       0.03757400
Boundary        | Check dimension of grid variables       |           
0.00034800 |       0.00018700
CartGrid3D      | Check coordinates for CartGrid3D        |           
0.00035000 |       0.00015600
IDSimple        | Construct parameters for simple initial |           
0.00451400 |       0.00146800
MoL             | Basic parameter checking                |           
0.00047900 |       0.00035000
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_PARAMCHECK          |           
0.05524400 |       0.04186400
------------------------------------------------------------------------ 
---------------------------
ADMBase         | Set the shift_state variable to 1       |           
0.00111100 |       0.00076100
ADMBase         | Set up GF symmetries                    |           
0.00233700 |       0.00181900
ADMConstraints  | Register GF symmetries for ADM Constrain|           
0.00684200 |       0.00542400
ADMMacros       | Initialize the local_spatial_order      |           
0.00125400 |       0.00018000
BSSN_MoL        | Set symmetries for ADM_BSSN grid functio|           
0.00821900 |       0.00661200
Boundary        | Register boundary conditions that this t|           
0.00055200 |       0.00030500
CartGrid3D      | Set up spatial 3D Cartesian coordinates |           
0.33083000 |       0.26644000
CoordGauge      | Initialize slicing, setup priorities for|           
0.00049900 |       0.00025900
PUGH            | Report on PUGH set up                   |           
0.00338600 |       0.00052300
StaticConformal | Set the conformal_state variable to 0   |           
0.00082500 |       0.00059800
Time            | Initialise Time variables               |           
0.00033200 |       0.00022500
Time            | Set timestep based on Courant condition |           
0.00185500 |       0.00024300
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_BASEGRID            |           
0.35804200 |       0.28338900
------------------------------------------------------------------------ 
---------------------------
IDSimple        | Construct initial data for (conformal) M|           
0.21684700 |       0.17187000
ADMBase         | Set the lapse to 1 at all points        |           
0.09755700 |       0.08066300
ADMBase         | Set the shift to 0 at all points        |           
0.10267300 |       0.08431300
IOBasic         | Initialisation routine                  |           
0.00029800 |       0.00009300
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_INITIAL             |           
0.41737500 |       0.33693900
------------------------------------------------------------------------ 
---------------------------
ADMConstraints  | Set conformal_state to 0 for physical, o|           
0.00168900 |       0.00125900
BSSN_MoL        | Initialize ADM_BSSN system              |           
0.89401800 |       0.74063000
BSSN_MoL        | Trace A removal                         |           
0.33600700 |       0.31932300
BSSN_MoL        | Boundary enforcement for BSSN           |           
0.07182400 |       0.05724400
Boundary        | Apply all requested local physical bound|           
9.51307300 |       7.81301400
Boundary        | Unselect all grid variables for boundary|           
0.00535800 |       0.00259400
BSSN_MoL        | Updates the standard ADM variables      |           
0.18774900 |       0.17696400
BSSN_MoL        | Updates the time derivative of the lapse|           
0.01288600 |       0.00858200
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_POSTINITIAL         |          
11.02260400 |       9.11961000
------------------------------------------------------------------------ 
---------------------------
CoordGauge      | Identify the slicing for the next iterat|           
0.00145600 |       0.00063600
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_PRESTEP             |           
0.00145600 |       0.00063600
------------------------------------------------------------------------ 
---------------------------
MoL             | Set the counter for the ODE method to lo|           
0.00235600 |       0.00150000
MoL             | Ensure the correct time and timestep are|           
0.00218200 |       0.00126200
BSSN_MoL        | Update slicing condition                |           
0.00553300 |       0.00331600
MoL             | Ensure the data is in the correct timele|           
0.06825300 |       0.06501500
MoL             | Initialise the RHS functions            |           
0.13097700 |       0.12267500
BSSN_MoL        | Calculate the update term for BSSN      |         
463.31695000 |     378.60876100
MoL             | Updates calculated with the efficient IC|           
0.42503600 |       0.34695300
MoL             | Alter the counter number                |           
0.00744500 |       0.00402500
MoL             | If necessary, change the time           |           
0.00704000 |       0.00400600
BSSN_MoL        | Trace A removal                         |           
0.33600700 |       0.31932300
BSSN_MoL        | Boundary enforcement for BSSN           |           
0.07182400 |       0.05724400
Boundary        | Apply all requested local physical bound|           
9.51307300 |       7.81301400
Boundary        | Unselect all grid variables for boundary|           
0.00535800 |       0.00259400
BSSN_MoL        | Updates the standard ADM variables      |           
0.18774900 |       0.17696400
BSSN_MoL        | Updates the time derivative of the lapse|           
0.01288600 |       0.00858200
MoL             | If necessary, change the timestep       |           
0.00719400 |       0.00433200
MoL             | Restoring the Save and Restore variables|           
0.00208200 |       0.00124700
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_EVOL                |         
474.10194500 |     387.54081300
------------------------------------------------------------------------ 
---------------------------
ADMConstraints  | Evaluate ADM constraints                |           
0.00000000 |       0.00000000
Boundary        | Apply all requested local physical bound|           
9.51307300 |       7.81301400
Boundary        | Unselect all grid variables for boundary|           
0.00535800 |       0.00259400
BSSN_MoL        | Calculate the BSSN constraints          |           
0.00000000 |       0.00000000
Boundary        | Apply all requested local physical bound|           
9.51307300 |       7.81301400
Boundary        | Unselect all grid variables for boundary|           
0.00535800 |       0.00259400
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_ANALYSIS            |          
19.03686200 |      15.63121600
------------------------------------------------------------------------ 
---------------------------
MoL             | Free the MoL bookkeeping index arrays   |           
0.00035100 |       0.00013900
PUGH            | Print time spent in communication       |           
0.00345000 |       0.00052500
PUGH            | Termination routine                     |           
0.01570900 |       0.01298200
------------------------------------------------------------------------ 
---------------------------
                 | Total time for CCTK_TERMINATE           |           
0.01951000 |       0.01364600
------------------------------------------------------------------------ 
---------------------------
------------------------------------------------------------------------ 
---------------------------
                 | Total time for simulation               |         
476.78597900 |     389.25786400
------------------------------------------------------------------------ 
---------------------------




More information about the Developers mailing list