[Developers] Re: [CactusMaint] Profiling of Bench_BSSN_PUGH benchmark (Long mail).

Tom Goodale goodale at cct.lsu.edu
Wed Mar 15 11:34:40 CST 2006

On Wed, 15 Mar 2006, Erik Schnetter wrote:

> On Mar 15, 2006, at 12:52:57, David Rideout wrote:
>> Actually I found some nice benchmark data generated by Thomas
>> Schweizer, which is enclosed.
> I don't know whether the main argument why marching planes should improve the 
> performance is correct.  The paper states that the auxiliary array (the 
> marching planes) is already in the cache, hence there should be fewer cache 
> misses.  However, the data has to be moved into that array, and that requires 
> accessing the original arrays, so this alone cannot explain anything.  If 
> anything, it should increase the cache footprint, since data are loaded and 
> stored to two different memory locations.
> I rather think that the real reason why this improves performance is cache 
> colouring (as was mentioned before), or that copying arrays triggers a memory 
> streaming unit (such as exist on the T3E) which leads to faster access to the 
> main memory.

You can lay out the plane arrays so that there is less cache-line aliasing, 
and thus fewer cache misses, when accessing the planes.  Copying the data 
into the new arrays is an expensive operation, but any decent compiler will 
optimise it with prefetches.
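
To make the idea concrete, here is a rough sketch in C of a 7-point stencil 
evaluated first directly on the 3D array and then through three marching 
planes.  It is only an illustration, not the scheme from the paper or 
anything in PUGH: the function names and the "pad" argument (extra elements 
between the plane buffers, meant to shift their cache-line mapping) are 
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Linear index into an nx*ny*nz array stored with x varying fastest. */
static size_t idx3(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return i + nx * (j + ny * k);
}

/* Reference version: read all seven points straight from the 3D array. */
void laplacian_direct(const double *u, double *out,
                      size_t nx, size_t ny, size_t nz)
{
    for (size_t k = 1; k + 1 < nz; k++)
        for (size_t j = 1; j + 1 < ny; j++)
            for (size_t i = 1; i + 1 < nx; i++)
                out[idx3(i, j, k, nx, ny)] =
                    u[idx3(i - 1, j, k, nx, ny)] + u[idx3(i + 1, j, k, nx, ny)] +
                    u[idx3(i, j - 1, k, nx, ny)] + u[idx3(i, j + 1, k, nx, ny)] +
                    u[idx3(i, j, k - 1, nx, ny)] + u[idx3(i, j, k + 1, nx, ny)] -
                    6.0 * u[idx3(i, j, k, nx, ny)];
}

/* Marching planes: keep planes k-1, k, k+1 in one small buffer and rotate
 * the pointers as k advances.  `pad` (hypothetical tuning knob) is the
 * number of unused elements between consecutive planes in the buffer.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
int laplacian_marching(const double *u, double *out,
                       size_t nx, size_t ny, size_t nz, size_t pad)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    size_t plane = nx * ny;
    size_t stride = plane + pad;
    double *buf = malloc(3 * stride * sizeof *buf);
    if (buf == NULL)
        return -1;
    double *lo = buf, *mid = buf + stride, *hi = buf + 2 * stride;

    /* Preload planes 0 and 1; plane k+1 is copied in inside the loop. */
    memcpy(lo, u, plane * sizeof *u);
    memcpy(mid, u + plane, plane * sizeof *u);

    for (size_t k = 1; k + 1 < nz; k++) {
        memcpy(hi, u + (k + 1) * plane, plane * sizeof *u);
        for (size_t j = 1; j + 1 < ny; j++)
            for (size_t i = 1; i + 1 < nx; i++) {
                size_t p = i + nx * j;  /* index within a single plane */
                out[idx3(i, j, k, nx, ny)] =
                    mid[p - 1] + mid[p + 1] + mid[p - nx] + mid[p + nx] +
                    lo[p] + hi[p] - 6.0 * mid[p];
            }
        /* Rotate: the old middle plane becomes the lower one, and so on. */
        double *t = lo; lo = mid; mid = hi; hi = t;
    }
    free(buf);
    return 0;
}
```

Both functions write the same interior values; only the memory access 
pattern differs.  Whether the copy pays for itself depends on the machine 
and on how the buffer is placed, so this would need to be measured.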

You will also reduce TLB misses this way, as fewer distinct pages are 
accessed.  This can have a very significant performance benefit.
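
As a back-of-the-envelope illustration of the page count, the following C 
sketch counts how many distinct pages one 7-point stencil touches for a 
given array layout.  The helper names are made up for this example, and it 
assumes 8-byte doubles and an array that starts on a page boundary.  The 
comparison I have in mind (again an assumption, not something taken from the 
paper) is a large grid against a compact buffer holding three small tiles.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct pages among n byte offsets.  n is tiny here, so a
 * quadratic scan is good enough. */
size_t distinct_pages(const size_t *off, size_t n, size_t page_bytes)
{
    assert(page_bytes > 0);
    size_t count = 0;
    for (size_t a = 0; a < n; a++) {
        int seen = 0;
        for (size_t b = 0; b < a; b++)
            if (off[b] / page_bytes == off[a] / page_bytes) {
                seen = 1;
                break;
            }
        if (!seen)
            count++;
    }
    return count;
}

/* Byte offsets of the seven stencil points around element (i, j, k), for
 * an array with row length nx and a spacing of `plane` elements between
 * consecutive k planes.  Requires i, j, k >= 1. */
void stencil_offsets(size_t i, size_t j, size_t k,
                     size_t nx, size_t plane, size_t off[7])
{
    assert(i >= 1 && j >= 1 && k >= 1);
    size_t c = i + nx * j + plane * k;
    size_t e[7] = { c, c - 1, c + 1, c - nx, c + nx, c - plane, c + plane };
    for (int n = 0; n < 7; n++)
        off[n] = e[n] * sizeof(double);
}
```

With 4096-byte pages, point (100,100,100) of a grid with nx = 256 and 
plane = 256*256 touches 4 distinct pages, while point (8,8,1) of a buffer 
holding three 16x16 tiles (nx = 16, plane = 256) touches 2.  Multiply that 
by the number of grid functions read at each point and the difference in 
TLB pressure becomes plausible.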

> The alignment of grid functions in PUGH to improve their location in the 
> cache is disabled in the benchmark.  It can be enabled by using the option 
> "PUGH::padding_active=yes", plus making sure that the cache size is detected 
> or set correctly at configuration time.

or at run time with the appropriate parameters.

