[Developers] Re: [CactusMaint] Profiling of Bench_BSSN_PUGH benchmark (Long mail).

Tom Goodale goodale at cct.lsu.edu
Wed Mar 15 12:37:14 CST 2006


On Wed, 15 Mar 2006, John Shalf wrote:

>
> On Mar 15, 2006, at 9:42 AM, Erik Schnetter wrote:
>
>> On Mar 15, 2006, at 18:34:40, Tom Goodale wrote:
>> 
>>> On Wed, 15 Mar 2006, Erik Schnetter wrote:
>>> 
>>> You will also reduce the TLB misses this way, as the number of distinct 
>>> pages will be smaller.  This can have a very significant performance benefit.
>> 
>> TLB misses and cache misses are very similar.  With marching planes, both 
>> the original arrays and the marching planes array need to be accessed; 
>> this leads to more TLB misses altogether.
>
> For some reason, the TLB miss rates went down on the O2k (dim recollection on 
> my part) when the marching planes optimization was applied.  I'm not certain 
> why.

My thinking is that without the planes you get TLB misses several times 
during the calculation loop when accessing any particular piece of data 
from memory, whereas with the planes you only get a TLB miss when you load 
the page into the TLB to do the data copy into the plane.  If the plane 
data is small enough, there should be very few (or no) TLB misses when 
accessing the planes.
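
A rough sketch of what "small enough" could mean, as back-of-the-envelope 
arithmetic rather than anything measured: the planes stay TLB-resident if 
the number of pages they span does not exceed the number of TLB entries. 
The function names, the 4 KiB page size, the 8-byte element size and the 
64-entry TLB below are all assumptions for illustration, not figures for 
the O2k or for the benchmark.

```python
def pages_spanned(n_bytes, page_bytes):
    """Number of pages needed to hold n_bytes of contiguous data
    (assumes the buffer starts on a page boundary)."""
    return -(-n_bytes // page_bytes)  # integer ceiling division


def planes_fit_in_tlb(nx, ny, n_planes, n_vars, tlb_entries,
                      page_bytes=4096, elem_bytes=8):
    """True if all marching planes together span no more pages than
    the TLB has entries (one page per entry is assumed)."""
    total_bytes = nx * ny * n_planes * n_vars * elem_bytes
    return pages_spanned(total_bytes, page_bytes) <= tlb_entries


# Hypothetical sizes: 3 planes of 32x32 doubles for one variable,
# then 3 planes of 128x128 doubles for ten variables.
print(planes_fit_in_tlb(32, 32, 3, 1, tlb_entries=64))     # True  (6 pages)
print(planes_fit_in_tlb(128, 128, 3, 10, tlb_entries=64))  # False (960 pages)
```

This ignores the pages of the original arrays touched during the copy, 
which is where the remaining misses would be expected to occur.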

Cheers,

Tom


