[Developers] Re: [CactusMaint] Profiling of Bench_BSSN_PUGH benchmark (Long mail).

John Shalf jshalf at lbl.gov
Wed Mar 15 12:25:34 CST 2006


On Mar 15, 2006, at 9:42 AM, Erik Schnetter wrote:

> On Mar 15, 2006, at 18:34:40, Tom Goodale wrote:
>
>> On Wed, 15 Mar 2006, Erik Schnetter wrote:
>>
>>> I rather think that the real reason why this improves performance  
>>> is cache colouring (as was mentioned before), or that copying  
>>> arrays triggers a memory streaming unit (such as exists on the  
>>> T3E) which leads to faster access to the main memory.
>>
>> You can set it up so there is less cache-line aliasing, and thus  
>> fewer cache misses, when accessing the planes.  Copying the data  
>> into the new arrays is an expensive operation, but will be  
>> optimised by prefetches with any decent compiler.
>
> In principle, there should also be prefetching without the marching  
> planes, although the loops may be too complicated for the compiler,  
> so that it gives up and doesn't optimise.

Some modern microprocessors depend on filters to detect streams and  
engage hardware prefetch (with no explicitly inserted software  
prefetches). Only a limited number of filters are available in that  
case.  For software prefetch, compilers usually rely on software  
pipelining (SWP) in conjunction with compiler-inserted prefetches  
(e.g. on the Itanium 2).  However, with a loop body this complicated,  
the compiler will refuse to employ software pipelining, so it is  
unlikely to insert prefetches once the SWP optimization is dropped.
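
For illustration only, here is roughly what an explicitly inserted  
software prefetch looks like in a simple streaming loop. This is a  
sketch, not what any compiler emits for the BSSN loops: the function  
name dot_prefetch and the distance PREFETCH_DIST are made up for the  
example, and __builtin_prefetch is the GCC/Clang builtin, so other  
compilers would need their own intrinsic.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; would need tuning per machine. */
#define PREFETCH_DIST 64

/* Dot product of a and b with explicit read prefetches issued
   PREFETCH_DIST elements ahead of the element currently in use. */
double dot_prefetch(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            /* Second argument 0 = prefetch for read;
               third argument 0 = low temporal locality (streaming data). */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_DIST], 0, 0);
        }
        sum += a[i] * b[i];
    }
    return sum;
}
```

A prefetch is only a hint and does not change the result, so the  
function returns the same value as the plain loop would.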

>> You will also reduce the TLB misses this way, as fewer distinct  
>> pages will be touched.  This can have a very significant  
>> performance benefit.
>
> TLB misses and cache misses are very similar.  With marching  
> planes, both the original arrays and the marching planes arrays  
> need to be accessed; this leads to more TLB misses altogether.

For some reason, the TLB miss rates went down on the O2k (dim  
recollection on my part) when the marching planes optimization was  
applied.  I'm not certain why.
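
To make the idea concrete, a minimal sketch of marching planes for a  
single array might look like the following. It is not the code from  
the benchmark: the name zdiff_marching, the padding PLANE_PAD and the  
simple second difference in z are all assumptions chosen for the  
example. Three small contiguous plane buffers are kept, only the newly  
needed plane is copied in at each step, and the buffer pointers are  
rotated as the loop marches in z.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical padding (in doubles) between plane buffers, so that the
   three buffers are less likely to map onto the same cache sets. */
#define PLANE_PAD 8

/* Second difference in z, computed from three "marching" plane buffers.
   u and out hold nx*ny*nz doubles with x varying fastest.
   The boundary planes k = 0 and k = nz-1 of out are left untouched.
   Returns 0 on success, -1 if the buffers cannot be allocated. */
int zdiff_marching(const double *u, double *out,
                   size_t nx, size_t ny, size_t nz)
{
    size_t np = nx * ny;                 /* points per plane */
    if (nz < 3 || np == 0)
        return 0;                        /* no interior planes to compute */

    size_t stride = np + PLANE_PAD;
    double *buf = malloc(3 * stride * sizeof *buf);
    if (buf == NULL)
        return -1;
    double *lo = buf, *mid = buf + stride, *hi = buf + 2 * stride;

    /* Prime the buffers with planes 0 and 1. */
    memcpy(lo,  u,      np * sizeof *u);
    memcpy(mid, u + np, np * sizeof *u);

    for (size_t k = 1; k + 1 < nz; k++) {
        /* Copy in only the plane that is new at this step. */
        memcpy(hi, u + (k + 1) * np, np * sizeof *u);

        double *o = out + k * np;
        for (size_t p = 0; p < np; p++)
            o[p] = lo[p] - 2.0 * mid[p] + hi[p];

        /* Rotate: the oldest buffer is reused for the next plane. */
        double *t = lo; lo = mid; mid = hi; hi = t;
    }

    free(buf);
    return 0;
}
```

With a single array the copies are pure overhead; any benefit would  
only show up when many grid functions are read in one loop body, as in  
the BSSN case, and that would have to be measured.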

-john




