[Developers] Re: [CactusMaint] Profiling of Bench_BSSN_PUGH benchmark (Long mail).
goodale at cct.lsu.edu
Wed Mar 15 12:37:14 CST 2006
On Wed, 15 Mar 2006, John Shalf wrote:
> On Mar 15, 2006, at 9:42 AM, Erik Schnetter wrote:
>> On Mar 15, 2006, at 18:34:40, Tom Goodale wrote:
>>> On Wed, 15 Mar 2006, Erik Schnetter wrote:
>>> You will also reduce the TLB misses this way as the number of distinct
>>> pages will be less. This can have a very significant performance benefit.
>> TLB misses and cache misses are very similar. With marching planes, both
>> the original arrays and the marching planes array need to be accessed;
>> this leads to more TLB misses altogether.
> For some reason, the TLB miss rates went down on the O2k (dim recollection on
> my part) when the marching planes optimization was applied. I'm not certain
My thinking is that without the planes you take TLB misses several times
during the calculation loop when accessing any particular piece of data
from memory, whereas with the planes you take a TLB miss only when the
page is loaded into the TLB for the data copy into the plane. If the
plane data is small enough, there should be very few (or no) TLB misses
when accessing the planes.
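
To make the argument concrete, here is a minimal sketch in C of the two
access patterns. It is not the Bench_BSSN_PUGH code: the function names, the
IDX macro and the simple second difference in z are all made up for
illustration, and I am assuming a grid stored with x fastest and z slowest.
The direct version reads three points separated by nx*ny doubles for every
output point. The planes version copies each z-plane of the big array
exactly once into one of three small rotating buffers and then works only
on those buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: x fastest, z slowest. */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (nx) * (ny) + (size_t)(j) * (nx) + (size_t)(i))

/* Direct version: the z-neighbours are read straight from the big array,
   so each output point touches three addresses nx*ny doubles apart. */
void zdiff_direct(const double *u, double *out, int nx, int ny, int nz)
{
    for (int k = 1; k < nz - 1; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++)
                out[IDX(i, j, k, nx, ny)] = u[IDX(i, j, k - 1, nx, ny)]
                                          - 2.0 * u[IDX(i, j, k, nx, ny)]
                                          + u[IDX(i, j, k + 1, nx, ny)];
}

/* Marching-planes version: each plane of u is copied once into a small
   buffer; the calculation loop then reads only the three buffers.
   Returns 0 on success, -1 if the buffers cannot be allocated. */
int zdiff_planes(const double *u, double *out, int nx, int ny, int nz)
{
    assert(nx > 0 && ny > 0 && nz >= 3);

    size_t np = (size_t)nx * (size_t)ny;      /* points per plane */
    double *buf = malloc(3 * np * sizeof *buf);
    if (buf == NULL)
        return -1;

    double *lo = buf, *mid = buf + np, *hi = buf + 2 * np;

    /* Prime the first two planes. */
    memcpy(lo, u, np * sizeof *u);
    memcpy(mid, u + np, np * sizeof *u);

    for (int k = 1; k < nz - 1; k++) {
        /* The only read of plane k+1 from the big array. */
        memcpy(hi, u + (size_t)(k + 1) * np, np * sizeof *u);

        double *o = out + (size_t)k * np;
        for (size_t p = 0; p < np; p++)
            o[p] = lo[p] - 2.0 * mid[p] + hi[p];

        /* March: rotate the buffers so the oldest one is reused next. */
        double *tmp = lo;
        lo = mid;
        mid = hi;
        hi = tmp;
    }

    free(buf);
    return 0;
}
```

Both functions write the same interior values into out. For a stencil this
simple the direct loop already streams through memory fairly well, so any
TLB difference would probably be small; the effect I describe above should
matter more when many grid functions are read at several offsets per point,
and it would need to be confirmed with hardware counters on the machine in
question.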