Buffer cache monitor checklist

The following table summarizes the most common items to look for in monitor results, and suggests actions you may need to take if behavior is outside the normal range. The Statistic column lists the name you see in the standard monitor reports; if this statistic appears differently in the debug report, the debug statistic is also listed.

Remember that for any monitor statistic, a temporary anomaly may occur while the system changes state, such as when a new query is starting.

Table 5-11: Buffer cache monitor checklist

Statistic

Normal behavior

Behavior that needs adjusting

Recommended action

HR% (Cache hit rate)

Above 90%.

For individual internal data structures like garray, barray, bitmap (bm), hash object, sort object, variable-length btree (btreev), fixed-length btree (btreef), bit vector (bv), dbext, dbid, vdo, store, checkpoint block (ckpt), the hit rate should be above 90% while a query runs. It may be below 90% at first. Once prefetch starts working (PF or PrefetchReqs > 0), the hit rate should gradually grow to above 90%.

Hit rate below 90% after prefetch is working.

NoteSome objects do not do prefetching, so their hit rate may be low normally.

Try rebalancing the cache sizes of main versus temp by adjusting -iqmc and -iqtc.

Also try increasing the number of prefetch threads by adjusting PREFETCH_THREADS_PERCENT option.

Gdirty (Grabbed Dirty)

0 in a system with a modest cache size (< 10GB).

GDirty > 0

NoteSweeper threads are activated only when the number of dirty pages reaches a certain percentage of the wash area. If GDirty/GrabbedDirty is above 0 and the I/O rate (Writes) is low, the system may simply be lightly loaded, and no action is necessary.

Adjust SWEEPER_THREADS_PERCENT option (default 10%) or WASH_AREA_ BUFFERS_PERCENT option (default 20%) to increase the size of the wash area.

BWaits (Buffer Busy Waits)

0

Persistently > 0, indicating that multiple jobs are colliding over the same buffers.

If the I/O rate (Writes) is high, Busy Waits may be caused by cache thrashing. Check Hit Rate in the cache report to see if you need to rebalance main versus temp cache.

If a batch job is starting a number of nearly identical queries at the same time, try staggering the start times.

LRU Waits (LRUNum TimeOuts percentage in debug report)

20% or less

> 20%, which indicates a serious contention problem.

Check the operating system patch level and other environment settings. This problem tends to be an O.S. issue.

IOWait (IONumWaits)

10% or lower

> 10%

Check for disk errors or I/O retries

FLWait (FLMutexWaits)

20% or lower

> 20%

Check the dbspace configuration:

Is the database almost out of space?

Is DISK_STRIPING ON?

Does sp_iqcheckdb report fragmentation greater than 15%?

HTWait (BmapHTNumWaits)

MemWts (MemNtimesWaited)

(PFMgrCondVarWaits)

10% or lower

> 10%

Contact Sybase Technical Support.

CPU time (CPU Sys Seconds, CPU Total Seconds, in debug report)

CPU Sys Seconds < 20%

CPU Sys Seconds > 20%

If CPU Total Seconds also reports LOW utilization, and there are enough jobs that the system is busy, the cache may be thrashing or parallelism may be lost.

Adjust -iqgovern to reduce allowed total number of concurrent queries.

Check Hit Rate and I/O Rates in the cache report for cache thrashing. Also check if hash object is thrashing by looking at the hit rate of the has object in cache_by_type (or debug) report: is it <90% while the I/O rate (Writes) is high?

Check query plans for attempted parallelism. Were enough threads available?

Does the system have a very large number of CPUs? Strategies such as multiplex configuration may be necessary.

InUse% (Buffers in use)

At or near 100% except during startup

Less than about 100%

The buffer cache may be too large.

Try rebalancing the cache sizes of main versus temp by adjusting -iqmc and -iqtc.

Pin% (Pinned buffers)

< 90%

> 90 to 95%, indicating system is dangerously close to an Out of Buffers condition, which would cause transactions to roll back

Try rebalancing the cache sizes of main versus temp.

If rebalancing buffer cache sizes is not possible, try reducing -iqgovern to limit the number of jobs running concurrently.

Free threads (ThrNumFree)

Free > Resrvd

If the number of free threads drops to the reserved count, the system may be thread starved.

Try one of the following:

Increase the number of threads by setting -iqmt.

Reduce thread-related options: MAX_IQ_THREADS_ PER_CONNECTION, MAX_IQ_THREADS_ PER_TEAM.

Restrict query engine resource allocations by setting USER_RESOURCE_ RESERVATION.

Limit the number of jobs by setting -iqgovern.

FlOutOfSpace (debug only)

0, indicating that the free list for this store is not full; unallocated pages are available

1, indicating that this store (main or temporary) is fully allocated

Add more dbspace to that store