Tuning Notes: Reorg / Update Statistics / Data Skew

1. Reorg is stuck and cannot be killed; system is hung
2. Update statistics takes a long time; system is very slow
3. Data skew
4. Parallelism benefits
5. Pros and cons of APL versus DOL
============================================================

Reorg is stuck and cannot be killed; system is hung

Version: Adaptive Server Enterprise/15.0.3/EBF 17775 ESD#4/P/RS6000/AIX 5.3/ase1503/2768/64-bit/FBO

We identified the following:
   
   
1. Ran reorg rebuild on one table; it usually completes in 4 - 5 minutes, but this time the process went into a sleeping state.
2. After a few hours the log segment became full and the database went into log-suspend mode.
3. The reorg process appeared in syslogshold and was sleeping, so kill had no effect on the reorg rebuild.
4. Added more space to the log segment, assuming the process would roll back, but it did not.
5. Finally shut the server down and restarted it.


Errorlog excerpts:

01:00000:00235:2012/10/17 13:05:51.26 server Error: 632, Severity: 20, State: 2
server Attempt to move memory with an incorrect length of 25600. Maximum allowed length is 16384.


05:00000:00016:2012/12/17 13:02:40.33 server Undo pass for database 'xxxxx': 30489719 records done (64%); 16417545 records left.
05:00000:00016:2012/12/17 13:02:41.48 server Undo pass for database 'xxxb': 32835082 records done (69%); 14072182 records left.
01:00000:00235:2012/12/17 13:03:51.22 server DBCC TRACEON 3604, SPID 235
01:00000:00016:2012/12/17 13:04:34.19 server Undo pass for database 'xxxxb': 35180445 records done (74%); 11726819 records left.
01:00000:00235:2012/12/17 13:05:51.26 server Error: 632, Severity: 20, State: 2
01:00000:00235:2012/12/17 13:05:51.26 server Attempt to move memory with an incorrect length of -1. Maximum allowed length is 16384.


1> dbcc showrecovery(XXXXXXb)
2> go

RECOVERY STATISTICS FOR DATABASE 6

General:
Spid: 16
Command: Boot/Failover
Type: Boot Time
Pass: Undo
------------
-----------------
Recovery Log Markers:
Checkpoint Log Record: (59695797, 3)
First Log Record: (58807387, 10)
Last Log Record: (59754796, 22)

% of requests for log pages found in cache: 100.0
% of requests for log pages physically read: 0.00
Msg 632, Level 20, State 2:
Server 'XXXXXX_PROD', Line 1:
Attempt to move memory with an incorrect length of -1. Maximum allowed length is 16384.


What we did (see the sketch after this list):

1) Turned off statement caching
2) Ran dbcc indexalloc
3) Found an error, so we dropped and recreated the index
4) Ran reorg on the table
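
A hedged sketch of the corresponding commands, assuming a user table mytab with a nonclustered index mytab_idx on col1 (all names, and the index id of 2, are hypothetical):

1> sp_configure 'statement cache size', 0        -- 0 disables statement caching
2> go
1> dbcc traceon(3604)                            -- route dbcc output to the session
2> go
1> dbcc indexalloc(mytab, 2, full)               -- check allocation for index id 2
2> go
1> drop index mytab.mytab_idx                    -- drop and recreate the bad index
2> go
1> create index mytab_idx on mytab(col1)
2> go
1> reorg rebuild mytab                           -- rerun the reorg
2> go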




======================================================================

Update statistics takes a long time -- system very slow

Situation:

Three times a day the apps team deletes 10 - 15 million rows, bcps in 10 - 15 million rows, and then runs update statistics. The table has 170+ million rows.
The table has a unique clustered index on (col1, col2, col3, col4); col1 has far more duplicate values than col2, col3, and col4. A sketch of the cycle follows.
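
A sketch of one cycle; the table, file, and server names and the delete predicate are all hypothetical:

1> delete mytab where batch_date < dateadd(dd, -1, getdate())
2> go

unix% bcp mydb..mytab in /data/mytab_new.bcp -Usa -SPROD_ASE -c

1> update statistics mytab
2> go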


Observed: 1) update statistics was in a sleeping state; 2) sp_sysmon output showed 90% CPU utilization.

What we did:

1) Refreshed the prod data to a test server
2) Used a large-I/O cache
3) Used 16K I/O
4) Increased the procedure cache
5) Increased "number of sort buffers"
6) Ran update statistics
7) It completed quickly
8) Dumped the statistics from the test server
9) FTPed the test server statistics to prod and loaded them as below:

optdiag binary statistics <dbname>..<tablename> -U<user> -P<password> -o<filename>

----> ftp the file to the production host

optdiag binary statistics <dbname>..<tablename> -U<user> -P<password> -i<filename>

======================================================================================================

Suggestion #1: Implement a large-I/O pool. The size of the default data cache also affects update statistics performance;
you can gain performance by setting up a large-I/O pool, for example a 16K pool.
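
For example, a sketch that carves a 16K pool out of the default data cache (the 100M size is an assumption; size it for your workload):

1> sp_poolconfig 'default data cache', '100M', '16K'
2> go
1> sp_cacheconfig 'default data cache'           -- verify the pool layout
2> go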


Suggestion #2: Ensure that you have enough procedure cache to supply what update statistics wants AND what the rest of the system wants.
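
A sketch; the value is in 2K pages and is only an example, so monitor before settling on a number:

1> sp_configure 'procedure cache size', 100000   -- 100000 x 2K pages is roughly 195 MB
2> go
1> sp_monitorconfig 'procedure cache size'       -- check actual usage over time
2> go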


Suggestion #3: Increase the value of "number of sort buffers" as far as your memory resources allow. Even though it consumes procedure cache, if you have enough for everything, the more the better: configuring more sort buffers increases the batch size and makes update statistics run faster.
(NOTE: sort buffers use 2K I/O, so make sure your 2K pool is adequately sized.)
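
A sketch (the value is an assumption; the ASE default is 500):

1> sp_configure 'number of sort buffers', 5000
2> go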

A substantial amount of the time needed to run update statistics is spent on sorting, so ASE allows these sorts to be performed in parallel. To enable this, configure "max parallel degree" and "number of worker processes". Note that parallel sorting works even if there is only one engine.
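
A sketch of the two parameters (both values are assumptions to be sized for your system):

1> sp_configure 'number of worker processes', 20 -- server-wide pool of worker processes
2> go
1> sp_configure 'max parallel degree', 4         -- max worker processes per query/sort
2> go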


Suggestion #4: Enable parallelism so that parallel sorts can take place for the update statistics command:

update statistics <table_name> with consumers = <number_of_worker_processes>
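
For example (table name and degree are hypothetical):

1> update statistics mytab with consumers = 4
2> go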


Suggestion #5: One other factor impacts update statistics: the "enable housekeeper GC" configuration parameter. The more aggressive the housekeeper task is allowed to be, the more periodic statistics maintenance it can perform during idle cycles. If your system can handle it, consider setting this parameter to 4 or 5 instead of 1.
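
A sketch:

1> sp_configure 'enable housekeeper GC', 5
2> go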


Suggestion #6: The optdiag utility displays optimizer statistics or loads updated statistics into system tables. One common use is to migrate statistics: run update statistics on a test server, run optdiag to dump the stats to a file, then run optdiag again to load the new stats into the production server.


To load the statistics from the source database into the target database, collect the optdiag output as:

optdiag binary statistics <dbname>..<tablename> -U<user> -P<password> -o<filename>

ftp this file to the target host, edit the server name and database name in the file if they differ, and then load it using the -i option:

optdiag binary statistics <dbname>..<tablename> -U<user> -P<password> -i<filename>

=========================================================================
Data Skew:

Example:
In the table test_skew, col1 has 2 distinct values (1 and 2) while col2 has 199 distinct values (1 through 199).
col1 has data skew because of its high number of duplicates. A build sketch follows.
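
A sketch of how such a table could be built; the exact population below is an assumption chosen to be consistent with the row count (398) and the histograms shown later:

1> create table test_skew (col1 int, col2 int)
2> go
1> declare @i int
2> select @i = 1
3> while @i <= 199
4> begin
5>     insert test_skew values (1, @i)   -- every col2 value appears once with col1 = 1
6>     insert test_skew values (2, @i)   -- and once with col1 = 2
7>     select @i = @i + 1
8> end
9> go
1> create index idx1 on test_skew(col1)
2> go
1> create index idx2 on test_skew(col2)
2> go
1> update statistics test_skew
2> go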
1> select count(distinct(col1)), count(distinct(col2)) from test_skew
2> go
 ----------- -----------
           2         199

(1 row affected)
We can see this in the optdiag output. "Total density" is, roughly, the chance that two rows chosen at random share the same value: for col1 that is 0.5*0.5 + 0.5*0.5 = 0.5, while for col2 it is 199 * (1/199 * 1/199) = 1/199, about 0.005. See the notes on the "Total density" values embedded in the output below:

OptDiag/12.5.4/EBF 13381/P/Sun_svr4/OS 5.8/ase1254/2006/64-bit/OPT/Sat May 20 02:14:49 2006
Adaptive Server Enterprise/12.5.4/EBF 15400 ESD#7.1/P/Sun_svr4/OS 5.8/ase1254/2097/64-bit/FBO/Thu Jan 17 07:42:59 2008
Server name: "ASE_1254"
Specified database: "tempdb"
Specified table owner: not specified
Specified table: "test_skew"
Specified column: not specified
Table owner: "dbo"
Table name: "test_skew"
Statistics for table: "test_skew"
 Data page count: 3
 Empty data page count: 0
 Data row count: 398.0000000000000000
 Forwarded row count: 0.0000000000000000
 Deleted row count: 0.0000000000000000
 Data page CR count: 0.0000000000000000
 OAM + allocation page count: 2
 First extent data pages: 0
 Data row size: 10.0000000000000000
 Derived statistics:
 Data page cluster ratio: 0.9999900000000000
 Space utilization: 0.6580687830687830
 Large I/O efficiency: 1.0000000000000000
Statistics for index: "idx1" (nonclustered)
Index column list: "col1"
 Leaf count: 3
 Empty leaf page count: 0
 Data page CR count: 1.0000000000000000
 Index page CR count: 1.0000000000000000
 Data row CR count: 3.0000000000000000
 First extent leaf pages: 0
 Leaf row size: 11.0000000000000000
 Index height: 1
 Derived statistics:
 Data page cluster ratio: 1.0000000000000000
 Index page cluster ratio: 1.0000000000000000
 Data row cluster ratio: 1.0000000000000000
 Space utilization: 0.7238756613756614
 Large I/O efficiency: 1.0000000000000000
Statistics for index: "idx2" (nonclustered)
Index column list: "col2"
 Leaf count: 3
 Empty leaf page count: 0
 Data page CR count: 1.0000000000000000
 Index page CR count: 1.0000000000000000
 Data row CR count: 398.0000000000000000
 First extent leaf pages: 0
 Leaf row size: 11.0000000000000000
 Index height: 1
 Derived statistics:
 Data page cluster ratio: 1.0000000000000000
 Index page cluster ratio: 1.0000000000000000
 Data row cluster ratio: 0.0000000000000000
 Space utilization: 0.7238756613756614
 Large I/O efficiency: 1.0000000000000000
Statistics for column: "col1"
Last update of column statistics: Jun 11 2009 9:46:40:950AM
 Range cell density: 0.0000000000000000 <--- when the difference between "Range cell density" and "Total density" exceeds 0.1, the data is skewed
 Total density: 0.5000000000000000 <--- this indicates data skew
 Range selectivity: default used (0.33)
 In between selectivity: default used (0.25)
Histogram for column: "col1"
Column datatype: integer
Requested step count: 20
Actual step count: 3
Sampling Percent: 0
 Step Weight Value
 1 0.00000000 < 1
 2 0.50000000 = 1
 3 0.50000000 <= 2
Statistics for column: "col2"
Last update of column statistics: Jun 11 2009 9:46:55:953AM
 Range cell density: 0.0050251256281407 <---- these values are equal,
 Total density: 0.0050251256281407 <---- so there is no data skew: the data is evenly distributed
 Range selectivity: default used (0.33)
 In between selectivity: default used (0.25)
Histogram for column: "col2"
Column datatype: integer
Requested step count: 20
Actual step count: 20
Sampling Percent: 0
 Step Weight Value
 1 0.00000000 <= 0
 2 0.05025126 <= 10
 3 0.05025126 <= 20
 4 0.05025126 <= 30
 5 0.05025126 <= 40
 6 0.05025126 <= 50
 7 0.05025126 <= 60
 8 0.05025126 <= 70
 9 0.05025126 <= 80
 10 0.05025126 <= 90
 11 0.05025126 <= 100
 12 0.05025126 <= 110
 13 0.05025126 <= 120
 14 0.05025126 <= 130
 15 0.05025126 <= 140
 16 0.05025126 <= 150
 17 0.05025126 <= 160
 18 0.05025126 <= 170
 19 0.05025126 <= 180
 20 0.09547739 <= 199
Optdiag succeeded.

Use the following method to remove the skew from the stored density:
-----------------------------
1> sp_modifystats test_skew, col1, REMOVE_SKEW_FROM_DENSITY
2> go
(return status = 0)

unix% optdiag statistics tempdb..test_skew -o /tmp/test_skew_remove_skew.opt -Usa -Psybase2
unix% more /tmp/test_skew_remove_skew.opt
OptDiag/12.5.4/EBF 13381/P/Sun_svr4/OS 5.8/ase1254/2006/64-bit/OPT/Sat May 20 02:14:49 2006
Adaptive Server Enterprise/12.5.4/EBF 15400 ESD#7.1/P/Sun_svr4/OS 5.8/ase1254/2097/64-bit/FBO/Thu Jan 17 07:42:59 2008
Server name: "ASE_1254"
Specified database: "tempdb"
Specified table owner: not specified
Specified table: "test_skew"
Specified column: not specified
Table owner: "dbo"
Table name: "test_skew"
Statistics for table: "test_skew"
 Data page count: 3
 Empty data page count: 0
 Data row count: 398.0000000000000000
 Forwarded row count: 0.0000000000000000
 Deleted row count: 0.0000000000000000
 Data page CR count: 0.0000000000000000
 OAM + allocation page count: 2
 First extent data pages: 0
 Data row size: 10.0000000000000000
 Derived statistics:
 Data page cluster ratio: 0.9999900000000000
 Space utilization: 0.6580687830687830
 Large I/O efficiency: 1.0000000000000000
Statistics for index: "idx1" (nonclustered)
Index column list: "col1"
 Leaf count: 3
 Empty leaf page count: 0
 Data page CR count: 1.0000000000000000
 Index page CR count: 1.0000000000000000
 Data row CR count: 3.0000000000000000
 First extent leaf pages: 0
 Leaf row size: 11.0000000000000000
 Index height: 1
 Derived statistics:
 Data page cluster ratio: 1.0000000000000000
 Index page cluster ratio: 1.0000000000000000
 Data row cluster ratio: 1.0000000000000000
 Space utilization: 0.7238756613756614
 Large I/O efficiency: 1.0000000000000000
Statistics for index: "idx2" (nonclustered)
Index column list: "col2"
 Leaf count: 3
 Empty leaf page count: 0
 Data page CR count: 1.0000000000000000
 Index page CR count: 1.0000000000000000
 Data row CR count: 398.0000000000000000
 First extent leaf pages: 0
 Leaf row size: 11.0000000000000000
 Index height: 1
 Derived statistics:
 Data page cluster ratio: 1.0000000000000000
 Index page cluster ratio: 1.0000000000000000
 Data row cluster ratio: 0.0000000000000000
 Space utilization: 0.7238756613756614
 Large I/O efficiency: 1.0000000000000000
Statistics for column: "col1"
Last update of column statistics: Jun 11 2009 9:46:40:950AM
Statistics loaded from Optdiag.
 Range cell density: 0.0000000000000000 <--- after issuing sp_modifystats test_skew, col1, REMOVE_SKEW_FROM_DENSITY,
 Total density: 0.0000000000000000 <--- "Total density" has been changed to equal "Range cell density"
 Range selectivity: default used (0.33)
In between selectivity: default used (0.25)
Histogram for column: "col1"
Column datatype: integer
Requested step count: 20
Actual step count: 3
Sampling Percent: 0
 Step Weight Value
 1 0.00000000 < 1
 2 0.50000000 = 1
 3 0.50000000 <= 2
Statistics for column: "col2"
Last update of column statistics: Jun 11 2009 9:46:55:953AM
Statistics loaded from Optdiag.
 Range cell density: 0.0050251256281407
 Total density: 0.0050251256281407
 Range selectivity: default used (0.33)
 In between selectivity: default used (0.25)
Histogram for column: "col2"
Column datatype: integer
Requested step count: 20
Actual step count: 20
Sampling Percent: 0
 Step Weight Value
 1 0.00000000 <= 0
 2 0.05025126 <= 10
 3 0.05025126 <= 20
 4 0.05025126 <= 30
 5 0.05025126 <= 40
 6 0.05025126 <= 50
 7 0.05025126 <= 60
 8 0.05025126 <= 70
 9 0.05025126 <= 80
 10 0.05025126 <= 90
 11 0.05025126 <= 100
 12 0.05025126 <= 110
 13 0.05025126 <= 120
 14 0.05025126 <= 130
 15 0.05025126 <= 140
 16 0.05025126 <= 150
 17 0.05025126 <= 160
 18 0.05025126 <= 170
 19 0.05025126 <= 180
 20 0.09547739 <= 199

References:
http://en.it-usenet.org/thread/19967/18585/
http://benohead.com/sybase-find-and-delete-duplicates-records/

===============================================================

Parallelism benefits:


Select statements benefit when they:

1) read a large number of pages
2) require sorting (order by, group by, distinct)
3) perform large joins

Maintenance utilities that benefit:

dbcc checkstorage
create index
update statistics
reorg

A parallel scan is considered only for tables larger than 20 pages.
Parallelism optimizes for the lowest response time, not the lowest total work.

Parallel scan phases

Access phase
  1. Worker processes execute the scan in parallel
  2. Each process fills its own result bucket

Merge phase
  1. The buckets are merged into one result set
  2. The results are sent to the user
 =================================================================


Pros and cons of APL versus DOL

Allpages Locking: Pros and Cons

----------------------------------------------------

*Advantages

- relatively simple locking scheme

- only one update transaction on a page at a time

- write locks held for the duration of the user's transaction

- few locks taken even if many rows modified

- result of page-level versus row-level locking

- simplifies recovery; only one transaction can modify a page at a time

*Disadvantages

- index and data page locks decrease concurrency

- index and data page splits mean more page locks

- index and heap table hot spots

- deadlock potential (most significant disadvantage)



Datapages Locking: Pros and Cons

--------------------------------------------------------

*Advantages

- no transaction-duration locks on index pages

- the single biggest improvement in data-only locking

- few locks taken even if many rows modified

- no more index hot spots

*Disadvantages

- page locks decrease concurrency

- may still see some data page hot spots, especially in heap tables, but there is much less contention than before



Datarows Locking: Pros and Cons

------------------------------------------------------

*Advantages

- no transaction-duration locks taken on index pages or rows

- row lock granularity gives greatest concurrency in most cases

- no more index or data page hot spots

*Disadvantages

- row locks cause a larger number of locks to be taken:

- mitigated by the fact that index locks are not taken



Data Only Locking Space Management Issues

--------------------------------------------------------------------------

- greater space overhead because

- page header size increased

- minimum row size increased

- table size may grow because

- pages can become fragmented with chunks of non-contiguous free space

- rows may be deleted but space on the page is left unused


- clustered indexes may become unclustered over time
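
For reference, a hedged sketch of how a table's locking scheme is chosen or changed (table name is hypothetical; note that converting a large table from allpages to data-only locking copies the data and can take a long time):

1> create table orders (id int not null, qty int null) lock datarows
2> go
1> alter table orders lock datarows              -- convert an existing table
2> go
1> sp_help orders                                -- confirm the current scheme
2> go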