by-nc-sa     개발자, DBA가 함께 만들어가는 구루비 지식창고!

통계정보 조작




I. 통계정보 조작

I.1. 개념 설명

  • 통계정보를 조작하여 Cardinality를 제어하는 기법
  • Cardinality는 Selectivity에 의해 결정되고, Selectivity는 주로 Density에 의해 결정됨
  • 따라서, Density를 조작할 수 있으면 Cardinality를 조작할 수 있으며 Histogram 또한 마찬가지임

I.2. 테스트 Object 생성

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1(ID1 INT, STATUS1 CHAR(5));
CREATE TABLE T2(ID2 INT);

CREATE INDEX T1_IDX ON T1(ID1);
CREATE INDEX T2_IDX ON T2(ID2);

INSERT INTO T1
SELECT -- column id1 is uniform
       MOD(R, 1000),
       -- column status1 is skewed!!!
       CASE WHEN R BETWEEN 1 AND 9000 THEN '-1'
            ELSE TO_CHAR(MOD(R, 1000)) END
FROM   (SELECT LEVEL AS R
        FROM   DUAL
        CONNECT BY LEVEL <= 10000)
ORDER BY DBMS_RANDOM.RANDOM
;

INSERT INTO T2
SELECT MOD(R, 100)
FROM   (SELECT LEVEL AS R
        FROM   DUAL
        CONNECT BY LEVEL <= 10000)
;

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T1',
                                  CASCADE          => TRUE,
                                  ESTIMATE_PERCENT => 100,
                                  METHOD_OPT       => 'FOR ALL COLUMNS SIZE 1',
                                  NO_INVALIDATE    => FALSE
                                  );
END;
/

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T2',
                                  CASCADE          => TRUE,
                                  ESTIMATE_PERCENT => 100,
                                  METHOD_OPT       => 'FOR ALL COLUMNS SIZE 1',
                                  NO_INVALIDATE    => FALSE
                                  );
END;
/

I.3. 통계정보 확인


-- T1 테이블의 컬럼 통계정보
SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY LOW_VALUE HIGH_VALUE HISTOGRAM
T1 STATUS1 1001 0 .000999001 2D31202020 3939392020 NONE
T1 ID1 1001 0 .001 80 C20A64 NONE

I.4. 테스트

SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)          
----------          
     89100          

1 row selected.
Elapsed: 00:00:00.12

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
              
-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.10 |    1221 |
|   2 |   NESTED LOOPS      |        |      1 |    101 |    10   (0)|  89100 |00:00:00.18 |    1221 |
|*  3 |    TABLE ACCESS FULL| T1     |      1 |      1 |     9   (0)|    900 |00:00:00.01 |      31 |
|*  4 |    INDEX RANGE SCAN | T2_IDX |    900 |    100 |     1   (0)|  89100 |00:00:00.10 |    1190 |
-----------------------------------------------------------------------------------------------------
                                                                                                     
Predicate Information (identified by operation id):                                                  
---------------------------------------------------                                                  
                                                                                                     
   3 - filter(("T1"."ID1"<=100 AND "T1"."STATUS1"='-1' AND "T1"."ID1">=1))                           
   4 - access("T1"."ID1"="T2"."ID2")                                                                 
       filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                   
  • 'T1' 테이블에 -1 값이 9000로우가 있음에도 불구하고 통계정보에서는 이를 모르기 때문에 E-Rows를 1로 계산을 했다.
    만약 정확히 9000로우를 알고 있었다면 Hash Join을 사용했을 것이다.
    이를 해결하기 위해서는 히스토그램 생성과 통계정보 조작 2가지 방법이 있다.

I.5. 히스토그램 생성

1) 히스토그램 생성 전

SELECT TABLE_NAME,
       COLUMN_NAME,
       ENDPOINT_NUMBER,
       ENDPOINT_VALUE || '(' || ENDPOINT_ACTUAL_VALUE || ')' AS ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = UPPER('T1')
ORDER  BY COLUMN_NAME,
          ENDPOINT_NUMBER
;
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
T1 ID1 0 0()
T1 ID1 1 999()
T1 STATUS1 0 234649741948204000000000000000000000()
T1 STATUS1 1 297121544231514000000000000000000000()

2) 히스토그램 생성

BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(USER,
                                  'T1',
                                  ESTIMATE_PERCENT => 100,
                                  CASCADE          => TRUE,
                                  METHOD_OPT       => 'FOR COLUMNS STATUS1 SIZE SKEWONLY',
                                  NO_INVALIDATE    => FALSE);
END;
/

3) 히스토그램 생성 후

SELECT TABLE_NAME,
       COLUMN_NAME,
       ENDPOINT_NUMBER,
       ENDPOINT_VALUE || '(' || ENDPOINT_ACTUAL_VALUE || ')' AS ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = UPPER('T1')
ORDER  BY COLUMN_NAME,
          ENDPOINT_NUMBER
;
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
T1 ID1 0 0()
T1 ID1 1 999()
T1 STATUS1 228 234649741948204000000000000000000000()
T1 STATUS1 229 255400627676803000000000000000000000()
T1 STATUS1 230 255481361174405000000000000000000000()
T1 STATUS1 231 255562173900169000000000000000000000()
T1 STATUS1 232 260612890032291000000000000000000000()
T1 STATUS1 233 260693702758056000000000000000000000()
T1 STATUS1 234 260755025355842000000000000000000000()
T1 STATUS1 235 265825231615942000000000000000000000()
T1 STATUS1 236 265886554213728000000000000000000000()
T1 STATUS1 237 265967366939493000000000000000000000()
T1 STATUS1 238 271018083071615000000000000000000000()
T1 STATUS1 239 271098895797379000000000000000000000()
T1 STATUS1 240 276167834406879000000000000000000000()
T1 STATUS1 241 276230424655266000000000000000000000()
T1 STATUS1 242 276311158152868000000000000000000000()
T1 STATUS1 243 281361953513152000000000000000000000()
T1 STATUS1 244 281442687010754000000000000000000000()
T1 STATUS1 245 281504088836703000000000000000000000()
T1 STATUS1 246 286574215868640000000000000000000000()
T1 STATUS1 247 286653840171967000000000000000000000()
T1 STATUS1 248 286716351192191000000000000000000000()
T1 STATUS1 249 291786557452291000000000000000000000()
T1 STATUS1 250 291847880050077000000000000000000000()
T1 STATUS1 251 291928692775842000000000000000000000()
T1 STATUS1 252 296979408907964000000000000000000000()
T1 STATUS1 253 297060221633728000000000000000000000()
T1 STATUS1 254 297121544231514000000000000000000000()

4) 플랜 다시 확인

SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)           
----------           
     89100           

1 row selected.
Elapsed: 00:00:00.07

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
          
--------------------------------------------------------------------------------------------------- 
| Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | 
--------------------------------------------------------------------------------------------------- 
|   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.04 |      62 | 
|*  2 |   HASH JOIN         |      |      1 |  14779 |    19   (6)|  89100 |00:00:00.01 |      62 | 
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    908 |     9   (0)|    900 |00:00:00.01 |      31 | 
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9999 |     9   (0)|   9900 |00:00:00.01 |      31 | 
--------------------------------------------------------------------------------------------------- 
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   2 - access("T1"."ID1"="T2"."ID2")                                                                
   3 - filter(("T1"."ID1"<=100 AND "T1"."STATUS1"='-1' AND "T1"."ID1">=1))                          
   4 - filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                  

I.6. 통계정보 변경

1) Density 조작 전

SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY LOW_VALUE HIGH_VALUE HISTOGRAM
T1 STATUS1 1001 0 .000999001 2D31202020 3939392020 NONE
T1 ID1 1001 0 .001 80 C20A64 NONE

2) Density 조작

BEGIN
    DBMS_STATS.SET_COLUMN_STATS(OWNNAME => USER,                       
                                TABNAME => 'T1',                         
                                COLNAME => 'STATUS1',                      
                                DENSITY => 1
                                );
END;
/

3) Density 조작 후

SELECT S.TABLE_NAME,
       S.COLUMN_NAME,
       S.NUM_DISTINCT,
       S.NUM_NULLS,
       S.DENSITY,
       S.LOW_VALUE,
       S.HIGH_VALUE,
       S.HISTOGRAM
FROM   USER_TAB_COLS S
WHERE  S.TABLE_NAME = UPPER('T1')
;
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY LOW_VALUE HIGH_VALUE HISTOGRAM
T1 STATUS1 1001 0 1 2D31202020 3939392020 NONE
T1 ID1 1001 0 .001 80 C20A64 NONE

4) 플랜 다시 확인

SELECT /*+ GATHER_PLAN_STATISTICS */
       COUNT(*)
FROM   T1 T1,
       T2 T2
WHERE  T1.ID1 = T2.ID2
AND    T1.ID1 BETWEEN 1 
              AND     100
AND    T1.STATUS1 = '-1'
;

  COUNT(*)           
----------           
     89100           

1 row selected.
Elapsed: 00:00:00.07

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'IOSTATS COST LAST'));
          
---------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.04 |      62 |
|*  2 |   HASH JOIN         |      |      1 |  15410 |    19   (6)|  89100 |00:00:00.01 |      62 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   1011 |     9   (0)|    900 |00:00:00.01 |      31 |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9999 |     9   (0)|   9900 |00:00:00.01 |      31 |
---------------------------------------------------------------------------------------------------
                                                                                                   
Predicate Information (identified by operation id):                                                
---------------------------------------------------                                                
                                                                                                   
   2 - access("T1"."ID1"="T2"."ID2")                                                               
   3 - filter(("T1"."ID1"<=100 AND "T1"."ID1">=1 AND "T1"."STATUS1"='-1'))                         
   4 - filter(("T2"."ID2">=1 AND "T2"."ID2"<=100))                                                 
;

I.7. 통계정보 조작 시 주의할 점

  • 변화가 전역적이다. 통계 정보를 참조하는 Query가 여러 개 존재할 수 있으며,
    통계정보의 변경은 이들 Query 모두에 대해 영향을 미칠 수 있다.
  • 통계 정보에 대한 지식이 필수적이다.
  • Oracle 10g 부터는 통계 정보를 변경해도 그 결과가 Query에 즉시 반영되지 않는다.
    DBMS_STATS.GATHER_XXX_STTS Procedure의 NO_INVALIDATE Parameter 때문이다.
    Oracle 9i까지는 FALSE, 즉 통계 정보 변경시 Query가 모두 Invalidate 된다.
    하지만 Oracle 10g 부터는 AUTO로 변경되었다. 이 경우 Oracle은 통계 정보 변경 후
    5시간(_OPTIMIZER_INVALIDATION_PERIOD Parameter)에 걸쳐 서서히 Query들이 Invalidation 되게끔 제어한다.
    만일 즉시 반영을 원한다면 NO_INVALIDATE Parameter의 값을 FALSE로 변경해야 한다.
    (급격한 hard parsing의 가능성)

문서정보

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.