Checking column usage information using dbms_stats

Oracle 11gR2 DBMS_STATS introduced a useful function, REPORT_COL_USAGE, to report column usage, i.e. whether a column is being used in equality, LIKE, or other predicates. I came to know about it while reading the Optimizer whitepaper. As per the document, DBMS_STATS.REPORT_COL_USAGE reports column usage information and records all the SQL operations the database has processed for a given object.

This information is used by the Oracle gather stats job to decide whether to collect histograms and how many buckets to use. So if we specify method_opt => 'FOR ALL COLUMNS SIZE AUTO', the stats job will make use of this information. A column is a candidate for a histogram if it has been seen in a WHERE clause predicate, e.g. an equality, range, LIKE, etc. Column usage tracking is enabled by default.
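As a minimal sketch of how this ties into stats gathering (the schema and table names below are just the demo ones), an explicit call letting Oracle decide histograms from the recorded usage would look like:

-- Let Oracle decide histogram creation and bucket counts based on recorded
-- column usage; 'FOR ALL COLUMNS SIZE AUTO' is also the default method_opt
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'AMIT',
    tabname    => 'EMP',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/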

Apart from this, I see two more benefits:

a) It can be used to see if all indexed columns are being used. If not, we can get rid of any extra indexes.

b) In case you are using method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' in your stats collection script, it can reveal additional columns that are being used in WHERE clauses. This is really important because this method_opt option does not capture statistics on non-indexed columns, which can result in wrong cardinality estimates and the choice of wrong join methods. You can read a detailed explanation in Greg Rahn's post.
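Both checks boil down to looking at what usage has been recorded. If you prefer the raw counters over the formatted report, they live in the dictionary; here is a rough sketch (run as a privileged user, joins based on the commonly documented layout of sys.col_usage$, owner filtering omitted for brevity):

-- Flush recently collected usage to the dictionary, then look at the raw counters
exec dbms_stats.flush_database_monitoring_info;

select c.name column_name,
       u.equality_preds, u.equijoin_preds, u.nonequijoin_preds,
       u.range_preds, u.like_preds, u.null_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where o.obj#    = u.obj#
   and c.obj#    = u.obj#
   and c.intcol# = u.intcol#
   and o.name    = 'EMP';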

To see this in action I am using the EMP, DEPT and SALGRADE tables. You can find the scripts for creating these tables on Scribd.

Since these are newly created tables, there are no column usage stats yet. Running the function confirms the same:

SELECT DBMS_STATS.REPORT_COL_USAGE('AMIT','EMP') FROM DUAL;

LEGEND:
.......
EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
................................
###############################################################################

I ran the following set of queries once:

select * from emp where empno=7782;
select ename,job,dname from emp,dept where emp.deptno=dept.deptno;
select e.ename,e.sal,s.grade from emp e,salgrade s where e.sal between s.losal and s.hisal;

Querying the report again now gives the following output:

###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
................................

1. DEPTNO : EQ_JOIN
2. EMPNO  : EQ
3. SAL    : NONEQ_JOIN
###############################################################################

Next I ran two queries with LIKE and range predicates:

select * from emp where ename like 'A%';
select * from emp where sal > 3000;

You can see in the report below that ENAME now shows a LIKE predicate and that a range comparison has been recorded for SAL.

###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
................................

1. DEPTNO : EQ_JOIN
2. EMPNO  : EQ
3. ENAME  : LIKE
4. SAL    : RANGE NONEQ_JOIN
###############################################################################
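If you then gather stats with SIZE AUTO, you can check which columns actually ended up with histograms; a quick sketch (on a 14-row EMP most columns may well stay at NONE, since data skew also plays a part):

select column_name, num_distinct, histogram, num_buckets
  from dba_tab_col_statistics
 where owner = 'AMIT'
   and table_name = 'EMP';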

You can reset the column usage information using the following procedure:

exec dbms_stats.reset_col_usage('AMIT','EMP')

Note that this deletes the recorded column usage information from the dictionary, which can impact the stats job, so be careful when deleting it. There are two more procedures which can help seed column usage information from another database.

SEED_COL_USAGE – This procedure iterates over the SQL statements in the specified SQL tuning set, compiles them and seeds column usage information for the columns that appear in these statements.

Syntax
DBMS_STATS.SEED_COL_USAGE (sqlset_name IN VARCHAR2, owner_name IN VARCHAR2, time_limit IN POSITIVE DEFAULT NULL);
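A minimal usage sketch (the SQL tuning set name below is just a placeholder):

-- Compile the statements in a hypothetical tuning set MY_STS owned by AMIT
-- and record column usage, with a 10 minute time limit
exec dbms_stats.seed_col_usage(sqlset_name => 'MY_STS', owner_name => 'AMIT', time_limit => 600);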

MERGE_COL_USAGE – This procedure merges column usage information from a source database, over a dblink, into the local database. If column usage information already exists for a given table or column, MERGE_COL_USAGE will combine the local and the remote information.

Syntax
DBMS_STATS.MERGE_COL_USAGE (dblink IN VARCHAR2);
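And a corresponding sketch (the database link name is hypothetical):

-- Merge column usage recorded on a remote database (e.g. production) over a dblink
exec dbms_stats.merge_col_usage(dblink => 'PROD_LINK');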

Installing Adobe Flash Player using Yum on Linux

I have been using Firefox on a Linux desktop since last week. A lot of sites on the internet use Flash, so every time I hit such a site it reported a missing plugin. When I tried to install the plugin through Firefox, it errored out and asked me to install it manually. So today I finally decided to take the manual approach and went to http://www.adobe.com/go/getflashplayer to download the required binaries.

Surprisingly, today I noticed a yum-based install option for it. I decided to go with yum as it takes care of resolving dependencies. Selecting the yum-based RPM downloaded adobe-release-x86_64-1.0-1.noarch.rpm, which needs to be installed first:

 rpm -ivh adobe-release-x86_64-1.0-1.noarch.rpm
warning: adobe-release-x86_64-1.0-1.noarch.rpm: Header V3 DSA signature: NOKEY, key ID f6777c67
Preparing...                ########################################### [100%]
   1:adobe-release-x86_64   ########################################### [100%]

The next step is to use yum to install the Flash plugin. Before that, I searched for it with yum search flash-plugin:

flash-plugin.i386 : Adobe Flash Player 10.3 i386
flash-plugin.x86_64 : Adobe Flash Player 11.2
flash-plugin.i386 : Adobe Flash Player 10.2 i386

Finally, I installed it using yum install flash-plugin and restarted Firefox:

Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package flash-plugin.x86_64 0:11.2.202.233-release set to be updated
---> Package flash-plugin.i386 0:10.3.183.19-1.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================
 Package                 Arch       Version          Repository        Size
=============================================================================
Installing:
 flash-plugin            x86_64     11.2.202.233-release  adobe-linux-x86_64  6.9 M
 flash-plugin            i386       10.3.183.19-1.el5  updates           4.9 M

Transaction Summary
=============================================================================
Install      2 Package(s)
Update       0 Package(s)
Remove       0 Package(s)         

Total download size: 12 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): flash-plugin-10.3.183.19-1.el5.i386.rpm                                                                | 4.9 MB     00:07
(2/2): flash-plugin-11.2.202.233-release.x86_64.rpm                                                           | 6.9 MB     00:48
warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID f6777c67
Importing GPG key 0xF6777C67 "Adobe Systems Incorporated (Linux RPM Signing Key) " from /etc/pki/rpm-gpg/RPM-GPG-KEY-adobe-linux
Is this ok [y/N]: y
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing: flash-plugin                 ######################### [1/2]
  Installing: flash-plugin                 ######################### [2/2] 

Installed: flash-plugin.x86_64 0:11.2.202.233-release flash-plugin.i386 0:10.3.183.19-1.el5
Complete!

Using current_scn for RMAN incremental? Think again...

I am blogging about an experience we had using RMAN incremental backups to sync a standby database.

We had a standby database which lagged behind by a considerable amount of time, and we decided to use an RMAN incremental backup to make it current. We have done this in the past using the normal procedure: take CURRENT_SCN from V$DATABASE on the standby and run an RMAN incremental backup from that SCN on the primary database.

When we did the recovery it finished successfully, but on starting MRP the standby still asked for an old archive log. We tried manual recovery and it still failed, asking for the same old archive. This was strange, so I ran the following queries to find out the SCN.

SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
 -----------
 10641805448174

Checking the minimum CHECKPOINT_CHANGE# in V$DATAFILE reported the following SCN, which was less than CURRENT_SCN.

SQL> select min(checkpoint_change#) from V$datafile;
 MIN(CHECKPOINT_CHANGE#)
 ------------------------------
 10641804610367

At this point I decided to check FIRST_CHANGE# for the archive log required by the standby.

select first_change# from GV$ARCHIVED_LOG where THREAD#=2 and SEQUENCE#=699;
 FIRST_CHANGE#
 ------------------------------
 10618920961649

This value was much lower and differed from the initial CURRENT_SCN which we had used for the RMAN incremental backup. Searching My Oracle Support, I found note 836986.1, which recommended running the following query.

SQL> select min(fhscn) from x$kcvfh;
MIN(FHSCN)
 ----------------
 10618920961649

This matches the SCN from V$ARCHIVED_LOG. The note recommends taking the lower of the values from X$KCVFH and V$DATABASE.
So we took the backup again from the primary using this SCN. Since this is a RAC database, we allocated channels on both instances:

run
 {
 allocate channel c1 device type disk format '/VOL301/rmanbackup/stby_PROD_%U' connect 'sys/password@PROD1';
 allocate channel c2 device type disk format '/VOL301/rmanbackup/stby_PROD_%U' connect 'sys/password@PROD1';
 allocate channel c3 device type disk format '/VOL301/rmanbackup/stby_PROD_%U' connect 'sys/password@PROD2';
 allocate channel c4 device type disk format '/VOL301/rmanbackup/stby_PROD_%U' connect 'sys/password@PROD2';
 backup incremental from scn 10618920961649 database format '/VOL301/rmanbackup/stby_PROD_%U' tag 'Tag_INCR_PROD';
 }

We copied the backups over, restored the controlfile, and cataloged the backup pieces on the standby. Running the recovery now did the trick:

run
 {
 allocate channel c1 device type disk ;
 allocate channel c2 device type disk ;
 allocate channel c3 device type disk ;
 allocate channel c4 device type disk ;
 recover database noredo;
 }

I think X$KCVFH should also be mentioned in the documentation, as it would help people check both SCNs before proceeding. Anyway, this seems to be a special case, as we have used this RMAN incremental approach many times and hit this issue for the first time.
If anyone has a clue about what caused this, I would be really interested to know.
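For anyone hitting the same situation, a quick sanity check before taking the incremental backup is to compare all three SCN sources on the standby and use the lowest; a sketch (run as SYS since it reads X$KCVFH, and FHSCN is exposed as a string in the versions I have seen, hence the TO_NUMBER):

-- Lowest of the three SCNs, to feed into "backup incremental from scn <n> database"
select least(
         (select current_scn from v$database),
         (select min(checkpoint_change#) from v$datafile),
         (select min(to_number(fhscn)) from x$kcvfh)
       ) as incr_start_scn
  from dual;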

Poll on Sql Plan Management

Dominic is conducting a poll on SPM and SQL Profile usage on his website. The link can be found here.

I have been using SPM and SQL Profiles (via coe_xfr_sql_profile.sql) to fix plans for queries, and believe me, it is a very easy and quick way of fixing problems in a production database. I have not yet used SPM to baseline everything; I only use it on a need basis to fix SQL with bad plans. I have also been reading about the Automatic SQL Tuning Advisor in 11g and plan to try it out and see the recommendations it proposes (keeping accept_sql_profiles set to false 🙂 ).

Oracle Direct connector for HDFS

Some time back, Oracle announced the Oracle Big Data Appliance, and along with it released Big Data Connectors to facilitate data access between data stored in a Hadoop cluster and Oracle Database. I tried installing Oracle Direct Connector for HDFS (Hadoop Distributed File System) and am documenting the steps below.

To use Oracle Direct Connector we need to install the Hadoop software on the system where the Oracle database is running. HDFS is accessed through the preprocessor feature of external tables (available in 11g).
You can download the software from edelivery.oracle.com and unzip it to a directory. This provides a file called hdfs_stream, which will be used in the preprocessor clause.

To get started, we need to create directories for the external table file location and for the Oracle Direct Connector bin directory:

SQL> CREATE OR REPLACE DIRECTORY hadoop_test_dir as '/home/oracle/ext_table';
SQL> CREATE OR REPLACE DIRECTORY hdfs_bin_path AS '/home/oracle/product/orahdfs-1.0.0.0.0/bin';

I will be using the SCOTT user, which needs the following permissions:

GRANT CREATE SESSION TO scott;
 GRANT EXECUTE ON SYS.UTL_FILE TO scott;
 GRANT READ, WRITE on DIRECTORY hadoop_test_dir TO scott;
 GRANT EXECUTE ON DIRECTORY hdfs_bin_path TO scott;

Create the external table

drop TABLE "SCOTT"."TEST_DATA_EXT_TAB" ;
CREATE TABLE "SCOTT"."TEST_DATA_EXT_TAB"
 (OBJECT_ID NUMBER,
 OBJECT_NAME VARCHAR2(128),
 OBJECT_TYPE VARCHAR2(19)
 )
 ORGANIZATION EXTERNAL
 ( TYPE ORACLE_LOADER
 DEFAULT DIRECTORY HADOOP_TEST_DIR
 ACCESS PARAMETERS
 ( RECORDS DELIMITED BY NEWLINE
 PREPROCESSOR HDFS_BIN_PATH:'hdfs_stream'
 FIELDS TERMINATED BY ','
 )
 LOCATION ( 'test_data1.txt','test_data2.txt','test_data3.txt','test_data4.txt')
 );

Selecting from the table gives an error, as no location files are present yet:

select count(*) from "SCOTT"."TEST_DATA_EXT_TAB"
 *
 ERROR at line 1:
 ORA-29913: error in executing ODCIEXTTABLEOPEN callout
 ORA-29400: data cartridge error
 KUP-04040: file test_data1.txt in HADOOP_TEST_DIR not found

Next, create a configuration file with the location of the files on HDFS and the database connection details. This will be used to populate the dummy external table location files:

[oracle@oradbdev01]~/product/orahdfs-1.0.0.0.0/log% cat ../myconf.xml
<?xml version="1.0"?>
 <configuration>
 <property>
 <name>oracle.hadoop.hdfs.exttab.tableName</name>
 <value>SCOTT.TEST_DATA_EXT_TAB</value>
 </property>
 <property>
 <name>oracle.hadoop.hdfs.exttab.datasetPaths</name>
 <value>/user/oracle/TEST_DATA/part*</value>
 </property>
 <property>
 <name>oracle.hadoop.hdfs.exttab.connection.url</name>
 <value>
 jdbc:oracle:thin:@oradbdev01:1521:tintin1
 </value>
 </property>
 <property>
 <name>oracle.hadoop.hdfs.exttab.connection.user</name>
 <value>SCOTT</value>
 </property>
</configuration>

Set the Hadoop classpath (HADOOP_CLASSPATH in hadoop-env.sh) to include the Oracle JDBC driver, $ORACLE_HOME/jdbc/lib/ojdbc6.jar. In my environment the classpath export looked like this:

export CLASSPATH=$CLASSPATH:${OLH_HOME}/jlib/oraloader.jar:${OLH_HOME}/jlib/ojdbc6.jar:${OLH_HOME}/jlib/oraclepki.jar

Run the ExternalTable command as shown below to create the location files for the external table. It will prompt for the password of the database user mentioned in the conf file:

[oracle@oradbdev01]~/product/orahdfs-1.0.0.0.0/jlib% hadoop jar orahdfs.jar oracle.hadoop.hdfs.exttab.ExternalTable -conf ../myconf.xml -publish
 [Enter Database Password:]
 Oracle Direct HDFS Release 1.0.0.0.0 - Production
Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
SCOTT.TEST_DATA_EXT_TAB publish operation succeeded
Details:
HADOOP_TEST_DIR:test_data2.txt file created with the following paths
 hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00000
HADOOP_TEST_DIR:test_data1.txt file created with the following paths
 hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00001
HADOOP_TEST_DIR:test_data3.txt file created with the following paths
 hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00002
HADOOP_TEST_DIR:test_data4.txt file created with the following paths
 hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00003

The test_data*.txt location files are now populated with the HDFS paths shown above.

Now let’s select from the table

SQL> select /*+ parallel (a,4) */ count(*) from "SCOTT"."TEST_DATA_EXT_TAB" a;
COUNT(*)
 ----------
 153450496
Elapsed: 00:02:13.69

As you can see, we can also use a parallel hint in the query.

The Oracle documentation mentioned setting the following parameter in the conf file:

<property>
 <name>oracle.hadoop.hdfs.exttab.datasetCompressionCodec</name>
 <value>org.apache.hadoop.io.compress.DefaultCodec</value>
 </property>

But setting this caused no rows to be returned, with the following error recorded in the logfile:

cat directhdfs-log_13387-0126121327579379.log
 12/01/26 12:03:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Error reading data from hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00002: unknown compression method

The solution was to unset the CompressionCodec parameter, after which the location file looks like this:

[oracle@oradbdev01]~/product/orahdfs-1.0.0.0.0/log% cat /home/oracle/ext_table/test_data1.txt
 CompressionCodec=
 hdfs://oradbdev03:9000/user/oracle/TEST_DATA/part-m-00001

You need to set this parameter only if the HDFS files are compressed. If the files are compressed (we mostly have compressed files on Hadoop), use the appropriate codec, e.g. org.apache.hadoop.io.compress.GzipCodec for gzip format, as in the sketch below.
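For example, with gzip-compressed files the property would look like this (same format as the configuration file shown earlier):

<property>
 <name>oracle.hadoop.hdfs.exttab.datasetCompressionCodec</name>
 <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>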