In one of my databases, one application query suddenly started to pick Nested-Loop joins instead of Hash-Joins and took almost 6 hours to complete which gets completed in less than 10 secs with Hash-Joins.
The same query in another similar database with same configuration and same data is doing fine and using hash joins. There is no difference in data/stats/OS/init parameter etc. (Though I know that no two databases are same)
About the query:
— It is a simple select statement which selects data from a complex view.
— The view comprises of 5 different tables, four of which have more than 15K rows and one have less then 50 rows.
— Statistics are up-to-date in both databases.
I can see the optimizer behavior using 10053 event for the next run of this query but want to know what else can be checked to know why the plan changed suddenly, in this case, before using 10053 event.
Your valuable inputs on this!!!!
This Post Has 10 Comments
Have a look on PGA usage, but not sure about direct impact on Join methods because of PGA usage.
But the pga usage does not impact the hash joins, if the hash table is not able to fit in the memory, optimizer will break it into partitions and move the partitions into disk.
Conceptually, it should not cause change in optimizer behavior.
I am curious if you find anything. Having almost identical issues, CBO chose MJ vs HJ.
Hope you have checked the global stats on those tables
Verify DBA_TAB_COL_STATISTICS in both database for those tables to get information on histograms.
select owner,job_name,schedule_name,enabled,state,last_start_date from DBA_SCHEDULER_JOBS
@Ganesh: The global stats were up-to-date for both the databases, though I did not check the col stats.
But the automated gather_stats_job ran next day as its schedule and after that the optimizer picked the correct plan..
So it was still unknown why the optimizer changed its behavior suddenly???
This post explain how to get Actual number of rows from the explain plain.
Try to compare A-rows with the Estimated number of rows to spot the “breaking point” in the explain plain.
Yes, it can be tried, but will it show the results when even selecting count(*) from this view is not coming out?
Hi Saurabh, we are having a similar problem. Using Hash takes a couple of minutes.
In a DB copy to another environment the same query uses nested loops and takes an hour.
What kind of stats were run to fix the problem?
You can try exporting table stats from good env and import into instance running query with bad plan. You can also use sql profiles to do same. You can use http://askdba.org/weblog/2011/12/plan-stability-using-sql-profiles-and-sql-plan-management/ for sql profile usage