Thursday, July 26, 2007

The Oracle Cost Based Optimizer: A Webinar

Earlier today, I attended a webinar by Hotsos. The webinar, titled Cost Based Optimizer: 1 of 2, was a basic introduction to the Cost Based Optimizer (CBO) in Oracle and the 10053 trace file. The CBO optimizes SQL queries, and that optimization can be observed in a 10053 trace. It was not what I would call an advanced webinar, but it was done with plenty of examples and was very accessible for newcomers to performance tuning. The webinar was given by Hotsos employee Mahesh Vallanpati.
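For reference (the webinar didn't dwell on the mechanics, so treat the exact syntax here as my own note rather than Mahesh's), a 10053 trace is typically enabled per session with an event, something like:

```sql
-- Enable the CBO trace for the current session
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- Run the query you want to analyze; the optimizer writes its costing
-- decisions to a trace file in the user_dump_dest directory
SELECT * FROM emp WHERE deptno = 10;

-- Turn the trace back off
ALTER SESSION SET EVENTS '10053 trace name context off';
```

The EMP/DEPTNO query is just a placeholder; any hard-parsed statement will do.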

Because the webinar was filled with examples, it doesn't lend itself well to a blog post. I'm pretty sure they said an archive of the webinar would be available on the Hotsos site. I did take some notes, though, so here are a few things that do lend themselves to a blog post. If any of this is wrong, it is probably a typo on my part.

The CBO is a complex, mathematically based optimizer. It was introduced in Oracle 7, improved in 8 and 9, and replaced the RBO completely in 10g. While the CBO is driven primarily by statistics, many things affect it:

  • Database Parameters

  • Database Statistics

  • CPU and I/O Stats

  • DB Schema Configuration

  • Stored Outlines

  • The SQL Code Itself

  • The Oracle Query Cost Model (based on DB version)

Statistical data points used by the CBO include (you can see them in the statistics columns of the data dictionary views):

  • Table Statistics

    • Number of rows

    • Number of blocks below the high water mark

    • Empty blocks

    • Average free space

    • Row length

  • Column Statistics

    • Distinct Values (very important)

    • Density

    • Low Value

    • High Value

    • Histogram values if they exist

    Note: If you use a default date rather than NULL, make sure that default is not wildly out of range of the real data, as an extreme low or high value can skew the optimizer's estimates

  • Index Statistics

    • Clustering

    • Depth of index

    • Leaf blocks

    • Distinct Values (very important)

    • Other averages

  • System Statistics

    • Average Block Read sizes (multiblock and singleblock)

    • CPU Speed

    • I/O Speed

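Most of the figures above can be checked directly in the data dictionary. As a sketch (the table name EMP is a placeholder, and these are the standard USER_* views rather than anything specific to the webinar):

```sql
-- Table statistics: rows, blocks below the high water mark, free space
SELECT num_rows, blocks, empty_blocks, avg_space, avg_row_len
  FROM user_tables
 WHERE table_name = 'EMP';

-- Column statistics: distinct values, density, low/high values, histograms
SELECT column_name, num_distinct, density, low_value, high_value, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'EMP';

-- Index statistics: depth, leaf blocks, distinct keys, clustering
SELECT index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
  FROM user_indexes
 WHERE table_name = 'EMP';
```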

Note: Gathering statistics should not itself be a drag on the system. You need to evaluate what sampling percentage makes the most sense, as well as how frequently you need to collect.
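As a sketch of that trade-off (the schema and table names are made up), the sample size is controlled by the estimate_percent parameter of DBMS_STATS:

```sql
-- Gather table statistics on a 10% sample rather than the full table,
-- trading a little accuracy for a much cheaper collection
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 10,
    cascade          => TRUE);  -- also gather stats on the table's indexes
END;
/
```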

Less logical I/O is not always better. In the session, Mahesh went over an example where, due to a db_file_multiblock_read_count setting of 16, a full table scan with an LIO of 722 was actually chosen by the optimizer over an index scan with an LIO of 166. That means that during the full table scan, Oracle was reading 16 blocks at a time. By reading that much data at once, Oracle was actually getting the data faster than it would have by reading the index sequentially. Had the parameter been set to 8, the index might have performed better. Very nice detail.
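The arithmetic behind that choice can be sketched roughly like this (my own back-of-the-envelope reading of the example, not figures from the webinar):

```sql
-- With db_file_multiblock_read_count = 16, a 722-block full scan needs
-- roughly CEIL(722 / 16) = 46 multiblock read calls, while the index
-- path needs about 166 sequential single-block reads.
ALTER SESSION SET db_file_multiblock_read_count = 16;

SELECT CEIL(722 / 16) AS full_scan_read_calls,  -- 46 multiblock reads
       166            AS index_path_read_calls  -- 166 single-block reads
  FROM dual;
```

Fewer, larger read calls can beat many small sequential ones, which is why the "cheaper" LIO number lost.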

One good tip that Mahesh provided: the first step of your statistics gathering should be to back up your old stats. That way you can recover if something goes wrong.
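A sketch of how that backup might look with DBMS_STATS (the stat table name STATS_BACKUP is a placeholder):

```sql
-- Create a holding table for the backed-up statistics
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(ownname => USER, stattab => 'STATS_BACKUP');
END;
/

-- Copy the current schema statistics into it before regathering
BEGIN
  DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => USER, stattab => 'STATS_BACKUP');
END;
/

-- If the new statistics cause a regression, restore the old ones
BEGIN
  DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => USER, stattab => 'STATS_BACKUP');
END;
/
```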

Finally, Mahesh talked about a hint that I did not know existed: the CARDINALITY hint. Using the CARDINALITY hint in a dev or test environment, you can test scalability as the amount of data in your tables grows. The hint tells Oracle that a table has a different number of rows than it really has, so you can run explain plans with various values and see how they impact your plan.
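A sketch of how that might look (the table and the million-row figure are made up; CARDINALITY is an undocumented hint, so it's best kept to dev and test):

```sql
-- Pretend EMP has a million rows and see what plan the optimizer picks
EXPLAIN PLAN FOR
SELECT /*+ CARDINALITY(e 1000000) */ e.empno, e.ename
  FROM emp e
 WHERE e.deptno = 10;

-- Display the resulting plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Re-running the explain plan with different cardinality values shows the point at which the optimizer switches, say, from an index scan to a full scan.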