I’ve been working on a project where Oracle suddenly stopped using a perfectly valid index. It is a unique index, so if you run a query for 20 different key values, there will be at most 20 rows. But instead of using this index, Oracle started doing a full table scan on a table with a few hundred million rows, so it was game over for the application, and it was totally incomprehensible to me. I am 95% developer and 5% DBA; I know my way around Oracle, SQL and simple optimizations, but I couldn’t find any explanation for this issue, so I had to start digging deeper and deeper. I learned a lot about Oracle’s internals until I finally found the solution. Instead of just writing about the end result, I decided to tell the whole story from the very beginning. Not only was it a very interesting investigation for me, but it may be useful to anyone in a similar situation.
Every row in my table has a varchar key (separate from the primary key), which tends to be quite long. The first part of the key is highly repetitive among rows; it is followed by an @ sign and a number, which is what makes the key unique in the end. (In this example the letters are repeated, but of course the real keys are not repetitive in that literal sense.) Some key prefixes appear in only a few rows, others in tens of thousands of rows. I’ll give you an example table script with some repetitive data that mimics the key structure and distribution, and let’s gather statistics right away. (Note that I’m using Oracle 11g, and I’m doing my best to give you reproducible SQL, but YMMV.)
CREATE TABLE mytab (
  id     NUMBER(20) NOT NULL,
  PRIMARY KEY (id),
  mykey  NVARCHAR2(50) NOT NULL,
  mydata DATE
);

CREATE UNIQUE INDEX ux_mytab_mykey ON mytab (mykey);

-- 'connect by level <= N' generates N rows on the fly
INSERT INTO mytab (id, mykey, mydata)
SELECT 000000 + level, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@' || level, DATE '2001-01-01' + level
FROM dual CONNECT BY level <= 75000;

INSERT INTO mytab (id, mykey, mydata)
SELECT 100000 + level, 'cccccccccccccccccccccccccccccccccccccccc@' || level, DATE '2002-01-01' + level
FROM dual CONNECT BY level <= 1000;

INSERT INTO mytab (id, mykey, mydata)
SELECT 200000 + level, 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee@' || level, DATE '2003-01-01' + level
FROM dual CONNECT BY level <= 15000;

INSERT INTO mytab (id, mykey, mydata)
SELECT 300000 + level, 'oooooooooooooooooooooooooooooooooooooooo@' || level, DATE '2004-01-01' + level
FROM dual CONNECT BY level <= 1000;

INSERT INTO mytab (id, mykey, mydata)
SELECT 400000 + level, 'uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu@' || level, DATE '2005-01-01' + level
FROM dual CONNECT BY level <= 45000;

COMMIT;

BEGIN
  dbms_stats.gather_table_stats(USER, 'mytab', cascade=>TRUE);
END;
/
There’s nothing to explain here, it is just perfect. All the estimated numbers are correct, the index is being used, let’s move on.
How it goes wrong
Now we could start using our table for profit and whatnot, and wait to see what happens. Or rather, let’s speed things up and do something crazy to make it all wrong. The following is a very contrived/artificial/accidental way of reproducing the issue. As you will see later, there are more direct methods to reproduce (and also to avoid) it. But it can also just happen on its own, as time goes by, as it did in my case. For me at least, this seems to do the trick every time:
SELECT COUNT(1) FROM mytab WHERE mykey LIKE 'eeeeeeeeeeee%';

BEGIN
  dbms_stats.gather_table_stats(USER, 'mytab', cascade=>TRUE);
END;
/
Whoah, all hell breaks loose! Refer to the title: why wouldn’t Oracle use a perfectly valid index? This is the problem I was faced with, and I was baffled. Oracle would still use the index when querying for a single key value, but as soon as the query asked for more than one, it switched to a full table scan.
And here comes the very first trick to reach for when something seems slow: index hints. By forcing the index with a hint, Oracle shows me what it thinks about using the index: apparently it considers the index path more expensive than a full table scan, which is completely wrong.
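For reference, the forced-index query looks like this (a sketch using the table and index from the script above; the RPAD calls just rebuild the 40-character key prefix so you don’t have to type it out):

```sql
-- force the optimizer to use ux_mytab_mykey regardless of its cost estimate
SELECT /*+ INDEX(t ux_mytab_mykey) */ COUNT(1)
FROM   mytab t
WHERE  t.mykey IN (RPAD('o', 40, 'o') || '@1',
                   RPAD('o', 40, 'o') || '@2');
```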
Cost based optimization
Let’s take a step back and see what Oracle (and pretty much every other RDBMS) does. Usually there are many ways to execute a given SQL query, and Oracle has to find the cheapest one. It uses statistics, estimations, heuristics, guesses, etc. Every step in the query needs a certain amount of CPU processing and disk I/O (cost) and produces a certain number of rows (cardinality). You can read a whole lot about it here for example, but here are the basics in my own words.
What is cost in simple terms? Disk I/O is expensive, CPU processing is cheap. Usually, the more blocks a query has to read, the slower it will become. A full table scan reads through all of the data blocks of the table. If there is an index that tells you where you can find certain rows that match your criteria, then it is possible that you have to read only a fraction of the blocks, and that’s good for you. But don’t forget that the index is also stored on the disk. If you have to read a lot of data blocks from the table anyway, then an index can actually make it worse: first because you have to read many data blocks plus many index blocks, second because if the table data is not ordered the right way, then the index makes you read the data blocks in a scattered order, and that’s usually slower than a sequential read.
What is cardinality in simple terms? Simply, the number of matching rows after a given query step. Let’s say you have 20 cars, 5 red, 5 green, 5 blue, 5 yellow. When Oracle gathers statistics, it finds out that there are 20 rows in total and 4 unique colors. When you run a query and you want to find all cars of a given color, then Oracle expects the cardinality to be 20/4=5. In this example, it would be exactly right. If the distribution wasn’t uniform, a simple guess wouldn’t be dead on, but it can still be pretty good. In my big table, the key column is not null and unique, so every row has a unique value. 137000 rows, 137000 distinct values, thus the cardinality for any given key value is 1 (or 0 if it doesn’t exist, but we try to measure the worst case scenario).
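The inputs to that num_rows/num_distinct arithmetic are visible in the data dictionary. On the example table you can check them like this (USER_TAB_COL_STATISTICS is the standard Oracle dictionary view for per-column statistics):

```sql
-- num_distinct drives the uniform-distribution guess:
-- estimated cardinality per key value is roughly num_rows / num_distinct,
-- which for the unique, not-null mykey column comes out as 1
SELECT column_name, num_distinct, num_nulls, density
FROM   user_tab_col_statistics
WHERE  table_name = 'MYTAB';
```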
In the first image, Oracle estimates that one key will yield one row, two keys will yield two rows, and it needs to read just a few index and data blocks from the disk. This is a good plan. In the second image, the calculations are way off. Oracle estimates there are 2208 matching rows for 2 key values (this number may vary a bit randomly for you). There are actually 1000 rows that start with ooo…, and if that’s the best guess you have, then it can make a little sense to think that 2 keys will yield around 2000 rows, 3 keys will yield 3000, and so on, but of course it’s far from the truth in our case. The cost also seems to be bigger with the index than without: there’s the overhead of reading the index blocks plus reading the data blocks in a scattered order. Generally you can just run a part of your query and see how many rows it actually produces, and this way you can verify how good or bad Oracle’s estimations are. We won’t do that now because we know how bad the situation is.
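If you do want to verify the estimates yourself, EXPLAIN PLAN shows the optimizer’s guess and a plain COUNT shows reality (a sketch; the RPAD expressions rebuild two of the generated keys):

```sql
EXPLAIN PLAN FOR
SELECT COUNT(1) FROM mytab
WHERE  mykey IN (RPAD('o', 40, 'o') || '@1',
                 RPAD('o', 40, 'o') || '@2');

-- the Rows column in this output is the optimizer's cardinality estimate
SELECT * FROM TABLE(dbms_xplan.display);

-- and this is the actual row count to compare it against
SELECT COUNT(1) FROM mytab
WHERE  mykey IN (RPAD('o', 40, 'o') || '@1',
                 RPAD('o', 40, 'o') || '@2');
```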
The usual solution: statistics
Oracle uses quite simple statistics for these calculations. You can see them for yourself by querying the USER_TABLES and USER_INDEXES views. When you modify a lot of data, the statistics may become out of date. They may be refreshed automatically by a background job, or you can do it yourself. We already did that above, but there are a few more tricks here; see the documentation of DBMS_STATS for the details.
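These are the table- and index-level numbers I mean (standard dictionary views; LAST_ANALYZED tells you how stale they are):

```sql
-- table-level statistics: row count, block count, gather time
SELECT num_rows, blocks, last_analyzed
FROM   user_tables
WHERE  table_name = 'MYTAB';

-- index-level statistics: clustering_factor is a big input to index cost
SELECT index_name, distinct_keys, leaf_blocks, clustering_factor
FROM   user_indexes
WHERE  table_name = 'MYTAB';
```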
During statistics gathering, Oracle uses sampling, meaning that it doesn’t read the entire table, only a fraction of it, and it extrapolates from that. In my case, block-level sampling would be a lot worse than row-level sampling, because rows with similar key values sit physically close together, but block sampling is not the default, so it cannot be the problem. Another important parameter is the sampling percentage. Let’s see what happens if I force it to 100%, making the statistics fully accurate:
BEGIN
  dbms_stats.gather_table_stats(USER, 'mytab', cascade=>TRUE, estimate_percent=>100);
END;
/
The plan isn’t a lot better, but if you look closely, it actually became more accurate. Now the cardinality estimations are exactly number of key values times 1000. It’s still stupid in this case, but at least we learned that we can tweak the statistics gathering if the automatic settings yield bad results.
The actual solution: histograms
Then I asked myself the question: how does Oracle estimate that there are 1000 rows that start with ooo…? After a lot of searching, I found the answer: histograms. At first I didn’t even know what they were. We can read them from the USER_HISTOGRAMS view, but actually I needed an extra piece of SQL to populate the last column:
ANALYZE TABLE mytab COMPUTE STATISTICS FOR COLUMNS mykey;
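The resulting buckets can then be read back like this (ENDPOINT_ACTUAL_VALUE is the column that the ANALYZE above fills in):

```sql
-- endpoint_number is cumulative: the per-bucket row count is the
-- difference between consecutive endpoints
SELECT endpoint_number, endpoint_actual_value
FROM   user_histograms
WHERE  table_name = 'MYTAB' AND column_name = 'MYKEY'
ORDER BY endpoint_number;
```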
This says exactly that 75000 rows start with aaa…, 1000 start with ccc…, and so on. Cool, isn’t it? The statistics mentioned above (number of rows, distinct values, etc.) are quite simple, and they cannot give good estimations when the data distribution is far from uniform. That’s why histograms were invented: they contain a few (not necessarily all) key values and their cardinalities.
Two things to note here. First, by default, histograms are generated based not only on the data distribution but also on the workload. That contrived query above created a workload, which caused the histogram to be created; there was no histogram before that query. If you couldn’t reproduce the issue earlier, I can now tell you that you can explicitly create or delete histograms with the method_opt parameter of the statistics gathering procedures. Second, histogram values are limited to 32 bytes (note: in a double-wide national character set, that’s only 16 characters). Our data was modeled after real life, and it just happened to be more than 32 characters long.
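For example, both directions look like this (a sketch; SIZE 254 asks for a histogram with up to 254 buckets, while SIZE 1 means a single bucket, i.e. no histogram at all):

```sql
-- explicitly create a histogram on mykey
BEGIN
  dbms_stats.gather_table_stats(USER, 'mytab', cascade => TRUE,
    method_opt => 'FOR COLUMNS mykey SIZE 254');
END;
/

-- or forbid histograms on all columns of the table
BEGIN
  dbms_stats.gather_table_stats(USER, 'mytab', cascade => TRUE,
    method_opt => 'FOR ALL COLUMNS SIZE 1');
END;
/
```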
Now the puzzle is complete. As you can see, histograms can come and go as Oracle sees fit, and one day they just appeared. When we were querying for more than one key value, Oracle stopped using the simple statistics (which would have yielded a cardinality of exactly 1 for each key value) and started using the histogram instead. I do believe this generally works fine; my case was just too special. Sadly, the histogram values were cut off before the numbered part of the keys, and the estimations blew up. Run a full table scan on a few hundred million rows, and a 20 millisecond query suddenly takes 20 minutes.
Generally, you could tune the histograms the way you see fit, because they can be awesome. But that won’t work for keys like the ones in this example: they are long, unevenly distributed, and their first 32 characters are not distinctive enough. So my solution was quite simple, and it turned out to be very effective too: ditch the histograms and make sure they don’t come back. That blog post talks about a case very similar to mine; too bad I didn’t find it earlier, but it was more fun this way.
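In practice, “make sure they don’t come back” meant something like this (a sketch; DBMS_STATS.SET_TABLE_PREFS pins the gathering preference on the table, so later gathers — including the automatic stats job — respect it too):

```sql
BEGIN
  -- every future gather on this table uses one bucket per column,
  -- i.e. no histograms, even when run by the automatic stats job
  dbms_stats.set_table_prefs(USER, 'mytab', 'METHOD_OPT',
                             'FOR ALL COLUMNS SIZE 1');
  -- re-gather now: SIZE 1 replaces the existing histogram with none
  dbms_stats.gather_table_stats(USER, 'mytab', cascade => TRUE);
END;
/
```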