tisdag 10 december 2013

Range access: now in an IN predicate near you.

Several users have reported that certain queries with IN predicates can't use index scans even though all the columns in the query are indexed. What's worse, if you reformulate your query without IN, the indexes are used. Let's take some example query. Suppose we have a table with two indexed columns:

  col1 INTEGER,
  col2 INTEGER,
  KEY key1( col1, col2 )

Let's take a look at some queries that could take advantage of the key1 index to read rows without accessing the table.

  1. SELECT col1, col2 FROM t1 WHERE col1 = 100;
  2. SELECT col1, col2 FROM t1 WHERE col1 > 100 AND col1 < 200;
  3. SELECT col1, col2 FROM t1 WHERE col1 > 100 AND col1 < 200 OR col1 > 300 AND col1 < 400;
  4. SELECT col1, col2 FROM t1 WHERE col1 = 100 AND col2 > 100 AND cold2 < 200;

These queries will use what MySQL calls Index Range Scans. (although the first query could also use Ref Scan). This access method will fetch rows from the index trees given a start and end value. It's also possible to read multiple intervals, each with a start and end value, as we saw in query 3 above.

A special case of intervals is when the endpoints are the same value. Range scans can be used for conditions such as col1 = 100 because it's equivalent to the interval 100 <= col1 <= 100. This way we can use range scans for a broader class of queries.

Armed with multiple-interval scans, a.k.a. multi-range reads, or MRR for short, we can use the range access for queries such as 

SELECT col1, col2 FROM t1
WHERE col1 = 100 or col1 = 200 or col1 = 300;

We can use all columns in the index of course:

SELECT col1, col2 FROM t1
WHERE col1 = 100 AND col2 = 100
   OR col1 = 200 AND col2 = 200 
   OR col1 = 300 AND col2 = 300;

At some point, this syntax becomes unwieldy. And this isn't just aesthetics, for really big queries, we get a combinatorial blowup which can cause parsing to take a long time. This is the reason why SQL has IN predicates to say the same thing:

SELECT col1, col2 FROM t1
WHERE col1 = 100 OR col2 = 200 OR col2 = 300;

means the same as

SELECT col1, col2 FROM t1
WHERE col1 IN (100, 200, 300);

And for rows it gets even more convenient:

SELECT col1, col2 FROM t1
WHERE col1 = 100 AND col2 = 100
   OR col1 = 200 AND col2 = 200 
   OR col1 = 300 AND col2 = 300;

can be written as

SELECT col1, col2 FROM t1
WHERE (col1, col2) IN ((100, 100), (200, 200), (300, 300));

The problem that users saw is that suddenly MySQL doesn't use MRR any more, and resorts to scanning the entire index. This is because the range optimizer ignored IN conditions over rows. The range optimizer is the sub-optimizer that analyzes conditions and translates them into a multi-range structure that can be handed to the storage engine to fetch the rows from the index. It handled IN predicates as long as they were over scalars or just a single row, but completely ignored lists of rows.

As of 5.7.3 this hole in the net is stitched up. The range optimizer gladly opens the door for queries with IN predicates as long as
  • The predicate is only IN, not NOT IN.
  • The row on the predicate's left-hand side is only indexed column references, in the same index.
  • The rows contain only constants or come from a previously read table in nested-loops join.
Note that 'constants' is a pretty broad category. It consists of pre-evaluated expressions, even some sub-queries, SQL variables and similar beings.

9 kommentarer:

  1. And by that, MySQL bug 31188 and all its duplicates (16247, 35819, 40029, 50571, 61541, 63167 and 64706) are fixed. Thanks Martin!

  2. Did you test that those are actually fixed, or just guessing?

    1. They were all marked as duplicates without any objections from the feature request reporters, but since you asked I just verified that they were indeed duplicates :-)

  3. I like how 16247 is a duplicate of 31188. Time machine!

  4. Yep... but as MarkC says: "This is the same as http://bugs.mysql.com/bug.php?id=16247, but the author of this feature request is much more convincing." :-)


  5. Awesome article. It is so detailed and well formatted that i enjoyed reading it as well as get some new information too.

    datawarehousing Training in Chennai

    Base SAS training in Chennai

  6. Very nice… i really like your blog............visit our website for more information and you can get online training facilities in R12 Procurement Training

  7. we have almost all country students as our subscribers for online course.We have 10+ years of experience we can serve various ascent people. oracle fusion Cloud HCM online training at erptree.com is worlds best online training center. we have excelent knowledge sharing Platform we have user friendly website where you will be provided with all the required details and Self-paced DEMO videos. we have our branches in pune,
    gurgaon, noida, india, usa, uk, uae, oracle fusion hcm training, fusion Procurement training, fusion hcm, scm training

  8. Thanks for such a great website which is helping people who is new to oracle apps and professional also.Your site is very impressive and you are doing an amazing job.
    Oracle Fusion Cloud HCM Online Training