onsdag 15 maj 2013

The Outer Join to Inner Join Coversion


It is a central part of the MySQL philisophy to try and help you as much as you can. There are many occasions when it could tell you that what you are asking for is utterly stupid or give you a bad execution plan because "you asked for it". But we're friendly here. We don't do that. One of those cases is when you run a query with an outer join but you really meant an inner join, or you don't care.

Inner joins are almost always cheaper to execute than outer joins and that is why MySQL rewrites them to inner joins whenever it can.

An outer join may have WHERE conditions on any of the columns in the result set. In fact, it is a common trick to find non-matching rows using the IS NULL predicate. Here's an example:

TABLE person

name  id
Tom   1
Dick  2
Harry 3

TABLE car

brand owner_id mfg_year
Chevy 1        2007
Volvo 3        2008

The query SELECT name FROM person LEFT OUTER JOIN car ON id = owner_id WHERE brand IS NULL would give you the names of people that don't have a car. The IS NULL predicate is TRUE for an UNKNOWN value. Nearly all other predicates are what we refer to as null-rejecting. A null-rejecting predicate is one that is UNKNOWN if any of its arguments is UNKNOWN. This goes for most SQL predicates: >, <, =.

On the top-level of the where clause, there is an implicit IS TRUE predicate. This way you never get to see rows where it is unknown if they match your search condition or not. So the where clause in our query above is equivalent to

WHERE brand IS NULL IS TRUE

We could also query

SELECT name
FROM person LEFT OUTER JOIN car ON id = owner_id
WHERE mfg_year = 2008 IS UNKNOWN

The result of which would be "Dick", because it is indeed unknown if Dick's car was made in 2008 if he doesn't have a car. Computation wise, the condition is evaluated bottom-up: UNKNOWN = 2008 is UNKNOWN.

We are now ready to turn this notion into a rule-of-thumb: If there are predicates in the where clause that are not nested inside predicates that turn UNKNOWN into TRUE, for example IS NULL or IS UNKNOWN (i.e. null-rejecting), then we can turn the outer join into an inner join before we optimize, because the result set is the same. Unfortunately this translation does not show up in EXPLAIN, but it does in the optimizer trace. Here's how to see it.

SET optimizer_trace="enabled=on";


SELECT name
FROM person LEFT OUTER JOIN car ON id = owner_id
WHERE mfg_year = 2008;
+-------+
| name  |
+-------+
| Harry |
+-------+
SELECT trace FROM information_schema.optimizer_trace;
...

    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "transformations_to_nested_joins": {
              "transformations": [
                "outer_join_to_inner_join",
                "JOIN_condition_to_WHERE",
                "parenthesis_removal"
              ],
              "expanded_query": "/* select#1 */ select `person`.`name` AS `name` from `person` join `car` where ((`car`.`mfg_year` = 2008) and (`person`.`id` = `car`.`owner_id`))"

...