A few weeks ago, we had a query optimization request from one of our customer.
The query was very simple like:
PLAIN TEXT CODE:
- SELECT * FROM `table` WHERE (col1='A'||col1='B') ORDER BY id DESC LIMIT 20 OFFSET
0
This column in the table is looks like this:
PLAIN TEXT CODE:
- `col1` enum('A','B','C','CD','DE','F','G','HI') default NULL
The table used to have rows more then 500k and of course, there is an index on the
col1 column but the cardinality of that index is around 100 what is very low (what explains ENUM
type
).
PLAIN TEXT CODE:
-
+----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
- | id | select_type | table | type | possible_keys | keyÂ
| key_len | ref | rows  |
Extra                     Â
|
-
+----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
- |Â 1 |
SIMPLE     | table |
range | col1
       |
col1 | 2Â Â Â Â Â Â | NULL |
606920 | Using where; Using filesort |
-
+----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
This query was running for more then 5 minutes!
When you want to run this query mysql first will try to find each row where col1 is A or B. Then
its going to order by the ID in a temp table and then throw away all result and return the first
20 match.
So to make this query faster without touching the architecture of the system, I had to re-think
the goals.
The customer mentioned and I also verified, without the ORDER BY its runs fast. So basically,
find each row where col1 is A|B is fast.
As ID column is your primary key, executing queries against it will be fast also.
But the combination of this two query in one is slow (because of the reasons I mentioned above).
So I tried this:
PLAIN TEXT CODE:
- select * from table where id in (SELECT id FROM `table` WHERE
(col1='A'||col1='B')) ORDER BY id DESC LIMIT 20 OFFSET 0;
In this case we are about to fool the mysql. The subquery is dependent, meaning that it'll be run
once for every row found in the outer query. Normally this is very bad for performance. But in
this special case, we have a small LIMIT so it won't be executed very many times.
PLAIN TEXT CODE:
-
+----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
- | id |
select_type      Â
| table |
type          Â
| possible_keys | key    | key_len |
ref | rows  |
Extra      |
-
+----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
- |Â 1 |
PRIMARYÂ Â Â Â Â Â Â Â Â Â Â
| table |
index         Â
|
NULLÂ Â Â Â Â Â Â Â Â
| PRIMARY | 4Â Â Â Â Â Â |
NULL | 765105 | Using where |
- | 2 | DEPENDENT SUBQUERY | table | unique_subquery |
PRIMARY,col1Â | PRIMARY |
4Â Â Â Â Â Â | func
|Â Â Â Â Â 1 | Using where |
-
+----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
As you can see, the explain says this query will run probably very slow, but lets see if its
true.
PLAIN TEXT CODE:
- (20 rows in set (0.01 sec))
As you can see, this a special and unique case when the subquery runs much faster using less
resources. I have to mention that this case is more like publication of an interesting case like
recommending you the way you can optimize queries . In any other case when the subquery is more
sophisticated, this won't work. But queries like the one on the top could be easily rewritten
like this when you have a poorly selective column to query against.
You always must consider that MySQL is amazingly bad with subqueries, it nearly always consider
then as dependent even if they are not. If the outer query returns 1000 rows, the subquery will
be executed 1000 times. So in this case, this is much faster and costs a way less to execute
because of the limit in the main query. But always be very careful with similar queries.
Entry posted by Istvan Podor | No
comment
Add to:
|
|
|
|