I have not caused a fist fight in a while, so it's time to take off the gloves. I claim that
somewhere around of 99% of advice about tuning MySQL's key cache hit ratio is wrong, even when
you hear it from experts. There are two major problems with the key buffer hit ratio, and a host
of smaller ones. If you make some assumptions that are very hard to prove, there actually is a
very limited use for the statistics from which the ratio is derived (but not the ratio itself,
which you should ignore). Read on for the details.
In this article, I will use key buffer, key_buffer, and key cache interchangeably. However, I
will be careful about the difference between "rate" and "ratio". In this article, the key
cache miss rate is defined as the number of misses per unit of time, with the units of
operations per second. The key cache miss ratio is the ratio between reads from the disk
and reads from the cache; it is a dimensionless number because the units are the same in the
numerator and denominator, and they cancel out. The key cache hit ratio is simply one
minus the miss ratio.
The key_buffer hit ratio
The metrics we're interested in are all defined in terms of counters you can get from SHOW GLOBAL
STATUS. I'll start out by copying and pasting from the
MySQL manual:
So far, so good. All of the above is mostly factual (more on this later). Here's another fact
from the manual:
The cache miss rate can be calculated as Key_reads/Key_read_requests. [Note: their use of
'rate' doesn't match mine; I would call this the 'miss ratio'].
The problem occurs when you start to assign any importance to this ratio. The MySQL manual
doesn't fall into this trap, but if you search Google for Key_read_requests, you will find lots
of advice on "tuning by ratio," including phrases such as "The ratio of key_reads to
key_read_requests should be as low as possible, no more than 1:100" or "your key cache hit ratio
should be very high, ideally above 99.9%" or "if your key cache hit ratio is low, your
key_buffer_size is too small."
So here's a summary of two pieces of bad advice:
-
Bad advice #1: you should care about the key cache hit ratio.
-
Bad advice #2: you should set your key_buffer_size according to this ratio.
Tuning by ratio is one of those things that is widely accepted because of "proof by repeated
assertion," but as you know, that doesn't make it valid. Let's see why the above two things are
bad advice.
Problem 1: Ratios don't show magnitude
Ratios of counters are virtually meaningless for two major reasons. The first is that ratios
obscure magnitude. Look at the following and tell me what you think:
- Server #1 has a key cache miss ratio of 23%
- Server #2 has a key cache miss ratio of 0.001%
Which server is badly tuned? Maybe you're objecting that you don't know enough about the servers.
OK, I'll give you more information. Both servers are real production servers, with powerful
hardware and heavy query workloads. Both have a key_buffer_size of 4GB. Now can you tell which
server is badly tuned? -- No, you can't tell anything meaningful based on a ratio of counters,
because the process of dividing one counter by the other to get the ratio has discarded vital
information. You don't know how many Key_reads and Key_read_requests those servers have done.
Just for fun, consider these fake but entirely possible scenarios: server #1 has 23 Key_reads and
100 Key_read_requests (23% miss ratio). Server #2 has one trillion Key_reads, and one hundred
quatrillion Key_read_requests (1/100th of a percent). Given that information, which server is
badly tuned? If you said "I still can't tell," maybe you want more information, so I'll tell you
that both servers are identically tuned, and they have identical data, hardware, and workload.
Even that doesn't help, though.
Problem 2: Counters don't measure time
The reason you still can't tell which server is badly tuned is because, even if you know
the absolute numbers, you are missing the element of time, in two important ways. First, you
don't know how long of an interval I used to measure the statistics on those two servers. Maybe I
measured the first server immediately after starting it, and that's why its counters are so
small. The second server has been online practically forever, and that's why its counters are
big. Let's say this is the case. Now, you've got all the information you need to form an opinion,
right? Instead of asking the same annoying question, let me ask it a different way: is either of
these servers badly tuned?
There's still not enough information -- I hope you're beginning to appreciate that tuning by
ratio is a waste of time! The Oracle folks arrived at this conclusion a long time before the
MySQL world started to come around. There are even tuning utilities (anti-tuning anti-utilities?)
for Oracle, specifically designed to mock and frustrate those who would tune by ratio. They are
capable of creating any buffer hit ratio the user desires by running silly queries that do
nothing but cause buffer hits, skewing the result towards "this ratio looks great!"
The second kind of time information you're lacking is how much time each buffer hit or miss
takes. If you approach application performance optimization from the standpoint of response time
measurements, which you should, you will eventually arrive at this question. "I have a query I
know is slow and is a problem for my application. I have profiled it with SHOW STATUS and I know
it causes a great many Key_reads to occur. How much of this query's execution time is consumed by
those operations? Should I try to reduce Key_reads?"
There is no way to know. All you get is counters -- you don't get the time elapsed. In technical
terms, counters are surrogate
measures. They are not helpful. And as Cary Millsap says, the unfortunate problem is that
surrogate measures work sometimes, simply because there is sometimes a correlation (but not a
cause) relationship between the counter events and the query's execution time. Alas, that
correlation fools us into thinking it's a cause, and we optimize-by-surrogate-measure a time or
two and it appears to work -- so we turn into little Pavlovian DBAs and try to do that every
time. It would be better if optimizing-by-counter never worked!
A partially valid use of Key_reads
There is a partially valid reason to examine Key_reads, assuming that we care about the number of
physical reads that occur, because we know that disks are very slow relative to other parts of
the computer. And here's where I return to what I called "mostly factual" above, because
Key_reads actually aren't physical disk reads at all. If the requested block of data isn't in the
operating system's cache, then a Key_read is a disk read -- but if it is cached, then it's just a
system call. However, let's make our first hard-to-prove assumption:
-
Hard-to-prove assumption #1: A Key_read might correspond to a physical disk
read, maybe.
If we take that assumption as true, then what other reason might we have for caring about
Key_reads? This assumption leads to "a cache miss is significantly slower than a cache hit,"
which makes sense. If it were just as fast to do a Key_read as a Key_read_request, what use would
the key buffer be anyway? Let's trust MyISAM's creators on this one, because they designed a
cache hit to be faster than a miss.
-
Hard-to-prove assumption #2: A key cache miss is probably slower than a hit,
maybe.
What else? Maybe this physical I/O operation is randomly positioned, which is a
worst-case scenario for spinning disks. This is also very hard to prove, but seems reasonable
based on the structure of a B-tree index, so let's assume anyway:
-
Hard-to-prove assumption #3: A Key_read might cause a random I/O operation,
maybe.
Now, given those assumptions, we can further assume the following:
-
It is good to minimize Key_reads because they are slow and cause random disk
I/O.
Notice that we still don't know anything about any relationship between Key_reads and the
execution time of our query. All we can do is guess, like good Pavlovian DBAs, that there is a
relationship. However, we can again reason that random I/O can cause collateral damage: if the
disk head is seeking all over for random I/O, then other I/O (including non-random I/O) is likely
to be impacted. If we manage to reduce Key_reads, we might make the database server faster
overall, and perhaps the query of interest will accidentally get faster too, and we'll get a
treat.
There is one interesting question that we haven't really addressed yet. How bad is bad? This is
where we return to the notion of the key cache miss rate in units of operations per
second. Given our assumed correlation between a Key_read and a random physical disk I/O, it is
partially valid to say that we are going to get in trouble when Key_reads gets close to the
number of random I/Os our disk can do. Here's another formula for you:
Key_cache_miss_rate = Key_reads / Uptime
Note the conspicuous absence of Key_read_requests in the formula. The number of requests is
absolutely irrelevant -- who cares how often the key is requested? What's relevant is that our
assumed connection between Key_reads and random I/Os means that Key_reads/Uptime is assumed to be
the same as "random I/Os per second."
And now, I would finally like to show you something partially useful you can do with Key_reads:
PLAIN TEXT CODE:
- [baron@localhost ~]$ mysqladmin ext -ri10 | grep Key_reads
- | Key_reads
| 6030962 |
- | Key_reads
| 98 |
- | Key_reads
| 89 |
- | Key_reads
| 104 |
This server is doing approximately 100 Key_reads every ten seconds, so we can assume Key_reads
are causing about ten random I/Os per second. Compare that to what your disks are capable of, and
draw your own conclusions about whether this is a performance problem. I know what I'd like: I'd
like to ask the disk itself how much random I/O it's doing. But alas, that's virtually impossible
on most systems I work on. So there you have it -- yet another surrogate measure.
How to choose a key_buffer_size
Let's recap. So far I've shown you the fallacy of tuning by ratio, and told you to ignore the
ratio and in fact, ignore Key_read_requests altogether. I've explained that counters are a
surrogate measure, but the fact that they're easy to get and sometimes correlated with the true
problem causes people to mistake counter analysis for a true performance optimization method.
I've shown that if we make some assumptions that are hard to prove, we can compare Key_reads to
the disk's physical capacity for random I/O and get an idea of whether index I/O might be causing
a performance problem.
But I haven't shown you how to choose an appropriate key_buffer_size. Let's look at that now.
This topic deserves an entire blog post, because there are many subtleties including the
possibility of having multiple key caches. But I'll give the simple version here. In my opinion,
you should choose a key_buffer_size that is large enough to hold your working set -- the index
blocks that are frequently used. How large is that? This is yet another thing that's really hard
to measure, alas! So we need to either pick a surrogate, or pull a number out of thin air. Here
are some suggestions that are about as good as any:
- Just set it really big and forget it. If you have enough memory, who cares. The memory isn't
allocated until it's used -- if you set it to 4GB, that doesn't mean that 4GB is actually used.
This is not as much of an abdication of responsibility as it might sound like on a machine that's
dedicated to MyISAM tables.
- Consider your mixture of storage engines (some InnoDB, some MyISAM, which is more important
to you, etc) and choose an amount of memory based on how important those tables are to you, how
big the indexes are on disk, and so on.
- Raise the key_buffer_size until, when the buffer is full, Key_reads/Uptime reduces to a
number you're comfortable with.
- Set key_buffer_size really big, and then measure its size as it fills up, in something like 1
minute intervals. Pull this into a spreadsheet and graph it as a curve. When the curve's growth
tapers off, pick that point on the curve and use it as a heuristic for how big your working set
is. Set the key_buffer_size to that.
If the above methods shock you with their unscientific-ness, they shouldn't. The reality is that
this server setting is very subjective, and there is no good instrumentation in MySQL to guide
your decisions. It is also not the be-all and end-all of MySQL performance, and people frequently
obsess over it far out of proportion. But again, 99% of the advice I've seen is based on
something much worse: a red herring that only sounds scientific and authoritative -- the
"key cache hit ratio." This is a shame. When you are new to MySQL, trying to configure my.cnf,
and you have heard guidance that seems so definite, mathematical, and authoritative, but still makes no sense, why wouldn't you obsess
over it?
What about InnoDB tuning?
You might be wondering, what about InnoDB tuning? What is the best way to choose an
innodb_buffer_pool_size setting? This is a topic that deserves its own article too, but the short
version is: ratio-based tuning is just as wrong for InnoDB as it is for MyISAM. Ratio-based
tuning is invalid and wrong in general, not just for specific things. All of the above points
(loss of magnitude, lack of timing information, etc) apply to all types of ratio-based and
counter-based tuning techniques.
Summary
Major points in this article:
- Counter ratios are meaningless as a performance analysis metric because they are a) ratios of
b) counters.
- You need to look at absolute magnitude, not ratios.
- For performance analysis, you need to measure elapsed time, not just the number of times
something happens.
- In the absence of timing information, and if you trust the cache's creator and assume that
misses are more expensive than hits, then you care about cache misses, not cache hits.
- A Key_read is not guaranteed to be a random physical I/O, but it might be. Gather
Key_reads/Uptime over 10-second or 60-second intervals and compare the results to your IO
system's capabilities.
- MySQL doesn't have good instrumentation for scientifically choosing a key_buffer_size
setting, but there are many unscientific approaches that are better than ratio-based tuning.
- Counter ratios suck for everything, not just for MyISAM tuning.
I've had some heated arguments over these points, so I don't expect the above to pass without
controversy. But really, it's time to stop with the bad advice about counter ratios. The sooner
we do that, the sooner we can move on to better things.
Entry posted by Baron Schwartz |
No comment
Add to:
|
|
|
|