Go to the first, previous, next, last section, table of contents.

How to get maximum performance out of MySQL

How does one change the size of MySQL buffers?

You can get the current buffer sizes with:

> ./mysqld --help

This should result in a list of all mysqld options and configurable variables like the following.

Possibly variables to option --set-variable (-O) are: 
back_log              current value: 5
join_buffer           current value: 131072
key_buffer            current value: 1048568
max_allowed_packet    current value: 65536
max_connections       current value: 90
max_join_size         current value: 4294967295
max_sort_length       current value: 1024
net_buffer_length     current value: 8192
record_buffer         current value: 131072
sort_buffer           current value: 2097144
table_cache           current value: 64
tmp_table_size        current value: 1048576
thread_stack          current value: 65536

back_log	How many outstanding connection requests may MySQL have. This comes into play when the main MySQL thread gets VERY many connection requests in a very short time. It then takes some time (but very short) for the main thread to check the connection and start a new thread. The back_log is how many connects can be stacked during this short time before MySQL momentarily stops answering new requests. You only need to increase this if you expect a large number of connections in a short period of time. In other words, the size of the listen queue for incoming tcp/ip connections. The manual page for the unix system call listen(2) should have more details. Check your OS documentation for the maximum value for this variable.
join_buffer	This buffer is used for full joins (without indexes). It is allocated one time for each full join between two tables. Increase this to get a faster full join when adding indexes is not possible. Normally the best way to get fast joins is by adding indexes.
key_buffer	Buffers index blocks and are shared by all threads. You might want to increase this when doing many delete/inserts on a table with lots of indexes. To get even more speed use `LOCK TABLES`. See section `LOCK TABLES` syntax.
max_allowed_packet	Max size of one packet. This allows the message buffer to grow up to this limit when needed (it is initiated to `net_buffer_length`). May be set very big because this is mainly to find erroneous packets. You must increase this if you are using big BLOBS. It should be as big as the biggest BLOB you want to use.
max_connections	How many simultaneous clients are allowed. If you increase this you probably has to increase the number of file descriptors mysqld has. This is Operating system depended so look at you OS documentation.
max_join_size	Joins that touch more records than max_join_size return an error. Set this if you have users to tend to make joins without a `WHERE` that take a long time and return millions of rows.
max_sort_length	The number of bytes to use when sorting on `BLOB` or `TEXT` columns.
net_buffer_length	The communication buffer is reset to this size between queries. This should not normally be changed, but if you have very little memory you can set it to the expected size of a query.
record_buffer	Each thread that is doing a sequential scan allocates a buffer of this size for each table it scans. If you do many sequential scans you may want to increase this.
sort_buffer	Each thread that needs to do a sort allocates a buffer of this size. Increase this for faster `ORDER BY` or `GROUP BY`. A sort also allocates one or two temporary files. The maximum disk-space needed is `(length_of_what_is_sorted + sizeof(database_pointer)) * number_of_rows * 2`. `sizeof(database_pointer)` is usally 4 but may grow in the future for really big tables.
table_cache	Number of open tables for all threads. If this is increased you must see to that the number of open file descriptor is also increased. MySQL needs two file descriptors for each unique table.
tmp_table_size	If a temporary table gets bigger than this a `The table ### is full` error will be generated. Increase this if you do many advanced `GROUP BY` queries.
thread_stack	How big will each threads `C` stack be. A lot of the limits detected by crash-me are dependent on this. The default is normally enough.

MySQL uses algorithms that are very scalable so one can usually run with very little memory or give MySQL more memory to get better performance. If you have much memory and many tables and want maximum performance with a moderate number of clients you should use something like:

> safe_mysqld -O key_buffer=16M -O table_cache=128 \
        -O sort_buffer=4M -O record_buffer=1M &

If you have little memory with lots of connections, use something like:

> safe_mysqld -O key_buffer=512k -O sort_buffer=100k -O record_buffer=100k &

or even

> safe_mysqld -O key_buffer=512k -O sort_buffer=16k -O table_cache=32 \
        -O record_buffer=8k -O net_buffer=1K &

Note that if you change an option to mysqld it is only for that instance. To see the effects of a parameter change do something like this mysqld -O keybuffer=32m --help. If there are very many connections, 'swapping problems' may occur if mysqld has not been configured to use very little memory for each connection. It also works better if you have a enough memory for all connections of course. For example, for 200 open connections one should have a table cache of at least 200 * (max_number of tables in join).

How does MySQL use memory ?

The key_buffer is shared by all threads, the rest are allocated when needed.
Each connection uses some thread space, a stack (64K) and a connection buffer (variable net_buffer_length).
All threads share the same base memory.
Nothing is memmapped yet (except compressed tables but that's another story). This is because the 32-bit memory space of 4GB is not large enough for most large tables. When we get a system with 64-bit address space we may add general support for memmaping.
When starting mysqld one can specify a key buffer. This will buffer all indexes in all tables at FIFO basis (variable keybuffer).
Each request doing a sequential scan over a table allocates a read buffer (variable record_buffer).
Each request doing a sort allocates a sortbuffer and one or two temporary files. The maximum extra disk-space needed is @math{(sort_key_length +sizeof(long))*2}.
All joins are done in one pass and most joins can be done without even using a temporary table. Most temporary tables are memory based (HEAP) tables. Temporary tables with a big recordlength (= sum of all column lengths) or that contains BLOBs are stored on disk. One current problem is that if the HEAP table exceeds the size of tmp_table_size, one will get the error: 'The table ### is full'. In the future we will fix this by automatically changing the in memory (HEAP) table to a disk based (NISAM) table if needed. To go around this problem one can increase the -O tmp_table_size=# option to mysqld or use the SQL option SQL_BIG_TABLES. See section SET OPTION syntax.. In MySQL 3.20 the maximum size of the temporary table was recordbuffer*16, so if you are using this version you have to increase recordbuffer. There also exists a patch to always store temporary tables on disk, but this will affect the speed of all complicated queries.
Almost all memory used when parsing and calculating is done in a local memory store. No memory overhead is needed for small items and the normal slow memory allocation/freeing is avoided. Memory is only allocated for unexpectedly large strings (this is done with malloc/free).
Each index file is opened once and the data file is opened once for each concurrently-running thread. For each concurrent thread a table structure, column structures for each column, and a buffer that has the size of 3 * (maximum row length not counting BLOBs) is allocated. A BLOB uses 5 to 8 bytes +length of blob data.
For each table having BLOBs, a buffer is enlarged dynamically to read in larger BLOBs. If one scans a table there will be a allocated buffer as large as the largest BLOB.
All used tables are saved in a cache and used by FIFO. Normally the cache is 64 tables. If a table has been used by two running threads at the same time, there will be two entries of the table in the cache.
A mysqladmin refresh closes all tables that are not in use and marks all used tables to be closed when the running thread finishes. This will effectively free most used memory.

When running mysqld, ps and other programs may report that it uses a lot of memory. This may be caused by thread-stacks on different memory addresses. For example, the Solaris ps calculates the unused memory between stacks as used memory. You can verify this by checking available swap with 'swap -s'. We have tested mysqld with commercial memory-leakage detectors so there should not be any memory leaks.

How does MySQL use indexes?

All indexes, PRIMARY, UNIQUE and INDEX(), are stored in B trees. Strings are automatically prefix- and end-space compressed.

INDEX(col1, col2) creates a multiple index over the two columns. The index can be seen as a concatenation of the given columns. If you use INDEX(col1), INDEX(col2) instead of INDEX(col1,col2) you get two separate indexes instead.

SELECT * FROM table WHERE col1=# AND col2=#

In a case of INDEX(col1,col2) the right row(s) can be fetched directly. In a case of INDEX(col1), INDEX(col2) the optimiser decides which index will find fewer rows and this index will be used to fetch the rows.

If the table has an index INDEX(col1,col2,col3...) the prefix of this can be used by the optimiser to find the rows. This means that the above gives you search capabilities on: INDEX(col1) and INDEX(col1,col2) and INDEX(col1,col2,col3)...

MySQL can't use a portion of an index to locate rows through an index.

With the definition INDEX(col1,col2,col3):

SELECT * FROM table WHERE col1=#
SELECT * FROM table WHERE col2=#
SELECT * FROM table WHERE col2=# and col3=#

only the first query will use indexes.

What kind of optimisation is done on the WHERE clause?

(Incomplete, MySQL does a lot of optimisations.)

The first issue about making a slow SELECT ... WHERE faster is to check if one could add an index. All references between different tables should usually be done with indexes. One can use the EXPLAIN command to check which indexes are used in a select. See section EXPLAIN syntax. Get information about a SELECT.. See section How does MySQL use indexes?

Brace removal (all unnecessary braces are removed). ((a AND b) AND c OR (((a AND b) AND (c AND d)))) -> (a AND b) OR (a AND b AND c AND d)
Constant folding. (a<b AND b=c) AND a=5 -> b>5 AND b=c A=5
Constant condition removal (needed because of constant folding). (b>=5 AND b=5) OR (b=6 and 5=5) or (B=7 and 5=6) -> B=5 or B=6
All constant expressions used by indexes are evaluated only once.
Return zero rows if impossible select.
HAVING is merged with WHERE if one doesn't use GROUP BY or group functions.
For each sub join a simpler WHERE is constructed to get a fast WHERE evaluation for each sub join and also to skip records as soon as possible.
Find all indexes that may be used. Use the index which finds least records. An index is used for the following cases: =, >, >=, <, <=, BETWEEN and a LIKE with a character prefix like 'something%'.
Remove indexes that don't span all AND levels.
- have all preceding index_parts specified.
- index = 1 or A = 10 -> NULL (can't use index.)
- index = 1 or A = 10 and index=2 -> index = 1 OR index = 2
- index_part_1 = const and index_part_3 = const -> index_part_1 = const
Read all constant tables. A constant table is:
1. A table with 0 or 1 record.
2. A table which uses only other const tables and constants on a full unique index.
- const_table.index_part_1 = const_table2.column and const_table.index_part_2 = constant
Find the best join combination to join the tables, by trying all possibilities :(. If all columns in ORDER BY and in GROUP come from the same table, then this table is preferred first when joining.
If there is an order clause and a different group clause, or if the order or group contains columns from other tables than the first table in the join queue, a temporary table is created.
For each table use a ranged index, if possible, to read records. Each table index is queried and if there is an index range that spans < 30% of the records then an index is used. If no such index can be found a quick table scan is used.
Before each record is output, skip those that match the HAVING clause.

How does MySQL open & close tables?

The cache of open tables can grow to a maximum of table-cache (default 64, changeable with -O table_cache=#). A table is never closed, except when the cache is full and another thread tries to open a table or if one uses 'mysqladmin refresh'.

When the limit is reached, MySQL closes as many tables as possible, until the cache size has been reached or there are no more unused tables. This means that if all tables are in use by some threads, there will be more open tables than the cache limit, but the extra tables will be closed eventually. Tables are closed according to last-used order.

A table is opened (again) for each concurrent access. This means that if one has two threads running on the same table or access the table twice in the same query (with AS) the table needs to be opened twice. The first open of any table takes two file descriptors, each following use of the table takes only one file descriptor.

If MySQL notices that a table is a symbolic linked it will resolve the symlink and use the table it points to instead. MySQL will not notice if the database directory is symbolic linked! If you symlink a database to another and use the same table trough both databases, the tables will be inconsistent after any update to the tables.

What are the drawbacks of creating possibly thousands of tables in a database?

Each table is actually three files. If you have many files in a directory open, close and create will be slow. If you also do selects on many different tables there will be a little overhead because when the table cache is full, for every table that has to be opened another has to be closed. One can make the overhead smaller by making the table cache larger.

How does MySQL lock tables?

All locking in MySQL is deadlock free. This is managed by always requesting all needed locks at once at query start and always locking the tables in the same order.

The locking method MySQL uses for WRITE lock works as follows:

If there is no locks on the table, put a write lock on it, else put the lock in the write lock queue.

The locking method MySQL uses for READ locks works as follows:

If there is no write locks on the table, put a read lock on it else put the lock in the read lock queue.

When a lock is released first use give the lock to the threads in the write lock queue and after this to the threads in the read lock queue.

This means that if you have many updates on the same table, select statements will be waiting until there is no more updates.

To fix this in the case where you do many inserts and many selects on the same table you could insert rows in another table and once in a while update the other table with all records from the temporary table.

This can be done with the following code:

LOCK TABLES real_table WRITE, insert_table WRITE
insert into real_table select * from insert_table
delete from insert_table
UNLOCK TABLES

One could also change the locking code in mysys/thr_lock.c to use only one queue. In this case write locks would have the same priority that read locks and this could help some applications.

How should I arrange my table to be as fast/small as possible?

Use NOT NULL if possible. It makes everything faster and you save one bit per column.
All columns have default values. Only insert when the default value isn't acceptable. You don't have to insert the columns of a timestamp or an autoincremented index in the insert statement. See section How can I get the unique ID for the last inserted row?.
Use the smaller INT types if possible to get smaller tables. For example, MEDIUMINT is often better than INT.
If you don't have any VARCHAR columns, a fixed size record format will be used. This is much faster but may unfortunately waste some space. See section What are the different row formats? Or when to use VARCHAR/CHAR?.
To make MySQL optimise queries better, run isamchk -a on the table once it is loaded with relevant data. This updates a value for each index that tells how many rows that have the same value for this index on average. Of course, this is always 1 for unique indexes.
To sort an index and data according to an index use isamchk -Sir1 (if you want so sort on index 1). If you have a unique index from which you want to read all records in numeric order, this is a good way to make that faster.
When loading a table with data use LOAD DATA FROM INFILE. This is usually 20 times faster than using a lot of INSERTs. If the text file isn't on the server, rcp it to the server first. See section LOAD DATA INFILE syntax. You can even get more speed when loading data to tables with many indexes by doing:
- Create the table in mysql or perl with CREATE TABLE...
- Do mysqladmin refresh.
- Use isamchk -k0 database/table_name. This will remove all use of indexes from the table.
- Insert data into the table with LOAD DATA INFILE....
- If you have pack_isam and want to compress the table, run pack_isam on it.
- Recreate the indexes with isamchk -rq database/table_name.
- Do mysqladmin refresh.
The other possibility to get some more speed from both LOAD DATA FROM INFILE and INSERT is to enlarge the key buffer. This can be done with the -O key_buffer=# option to (safe)mysqld. For example 16M should be a good value if you have much RAM :)
When dumping data as textfiles to other programs, use SELECT ... INTO OUTFILE. See section LOAD DATA INFILE syntax.
When doing many inserts/updates in a row, you can get more speed by using LOCK TABLES on the tables. ...FROM INFILE... and ...INTO OUTFILE... are atomic so you don't have to use LOCK TABLES when using these. See section LOCK TABLES syntax.

To check how you are doing, run isamchk -evi on the .ISM file. @c See section The MySQL table check, optimise and repair program.

What affects the speed of the INSERT statement?

The time to insert a record consists of:

Connect: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting record: (1 x size of record)
Inserting indexes: (1 x indexes)
Close: (1)

Where (number) is proportional time. This does not take into consideration the initial overhead to open tables (which is done once for each simultaneous running query).

The size of the table slows down the insert of indexes with N log N (B-trees).

A way of speeding up inserts is to lock your table during the inserts.

LOCK TABLES a WRITE;

INSERT INTO a VALUES (1,23)
INSERT INTO a VALUES (2,34)
INSERT INTO a VALUES (4,33)
INSERT INTO a VALUES (8,26)
INSERT INTO a VALUES (6,29)

UNLOCK TABLES;

The main speed difference is that the index buffer is only flushed once to disk for all inserts. Normally there would be as many index buffer flushes as there are inserts.

Locking will also lower the total time of multi-connection test but the maximum wait time for some threads will go up.

For example:

thread 1 does 1000 inserts
thread 2, 3, and 4 does 1 insert
thread 5 does 1000 inserts

If you don't use locking, 2, 3 and 4 will finish before 1 and 5. If you use locking 2,3,4 may finish before 1 or 5 but probably not, but the total time should be about 40% faster.

As INSERTs, UPDATEs and DELETEs are very fast in MySQL, one will obtain better overall performance by adding locks around everything that does more than about 5 inserts/updates in a row. If one does very many inserts in a row one could do a UNLOCK TABLES followed by a LOCK TABLES once in a while (about each 1000 rows) to give other threads access to the table. This would still give a nice performance gain.

Of course LOAD DATA INFILE is much faster still.

What affects the speed of DELETE statements?

The delete time of a record is exactly proportional to the number of indexes. To increase the speed of deletes you can increase the size of the index cache. The default index cache is 1M and to get faster deletes it should be increased a couple of times (try 16M if you have enough memory).

How do I get MySQL to run at full speed?

Start mysqld with the right options. More memory gives more speed if you have it. See section How does one change the size of MySQL buffers?.
Create indexes to make your SELECTs faster. See section How does MySQL use indexes?
Optimize your column types to be as efficient as possible. Use NOT NULL on all columns. See section How should I arrange my table to be as fast/small as possible?
--skip-locking disables file locking between SQL requests. This gives a greater speed but has the following consequences:
- One MUST flush all tables with mysqladmin refresh before one tries to check/repair tables with isamchk. (isamchk -d table_name is always allowed).
- One can't run two MySQL servers one the same data files, if both are going to update the same tables.
The --skip-locking is on by default when compiling with MIT threads. This is because flock() isn't fully supported by MIT threads on all platforms.
If updates are a problem, one can delay updates and then do many updates in a row later. Many updates done in a row are much quicker than one at a time.
If the problem is with MIT threads and one is using FreeBSD, upgrading to FreeBSD 3.0 (or higher) should help. This gives a possibility to use sockets (quicker than the current TCP/IP with MIT threads) and the thread package is much more integrated.

What are the different row formats? Or when to use VARCHAR/CHAR?

MySQL dosen't have true SQL VARCHAR() types.

MySQL has instead 3 different ways to store records and uses this to emulate VARCHAR():

If one doesn't use any of the VARCHAR, BLOB or TEXT column types a fixed row size is used, otherwise a dynamic row size is used. CHAR() and VARCHAR() are treated identically from the applications point of view; Both truncates end space from the column when the column is accessed.

You can check the format used in a table with isamchk -d.

MySQL has three different table formats:

Fixed length tables.
- This is the default format. It's used when there is no VARCHAR(), TEXT or BLOB column types in the table.
- All CHAR(), NUMERIC() and DECIMAL() columns are space filled.
- Very quick.
- Easy to cache.
- Easy to reconstruct after a crash as records are in fixed positions.
- Don't have to be reorganised (with isamchk) unless a huge number of records are deleted and one wants to free space to the operating system.
- This will usually take more disk space than dynamic tables.
Dynamic tables
- Is used if there exists any columns with a VARCHAR, TEXT or BLOB type in a table.
- All strings are dynamic (except if length < 3).
- Each record is preceded with a bitmap for which columns are empty (") or zero (this isn't the same as null columns). Each string is saved with a length byte + string. If string is zero length or is number zero it is marked in the bit map and not saved to disk.
- Each record is uses the exact record space required. If a record becomes larger it is split into as many pieces as required.
- Takes usually much less disk space than fixed length records.
- If one updates rows with information that extends the row length the row will be fragmented. In this case one may have to run isamchk -r from time to time to get better performance. Use isamchk -ei table_name for some statistics.
- Not as easy to reconstruct because a record may be in many pieces and a link may be missing.
- The expected row length for dynamic sized records is: 3 + (number_of_columns + 7) / 8 + (number of char columns) + packed_size_of_number_columns + length_of_strings + (null_columns + 7) / 8. There will be a penalty of 6 bytes for each link. A dynamic record will be linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You may check how many links there are with isamchk -ed. All links may be removed with isamchk -r.
Compressed tables:
- A read only table made with the pack_isam utility. All customers with extended MySQL email support are entitled to a copy of pack_isam for their intern usage.
- The uncompress code exists in all MySQL distributions so even customers that doesn't have pack_isam can read tables compressed with pack_isam (on the same platform).
- Takes very little disk space. Minimises disk usage.
- Each record is compressed separately (very little access overhead). The header for a record is fixed 1-3 bytes depending on the biggest record in the table. Each column is compressed differently. Some of the compression types are:
  - There is usually a different Huffman table for each column.
  - Suffix space compression.
  - Prefix space compression.
  - Numbers with value 0 is stored with 1 bit.
  - If a integer is used with a smaller range the integer column is stored with the smallest possibly type. For example a LONGLONG (8 bytes) column may be stored as TINYINT column if all values are in the range 0-255.
  - If a value has only a small set of possible values, the value is converted to enum().
  - A column may use a combination of the above compressions.
- Can handle fixed or dynamic length records, but no BLOB or TEXT columns.
- Can be uncompressed with isamchk.

Why so many open tables?

When you run mysqladmin status you get something like:

Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12

This can be somewhat perplexing if you only have 6 tables.

As MySQL is multithreaded it may have many queries on the same table at once. To minimise the problem with two threads having different states on the same file, I open the table again for each concurrent thread. This takes some memory and one extra file descriptor for the data file. The index file descriptor is shared between all threads.

Go to the first, previous, next, last section, table of contents.

Casa de Bender