MySQL Query Optimisation

Intro

Some notes on MySQL (v4.1.13) optimization where I managed to reduce the time of performing a task from 1 hour to 9.4 minutes.

Background

While working for one customer, a task I had was to consolidate (mergE) data from a number of separate databases into one. This involved importing a relatively large quantity of data into a central MySQL database.

Because the data is being consolidated, it's necessary to perform a small amount of data processing on it to maintain record uniqueness and relationships - i.e. most primary key fields need changing to preserve uniqueness. The time cost of this renaming is constant throughout.

The server in question is a dual processor Pentium IV, runs RedHat EL3, and has 512mb of RAM. The database data files are on a separate SATA disk. The kernel and all libraries are standard RedHat packages. MySQL v4.1.13 had been compiled from source. The database in question used the MyISAM storage backend.

Naive Approach

i.e. What you do without any thought of optimisation -

Executing single inserts for each record, i.e

INSERT INTO consolidated_table (list,of,fields) VALUES ('a','b',...)

In my situation this took 1 hour to process 450,000 rows.

Slightly faster - grouping insert statements

MySQL allows multiple insert statements to be grouped together (I don't think this is a very 'standards' compliant practice, but nevertheless...). Statements can therefore look similar to:

INSERT INTO my_table (key1, key2, key3) VALUES ('a', 'b', 'c'), ('a2', 'b2', 'c2'), ....;

This approach reduced the time to 35 minutes when batching 500 queries together in one single insert statement.

The draw back to this is that it makes it slightly difficult to discover syntax errors in your SQL!

Increasing the batch size to 1000, reduced the time down to approximately 25 minutes.

Disabling Index Keys

Index keys are normally updated on every insert, unfortunately this slows down the process of inserting itself. They can be disabled, using the following SQL

ALTER TABLE my_table DISABLE KEYS

Combining the (1000) batched queries and disabling index keys during the import itself resulted in a time of 9.5 minutes. These times include the time required to rebuild the index keys afterwards with ALTER TABLE my_table ENABLE KEYS

Talking directly through a socket

In my case, I found the overhead of talking to MySQL via tcp/ip was minimal; rerunning the previous test but using the socket file for communication saved approximately an extra 20 seconds.

These aren't scientific numbers, but should help for simple optimisation.

my.cnf values

These are slightly optimised above the default values, and are roughly tuned to the system in question.
I found it necessary to increase the mysql_data_pointer_size value setting (see below) to allow for table sizes above 4gb.

[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 256
query_cache_size = 32M
sort_buffer_size = 2M
net_buffer_length = 256k
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
# this will allow for larger tables (>4gb of data)
myisam_data_pointer_size = 8
thread_concurrency = 4
datadir=/u1/mysql/data
max_heap_table_size=450M
log-slow-queries = /var/log/mysql/mysql-slow-queries.log
long_query_time = 1
wait_timeout = 30
thread_cache = 150

Technorati Tags:

Comments

Fantastic

I too had a large volume of INSERTS into a mysql database, and it was running far too slowly.

I had no idea you could chain INSERT statements, so thank you very much for a great guide to optimization!

Load data infile

If you have mot already then you might want to take a look at LOAD DATA INFILE as it is very speedy at that sort of thing.

thanks

I hadn't come across that before (I'd assumed that piping stuff to the mysql command would be as quick anyway).

(The article is now relatively old; I'm in the slow process of converting howto's I wrote using flexinode into drupal pages, so the formatting can be re-applied (else they look horrible)).

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img>
  • Lines and paragraphs break automatically.

More information about formatting options

CAPTCHA
We don't take kindly to automated nonsensible adverts around here.