Wednesday, November 19, 2008

DB optimization

Someone asked me to help him optimize a SQL operation. He said he is using SQLite database but his query is very slow. He tried indices as well, but it takes 1 hour to insert 20 users into his database. It's just 1 minute to insert 7 users without indices. Finally he sent me a query that runs awfully slow (for hours):

SELECT SUM(size) AS s FROM (SELECT DISTINCT hash, size FROM files);


If you define an index on a column, it takes time to maintain it. In this case it took a lot of time to insert new data into the index structure (usually something like a B* tree). He had to insert hundreds of users in one batch. It can worth disabling or deleting indexes before inserting such an amount of new records once. You can recreate or regenerate the indices after running the whole batch.


The SQL query above ran very slow because of the DISTINCT keyword. I advised to try it without DISTINCT and the query ran very fast.


Keep in mind that indices can make insert operations very slow and projections and joins can make a select very slow. It worths to measure performance and use real usage statistics to tune your database and queries.