load data csv performance

While using liquibase loaddata (http://www.liquibase.org/manual/load_data) we are noticing performance degradation. For a 30-40mb file it takes upwards of 1.5 - 2 hours to load to a local mysql instance. If we use mysql load data infile it takes less than a minute.


What can be done to tune/enhance this process? The attraction of using liquibase is that it works across multiple databases. Though we are forced to fall back to native vendor specific sql due to the amount of data we need to bulk load.

There are the changes listed in your other thread (http://forum.liquibase.org/#topic/49382000000024035) around not loading the entire CSV into memory first and/or doing batch inserts.


The slowdown may also be due to indexes on the table.  Standard inserts (which liquibase uses currently) evaluate the indexes and constraints after each index whereas load_data will load it all in and then run the constraint checks and index building.  You may be able to improve performance considerably by adding a call before and after the load data to disable/enable constraints and indexes.  It would be good to get those calls into the standard loadData tag, but they are not in there currently.


Nathan

Thank  you for your reply. We really appreciate the features of liquibase.


My concern is what is happening under the hood. I’ve tried dropping the unique constraints and foreign keys, though to no avail.

Because several databases (oracle, mysql, derby,etc) offer commands to load bulk data, my expectation would be that liquibase maps to the vendor db’s command. Though that doesn’t seem to be happening. I just am curious how much optimization can be done on the liquibase side to have performance on par with the db vendor sql. At least looking at the code I didn’t see for example mysql “load data infile” anywhere.


Is there a particular reason that the db vendor sql isn’t executed rather than liquibase doing it’s own insert statements?

It’s mainly done because (prior to 2.0) vendor-specific versions were difficult to manage, so I tried to keep everything as standard as possible.  Now that 2.0 is out, we can go back and start adding in more vendor-specific versions either in the liquibase core or as an extension.  


There may be some logic in the loadData, however, such as supporting changeSet parameters in the CSV file and converting values such as dates to the correct database type that requires at least some pre-processing of the file before passing it to the database.  Nothing that cannot be done, just not something I have gotten to yet, unfortunately.


Nathan