load data csv performance

joshjdevl · February 21, 2011, 12:58am

While using liquibase loaddata (http://www.liquibase.org/manual/load_data) we are noticing performance degradation. For a 30-40mb file it takes upwards of 1.5 - 2 hours to load to a local mysql instance. If we use mysql load data infile it takes less than a minute.

What can be done to tune/enhance this process? The attraction of using liquibase is that it works across multiple databases. Though we are forced to fall back to native vendor specific sql due to the amount of data we need to bulk load.

nvoxland · February 21, 2011, 12:58am

There are the changes listed in your other thread (http://forum.liquibase.org/#topic/49382000000024035) around not loading the entire CSV into memory first and/or doing batch inserts.

The slowdown may also be due to indexes on the table. Standard inserts (which liquibase uses currently) evaluate the indexes and constraints after each index whereas load_data will load it all in and then run the constraint checks and index building. You may be able to improve performance considerably by adding a call before and after the load data to disable/enable constraints and indexes. It would be good to get those calls into the standard loadData tag, but they are not in there currently.

Nathan

joshjdevl · February 21, 2011, 12:58am

Thank you for your reply. We really appreciate the features of liquibase.

My concern is what is happening under the hood. I’ve tried dropping the unique constraints and foreign keys, though to no avail.

Because several databases (oracle, mysql, derby,etc) offer commands to load bulk data, my expectation would be that liquibase maps to the vendor db’s command. Though that doesn’t seem to be happening. I just am curious how much optimization can be done on the liquibase side to have performance on par with the db vendor sql. At least looking at the code I didn’t see for example mysql “load data infile” anywhere.

Is there a particular reason that the db vendor sql isn’t executed rather than liquibase doing it’s own insert statements?

nvoxland · February 21, 2011, 12:58am

It’s mainly done because (prior to 2.0) vendor-specific versions were difficult to manage, so I tried to keep everything as standard as possible. Now that 2.0 is out, we can go back and start adding in more vendor-specific versions either in the liquibase core or as an extension.

There may be some logic in the loadData, however, such as supporting changeSet parameters in the CSV file and converting values such as dates to the correct database type that requires at least some pre-processing of the file before passing it to the database. Nothing that cannot be done, just not something I have gotten to yet, unfortunately.

Nathan

Topic		Replies	Views
liquibase bulk upload General Discussion	5	3068	February 15, 2011
Liquibase Loading Data Performance General Discussion	2	1692	May 16, 2019
Loading large CSV files is hugely inefficient and slow Liquibase Development	1	1026	June 22, 2016
Maximum csv file size for loadData General Discussion	2	2508	March 11, 2015
What does loadData utility uses for Oracle? Liquibase Development oracle , sql , loaddata	6	1004	September 29, 2022

load data csv performance

Related topics