Liquibase bulk upload

It should be possible to speed up Liquibase's loadData CSV import significantly with a few changes.  Currently it loads the entire CSV file into memory before processing it, which is not necessary.  Also, most JDBC drivers support bulk inserts, which we are not taking advantage of either.
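To sketch what that could look like: the loop below streams a CSV one line at a time and hands off fixed-size batches to a sink, rather than reading the whole file first. This is an illustration of the general streaming/batching idea, not Liquibase's actual code; the class name, batch size, and the naive comma split are all assumptions, and in real use the sink would wrap a JDBC `PreparedStatement` with `addBatch()`/`executeBatch()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class StreamingCsvLoader {

    /**
     * Stream a CSV one row at a time and hand fixed-size batches to a sink.
     * The sink is abstracted so the batching logic can run without a database;
     * a real sink would call PreparedStatement.addBatch() per row and
     * executeBatch() per batch. Returns the number of data rows read.
     */
    public static int load(Reader csv, int batchSize, Consumer<List<String[]>> sink)
            throws IOException {
        BufferedReader reader = new BufferedReader(csv);
        reader.readLine(); // skip the header row
        List<String[]> batch = new ArrayList<>(batchSize);
        String line;
        int rows = 0;
        while ((line = reader.readLine()) != null) {
            // Naive split for illustration; quoted fields need a real CSV parser.
            batch.add(line.split(",", -1));
            rows++;
            if (batch.size() == batchSize) {
                sink.accept(batch);            // flush one bounded batch
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);                // flush the final partial batch
        }
        return rows;
    }
}
```

Because only one batch is held in memory at a time, peak memory stays bounded regardless of file size.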


I’m planning to implement those changes, along with anything else profiling turns up, in 2.1, but if anyone wants to submit a patch I would appreciate it.


Nathan

Hi,


We would like to use Liquibase to handle large data inserts. We have dozens of changeset files averaging around 75 MB each. What is the best way to have Liquibase run these changesets? We currently get out-of-memory errors. Performance is also quite slow: it takes over an hour to load a single 75 MB file into a local MySQL database.


As to why we are using Liquibase: its validation and verification are nice features. Since this is static data, it’s convenient to have Liquibase manage all of it.

What is the proper way to release these resources?


So far I’ve tried the following, though the memory is still being held somewhere:


    ExecutorService.getInstance().clearExecutor(database);
    ChangeLogParserFactory.getInstance().getParsers().clear();
    ChangeLogParserFactory.reset();
    cleanup(database);
    database = null;
    liquibase = null;
    resourceAccessor = null;

There isn’t any work in progress so far.  I don’t have a release target for 2.1 yet.  Still making sure 2.0.1 is nice and stable before starting the next round of features.  


I think CSV import can be made considerably faster than XML. It also has the advantage of being more readable (IMHO).  The reason CSV can be better optimized is that Liquibase parses the entire changelog into a single in-memory representation, which is then handed to the changelog executor.  If all the data is in changesets, it all has to be included in that in-memory changelog.  With CSV files, all the changelog itself contains is a reference to the .csv file, which is read and inserted at run time.  It is that reading and inserting of the CSV file that can be optimized.
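For context, this is roughly what such a changelog entry looks like: the changeset holds only a pointer to the CSV file, not the data itself. The table name, file path, and column names below are illustrative, not from the original thread.

```xml
<!-- Hypothetical changeset: only a reference to the CSV lives in the changelog,
     so the bulk data never has to sit in the parsed in-memory changelog. -->
<changeSet id="load-static-data" author="example">
    <loadData tableName="country" file="data/country.csv">
        <column name="code" type="STRING"/>
        <column name="name" type="STRING"/>
    </loadData>
</changeSet>
```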


That being said, it would be worth running Liquibase and your changelog through a profiler, such as the one that comes with NetBeans.  I have found that performance problems rarely come from where you expect.  There could be other parts of the Liquibase code causing the bulk of the problem in your case.  Without profiling, you really don’t know for sure.


Nathan



Thank you. Is there a patch in progress? Perhaps I can take a look and possibly help. 


Also, when is the target for 2.1?

Also wanted to point out: we aren’t using CSV. We are actually using Liquibase XML changeset files. Is there a way this mechanism can be sped up or improved?