The changeset uses loadData to import a CSV file. The CSV I’m importing is 650 MB. Whenever I run the update, I get an OutOfMemoryError (after a couple of hours of waiting).
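For context, the generated changelog looks roughly like this (the table, column, and file names below are placeholders rather than my real schema):

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.3.xsd">

    <changeSet id="1426000000000-1" author="generated">
        <!-- single loadData changeset referencing the 650 MB CSV produced by generateChangeLog -->
        <loadData tableName="big_table" file="big_table.csv">
            <column name="id" type="NUMERIC"/>
            <column name="payload" type="STRING"/>
            <column name="created_at" type="DATE"/>
        </loadData>
    </changeSet>

</databaseChangeLog>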
Both the changeset and the CSV were generated with generateChangeLog. So full marks to Liquibase for at least being able to extract the large table!
I’m running with -Xmx2048m and Liquibase 3.3.2.
I’m considering breaking the CSV into multiple smaller files and multiple changesets, but I’m unsure how small the CSVs would need to be. Does anyone have any information on the largest CSV file size that loadData can handle?
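In case it helps clarify what I mean, the split would just be several changesets, each pointing at its own chunk of the data (file names here are hypothetical; loadData can pick the column names up from the CSV header, so I’ve omitted the column elements):

    <changeSet id="big_table-data-part1" author="example">
        <loadData tableName="big_table" file="data/big_table_part1.csv"/>
    </changeSet>

    <changeSet id="big_table-data-part2" author="example">
        <loadData tableName="big_table" file="data/big_table_part2.csv"/>
    </changeSet>

    <!-- one changeset per chunk, each small enough to stay within the heap -->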
Has anyone else encountered and resolved this kind of problem?
Alternatively, I can exclude this table from my Liquibase scripts, as it’s the only one causing issues.
The reason for doing this is that I’m migrating a legacy application to Liquibase, and it would be convenient if I could build the entire database from Liquibase rather than using multiple tools. The main benefit Liquibase gives me is versioning with changesets, and an import run from another tool won’t fit easily into the changeset model.
Currently my legacy application has many unmanaged and unversioned database snapshots, which makes it very difficult to run repeatable CI cycles of develop, build, test, release, and deploy.
Like you say, there are more efficient ways to load large datasets, and I may need to take that approach instead.
Seems like something that should and could be fixed in Liquibase.
Just curious - why are you using Liquibase to load that much data? What is the use case?
I would recommend using native tools if possible when loading that much data into a database. Liquibase is really intended for managing the structure of a database rather than its contents. It can work with data as well, but it is mainly meant for small data sets - tables of constants, for example, or small test fixtures. Because Liquibase works at the JDBC level, it ends up generating tons of individual INSERT statements, which is an extremely inefficient way to load bulk data.
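If you do want to keep the load inside the changelog so it stays versioned with everything else, one compromise is to wrap the database’s native bulk-load command in a plain sql changeset. This is only a sketch assuming PostgreSQL (COPY runs on the server, so the file has to be readable by the database server process; the table name and path are placeholders) - other databases have their own equivalents, such as MySQL’s LOAD DATA INFILE or SQL Server’s BULK INSERT:

    <changeSet id="bulk-load-big_table" author="example">
        <comment>Use the database’s native loader instead of row-by-row INSERTs</comment>
        <sql>COPY big_table FROM '/var/lib/import/big_table.csv' WITH (FORMAT csv, HEADER true)</sql>
    </changeSet>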