What reason is there to use a changeset for each database statement ?

(Firstly, I apologize for my bad English)
(Secondly, congratulations for this fabulous tool)

I don’t understand the reason to write a different changeset for each database statement. Why to put different ids over the same author and the same install/update/rollback activity?

In software version control like svn, I think the changes done by a developer are referred to unique id. These changes can affect to many methods, many classes and many files but while they are committed in the same activity, they always have the same ID.

In my case, I use liquibase Ant tasks to install/update/rollback a DDL software unit and I need to do all the changes (sometimes hundreds of statements) in a transaction, when I use generateChangeLog (to generate liquibase file over hibernate mappings) and diffDatabaseToChangeLog, these processes generate changeSets with theirs different id for each statement.

Problems:

  • It’s very awful and error-prone to generate sql script (updateDatabase / rollbackFutureDatabase) with lots of changes on databasechangelog table.
  • If a DBA wants to view and optimize the generated sql file, I think he is confused to observe each databasechangelog statement (with its different id) and this situation obligates DBA to know about liquibase operation.
  • The databasechangelog table is left in awful and confused state. The table is filled with tens or hundres of rows with only one activity. In my case, I only want to annotate the activity with a simple databasechangelog row (author = DDL software unit name, id = version of the software unit, filename = any).
  • LiquiBase attempts to execute each changeSet in a transaction that is committed at the end, but I want to do all the activity in a transaction, not each statement.

My solution:
To put a option in the liquibase ant tasks (generateChangeLog / diffDatabaseToChangeLog) to can choose the two options:

  • only one changeset for all the activity.
  • a changeset per statement.

The limit of what should be in a changeset is what can be contained in a single transaction.  Each changeset tries to run as a single transaction with the following flow:

  1. start transaction
  2. run all changes
  3. insert into databasechangelog
  4. commit transaction

That works fine until you get the transaction auto-committing during step #2.  For example, if you have a changeSet with two create tables and for some reason the second create table fails, you will be in a state (on most databases) where your liquibase updates will always fail because table 1 has been created in the database and so the create table change will fail, but it cannot be marked as ran because the second table is not created.

If your changes are not auto-committing (normally just insert/update/delete statements) you can include as many of them as you want in your changeSet, and you often want to include many to perform them atomically.  But for generated changelogs, we don’t want to assume what operations are atomic and which are not.  I was thinking that the generateChangeLog actually collapses the insert tags into a loadData tag from csv in a single changeSet, and there should not be any update or delete statements generated.

Does that make sense?

Nathan

Sorry, I am a noob in database.

I didn’t know DDL statements (CREATE - DROP - ALTER) are auto-committed so I didn’t understand why a set of DDL statements cannot be in the same changeSet.

Now, all is clear for me.

Many thanks.