Creating insert data without hardcoded IDs

Hey all,

I would like to allow Liquibase to fully manage not only database migrations, but also the data within it.

I have a question pertaining to the use of keys and foreign keys when using Liquibase.  Say I have a table ‘b’ that references another table ‘a’.  In insert syntax, I have to do something like this in changeSet id=‘1’:
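Something along these lines (column names invented for the example):

```xml
<changeSet id="1" author="example">
    <insert tableName="a">
        <column name="id" valueNumeric="1"/>
        <column name="name" value="first"/>
    </insert>
    <insert tableName="b">
        <!-- hardcoded foreign key pointing at a.id = 1 -->
        <column name="id" valueNumeric="1"/>
        <column name="a_id" valueNumeric="1"/>
    </insert>
</changeSet>
```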

That’s fine for initial production data.  But say that tables a and b will, once running in production, accumulate more data that ‘pollutes’ the available IDs.

So say in changeSet 2 I want to add another entry to table a, and also add an entry to b that references that new row in a.  How would I know what id to use, safely?

I’m trying to avoid using SQL directly.


I have usually solved this problem by dropping to SQL.  Something like “insert into b (a_id) select id from a where …”
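In changelog form that looks something like the following (the where clause and the rollback are made up for illustration):

```xml
<changeSet id="2" author="example">
    <sql>
        insert into b (a_id)
        select id from a where name = 'first'
    </sql>
    <rollback>
        delete from b where a_id = (select id from a where name = 'first')
    </rollback>
</changeSet>
```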

We don’t support storing state from one change to the next since we need to support updateSQL mode where changes will not be run against the database at all.

You could create a custom change or extension that selects max(id) from the table before inserting, or it may work to pass valueNumeric="(select id from a where…)" as the column value.
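A sketch of the parameter approach (spelled with the valueComputed attribute here, which passes the subselect through as raw SQL; names invented):

```xml
<changeSet id="2" author="example">
    <insert tableName="b">
        <!-- the subselect is emitted as-is into the generated insert -->
        <column name="a_id" valueComputed="(select id from a where name = 'first')"/>
    </insert>
</changeSet>
```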


Hey Nathan,

Thanks for the suggestion.

I think the best solution for me, given that I want to avoid SQL, is to create a custom precondition responsible for inserting the right data… that way I can leverage my whole database service layer plus Hibernate to populate the db.

I’d be curious what you think.


A precondition is probably not what you are looking for.  They are designed to be executed before the changes and simply return a true or false value.  A custom Change class would allow you to have access to your database and would allow inserting data. 

I’m not sure whether you will be able to leverage your data service layer or Hibernate, as most of the time Liquibase needs to run first in order to get the database into the state Hibernate expects.  You may be able to, depending on your usage, however.


I understand what you are saying now, thanks.  Aside from preconditions just being the wrong tool, there would have been no way to do rollbacks.

I see that the CustomChange interface wants SqlStatement[];  my Hibernate powers are not great enough to generate SQL statements instead of actually communicating with the database. 

So, I’m thinking of flipping the responsibility of control to my own code. 

Instead I’m thinking of using the core Liquibase object; specifically, using ‘update(int changesToApply, String contexts)’ and ‘listUnrunChangeSets(String contexts)’ to iterate through changes, with my own mechanism for knowing when to do custom, Hibernate-driven inserts.

I’d also have to implement the same for rollback, but it’s all possible.

As always, I would be interested to hear what you think about the approach =P

The SqlStatement array is a Liquibase construct.  There are a bunch of built-in ones like InsertStatement and CreateTableStatement, or you can just do new RawSqlStatement(“insert into …”); Liquibase handles sending it to the database and so on.  There is also a CustomTask interface that simply passes you the database object so you can do whatever you want against the database directly.

I believe there is also a mechanism for defining how to generate rollback statements, but I don’t have the docs or code in front of me at the moment.

Using the Liquibase object should work as well.  We have tried to give you enough control to do whatever you want.  If you can get your code to work within the standard Liquibase update cycle, though, it would mean less external coding for you.

The upcoming 2.0 release (beta 2 is currently available) gives still more control, allowing you to override the normal insert change and create full Change implementations, not just CustomChange implementations.  The final release will hopefully land between the end of the month and early September.


Thanks Nathan,

Since we have so much time and engineering invested in our Java APIs around the data layer, it makes a lot of sense for us to use Hibernate to populate the data.  I have no idea how to go from Hibernate method calls to SqlStatements, or if it’s even possible.

So, the ‘I control liquibase’ approach probably makes the most sense.  Also, between ‘update(1, context)’ calls a transaction opens and closes, which is actually good; that way, when I issue a Hibernate call, I operate within my own transaction after Liquibase has done its work.


Sounds good.  That is why we try to provide a good API: there are so many variations of what works well for different people.



I think this was a bad path to go down. 

The reason is this… the data layer APIs we build are of course not versioned; they are implicitly the latest version of the product (HEAD/trunk, whatever).  So if in changeSet 1 I do some Hibernate calls, that’s fine and dandy.  But if in changeSet 2 I make some schema changes, that implies I also changed my Hibernate layer to mirror those schema changes.  So then the Hibernate API calls in changeSet 1 are wrong, because they expect the schema as it stands after changeSet 2.

Guess I just need to go do some SQL.

That was what I was concerned about.  SQL isn’t that bad…

Well, I’ll say this about SQL:

  1. Hardcoded IDs are an issue, as mentioned at the beginning of the post.  (I wish I could give any insert statement a ‘friendly’ name that is a unique key for that insert and usable in other XML statements in Liquibase.)
  2. With raw SQL, you must be careful about database inconsistencies (i.e., I’d sure like Hibernate to do that for me. That’s not a trivial thing necessarily…)
  3. For test data, if you want to generate a bunch of rows using some sort of algorithm, you first have to generate SQL with your algorithm.  Not straightforward compared with using data layer APIs.
  4. Rollback isn’t for free (but to be fair, the same is true with using a data layer API)
  5. Certain SQL statements aren’t easy to hand-write.  For instance, a salted, hashed password may be handled perfectly by your data layer API, but by hand you end up running your data layer API once and grabbing the resulting values for the statement via the debugger or some other way.

I like what Liquibase does; don’t get me wrong.  This is a sore point with any db tool, as far as I’ve seen.

Three alternative suggestions:

  • Start numbering from the tail end of your primary key range, to avoid conflicts with auto-increment.
  • Start auto-increment at some ‘big enough’ number, like 100000, so that IDs 0–100000 are under your control.
  • Create a column on every table that’s basically a change-set handle; its sole purpose is to uniquely identify rows for change sets.  In MySQL, if this were varchar(255) (which I’d really like, so that change sets stay readable), then any row that doesn’t use it (rows created programmatically) would waste 1 byte per row.  Not perfect, but perhaps not terrible for many apps.
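The second suggestion is a one-liner on MySQL (the starting value is arbitrary):

```xml
<changeSet id="3" author="example" dbms="mysql">
    <sql>alter table a AUTO_INCREMENT = 100000</sql>
</changeSet>
```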

Hopefully my last post on this topic, with the solution we should be taking:

We will create a table solely for this problem scenario, call it say ‘CHANGE_HANDLES’.

column ‘entity_id’ bigint not null
column ‘handle’ varchar(255) unique not null

The idea would be that, when you insert into an auto-increment table, you then grab that generated ID in a second insert (maybe via MySQL’s LAST_INSERT_ID, or some other way of getting the latest ID, probably and unfortunately db-specific) and save it in ‘entity_id’ with a ‘handle’ of your choosing.  From then on, ‘select entity_id from change_handles where handle = ‘SOME_ARBITRARY_NAME’’ can serve as the value for inserts that need to refer to this entity ID, and rollback statements can use it too.
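Sketched out in MySQL terms (handle name and columns are arbitrary, and this assumes the statements in a changeSet run on the same connection, which LAST_INSERT_ID requires):

```xml
<changeSet id="4" author="example" dbms="mysql">
    <insert tableName="a">
        <column name="name" value="second"/>
    </insert>
    <!-- record the generated ID under a friendly handle -->
    <sql>
        insert into change_handles (entity_id, handle)
        values (LAST_INSERT_ID(), 'A_SECOND_ROW')
    </sql>
    <!-- later inserts (or rollbacks) look the ID up by handle -->
    <insert tableName="b">
        <column name="a_id"
                valueComputed="(select entity_id from change_handles where handle = 'A_SECOND_ROW')"/>
    </insert>
</changeSet>
```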

We decided not to normalize this table, so that we can have just one table referring to the ID of any other table in the system (I really don’t want to create one of these tables per table in the db just for this change-set ID issue).

I doubt it’s the last post; it is a very difficult problem. :)

The CHANGE_HANDLES table isn’t a bad idea, but I generally don’t like requiring additions to the schema to support liquibase features, though there are times it is necessary.  As an extension rather than a core feature, I feel much better about putting requirements on end users, because it is their choice to use the extension or not.

If you are implementing something like this and could add it to the extension portal, that would be great. I know it is a problem many people have.  If you cannot, I will keep it in mind and maybe try to implement something around it in a few months.