Why does the change log contain the file name?

ekupcik · March 16, 2010, 2:59pm

Hi,

I just figured out that it is a bad idea to use absolute filenames with Liquibase, as they are stored in the change log. When I tried to run an update with the files in a different directory, Liquibase tried to run all the changesets again.

And now I wonder why the ID, the author and the hashcode are not enough. What is the reason for using the file name too?

nvoxland · March 16, 2010, 2:59pm

It is so you do not need to worry about unique ids across multiple changelog files by the same author. Many people simply use an incrementing integer for their id, and that works fine within a single file, but when you have multiple files you need to start worrying about duplicating the id between them. The hashcode is not used as part of the identifier; it is used to detect whether a given changeset has changed since it was originally run against the database.
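A minimal sketch of that bookkeeping, as a toy model rather than Liquibase's actual code: the identifier is modelled as an (id, author, path) tuple and the checksum is stored alongside it only to detect edits. The names `status` and `mark_ran`, the author "alice" and the file names are hypothetical.

```python
# Toy model of the "already ran" bookkeeping; not Liquibase's real code.
ran = {}  # (id, author, path) -> checksum recorded when the changeset ran


def status(changeset_id, author, path, checksum):
    """Classify a changeset against the toy table."""
    key = (changeset_id, author, path)
    if key not in ran:
        return "not yet run"
    if ran[key] != checksum:
        return "changed since it was run"
    return "already run"


def mark_ran(changeset_id, author, path, checksum):
    """Record a changeset as executed."""
    ran[(changeset_id, author, path)] = checksum


# Two files by the same author can both use id "1" without clashing.
mark_ran("1", "alice", "tables.xml", "abc")
first = status("1", "alice", "tables.xml", "abc")
other_file = status("1", "alice", "indexes.xml", "def")
# The checksum is not part of the key; it only flags an edited changeset.
edited = status("1", "alice", "tables.xml", "xyz")
print(first, "|", other_file, "|", edited)
```

In this model the path is what keeps id "1" in `indexes.xml` from being mistaken for id "1" in `tables.xml`.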

Nathan

ekupcik · March 16, 2010, 2:59pm

But doesn’t it mean that it is a bit dangerous to use Liquibase on files that are in the filesystem and not in the classpath?

User 1:

cd …/app_home/bin
…
liquibase …/changelog/update.xml

User 2:

cd …/app_home/bin
…
cd …/changelog
liquibase update.xml

Bad idea…

The documentation says that the filename is part of the signature. It says nothing about the path (absolute or relative). This becomes less of a problem when the XML is contained in a JAR file. But even then it might become a problem if you want to refactor something (rename the package, split/merge files, etc.). So I wonder whether it would make sense to add a switch that allows you to ignore the filename and/or update it. Maybe something like --clearCheckSums? I understand that this wouldn’t work for those who haven’t used unique IDs, but it would solve my problem.
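The two-user scenario can be sketched as follows, assuming the path is recorded exactly as it was typed on the command line. The path strings and the author name are hypothetical stand-ins, since the real paths in the example are elided.

```python
def identifier(changeset_id, author, path_as_given):
    # Toy model: the path is stored exactly as passed on the command line.
    return (changeset_id, author, path_as_given)


# User 1 runs from the bin directory, user 2 from the changelog directory.
user1 = identifier("1", "ekupcik", "../changelog/update.xml")  # hypothetical
user2 = identifier("1", "ekupcik", "update.xml")               # hypothetical

recorded = {user1}                   # what the tracking table holds after user 1
runs_again = user2 not in recorded   # user 2's changeset looks new
print(runs_again)
```

The changeset itself is identical in both runs; only the string used to name the file differs, and that is enough for it to be treated as a new changeset.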

nvoxland · March 16, 2010, 2:59pm

You can always include any arbitrary directory in the filesystem into your classpath and allow them to be relative. I updated the documentation to say “path” not filename, that is a good suggestion.

On the root element, there is a logicalFilePath attribute. That attribute allows you to control what liquibase considers to be the filename/path of the file, regardless of the physical location. So if you ever need to move a changelog file for some reason, you just need to set the logicalFilePath attribute to the old filename/path.

Splitting and merging files gets trickier. I thought about adding a logicalFilePath attribute to the changeSet tag, but if you have large changelogs that would be too many to set and would be too error prone. My theory has been that there is not really any need to split or merge changelogs. If you feel yours is getting too big, simply create a new one and use the <include> tag. If you really wanted to, there is a precondition you can use to control whether a changeSet runs based on the old id/author/filepath.

Also, remember the databasechangelog table is a regular table in your database and you can do whatever you feel safe doing to it outside liquibase as a last resort. ClearCheckSums simply sets the md5sum column to null. In your case, you could run some update statements to set the filename where the filename=what it used to be.
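The kind of update statement meant here can be tried out on a throwaway table first. This is a sketch using an in-memory SQLite table that mimics only three columns of databasechangelog; the rows and both paths are made up for illustration, and a real table should be backed up before being edited by hand.

```python
import sqlite3

# In-memory stand-in for the tracking table; only three of its columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE databasechangelog (id TEXT, author TEXT, filename TEXT)"
)
conn.executemany(
    "INSERT INTO databasechangelog VALUES (?, ?, ?)",
    [
        ("1", "alice", "/old/dir/update.xml"),  # hypothetical old path
        ("2", "alice", "/old/dir/update.xml"),
        ("1", "alice", "other.xml"),            # unrelated file, left untouched
    ],
)

# Point the existing rows at the new path so they match future runs.
cur = conn.execute(
    "UPDATE databasechangelog SET filename = ? WHERE filename = ?",
    ("changelog/update.xml", "/old/dir/update.xml"),
)
updated_rows = cur.rowcount
filenames = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT filename FROM databasechangelog")
)
print(updated_rows, filenames)
```

Only rows whose filename equals the old value are rewritten, so changesets recorded under other files keep their identifiers.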

Nathan

ekupcik · March 16, 2010, 2:59pm

Thanks for the information

The reason why I wanted to split/merge files is not that I am afraid of the size. I am already using includes and one file per DB schema version. But because of the way validation works today (http://liquibase.jira.com/browse/CORE-508), I wanted to use a workaround where I move certain changes to separate changelog files and use multiple main changelogs:

version-1.xml
version-1-oracle.xml
version-2.xml
…

main.xml includes version-1.xml, version-2.xml, …
main-oracle.xml includes version-1.xml, version-1-oracle.xml, version-2.xml, …

I wanted to merge the files again once the problems with the preconditions and validation are solved. But it looks like this is not possible, or at least not as easy as I thought it would be. But I’ll have a look at the things that you suggested.

ekupcik · March 16, 2010, 2:59pm

It looks like “logicalFilePath” is exactly what I need. Thanks for pointing me in the right direction.

Now I just wonder whether it could have any side effects if I use the same logical path in multiple files. I already use a naming convention for the changeset IDs that ensures that they are unique. I ran a short test and everything works fine, but I think I’d better ask.

nvoxland · March 16, 2010, 2:59pm

It should be fine having the same logicalFilePath in multiple files. It should only be used for determining what is queried and stored in the databasechangelog table.

Nathan

pkabus · March 16, 2010, 2:59pm

I am new to Liquibase and while playing around with it I soon ran into the problem that the same changes were applied twice because of different file paths.
I see that there is a way to prevent this, but I still think using the path as a part of the change identifier is a rather dangerous default. We have many people applying changes to our databases ( developers working on development DBs, DBAs working on testing, consolidation and production DBs ) and if one of them forgets to set the path correctly, this could be quite dangerous to our data.
Or am I missing something?

taranenko · March 16, 2010, 2:59pm

Pat,
apply some organizational effort and set up a central repository to keep all database-related activity in one place, i.e. create a separate project under SCM (svn/git/etc.). Require everyone to add a script to the SCM before executing any sql/liquibase script. Let the SCM use post hooks so that you are notified about newly added scripts. At that point you can manually check pending scripts and allow or forbid their execution. If you have a huge project, you can also automate the checking of the relevant attributes of the scripts.

Hope this helps, Oleg

pkabus · March 16, 2010, 2:59pm

Oleg,

there is no way for us to enforce that all DBAs always submit their changes to an SCM to check them before they are applied. They will hopefully agree to use a tool like liquibase, since it will make their life easier. Anyway, writing automatic attribute checks is probably more complex and error-prone than simply modifying the source code of liquibase.
I still don’t see why this behavior is not configurable and I haven’t seen any project where including the path is beneficial rather than dangerous.

nvoxland · March 16, 2010, 2:59pm

Even if we didn’t use the file name as part of the identifier, you would probably still run into the problem you are describing because the id and author tags would be different, and so liquibase would still see them as different changes. Liquibase does not do any sort of “does this look like this type of change has already been applied” logic and assumes that you have come up with a process that works for you to manage your change flow. It just has an identifier for each changeSet and checks whether that identifier has been run. The id/author/filename format is set up to ensure that duplicate identifiers are not accidentally created. It is sort of a manual GUID algorithm.

Nathan

pkabus · March 16, 2010, 2:59pm

AFAIK ( tell me if I’m wrong ) the file name is set automatically every time you apply an update if you do not override it. So if you forget to specify it explicitly, it will very likely be different every time someone else performs an update. The main reason why we want to use Liquibase is so that updates that are included in different branches are not applied twice on the same schema. If we include a change in a hotfix patch for a production database, we also need to include it in the development branch. Once the development branch goes into production, we need to apply all changes. But of course the changes which have already been applied as a hotfix should not be applied again.
So if two different people apply the hotfix and the changes for the new version, the updates will be applied twice.
However, ID and author do not change automatically, so I don’t see a problem here.

taranenko · March 16, 2010, 2:59pm

Thinking about how this collision could be resolved without breaking existing environments, I propose implementing a new option at the execution level of liquibase.

liquibase ... update ... -DlogicalFilePath=stripPath would strip the path from the real file name and use the bare name as the "third" coordinate for every input changelog file. A logicalFilePath attribute explicitly defined in the XML takes precedence, of course.

liquibase ... update ... -DlogicalFilePath=fullPath is the current and default behavior.
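A rough sketch of how the proposed switch could resolve the stored path. The stripPath/fullPath option exists only as a suggestion in this thread, and the function name `stored_path` and the example paths are hypothetical.

```python
import posixpath


def stored_path(physical_path, logical_file_path=None, mode="fullPath"):
    """Path that would be recorded under the proposed option (hypothetical)."""
    if logical_file_path:
        # An explicit logicalFilePath attribute in the XML takes precedence.
        return logical_file_path
    if mode == "stripPath":
        # Keep only the bare file name, dropping every directory component.
        return posixpath.basename(physical_path)
    if mode == "fullPath":
        # Current behaviour: record the path as given.
        return physical_path
    raise ValueError(f"unknown mode: {mode}")


a = stored_path("/home/u1/db/update.xml", mode="stripPath")
b = stored_path("/srv/app/changelog/update.xml", mode="stripPath")
c = stored_path("/home/u1/db/update.xml")
d = stored_path("/home/u1/db/update.xml", "db/update.xml", mode="stripPath")
print(a, b, c, d)
```

With stripPath the two users in different directories end up with the same stored name, at the cost that two different files with the same bare name would also collide.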

Oleg

pkabus · March 16, 2010, 2:59pm

I think using the file path as part of the id should be an option for those who need it.
Under the assumption that people make mistakes ( and I think this is a valid assumption and “making no mistakes” is not something you can enforce by any policy ), the default behavior shouldn’t pose the risk of somebody corrupting your data by mistake.
Applying an update twice may corrupt your data ( not all updates are idempotent ) and in the worst case it may go unnoticed since it doesn’t necessarily produce a visible error. Until you find out it may be too late.
If by default the path wasn’t a part of the id and you forgot to specify it explicitly, the update would simply fail because of a duplicate id. Nothing breaks, you could easily fix the problem and apply the update again.
So as far as I can see, the current behavior is quite dangerous. By making the file path an optional part of the id you don’t lose any functionality but significantly improve robustness.

taranenko · March 16, 2010, 2:59pm

Originally posted by: Pat
So as far as I can see, the current behavior is quite dangerous. By making the file path an optional part of the id you don't lose any functionality but significantly improve robustness.

Generally I agree with you. But excluding the file name entirely may also be dangerous. Do not forget about change sets that have the attribute runOnChange=“true”. This attribute switches off the hash sum checking and may accidentally fire undesired change sets from another file without notifying the user. That is a much more elusive and worse bug.

For my part, I’d suggest making liquibase … update … -DlogicalFilePath=stripPath the default behavior.

liquibase … update … -DlogicalFilePath=fullPath would remain as an option so as not to break the environments that really need it.
As for me, I always set logicalFilePath explicitly.

Maybe add a poll to count the LB users who rely on the full path to distinguish change sets?

Nathan, your opinion?

nvoxland · March 16, 2010, 2:59pm

Your file name+path should never need to change, especially if you are using changelog files in a classpath style or relative path setup. If you use absolute paths you can get into trouble, which is why they are generally not recommended.

Rather than modify the liquibase core to support stripped pathnames, I would suggest creating it as an extension (http://liquibase.org/extensions). You should be able to just override the MarkChangeSetRanGenerator class, and maybe a method or two on the database classes to implement it. Then anyone who does not want the paths in the changelog file can just include the extension jar in their classpath.

Nathan

pkabus · March 16, 2010, 2:59pm

using changelog files in a classpath style or relative path setup

Probably that should be explained in the "Best Practices" section, I am not exactly sure how to do that.

nvoxland · March 16, 2010, 2:59pm

Good point, I’ll get that added in.

The classpath idea comes from java, where libraries and classes are deployed in a folder or set of folders which are referenced by the virtual machine when it starts up. Within the code, you reference classes and files with relative paths and java checks all the folders in the classpath for the file.

With liquibase, you can use the --classpath parameter to set which directories to check for referenced changelogs. For example, if you have a changelog file at /home/asdf/db/changelog.xml, you can run “liquibase.bat --classpath=/home/asdf/db --changeLogFile=changelog.xml update” and your database will be updated with the changelog.xml file in /home/asdf/db, but the filename used in the databasechangelog table is “changelog.xml”. You could also call “liquibase.bat --classpath=/home/asdf --changeLogFile=db/changelog.xml update” and you will get an update with “db/changelog.xml” as the filename.
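The lookup described above can be sketched like this. It is a toy model of classpath-style resolution, not Liquibase's resource loading code; the function name `resolve` is hypothetical and a temporary directory stands in for /home/asdf.

```python
import os
import tempfile


def resolve(classpath_roots, relative_name):
    """Return (stored_name, physical_path) for the first root holding the file."""
    for root in classpath_roots:
        candidate = os.path.join(root, *relative_name.split("/"))
        if os.path.isfile(candidate):
            # The relative name is what gets recorded, not the physical path.
            return relative_name, candidate
    raise FileNotFoundError(relative_name)


with tempfile.TemporaryDirectory() as home:
    # Build <home>/db/changelog.xml as a stand-in changelog file.
    os.makedirs(os.path.join(home, "db"))
    with open(os.path.join(home, "db", "changelog.xml"), "w") as f:
        f.write("<databaseChangeLog/>")

    # Root at <home>/db: the stored name is just the file name.
    name_a, _ = resolve([os.path.join(home, "db")], "changelog.xml")
    # Root at <home>: the stored name includes the db/ prefix.
    name_b, _ = resolve([home], "db/changelog.xml")

print(name_a, name_b)
```

The same physical file yields two different stored names depending on the chosen root, so a team needs to agree on one root and keep it.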

How you choose your root folder(s) for the classpath would depend on your particular needs and project structure.

Nathan

pkabus · March 16, 2010, 2:59pm

I was just asking because I had already tried using the classpath but it didn’t work for me. But now I know the reason: I listed both the JDBC driver JAR and the changelog dir in the classpath. I tried to separate the paths with “;” and “:” but neither worked. If I put the driver JAR into the lib directory so that I don’t need to include it in the classpath, it works.

ljnelson · March 16, 2010, 2:59pm

Ancient thread, I know, but I wanted to add to this discussion the little-known (and possibly undocumented?) fact that logicalFilePath is also present on the changeset itself.

Best,
Laird

–
http://about.me/lairdnelson
