Looking for Feedback on Liquibase 4.0 Roadmap

While I understand the convenience of

  • “Move non-core database support to extensions
    • What are core databases? I would suggest mysql, pgsql, oracle, mssql, db2”

I would consider pushing ALL DBMS support to extensions. This may make it easier for installations to create their own rules around whatever DBMS they use, and also make it easier to support major versions of a DBMS product as separate extensions.

As I am wrapping up 3.3.x changes and looking forward to what changes and features people are asking for and what I would like improved, I am thinking it is time to look at larger changes around a Liquibase 4.0 release.

My most-likely overly-ambitious goal for Liquibase 4.0 is to do a major housecleaning of the Liquibase codebase to simplify it while increasing testability and test coverage. There have been changes in scope and requirements over the last 7 years and the codebase is getting a bit over-complex and haphazard. This makes it harder for me to maintain and a barrier to entry for contributors.

Previously I thought about trying to break this into a few smaller blocks/themes and focus on one for 3.4, one for 3.5, etc. until they are done and have 4.0 be just the final cleanup “no more changes” release. After doing some work with the testing and the snapshot logic, however, I think they all bleed into each other enough to make it much easier to do a single major release that just breaks everything.

The jump to 4.0 will signify compatibility-breaking API changes, but there should be no changes to the changelog format: 4.0 should be a drop-in replacement for anyone just using a changelog and not writing extensions etc.

The major themes/changes I’m looking to make are:

Improvement of how state is managed and high-level functions are called

Currently there is the liquibase.Liquibase façade object that wraps many common functions with method parameters, a variety of “configuration” objects (such as DiffOutputControl), and a heavy use of singletons. Ant/maven/command line etc. call out to the Liquibase façade as best they can.

The problems with this setup include:

  • Liquibase façade is getting overly large and complex
  • Difficult to add/override standard command functionality in extensions
  • There is a lot of duplicated setup/validation logic in ant/maven/command line/etc. before calling the façade objects
  • Singletons are not always cleaned up between calls and/or run into each other
  • Difficult/impossible for extensions to rely on configuration that does not happen to be on the method parameters passed along
  • Method signatures get long but still don’t have all parameters we sometimes need

Planned changes:

  • Switch to using “Command” objects in favor of a monolithic Liquibase façade
    • Allows command logic to be encapsulated and more easily extended including validation and setup
    • Allows new commands to be written within extensions and exposed through command line, maven, etc.
  • Create a hierarchical “Scope” object
    • Works similar to AngularJS $scope where a root container object is created and passed along to sub-methods.
    • Along the call chain, new attributes can be added that are only visible to methods further down the call chain
    • Root Scope object created as part of the Command execution and builds from there
    • Replace use of singletons with objects added to the Scope
    • Ideally we can access the scope object without needing to include it in every method signature, but unsure on a good implementation yet
    • Configuration objects available from Scope object
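
To make the Command and Scope ideas above a bit more concrete, here is a minimal sketch of how they might fit together. Everything in it is an assumption for illustration: the `Scope`, `Command`, and `UpdateCommand` names, the `child`/`get` methods, and the attribute keys are hypothetical and are not existing Liquibase API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hierarchical scope: lookups fall back to the parent scope.
final class Scope {
    private final Scope parent;
    private final Map<String, Object> values = new HashMap<>();

    Scope(Scope parent) {
        this.parent = parent;
    }

    // Creates a child scope; the new attribute is invisible to the parent.
    Scope child(String key, Object value) {
        Scope child = new Scope(this);
        child.values.put(key, value);
        return child;
    }

    // Returns the nearest value for the key, or null if no scope defines it.
    Object get(String key) {
        if (values.containsKey(key)) {
            return values.get(key);
        }
        return parent == null ? null : parent.get(key);
    }
}

// Hypothetical command contract: validation and execution live together.
interface Command {
    void validate(Scope scope);

    String execute(Scope scope);
}

final class UpdateCommand implements Command {
    @Override
    public void validate(Scope scope) {
        if (scope.get("changeLogFile") == null) {
            throw new IllegalArgumentException("changeLogFile is required");
        }
    }

    @Override
    public String execute(Scope scope) {
        validate(scope);
        // A nested step sees everything above it plus its own attribute.
        Scope stepScope = scope.child("currentChangeSet", "1::nathan");
        return "update " + stepScope.get("changeLogFile")
                + " @ " + stepScope.get("currentChangeSet");
    }
}
```

In this sketch the command line, Maven, or Ant front end would only need to build a root scope and hand it to a command, so the setup and validation logic lives in one place.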

Improved Testing/Testability

Liquibase currently doesn’t have a great way to handle testing of the interaction with the database. Traditionally, I’ve had the liquibase-integration-tests module which mainly uses a set of changeSets with example changes and scenarios and preconditions to test that they ran successfully.

Problems with this setup include:

  • Slow to execute
  • Contributors need special database setup to test and so cannot normally run the tests to validate their changes

  • Preconditions are not designed to be an assertion library
  • No structure to tests to know what is tested and what is not

Beyond the database interaction tests, there is some unit test coverage but not enough.

Planned changes:

  • Fully implement my new “VerifiedTest” framework. The idea is that each test creates a simple text description of the interaction (such as the SQL string to execute) and a way to validate the test passed. Previous test run text descriptions and the results are stored in a markdown-formatted file. On each test run, the text description of the code is compared to the last run and if they are the same (most of the time) they are just marked as passed. If they are different, the validation code is used to make sure the new version is still correct and then the markdown file is updated. If the validation fails, the test fails. If the validation cannot run (database is unavailable) the markdown file is updated but marked as “not validated”.
    • The hope is to allow for database tests in a more standard test framework, allow most integration tests to run at unit-test speed, allow contributors to know when they have potentially broken things even if they don’t have the database to test against, and provide a way for me to see how the interactions have changed from within the pull request system
  • Finish move to Spock testing
  • TDD develop new and changed 4.0 functionality to increase general coverage
  • Use generative testing to ensure all permutations are tested, both with standard unit test and “VerifiedTest”
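
The compare-then-validate flow described for “VerifiedTest” could look roughly like the sketch below. It is only an illustration under stated assumptions: the `VerifiedRun` and `Outcome` names are hypothetical, an in-memory map stands in for the markdown file, and a validator that returns `null` stands in for “database is unavailable”. Re-running validation for entries stored as not validated is also an assumption, not something the description above specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

enum Outcome { PASSED, FAILED, NOT_VALIDATED }

final class VerifiedRun {
    // One stored record per test: the description and whether it was validated.
    private static final class Entry {
        final String description;
        final boolean verified;

        Entry(String description, boolean verified) {
            this.description = description;
            this.verified = verified;
        }
    }

    // Stand-in for the markdown-formatted results file.
    private final Map<String, Entry> stored = new HashMap<>();

    // The validator returns TRUE/FALSE, or null when it cannot run at all.
    Outcome run(String testName, String description, Supplier<Boolean> validator) {
        Entry last = stored.get(testName);
        if (last != null && last.verified && last.description.equals(description)) {
            return Outcome.PASSED; // unchanged since the last validated run
        }
        Boolean valid = validator.get();
        if (valid == null) {
            stored.put(testName, new Entry(description, false));
            return Outcome.NOT_VALIDATED;
        }
        if (!valid) {
            return Outcome.FAILED; // keep the previously accepted description
        }
        stored.put(testName, new Entry(description, true));
        return Outcome.PASSED;
    }
}
```

The point of the design is that the expensive validator only runs when the generated interaction actually changes, which is what would let most runs finish at unit-test speed.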

Use more 3rd party libraries

In the past, I’ve tried to avoid the use of 3rd party libraries in order to avoid jar-hell for people using Liquibase. I think there are a few places where there has been enough convergence on a “standard” and/or isolated-enough use cases that I should introduce some 3rd party libraries in order to simplify my codebase. In particular:

  • SLF4J instead of custom logging wrapper over java.util.logging
  • Apache Commons-CLI: Only really needed if running command-line version where jar-hell doesn’t really matter since it’s more of a packaged application
  • Considering but not decided (need to research more, don’t want to cause issues for users with different versions or technologies)
    • Serialize/Deserialize logic? Need to research options for XML and/or YAML/JSON
    • Dependency Injection/Class finding/Classloading logic? Maybe spring? Maybe OSGi?

Improve database snapshot functionality

Database Snapshot support was never really central to Liquibase, but it has become more and more used within Liquibase. The current implementation is overly complex and the way it abstracts the logic for extension doesn’t really fit with how extension is happening in real life leading to performance issues and excessive code writing. Furthermore, testing is slow and difficult to impossible.

Planned changes:

  • Change snapshot algorithm from starting with a single object (schema, table, etc.) and then recursively finding related objects to a process where we first just fetch all objects in the database and then connect up those objects in memory if needed. The base object to snapshot (a schema, a table, a column, etc.) is still passed to each of the fetch methods which can limit what is read from the database if it so chooses, but it will be a much less convoluted process
  • Ensure the snapshot interfaces and base classes do not make JDBC or even RDBMS assumptions. The snapshot process should be able to handle non-traditional “databases” such as hibernate mappings, changelog files, mongodb and other nosql databases, etc. Ideally it would even be able to snapshot non-databases such as server configurations although that is less important.
  • Ability (or at least API hooks) to support data diff
  • Add a way to specify subsets of items to not snapshot. For example, don’t include tables that match the name “ADM_.*”
    • Needs to be able to do something like “snapshot all objects but only diff the data in ‘.*_lookup’ tables and only include tablespace information for ‘.*_lob’ tables”
  • Use VerifiedTest framework to ensure good testing of the snapshot functionality
  • Better model the connection between primary keys, foreign keys, unique constraints, and indexes
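
As a rough illustration of the “fetch everything, then connect in memory” approach and the name-based exclusion mentioned above, here is a small sketch. The `TableSnapshot` and `FlatSnapshotter` names are hypothetical, and plain lists stand in for whatever bulk metadata queries a real implementation would run; nothing here is existing Liquibase code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class TableSnapshot {
    final String name;
    final List<String> columns = new ArrayList<>();

    TableSnapshot(String name) {
        this.name = name;
    }
}

final class FlatSnapshotter {
    // tableNames: result of one bulk "all tables" fetch.
    // columnRows: result of one bulk "all columns" fetch, as {tableName, columnName}.
    // exclude: optional pattern for table names that should not be snapshotted.
    static Map<String, TableSnapshot> snapshot(List<String> tableNames,
                                               List<String[]> columnRows,
                                               Pattern exclude) {
        Map<String, TableSnapshot> tables = new LinkedHashMap<>();
        for (String tableName : tableNames) {
            if (exclude != null && exclude.matcher(tableName).matches()) {
                continue; // skipped entirely, e.g. names matching "ADM_.*"
            }
            tables.put(tableName, new TableSnapshot(tableName));
        }
        // Connect the independently fetched columns to their tables in memory.
        for (String[] row : columnRows) {
            TableSnapshot owner = tables.get(row[0]);
            if (owner != null) {
                owner.columns.add(row[1]);
            }
        }
        return tables;
    }
}
```

Because each object type is fetched on its own and linked afterwards, the linking step needs no database at all, which is also what would make it straightforward to test.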

Improve Change and SQL Generation logic

Currently we have Change classes which represent what can be in a changeLog file. These generate one or more Statement objects which are a lower-level logical database change. The Statement objects are then fed to SqlGenerator objects which create the actual SQL based on the Statement and the Database.

Planned changes:

  • Remove Change/Statement distinction in favor of a more general purpose Action class. The current Change and Statement objects are mainly duplicates of each other and there doesn’t really need to be a distinction.
    • The new Action classes will also include things that are currently outside the scope of the Change/Statement objects such as the metadata lookup. Bringing the metadata lookup into the same “Action” framework will allow us to have just one code path for all “I want to do X against this database” logic
  • Change most SqlGenerator logic from building up SQL strings programmatically to using simple text files with templates of the SQL that can be filled in
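
The template idea in the last bullet might be sketched as follows. This is a minimal illustration only: the `SqlTemplate` class and the `${name}` placeholder syntax are assumptions made for the example, and the template is an inline string here rather than a text file.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SqlTemplate {
    // Assumed placeholder syntax for this sketch: ${name}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Fills every placeholder from the map and fails loudly on a missing value.
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing value for " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With a template such as `ALTER TABLE ${table} ADD ${column} ${type}`, a database-specific variation becomes a different template text rather than a different code path.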

Improve Cross-database Logic

Both update and snapshot logic currently have issues with cross-database functionality.

Data types is the major problem:

On update, sometimes people want to be able to specify a simple type like “text” and have that mean “clob” on one database and “nvarchar(max)” on another. Other times, people want “text” to mean “clob” on one database but “text” (not nvarchar(max)) on another. Or “int” should be the database default “int” on all but oracle where it should be number(23). Then there are boolean types where some databases don’t support Boolean so you need to use “bit” but you also sometimes need to specify an actual “bit” type which isn’t used as a boolean and so should be tinyint on the db that supports Boolean but not bit.

On snapshot, if you are comparing a mssql and a mysql database should you mark the columns as different if their data types are int vs. integer? What about nvarchar(10) vs. varchar(10) when one doesn’t support nvarchar? Bit vs. Boolean? Text vs. nvarchar(max)? Text vs. text (when mssql’s nvarchar(max) is more like mysql’s text)?

On generateChangeLog, do you generate generic types or database-specific types?
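
One way to picture the update-side half of the problem is a per-database override table with a pass-through default, sketched below. The `TypeResolver` class is hypothetical, and the database keys and mappings are just examples taken from the discussion above; this is not how Liquibase resolves types today, and it deliberately does not answer the harder “when should a type be left alone” questions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TypeResolver {
    // database name -> (generic type -> native type); contents are illustrative only.
    private final Map<String, Map<String, String>> overrides = new HashMap<>();

    void map(String database, String genericType, String nativeType) {
        overrides.computeIfAbsent(database, key -> new HashMap<>())
                .put(genericType.toLowerCase(Locale.ROOT), nativeType);
    }

    // Falls back to the type exactly as written when no mapping is registered.
    String resolve(String database, String type) {
        Map<String, String> forDatabase = overrides.get(database);
        if (forDatabase == null) {
            return type;
        }
        return forDatabase.getOrDefault(type.toLowerCase(Locale.ROOT), type);
    }
}
```

Even this tiny table shows the tension: whether “text” on a given database should be translated or passed through is a per-user decision, so the mappings probably need to be overridable rather than hard-coded.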

Case handling is the other major problem:

How should differences in case be handled in comparisons? When should case sensitivity be preserved and when should it not matter?

There are other issues too:

  • Auto-generated names vary, how do we best handle those?
  • If Mysql has an index on a FK column but oracle doesn’t, is that a difference to fix since mysql auto-generates the index?
  • If you try to create a sequence on a database that doesn’t support sequences, should that be an error? Or expected to fail and skipped?
    • Are sequences different than other non-supported features like non-clustered PKs, full text indexes, etc.?

General Code Improvements

  • Handle multiple active connections
    • ChangeSet/Actions can target different connections
    • Allows multiple databases to be updated in concert
  • Want to further reduce duplication of code between XML and YAML/JSON parsers and serialize/deserialize logic
  • Simple SQL Parser
    • Enough to be able to handle strings vs. keywords vs. objects
    • May be helpful with new “Action Template” functionality
    • May be helpful with validation and checksum, etc.
  • More granularity on checksum versioning
    • Currently there is a “version” as part of the checksum tag for when I need to make a change to the logic that affects how checksums are generated, but most often the changes are just in individual tags or certain scenarios of certain tags. We need a better way to handle this to make updates more seamless for everyone.
  • UTF8 / Other Charsets
    • I need to better understand charset handling and ensure we are handling files correctly.

  • There are some placeholder hooks and naming in place to support non-java uses of the code. In particular I was hoping to be able to use ikvm to re-compile most of the Liquibase logic for .Net and just plug in particular classes to make it better integrate (.net-native connections, xml parsers, etc.). There has never been any traction on this and I think it should be pulled out to simplify things
  • Improved prepared statement logic: sometimes you need to use prepared statements, not simple statements. Liquibase has tried to avoid prepared statements and so places where they are needed are badly wedged in.
  • Safer modifySql logic: currently the modifySql just does a simple string replacement of the SQL, but if/when the generated SQL logic changes that can transparently break previously working modifySql. Need a way to make this safer
  • Better OSGi support: I don’t really know OSGi well enough to know if what we have is good or not
  • Separate SQL logging: Currently most SQL goes through the DEBUG level logging but people often want SQL logged but no other debug info and/or to log SQL to a separate location. Ensure all SQL is logged and handled separately
  • The tag table structure currently doesn’t support multiple tags at the same point and doesn’t always track all the changes in a tag well. Probably need a separate DATABASECHANGELOGTAG table
  • Refactor the Database API: It is currently a mix of “Dialect logic”, connection handling, and more. Some logic should be split out, other dialect logic scattered throughout the code should be brought into the Database class
  • Refactor ResourceAccessor API: I made some changes with 3.3 but need to ensure the APIs cover what is needed
  • Clean up multi-schema support: There is some support for managing multiple schemas but it is not consistently used and supported.
  • Move non-core database support to extensions
    • What are core databases? I would suggest mysql, pgsql, oracle, mssql, db2

  • Should not be using “database instanceof MysqlDatabase” etc. Should be using subclassing instead.
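
The last bullet, replacing “instanceof” checks with subclassing, can be illustrated with a small sketch. The class names below (`SketchDatabase`, `BacktickDatabase`, `DropTableGenerator`) are hypothetical and deliberately not the real Liquibase classes; the only database fact assumed is that MySQL-style identifier quoting uses backticks.

```java
// Hypothetical base class: the defaults live here, not in the callers.
abstract class SketchDatabase {
    boolean supportsSequences() {
        return true;
    }

    String quote(String name) {
        return "\"" + name + "\"";
    }
}

// A MySQL-like dialect overrides only what differs from the defaults.
final class BacktickDatabase extends SketchDatabase {
    @Override
    boolean supportsSequences() {
        return false;
    }

    @Override
    String quote(String name) {
        return "`" + name + "`";
    }
}

final class StandardDatabase extends SketchDatabase {
}

final class DropTableGenerator {
    // No "db instanceof ..." branches: the database answers for itself.
    static String generate(SketchDatabase db, String table) {
        return "DROP TABLE " + db.quote(table);
    }
}
```

With this shape, a database shipped as an extension can change behavior by overriding a method, without core code needing to know that the subclass exists.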

New Features

  • Postconditions: Like preconditions but run before committing the changeSet
  • updateReference, rollbackReference, and other *Reference commands that perform the same logic as the normal version (update, rollback, etc.) but against the “reference” database.
  • Improved DBDoc with an updated skin and new features

Infrastructure Improvements:

  • Split SDK from main Liquibase code and improve SDK
  • Improve extension portal
  • Improve generation of doc for website
  • Improve Javadoc
  • Consider Gradle vs Maven
  • Testing of Liquibase in Java 7 and 8
  • Not yet ready to drop Java 6 support
  • Liquibase 3 compatibility layer: Is it possible? Is it needed?
  • Move Liquibase blog to github pages
  • Vet all classes with extensions and subclassing in mind

What are your thoughts on the 4.0 feature list? Anything you think should be added or removed?

Nathan

Pushing everything to an extension is an option. My concern with that is that it adds one more step/dependency for people and therefore a chance for mistake. I run into issues with the few databases that have already been pushed off into extensions, where people forget they need the extension. It is also easy to get confused about version management between the database extensions and the core library.

However, if the core library contains support for nothing, it makes for an easier and more helpful error message if there are no Database implementations found.

Also, there is probably confusion already since there is built in support for oracle, mssql and postgres but there are also extensions for them. The built-in support has support for the “standard” liquibase features and the extensions can add extra features such as a command, materialized view support, etc.

Maybe the best approach would be to split out the core databases into separate modules in the main liquibase codebase. So there would be liquibase-core, liquibase-maven, etc. like there is now but then add liquibase-oracle, liquibase-mysql, liquibase-db2, liquibase-mssql and liquibase-pgsql. The current “extension” code could go in those. Since it’s the same codebase, each new liquibase version would contain new builds for the standard database extensions and versions would stay in sync.

That will help force there to be no database-specific code in any of the rest of liquibase, for better or worse. There are times when it is nice to say “use this sql for mysql and oracle” vs. having to duplicate the different behavior in both extensions, but there are sometimes other options like flags, and some duplication can be nice to not accidentally break other databases.

Nathan

There were apparently some Zoho issues which led people to reply directly to me rather than post to the forum. Hopefully it’s cleaned up. I’m going to summarize some and add more thoughts here rather than duplicating it to everyone.

All Database Implementations As Plugins

The idea of pushing all database implementations to plugins seemed well received and I agree makes sense. That is something we will do for 4.0. We’ll want to have an easily extendable abstract base in liquibase-core and probably UnknownDatabase but the rest will go to sub-modules.

One thing that would help would be sort of an “SDK Test Suite” that can be run against any database extension to make sure it generally works and also follows whatever conventions make sense. We should develop this suite along with 4.0.

3rd Party Libraries

  • JCommander was suggested over commons-cli. I haven’t looked at JCommander before but will.
  • Guice was suggested for dependency injection, in part because spring is too heavyweight and more likely to run into version problems. This is a very good suggestion. I haven’t really looked at Guice (have just used Spring) but the version issues were a reason I wasn’t really wanting to use Spring. Classloading edge-cases continue to be a huge headache with 3.3.x so I’m all for a DI library that may help with that. I looked quickly at Guice and it would probably work well. I’ll continue to look at Guice.
  • No love for OSGi. It definitely brings its own issues and I agree that it probably isn’t the right tool for handling our dependency management. We should support plugging into OSGi well but shouldn’t be building off it.

Multiple Connections

Several people liked this feature. Definitely something to ensure is supported.

Java 6 Support: Why Still?

I try to ensure Liquibase can run in as many places as possible. That is why I’ve kept dependencies to a minimum and also build off old Java versions. However, at some point it is time to move on.

I had tried to look for stats on Java versions being used and couldn’t find anything great. After some digging it seemed like maybe 20-30% of projects were still running Java 6, which seemed enough of an install base to continue supporting it. If someone has better statistics, let me know though.

“Verified Test” Framework

I just spun the verified test framework off as a separate project (http://testmd.org) so it can grow independently and/or make dependency management cleaner in Liquibase. Take a look and let me know what you think.

Scope Objects

“Are they Necessary?” “Are they Like ServletRequest?” “Consider making them ThreadLocal” “Just use @Inject”

I’m still working through how to best handle the Scope object. If we use Guice, that may remove a lot (all?) of the need for it, but perhaps not as well. They do sort of behave like ServletRequest, especially its “get|setAttribute” methods.

What I want to do with Scope is to have a way for new configuration and other state objects to be managed without impacting the existing code. This is partly to keep backwards compatibility but also because we want extensions to be able to define new attributes that liquibase-core does not know about and they can simply flow through everything.

For example, liquibase-core may define a “connection” object which is normally used for reading/writing to the database but someone may want to attach a new “backupConnection” to the scope when they start their command and then in one of their extension Change classes they can use that connection to backup data before dropping a table.

@Inject may work for that, but I think a more contained Scope object would work better.
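
For what it’s worth, the “backupConnection” example combined with the ThreadLocal suggestion might look something like the sketch below. This is only one possible shape: `ScopeHolder`, `withAttribute`, and `BackupThenDropChange` are hypothetical names, a plain string stands in for a real connection object, and a real implementation would also have to deal with work handed off to other threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class ScopeHolder {
    // The current attributes for this thread; starts out empty.
    private static final ThreadLocal<Map<String, Object>> CURRENT =
            ThreadLocal.withInitial(HashMap::new);

    // Runs the body with one extra attribute visible, then restores the previous state.
    static <T> T withAttribute(String key, Object value, Supplier<T> body) {
        Map<String, Object> previous = CURRENT.get();
        Map<String, Object> nested = new HashMap<>(previous);
        nested.put(key, value);
        CURRENT.set(nested);
        try {
            return body.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    static Object get(String key) {
        return CURRENT.get().get(key);
    }
}

// Hypothetical extension change using an attribute the core knows nothing about.
final class BackupThenDropChange {
    String describe(String table) {
        Object backup = ScopeHolder.get("backupConnection");
        String prefix = backup == null ? "" : "backup " + table + " via " + backup + "; ";
        return prefix + "DROP TABLE " + table;
    }
}
```

The extension reads the attribute without it appearing in any method signature in between, and code outside the `withAttribute` call never sees it.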

Short-Term Roadmap

I think the first steps toward 4.0 are:

  • New v4.0 branch [done]
  • Split off TestMD [done]
  • Guice and/or Scope object
    • Will be used by snapshot/action logic
  • Refactor Snapshot Logic / "Action" Logic
    • Includes splitting out databases to sub-modules
    • Use TestMD for testing it all
    • Big project but all wrapped together enough that it would probably be a mess to do in stages

Thoughts? (Assuming Zoho lets you post :) )

Nathan

I looked into Guice a bit and posted my thoughts here: http://forum.liquibase.org/#Topic/49382000001230003

Short version: thinking Guice isn’t going to be helpful enough to go with.

Nathan