As I am wrapping up 3.3.x changes and looking forward to what changes and features people are asking for and what I would like improved, I am thinking it is time to look at larger changes around a Liquibase 4.0 release.
My most-likely overly-ambitious goal for Liquibase 4.0 is to do a major housecleaning of the Liquibase codebase to simplify it while increasing testability and test coverage. There have been changes in scope and requirements over the last 7 years and the codebase is getting a bit over-complex and haphazard. This makes it harder for me to maintain and a barrier to entry for contributors.
Previously I thought about trying to break this into a few smaller blocks/themes and focus on one for 3.4, one for 3.5, etc. until they are done and have 4.0 be just the final cleanup “no more changes” release. After doing some work with the testing and the snapshot logic, however, I think they all bleed into each other enough to make it much easier to do a single major release that just breaks everything.
The jump to 4.0 will signify compatibility-breaking API changes, but there should be no changes to the changelog format: 4.0 should be a drop-in replacement for anyone just using a changelog and not writing extensions etc.
The major themes/changes I’m looking to make are:
Improvement of how state is managed and high-level functions are called
Currently there is the liquibase.Liquibase façade object that wraps many common functions with method parameters, a variety of “configuration” objects (such as DiffOutputControl), and a heavy use of singletons. Ant/maven/command line etc. call out to the Liquibase façade as best they can.
The problems with this setup include:
Planned changes:
- Switch to using “Command” objects in favor of a monolithic Liquibase façade
  - Allows command logic to be encapsulated and more easily extended, including validation and setup
  - Allows new commands to be written within extensions and exposed through command line, maven, etc.
- Create a hierarchical “Scope” object
  - Works similar to AngularJS $scope where a root container object is created and passed along to sub-methods.
  - Along the call chain, new attributes can be added that are only visible to methods further down the call chain
  - Root Scope object created as part of the Command execution and builds from there
  - Replace use of singletons with objects added to the Scope
  - Ideally we can access the scope object without needing to include it in every method signature, but unsure on a good implementation yet
  - Configuration objects available from Scope object
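To make the Command and Scope idea a bit more concrete, here is a minimal Java sketch. Everything in it (the `Scope`, `Command`, and `StatusCommand` names, the `child` and `get` methods, the attribute keys) is hypothetical and only illustrates the parent/child lookup described above; it is not existing Liquibase API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hierarchical scope; not an actual Liquibase class.
final class Scope {
    private final Scope parent; // null for the root scope
    private final Map<String, Object> values = new HashMap<>();

    private Scope(Scope parent) {
        this.parent = parent;
    }

    static Scope root() {
        return new Scope(null);
    }

    // Creates a child scope; the new attribute is invisible to the parent.
    Scope child(String key, Object value) {
        Scope child = new Scope(this);
        child.values.put(key, value);
        return child;
    }

    // Looks the key up locally first, then walks up the parent chain.
    Object get(String key) {
        for (Scope s = this; s != null; s = s.parent) {
            if (s.values.containsKey(key)) {
                return s.values.get(key);
            }
        }
        return null;
    }
}

// Hypothetical command abstraction: each command works from the scope it is given.
interface Command {
    String execute(Scope scope);
}

final class StatusCommand implements Command {
    @Override
    public String execute(Scope scope) {
        // Add an attribute that only methods further down the call chain can see.
        Scope inner = scope.child("changeLogFile", "changelog.xml");
        return describe(inner);
    }

    private String describe(Scope scope) {
        return scope.get("database") + ":" + scope.get("changeLogFile");
    }
}
```

In this sketch the scope is still passed explicitly to each method, so it does not address the open question of how to avoid having it in every method signature.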
Improved Testing/Testability
Liquibase currently doesn’t have a great way to handle testing of the interaction with the database. Traditionally, I’ve had the liquibase-integration-tests module which mainly uses a set of changeSets with example changes and scenarios and preconditions to test that they ran successfully.
Problems with this setup include:
Beyond the database interaction tests, there is some unit test coverage but not enough
Planned changes:
- Fully implement my new “VerifiedTest” framework. The idea is that each test creates a simple text description of the interaction (such as the SQL string to execute) and a way to validate the test passed. Previous test run text descriptions and the results are stored in a markdown-formatted file. On each test run, the text description of the code is compared to the last run and if they are the same (most of the time) they are just marked as passed. If they are different, the validation code is used to make sure the new version is still correct and then the markdown file is updated. If the validation fails, the test fails. If the validation cannot run (database is unavailable), the markdown file is updated but marked as “not validated”.
  - The hope is to allow for database tests in a more standard test framework, allow most integration tests to run in unit-test speed, allow contributors to know when they have potentially broken things even if they don’t have the database to test against, and provide a way for me to see how the interactions have changed from within the pull request system
- Finish move to Spock testing
- TDD develop new and changed 4.0 functionality to increase general coverage
- Use generative testing to ensure all permutations are tested, both
with standard unit test and “VerifiedTest”
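The compare-then-validate flow of VerifiedTest described above could be sketched roughly like this in Java. The class and method names are made up for illustration, an in-memory map stands in for the markdown file, and re-running validation for a description that was previously stored as “not validated” is an assumption on my part rather than settled behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the VerifiedTest decision flow; names are illustrative only.
final class VerifiedTestStore {
    enum Outcome { PASSED_UNCHANGED, PASSED_VALIDATED, FAILED, NOT_VALIDATED }

    private static final class Entry {
        final String description;
        final boolean validated;

        Entry(String description, boolean validated) {
            this.description = description;
            this.validated = validated;
        }
    }

    // Stands in for the markdown-formatted file of previous runs.
    private final Map<String, Entry> previousRuns = new HashMap<>();

    Outcome run(String testName, String description,
                boolean databaseAvailable, BooleanSupplier validation) {
        Entry last = previousRuns.get(testName);
        // Same description as the last validated run: pass without touching the database.
        if (last != null && last.validated && last.description.equals(description)) {
            return Outcome.PASSED_UNCHANGED;
        }
        // Description changed but validation cannot run: record it as not validated.
        if (!databaseAvailable) {
            previousRuns.put(testName, new Entry(description, false));
            return Outcome.NOT_VALIDATED;
        }
        // Description changed and validation can run: accept it only if it validates.
        if (validation.getAsBoolean()) {
            previousRuns.put(testName, new Entry(description, true));
            return Outcome.PASSED_VALIDATED;
        }
        return Outcome.FAILED;
    }
}
```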
Use more 3rd party libraries
In the past, I’ve tried to avoid the use of 3rd party libraries in order to avoid jar-hell for people using Liquibase. I think there are a few places where there has been enough convergence on a “standard” and/or isolated-enough use cases that I should introduce some 3rd party libraries in order to simplify my codebase. In particular:
Improve database snapshot functionality
Database Snapshot support was never really central to Liquibase, but it has become more and more used within Liquibase. The current implementation is overly complex and the way it abstracts the logic for extension doesn’t really fit with how extension is happening in real life leading to performance issues and excessive code writing. Furthermore, testing is slow and difficult to impossible.
Planned changes:
- Change snapshot algorithm from starting with a single object
(schema, table, etc.) and then recursively finding related objects
to a process where we first just fetch all objects in the database
and then connect up those objects in memory if needed. The base
object to snapshot (a schema, a table, a column, etc.) is still
passed to each of the fetch methods which can limit what is read
from the database if it so chooses, but it will be a much less
convoluted process
- Ensure the snapshot interfaces and base classes do not make JDBC or even RDBMS assumptions. The snapshot process should be able to handle non-traditional “databases” such as hibernate mappings, changelog files, mongodb and other nosql databases, etc. Ideally it would even be able to snapshot non-databases such as server configurations, although that is less important.
- Ability (or at least API hooks) to support data diff
- Add a way to specify subsets of items to not snapshot. For
example, don’t include tables that match the name “ADM_.*”
- Use the VerifiedTest framework to ensure good testing of the snapshot functionality
- Better model the connection between primary keys, foreign keys, unique constraints, and indexes
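As a rough sketch of the “fetch everything, then connect in memory” approach, including the exclusion pattern from the list above, something like the following Java could work. The `TableSnapshot` and `SnapshotBuilder` names are hypothetical, and the flat lists stand in for whatever the bulk fetch methods would actually return from the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical snapshot model; not the real Liquibase snapshot classes.
final class TableSnapshot {
    final String name;
    final List<String> columns = new ArrayList<>();

    TableSnapshot(String name) {
        this.name = name;
    }
}

final class SnapshotBuilder {
    // tableNames and columnRows stand in for the results of two bulk fetches;
    // each column row is a {tableName, columnName} pair.
    static Map<String, TableSnapshot> build(List<String> tableNames,
                                            List<String[]> columnRows,
                                            Pattern exclude) {
        Map<String, TableSnapshot> tables = new LinkedHashMap<>();
        // Step 1: register every fetched table unless its name matches the exclusion.
        for (String tableName : tableNames) {
            if (exclude == null || !exclude.matcher(tableName).matches()) {
                tables.put(tableName, new TableSnapshot(tableName));
            }
        }
        // Step 2: connect columns to their tables in memory, with no extra queries.
        for (String[] row : columnRows) {
            TableSnapshot owner = tables.get(row[0]);
            if (owner != null) {
                owner.columns.add(row[1]);
            }
        }
        return tables;
    }
}
```

Passing `Pattern.compile("ADM_.*")` as the exclusion would leave out tables such as a hypothetical `ADM_AUDIT` along with their columns.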
Improve Change and SQL Generation logic
Currently we have Change classes which represent what can be in a changeLog file. These generate one or more Statement objects which are a lower-level logical database change. The Statement objects are then fed to SqlGenerator objects which create the actual SQL based on the Statement and the Database.
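For readers less familiar with that pipeline, here is a heavily simplified Java sketch of the three layers. The types below are stand-ins written for this post and do not match the real Liquibase signatures; the Oracle-specific `CASCADE CONSTRAINTS` suffix is only there to show where database-specific SQL would be decided.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for the three layers; not the real Liquibase interfaces.
interface Statement {
}

final class DropTableStatement implements Statement {
    final String tableName;

    DropTableStatement(String tableName) {
        this.tableName = tableName;
    }
}

// A Change represents what can appear in a changeLog file.
interface Change {
    List<Statement> generateStatements();
}

final class DropTableChange implements Change {
    private final String tableName;

    DropTableChange(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public List<Statement> generateStatements() {
        // One Change may produce one or more lower-level Statements.
        return Collections.<Statement>singletonList(new DropTableStatement(tableName));
    }
}

// A generator turns a Statement into SQL for a given database.
interface SqlGenerator<S extends Statement> {
    String generateSql(S statement, String databaseType);
}

final class DropTableGenerator implements SqlGenerator<DropTableStatement> {
    @Override
    public String generateSql(DropTableStatement statement, String databaseType) {
        String sql = "DROP TABLE " + statement.tableName;
        // Illustrative dialect difference only.
        if ("oracle".equals(databaseType)) {
            sql += " CASCADE CONSTRAINTS";
        }
        return sql;
    }
}
```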
Planned changes:
Improve Cross-database Logic
Both update and snapshot logic currently have issues with cross-database functionality.
Data types are the major problem:
On update, sometimes people want to be able to specify a simple type like “text” and have that mean “clob” on one database and “nvarchar(max)” on another. Other times, people want “text” to mean “clob” on one database but “text” (not nvarchar(max)) on another. Or “int” should be the database default “int” on all but oracle where it should be number(23). Then there are boolean types, where some databases don’t support Boolean so you need to use “bit”, but you also sometimes need to specify an actual “bit” type which isn’t used as a boolean and so should be tinyint on the db that supports Boolean but not bit.
On snapshot, if you are comparing a mssql and a mysql database, should you mark the columns as different if their data types are int vs. integer? What about nvarchar(10) vs. varchar(10) when one doesn’t support nvarchar? Bit vs. Boolean? Text vs. nvarchar(max)? Text vs. text (when mssql’s nvarchar(max) is more like mysql text)?
On generateChangeLog, do you generate generic types or database-specific types?
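One possible shape for the update side of this is a per-database override table that falls back to the type exactly as written. The Java below is only a sketch under that assumption: the `TypeResolver` class is hypothetical, and pairing “clob” with oracle and “nvarchar(max)” with mssql is my illustrative reading of the examples above. It does not try to answer the harder question of when the user wants the generic meaning versus the literal type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical resolver from a type as written in a changelog to a native type.
final class TypeResolver {
    private final Map<String, String> overrides = new HashMap<>();

    // Registers that genericType should become nativeType on the given database.
    TypeResolver map(String database, String genericType, String nativeType) {
        overrides.put(key(database, genericType), nativeType);
        return this;
    }

    // Returns the database-specific type, or the requested type unchanged
    // when no override is registered for that database.
    String resolve(String database, String requestedType) {
        String nativeType = overrides.get(key(database, requestedType));
        return nativeType != null ? nativeType : requestedType;
    }

    // Lookups ignore case in both the database name and the type name.
    private static String key(String database, String type) {
        return database.toLowerCase(Locale.ROOT) + "|" + type.toLowerCase(Locale.ROOT);
    }
}
```

With `new TypeResolver().map("oracle", "text", "clob").map("mssql", "text", "nvarchar(max)")`, resolving “text” for a database with no override simply returns “text”.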
Case handling is the other major problem:
How should differences in case be handled in comparisons? When should case sensitivity be preserved and when should it not matter?
There are other issues too:
- Auto-generated names vary, how do we best handle those?
- If Mysql has an index on a FK column but oracle doesn’t, is that a
difference to fix since mysql auto-generates the index?
- If you try to create a sequence on a database that doesn’t support sequences, should that be an error? Or expected to fail and skipped?
- Are sequences different than other non-supported features like non-clustered PKs, full text indexes, etc.?
General Code Improvements
- Handle multiple active connections
- ChangeSet/Actions can target different connections
- Allows multiple databases to be updated in concert
- Want to further reduce duplication of code between XML and YAML/JSON parsers and serialize/deserialize logic
- Simple SQL Parser
- Enough to be able to handle strings vs. keywords vs. objects
- May be helpful with new “Action Template” functionality
- May be helpful with validation and checksum, etc.
- More granularity on checksum versioning
- Currently there is a “version” as part of the checksum tag for when I need to make a change to the logic that affects how they are generated, but most often the changes are just in individual tags or certain scenarios of certain tags. We need a better way to handle this to make updates more seamless for everyone.
- UTF8 / Other Charsets
- There are some placeholder hooks and naming in place to support
non-java uses of the code. In particular I was hoping to be able to
use ikvm to re-compile most of the Liquibase logic for .Net and just
plug in particular classes to make it better integrate (.net-native
connections, xml parsers, etc.) There has never been any traction on
this and I think it should be pulled out to simplify things
- Improved prepared statement logic: sometimes you need to use
prepared statements, not simple statements. Liquibase has tried to
avoid prepared statements and so places where they are needed are
badly wedged in.
- Safer modifySql logic: currently the modifySql just does a simple
string replacement of the SQL, but if/when the generated SQL logic
changes that can transparently break previously working modifySql.
Need a way to make this safer
- Better OSGi support: I don’t really know OSGi well enough to know
if what we have is good or not
- Separate SQL logging: Currently most SQL goes through the DEBUG
level logging but people often want SQL logged but no other debug
info and/or to log SQL to a separate location. Ensure all SQL is
logged and handled separately
- The tag table structure currently doesn’t support multiple tags at
the same point and doesn’t always track all the changes in a tag
well. Probably need a separate DATABASECHANGELOGTAG table
- Refactor the Database API: It is currently a mix of “Dialect
logic”, connection handling, and more. Some logic should be split
out, other dialect logic scattered throughout the code should be
brought into the Database class
- Refactor ResourceAccessor API: I made some changes with 3.3 but
ensure the APIs cover what is needed
- Clean up multi-schema support: There is some support for managing
multiple schemas but it is not consistently used and supported.
- Move non-core database support to extensions
- What are core databases? I would suggest mysql, pgsql, oracle,
mssql, db2
- Should not be using “database instanceof MysqlDatabase” etc. Should be using subclassing instead.
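To illustrate that last point, here is a small Java sketch of putting dialect behavior on the database class itself so callers never need an instanceof check. The classes and the `quoteIdentifier` method are hypothetical and far simpler than the real Database API.

```java
// Hypothetical, simplified classes; the real Liquibase Database API is much larger.
abstract class Database {
    // Default dialect behavior; subclasses override only where they differ.
    String quoteIdentifier(String name) {
        return "\"" + name + "\"";
    }
}

final class MysqlDatabase extends Database {
    @Override
    String quoteIdentifier(String name) {
        // MySQL quotes identifiers with backticks by default.
        return "`" + name + "`";
    }
}

final class OracleDatabase extends Database {
    // Inherits the default double-quote behavior.
}

final class SqlWriter {
    // No "database instanceof MysqlDatabase" check is needed here:
    // the dialect decision lives in the Database subclass.
    static String dropTable(Database database, String tableName) {
        return "DROP TABLE " + database.quoteIdentifier(tableName);
    }
}
```

A database moved out to an extension would then only need to ship its own subclass instead of requiring edits to every place that currently checks the database type.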
New Features
- Postconditions: Like preconditions but run before committing the changeSet
- updateReference, rollbackReference, and other *Reference commands that perform the same logic as the normal version (update, rollback, etc.) but against the “reference” database.
- Improved DBDoc with an updated skin and new features
Infrastructure Improvements:
What are your thoughts on the 4.0 feature list? Anything you think
should be added or removed?
Nathan