[LB1.9.3] Internal datatype conversion to target databases

Trying LiquiBase for the first time I think this tool can help me upgrading my databases. Planning this for both MySQL and Oracle with the same changelog I found some data type conversion issues. For some types the the internal LiquiBase is not converted to the correct target database type or a possibly better fit is available. The table below lists the internal LiquiBase types as well as the target database types for Oracle and MySQL. Another column is added with the requested data type.

]
LiquiBase type Oracle Requested MySQL Requested
DOUBLE DOUBLE NUMBER or FLOAT DOUBLE
LONGVARCHAR CLOB TEXT LONG TEXT
TINYINT TINYINT NUMBER(3) ? TINYINT
VARCHAR VARCHAR VARCHAR2 VARCHAR

The double and tinyint are the most curious, Oracle doesn’t event accept these as datatype and can be considered a bug. The VARCHAR is now automatically converted to varchar2 but possibly Liquibase can do this.

For MySQL the LONGVARCHAR, can this be set to LONG TEXT? The Liquibase type TEXT gives support for normal length TEXT fields, then the LONGVARCHAR can support LONG TEXT, or should a additional type be added to the Liquibase for long text fields in MySQL?

Regards,
Chris

Those are some good suggestions, especially the ones oracle does not support currently.  I did have varchar converting to varchar2 at one point, but since oracle does the conversion anyway it seemed redundant, and did lead to one issue (I don’t remember what it was off hand).

It can get a bit difficult deciding on conversions for some because what makes sense for one person doesn’t for another.  LOGVARCHAR->LONG TEXT in particular may require some thought and/or discussion.

Before 2.0 final is out, I will try to build up a good list of what the supported “standard” datatypes are and ask for a review of it.  I created http://liquibase.jira.com/browse/CORE-398 to track progress.

Nathan

Thanks for the quick reply.

The possibility for a user provided conversion table would be nice, I already read the code to see if something like that was there. My application uses JPA to communicate Java objects to the database, when a Java attribute is defined as Double JPA creates an Oracle NUMBER(19,4), which is a double in MySQL, maybe DOUBLE in LiquiBase can be set to NUMBER in oracle. Another option would be to include a DECIMAL type in LiquiBase which maps to NUMBER in Oracle.

Another option for LONGVARCHAR can be a LONGTEXT in LiquiBase, then Oracle can be a CLOB, in MySQL the LONG TEXT value can be used. For the other databases I don’t know as I don’t use them.

Chris

I’ve run into the same issue using the same changelog to upgrade MySql, Oracle, and SQL Server databases with non-standard data type mappings.  My solution has just been to use tags for each vendor and replace the datatype with the one we want to map to (ie ).  I just compiled a list of all the mappings we needed and created tags for them in the bottom of a changeset and I now use that changeset as a template. 

The ability to add a configuration file for data type mappings would be great, as would the ability to create a custom tag.

One option for now is to use the changelog parameter functionality to pass in the datatypes you want.  For example:

    and then pass clob=“CLOB” or clob=“LONG TEXT” to liquibase when you run it.

    Nathan

    Thanks for the summary, Laird. This is as good of a thread as any to discuss how we want the type conversion logic to work for 2.1.


    Another use case I would throw in is database diff/changelog generation. When you generate a changelog from a database, should the changelog include the liqubase types or the database-specific types? Similarly, if you are comparing two databases (including the “hibernate configuration” database), how do you determine if datatypes are equal? Do you do database-specific comparisons or attempt to convert to a liquibase standard type?

    There is also the fact that liquibase doesn’t currently explicitly handle “standard” datatypes that the underlying database supports, so you can have type=“int” and it will work on oracle even though we don’t check for an “int” type and convert it to a “number(10)”. This currently confuses people when they do a generateChangeLog and see different types.

    I do like the idea of more explicit syntax in the changelog file to control what is meant. I think that is a necessarily fallback option (like the tag). We do need to be careful about backwards compatibility too, which I think means that type=“XXX” will need to use the liquibase types. 


    The java.sql.Type mappings have always been a bit weird in my mind. They were suggested and included by someone else, I think a standard liquibase type system should be enough. We can’t get the database types directly from the JDBC driver for a given java.sql.Type, so in the end is just just an alias for a standard liquibase type. I think this is a fine standard to continue with, but with better documentation on what the java.type->liquibase type is per database.


    I like the idea of a thinking of types as more descriptive than just standard datatypes. At the same time, database standard types bring with the a lot of built in meaning with less to type. For example, varchar(20) vs char(20) give the same meaning as string(size=20, padded=false) vs string(size=20, padded=true). Using the SQL99 datatypes as a basis I think makes sense, but there are other types such as CLOB, BLOB, TEXT, etc. that aren’t well defined.


    What about this as a suggestion:


    • When liquibase sees a string in a type attribute, it assumes it is a liquibase type something that should be converted to a database type


    • If the type attribute starts with a “db:”, that means the type is just to be passed directly along to the database with no conversion.


    • Liquibase types use the following syntax: TYPE(STANDARD_TYPE_PARAM(s), [LIQUIBASE_TYPE_PARAMS])

    For example, “varchar(30)”, “varchar(30 [fullTextIndexed=true,clob=false]”


    Implementation Details/Rationale:


    The above syntax makes a more natural progression between database-specific types to standard datatypes that are mapped by the database to datatypes managed by liquibase. We would also be able to move how a particular string is handled (like “boolean”) from release to release from being a database pass through to a liquibase type without changing the changelog file.


    In 2.0, we have the start of standard classes per liquibase datatype, but they do not really work like the rest of the extension system does. We should have a “name” column and a DataTypeFactory like we have with the ChangeFactory where we lookup the correct datatype by name (with the ability to add datatypes and/or override the core DataTypes with extensions) and instantiate a new instance of it. There would be a fallback RawDataType class if no registered type is found.


    We would then have a setParameters(String[])  method for passing in the STANDARD_TYPE_PARAM(s). For each of the LIQUIBASE_TYPE_PARAMS key/value sets, we would use get/set methods on the instance. This would allow us to have some properties in superclasses and some in subclasses and extensions. If a property is not supported, an exception is thrown. 


    The DataType interface would have a toDatabaseType(database) [or the Database interface would have a toDatabaseType(datatype)] method which would take the object and all it’s parameters and make a string to pass to the underlying database. If a type’s configuration is not valid for the given database (like too long of a varchar or an unsupported type) then an exception is thrown from the toDatabaseType() method.


    Going from a database diff, the DatabaseType objects can register the JDBC types they represent and the diff tool would use the DataTypeRegistry to instantiate the correcdt DataType object, then call a method on the DataType passing in its metadata and it would be up to the DataType instance to populate its own parameters.


    The general idea is to encapsulate all the to/from liquibase types into a single extensible class. I go back and forth on whether the logic of creating the database string for a given type belongs in the database instance or the datatype instance.


    Thoughts?

    Nathan

    +1 for this… Currently fighting liquibase to get a TEXT type field in mysql. 

    Ideally I would like to be able to set this like a changelog parameter at the top of the file. Changelog parameters are not quite powerful enough though, as you can only use the existing types.


    Thanks for the background Laird. I think I understand your intent, and philosophically agree with it, but I do wonder how much resistance development teams would have to learning yet another large and rich data type language.

    Even if you manage to make a perfectly expressive data type language, there are going to be users who want to work with the native data types offered by the databases they code to.

    Your off-the-cuff example of ‘"<span or something like that’ is going to fall over when someone wants to support MySQL and Oracle with the same codebase, unless there is a mapping mechanism similar to the <property tag.


    At the risk of throwing a big ol’ spanner into the works, let’s back up for a minute and talk about what the Liquibase data types are, why they exist, and what they’re attempting to accomplish.

    Let’s work backwards.

    At the “bottom”, you have the database.  Which database?  The one that you, one person, happen to be using.  At this level you’re not concerned about portability.  It has very picky rules about what datatypes it supports.  Unless it’s really weird, most of them are SQL-specified, but there are wrinkles.

    Of course the whole reason you’re using the tag, for example, and not INSERT INTO… is that you http://about.me/lairdnelson

    I think I am having now the exact same problem:

    The question is what should I do, because for each database type it is also different types. I am lost

    Found this oldish thread after bumping into CORE-1033 / CORE-1285, which is to say I can’t create TEXT and BLOB types on MySQL (or at least not without some clever use of modifySql).

    A few thoughts:

    It’s problematic to have raw and abstract types specified in the same attribute, since, as here, you have a conflict when the names are the same. For systems that support user-defined types, you never really know what name conflicts might occur. And if you ever someday add a new abstract type, you risk breaking everyone who was using a raw type of the same name. Prefixing is just one way of cramming everything into a single “type” string, which starts to take on its own internal structure.

    The whole point of having abstract types is to simplify portability. Actually, as it stands, DBMS-aware changelog parameters should be a sufficient mechanism for this purpose (or would be, if they could successfully specify raw types like TEXT). The added value of built-in abstract types, then, is just to concisely pre-package what most users will need.

    When generating a changelog from a database, the foremost concern is to reproduce that schema faithfully. If I start with “MEDIUMINT UNSIGNED” in MySQL, the changelog should recreate a column of that exact same type. But there’s no reason the generated changelog can’t also support other database systems with reasonable equivalents. The obvious way to do this is to map each raw type with a changelog parameter, which would specify the original raw type for the original DBMS and some substitute type (say, BIGINT in this example) for others. If you get this far, it’s nice to also somehow avoid the verbosity of listing the equivalents for each of fifteen different database systems.

    A better idea, though, is to elevate type definitions to something explicit, distinguishing raw and abstract, like this:









    This accomplishes several things:

    • Backwards compatible. Existing changelogs that just use type will function the same. 
    • Users on a single DBMS who despise the whole abstract type system can just use rawType everywhere and be sure that TEXT will not turn into LONGTEXT. 
    • The user, or likewise the changelog generator, can define new abstract types (or redefine built-in ones). These may in turn leverage other abstract types, built-in or not.

    Looking to the future, we may someday want to introduce a new built-in abstract type system, on an opt-in basis, and be able to opt out of the old one. So,


       
       

    In this model, abstract types simply map a type name to a raw type in each DBMS. If more intelligence than that is really worthwhile, I would rather see parameters done in XML than develop a new string format. For example,



    Hi,

    We want to use Liquibase to maintain scripts for a product towards Oracle and DB2-i database. We are now at a point that we need to decide whether to 

    • maintain a generic set of change scripts, and to transform them towards the DB-vendor specific datatype

    • or to maintain a set towards Oracle, and transform it towards DB2-i.

    The modifysql clause might be usefull, but as I see it, it is only related to one change set, so we would need to inject it in every changeset, which is hardly maintainable.

    So the contents of this forum thread represents exactly what we need.


    My Questions;

    • Are there any of these features already implemented in Liquibase?

    • Can you give an insight in the roadmap whether/when these concepts would be incorporated?

    • Are there any affiliate tools/projects that could provide some kind of '‘preprocessing’ to tailor towards a db specific script?

    I haven’t gotten to a full cleanup of data type logic, but Liquibase does try to handle type differences in a cross-database manner if you use “standard” types. In Liquibase 3.x you can control exactly how the data types are handled with extensions, but there is still some work to make it easier to use and handle edge cases better.


    I tend to suggest writing a generic changelog and then relying on liquibase and/or extensions to transform it to db-specific types. That lets you more easily handle new database types or make sweeping changes in how types and SQL are handled.


    Liquibase does have a type of preprocessing with changelog parameters which you may want to look into as well.


    Nathan