Support for Hive (Hadoop) schema changes via Liquibase?

I don’t know enough about Hive/Hadoop to provide a definitive answer, but in general Liquibase is extensible to new database types. In version 3.x there is still some database-specific code in Liquibase core, but I know that Nathan (who wrote Liquibase and is still the project owner) is planning to remove that in version 4.0, which is currently in the early stages.

There are some docs on writing your own extensions, as well as information on the extensions currently available at https://liquibase.jira.com/wiki/display/CONTRIB/LiquiBase+Extensions+Portal

Steve Donie
Principal Software Engineer
Datical, Inc. http://www.datical.com/

I used to manage database deployments via sourcing .sql text files and it was problematic.

I now have a number of applications a group of us has built whose SQL deployment operations I am managing in Liquibase. Great stuff!

I love the ability in Liquibase to have error handling and incremental idempotency in my SQL deployment operations, both from the command line and via Maven builds. Thanks to creative use of runAlways and preconditions, I am even able to create TDD-style unit tests for Liquibase operations: checks that run on every code build or database deployment and confirm that certain conditions still hold!

However, the latest application we’ve built takes our SQL data and pushes it to Hadoop (via Sqoop) for computationally intensive predictive analysis. So I am now also supporting Hive schemas on our Hadoop cluster, at least some of which mirror our SQL schema very closely. (Hive is an open-source SQL implementation that runs across a Hadoop cluster and can be connected to with a standard JDBC driver.)

So after championing Liquibase amongst my technical peers I find myself in the predicament that I have to return to the days of sourcing some of my DDL in shell scripts with no error handling, and I can’t use Liquibase for all DDL changes related to our applications!

I don’t really need Liquibase to generate “HiveQL DDL” via its tags, but I at least need to be able to specify such sorts of DDL in a tag like I do with our existing database-specific DDL in Liquibase.

It seems plausible/feasible at a high level to me because one just needs to have Liquibase pass the DDL through a JDBC driver and read back the responses, right? I am probably grossly oversimplifying.
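To make the "pass the DDL through a driver and read back the responses" idea concrete, here is a minimal sketch in Python. It is not how Liquibase is implemented. The function name `run_ddl` is hypothetical, and the standard-library `sqlite3` module stands in for a Hive connection purely so the example runs anywhere. The point is only the difference from a plain shell script: execution stops at the first failing statement and the failure is reported.

```python
import sqlite3


def run_ddl(conn, statements):
    """Run statements in order and stop at the first failure.

    Returns (number_applied, error_message); error_message is None on success.
    `conn` is any DB-API 2.0 connection (sqlite3 here, as a stand-in).
    """
    cur = conn.cursor()
    applied = 0
    for stmt in statements:
        try:
            cur.execute(stmt)
        except Exception as exc:  # drivers raise their own error classes
            return applied, f"statement {applied + 1} failed: {exc}"
        applied += 1
    return applied, None


# Assumption: an in-memory SQLite database replaces the real Hive endpoint.
conn = sqlite3.connect(":memory:")

# IF NOT EXISTS makes these statements safe to re-run.
ok = run_ddl(conn, [
    "CREATE TABLE IF NOT EXISTS events (id INTEGER, name TEXT)",
    "CREATE TABLE IF NOT EXISTS users (id INTEGER)",
])

# The second statement has a typo, so the third is never attempted.
bad = run_ddl(conn, [
    "CREATE TABLE t1 (id INTEGER)",
    "CREATE TABEL t2 (id INTEGER)",
    "CREATE TABLE t3 (id INTEGER)",
])
```

With a real Hive JDBC or DB-API driver the error classes and supported DDL would differ, so treat this as an outline of the control flow only.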

Is deploying DDL for Hive/Hadoop via Liquibase possible via some existing means? Is this even on the radar for the future?

Yes, support for Hive changes is definitely possible. The existing 3.x extension system should allow you to plug something in, but like Steve says, I am busily working on larger changes in 4.x that will make it easier to support new environments.

Nathan

Hi Mani,

I’ve developed the plugin you mention above; you can try it at https://github.com/eselyavka/liquibase-impala. The plugin supports Hive and Impala. I believe there are a lot of areas where it can be improved, but basic tests look good to me at the moment. If you find a bug, please create a PR or report the issue to me.

Hi,

I've been trying to understand the codebase for some time to see whether it is possible to use Liquibase to handle schema changes for Impala/Hive. At first glance, I see that CRUD operations happen on two important tables, databasechangeloglock and databasechangelog, which are heavily used to persist state.

Hive/Impala doesn’t have ACID support: there is no transactional support and no CRUD support in my distribution, although the general community version has some support (not as a full-fledged feature). I thought I’d share this and ask for your suggestions. Thoughts? Is anyone trying this out?

Is it possible to keep these core tables somewhere outside, in an RDBMS (for example, MySQL), as a common place for all Hive/Impala databases? Yes, there could be problems maintaining integrity between the two operations (the schema change in Hive/Impala and the bookkeeping update in the RDBMS).
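As a rough illustration of that idea, the Python sketch below keeps a tracking table in one database while the DDL goes to a different target. Everything here is an assumption for illustration: `apply_changeset` and the `changelog` table are hypothetical names, `sqlite3` stands in for the external RDBMS, and the target is just a callable that would send DDL to Hive/Impala. It does not reflect Liquibase's actual internals.

```python
import sqlite3


def apply_changeset(tracking, target_execute, changeset_id, ddl):
    """Apply `ddl` at most once, recording it in a separate tracking database.

    tracking       -- DB-API connection to the external RDBMS (sqlite3 here)
    target_execute -- callable that sends DDL to the target (Hive/Impala)
    Returns True if the DDL was run, False if it was already recorded.
    """
    tracking.execute(
        "CREATE TABLE IF NOT EXISTS changelog (id TEXT PRIMARY KEY, ddl TEXT)"
    )
    already = tracking.execute(
        "SELECT 1 FROM changelog WHERE id = ?", (changeset_id,)
    ).fetchone()
    if already:
        return False

    # Step 1: run the DDL on the target system.
    target_execute(ddl)
    # Step 2: record it. The two steps are not atomic: a crash between them
    # leaves the target changed but unrecorded, which is the integrity risk.
    tracking.execute(
        "INSERT INTO changelog (id, ddl) VALUES (?, ?)", (changeset_id, ddl)
    )
    tracking.commit()
    return True


tracking = sqlite3.connect(":memory:")
executed = []  # stand-in target: collects the DDL it is asked to run

first = apply_changeset(tracking, executed.append,
                        "1-create-events", "CREATE TABLE events (id INT)")
second = apply_changeset(tracking, executed.append,
                         "1-create-events", "CREATE TABLE events (id INT)")
```

Writing the DDL idempotently (for example with IF NOT EXISTS where the target supports it) is one way to make a re-run after such a crash harmless.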

Thanks,

Mani

Thanks for your reply. I will try using the extension and would like to understand the flow as well. Will update on this.

I did not manage to make it work with https://github.com/eselyavka/liquibase-impala.

But I have another concern. The plugin is based on Cloudera .jar libraries, so you need to accept Cloudera's terms and conditions as well:

Cloudera grants to Customer a nonexclusive, nontransferable, and limited license to access, use, reproduce … exclusively for use with Cloudera’s Distribution Including Apache Hadoop.

Does that mean we cannot use the extension and the jar without Cloudera’s product? :/