Add the possibility to set defaultCharsetName

wentwog · February 12, 2014, 8:28pm

Hi.
Can you help solve the following issue:

In our projects we use SQL changelogs. These SQL files are in UTF-8, without a BOM. Our Windows machines use the cp-1251 encoding.
As a result, Russian letters end up as wrong characters in the DB after Liquibase runs the scripts.
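
The symptom described above can be reproduced outside Liquibase. The following is only a minimal sketch: the class name, the `decode` helper, and the sample word are made up for illustration, and it assumes the JDK ships the `windows-1251` charset (standard JDK builds do). It decodes the same UTF-8 bytes once correctly and once as cp-1251:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: decode raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // A six-letter Russian word, written with Unicode escapes so the
        // example does not depend on the encoding of this source file.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        Charset cp1251 = Charset.forName("windows-1251");

        // Correct decoding round-trips the text.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(original));
        // Decoding as cp-1251 turns every UTF-8 byte into its own character,
        // so six letters become twelve unrelated ones.
        System.out.println(decode(utf8Bytes, cp1251).length());
    }
}
```

This is the kind of mismatch that occurs when a reader falls back to the platform default on a cp-1251 machine while the file on disk is UTF-8.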

As far as I understand, Liquibase determines the encoding/charset of an sql changelog by the BOM in that file.
If there is no BOM, it uses the system default encoding.
Code: UtfBomAwareReader.java : this.defaultCharsetName = Charset.defaultCharset().name();
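
To make the behaviour described above concrete, here is a rough sketch of BOM-based charset detection with a fallback. It is not Liquibase's actual `UtfBomAwareReader` code; the class name, the `detectCharset` method, and the `fallback` parameter are assumptions for illustration, and UTF-32 BOMs are left out for brevity:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomCharsetSketch {
    // Hypothetical helper: inspect the first bytes of a file and pick a charset.
    // If no known BOM is present, the caller-supplied fallback is returned.
    static Charset detectCharset(byte[] head, Charset fallback) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;      // EF BB BF
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;   // FE FF
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;   // FF FE
        }
        return fallback;                        // no BOM found
    }

    public static void main(String[] args) {
        byte[] noBom = "SELECT 1".getBytes(StandardCharsets.US_ASCII);
        // With the platform default as fallback, the result depends on the machine.
        System.out.println(detectCharset(noBom, Charset.defaultCharset()));
        // With an explicit fallback, the result is the same everywhere.
        System.out.println(detectCharset(noBom, StandardCharsets.UTF_8));
    }
}
```

In this sketch the fallback is a parameter, so passing `Charset.defaultCharset()` reproduces the machine-dependent behaviour, while passing a fixed charset removes it.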

I suggest adding the possibility to set defaultCharsetName somewhere. Possibly in the XML changelog:

This property should then apply to all the changelogs that this changelog includes (<include …>).

The other solution I see, adding a BOM to our source files, is inconvenient for us.

P.S. I haven’t found how to create this issue in Jira (liquibase.jira.com), even though I’ve signed up there.

nvoxland · February 12, 2014, 8:28pm

What version of Liquibase are you using? There have been some improvements to UTF support through 3.1.1 that may help.

If you have created a user on liquibase.jira.com, there should be a blue “Create Issue” button in the middle of the header bar.

Nathan

nvoxland · February 12, 2014, 8:28pm

Someone else said they can’t see where to create an issue either. I’m looking into that…

Nathan

nvoxland · February 12, 2014, 8:28pm

Separate testing shows the button there. Perhaps a temporary jira bug that is resolved?

Nathan

un1381239061359r87id · February 12, 2014, 8:28pm

Nathan, thanks for the reply!

I’m using 3.1.1.
Yes, there were improvements.
Checksums are now calculated the same way on Windows and Linux. The calculation uses UtfBomStripperInputStream instead of UtfBomAwareReader.

But when an SQL script is executed, UtfBomAwareReader is still used, as before.
This leads to wrong characters in the DB.

I’ve experimented with 3.1.1 by replacing, in UtfBomAwareReader,
code: this.defaultCharsetName = Charset.defaultCharset().name();
with: this.defaultCharsetName = "UTF-8";

And it works like a charm.
So, I suppose that the issue is still present.
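
The effect of that change can be illustrated without touching Liquibase. The sketch below is only an illustration under stated assumptions: the class name, the `readAll` helper, and the sample SQL statement are invented here. It reads BOM-less UTF-8 bytes through an `InputStreamReader` with an explicitly chosen charset, so the result no longer depends on the platform default:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Hypothetical helper: read the whole stream using the given charset
    // instead of whatever Charset.defaultCharset() happens to be.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample statement with Russian text, stored as UTF-8 without a BOM.
        String sql = "INSERT INTO t (name) VALUES ('\u041f\u0440\u0438\u0432\u0435\u0442');";
        byte[] fileBytes = sql.getBytes(StandardCharsets.UTF_8);

        String decoded = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(sql)); // prints true
    }
}
```

Hard-coding UTF-8 fixes the case in this thread but would break BOM-less files saved in another encoding, which is one reason a configurable default may be preferable to a fixed one.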

nvoxland · February 12, 2014, 8:28pm

Thanks, I created https://liquibase.jira.com/browse/CORE-1776 to track the change. I’ll try it out and see if there are other changes that need to be made as well.

Nathan
