In our projects we use SQL changelogs. These SQL files are encoded in UTF-8 without a BOM. Our Windows machines use cp1251 as the system default encoding. As a result, Russian letters show up as wrong characters in the DB after Liquibase runs the scripts.
As far as I understand, Liquibase determines the encoding/charset of an SQL changelog from the BOM in that file. If there is no BOM, it falls back to the system default encoding. Code: UtfBomAwareReader.java : this.defaultCharsetName = Charset.defaultCharset().name();
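To illustrate the symptom, here is a minimal standalone sketch (not Liquibase code) of what happens when UTF-8 bytes are decoded with a different charset. The class name EncodingMismatchDemo and the helper decode are made up for this example, and it assumes the JRE ships the windows-1251 charset, which is not guaranteed on every runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Liquibase.
public class EncodingMismatchDemo {

    // Decodes raw file bytes with the given charset, the way a Reader would.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // "Privet" in Cyrillic, written with Unicode escapes so the example
        // does not depend on the encoding of this source file.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // What is stored on disk: UTF-8 bytes, no BOM.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: windows-1251 is available in this JRE.
        Charset cp1251 = Charset.forName("windows-1251");

        // Decoded with the right charset: the original text comes back.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));

        // Decoded with the Windows default: each Cyrillic letter becomes
        // two unrelated characters, which is what ends up in the DB.
        System.out.println(decode(utf8Bytes, cp1251));
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8, so a single-byte charset such as cp1251 turns every letter into two characters.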
I suggest adding a way to set defaultCharsetName somewhere. Possibly in the XML changelog:
This property should then be applied to all the changelogs that this changelog includes (<include …>).
The other solution I see, adding a BOM to our source files, is inconvenient for us.
P.S. I couldn’t find how to create this issue in JIRA (liquibase.jira.com), even though I’ve signed up there.
I’m using 3.1.1. Yes, there were improvements. Checksums are now calculated the same way on Windows and Linux. The calculation uses UtfBomStripperInputStream instead of UtfBomAwareReader.
But when an SQL script is executed, UtfBomAwareReader is still used, as before. This leads to wrong characters in the DB.
I’ve experimented with 3.1.1 by replacing this line in the UtfBomAwareReader code: this.defaultCharsetName = Charset.defaultCharset().name(); with: this.defaultCharsetName = "UTF-8";
And it works like a charm. So I suppose the issue is still present.
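For discussion, here is a rough sketch of what a BOM-aware reader with a configurable fallback could look like. This is not the actual UtfBomAwareReader implementation; the class FallbackCharsetReaderSketch and the methods open and readAll are invented for illustration, and only the UTF-8 BOM is handled (UTF-16/UTF-32 BOMs are left out to keep it short).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Liquibase code.
public class FallbackCharsetReaderSketch {

    // Returns a Reader that uses UTF-8 if the stream starts with a UTF-8 BOM,
    // and the caller-supplied fallback charset otherwise.
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to 3 bytes; a single read() call may return fewer.
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean utf8Bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (utf8Bom) {
            // BOM is consumed and not passed on to the reader.
            return new InputStreamReader(pin, StandardCharsets.UTF_8);
        }

        // No BOM: push the bytes back and decode with the configured fallback
        // instead of Charset.defaultCharset().
        if (n > 0) {
            pin.unread(head, 0, n);
        }
        return new InputStreamReader(pin, fallback);
    }

    // Convenience helper: reads the whole stream into a String.
    static String readAll(InputStream in, Charset fallback) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = open(in, fallback)) {
            char[] buf = new char[1024];
            int k;
            while ((k = reader.read(buf)) != -1) {
                sb.append(buf, 0, k);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With something along these lines, passing StandardCharsets.UTF_8 as the fallback would behave like my hard-coded experiment, while passing Charset.defaultCharset() would keep the current behaviour.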