JSON <==> XML

Bidirectional conversion of JSON <==> XML seems off.


Given the following insert changeSet in JSON from the source:


  1. {
  2.     "databaseChangeLog": [
  3.         {
  4.             "changeSet": {
  5.                 "id": "datatypetest-2",
  6.                 "author": "nvoxland",
  7.                 "validCheckSums": ["ANY"],
  8.                 "changes": [
  9.                     {
  10.                         "insert": {
  11.                             "tableName": "dataTypeTest",
  12.                             "columns": [
  13.                                 {
  14.                                     "column": {
  15.                                         "name": "id",
  16.                                         "valueNumeric": 1
  17.                                     }
  18.                                 },
  19.                                 {
  20.                                     "column": {
  21.                                         "name": "dateCol",
  22.                                         "valueDate": "2007-08-09"
  23.                                     }
  24.                                 },
  25.                                 {
  26.                                     "column": {
  27.                                         "name": "timeCol",
  28.                                         "valueDate": "13:14:15"
  29.                                     }
  30.                                 },
  31.                                 {
  32.                                     "column": {
  33.                                         "name": "dateTimeCol",
  34.                                         "valueDate": "2007-08-09T13:14:15"
  35.                                     }
  36.                                 },
  37.                                 {
  38.                                     "column": {
  39.                                         "name": "bigintcol",
  40.                                         "valueNumeric": 2
  41.                                     }
  42.                                 }
  43.                             ]
  44.                         }
  45.                     },
  46.                     {
  47.                         "insert": {
  48.                             "tableName": "dataTypeTest",
  49.                             "columns": [
  50.                                 {
  51.                                     "column": {
  52.                                         "name": "id",
  53.                                         "valueNumeric": 2
  54.                                     }
  55.                                 },
  56.                                 {
  57.                                     "column": {
  58.                                         "name": "dateCol",
  59.                                         "valueDate": "current_datetime"
  60.                                     }
  61.                                 },
  62.                                 {
  63.                                     "column": {
  64.                                         "name": "timeCol",
  65.                                         "valueComputed": "current_datetime"
  66.                                     }
  67.                                 },
  68.                                 {
  69.                                     "column": {
  70.                                         "name": "dateTimeCol",
  71.                                         "valueDate": "current_datetime"
  72.                                     }
  73.                                 },
  74.                                 {
  75.                                     "column": {
  76.                                         "name": "bigintcol",
  77.                                         "valueNumeric": 2
  78.                                     }
  79.                                 }
  80.                             ]
  81.                         }
  82.                     }
  83.                 ]
  84.             }
  85.         }
  86.     ]
  87. }


When converted to XML using a the JQuery json2xml converter (and others), turns into the following XML

  1. <?xml version="1.0" encoding="utf-8"?>
  2.  
  3.     ANY
  4.     datatypetest-2
  5.     nvoxland
  6.    
  7.      
  8.         dataTypeTest
  9.        
  10.          
  11.             1
  12.             id
  13.          
  14.        
  15.        
  16.          
  17.             dateCol
  18.             2007-08-09
  19.          
  20.        
  21.        
  22.          
  23.             timeCol
  24.             13:14:15
  25.          
  26.        
  27.        
  28.          
  29.             dateTimeCol
  30.             2007-08-09T13:14:15
  31.          
  32.        
  33.        
  34.          
  35.             2
  36.             bigintcol
  37.          
  38.        
  39.      
  40.    
  41.    
  42.      
  43.         dataTypeTest
  44.        
  45.          
  46.             2
  47.             id
  48.          
  49.        
  50.        
  51.          
  52.             dateCol
  53.             current_datetime
  54.          
  55.        
  56.        
  57.          
  58.             timeCol
  59.             current_datetime
  60.          
  61.        
  62.        
  63.          
  64.             dateTimeCol
  65.             current_datetime
  66.          
  67.        
  68.        
  69.          
  70.             2
  71.             bigintcol
  72.          
  73.        
  74.      
  75.    
  76.  

And the same insert changeSet in XML from the source:


  1. <?xml version="1.0" encoding="UTF-8"?>
  2.  
  3.     ANY
  4.    
  5.      
  6.      
  7.      
  8.      
  9.      
  10.    
  11.    
  12.      
  13.      
  14.      
  15.      
  16.      
  17.    
  18.  


When converted using the JQuery xml2json, produces the following JSON:


  1. {
  2.   "databaseChangeLog": {
  3.     "changeSet": {
  4.       "author": "nvoxland",
  5.       "id": "datatypetest-2",
  6.       "validCheckSum": "ANY",
  7.       "insert": [
  8.         {
  9.           "tableName": "dataTypeTest",
  10.           "column": [
  11.             {
  12.               "name": "id",
  13.               "valueNumeric": "1"
  14.             },
  15.             {
  16.               "name": "dateCol",
  17.               "valueDate": "2007-08-09"
  18.             },
  19.             {
  20.               "name": "timeCol",
  21.               "valueDate": "13:14:15"
  22.             },
  23.             {
  24.               "name": "dateTimeCol",
  25.               "valueDate": "2007-08-09T13:14:15"
  26.             },
  27.             {
  28.               "name": "bigintcol",
  29.               "valueNumeric": "2"
  30.             }
  31.           ]
  32.         },
  33.         {
  34.           "dbms": "!derby",
  35.           "tableName": "dataTypeTest",
  36.           "column": [
  37.             {
  38.               "name": "id",
  39.               "valueNumeric": "2"
  40.             },
  41.             {
  42.               "name": "dateCol",
  43.               "valueDate": "current_datetime"
  44.             },
  45.             {
  46.               "name": "timeCol",
  47.               "valueComputed": "current_datetime"
  48.             },
  49.             {
  50.               "name": "dateTimeCol",
  51.               "valueDate": "current_datetime"
  52.             },
  53.             {
  54.               "name": "bigintcol",
  55.               "valueNumeric": "2"
  56.             }
  57.           ]
  58.         }
  59.       ]
  60.     }
  61.   }
  62. }


I’m confused by this apparent discrepancy. It seems to prevent automated conversions from one format to another, which seems like a natural option given the relative compatibility of JSON, XML and YAML. I recognize that there are well-known problems with these conversions, but those don’t seem to be what is at play here.


I’d love to understand the thinking on this - it looks to me that you implemented YAML and got JSON, rather than deriving from your original XML structure.

The idea is that the XML format isn’t the canonical format, it is just one implementation. There is an underlying object structure which JSON/YAML/XML/SQL etc. are converted to and those objects are the real format. Each changelog format can be as different as they want as long as they can construct a valid object structure. 


I tried to keep the JSON/YAML and XML structures similar to make documentation easier but at the same time tried to have the json/yaml format make sense as json/yaml. If anything, the json/yaml formats are more in line with the object model than the XML is because there are places where the XML format should be different but we jump through hoops to maintain backward compatibility.


If you are looking for JSON <==> XML conversion, that is possible using the liquibase classes. The XMLChangeLogParser/JSONChangeLogParser classes can consume XML/JSON classes to create a DatabaseChangeLogObject which can then be passed to XMLChangeLogSerializer and JSONChangeLogSerializer to re-output it, even in a different format.


Nathan

I agree that the JSON/YAML format could be a bit less verbose. That’s what happens when you still try to be fairly similar to XML :slight_smile:  I may create a tighter DSL style format at some point as well. 


Using the changelog parser/serializer classes, you should be able to shift formats down the road if you ever need. Or you can mix and match when using the tags as well.


Do you have an example of the lack of multiline support in JSON that you are seeing? 


Nathan

Thanks for the thorough explanation. What you say makes complete sense to me - you started with XML, so it’s going to be the more mature and encumbered of the formats.


I am not a big fan of XML in general, but in this instance it has won out as my format of choice for some very specific reasons:

  1. Schema validation - when creating the database changelogs, many IDE's will provide code assistance and validation per the XSD's. IIRC neither YAML or JSON have a schema specification.
  2. While the YAML markup is better suited as a DSL, I use IntelliJ, and it doesn't support the Language Injection feature for YAML - which I find incredibly valuable when dealing embedded SQL (createView, createProcedure, etc).
  3. The JSON format is just too verbose for me - requiring the repeated "column" tag when the JSON itself handle arrays of values is troublesome.
  4. Most damning for JSON is its lack of multiline support.
  5. The Formatted SQL doesn't have the auto-rollback functionality.

This all came to the fore because at this point I’ve cycled through changelogs in every one of the four formats, I started with JSON, found its rough edges and moved to YAML, and got bit by invalid syntax and realized the superior format, for now, is XML - however I can see a need to move back to YAML at some point in the future, and having the ability to translate back and forth would really help me do that.

Okay…that was me above.

It does give me comfort knowing that I could translate using the Object Model and the parser/serializer classes.


I am currently mixing and matching my formats using the tag.


Sorry for the imprecise language - I’m referring to the JSON spec which does not allow embedded control characters, and therefore requires that multi-line strings have escaped newlines.


This plays havoc with readability.

Good point, that is a problem with JSON in general that I hadn’t really thought of.  For multiline sql you can use the sqlFile tag even in JSON, but it is a pain.


Nathan