Slow performance for LoadUpdateData with postgresql

pkernevez · August 25, 2022, 10:29am

Hi all,

I am seeing slow performance with loadUpdateData on a file of 100,000 lines.
I have an index on the columns used in the WHERE clause.

I see that Liquibase sends one request per row to the database:

BEGIN
UPDATE INSTRUMENT_PRICE_HISTORY SET CREATED_AT = '2020-11-03 00:00:00', ID = -1560221, PRICE_CURRENCY_ISO_CODE = 'CHF', PRICE_QUANTITY = 107.16, PUBLICATION_DATE = '2020-11-03 00:00:00', UPDATED_AT = '2020-11-03 00:00:00' WHERE INSTRUMENT_ID = -156 AND DATE = '2020-11-03';
IF not found THEN
INSERT INTO INSTRUMENT_PRICE_HISTORY (ID, CREATED_AT, UPDATED_AT, INSTRUMENT_ID, DATE, PRICE_QUANTITY, PRICE_CURRENCY_ISO_CODE, PUBLICATION_DATE) VALUES (-1560221, '2020-11-03 00:00:00', '2020-11-03 00:00:00', -156, '2020-11-03', 107.16, 'CHF', '2020-11-03 00:00:00');
END IF;
END;

I am trying to understand why Liquibase does not group the requests, and whether it would be possible to use an SQL MERGE instead of this statement.
I found this comment in the code:

              // we don't do batch updates for Postgres but we still send as a prepared statement, see LB-744
              // mysql supports batch updates, but the performance vs. the big insert is worse

But I can’t find the issue or information about ‘LB-744’.

github.com/liquibase/liquibase/blob/master/liquibase-core/src/main/java/liquibase/change/core/LoadDataChange.java#L860

                    }
                }
                statements.add(insertStatement);
            }
        }
        if (rows.stream().anyMatch(LoadDataRowConfig::needsPreparedStatement)) {
            // If we have only prepared statements and the database supports batching, let's roll
            if (supportsBatchUpdates(database) && !preparedStatements.isEmpty()) {
                if (database instanceof PostgresDatabase || database instanceof MySQLDatabase) {
                    // we don't do batch updates for Postgres but we still send as a prepared statement, see LB-744
                    // mysql supports batch updates, but the performance vs. the big insert is worse
                    return preparedStatements.toArray(new SqlStatement[0]);
                } else {
                    return new SqlStatement[]{
                            new BatchDmlExecutablePreparedStatement(
                                    database, getCatalogName(), getSchemaName(),
                                    getTableName(), columns,
                                    getChangeSet(), Scope.getCurrentScope().getResourceAccessor(),
                                    preparedStatements)
                    };

There is also a comment that does not match the code here:

github.com/liquibase/liquibase/blob/master/liquibase-core/src/main/java/liquibase/change/core/LoadDataChange.java#L856

        if (rows.stream().anyMatch(LoadDataRowConfig::needsPreparedStatement)) {
            // If we have only prepared statements and the database supports batching, let's roll

The comment should read "If we have at least one prepared statement", since anyMatch is already true when a single row needs one.
And instead of going through all the rows, I don't understand why !preparedStatements.isEmpty() is not sufficient on its own.
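To illustrate the point about the comment, here is a small hypothetical model (Row stands in for LoadDataRowConfig; none of this is Liquibase code). anyMatch returns true as soon as one row matches, so the branch is taken for mixed input, not only when every row needs a prepared statement. The model also assumes, without having verified it against LoadDataChange, that preparedStatements receives exactly one entry per row that needs one.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model, NOT Liquibase code: Row stands in for LoadDataRowConfig.
class PreparedCheckDemo {
    record Row(String name, boolean needsPreparedStatement) {}

    // Same shape as the outer check in LoadDataChange: anyMatch returns true
    // as soon as ONE row needs a prepared statement ("at least one", not "only").
    static boolean anyRowNeedsPrepared(List<Row> rows) {
        return rows.stream().anyMatch(Row::needsPreparedStatement);
    }

    // Assumption (not verified against LoadDataChange): the prepared-statement
    // list gets exactly one entry per row that needs one.
    static List<Row> collectPrepared(List<Row> rows) {
        return rows.stream()
                .filter(Row::needsPreparedStatement)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> mixed = List.of(new Row("a", false), new Row("b", true), new Row("c", false));
        List<Row> none = List.of(new Row("a", false), new Row("b", false));

        // Under the assumption above, both checks always agree.
        System.out.println(anyRowNeedsPrepared(mixed));        // true
        System.out.println(!collectPrepared(mixed).isEmpty()); // true
        System.out.println(anyRowNeedsPrepared(none));         // false
        System.out.println(!collectPrepared(none).isEmpty());  // false
    }
}
```

If the assumption holds, the two checks carry the same information, which is the point behind the question; if preparedStatements is filled under other conditions, they can diverge.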

I have several questions:

Where can I find LB-744?
Why doesn't Liquibase use PreparedStatement here?
Why doesn't Liquibase use a MERGE statement?
Is there a reason not to group the requests, for example like this?

BEGIN
UPDATE INSTRUMENT_PRICE_HISTORY SET CREATED_AT = '2020-11-03 00:00:00', ID = -1560221, PRICE_CURRENCY_ISO_CODE = 'CHF', PRICE_QUANTITY = 107.16, PUBLICATION_DATE = '2020-11-03 00:00:00', UPDATED_AT = '2020-11-03 00:00:00' WHERE INSTRUMENT_ID = -156 AND DATE = '2020-11-03';
IF not found THEN
INSERT INTO INSTRUMENT_PRICE_HISTORY (ID, CREATED_AT, UPDATED_AT, INSTRUMENT_ID, DATE, PRICE_QUANTITY, PRICE_CURRENCY_ISO_CODE, PUBLICATION_DATE) VALUES (-1560221, '2020-11-03 00:00:00', '2020-11-03 00:00:00', -156, '2020-11-03', 107.16, 'CHF', '2020-11-03 00:00:00');
END IF;

UPDATE INSTRUMENT_PRICE_HISTORY SET CREATED_AT = '2020-11-03 00:00:00', ID = -1560221, PRICE_CURRENCY_ISO_CODE = 'CHF', PRICE_QUANTITY = 107.16, PUBLICATION_DATE = '2020-11-03 00:00:00', UPDATED_AT = '2020-11-03 00:00:00' WHERE INSTRUMENT_ID = -157 AND DATE = '2020-11-03';
IF not found THEN
INSERT INTO INSTRUMENT_PRICE_HISTORY (ID, CREATED_AT, UPDATED_AT, INSTRUMENT_ID, DATE, PRICE_QUANTITY, PRICE_CURRENCY_ISO_CODE, PUBLICATION_DATE) VALUES (-1560221, '2020-11-03 00:00:00', '2020-11-03 00:00:00', -157, '2020-11-03', 107.16, 'CHF', '2020-11-03 00:00:00');
END IF;

/* Other statements here too */

END;
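As a rough illustration of the grouping idea, here is a hypothetical Java helper (GroupedUpsertBuilder, RowUpsert and rowsPerBlock are made-up names, not part of Liquibase) that wraps several per-row update/insert pairs into one PL/pgSQL DO block per chunk. It assumes the per-row UPDATE and INSERT texts are already rendered and escaped. Note that IF ... END IF and FOUND only exist inside PL/pgSQL, so on PostgreSQL the grouped block has to run inside DO $$ ... $$ (a bare BEGIN at the SQL level just starts a transaction), and an error on any row aborts the whole DO statement, not just that row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Liquibase: concatenates per-row
// "UPDATE ...; IF NOT FOUND THEN INSERT ...; END IF;" fragments into one
// PL/pgSQL DO block per chunk, so a chunk of rows costs one round trip.
class GroupedUpsertBuilder {

    // One pre-rendered upsert pair; the SQL text is assumed to be escaped already.
    record RowUpsert(String updateSql, String insertSql) {}

    static List<String> buildBlocks(List<RowUpsert> rows, int rowsPerBlock) {
        if (rowsPerBlock <= 0) {
            throw new IllegalArgumentException("rowsPerBlock must be positive");
        }
        List<String> blocks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerBlock) {
            int end = Math.min(start + rowsPerBlock, rows.size());
            StringBuilder sb = new StringBuilder("DO $$\nBEGIN\n");
            for (RowUpsert row : rows.subList(start, end)) {
                // FOUND is set by the preceding UPDATE inside PL/pgSQL.
                sb.append(row.updateSql()).append(";\n")
                  .append("IF NOT FOUND THEN\n")
                  .append(row.insertSql()).append(";\n")
                  .append("END IF;\n");
            }
            sb.append("END;\n$$;");
            blocks.add(sb.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<RowUpsert> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            int instrumentId = -156 - i;
            // Columns trimmed for brevity compared with the real statements above.
            rows.add(new RowUpsert(
                "UPDATE INSTRUMENT_PRICE_HISTORY SET PRICE_QUANTITY = 107.16"
                    + " WHERE INSTRUMENT_ID = " + instrumentId + " AND DATE = '2020-11-03'",
                "INSERT INTO INSTRUMENT_PRICE_HISTORY (INSTRUMENT_ID, DATE, PRICE_QUANTITY)"
                    + " VALUES (" + instrumentId + ", '2020-11-03', 107.16)"));
        }
        List<String> blocks = buildBlocks(rows, 2);
        System.out.println(blocks.size()); // 3 blocks: 2 + 2 + 1 rows
        System.out.println(blocks.get(0));
    }
}
```

With 100,000 rows and, say, 1,000 rows per block, that would be 100 round trips instead of 100,000; the right chunk size is something to measure rather than assume.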

Thanks for your help,
Philippe

nvoxland · August 30, 2022, 8:22pm

The loadData/loadUpdateData code definitely needs a good cleanup. It has grown through small patches over the years and has ended up with a lot of odd edge cases and workarounds for database behaviors that are poorly (or incorrectly) documented.

Performance especially is a big question in there. Which version of Liquibase are you using? There have been PostgreSQL performance fixes over the last couple of months, in case you are not fully up to date. In particular, if I remember correctly, PostgreSQL was found to be surprisingly slow when using prepared statements, so we updated the code to use non-prepared statements on PostgreSQL where possible. It was very surprising, but very noticeable.

To answer some of your questions: LB-744 came from an old internal bug-tracking system we used for a while, which is not available externally.

The reason we don't group statements together more is that Liquibase's code paths are designed around sending single statements, so that there are no differences between the update-sql output and the update logic. The same reasoning keeps the code avoiding prepared statements as much as possible (plus, possibly, PostgreSQL's odd performance with them). That does hurt loadData's performance, though, and we plan to look into alternatives to that structure, but it will take a relatively large refactoring and it hasn't made it to the top of the priority list so far.

MERGE would work as an alternative statement. I don't remember when MERGE was added to PostgreSQL; perhaps it wasn't widely available when Liquibase added loadUpdateData? Is there an advantage to MERGE over UPDATE followed by IF NOT FOUND THEN INSERT?
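On the MERGE question: MERGE only arrived in PostgreSQL 15, released in October 2022, so it was not an option when loadUpdateData was written. The long-standing PostgreSQL upsert (since 9.5) is INSERT ... ON CONFLICT ... DO UPDATE, and the sketch below uses that instead of MERGE. Its advantages over the UPDATE / IF NOT FOUND THEN INSERT block are that many rows fit in one set-based statement and that PostgreSQL guarantees an atomic insert-or-update outcome even with concurrent writers. OnConflictUpsertBuilder.build is a hypothetical helper, not a Liquibase API, and it assumes (a) a UNIQUE index or constraint on the key columns, here (INSTRUMENT_ID, DATE), which a plain non-unique index does not satisfy, and (b) values that are already escaped SQL literals.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical builder, NOT a Liquibase API, for a multi-row PostgreSQL upsert:
//   INSERT ... VALUES (...), (...) ON CONFLICT (keys) DO UPDATE SET c = EXCLUDED.c
// Assumes a UNIQUE index/constraint on keyColumns and pre-escaped SQL literals.
class OnConflictUpsertBuilder {

    static String build(String table, List<String> columns, List<String> keyColumns,
                        List<List<String>> rows) {
        if (rows.isEmpty() || keyColumns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one key column");
        }
        StringJoiner values = new StringJoiner(",\n  ");
        for (List<String> row : rows) {
            if (row.size() != columns.size()) {
                throw new IllegalArgumentException("row/column count mismatch");
            }
            values.add("(" + String.join(", ", row) + ")");
        }
        // Every non-key column is overwritten with the incoming (EXCLUDED) value.
        StringJoiner updates = new StringJoiner(", ");
        for (String column : columns) {
            if (!keyColumns.contains(column)) {
                updates.add(column + " = EXCLUDED." + column);
            }
        }
        // If every column is a key there is nothing to update on conflict.
        String action = updates.length() == 0 ? "DO NOTHING" : "DO UPDATE SET " + updates;
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ")\nVALUES\n  "
                + values
                + "\nON CONFLICT (" + String.join(", ", keyColumns) + ")\n"
                + action + ";";
    }

    public static void main(String[] args) {
        String sql = build(
            "INSTRUMENT_PRICE_HISTORY",
            List.of("INSTRUMENT_ID", "DATE", "PRICE_QUANTITY", "PRICE_CURRENCY_ISO_CODE"),
            List.of("INSTRUMENT_ID", "DATE"),
            List.of(
                List.of("-156", "'2020-11-03'", "107.16", "'CHF'"),
                List.of("-157", "'2020-11-03'", "107.16", "'CHF'")));
        System.out.println(sql);
    }
}
```

One caveat: a single INSERT ... ON CONFLICT DO UPDATE cannot update the same row twice, so duplicate keys within one chunk of the CSV would make the statement fail.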

Nathan
