Postgres and Vacuum with the IBM FHIR Server: Debugging Details

The IBM FHIR Server stores resources in the PostgreSQL database and uses a relational model to store historical FHIR Resource and enable search on the latest FHIR resources. The resource data is spread in a relational model that is occasionally tweaked in order to improve search or optimize the retrieval using the relational model.

In the IBM FHIR Server Performance Guide, the guide outlines some important alterations to the tables that facilitate an optimized Postgres instance. The guide suggests altering, per your providers recommendation, autovacuum_vacuum_cost_limit, autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold in order to optimize the VACUUM process. With the IBM FHIR Server fhir-persistence-schema-cli, autovacuum_vacuum_scale_factor is not automatically configured, and not recommended on Databases for Postgres on IBM Cloud.

As Postgres uses "multi-version concurrency control (MVCC) to ensure that data remains consistent and accessible in high-concurrency environments", each transaction runs on a snapshot, and needs to be reconciled so dead_rows are removed – vacuumed. The VACUUM process manages dead rows and disk usage (reuse). The VACUUM process (autovacuum) frequently runs – gathering statistics and reconciling the maintenance of the table statitstics and data.

To check for tables that need vacuuming:

 SELECT relname AS "table_name",
        n_tup_ins AS "inserts",
        n_tup_upd AS "updates",
        n_tup_del AS "deletes",
        n_live_tup AS "live_tuples",
        n_dead_tup AS "dead_tuples"
   FROM pg_stat_user_tables
  WHERE schemaname = 'fhirdata'
    AND (relname = 'logical_resources' OR relname LIKE '%_values')
    AND n_dead_tup > 0;

Then a database administrator runs – VACUUM FULL FHIRDATA.PROCEDURE_RESOURCE_TOKEN_REFS; to execute a vacuum, which runs in the background.

While the VACUUM process is running, the pg_stat_progress_vacuum view can be queried to see worker process.

SELECT * 
FROM pg_stat_progress_vacuum;

If you need to update a specific tables settings, you can run with --vacuum-table-name.

java -jar ./fhir-persistence-schema-${VERSION}-cli.jar \
--db-type postgresql --prop db.host=localhost --prop db.port=5432 \
--prop db.database=fhirdb --schema-name fhirdata \
--prop user=fhiradmin --prop password=passw0rd \
--update-vacuum --vacuum-cost-limit 2000 --vacuum-threshold 1000 \
--vacuum-scale-factor 0.01 --vacuum-table-name LOGICAL_RESOURCES

To update all tables in a schema, you can run without the table parameter. If you omit any value, the defaults are picked as described in the Performance guide.

If you hit a lock (ShareUpdateExclusiveLock), the VACUUM worker process is currently churning on the table, and the ALTER statement is waiting.

  • wait_type = Lock relation Waiting to acquire a lock on a relation.
  • wait_lock_type – ShareUpdateExclusiveLock Acquired by VACUUM and conflicts with ALTER

CHeck for the Blocking PID, and grab the blocking_pid.

 -- list bad connections
   SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process,
         blocked_activity.application_name AS blocked_application,
         blocking_activity.application_name AS blocking_application
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED and blocked_activity.usename = 'fhirserver'

Try canceling the PID, SELECT pg_cancel_backend(205384);

Else, cancel the current Transaction the blocked pid:

-- cancel the blocking trannsaction/pid (hard stop)
SELECT pg_cancel_backend(blocked_locks.pid)     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process,
         blocked_activity.application_name AS blocked_application,
         blocking_activity.application_name AS blocking_application
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED and blocked_activity.usename = 'fhirserver'

Now, wait until the VACUUM finishes, and then execute a new ALTER.

You should be all set at this point.

Check wait_type

-- Check Wait Type
SELECT 
    waiting.locktype           AS waiting_locktype,
    waiting.relation::regclass AS waiting_table,
    waiting_stm.query          AS waiting_query,
    waiting.mode               AS waiting_mode,
    waiting.pid                AS waiting_pid,
    other.locktype             AS other_locktype,
    other.relation::regclass   AS other_table,
    other_stm.query            AS other_query,
    other.mode                 AS other_mode,
    other.pid                  AS other_pid,
    other.granted              AS other_granted
FROM
    pg_catalog.pg_locks AS waiting
JOIN
    pg_catalog.pg_stat_activity AS waiting_stm
    ON (
        waiting_stm.pid = waiting.pid
    )
JOIN
    pg_catalog.pg_locks AS other
    ON (
        (
            waiting."database" = other."database"
        AND waiting.relation  = other.relation
        )
        OR waiting.transactionid = other.transactionid
    )
JOIN
    pg_catalog.pg_stat_activity AS other_stm
    ON (
        other_stm.pid = other.pid
    )
WHERE
    NOT waiting.granted
AND
    waiting.pid <> other.pid   

SELECT
  schemaname, relname,
  last_vacuum, last_autovacuum,
  vacuum_count, autovacuum_count, *
FROM pg_stat_user_tables
WHERE schemaname = 'fhirdata' AND relname = 'observation_date_values';

Check with locks

SELECT now()::time, query, backend_start, xact_start, query_start,
         state_change, state,
         now()::time - state_change::time AS locked_since,
         pid, wait_event_type, wait_event
  FROM pg_stat_activity
  WHERE wait_event_type IS NOT NULL and wait_event_type = 'Lock'
ORDER BY locked_since DESC;

Check a PID

SELECT a.usename, a.application_name, a.datname, a.query,
         l.granted, l.mode, transactionid
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE granted = false AND a.pid = 327589;

Check a tansaction

SELECT a.usename, a.application_name, a.datname, a.query,
        l.granted, l.mode, transactionid,
        now()::time - a.state_change::time AS acquired_since,
        a.pid
   FROM pg_locks l
   JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE granted = true AND transactionid = 3031;

Reference

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.