Recipe: Building a custom IBM FHIR Server container with Bulk Data Parquet Support

Note: Parquet Support is now obsolete.

The IBM FHIR Server has early support for Bulk Data export to the Apache Parquet format using the Apache Spark libraries. New as of version 4.4.0, the export-to-Parquet feature requires:

  • Apache Spark v3.0 and the IBM Stocator adapter (version 1.1)
  • the configuration /fhirServer/bulkdata/storageProviders/(source)/enableParquet set to true

The Parquet Bulk Data export is activated using a custom _outputFormat in the export request.

        {
            "name": "_outputFormat",
            "valueString": "application/fhir+parquet"
        },
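For context, here is a minimal sketch of what the complete request body could look like if your server version accepts a POST-based kick-off; the surrounding Parameters resource and the _type filter are illustrative, not part of the snippet above.

        {
            "resourceType": "Parameters",
            "parameter": [
                {
                    "name": "_outputFormat",
                    "valueString": "application/fhir+parquet"
                },
                {
                    "name": "_type",
                    "valueString": "Patient"
                }
            ]
        }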

Let me show you how to build a custom IBM FHIR Server container with Parquet support, based on the ibmcom/ibm-fhir-server Docker image. It is recommended to use version 4.9.0 or higher.

Recipe

  1. Prior to 4.9.0, build the Maven projects and run the Docker build. You should see [INFO] BUILD SUCCESS after each Maven build, and docker.io/ibmcom/ibm-fhir-server:latest when the Docker build succeeds.
mvn clean install -f fhir-examples -B -DskipTests -ntp
mvn clean install -f fhir-parent -B -DskipTests -ntp
docker build -t ibmcom/ibm-fhir-server:latest fhir-install
  2. Download the dependency files for parquet and stocator.
export WORKSPACE=~/git/wffh/2021/fhir
bash ${WORKSPACE}/fhir-bulkdata-webapp/src/main/sh/cache-parquet-deps.sh
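To confirm the download worked, you can list the jars. This is just a sanity check and assumes the script writes them to a local deps/ directory, the same directory mounted into the container in the Docker run step below.
ls -l deps/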
  3. Download the fhir-server-config.json
curl -L -o fhir-server-config.json \
    https://raw.githubusercontent.com/IBM/FHIR/main/fhir-server/liberty-config/config/default/fhir-server-config.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8423  100  8423    0     0  40495      0 --:--:-- --:--:-- --:--:-- 40301
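Before editing, you can peek at the section you are about to change; the jq path below mirrors the /fhirServer/bulkdata/storageProviders property mentioned earlier, and it simply prints null if the default file does not yet contain that section.
jq '.fhirServer.bulkdata.storageProviders' fhir-server-config.json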
  4. Update the fhir-server-config.json to use an IBM COS storage provider with Parquet support. You’ll need to update it with your HMAC credentials, your internal and external endpoint URLs, and enableParquet set to true; a quick validity check is sketched after the snippet.
"storageProviders": {
                "default" : {
                    "type": "ibm-cos",
                    "bucketName": "fhir-performance",
                    "location": "us-east",
                    "endpointInternal": "https://s3.us-east.cloud-object-storage.appdomain.cloud",
                    "endpointExternal": "https://s3.us-east.cloud-object-storage.appdomain.cloud",
                    "auth" : {
                        "type": "hmac",
                        "accessKeyId": "key",
                        "secretAccessKey": "secret"
                    },
                    "enableParquet": true,
                    "disableOperationOutcomes": true,
                    "duplicationCheck": false, 
                    "validateResources": false, 
                    "create": false
                }
            }
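After editing, a quick check that the file still parses as valid JSON can save a failed container start; jq exits non-zero on a parse error.
jq empty fhir-server-config.json && echo "fhir-server-config.json parses cleanly"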
  5. Start the Docker container, and capture the container id. It’s going to take a few moments to start up as it lays down the test database.
docker run -d -p 9443:9443 -e BOOTSTRAP_DB=true \
  -v $(pwd)/fhir-server-config.json:/config/config/default/fhir-server-config.json \
  -v $(pwd)/deps:/config/userlib/ \
  ibmcom/ibm-fhir-server
3f8e90f20cd42129adc58df8a0295efc3fb2a0f4507350589f71939a072999ae
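If you would rather not copy the id by hand, one convenience (not part of the original recipe) is to ask Docker for the id of the most recently started container.
CONTAINER_ID=$(docker ps -lq)
echo ${CONTAINER_ID}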
  6. Check the logs until you see:
docker logs 3f8e90f20cd42129adc58df8a0295efc3fb2a0f4507350589f71939a072999ae
...
[6/16/21, 15:31:34:533 UTC] 0000002a FeatureManage A   CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 17.665 seconds.
  7. Download the Sample Data
curl -L https://raw.githubusercontent.com/IBM/FHIR/main/fhir-server-test/src/test/resources/testdata/everything-operation/Antonia30_Acosta403.json \
-o Antonia30_Acosta403.json
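Optionally, a quick sanity check (a sketch, assuming jq is available) confirms the download is a Bundle and shows how many entries it carries.
jq -r '.resourceType, (.entry | length)' Antonia30_Acosta403.json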
  8. Load the Sample Data bundle to the IBM FHIR Server
curl -k --location --request POST 'https://localhost:9443/fhir-server/api/v4' \
--header 'Content-Type: application/fhir+json' \
--user "fhiruser:${DUMMY_PASSWORD}" \
--data-binary  "@Antonia30_Acosta403.json" -o response.json

Note: DUMMY_PASSWORD should be set beforehand.
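For example, where the value is a placeholder for the fhiruser password configured for your server:
export DUMMY_PASSWORD='<your fhiruser password>'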

  9. Scan the response.json for any status that is not "status": "201", for example a status in the 4xx (client error) or 5xx (server error) family.
cat response.json | jq -r '.entry[].response.status' | sort -u
201
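If you only want failures to surface, a variation of the same jq call (a sketch) filters out the 201s.
cat response.json | jq -r '.entry[].response.status' | grep -v '^201' || echo "all entries returned 201"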
  10. Call the export to Parquet operation, and grab the content-location.
curl --location --request GET 'https://localhost:9443/fhir-server/api/v4/$export?_outputFormat=application/fhir%2Bparquet&_type=Patient' \
--header 'X-FHIR-TENANT-ID: default' \
--user "fhiruser:${DUMMY_PASSWORD}" \
--header 'Content-Type: application/json' -k -v
< content-location: https://localhost:9443/fhir-server/api/v4/$bulkdata-status?job=LqzauvqtHSmkpChVHo%2B1MQ
  11. Check the export status using the previous URL; once you see a 200 response, you can go out and use your exported Parquet data.
curl --location --request GET 'https://localhost:9443/fhir-server/api/v4/$bulkdata-status?job=LqzauvqtHSmkpChVHo%2B1MQ' \
--header 'X-FHIR-TENANT-ID: default' \
--user "fhiruser:${DUMMY_PASSWORD}" \
--header 'Content-Type: application/json' -k
{
    "transactionTime": "2021-08-09T00:34:11.594Z",
    "request": "https://localhost:9443/fhir-server/api/v4/$export?_outputFormat=application/fhir%2Bparquet&_type=Patient",
    "requiresAccessToken": false,
    "output": [
        {
            "type": "Patient",
            "url": "https://s3.us-east.cloud-object-storage.appdomain.cloud/fhir-performance/AZ0gsQS05_RqZnHPhj57AfhYSIHU8VzwmnWjDCQdi2I/Patient_1.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=fc85bf9cc1ac49e99e40085f9ba00f77%2F20210809%2Fus-east%2Fs3%2Faws4_request&X-Amz-Date=20210809T003601Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=6d54f677b91d92304caf889eb0a1efbc2b3ebe3d24cefd9c17169b21816d1cdf",
            "count": 1
        }
    ]
}
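While the job is still running, the status endpoint responds with 202 rather than 200, so one way to watch just the HTTP status code (a sketch using curl's -w option) is:
curl -s -o /dev/null -w "%{http_code}\n" -k \
--header 'X-FHIR-TENANT-ID: default' \
--user "fhiruser:${DUMMY_PASSWORD}" \
'https://localhost:9443/fhir-server/api/v4/$bulkdata-status?job=LqzauvqtHSmkpChVHo%2B1MQ'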
  12. You can access the files via COS (one way to list the objects from the command line is sketched below the screenshots), for example: cos://us-east/fhir-performance/AZ0gsQS05_RqZnHPhj57AfhYSIHU8VzwmnWjDCQdi2I/Patient_1.parquet/part-00000-dba6ec99-7fdb-4674-a202-0452d4435d18-c000-attempt_202108090034065817435166928016302_0003_m_000000_3.snappy.parquet
(Screenshot: List of the Parquet Files)
(Screenshot: View of the Parquet File)
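One way to browse these objects from the command line (a sketch, assuming the AWS CLI is configured with the same HMAC access key and secret used in the storage provider configuration above) is to point it at the COS endpoint:
aws --endpoint-url https://s3.us-east.cloud-object-storage.appdomain.cloud \
  s3 ls s3://fhir-performance/AZ0gsQS05_RqZnHPhj57AfhYSIHU8VzwmnWjDCQdi2I/ --recursive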
