Originally posted to https://community.ibm.com/community/user/blogs/paul-bastide/2026/03/20/docling-with-ibm-power
If you’ve been following the rapid evolution of document parsing in AI, you’ve likely encountered Docling. It’s a powerhouse for converting complex PDFs and documents into machine-readable formats. The AI Services team and the IBM Power Python Ecosystem team have made all of Docling’s requirements available on IBM Power, so you can use Docling today and stay up to date as it iterates rapidly.
For Python developers using IBM Power, this article provides a recipe for running Docling. You can learn more about using the Python ecosystem on IBM Power at https://community.ibm.com/community/user/blogs/janani-janakiraman/2025/09/10/developing-apps-using-python-packages-on-ibm-power
The Recipe: Step-by-Step Installation
This guide assumes you are working in a Linux environment on the ppc64le architecture (IBM Power), though the logic holds for most setups.
1. Prepare Your Environment
Start by creating a fresh Python 3.12 virtual environment to avoid dependency conflicts:
python3.12 -m venv ./test-venv
source ./test-venv/bin/activate
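Before installing anything, it can help to confirm that your shell is really using the environment you just created. The script below is a hypothetical pre-flight check (check_env.py, environment_report, and problems are illustrative names, not part of Docling or IBM tooling); it assumes the ppc64le and Python 3.12 targets used throughout this recipe and only reports problems rather than fixing them.

```python
# check_env.py: hypothetical pre-flight check, not part of Docling.
# Assumes the recipe's targets: ppc64le hardware and Python 3.12.
import platform
import sys


def environment_report() -> dict:
    """Collect the facts this recipe depends on."""
    return {
        # The IBM wheel index used later serves ppc64le wheels.
        "machine": platform.machine(),
        # A plain tuple such as (3, 12).
        "python": tuple(sys.version_info[:2]),
        # Inside a venv, sys.prefix points at the venv, not the base install.
        "in_venv": sys.prefix != sys.base_prefix,
    }


def problems(report: dict) -> list:
    """Return warnings; an empty list means every check passed."""
    issues = []
    if report["machine"] != "ppc64le":
        issues.append(f"expected ppc64le, found {report['machine']}")
    if report["python"] != (3, 12):
        issues.append(f"expected Python 3.12, found {report['python']}")
    if not report["in_venv"]:
        issues.append("no virtual environment is active")
    return issues


if __name__ == "__main__":
    for issue in problems(environment_report()) or ["environment looks good"]:
        print(issue)
```

Run it with python check_env.py after activating the environment; any line other than "environment looks good" points at a step to revisit.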
2. Define the Requirements
The AI Services team has identified a specific “golden set” of versions that play well together. Create a requirements.txt file containing the necessary packages, including docling, torch, and transformers.
accelerate==1.13.0
annotated-doc==0.0.4
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
attrs==26.1.0
beautifulsoup4==4.14.3
certifi==2026.2.25
charset-normalizer==3.4.6
click==8.3.1
colorlog==6.10.1
defusedxml==0.7.1
dill==0.4.1
docling==2.77.0
docling-core==2.70.2
docling-ibm-models==3.12.0
docling-parse==5.3.2
et_xmlfile==2.0.0
Faker==40.11.0
filelock==3.25.2
filetype==1.2.0
fsspec==2026.2.0
huggingface_hub==0.36.2
idna==3.11
Jinja2==3.1.6
jsonlines==4.0.0
jsonref==1.1.0
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
latex2mathml==3.79.0
lxml==6.0.2
markdown-it-py==4.0.0
marko==2.2.2
MarkupSafe==3.0.3
mdurl==0.1.2
mpire==2.10.2
mpmath==1.3.0
multiprocess==0.70.19
networkx==3.6.1
numpy==2.4.1
omegaconf==2.3.0
opencv-python==4.10.0.84+ppc64le2
openpyxl==3.1.5
packaging==26.0
pandas==2.3.3
pillow==12.1.1
pip==26.0.1
pluggy==1.6.0
polyfactory==3.3.0
psutil==7.2.2
pyclipper==1.4.0
pydantic==2.12.5
pydantic_core==2.41.5
pydantic-settings==2.13.1
Pygments==2.19.2
pylatexenc==2.10
pypdfium2==5.6.0
python-dateutil==2.9.0.post0
python-docx==1.2.0
python-dotenv==1.2.2
python-pptx==1.0.2
pytz==2026.1.post1
PyYAML==6.0.3
rapidocr==3.7.0
referencing==0.37.0
regex==2026.2.28
requests==2.32.5
rich==14.3.3
rpds-py==0.30.0
rtree==1.4.1
safetensors==0.7.0
scipy==1.17.0
semchunk==3.2.5
shapely==2.1.2
shellingham==1.5.4
six==1.17.0
soupsieve==2.8.3
sympy==1.14.0
tabulate==0.10.0
tokenizers==0.22.2
torch==2.9.1
torchvision==0.24.1
tqdm==4.67.3
transformers==4.57.6
tree-sitter==0.25.2
tree-sitter-c==0.24.1
tree-sitter-javascript==0.25.0
tree-sitter-python==0.25.0
tree-sitter-typescript==0.23.2
typer==0.21.2
typing_extensions==4.15.0
typing-inspection==0.4.2
tzdata==2025.3
urllib3==2.6.3
xlsxwriter==3.2.9
Note: Include the full list of pinned dependencies (for example, docling==2.77.0 and docling-core==2.70.2), not just the top-level packages, to keep your builds stable.
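Because the note above relies on every pin being honored, a small check can compare requirements.txt with what actually ends up installed once you have run the install in step 3. This is a sketch under stated assumptions: verify_pins.py, parse_pins, and mismatches are hypothetical names, only simple name==version lines are handled, and versions are compared as exact strings rather than normalized per PEP 440.

```python
# verify_pins.py: hypothetical helper, not part of Docling or pip.
# Compares name==version pins against installed distributions.
from importlib import metadata
from typing import Callable, Dict, List


def parse_pins(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def mismatches(pins: Dict[str, str],
               lookup: Callable[[str], str] = metadata.version) -> List[str]:
    """List packages that are missing or installed at a different version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        # Exact string comparison, so local labels like +ppc64le2 must match too.
        if found != wanted:
            problems.append(f"{name}: installed {found}, want {wanted}")
    return problems
```

From the activated environment, pass the contents of requirements.txt to parse_pins and hand the result to mismatches; an empty list means every pin matched.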
If you need OCR, first install these system packages (this requires root privileges, and the EPEL release package below targets RHEL 9):
yum install -y --setopt=tsflags=nodocs python3.12-devel python3.12-pip \
lcms2-devel openblas-devel freetype libicu libjpeg-turbo && \
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
yum install -y spatialindex-devel
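After the yum step, you can ask the dynamic linker whether those libraries are visible. The sketch below uses the standard library's ctypes.util.find_library; the mapping from yum package to library name is an assumption (for example, that spatialindex-devel provides libspatialindex_c), and a library being found does not by itself prove that the OCR stack will load.

```python
# check_libs.py: hypothetical check, not part of Docling.
# Asks the dynamic linker whether the libraries from the yum step are visible.
from ctypes.util import find_library
from typing import Dict, Optional

# Assumed mapping: yum package -> name find_library expects
# (find_library("jpeg") searches for libjpeg.so*, and so on).
LIBRARIES = {
    "libjpeg-turbo": "jpeg",
    "lcms2-devel": "lcms2",
    "freetype": "freetype",
    "openblas-devel": "openblas",
    "libicu": "icuuc",
    "spatialindex-devel": "spatialindex_c",
}


def locate(libraries: Dict[str, str] = LIBRARIES) -> Dict[str, Optional[str]]:
    """Return the resolved library name for each package, or None if missing."""
    return {pkg: find_library(lib) for pkg, lib in libraries.items()}


if __name__ == "__main__":
    for pkg, soname in locate().items():
        print(f"{pkg:20} {soname or 'NOT FOUND'}")
```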
3. The Installation Secret Sauce
Before running the install, upgrade pip to its latest version. Then use the --extra-index-url flag to point pip at the IBM developer wheel index for ppc64le, and --prefer-binary so that pip picks prebuilt wheels over source distributions. This is the trick to avoiding long source builds on Power.
pip install --upgrade pip
pip install -r requirements.txt \
--extra-index-url=https://wheels.developerfirst.ibm.com/ppc64le/linux \
--prefer-binary
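If you prefer to script the install, the hypothetical helper below (install_reqs.py, pip_commands, and run_all are illustrative names, not part of pip or Docling) builds the same two commands shown above. Invoking pip as sys.executable -m pip ties the install to whichever interpreter runs the script, so launching it from the activated venv installs into that venv.

```python
# install_reqs.py: hypothetical wrapper, not part of pip or Docling.
import subprocess
import sys
from typing import List

# The IBM ppc64le wheel index used in the pip command above.
IBM_INDEX = "https://wheels.developerfirst.ibm.com/ppc64le/linux"


def pip_commands(requirements: str = "requirements.txt",
                 index: str = IBM_INDEX) -> List[List[str]]:
    """Build argument lists for 'upgrade pip' and 'install requirements'."""
    # sys.executable -m pip uses the interpreter running this script,
    # i.e. the active venv's python when launched from that venv.
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["install", "--upgrade", "pip"],
        pip + ["install", "-r", requirements,
               f"--extra-index-url={index}", "--prefer-binary"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(pip_commands()) from the activated environment performs both steps and stops at the first command that fails.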
Verifying the Build
Once the installation completes, it’s a good idea to run a “smoke test” to ensure the models can be fetched properly. You can use a simple script to trigger the model downloads:
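# download_docling_models.py
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

# The constructor requires pipeline options; use the pipeline's own defaults.
# Building the pipeline triggers the download of the Layout & TableFormer models.
pipeline = StandardPdfPipeline(StandardPdfPipeline.get_default_options())
print("Download complete.")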
When you see the output Downloading ds4sd--docling-models (Layout & TableFormer)..., you’re officially ready to start parsing.
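With the models in place, a first real conversion makes a good end-to-end test. This sketch uses Docling's DocumentConverter with default options and export_to_markdown; the file name sample.pdf and the helper names markdown_path and convert_to_markdown are hypothetical, and the Docling import is deferred into the function so the module can be imported even where Docling is not installed.

```python
# convert_sample.py: minimal first conversion (helper names are hypothetical).
from pathlib import Path


def markdown_path(pdf: Path) -> Path:
    """Place the Markdown next to the input: sample.pdf -> sample.md."""
    return pdf.with_suffix(".md")


def convert_to_markdown(pdf: Path) -> Path:
    """Convert one PDF with Docling's default pipeline and save it as Markdown."""
    # Deferred import: only this function needs Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf)
    out = markdown_path(pdf)
    out.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return out
```

For example, convert_to_markdown(Path("sample.pdf")) writes sample.md next to the input and returns its path. The first call may take a while as the pipeline loads its models.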
Why This Matters
By focusing on Docling’s dependencies rather than on a Docling wheel itself, the AI Services team has given us a way to stay agile: we get the latest Docling features without waiting for official distribution builds to catch up with the project’s pace of development.
Special credit to Yussuf and his test!