DFDP2 – DICOM to RDF Processing and Visualization Demo

This project is a self-contained web application that demonstrates a complete pipeline for:

Processing DICOM files
Extracting key metadata
Mapping it to semantic ontologies (DCAT, DCTERMS, ROO, SNOMED CT, FOAF, DICOM)
Generating an in-memory RDF knowledge graph
Providing a web-based interface for dataset discovery, SPARQL querying, and graph visualization

Built with Python, the app uses:

FastAPI for the web server
pydicom for handling DICOM files
rdflib for RDF generation and SPARQL querying
D3.js for frontend graph visualization

Features

Feature	Description
DICOM Processing	Reads DICOM files sourced from The Cancer Imaging Archive (TCIA)
Metadata Extraction	Extracts Patient ID, Study Date, Modality, Accession Number, etc.
Semantic Mapping	Maps values to ROO, SNOMED CT, FOAF, and DICOM ontologies
RDF Generation	Builds a hierarchical knowledge graph (Catalog → Study → Series, with each Study linked to its Patient)
SPARQL Endpoint	Supports SPARQL 1.1 queries via a web form
Metadata Catalog	Web interface styled after FAIR Data Platforms (Health DCAT-AP)
Knowledge Graph Visualization	In-browser force-directed graph using D3.js

Application Workflow

Data Pipeline

flowchart TD
    subgraph Preprocessing["Preprocessing — run once"]
        DCM[("dicom_files/\nDICOM Files")]
        FD["fetch_dicom.py\nextract & filter tags"]
        JSON[("dicom_metadata.json")]
        MAP["map_dicom_complete.py\nmap to RDF ontologies"]
        TTL[("dicom_mapped_with_catalog.ttl\nTurtle RDF graph")]
    end

    subgraph Server["FastAPI Server — main.py"]
        RDF_G[("In-Memory\nRDF Graph\nrdflib")]

        subgraph API["REST Endpoints"]
            E1["GET /api/catalog\nSPARQL → JSON"]
            E2["GET /api/visualize\nSPARQL → JSON"]
            E3["POST /sparql\nuser query → JSON"]
            E4["GET /rdf/{catalog}\nTurtle subgraph download"]
        end
    end

    subgraph Web["Web Interface — Jinja2 + Tailwind CSS"]
        W1["Catalog Page\n/catalog"]
        W2["Visualize Page\n/visualize — D3.js"]
        W3["SPARQL Page\n/sparql"]
    end

    DCM --> FD --> JSON --> MAP --> TTL
    TTL -->|"startup: g.parse()"| RDF_G
    RDF_G -->|SPARQL| E1 --> W1
    RDF_G -->|SPARQL| E2 --> W2
    RDF_G -->|SPARQL| E3 --> W3
    RDF_G --> E4

RDF Knowledge Graph Model

erDiagram
    Catalog {
        string title
        string publisher
        date   issued
        string language
    }
    Study["Dataset (Study)"] {
        string studyInstanceUID
        string title
        date   studyDate
        string accessionNumber
        string description
    }
    Patient {
        string patientID
        string name
        string sex
        string age
        string patientHistory
    }
    Series["Distribution (Series)"] {
        string seriesInstanceUID
        string modality
        string bodyPartExamined
        date   seriesDate
        string seriesDescription
        string protocolName
    }

    Catalog       ||--o{ Study   : "dcat:dataset"
    Study         }o--||  Patient : "dcterms:subject"
    Study         ||--o{ Series  : "dcat:distribution"

Installation

Prerequisites

Python 3.10+
uv — modern Python package manager

Setup with uv

# Clone the repository
git clone <repo-url>
cd DFDP2

# Create virtual environment and install all dependencies
uv sync

Running the Application

# Step 1 — Extract metadata from DICOM files
uv run python fetch_dicom.py          # → dicom_metadata.json

# Step 2 — Map metadata to RDF/Turtle
uv run python map_dicom_complete.py   # → dicom_mapped_with_catalog.ttl

# Step 3 — Start the web server
uv run uvicorn main:app --reload

Visit: http://127.0.0.1:8000

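If port 8000 is already in use, find and stop the process holding it:

lsof -i :8000   # find the PID of the process listening on port 8000
kill <PID>      # use kill -9 <PID> only if the process does not exit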

Web Interface

Page	URL	Description
Home	`/`	Landing page
Catalog	`/catalog`	Browse processed DICOM datasets
SPARQL	`/sparql`	Query the RDF graph
Visualize	`/visualize`	Interactive force-directed graph
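
Besides the web form, the SPARQL endpoint can in principle be called from a script. The sketch below only builds the request. It assumes the server accepts a URL-encoded `query` field on `POST /sparql`, as the SPARQL 1.1 Protocol does; whether this app uses that field name is an assumption that should be checked against `main.py`. The `build_sparql_request` helper is hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_sparql_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a form-encoded POST request for the /sparql endpoint."""
    # Assumed field name: "query" (not confirmed by this README).
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sparql",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")

# To send it while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```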

SPARQL Query Examples

The graph follows a three-level hierarchy: Catalog → Study (+ Patient) → Series.

List patients, study titles and modalities:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo:     <http://www.cancerdata.org/roo/>

SELECT ?patientID ?studyTitle ?modality WHERE {
  ?catalog a dcat:Catalog .
  ?catalog dcat:dataset ?study .
  ?study dcterms:title ?studyTitle ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  OPTIONAL { ?series dicom:Modality ?modality . }
}

Find all CT series with body part examined:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>

SELECT ?patientID ?seriesUID ?bodyPart WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  ?series  dicom:SeriesInstanceUID ?seriesUID ;
           dicom:Modality "CT" ;
           dicom:BodyPartExamined ?bodyPart .
}
ORDER BY ?patientID

Count series per patient:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>

SELECT ?patientID (COUNT(?series) AS ?numSeries) WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
}
GROUP BY ?patientID
ORDER BY DESC(?numSeries)

Query patient demographics:

PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo:     <http://www.cancerdata.org/roo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>

SELECT DISTINCT ?patientID ?age ?sex ?reasonForStudy WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient .
  ?patient dicom:PatientID ?patientID .
  OPTIONAL { ?patient roo:hasAge ?age . }
  OPTIONAL { ?patient roo:hasSex ?sex . }
  OPTIONAL { ?study roo:hasReasonForStudy ?reasonForStudy . }
}

Find all series from a specific manufacturer:

PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?patientID ?manufacturer ?modelName WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  ?series  dicom:Manufacturer ?manufacturer ;
           dicom:ManufacturerModelName ?modelName .
  FILTER(?manufacturer = "GE MEDICAL SYSTEMS")
}

Ontologies Used

Prefix	URI
`dcat`	http://www.w3.org/ns/dcat#
`dcterms`	http://purl.org/dc/terms/
`dicom`	http://dicom.nema.org/resources/ontology/DCM#
`foaf`	http://xmlns.com/foaf/0.1/
`roo`	http://www.cancerdata.org/roo/
`snomed`	http://snomed.info/sct/

Directory Structure

.
├── pyproject.toml          # Project metadata and dependencies (uv)
├── main.py                 # FastAPI application and SPARQL endpoints
├── fetch_dicom.py          # DICOM metadata extraction
├── map_dicom_complete.py   # DICOM JSON → RDF/Turtle mapping
├── dicom_files/            # Sample DICOM files (Catalog1, Catalog2)
├── templates/              # Jinja2 HTML templates
└── static/                 # CSS assets

Visualization

At /visualize, you'll find a D3.js-based force-directed graph of the RDF data:

Nodes are color-coded by type (Catalog, Study, Patient, Series, Modality, Body Part)
Drag nodes to explore relationships
Hover over nodes to view URIs and labels
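
A force-directed layout in D3.js is commonly fed a JSON object with `nodes` and `links` arrays. The sketch below shows one way to derive that shape from triples. The exact JSON returned by `/api/visualize` is not documented here, so the field names (`id`, `type`, `source`, `target`, `label`) and the `triples_to_d3` helper are assumptions.

```python
import json


def triples_to_d3(triples, node_types):
    """Convert (subject, predicate, object) strings to a nodes/links dict."""
    nodes = {}
    links = []
    for s, p, o in triples:
        for uri in (s, o):
            # The node type is what the page would use for color-coding.
            nodes.setdefault(uri, {"id": uri, "type": node_types.get(uri, "Unknown")})
        links.append({"source": s, "target": o, "label": p})
    return {"nodes": list(nodes.values()), "links": links}


triples = [
    ("ex:catalog1", "dcat:dataset", "ex:study1"),
    ("ex:study1", "dcat:distribution", "ex:series1"),
]
node_types = {"ex:catalog1": "Catalog", "ex:study1": "Study", "ex:series1": "Series"}

graph_json = triples_to_d3(triples, node_types)
print(json.dumps(graph_json, indent=2))
```

Each URI becomes exactly one node even when it appears in several triples, so `ex:study1` is listed once but participates in two links.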

To-Do / Ideas for Future

Persistent RDF store (e.g., Blazegraph, Apache Jena Fuseki)
Support for real-world DICOM tags and vocabularies
Authentication for upload and SPARQL features
Multi-user catalog and permission system

License

MIT License — Free to use, modify, and distribute with proper attribution.
