DFDP2 – DICOM to RDF Processing and Visualization Demo

This project is a self-contained web application that demonstrates a complete pipeline for:

  • Processing DICOM files
  • Extracting key metadata
  • Mapping it to semantic ontologies (DCAT, DCTERMS, ROO, SNOMED CT, FOAF, DICOM)
  • Generating an in-memory RDF knowledge graph
  • Providing a web-based interface for dataset discovery, SPARQL querying, and graph visualization

Built with Python, the app uses:

  • FastAPI for the web server
  • pydicom for handling DICOM files
  • rdflib for RDF generation and SPARQL querying
  • D3.js for frontend graph visualization

Features

| Feature | Description |
| --- | --- |
| DICOM Processing | Extracts metadata from DICOM files sourced from TCIA |
| Metadata Extraction | Extracts Patient ID, Study Date, Modality, Accession Number, etc. |
| Semantic Mapping | Maps values to ROO, SNOMED CT, FOAF, and DICOM ontologies |
| RDF Generation | Builds a hierarchical knowledge graph (Catalog → Study (with Patient) → Series) |
| SPARQL Endpoint | Supports SPARQL 1.1 queries via a web form |
| Metadata Catalog | Web interface styled after FAIR Data Platforms (Health DCAT-AP) |
| Knowledge Graph Visualization | In-browser force-directed graph using D3.js |
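
To make the metadata extraction step concrete, here is a minimal sketch of how a handful of tags could be pulled from a DICOM header. It is not the code in fetch_dicom.py: the function names and the keyword list are assumptions based on the feature list above. `extract_metadata` only relies on a `.get(keyword)` method, which pydicom datasets provide, so the demo at the bottom uses a plain dict as a stand-in for a parsed header.

```python
import json

# Tag keywords taken from the feature list above; the real script may read more.
KEYWORDS = ["PatientID", "StudyDate", "Modality", "AccessionNumber"]


def extract_metadata(dataset, keywords=KEYWORDS):
    """Collect the requested tags from a pydicom Dataset (or any object with .get)."""
    record = {}
    for keyword in keywords:
        value = dataset.get(keyword)
        # Skip tags that are absent or empty so the JSON stays compact.
        if value is not None and str(value) != "":
            record[keyword] = str(value)  # str() flattens pydicom value types
    return record


def read_header(path):
    """Read one DICOM file without its pixel data (requires pydicom)."""
    import pydicom  # imported lazily so extract_metadata has no hard dependency

    return pydicom.dcmread(path, stop_before_pixels=True)


# Stand-in for a parsed header; a real run would pass read_header(path) instead.
sample = {
    "PatientID": "P001",
    "StudyDate": "20200101",
    "Modality": "CT",
    "AccessionNumber": "",
}
print(json.dumps(extract_metadata(sample), indent=2))
```

Reading with `stop_before_pixels=True` keeps the step fast, since only header tags are needed for the catalog.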

Application Workflow

Data Pipeline

flowchart TD
    subgraph Preprocessing["Preprocessing — run once"]
        DCM[("dicom_files/\nDICOM Files")]
        FD["fetch_dicom.py\nextract & filter tags"]
        JSON[("dicom_metadata.json")]
        MAP["map_dicom_complete.py\nmap to RDF ontologies"]
        TTL[("dicom_mapped_with_catalog.ttl\nTurtle RDF graph")]
    end

    subgraph Server["FastAPI Server — main.py"]
        RDF_G[("In-Memory\nRDF Graph\nrdflib")]

        subgraph API["REST Endpoints"]
            E1["GET /api/catalog\nSPARQL → JSON"]
            E2["GET /api/visualize\nSPARQL → JSON"]
            E3["POST /sparql\nuser query → JSON"]
            E4["GET /rdf/{catalog}\nTurtle subgraph download"]
        end
    end

    subgraph Web["Web Interface — Jinja2 + Tailwind CSS"]
        W1["Catalog Page\n/catalog"]
        W2["Visualize Page\n/visualize — D3.js"]
        W3["SPARQL Page\n/sparql"]
    end

    DCM --> FD --> JSON --> MAP --> TTL
    TTL -->|"startup: g.parse()"| RDF_G
    RDF_G -->|SPARQL| E1 --> W1
    RDF_G -->|SPARQL| E2 --> W2
    RDF_G -->|SPARQL| E3 --> W3
    RDF_G --> E4

RDF Knowledge Graph Model

erDiagram
    Catalog {
        string title
        string publisher
        date   issued
        string language
    }
    Study["Dataset (Study)"] {
        string studyInstanceUID
        string title
        date   studyDate
        string accessionNumber
        string description
    }
    Patient {
        string patientID
        string name
        string sex
        string age
        string patientHistory
    }
    Series["Distribution (Series)"] {
        string seriesInstanceUID
        string modality
        string bodyPartExamined
        date   seriesDate
        string seriesDescription
        string protocolName
    }

    Catalog       ||--o{ Study   : "dcat:dataset"
    Study         }o--||  Patient : "dcterms:subject"
    Study         ||--o{ Series  : "dcat:distribution"
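
The relationships in the diagram can be written out as plain (subject, predicate, object) tuples. This is a dependency-free sketch, not the mapping in map_dicom_complete.py: the base URI, the URI layout, and the function name `study_triples` are assumptions, while the predicates are the ones used in the diagram and in the SPARQL examples below. With rdflib, each tuple would become a `graph.add(...)` call using `URIRef` and `Literal` terms.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"
DICOM = "http://dicom.nema.org/resources/ontology/DCM#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/dfdp2/"  # hypothetical base URI for minted resources


def study_triples(catalog_id, record):
    """Build the Catalog -> Study -> Series links (plus Patient) for one record."""
    catalog = BASE + "catalog/" + catalog_id
    study = BASE + "study/" + record["StudyInstanceUID"]
    patient = BASE + "patient/" + record["PatientID"]
    series = BASE + "series/" + record["SeriesInstanceUID"]
    return [
        (catalog, RDF_TYPE, DCAT + "Catalog"),
        (catalog, DCAT + "dataset", study),
        (study, RDF_TYPE, DCAT + "Dataset"),
        (study, DCTERMS + "subject", patient),
        (study, DCAT + "distribution", series),
        # The objects of the last three triples are literal values, not URIs.
        (patient, DICOM + "PatientID", record["PatientID"]),
        (series, DICOM + "SeriesInstanceUID", record["SeriesInstanceUID"]),
        (series, DICOM + "Modality", record["Modality"]),
    ]


record = {
    "StudyInstanceUID": "1.2.3",
    "SeriesInstanceUID": "1.2.3.4",
    "PatientID": "P001",
    "Modality": "CT",
}
triples = study_triples("Catalog1", record)
for triple in triples:
    print(triple)
```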

Installation

Prerequisites

  • Python 3.10+
  • uv — modern Python package manager

Setup with uv

# Clone the repository
git clone <repo-url>
cd DFDP2

# Create virtual environment and install all dependencies
uv sync

Open in GitHub Codespaces (Recommended)

Open in GitHub Codespaces


Running the Application

# Step 1 — Extract metadata from DICOM files
uv run python fetch_dicom.py          # → dicom_metadata.json

# Step 2 — Map metadata to RDF/Turtle
uv run python map_dicom_complete.py   # → dicom_mapped_with_catalog.ttl

# Step 3 — Start the web server
uv run uvicorn main:app --reload

Visit: http://127.0.0.1:8000

Port conflict fix:

lsof -i :8000   # find the PID next to uvicorn
kill -9 <PID>
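
Where `lsof` is not available, a short Python check can tell whether something is already listening on the port. This is a generic sketch using only the standard library; the helper name `port_in_use` is made up for this example.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


if port_in_use(8000):
    print("Port 8000 is busy; stop the other process or pick another port.")
else:
    print("Port 8000 is free.")
```

If the port is busy and the other process should keep running, uvicorn can be started on a different port instead, for example `uv run uvicorn main:app --reload --port 8001`.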

Web Interface

| Page | URL | Description |
| --- | --- | --- |
| Home | / | Landing page |
| Catalog | /catalog | Browse processed DICOM datasets |
| SPARQL | /sparql | Query the RDF graph |
| Visualize | /visualize | Interactive force-directed graph |

SPARQL Query Examples

The graph follows a three-level hierarchy: Catalog → Study (+ Patient) → Series.

List patients, study titles and modalities:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo:     <http://www.cancerdata.org/roo/>

SELECT ?patientID ?studyTitle ?modality WHERE {
  ?catalog a dcat:Catalog .
  ?catalog dcat:dataset ?study .
  ?study dcterms:title ?studyTitle ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  OPTIONAL { ?series dicom:Modality ?modality . }
}

Find all CT series with body part examined:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>

SELECT ?patientID ?seriesUID ?bodyPart WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  ?series  dicom:SeriesInstanceUID ?seriesUID ;
           dicom:Modality "CT" ;
           dicom:BodyPartExamined ?bodyPart .
}
ORDER BY ?patientID

Count series per patient:

PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>

SELECT ?patientID (COUNT(?series) AS ?numSeries) WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
}
GROUP BY ?patientID
ORDER BY DESC(?numSeries)

Query patient demographics:

PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo:     <http://www.cancerdata.org/roo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>

SELECT DISTINCT ?patientID ?age ?sex ?reasonForStudy WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient .
  ?patient dicom:PatientID ?patientID .
  OPTIONAL { ?patient roo:hasAge ?age . }
  OPTIONAL { ?patient roo:hasSex ?sex . }
  OPTIONAL { ?study roo:hasReasonForStudy ?reasonForStudy . }
}

Find all series from a specific manufacturer:

PREFIX dicom:   <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?patientID ?manufacturer ?modelName WHERE {
  ?study a dcat:Dataset ;
         dcterms:subject ?patient ;
         dcat:distribution ?series .
  ?patient dicom:PatientID ?patientID .
  ?series  dicom:Manufacturer ?manufacturer ;
           dicom:ManufacturerModelName ?modelName .
  FILTER(?manufacturer = "GE MEDICAL SYSTEMS")
}
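
Any of the queries above can also be sent to the running server from a script rather than the web form. The sketch below builds a POST request for `/sparql` with the standard library. The form field name `query`, the form-encoded body, and the helper names are assumptions, since the exact request shape depends on main.py; adjust them to match the endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?study WHERE { ?study a dcat:Dataset . }
"""


def build_sparql_request(query, base_url="http://127.0.0.1:8000"):
    """Prepare a form-encoded POST to /sparql ("query" field name is assumed)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        base_url + "/sparql",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def run_query(query):
    """Send the query to a running server and decode the JSON response."""
    with urlopen(build_sparql_request(query)) as response:
        return json.load(response)


request = build_sparql_request(QUERY)
print(request.get_method(), request.full_url)
# With the server running: print(run_query(QUERY))
```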

Ontologies Used

| Prefix | URI |
| --- | --- |
| dcat | http://www.w3.org/ns/dcat# |
| dcterms | http://purl.org/dc/terms/ |
| dicom | http://dicom.nema.org/resources/ontology/DCM# |
| foaf | http://xmlns.com/foaf/0.1/ |
| roo | http://www.cancerdata.org/roo/ |
| snomed | http://snomed.info/sct/ |

Directory Structure

.
├── pyproject.toml          # Project metadata and dependencies (uv)
├── main.py                 # FastAPI application and SPARQL endpoints
├── fetch_dicom.py          # DICOM metadata extraction
├── map_dicom_complete.py   # DICOM JSON → RDF/Turtle mapping
├── dicom_files/            # Sample DICOM files (Catalog1, Catalog2)
├── templates/              # Jinja2 HTML templates
└── static/                 # CSS assets

Visualization

At /visualize, you'll find a D3.js-based force-directed graph of the RDF data:

  • Nodes are color-coded by type (Catalog, Study, Patient, Series, Modality, Body Part)
  • Drag nodes to explore relationships
  • Hover over nodes to view URIs and labels
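
A force-directed layout in D3.js conventionally takes a list of nodes and a list of links that refer to nodes by id. The sketch below shows one way triples could be reshaped into that form on the server side. The JSON actually returned by `/api/visualize` is not documented here, so the field names (`id`, `group`, `source`, `target`, `label`) and the function name `to_d3_graph` are assumptions.

```python
import json


def to_d3_graph(triples, node_types=None):
    """Turn (subject, predicate, object) tuples into nodes and links for D3."""
    node_types = node_types or {}
    nodes = {}
    links = []
    for subject, predicate, obj in triples:
        for node_id in (subject, obj):
            # One node per distinct term; "group" can drive the color scale.
            nodes.setdefault(
                node_id,
                {"id": node_id, "group": node_types.get(node_id, "Other")},
            )
        links.append({"source": subject, "target": obj, "label": predicate})
    return {"nodes": list(nodes.values()), "links": links}


# Prefixed names stand in for full URIs to keep the example short.
triples = [
    ("ex:catalog1", "dcat:dataset", "ex:study1"),
    ("ex:study1", "dcterms:subject", "ex:patient1"),
    ("ex:study1", "dcat:distribution", "ex:series1"),
]
types = {
    "ex:catalog1": "Catalog",
    "ex:study1": "Study",
    "ex:patient1": "Patient",
    "ex:series1": "Series",
}
d3_graph = to_d3_graph(triples, types)
print(json.dumps(d3_graph, indent=2))
```

On the browser side, `d3.forceLink(links).id(d => d.id)` resolves the `source` and `target` strings to node objects.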

To-Do / Ideas for Future

  • Persistent RDF store (e.g., Blazegraph, Apache Jena Fuseki)
  • Support for real-world DICOM tags and vocabularies
  • Authentication for upload and SPARQL features
  • Multi-user catalog and permission system

License

MIT License — Free to use, modify, and distribute with proper attribution.
