This project is a self-contained web application that demonstrates a complete pipeline for:
- Processing DICOM files
- Extracting key metadata
- Mapping it to semantic ontologies (DCAT, DCTERMS, ROO, SNOMED CT, FOAF, DICOM)
- Generating an in-memory RDF knowledge graph
- Providing a web-based interface for dataset discovery, SPARQL querying, and graph visualization
Built with Python, the app uses:
- FastAPI for the web server
- pydicom for handling DICOM files
- rdflib for RDF generation and SPARQL querying
- D3.js for frontend graph visualization
| Feature | Description |
|---|---|
| DICOM Processing | Extracts metadata from DICOM files sourced from TCIA |
| Metadata Extraction | Extracts Patient ID, Study Date, Modality, Accession Number, etc. |
| Semantic Mapping | Maps values to ROO, SNOMED CT, FOAF, and DICOM ontologies |
| RDF Generation | Builds a hierarchical knowledge graph (Patient → Study → Series) |
| SPARQL Endpoint | Supports SPARQL 1.1 queries via a web form |
| Metadata Catalog | Web interface styled after FAIR Data Platforms (Health DCAT-AP) |
| Knowledge Graph Visualization | In-browser force-directed graph using D3.js |
flowchart TD
subgraph Preprocessing["Preprocessing — run once"]
DCM[("dicom_files/\nDICOM Files")]
FD["fetch_dicom.py\nextract & filter tags"]
JSON[("dicom_metadata.json")]
MAP["map_dicom_complete.py\nmap to RDF ontologies"]
TTL[("dicom_mapped_with_catalog.ttl\nTurtle RDF graph")]
end
subgraph Server["FastAPI Server — main.py"]
RDF_G[("In-Memory\nRDF Graph\nrdflib")]
subgraph API["REST Endpoints"]
E1["GET /api/catalog\nSPARQL → JSON"]
E2["GET /api/visualize\nSPARQL → JSON"]
E3["POST /sparql\nuser query → JSON"]
E4["GET /rdf/{catalog}\nTurtle subgraph download"]
end
end
subgraph Web["Web Interface — Jinja2 + Tailwind CSS"]
W1["Catalog Page\n/catalog"]
W2["Visualize Page\n/visualize — D3.js"]
W3["SPARQL Page\n/sparql"]
end
DCM --> FD --> JSON --> MAP --> TTL
TTL -->|"startup: g.parse()"| RDF_G
RDF_G -->|SPARQL| E1 --> W1
RDF_G -->|SPARQL| E2 --> W2
RDF_G -->|SPARQL| E3 --> W3
RDF_G --> E4
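The "extract & filter tags" step in the preprocessing stage can be pictured as a whitelist filter applied to each file's header. The following is a minimal sketch, assuming the tags have already been read into a plain dict (in the project this presumably happens via pydicom). `KEEP_TAGS` and `filter_tags` are hypothetical names for illustration and are not taken from `fetch_dicom.py`.

```python
import json

# Hypothetical whitelist; the real fetch_dicom.py may keep a different set of tags.
KEEP_TAGS = {"PatientID", "StudyDate", "Modality", "AccessionNumber"}


def filter_tags(raw_tags: dict) -> dict:
    """Keep only whitelisted tags that have a non-empty value, as strings."""
    return {
        name: str(value)
        for name, value in raw_tags.items()
        if name in KEEP_TAGS and value not in (None, "")
    }


# Illustrative input standing in for one DICOM file's header.
raw = {
    "PatientID": "P001",
    "StudyDate": "20200101",
    "Modality": "CT",
    "AccessionNumber": "",          # empty, so it is dropped
    "PixelSpacing": [0.5, 0.5],     # not whitelisted, so it is dropped
}

record = filter_tags(raw)
print(json.dumps(record, indent=2, sort_keys=True))
```

Filtering before writing JSON keeps the intermediate file small and limits the mapping step to tags it knows how to handle.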
erDiagram
Catalog {
string title
string publisher
date issued
string language
}
Study["Dataset (Study)"] {
string studyInstanceUID
string title
date studyDate
string accessionNumber
string description
}
Patient {
string patientID
string name
string sex
string age
string patientHistory
}
Series["Distribution (Series)"] {
string seriesInstanceUID
string modality
string bodyPartExamined
date seriesDate
string seriesDescription
string protocolName
}
Catalog ||--o{ Study : "dcat:dataset"
Study }o--|| Patient : "dcterms:subject"
Study ||--o{ Series : "dcat:distribution"
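The entity-relationship diagram above maps naturally onto a few linked records. The sketch below uses dataclasses with a reduced set of fields. The class layout, the `to_triples` helper, and the `http://example.org/` base URI are assumptions made for illustration, not the project's actual mapping code.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    sex: str = ""


@dataclass
class Series:
    series_instance_uid: str
    modality: str = ""


@dataclass
class Study:
    study_instance_uid: str
    title: str
    patient: Patient
    series: list[Series] = field(default_factory=list)


@dataclass
class Catalog:
    title: str
    studies: list[Study] = field(default_factory=list)


BASE = "http://example.org/"  # placeholder base URI, not the project's


def to_triples(catalog: Catalog) -> list[tuple[str, str, str]]:
    """Emit the three relationships from the diagram as (subject, predicate, object)."""
    cat_uri = f"{BASE}catalog/{catalog.title}"
    triples = []
    for study in catalog.studies:
        study_uri = f"{BASE}study/{study.study_instance_uid}"
        patient_uri = f"{BASE}patient/{study.patient.patient_id}"
        triples.append((cat_uri, "dcat:dataset", study_uri))
        triples.append((study_uri, "dcterms:subject", patient_uri))
        for s in study.series:
            series_uri = f"{BASE}series/{s.series_instance_uid}"
            triples.append((study_uri, "dcat:distribution", series_uri))
    return triples


# One catalog, one study, one patient, two series.
study = Study("1.2.3", "Chest CT", Patient("P001", "F"),
              [Series("1.2.3.1", "CT"), Series("1.2.3.2", "CT")])
triples = to_triples(Catalog("Catalog1", [study]))
for t in triples:
    print(t)
```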
- Python 3.10+
- uv — modern Python package manager
# Clone the repository
git clone <repo-url>
cd DFDP2
# Create virtual environment and install all dependencies
uv sync

# Step 1: Extract metadata from DICOM files
uv run python fetch_dicom.py # → dicom_metadata.json
# Step 2: Map metadata to RDF/Turtle
uv run python map_dicom_complete.py # → dicom_mapped_with_catalog.ttl
# Step 3: Start the web server
uv run uvicorn main:app --reload

Visit: http://127.0.0.1:8000
Port conflict fix:
lsof -i :8000 # find the PID next to uvicorn
kill -9 <PID>

| Page | URL | Description |
|---|---|---|
| Home | / | Landing page |
| Catalog | /catalog | Browse processed DICOM datasets |
| SPARQL | /sparql | Query the RDF graph |
| Visualize | /visualize | Interactive force-directed graph |
The graph follows a three-level hierarchy: Catalog → Study (+ Patient) → Series.
List patients, study titles and modalities:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom: <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo: <http://www.cancerdata.org/roo/>
SELECT ?patientID ?studyTitle ?modality WHERE {
?catalog a dcat:Catalog .
?catalog dcat:dataset ?study .
?study dcterms:title ?studyTitle ;
dcterms:subject ?patient ;
dcat:distribution ?series .
?patient dicom:PatientID ?patientID .
OPTIONAL { ?series dicom:Modality ?modality . }
}

Find all CT series with body part examined:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom: <http://dicom.nema.org/resources/ontology/DCM#>
SELECT ?patientID ?seriesUID ?bodyPart WHERE {
?study a dcat:Dataset ;
dcterms:subject ?patient ;
dcat:distribution ?series .
?patient dicom:PatientID ?patientID .
?series dicom:SeriesInstanceUID ?seriesUID ;
dicom:Modality "CT" ;
dicom:BodyPartExamined ?bodyPart .
}
ORDER BY ?patientID

Count series per patient:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dicom: <http://dicom.nema.org/resources/ontology/DCM#>
SELECT ?patientID (COUNT(?series) AS ?numSeries) WHERE {
?study a dcat:Dataset ;
dcterms:subject ?patient ;
dcat:distribution ?series .
?patient dicom:PatientID ?patientID .
}
GROUP BY ?patientID
ORDER BY DESC(?numSeries)

Query patient demographics:
PREFIX dicom: <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX roo: <http://www.cancerdata.org/roo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?patientID ?age ?sex ?reasonForStudy WHERE {
?study a dcat:Dataset ;
dcterms:subject ?patient .
?patient dicom:PatientID ?patientID .
OPTIONAL { ?patient roo:hasAge ?age . }
OPTIONAL { ?patient roo:hasSex ?sex . }
OPTIONAL { ?study roo:hasReasonForStudy ?reasonForStudy . }
}

Find all series from a specific manufacturer:
PREFIX dicom: <http://dicom.nema.org/resources/ontology/DCM#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?patientID ?manufacturer ?modelName WHERE {
?study a dcat:Dataset ;
dcterms:subject ?patient ;
dcat:distribution ?series .
?patient dicom:PatientID ?patientID .
?series dicom:Manufacturer ?manufacturer ;
dicom:ManufacturerModelName ?modelName .
FILTER(?manufacturer = "GE MEDICAL SYSTEMS")
}

| Prefix | URI |
|---|---|
| dcat | http://www.w3.org/ns/dcat# |
| dcterms | http://purl.org/dc/terms/ |
| dicom | http://dicom.nema.org/resources/ontology/DCM# |
| foaf | http://xmlns.com/foaf/0.1/ |
| roo | http://www.cancerdata.org/roo/ |
| snomed | http://snomed.info/sct/ |
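Since every query repeats the same `PREFIX` lines, it can be convenient to generate them from the prefix table. The helper below is a sketch. `with_prefixes` is a hypothetical name, and the substring check it uses to detect prefixes is deliberately naive.

```python
# Prefix table from this README.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "dicom": "http://dicom.nema.org/resources/ontology/DCM#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "roo": "http://www.cancerdata.org/roo/",
    "snomed": "http://snomed.info/sct/",
}


def with_prefixes(body: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Prepend a PREFIX line for each prefix that appears in the query body.

    Detection is a plain substring match on "name:", which is enough for
    simple queries but is not a real SPARQL parser.
    """
    used = [name for name in prefixes if f"{name}:" in body]
    header = "\n".join(f"PREFIX {name}: <{prefixes[name]}>" for name in used)
    return f"{header}\n{body}" if header else body


query = with_prefixes(
    "SELECT ?study ?title WHERE { ?study a dcat:Dataset ; dcterms:title ?title . }"
)
print(query)
```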
.
├── pyproject.toml # Project metadata and dependencies (uv)
├── main.py # FastAPI application and SPARQL endpoints
├── fetch_dicom.py # DICOM metadata extraction
├── map_dicom_complete.py # DICOM JSON → RDF/Turtle mapping
├── dicom_files/ # Sample DICOM files (Catalog1, Catalog2)
├── templates/ # Jinja2 HTML templates
└── static/ # CSS assets
At /visualize, you'll find a D3.js-based force-directed graph of the RDF data:
- Nodes are color-coded by type (Catalog, Study, Patient, Series, Modality, Body Part)
- Drag nodes to explore relationships
- Hover over nodes to view URIs and labels
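A force-directed layout in D3.js typically consumes a list of nodes and a list of links. The sketch below shows one way to derive that structure from RDF-style triples. The payload shape and the `to_d3` helper are assumptions for illustration and may differ from what `/api/visualize` actually returns.

```python
import json


def to_d3(triples):
    """Convert (subject, predicate, object) tuples into D3 nodes and links."""
    seen = set()          # node ids already added
    nodes, links = [], []
    for s, p, o in triples:
        for node_id in (s, o):
            if node_id not in seen:
                seen.add(node_id)
                nodes.append({"id": node_id})
        # Links reference nodes by id; the predicate becomes the edge label.
        links.append({"source": s, "target": o, "label": p})
    return {"nodes": nodes, "links": links}


# Invented identifiers standing in for graph URIs.
graph = to_d3([
    ("catalog1", "dcat:dataset", "study1"),
    ("study1", "dcterms:subject", "patient1"),
    ("study1", "dcat:distribution", "series1"),
])
print(json.dumps(graph, indent=2))
```

Deduplicating nodes by id means a study that appears in several triples is drawn once, with one edge per relationship.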
- Persistent RDF store (e.g., Blazegraph, Apache Jena Fuseki)
- Support for real-world DICOM tags and vocabularies
- Authentication for upload and SPARQL features
- Multi-user catalog and permission system
MIT License — Free to use, modify, and distribute with proper attribution.