diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
index f307890146..333c746d16 100644
--- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
+++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
@@ -1 +1,273 @@
-{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Train arcgis.learn models on multiple GPUs"]}, {"cell_type": "markdown", "metadata": {}, "source": ["
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["
Accelerating AI with GPUs"]}, {"cell_type": "markdown", "metadata": {}, "source": ["In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend, support training across multiple GPUs.\n", "You can verify how many GPUs are available to PyTorch by runing the following commands: \n", "\n", "`import torch \n", "print ('Available devices ', torch.cuda.device_count())\n", "`\n", "\n", "PyTorch provides capabilities to utilize multiple GPUs in two ways:\n", "- Data Parallelism\n", "- Model Parallelism\n", "\n", "`arcgis.learn` uses one of the two ways to train models using multiple GPUs.\n", "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add the capability of training the model on multiple GPUs.\n", "\n", "**Data Parallelism**: Data Parallelism is when we split the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs.\n", "\n", "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a Module in `DataParallel` and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n", "\n", "For certain models, `arcgis.learn` models already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training model on a single machine. 
Below is a detailed list of the models that have `DataParallel` support.\n", "\n", "- `FeatureClassifier`\n", "- `MaskRCNN`\n", "- `MultiTaskRoadExtractor`\n", "- `ConnectNet`\n", "- `PointCNN`\n", "- `SingleShotDetetor`\n", "- `UnetClassifier`\n", "\n", "You can set a subset of GPUs to be used for training your model by running the following command in the first cell:\n", "\n", "`\n", "import os\n", "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # for setting first 2 GPUs\n", "`"]}, {"cell_type": "markdown", "metadata": {}, "source": ["**Model parallelism**: is when we split a model between multiple devices or nodes (such as GPU-equipped instances) for creating an efficient training pipeline to maximize GPU utilization. Model parallelism is implemented using `torch.nn.DistributedDataParallel`. \n", "\n", "This approach parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged. To read more, [click here](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).\n", "Currently, the following models available in arcgis.learn support `DistributedDataParallel`: \n", "\n", "- `UnetClassifier`\n", "- `DeepLab`\n", "- `PSPNetClassifier`\n", "- `MaskRCNN`\n", "- `FasterRCNN`\n", "- `HEDEdgeDetector`\n", "- `BDCNEdgeDetector`\n", "- `ModelExtension`"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The following blocks of code download a script and training data that can be used to test the functionality of multi-gpu support, given a user has multiple GPUs. 
"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["from arcgis.gis import GIS"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["gis = GIS()"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": ["script_item = gis.content.get('afd1c9a88a6f4f04896b4172c0f3a78c')"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/plain": ["'train_model.zip'"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["script_item.name"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": ["filepath = script_item.download(file_name=script_item.name)"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": ["import zipfile\n", "import os\n", "from pathlib import Path\n", "with zipfile.ZipFile(filepath, 'r') as zip_ref:\n", " zip_ref.extractall(Path(filepath).parent)"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": ["script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + '.py')"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["WindowsPath('C:/Users/Admin/AppData/Local/Temp/train_model.py')"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["script_path"]}, {"cell_type": "markdown", "metadata": {}, "source": ["To run multiple GPUs while training your model, you can download the script from above and/or create one for your own training data. 
Execute the command shown below in your command prompt:\n", "\n", "`python -m torch.distributed.launch --nproc_per_node=2 train-model.py`\n", "\n", "`nproc_per_node` = number of GPU instances per machine."]}, {"cell_type": "markdown", "metadata": {}, "source": ["For detailed arguments of (Distributed data Parallel)DDP, please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). "]}, {"cell_type": "markdown", "metadata": {}, "source": ["
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["To verify that your GPUs are utilized for training, run `nvidia-smi` as shown below:"]}, {"cell_type": "markdown", "metadata": {}, "source": ["
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### References:"]}, {"cell_type": "markdown", "metadata": {}, "source": ["- https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/\n", "- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html\n", "- https://fastai1.fast.ai/distributed.html\n", "- https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py\n", "- https://towardsdatascience.com/how-to-scale-training-on-multiple-gpus-dae1041f49d2"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9"}}, "nbformat": 4, "nbformat_minor": 4}
\ No newline at end of file
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Train arcgis.learn models on multiple GPUs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Accelerating AI with GPUs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this guide, we walk you through how `arcgis.learn` models with a PyTorch backend support training across multiple GPUs.\n",
+ "You can verify how many GPUs are available to PyTorch by running the following command: \n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "print('Available devices:', torch.cuda.device_count())\n",
+ "```\n",
+ "\n",
+ "PyTorch provides the capability to utilize multiple GPUs in two ways:\n",
+ "- Data Parallelism\n",
+ "- Distributed Data Parallelism\n",
+ "\n",
+ "`arcgis.learn` uses one of these two methods to train models using multiple GPUs.\n",
+ "Each method has its own significance and both offer an easy way to wrap your code to enable training on multiple GPUs.\n",
+ "\n",
+ "**Data Parallelism**: Data Parallelism refers to splitting the mini-batch of samples into multiple smaller mini-batches and running the computation for each of the smaller mini-batches in parallel across multiple GPUs on a single machine.\n",
+ "\n",
+ "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a module in `DataParallel`, and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n",
+ "\n",
+ "For certain models, `arcgis.learn` already provides data parallelism support to enhance performance. This makes it easy to utilize multiple GPUs while training a model on a single machine. Below is a list of the models that have `DataParallel` support.\n",
+ "\n",
+ "- `FeatureClassifier`\n",
+ "- `SingleShotDetector`\n",
+ "\n",
+ "You can set a subset of GPUs to be used for training your model by running the following command in the first cell:\n",
+ "\n",
+ "```python\n",
+ "import os\n",
+ "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # use only the first two GPUs\n",
+ "```"
+ ]
+ },
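+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To illustrate the mechanics, below is a minimal sketch of how `torch.nn.DataParallel` wraps an arbitrary module. The two-layer network is a hypothetical stand-in, not an `arcgis.learn` model:\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "\n",
+ "# a toy model; any nn.Module is wrapped the same way\n",
+ "model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))\n",
+ "\n",
+ "if torch.cuda.device_count() > 1:\n",
+ "    # replicate the module on each visible GPU; inputs are split\n",
+ "    # along the batch dimension and outputs gathered on device 0\n",
+ "    model = nn.DataParallel(model)\n",
+ "\n",
+ "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
+ "model = model.to(device)\n",
+ "out = model(torch.randn(8, 16).to(device))  # batch of 8 split across GPUs\n",
+ "```"
+ ]
+ },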
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Distributed Data Parallelism (DDP)**: In DDP, data is partitioned across multiple devices or nodes (such as GPU-equipped instances). It is implemented using `torch.nn.DistributedDataParallel`. Unlike `DataParallel`, DDP supports training across multiple machines, making it significantly more scalable. \n",
+ "\n",
+ "This approach parallelizes by chunking data along the batch dimension and distributing it across specified devices. Ideally, one process is spawned for each model replica; however, a single replica may span multiple devices if the model is large.\n",
+ "During the backward pass, DDP automatically synchronizes gradients across all processes to ensure consistent model updates.\n",
+ "To read more, [click here](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).\n",
+ "Currently, the following models in `arcgis.learn` support `DistributedDataParallel`: \n",
+ "\n",
+ "- `UnetClassifier`\n",
+ "- `DeepLab`\n",
+ "- `PSPNetClassifier`\n",
+ "- `MaskRCNN`\n",
+ "- `FasterRCNN`\n",
+ "- `HEDEdgeDetector`\n",
+ "- `BDCNEdgeDetector`\n",
+ "- `ModelExtension`\n",
+ "- `SuperResolution`\n",
+ "- `DETReg`\n",
+ "- `MaXDeepLab`\n",
+ "- `MMDetection`\n",
+ "- `MMSegmentation`\n",
+ "- `RTDetrV2`\n",
+ "- `SamLoRA`"
+ ]
+ },
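+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference, the core per-process pattern behind `DistributedDataParallel` looks roughly like the sketch below. This is a hypothetical snippet, not `arcgis.learn` code; it assumes one process per GPU and that the launcher sets the `LOCAL_RANK` environment variable (as `torchrun` does; with `torch.distributed.launch`, pass `--use_env` or read the `--local_rank` argument instead):\n",
+ "\n",
+ "```python\n",
+ "import os\n",
+ "import torch\n",
+ "import torch.distributed as dist\n",
+ "from torch.nn.parallel import DistributedDataParallel as DDP\n",
+ "\n",
+ "# every spawned process joins the same process group\n",
+ "dist.init_process_group(backend='nccl')\n",
+ "local_rank = int(os.environ['LOCAL_RANK'])\n",
+ "torch.cuda.set_device(local_rank)\n",
+ "\n",
+ "model = torch.nn.Linear(16, 2).to(local_rank)  # toy stand-in model\n",
+ "# gradients are averaged across all processes during backward()\n",
+ "model = DDP(model, device_ids=[local_rank])\n",
+ "```"
+ ]
+ },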
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following blocks of code download a sample script that you can use to test the functionality of multi-GPU support, provided you have multiple GPUs. \n",
+ "You can modify the script downloaded at `script_path` to include your model and dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from arcgis.gis import GIS"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "gis = GIS()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "script_item = gis.content.get(\"afd1c9a88a6f4f04896b4172c0f3a78c\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'train_model.zip'"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "script_item.name"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "filepath = script_item.download(file_name=script_item.name)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import zipfile\n",
+ "import os\n",
+ "from pathlib import Path\n",
+ "\n",
+ "with zipfile.ZipFile(filepath, \"r\") as zip_ref:\n",
+ " zip_ref.extractall(Path(filepath).parent)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + \".py\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "WindowsPath('C:/Users/Admin/AppData/Local/Temp/train_model.py')"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "script_path"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To train your model on multiple GPUs, you can use the script downloaded above or create one for your own training data. Execute the command shown below in your command prompt:\n",
+ "\n",
+ "`python -m torch.distributed.launch --nproc_per_node=2 train_model.py`\n",
+ "\n",
+ "`nproc_per_node` = the number of GPUs to use on the given machine. On recent PyTorch releases, `torchrun --nproc_per_node=2 train_model.py` is the recommended replacement for `torch.distributed.launch`."
+ ]
+ },
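+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough outline of what such a script contains (hypothetical; the data path, model choice, and hyperparameters are placeholders), the usual `arcgis.learn` workflow is all that is needed, since the library handles the distributed setup internally for the supported models listed above:\n",
+ "\n",
+ "```python\n",
+ "# train_model.py (illustrative outline)\n",
+ "from arcgis.learn import prepare_data, UnetClassifier\n",
+ "\n",
+ "data = prepare_data(r'path/to/exported/training/data', batch_size=8)\n",
+ "model = UnetClassifier(data)\n",
+ "model.fit(10)\n",
+ "model.save('unet_10_epochs')\n",
+ "```"
+ ]
+ },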
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For detailed arguments of DDP (Distributed Data Parallel), please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To verify that your GPUs are being utilized for training, run `nvidia-smi` as shown below:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*(nvidia-smi output showing per-GPU utilization during training)*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### References:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/\n",
+ "- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html\n",
+ "- https://fastai1.fast.ai/distributed.html\n",
+ "- https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "dl-embedding_test_01",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.13.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}