
Commit 951e61e

Add remote exec capability for foundation models missing it (#1968)
* add remote exec for foundation models missing it
* make style
* fix missing name in unit tests
* fix depth estimation endpoint to return proper base64
* allow passing model id explicitly in /infer/lmm endpoint
* fix image returned by depth estimation block on remote exec
1 parent 36a492e commit 951e61e

File tree: 23 files changed, +2000 −22 lines

docs/foundation/depth_estimation.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -7,6 +7,13 @@ You can use Depth-Anything-V2-Small to estimate the depth of objects in images,
 
 You can deploy Depth-Anything-V2-Small with Inference.
 
+### Execution Modes
+
+Depth Estimation supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended for faster inference)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `depth_estimation()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run Depth-Anything-V2-Small, run
```
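
For orientation, a minimal sketch of the remote path from the client side. The `depth_estimation()` call and its `inference_input`/`model_id` keywords mirror the workflow block change later in this commit; the server URL, API key, and model ID string are placeholders.

```python
# Hedged sketch: remote depth estimation via the SDK method referenced above.
# URL, API key, and model id are placeholders; adjust to your deployment.
import base64

import numpy as np
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # remote inference server
    api_key="<ROBOFLOW_API_KEY>",
)

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

result = client.depth_estimation(
    inference_input=image_b64,
    model_id="depth-anything-v2/small",  # assumed id; match your deployment
)

# Per this commit, "image" is a base64-encoded depth visualization and
# "normalized_depth" is a nested list of per-pixel values.
depth = np.array(result.get("normalized_depth", []))
print(depth.shape, len(result.get("image", "")))
```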

docs/foundation/florence2.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -18,6 +18,15 @@ You can use Inference for all the Florence-2 tasks above.
 
 The text in the parentheses are the task prompts you will need to use each task.
 
+### Execution Modes
+
+Florence-2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server
+
+When using Florence-2 in a workflow, you can specify the execution mode to control where inference happens.
+
 ### How to Use Florence-2
 
 ??? Note "Install `inference`"
```

docs/foundation/gaze.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can detect the direction in which someone is looking using the L2CS-Net model.
 
+## Execution Modes
+
+L2CS-Net gaze detection supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using `detect_gazes()` client method
+
 ## How to Use L2CS-Net
 
 To use L2CS-Net with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>. Then, retrieve your API key from the Roboflow dashboard. Run the following command to set your API key in your coding environment:
```
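
A hedged sketch of the remote path the added section describes, using the `detect_gazes()` client method; the server URL, API key, and image path are placeholders, and the response is printed as-is because its schema is not part of this diff.

```python
# Hedged sketch: remote gaze detection through the detect_gazes() client method
# named in the docs above. URL, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

# The response schema is not shown in this commit, so it is printed unchanged.
gazes = client.detect_gazes(inference_input="face.jpg")
print(gazes)
```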

docs/foundation/moondream2.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can deploy Moondream2 with Inference.
 
+### Execution Modes
+
+Moondream2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `infer_lmm()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run Moondream2, run
```
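
Moondream2 (and SmolVLM2 below) are reached through the `infer_lmm()` client method named in the added docs. A hedged sketch follows; the keyword names, model ID, and response shape are assumptions not confirmed by this diff.

```python
# Hedged sketch only: the infer_lmm() method is named in the docs above, but the
# keyword names, model id, and response shape here are assumptions.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

result = client.infer_lmm(
    inference_input="photo.jpg",    # assumed kwarg for the input image
    model_id="moondream2",          # assumed model id string
    prompt="Describe this image.",  # assumed kwarg for the text prompt
)
print(result)
```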

docs/foundation/sam2.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can use Segment Anything 2 to identify the precise location of objects in an image. This process can generate masks for objects in an image iteratively, by specifying points to be included or discluded from the segmentation mask.
 
+## Execution Modes
+
+Segment Anything 2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU strongly recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `sam2_segment_image()` client method
+
 ## How to Use Segment Anything
 
 To use Segment Anything 2 with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>. Then, retrieve your API key from the Roboflow dashboard.
```
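
The added section points at the `sam2_segment_image()` client method for the remote path. The sketch below is assumption-heavy: only the method name comes from the docs above, while the keyword names and model ID are illustrative guesses.

```python
# Assumption-heavy sketch: only the sam2_segment_image() name comes from the
# docs above; keyword names and the model id are illustrative guesses.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

masks = client.sam2_segment_image(
    inference_input="scene.jpg",   # assumed kwarg for the input image
    model_id="sam2-hiera-small",   # assumed checkpoint id
)
print(masks)
```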

docs/foundation/sam3_3d.md

Lines changed: 8 additions & 1 deletion
```diff
@@ -2,7 +2,14 @@
 
 3D object generation model that converts 2D images with masks into 3D assets (meshes and Gaussian splats).
 
-This model is currenlty in Beta state! The model is only available if "SAM3_3D_OBJECTS_ENABLED" flag is on. The model can currently be ran using inference package, and also be used in Roboflow Worklows as a part of local inference server.
+This model is currently in Beta state! The model is only available if "SAM3_3D_OBJECTS_ENABLED" flag is on. The model can currently be ran using inference package, and also be used in Roboflow Workflows as a part of local inference server.
+
+## Execution Modes
+
+SAM3-3D supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (32GB+ VRAM GPU strongly recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `sam3_3d_infer()` client method or the `/sam3_3d/infer` endpoint
 
 ## DISCLAIMER: In order to run this model you will need a 32GB+ VRAM GPU machine.
```

docs/foundation/smolvlm.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -4,6 +4,13 @@ You can use SmolVLM2 for a range of multimodal tasks, including VQA, document OC
 
 You can deploy SmolVLM2 with Inference.
 
+### Execution Modes
+
+SmolVLM2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `infer_lmm()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run SmolVLM2, run
```

inference/core/interfaces/http/http_api.py

Lines changed: 141 additions & 5 deletions
```diff
@@ -65,6 +65,7 @@
     Sam2SegmentationRequest,
 )
 from inference.core.entities.requests.sam3 import Sam3SegmentationRequest
+from inference.core.entities.requests.sam3_3d import Sam3_3D_Objects_InferenceRequest
 from inference.core.entities.requests.server_state import (
     AddModelRequest,
     ClearModelRequest,
@@ -1179,14 +1180,13 @@ def infer_lmm(
        countinference: Optional[bool] = None,
        service_secret: Optional[str] = None,
    ):
-        """Run inference with the specified object detection model.
+        """Run inference with the specified large multi-modal model.
 
        Args:
-            inference_request (ObjectDetectionInferenceRequest): The request containing the necessary details for object detection.
-            background_tasks: (BackgroundTasks) pool of fastapi background tasks
+            inference_request (LMMInferenceRequest): The request containing the necessary details for LMM inference.
 
        Returns:
-            Union[ObjectDetectionInferenceResponse, List[ObjectDetectionInferenceResponse]]: The response containing the inference results.
+            Union[LMMInferenceResponse, List[LMMInferenceResponse]]: The response containing the inference results.
        """
        logger.debug(f"Reached /infer/lmm")
        return process_inference_request(
```

```diff
@@ -1195,6 +1195,61 @@ def infer_lmm(
            service_secret=service_secret,
        )
 
+    @app.post(
+        "/infer/lmm/{model_id:path}",
+        response_model=Union[
+            LMMInferenceResponse,
+            List[LMMInferenceResponse],
+            StubResponse,
+        ],
+        summary="Large multi-modal model infer with model ID in path",
+        description="Run inference with the specified large multi-modal model. Model ID is specified in the URL path (can contain slashes).",
+        response_model_exclude_none=True,
+    )
+    @with_route_exceptions
+    @usage_collector("request")
+    def infer_lmm_with_model_id(
+        model_id: str,
+        inference_request: LMMInferenceRequest,
+        countinference: Optional[bool] = None,
+        service_secret: Optional[str] = None,
+    ):
+        """Run inference with the specified large multi-modal model.
+
+        The model_id can be specified in the URL path. If model_id is also provided
+        in the request body, it must match the path parameter.
+
+        Args:
+            model_id (str): The model identifier from the URL path.
+            inference_request (LMMInferenceRequest): The request containing the necessary details for LMM inference.
+
+        Returns:
+            Union[LMMInferenceResponse, List[LMMInferenceResponse]]: The response containing the inference results.
+
+        Raises:
+            HTTPException: If model_id in path and request body don't match.
+        """
+        logger.debug(f"Reached /infer/lmm/{model_id}")
+
+        # Validate model_id consistency between path and request body
+        if (
+            inference_request.model_id is not None
+            and inference_request.model_id != model_id
+        ):
+            raise HTTPException(
+                status_code=400,
+                detail=f"Model ID mismatch: path specifies '{model_id}' but request body specifies '{inference_request.model_id}'",
+            )
+
+        # Set the model_id from path if not in request body
+        inference_request.model_id = model_id
+
+        return process_inference_request(
+            inference_request,
+            countinference=countinference,
+            service_secret=service_secret,
+        )
+
    if not DISABLE_WORKFLOW_ENDPOINTS:
 
        @app.post(
```

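
A hedged sketch of exercising the new path-based LMM route. The route, the slash-tolerant `{model_id:path}` parameter, and the 400-on-mismatch behaviour come from the handler above; the JSON body fields are assumptions about `LMMInferenceRequest`.

```python
# Hedged sketch: the route and mismatch behaviour come from the handler above;
# the body fields (api_key, image, prompt) are assumptions about LMMInferenceRequest.
import requests

BASE = "http://localhost:9001"
payload = {
    "api_key": "<ROBOFLOW_API_KEY>",                                   # assumed field
    "image": {"type": "url", "value": "https://example.com/cat.jpg"},  # assumed shape
    "prompt": "What is in this image?",                                # assumed field
}

# The model id travels in the URL path and may contain slashes.
resp = requests.post(f"{BASE}/infer/lmm/my-workspace/my-lmm-model", json=payload)
print(resp.status_code)

# If the body also carries model_id and it disagrees with the path, the handler
# returns HTTP 400 per the validation shown above.
payload["model_id"] = "a-different-model"
resp = requests.post(f"{BASE}/infer/lmm/my-workspace/my-lmm-model", json=payload)
print(resp.status_code)  # expected: 400
```
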
```diff
@@ -2613,6 +2668,87 @@ def sam3_segment_image(
            )
            return model_response
 
+    if CORE_MODEL_SAM3_ENABLED and not GCP_SERVERLESS:
+
+        @app.post(
+            "/sam3_3d/infer",
+            summary="SAM3 3D Object Generation",
+            description="Generate 3D meshes and Gaussian splatting from 2D images with mask prompts.",
+        )
+        @with_route_exceptions
+        @usage_collector("request")
+        def sam3_3d_infer(
+            inference_request: Sam3_3D_Objects_InferenceRequest,
+            request: Request,
+            api_key: Optional[str] = Query(
+                None,
+                description="Roboflow API Key that will be passed to the model during initialization for artifact retrieval",
+            ),
+            countinference: Optional[bool] = None,
+            service_secret: Optional[str] = None,
+        ):
+            """Generate 3D meshes and Gaussian splatting from 2D images with mask prompts.
+
+            Args:
+                inference_request (Sam3_3D_Objects_InferenceRequest): The request containing
+                    the image and mask input for 3D generation.
+                api_key (Optional[str]): Roboflow API Key for artifact retrieval.
+
+            Returns:
+                dict: Response containing base64-encoded 3D outputs:
+                    - mesh_glb: Scene mesh in GLB format (base64)
+                    - gaussian_ply: Combined Gaussian splatting in PLY format (base64)
+                    - objects: List of individual objects with their 3D data
+                    - time: Inference time in seconds
+            """
+            logger.debug("Reached /sam3_3d/infer")
+            model_id = inference_request.model_id or "sam3-3d-objects"
+
+            self.model_manager.add_model(
+                model_id,
+                api_key=api_key,
+                endpoint_type=ModelEndpointType.CORE_MODEL,
+                countinference=countinference,
+                service_secret=service_secret,
+            )
+
+            model_response = self.model_manager.infer_from_request_sync(
+                model_id, inference_request
+            )
+
+            if LAMBDA:
+                actor = request.scope["aws.event"]["requestContext"][
+                    "authorizer"
+                ]["lambda"]["actor"]
+                trackUsage(model_id, actor)
+
+            # Convert bytes to base64 for JSON serialization
+            def encode_bytes(data):
+                if data is None:
+                    return None
+                return base64.b64encode(data).decode("utf-8")
+
+            objects_list = []
+            for obj in model_response.objects:
+                objects_list.append(
+                    {
+                        "mesh_glb": encode_bytes(obj.mesh_glb),
+                        "gaussian_ply": encode_bytes(obj.gaussian_ply),
+                        "metadata": {
+                            "rotation": obj.metadata.rotation,
+                            "translation": obj.metadata.translation,
+                            "scale": obj.metadata.scale,
+                        },
+                    }
+                )
+
+            return {
+                "mesh_glb": encode_bytes(model_response.mesh_glb),
+                "gaussian_ply": encode_bytes(model_response.gaussian_ply),
+                "objects": objects_list,
+                "time": model_response.time,
+            }
+
    if CORE_MODEL_OWLV2_ENABLED:
 
        @app.post(
```

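
A hedged sketch of consuming the new `/sam3_3d/infer` endpoint. The response keys (`mesh_glb`, `gaussian_ply`, `objects`, `time`) and their base64 encoding come from the handler above; the request body fields are assumptions about `Sam3_3D_Objects_InferenceRequest`.

```python
# Hedged sketch: response keys and base64 encoding come from the handler above;
# the request body fields are assumptions about Sam3_3D_Objects_InferenceRequest.
import base64

import requests

payload = {
    "api_key": "<ROBOFLOW_API_KEY>",                         # assumed field
    "image": {"type": "base64", "value": "<BASE64_IMAGE>"},  # assumed field
    "mask": {"type": "base64", "value": "<BASE64_MASK>"},    # assumed field
}

resp = requests.post("http://localhost:9001/sam3_3d/infer", json=payload)
resp.raise_for_status()
body = resp.json()

# Outputs are base64-encoded binary assets; decode before writing to disk.
if body.get("mesh_glb"):
    with open("scene.glb", "wb") as f:
        f.write(base64.b64decode(body["mesh_glb"]))
if body.get("gaussian_ply"):
    with open("scene.ply", "wb") as f:
        f.write(base64.b64decode(body["gaussian_ply"]))

print(f"{len(body['objects'])} objects reconstructed in {body['time']:.2f}s")
```
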
```diff
@@ -2756,7 +2892,7 @@ def depth_estimation(
            depth_data = response.response
            depth_response = DepthEstimationResponse(
                normalized_depth=depth_data["normalized_depth"].tolist(),
-                image=depth_data["image"].numpy_image.tobytes().hex(),
+                image=depth_data["image"].base64_image,
            )
            return depth_response
```

inference/core/workflows/core_steps/models/foundation/depth_estimation/v1.py

Lines changed: 52 additions & 2 deletions
```diff
@@ -1,8 +1,14 @@
 from typing import List, Literal, Optional, Type, Union
 
+import numpy as np
 from pydantic import ConfigDict, Field
 
 from inference.core.entities.requests.inference import DepthEstimationRequest
+from inference.core.env import (
+    HOSTED_CORE_MODEL_URL,
+    LOCAL_INFERENCE_API_URL,
+    WORKFLOWS_REMOTE_API_TARGET,
+)
 from inference.core.managers.base import ModelManager
 from inference.core.workflows.core_steps.common.entities import StepExecutionMode
 from inference.core.workflows.execution_engine.entities.base import (
@@ -24,6 +30,7 @@
     WorkflowBlock,
     WorkflowBlockManifest,
 )
+from inference_sdk import InferenceHTTPClient
 
 
 class BlockManifest(WorkflowBlockManifest):
@@ -138,14 +145,57 @@ def run(
                model_version=model_version,
            )
        elif self._step_execution_mode == StepExecutionMode.REMOTE:
-            raise NotImplementedError(
-                "Remote execution is not supported for Depth Estimation. Please use a local or dedicated inference server."
+            return self.run_remotely(
+                images=images,
+                model_version=model_version,
            )
        else:
            raise ValueError(
                f"Unknown step execution mode: {self._step_execution_mode}"
            )
 
+    def run_remotely(
+        self,
+        images: Batch[WorkflowImageData],
+        model_version: str = "depth-anything-v3/small",
+    ) -> BlockResult:
+        api_url = (
+            LOCAL_INFERENCE_API_URL
+            if WORKFLOWS_REMOTE_API_TARGET != "hosted"
+            else HOSTED_CORE_MODEL_URL
+        )
+        client = InferenceHTTPClient(
+            api_url=api_url,
+            api_key=self._api_key,
+        )
+        if WORKFLOWS_REMOTE_API_TARGET == "hosted":
+            client.select_api_v0()
+
+        predictions = []
+        for single_image in images:
+            result = client.depth_estimation(
+                inference_input=single_image.base64_image,
+                model_id=model_version,
+            )
+            # Convert the result back to the expected format
+            # Remote returns: {"normalized_depth": [...], "image": hex_string}
+            image_output = WorkflowImageData.copy_and_replace(
+                origin_image_data=single_image,
+                base64_image=result.get("image", ""),
+            )
+
+            normalized_depth = np.array(result.get("normalized_depth", []))
+
+            # Return in the same format as local execution expects
+            predictions.append(
+                {
+                    "image": image_output,
+                    "normalized_depth": normalized_depth,
+                }
+            )
+
+        return predictions
+
     def run_locally(
         self,
         images: Batch[WorkflowImageData],
```
