
Commit 951e61e

Add remote exec capability for foundation models missing it (#1968)
* add remote exec for foundation models missing it
* make style
* fix missing name in unit tests
* fix depth estimation endpoint to return proper base64
* allow passing model id explicitly in /infer/lmm endpoint
* fix image returned by depth estimation block on remote exec
1 parent 36a492e commit 951e61e

File tree: 23 files changed, +2000 −22 lines

docs/foundation/depth_estimation.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -7,6 +7,13 @@ You can use Depth-Anything-V2-Small to estimate the depth of objects in images,
 
 You can deploy Depth-Anything-V2-Small with Inference.
 
+### Execution Modes
+
+Depth Estimation supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended for faster inference)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `depth_estimation()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run Depth-Anything-V2-Small, run
```
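
For orientation, a minimal sketch of the remote path from the client side. The `depth_estimation()` call and its `inference_input`/`model_id` keywords mirror the workflow block change later in this commit; the server URL, API key, and model ID string are placeholders.

```python
# Hedged sketch: remote depth estimation via the SDK method referenced above.
# URL, API key, and model id are placeholders; adjust to your deployment.
import base64

import numpy as np
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # remote inference server
    api_key="<ROBOFLOW_API_KEY>",
)

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

result = client.depth_estimation(
    inference_input=image_b64,
    model_id="depth-anything-v2/small",  # assumed id; match your deployment
)

# Per this commit, "image" is a base64-encoded depth visualization and
# "normalized_depth" is a nested list of per-pixel values.
depth = np.array(result.get("normalized_depth", []))
print(depth.shape, len(result.get("image", "")))
```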

docs/foundation/florence2.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -18,6 +18,15 @@ You can use Inference for all the Florence-2 tasks above.
 
 The text in the parentheses are the task prompts you will need to use each task.
 
+### Execution Modes
+
+Florence-2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server
+
+When using Florence-2 in a workflow, you can specify the execution mode to control where inference happens.
+
 ### How to Use Florence-2
 
 ??? Note "Install `inference`"
```

docs/foundation/gaze.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can detect the direction in which someone is looking using the L2CS-Net model.
 
+## Execution Modes
+
+L2CS-Net gaze detection supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using `detect_gazes()` client method
+
 ## How to Use L2CS-Net
 
 To use L2CS-Net with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>. Then, retrieve your API key from the Roboflow dashboard. Run the following command to set your API key in your coding environment:
```
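
A hedged sketch of the remote path the added section describes, using the `detect_gazes()` client method; the server URL, API key, and image path are placeholders, and the response is printed as-is because its schema is not part of this diff.

```python
# Hedged sketch: remote gaze detection through the detect_gazes() client method
# named in the docs above. URL, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

# The response schema is not shown in this commit, so it is printed unchanged.
gazes = client.detect_gazes(inference_input="face.jpg")
print(gazes)
```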

docs/foundation/moondream2.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can deploy Moondream2 with Inference.
 
+### Execution Modes
+
+Moondream2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `infer_lmm()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run Moondream2, run
```
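
Moondream2 (and SmolVLM2 below) are reached through the `infer_lmm()` client method named in the added docs. A hedged sketch follows; the keyword names, model ID, and response shape are assumptions not confirmed by this diff.

```python
# Hedged sketch only: the infer_lmm() method is named in the docs above, but the
# keyword names, model id, and response shape here are assumptions.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

result = client.infer_lmm(
    inference_input="photo.jpg",    # assumed kwarg for the input image
    model_id="moondream2",          # assumed model id string
    prompt="Describe this image.",  # assumed kwarg for the text prompt
)
print(result)
```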

docs/foundation/sam2.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -2,6 +2,13 @@
 
 You can use Segment Anything 2 to identify the precise location of objects in an image. This process can generate masks for objects in an image iteratively, by specifying points to be included or discluded from the segmentation mask.
 
+## Execution Modes
+
+Segment Anything 2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU strongly recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `sam2_segment_image()` client method
+
 ## How to Use Segment Anything
 
 To use Segment Anything 2 with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>. Then, retrieve your API key from the Roboflow dashboard.
```
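
The added section points at the `sam2_segment_image()` client method for the remote path. The sketch below is assumption-heavy: only the method name comes from the docs above, while the keyword names and model ID are illustrative guesses.

```python
# Assumption-heavy sketch: only the sam2_segment_image() name comes from the
# docs above; keyword names and the model id are illustrative guesses.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<ROBOFLOW_API_KEY>",
)

masks = client.sam2_segment_image(
    inference_input="scene.jpg",   # assumed kwarg for the input image
    model_id="sam2-hiera-small",   # assumed checkpoint id
)
print(masks)
```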

docs/foundation/sam3_3d.md

Lines changed: 8 additions & 1 deletion
```diff
@@ -2,7 +2,14 @@
 
 3D object generation model that converts 2D images with masks into 3D assets (meshes and Gaussian splats).
 
-This model is currenlty in Beta state! The model is only available if "SAM3_3D_OBJECTS_ENABLED" flag is on. The model can currently be ran using inference package, and also be used in Roboflow Worklows as a part of local inference server.
+This model is currently in Beta state! The model is only available if "SAM3_3D_OBJECTS_ENABLED" flag is on. The model can currently be ran using inference package, and also be used in Roboflow Workflows as a part of local inference server.
+
+## Execution Modes
+
+SAM3-3D supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (32GB+ VRAM GPU strongly recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `sam3_3d_infer()` client method or the `/sam3_3d/infer` endpoint
 
 ## DISCLAIMER: In order to run this model you will need a 32GB+ VRAM GPU machine.
```

docs/foundation/smolvlm.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -4,6 +4,13 @@ You can use SmolVLM2 for a range of multimodal tasks, including VQA, document OC
 
 You can deploy SmolVLM2 with Inference.
 
+### Execution Modes
+
+SmolVLM2 supports both local and remote execution modes when used in workflows:
+
+- **Local execution**: The model runs directly on your inference server (GPU recommended)
+- **Remote execution**: The model can be invoked via HTTP API on a remote inference server using the `infer_lmm()` client method
+
 ### Installation
 
 To install inference with the extra dependencies necessary to run SmolVLM2, run
```

inference/core/interfaces/http/http_api.py

Lines changed: 141 additions & 5 deletions
```diff
@@ -65,6 +65,7 @@
     Sam2SegmentationRequest,
 )
 from inference.core.entities.requests.sam3 import Sam3SegmentationRequest
+from inference.core.entities.requests.sam3_3d import Sam3_3D_Objects_InferenceRequest
 from inference.core.entities.requests.server_state import (
     AddModelRequest,
     ClearModelRequest,
@@ -1179,14 +1180,13 @@ def infer_lmm(
        countinference: Optional[bool] = None,
        service_secret: Optional[str] = None,
    ):
-        """Run inference with the specified object detection model.
+        """Run inference with the specified large multi-modal model.
 
        Args:
-            inference_request (ObjectDetectionInferenceRequest): The request containing the necessary details for object detection.
-            background_tasks: (BackgroundTasks) pool of fastapi background tasks
+            inference_request (LMMInferenceRequest): The request containing the necessary details for LMM inference.
 
        Returns:
-            Union[ObjectDetectionInferenceResponse, List[ObjectDetectionInferenceResponse]]: The response containing the inference results.
+            Union[LMMInferenceResponse, List[LMMInferenceResponse]]: The response containing the inference results.
        """
        logger.debug(f"Reached /infer/lmm")
        return process_inference_request(
```

```diff
@@ -1195,6 +1195,61 @@ def infer_lmm(
            service_secret=service_secret,
        )
 
+    @app.post(
+        "/infer/lmm/{model_id:path}",
+        response_model=Union[
+            LMMInferenceResponse,
+            List[LMMInferenceResponse],
+            StubResponse,
+        ],
+        summary="Large multi-modal model infer with model ID in path",
+        description="Run inference with the specified large multi-modal model. Model ID is specified in the URL path (can contain slashes).",
+        response_model_exclude_none=True,
+    )
+    @with_route_exceptions
+    @usage_collector("request")
+    def infer_lmm_with_model_id(
+        model_id: str,
+        inference_request: LMMInferenceRequest,
+        countinference: Optional[bool] = None,
+        service_secret: Optional[str] = None,
+    ):
+        """Run inference with the specified large multi-modal model.
+
+        The model_id can be specified in the URL path. If model_id is also provided
+        in the request body, it must match the path parameter.
+
+        Args:
+            model_id (str): The model identifier from the URL path.
+            inference_request (LMMInferenceRequest): The request containing the necessary details for LMM inference.
+
+        Returns:
+            Union[LMMInferenceResponse, List[LMMInferenceResponse]]: The response containing the inference results.
+
+        Raises:
+            HTTPException: If model_id in path and request body don't match.
+        """
+        logger.debug(f"Reached /infer/lmm/{model_id}")
+
+        # Validate model_id consistency between path and request body
+        if (
+            inference_request.model_id is not None
+            and inference_request.model_id != model_id
+        ):
+            raise HTTPException(
+                status_code=400,
+                detail=f"Model ID mismatch: path specifies '{model_id}' but request body specifies '{inference_request.model_id}'",
+            )
+
+        # Set the model_id from path if not in request body
+        inference_request.model_id = model_id
+
+        return process_inference_request(
+            inference_request,
+            countinference=countinference,
+            service_secret=service_secret,
+        )
+
    if not DISABLE_WORKFLOW_ENDPOINTS:
 
        @app.post(
```

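
A hedged sketch of exercising the new path-based LMM route. The route, the slash-tolerant `{model_id:path}` parameter, and the 400-on-mismatch behaviour come from the handler above; the JSON body fields are assumptions about `LMMInferenceRequest`.

```python
# Hedged sketch: the route and mismatch behaviour come from the handler above;
# the body fields (api_key, image, prompt) are assumptions about LMMInferenceRequest.
import requests

BASE = "http://localhost:9001"
payload = {
    "api_key": "<ROBOFLOW_API_KEY>",                                   # assumed field
    "image": {"type": "url", "value": "https://example.com/cat.jpg"},  # assumed shape
    "prompt": "What is in this image?",                                # assumed field
}

# The model id travels in the URL path and may contain slashes.
resp = requests.post(f"{BASE}/infer/lmm/my-workspace/my-lmm-model", json=payload)
print(resp.status_code)

# If the body also carries model_id and it disagrees with the path, the handler
# returns HTTP 400 per the validation shown above.
payload["model_id"] = "a-different-model"
resp = requests.post(f"{BASE}/infer/lmm/my-workspace/my-lmm-model", json=payload)
print(resp.status_code)  # expected: 400
```
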
```diff
@@ -2613,6 +2668,87 @@ def sam3_segment_image(
            )
            return model_response
 
+    if CORE_MODEL_SAM3_ENABLED and not GCP_SERVERLESS:
+
+        @app.post(
+            "/sam3_3d/infer",
+            summary="SAM3 3D Object Generation",
+            description="Generate 3D meshes and Gaussian splatting from 2D images with mask prompts.",
+        )
+        @with_route_exceptions
+        @usage_collector("request")
+        def sam3_3d_infer(
+            inference_request: Sam3_3D_Objects_InferenceRequest,
+            request: Request,
+            api_key: Optional[str] = Query(
+                None,
+                description="Roboflow API Key that will be passed to the model during initialization for artifact retrieval",
+            ),
+            countinference: Optional[bool] = None,
+            service_secret: Optional[str] = None,
+        ):
+            """Generate 3D meshes and Gaussian splatting from 2D images with mask prompts.
+
+            Args:
+                inference_request (Sam3_3D_Objects_InferenceRequest): The request containing
+                    the image and mask input for 3D generation.
+                api_key (Optional[str]): Roboflow API Key for artifact retrieval.
+
+            Returns:
+                dict: Response containing base64-encoded 3D outputs:
+                    - mesh_glb: Scene mesh in GLB format (base64)
+                    - gaussian_ply: Combined Gaussian splatting in PLY format (base64)
+                    - objects: List of individual objects with their 3D data
+                    - time: Inference time in seconds
+            """
+            logger.debug("Reached /sam3_3d/infer")
+            model_id = inference_request.model_id or "sam3-3d-objects"
+
+            self.model_manager.add_model(
+                model_id,
+                api_key=api_key,
+                endpoint_type=ModelEndpointType.CORE_MODEL,
+                countinference=countinference,
+                service_secret=service_secret,
+            )
+
+            model_response = self.model_manager.infer_from_request_sync(
+                model_id, inference_request
+            )
+
+            if LAMBDA:
+                actor = request.scope["aws.event"]["requestContext"][
+                    "authorizer"
+                ]["lambda"]["actor"]
+                trackUsage(model_id, actor)
+
+            # Convert bytes to base64 for JSON serialization
+            def encode_bytes(data):
+                if data is None:
+                    return None
+                return base64.b64encode(data).decode("utf-8")
+
+            objects_list = []
+            for obj in model_response.objects:
+                objects_list.append(
+                    {
+                        "mesh_glb": encode_bytes(obj.mesh_glb),
+                        "gaussian_ply": encode_bytes(obj.gaussian_ply),
+                        "metadata": {
+                            "rotation": obj.metadata.rotation,
+                            "translation": obj.metadata.translation,
+                            "scale": obj.metadata.scale,
+                        },
+                    }
+                )
+
+            return {
+                "mesh_glb": encode_bytes(model_response.mesh_glb),
+                "gaussian_ply": encode_bytes(model_response.gaussian_ply),
+                "objects": objects_list,
+                "time": model_response.time,
+            }
+
    if CORE_MODEL_OWLV2_ENABLED:
 
        @app.post(
```

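
A hedged sketch of consuming the new `/sam3_3d/infer` endpoint. The response keys (`mesh_glb`, `gaussian_ply`, `objects`, `time`) and their base64 encoding come from the handler above; the request body fields are assumptions about `Sam3_3D_Objects_InferenceRequest`.

```python
# Hedged sketch: response keys and base64 encoding come from the handler above;
# the request body fields are assumptions about Sam3_3D_Objects_InferenceRequest.
import base64

import requests

payload = {
    "api_key": "<ROBOFLOW_API_KEY>",                         # assumed field
    "image": {"type": "base64", "value": "<BASE64_IMAGE>"},  # assumed field
    "mask": {"type": "base64", "value": "<BASE64_MASK>"},    # assumed field
}

resp = requests.post("http://localhost:9001/sam3_3d/infer", json=payload)
resp.raise_for_status()
body = resp.json()

# Outputs are base64-encoded binary assets; decode before writing to disk.
if body.get("mesh_glb"):
    with open("scene.glb", "wb") as f:
        f.write(base64.b64decode(body["mesh_glb"]))
if body.get("gaussian_ply"):
    with open("scene.ply", "wb") as f:
        f.write(base64.b64decode(body["gaussian_ply"]))

print(f"{len(body['objects'])} objects reconstructed in {body['time']:.2f}s")
```
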
```diff
@@ -2756,7 +2892,7 @@ def depth_estimation(
            depth_data = response.response
            depth_response = DepthEstimationResponse(
                normalized_depth=depth_data["normalized_depth"].tolist(),
-                image=depth_data["image"].numpy_image.tobytes().hex(),
+                image=depth_data["image"].base64_image,
            )
            return depth_response
```

inference/core/workflows/core_steps/models/foundation/depth_estimation/v1.py

Lines changed: 52 additions & 2 deletions
```diff
@@ -1,8 +1,14 @@
 from typing import List, Literal, Optional, Type, Union
 
+import numpy as np
 from pydantic import ConfigDict, Field
 
 from inference.core.entities.requests.inference import DepthEstimationRequest
+from inference.core.env import (
+    HOSTED_CORE_MODEL_URL,
+    LOCAL_INFERENCE_API_URL,
+    WORKFLOWS_REMOTE_API_TARGET,
+)
 from inference.core.managers.base import ModelManager
 from inference.core.workflows.core_steps.common.entities import StepExecutionMode
 from inference.core.workflows.execution_engine.entities.base import (
@@ -24,6 +30,7 @@
     WorkflowBlock,
     WorkflowBlockManifest,
 )
+from inference_sdk import InferenceHTTPClient
 
 
 class BlockManifest(WorkflowBlockManifest):
@@ -138,14 +145,57 @@ def run(
                model_version=model_version,
            )
        elif self._step_execution_mode == StepExecutionMode.REMOTE:
-            raise NotImplementedError(
-                "Remote execution is not supported for Depth Estimation. Please use a local or dedicated inference server."
+            return self.run_remotely(
+                images=images,
+                model_version=model_version,
            )
        else:
            raise ValueError(
                f"Unknown step execution mode: {self._step_execution_mode}"
            )
 
+    def run_remotely(
+        self,
+        images: Batch[WorkflowImageData],
+        model_version: str = "depth-anything-v3/small",
+    ) -> BlockResult:
+        api_url = (
+            LOCAL_INFERENCE_API_URL
+            if WORKFLOWS_REMOTE_API_TARGET != "hosted"
+            else HOSTED_CORE_MODEL_URL
+        )
+        client = InferenceHTTPClient(
+            api_url=api_url,
+            api_key=self._api_key,
+        )
+        if WORKFLOWS_REMOTE_API_TARGET == "hosted":
+            client.select_api_v0()
+
+        predictions = []
+        for single_image in images:
+            result = client.depth_estimation(
+                inference_input=single_image.base64_image,
+                model_id=model_version,
+            )
+            # Convert the result back to the expected format
+            # Remote returns: {"normalized_depth": [...], "image": hex_string}
+            image_output = WorkflowImageData.copy_and_replace(
+                origin_image_data=single_image,
+                base64_image=result.get("image", ""),
+            )
+
+            normalized_depth = np.array(result.get("normalized_depth", []))
+
+            # Return in the same format as local execution expects
+            predictions.append(
+                {
+                    "image": image_output,
+                    "normalized_depth": normalized_depth,
+                }
+            )
+
+        return predictions
+
     def run_locally(
         self,
         images: Batch[WorkflowImageData],
```
