segmentObjectsFromEmbeddings

Segment objects in image using Segment Anything Model (SAM) feature embeddings

Since R2024a

Description

masks = segmentObjectsFromEmbeddings(sam,embeddings,imageSize,ForegroundPoints=pointPrompt) segments objects from an image of size imageSize using the SAM feature embeddings embeddings and the foreground point coordinates pointPrompt as a visual prompt.

Note

This functionality requires Deep Learning Toolbox™, Computer Vision Toolbox™, and the Image Processing Toolbox™ Model for Segment Anything Model. You can install the Image Processing Toolbox Model for Segment Anything Model from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

[masks] = segmentObjectsFromEmbeddings(sam,embeddings,imageSize,BoundingBox=boxPrompt) segments objects from an image using bounding box coordinates boxPrompt as a visual prompt.
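For instance, assuming the sam, embeddings, and imageSize variables from the example below, a bounding box prompt can be passed like this sketch (the box coordinates are illustrative, not from the original example):

```matlab
% Hypothetical bounding box of the form [x y width height] that
% encloses the object to segment (coordinates are illustrative).
boxPrompt = [300 250 200 180];
masks = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
    BoundingBox=boxPrompt);
```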

[masks,scores,maskLogits] = segmentObjectsFromEmbeddings(___) returns the scores corresponding to each predicted object mask and the prediction mask logits maskLogits, using any combination of input arguments from previous syntaxes.

[___] = segmentObjectsFromEmbeddings(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, ReturnMultiMask=true specifies to return three masks for a segmented object.
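As a sketch of the full three-output syntax combined with a name-value argument (the point prompt coordinates here are illustrative):

```matlab
% Request all three outputs and three candidate masks for an
% ambiguous single-point prompt (coordinates are illustrative).
pointPrompt = [512 400];
[masks,scores,maskLogits] = segmentObjectsFromEmbeddings(sam, ...
    embeddings,imageSize,ForegroundPoints=pointPrompt, ...
    ReturnMultiMask=true);
```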

Examples

Create a Segment Anything Model (SAM) object for image segmentation.

sam = segmentAnythingModel;

Load an image that contains the object to segment into the workspace.

I = imread("pears.png");

Define the image size.

imageSize = size(I);

Extract the feature embeddings from the image.

embeddings = extractEmbeddings(sam,I);

Specify the visual prompts for segmenting a single object from the image: foreground points, which lie inside the object to segment, and background points, which lie outside the object to segment.

foregroundPoints = [512 400; 480 420];
backgroundPoints = [340 300];

Segment an object in the image using the precomputed SAM embeddings and the point prompts, and return the object mask.

masks = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
    ForegroundPoints=foregroundPoints,BackgroundPoints=backgroundPoints);

Overlay the detected object mask on the test image.

imMask = insertObjectMask(I,masks);
imshow(imMask)

Display the foreground (green) and background (red) points used as visual prompts.

fx = foregroundPoints(:,1);
fy = foregroundPoints(:,2);
bx = backgroundPoints(:,1);
by = backgroundPoints(:,2);
hold on
plot(fx,fy,'g*',bx,by,'r*')
hold off

Figure: the object mask overlaid on the image, with the foreground (green) and background (red) prompt points marked.

Input Arguments


sam — Segment Anything Model
Segment Anything Model for semantic segmentation, specified as a segmentAnythingModel object.

embeddings — Image embeddings
Image embeddings, specified as a 64-by-64-by-256 array. Generate the embeddings for an image or a batch of images using the extractEmbeddings object function.

imageSize — Size of input image
Size of the input image used to generate the embeddings, specified as a 1-by-3 vector of positive integers of the form [height width channels] or a 1-by-2 vector of positive integers of the form [height width], in pixels.

pointPrompt — Foreground points
Points inside the object to be segmented, or foreground points, specified as a P-by-2 matrix. Each row specifies the xy-coordinates of a point in the form [x y]. P is the number of points.

Note

You must specify at least one of these visual prompts for interactive segmentation: the foreground point coordinates pointPrompt or the object bounding box coordinates boxPrompt. You can combine either prompt with the optional name-value arguments.

boxPrompt — Object bounding box
Rectangular bounding box that contains the object to be segmented, specified as a 1-by-4 vector of the form [x y width height]. The coordinates x and y specify the upper-left corner of the box, and width and height are the width and height of the box, respectively.

Note

You must specify at least one of these visual prompts for interactive segmentation: the object bounding box coordinates boxPrompt or the foreground point coordinates pointPrompt. You can combine either prompt with the optional name-value arguments.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: segmentObjectsFromEmbeddings(sam,embeddings,imageSize,ForegroundPoints=pointPrompt,BoundingBox=boxPrompt,BackgroundPoints=MyPoints) specifies the background point coordinates visual prompt as the array MyPoints.

BackgroundPoints — Background points
Background points, specified as a P-by-2 matrix. Each row specifies the xy-coordinates of a point in the form [x y]. P is the number of points. Use this argument to specify points in the image that are not part of the object to be segmented, as an additional visual prompt to foreground points or bounding boxes.

MaskLogits — Mask logits from previous segmentation
Mask prediction logits, specified as the maskLogits output from a previous call to the segmentObjectsFromEmbeddings function. Specify the MaskLogits argument to refine an existing mask.
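For example, a two-pass refinement might look like this sketch (the prompt coordinates are illustrative, not from the original example):

```matlab
% First pass: segment with a single rough foreground point and
% capture the mask logits (coordinates are illustrative).
roughPoint = [512 400];
[~,~,maskLogits] = segmentObjectsFromEmbeddings(sam,embeddings, ...
    imageSize,ForegroundPoints=roughPoint);

% Second pass: feed the logits back along with an additional
% foreground point to refine the mask.
extraPoint = [480 420];
refinedMask = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
    ForegroundPoints=[roughPoint; extraPoint],MaskLogits=maskLogits);
```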

ReturnMultiMask — Multiple segmentation masks
Multiple segmentation masks, specified as a numeric or logical 0 (false) or 1 (true). Specify ReturnMultiMask as true to return three masks in place of the default single mask, where each mask is a page of an H-by-W-by-3 logical array. H and W are the height and width, respectively, of the input image used to generate the embeddings.

Use this argument to return three masks when you use ambiguous visual prompts, such as single points. You can choose one or a combination of the resulting masks to capture different sub-regions of the object.
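One common pattern is to keep only the highest-scoring of the three candidate masks, as in this sketch (the point coordinates are illustrative):

```matlab
% Segment with an ambiguous single-point prompt and keep the
% candidate mask with the highest prediction score.
singlePoint = [512 400];
[masks,scores] = segmentObjectsFromEmbeddings(sam,embeddings, ...
    imageSize,ForegroundPoints=singlePoint,ReturnMultiMask=true);
[~,bestIdx] = max(scores);
bestMask = masks(:,:,bestIdx);
```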

Output Arguments


masks — Object masks
Object masks, returned as an H-by-W logical matrix containing one object mask or, when you specify ReturnMultiMask as true, as an H-by-W-by-3 logical array containing three candidate masks. H and W are the height and width, respectively, of the input image used to generate the embeddings.

scores — Prediction confidence scores
Prediction confidence scores for the segmentation, returned as a numeric scalar or, when you specify ReturnMultiMask as true, as a vector containing one score per returned mask.

maskLogits — Mask prediction logits
Mask prediction logits, returned as a numeric array with one page per returned mask.

Mask logits are the raw, unnormalized per-pixel predictions generated by the model. Applying a sigmoid to the logits yields, for each pixel, the probability that the pixel belongs to the segmented object.

You can pass this value to the MaskLogits name-value argument in subsequent calls to the segmentObjectsFromEmbeddings function to refine the output mask.

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.

Version History

Introduced in R2024a