Segment Anything

Meta AI
About Segment Anything

Segment Anything Model (SAM): a new AI model from Meta AI that can “cut out” any object, in any image, with a single click

A free interactive demo of the model is available to try online.

SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training.

SAM: A generalized approach to segmentation

Previously, to solve any kind of segmentation problem, there were two classes of approaches. The first, interactive segmentation, allowed for segmenting any class of object but required a person to guide the method by iteratively refining a mask. The second, automatic segmentation, allowed for segmentation of specific object categories defined ahead of time (e.g., cats or chairs) but required substantial amounts of manually annotated objects to train (e.g., thousands or even tens of thousands of examples of segmented cats), along with the compute resources and technical expertise to train the segmentation model. Neither approach provided a general, fully automatic approach to segmentation.

SAM is a generalization of these two classes of approaches. It is a single model that can easily perform both interactive segmentation and automatic segmentation. The model’s promptable interface (described shortly) allows it to be used in flexible ways that make a wide range of segmentation tasks possible simply by engineering the right prompt for the model (clicks, boxes, text, and so on). Moreover, SAM is trained on a diverse, high-quality dataset of over 1 billion masks (collected as part of this project), which enables it to generalize to new types of objects and images beyond what it observed during training. This ability to generalize means that, by and large, practitioners will no longer need to collect their own segmentation data and fine-tune a model for their use case.

Taken together, these capabilities enable SAM to generalize both to new tasks and to new domains. This flexibility is the first of its kind for image segmentation (a short code sketch of these usage modes follows the list below).

  • SAM allows users to segment objects with just a click or by interactively clicking points to include and exclude from the object. The model can also be prompted with a bounding box.

  • SAM can output multiple valid masks when faced with ambiguity about the object being segmented, an important and necessary capability for solving segmentation in the real world.

  • SAM can automatically find and mask all objects in an image.

  • SAM can generate a segmentation mask for any prompt in real time after precomputing the image embedding, allowing for real-time interaction with the model.
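For readers who want to try these modes programmatically, the sketch below shows roughly what that looks like, assuming the `segment_anything` Python package from the facebookresearch/segment-anything repository, a locally downloaded ViT-H checkpoint, and placeholder image path and click coordinates; it illustrates the shape of the interface rather than a production setup.

```python
# Minimal sketch: interactive (single-click) and fully automatic segmentation.
# Assumes the `segment_anything` package installed from the
# facebookresearch/segment-anything repo, plus opencv-python and numpy, and a
# downloaded model checkpoint (treat the file paths below as placeholders).
import cv2
import numpy as np
from segment_anything import (
    SamAutomaticMaskGenerator,
    SamPredictor,
    sam_model_registry,
)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

# 1) Interactive mode: segment the object under a single foreground click.
predictor = SamPredictor(sam)
predictor.set_image(image)                    # compute the image embedding once
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),      # (x, y) pixel location of the click
    point_labels=np.array([1]),               # 1 = include (foreground), 0 = exclude
    multimask_output=True,                    # several candidates for an ambiguous click
)

# 2) Automatic mode: find and mask all objects in the image.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)    # list of dicts: 'segmentation', 'area', 'bbox', ...
print(f"{len(all_masks)} masks found; best click-mask score: {scores.max():.2f}")
```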

How SAM works: Promptable segmentation

In natural language processing and, more recently, computer vision, one of the most exciting developments is that of foundation models that can perform zero-shot and few-shot learning for new datasets and tasks using “prompting” techniques. We took inspiration from this line of work.

We trained SAM to return a valid segmentation mask for any prompt, where a prompt can be foreground/background points, a rough box or mask, freeform text, or, in general, any information indicating what to segment in an image. The requirement of a valid mask simply means that even when a prompt is ambiguous and could refer to multiple objects (for example, a point on a shirt may indicate either the shirt or the person wearing it), the output should be a reasonable mask for one of those objects. This task is used to pretrain the model and to solve general downstream segmentation tasks via prompting.
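As a concrete illustration of these prompt forms, the hedged sketch below continues from a `SamPredictor` that has already had `set_image` called on it (as in the earlier example); all coordinates are made up. Point, box, and rough-mask prompts appear in the released predictor interface, while free-form text prompting is discussed in the paper but is not part of these arguments.

```python
# Sketch of the released prompt forms: points, a box, and a rough mask.
# `predictor` is an already-initialized SamPredictor with set_image() called;
# all coordinates here are illustrative.
import numpy as np

# Ambiguous point prompt: a foreground click on a shirt could mean the shirt
# or the person wearing it, so ask for multiple candidate masks and pick one.
masks, scores, low_res_logits = predictor.predict(
    point_coords=np.array([[520, 380], [520, 150]]),  # one include, one exclude point
    point_labels=np.array([1, 0]),                    # 1 = foreground, 0 = background
    multimask_output=True,
)
best_idx = int(np.argmax(scores))
best_mask = masks[best_idx]                           # a valid mask for one interpretation

# Box prompt, given as [x0, y0, x1, y1] in pixel coordinates.
box_masks, _, _ = predictor.predict(
    box=np.array([400, 300, 700, 600]),
    multimask_output=False,                           # a box is usually unambiguous
)

# Rough-mask prompt: feed the low-resolution logits of a previous prediction
# back in to refine it with an extra click.
refined, _, _ = predictor.predict(
    point_coords=np.array([[520, 380]]),
    point_labels=np.array([1]),
    mask_input=low_res_logits[best_idx][None, :, :],  # shape (1, 256, 256)
    multimask_output=False,
)
```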

We observed that the pretraining task and interactive data collection imposed specific constraints on the model design. In particular, the model needs to run in real time on a CPU in a web browser to allow our annotators to use SAM interactively in real time to annotate efficiently. While the runtime constraint implies a trade-off between quality and runtime, we find that a simple design yields good results in practice.

Under the hood, an image encoder produces a one-time embedding for the image, while a lightweight encoder converts any prompt into an embedding vector in real time. These two information sources are then combined in a lightweight decoder that predicts segmentation masks. After the image embedding is computed, SAM can produce a segment in just 50 milliseconds given any prompt in a web browser.
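The sketch below makes that division of labor concrete, reusing the `predictor` and `image` from the earlier examples: the expensive image encoder runs once inside `set_image`, and each subsequent prompt exercises only the lightweight prompt encoder and mask decoder, which is what keeps interaction responsive. The repository also includes a script for exporting this lightweight decoder to ONNX so the per-prompt step can run in a web browser; timings in the sketch are purely illustrative.

```python
# Sketch of the "embed once, prompt many times" pattern described above.
# Reuses `predictor` (SamPredictor) and `image` (RGB numpy array) from the
# earlier sketch; click coordinates and timings are illustrative only.
import time
import numpy as np

t0 = time.perf_counter()
predictor.set_image(image)                    # heavy step: image encoder, run once per image
print(f"image embedding: {time.perf_counter() - t0:.2f} s")

for click in [(100, 200), (450, 300), (800, 550)]:   # simulated interactive clicks
    t0 = time.perf_counter()
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    # light step: prompt encoder + mask decoder only, fast enough for interaction
    print(f"click at {click}: {(time.perf_counter() - t0) * 1000:.0f} ms")
```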

Reviews
Don Quixote
GPT has been all the hype lately, and I didn't expect a large computer-vision model to arrive so soon, complete with a new dataset, a new paradigm, and very strong zero-shot generalization. This CV foundation model isn't yet as powerful as GPT is in NLP, where a single model can handle many downstream tasks, but it's a promising start and likely the direction CV will take.
小青爱吃草莓
Segment Anything can produce different segmentation results depending on the prompting method. Overall the results are very good, and crucially it also runs quickly in a CPU environment. I recommend giving it a try!
Community Posts
Meta AI
Prepare for a huge dose of nostalgia with these celeb wedding photos from the '70s and '80s. Actually having lived through those decades is recommended, but not necessary, to enjoy them.
Meta AI
In July, we embarked on a journey to tackle this challenge, and it's incredible what we've achieved so far. >> Hire Graphic De...

Even more astonishing, it's converting! 🔥

Early launch vibes, and we're still building, but unlike any other, we're already seeing…
Meta AI
Model T hand crank start #hotrod #antiquecars #ford
Model T hand crank start with Hot Rodder John Hall, builder of the yellow Hot Wheels Roadster; a car like this will keep you on your toes.
Meta AI
Last month we announced SeamlessM4T, a foundational multimodal model for speech translation that can perform tasks across speech-to-text, speech-to-speech and more for up to 100 languages depending on the task.

More details on this work ➡️ t.co/v2AhogY1cX
Meta AI
Get front-row access to our interactive livestream FW23 fashion show and shop the new looks as they strut down the runway!
Meta AI
I need your help!

Your input matters! Help me choose the new face of "Tony is Trading." 🚀 Cast your vote on the logo poll below and let's shape our brand together. Your voice counts! 🗣️💼

#LogoPoll #TonyIsTrading #YourVoteMatters #BrandIdentity
Help me choose my new design!
I’m running a logo contest on 99designs. Designers have submitted 519 designs so far. Please vote on your favorite as I’d love to receive your feedback.
Meta AI
We recently released Belebele, a first-of-its-kind multilingual reading comprehension dataset. It's parallel for 122 language variants, enabling direct comparison of how well models understand different languages.

Dataset ⬇️ GitHub - facebookresearch/belebele: Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
Meta AI
📣 Llama 2 and Code Llama are now on #KaggleModels!
-------------
From @Kaggle:🤖 New on #KaggleModels! Introducing Llama 2 from @MetaAI: a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. 📚 Explore, share, and upvote your favorite notebooks. Happy Kaggling!
Meta AI
We're expanding access to DINOv2 by releasing the training code and model weights under the Apache 2.0 license.

Details on this and more of our recent work to advance computer vision research and fairness in AI ⬇️
Meta AI
RT @MyoSuite: (1/8) In July, we presented SAR at #RSS2023. We show that SAR enables SOTA high-dim control by applying the neuroscience of m…
Meta AI
Anyscale Endpoints enables AI application developers to easily swap closed models for the Llama 2 models — or use open models along with closed models in the same application.
-------------
From @ray:The team @MetaAI has done a tremendous amount to move the field forward with the Llama models. We're thrilled to collaborate to help grow the Llama ecosystem.
Meta AI
As part of our continued belief in the value of an open approach to today's AI, we've published a research paper with more information on Code Llama training, evaluation results, safety and more.

Code Llama: Open Foundation Models for Code
Meta AI
Last week we released FACET, a new comprehensive benchmark dataset for evaluating the fairness of models across a number of different vision tasks, constructed of 32K images from SA-1B, labeled by expert annotators.

Read the paper ➡️ t.co/OoYV2eiSYX
Meta AI
Unlock the mysteries of your future 👀 🌟🔮
Meta AI
This happens so many times in clubs and bars at the end of the night… have you ever wondered and not done anything?

Watch the full episode #26 of our ManTFup Podcast now! Link in bio
#podcast #mantfuppodcast #sextrafficking #survivor #sextraffickedsurvivor #kendrageronimo
Meta AI
We believe that AI models benefit from an open approach, both in terms of innovation and safety. Releasing models like Code Llama means the entire community can evaluate their capabilities, identify issues & fix vulnerabilities.
GitHub - facebookresearch/codellama: Inference code for CodeLlama models
Meta AI
RT @CaggianoVitt: 💪The biggest release so far!

🧩300+ tasks to challenge the best Reinforcement Learning algorithms to solve real human mo…
Meta AI
🦙 Llama 2 and Code Llama models are now available in the Vertex AI Model Garden — more details from @GoogleCloudTech ⬇️
-------------
From @Google Cloud Tech:Vertex AI, your one-stop shop for building #generativeAI apps, just got some upgrades:

🔹 Access 100 large models in Model Garden, including Llama 2 from @MetaAI
🔹 Model and tuning upgrades for PaLM 2, Imagen, and Codey

& more ↓ #GoogleCloudNext
Meta AI
RT @robertnishihara: Just tried out Code Llama (34B) on Anyscale Endpoints.

Impressive work from @MetaAI, and I'm proud to see our team at…
Meta AI
We evaluated Code Llama against existing solutions on both HumanEval & MBPP.
- It performed better than open-source, code-specific LLMs & Llama 2.
- Code Llama 34B scored the highest vs other SOTA open solutions on MBPP — on par w/ ChatGPT.

More info ➡️ t.co/yLBEKVIK4x