Meta has launched SAM Audio (SAM-Audio), positioned as a "unified" AI model for audio segmentation and editing. Its goal is to isolate and edit specific sounds in complex mixes by means of prompts. Typical use cases include extracting the guitar or vocals from a band video, filtering out traffic noise in outdoor recordings, and removing distractions such as dog barking from podcasts.
SAM Audio's interaction model emphasizes prompts that match human intuition and supports three prompt types that can be combined: text prompts (for example, typing "dog barking" or "singing voice"), visual prompts (clicking the person or object that is making the sound in a video frame to lock onto the sound source), and time-span prompts (marking the time range in which the target sound occurs). Meta also provides an online demo portal, Segment Anything Playground, where users can try the model with sample material provided by the platform or with their own uploaded audio and video. The model is also available for download and local inference.
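As a rough illustration of how the three prompt types described above could be combined into a single request, here is a minimal Python sketch. It is not SAM Audio's actual API: the `PromptBundle` class, its field names, and the normalized click coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PromptBundle:
    """Hypothetical container for text, visual, and time-span prompts.

    This is an illustrative data structure only, not part of SAM Audio.
    """
    text: Optional[str] = None                   # e.g. "dog barking"
    click: Optional[Tuple[float, float]] = None  # assumed normalized (x, y) in a video frame
    spans: List[Tuple[float, float]] = field(default_factory=list)  # (start_s, end_s) pairs

    def validate(self) -> None:
        """Raise ValueError if no prompt is given or a prompt is malformed."""
        if self.text is None and self.click is None and not self.spans:
            raise ValueError("at least one prompt is required")
        if self.click is not None:
            x, y = self.click
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                raise ValueError("click must be normalized to [0, 1]")
        for start, end in self.spans:
            if start < 0 or end <= start:
                raise ValueError(f"invalid span: ({start}, {end})")

    def kinds(self) -> List[str]:
        """Return the prompt types present, in a fixed order."""
        present = []
        if self.text is not None:
            present.append("text")
        if self.click is not None:
            present.append("visual")
        if self.spans:
            present.append("span")
        return present


# Combine a text prompt with a time-span prompt.
bundle = PromptBundle(text="dog barking", spans=[(2.0, 5.5)])
bundle.validate()
print(bundle.kinds())
```

Keeping every prompt optional reflects the idea that prompts can be used alone or together; a real integration would pass whatever the official inference code expects instead of this placeholder structure.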
On the open-source and ecosystem side, the official repository provides inference code and sample notebooks, and publishes model weights in multiple sizes (small/base/large), as well as variants with stronger visual-prompt performance. Note that at this stage the supported prompt forms are mainly text, visual, and time-span prompts, and fine-grained separation may still be limited when several similar sound sources are mixed together. For commercial production, copyrighted audio, or character voices, you should also evaluate licensing, compliance, and the stability of the final output quality.
FAQs
Q: What type of model is SAM Audio?
A: SAM Audio is a unified AI model for audio separation and editing. It isolates a target sound from complex mixed audio and outputs results that can be edited further.
Q: What prompts does SAM Audio support for locating sounds?
A: SAM Audio supports text prompts, visual prompts (clicking the sounding object in a video frame), and time-span prompts, and multiple prompts can be combined.
Q: What creative and post-production scenarios is SAM Audio suitable for?
A: Common scenarios for SAM Audio include splitting instrument and vocal tracks, reducing noise in outdoor recordings, removing noise from podcasts, and enhancing specific sound sources in video post-production.
Q: What can Segment Anything Playground do?
A: Segment Anything Playground is an online demo portal where you can test SAM Audio's separation and editing capabilities with sample material or your own uploaded audio and video. The specific features and terms of use are subject to the rules stated on the page.
Q: How can SAM Audio's open-source weights be obtained and used?
A: SAM Audio provides open-source inference code and weights in multiple sizes. Some weights may require requesting access on the model hosting platform before they can be downloaded.