How well does NSFW AI work on audio? To answer this question, we need to look at what AI systems can do in general and how these technologies are being used, or not used, for audio content processing and moderation. Audio is a relatively new domain for this work: machine learning models that were traditionally applied to text or images are only now being adapted to this specific problem.
AI can work well with audio content, but its accuracy depends on several factors. Detection rates for inappropriate material range from roughly 70 to 90 percent, depending on the complexity of the input. The effectiveness of sound processing algorithms relies on audio quality, background noise, and the diversity of the training set. NSFW AI is capable of detecting explicit content in up to 95% of clean, high-quality audio samples, but accuracy drops below 80% when the audio is noisy or contains a wide range of accents and languages.
Two industry terms are core to NSFW AI's approach to audio: speech-to-text conversion and natural language processing (NLP). Spoken words are first transcribed from the audio file into text, and NLP algorithms then analyze that transcription. The delay introduced by this transcription step can heavily affect how useful the AI is. Real-time processing is ideal, but many systems run up to 3 seconds behind the live content before producing output, which makes content moderation noticeably less effective.
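The two-step flow described above (transcribe, then analyze the text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: `fake_transcribe` stands in for a real speech-to-text engine, `BLOCKED_TERMS` and the word-matching check stand in for a trained NLP classifier, and all names here are hypothetical. The sketch also times each chunk so the result can be compared against the 3-second lag mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical placeholder terms; a real system would use a trained NLP model.
BLOCKED_TERMS = {"explicit_term_a", "explicit_term_b"}


@dataclass
class ModerationResult:
    transcript: str
    flagged: bool
    latency_seconds: float


def flag_transcript(transcript: str) -> bool:
    """Stand-in for the NLP step: flag if any blocked term appears as a word."""
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return not words.isdisjoint(BLOCKED_TERMS)


def moderate_chunk(
    audio_chunk: bytes, transcribe: Callable[[bytes], str]
) -> ModerationResult:
    """Transcribe one audio chunk, analyze the text, and record total latency."""
    start = time.perf_counter()
    transcript = transcribe(audio_chunk)   # step 1: speech-to-text
    flagged = flag_transcript(transcript)  # step 2: text analysis
    return ModerationResult(transcript, flagged, time.perf_counter() - start)


def fake_transcribe(audio_chunk: bytes) -> str:
    # Assumption for the demo: the "audio" bytes are already UTF-8 text.
    return audio_chunk.decode("utf-8")


result = moderate_chunk(b"this contains explicit_term_a, sadly", fake_transcribe)
clean = moderate_chunk(b"hello world", fake_transcribe)
print(result.flagged, clean.flagged, result.latency_seconds <= 3.0)
```

Passing the transcriber in as a function keeps the sketch independent of any specific speech-to-text library, and measuring latency per chunk makes it easy to see when a pipeline falls behind a live stream.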
A historical example demonstrates some of the problems involved in applying NSFW AI to audio. Google caught flak in 2017 when its voice recognition system had trouble providing closed captioning and censoring inappropriate content during live streams. This exposed the shortcomings of the technology at the time: faced with live input at enormous scale and with enormous variability in content, the AI lost its efficiency. The limits of NSFW AI are clear in this case. While there have been big leaps in text and image moderation on platforms like PornHub or Instagram, unruly, user-submitted audio poses unique challenges that simple algorithms trained on noisy data struggle with.
However, whether NSFW AI is effective for audio also depends on another question: how expensive it is to implement. Sebastian estimates that it can cost between $100,000 and $1 million to develop and deploy even a basic audio-specific AI moderation system, depending on the scale of operation. The efficacy of the AI must justify that investment, especially in highly public-facing applications such as live streaming or broadcast, where incorrect categorization could lead to serious reputational harm, legal exposure, or both.
Where NSFW AI has succeeded with audio, improved algorithms have made it possible. Smaller companies and startups are likely to have trouble at the application layer, because they may not be able to invest in the continuous training and refinement needed to bring error rates down by at least 10% per year. These systems go through a lifecycle of regular updates, with a typical refresh cycle of 6-12 months, to keep up with new kinds and forms of explicit content.
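To get a feel for what a 10% yearly improvement means, here is a small Python sketch. It rests on two assumptions that the article does not spell out: the reduction is read as relative (each year's error rate is 10% lower than the previous year's), and the starting error rate of 20% is taken loosely from the roughly 80% accuracy quoted earlier for noisy audio. The function name is hypothetical and the numbers are illustrative, not measurements.

```python
def projected_error_rate(
    initial_error: float, yearly_reduction: float, years: int
) -> float:
    """Error rate after `years` of compounding relative reductions."""
    return initial_error * (1.0 - yearly_reduction) ** years


# Assumed starting point: 20% error, reduced by 10% (relative) each year.
for year in range(6):
    rate = projected_error_rate(0.20, 0.10, year)
    print(f"year {year}: {rate:.2%} error")
```

Under this reading the gains compound slowly, which is one way to see why the article stresses sustained investment rather than a one-off training effort.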
Regardless, as discussed above, NSFW AI still seems to stumble when it comes to the audio domain. The industry always seeks the newest and best, but implementations in practice still produce inconsistent results from one evaluation to the next, especially in complex conditions.