Generative AI with Text-To-Speech and Voice Generator Added to the Data Studio

Hello everyone!

I’m excited to announce that we’ve added generative AI for speech (Text-to-Speech and AI Voice Generator) to the SensiML Data Studio, which can connect to your Piccolo AI server.

This feature combines ElevenLabs voice generator APIs with the Data Studio labeling capabilities to generate rich speech datasets based on voice prompts.

With this tool, you can build large speech datasets curated for keyword recognition or speaker identification models in minutes. We are very excited about how it almost eliminates the time-consuming process of manually recording phrases from individuals. I can’t wait to see the kinds of projects it will enable in the community.

Below I’ll go over some of the features in more detail and give some tips for building out robust datasets.

How to Use

  1. Open a project in the Data Studio.

  2. Open the ‘Text to Speech and AI Voice Generator’ window.

Left Navigation Button

  3. Enter your ElevenLabs API key. (Sign up for a free account at https://elevenlabs.io/)

API Key

  4. Enter a prompt for the voice.

  5. Select the voices you want to use.

  6. Adjust the ElevenLabs speech options.

Note that, depending on your subscription level, ElevenLabs limits the number of characters you can use.

When you click ‘Generate’, the Data Studio shows how many characters the request will use in a confirmation dialog, so you can review the cost before any characters are actually spent.

Confirmation

  7. (Optional) Add pitch, gain, echo, or reverberation adjustments to the files. These adjustments are done natively in the Data Studio, so they do not use any of your ElevenLabs characters. This lets you greatly expand your dataset by simulating variations in pitch, volume, and distance, which makes for a more robust model.
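To get a feel for the numbers before you click ‘Generate’, here is a rough back-of-the-envelope sketch in Python. It is not how the Data Studio computes its confirmation dialog; it assumes that each selected voice speaks the prompt once, that usage is one character per prompt character, and that every pitch/gain combination produces one file per generated clip. The function name and the voice names are made up for illustration.

```python
from itertools import product


def plan_generation(prompt, voices, pitch_steps=(0,), gain_steps=(0,)):
    """Estimate ElevenLabs character usage and the number of output files.

    Assumptions (not confirmed by the Data Studio docs): one request per
    voice, one character charged per prompt character, and one output file
    per (voice, pitch, gain) combination.
    """
    if not voices:
        raise ValueError("select at least one voice")

    # Only the text sent to ElevenLabs counts against the character quota.
    characters_used = len(prompt) * len(voices)

    # Local augmentations multiply the file count without using characters.
    variants = list(product(pitch_steps, gain_steps))
    total_files = len(voices) * len(variants)

    return {
        "characters_used": characters_used,
        "total_files": total_files,
        "variants": variants,
    }


plan = plan_generation(
    "turn on the lights",
    voices=["Voice A", "Voice B", "Voice C"],  # placeholder voice names
    pitch_steps=(-10, 0, 10),                  # percent, hypothetical values
    gain_steps=(-20, 0),                       # percent, hypothetical values
)
print(plan["characters_used"], "characters,", plan["total_files"], "files")
```

In this example the three voices cost 54 characters in total, while the six pitch/gain combinations turn them into 18 files at no extra character cost.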

Auto-Labeling and Metadata

Metadata

The Data Studio can save any of the settings used during generation as metadata on your files, so you can track how each file was generated. For example, you can save the VoiceName, Gender, Stability, Similarity, Pitch, and Gain percent as metadata.

Given the nature of generative AI, you might discover that certain files have artifacts or don’t sound natural. Adding metadata makes it easier to see which settings may be causing the artifacts or unnatural voices, and it helps you organize and filter your files when you are building a model.

Metadata is saved to the file and allows you to filter/sort your files when you are reviewing your training dataset later.
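As an illustration of why this is useful, here is a small Python sketch of filtering files by their metadata. The record layout, file names, and values are hypothetical (the Data Studio handles filtering for you in the UI); only the field names are taken from the examples above.

```python
# Hypothetical metadata records, one per generated file.
files = [
    {"file": "clip_001.wav", "VoiceName": "Voice A", "Gender": "female",
     "Stability": 0.30, "Pitch": 0, "Gain": 0},
    {"file": "clip_002.wav", "VoiceName": "Voice A", "Gender": "female",
     "Stability": 0.30, "Pitch": 10, "Gain": 0},
    {"file": "clip_003.wav", "VoiceName": "Voice B", "Gender": "male",
     "Stability": 0.70, "Pitch": 0, "Gain": -20},
    {"file": "clip_004.wav", "VoiceName": "Voice B", "Gender": "male",
     "Stability": 0.70, "Pitch": 0, "Gain": 0},
]


def filter_files(records, **criteria):
    """Return the records whose metadata satisfies every criterion.

    A criterion is either a plain value (tested for equality) or a
    callable predicate applied to the metadata value.
    """
    def matches(record):
        for key, expected in criteria.items():
            value = record.get(key)
            if callable(expected):
                if value is None or not expected(value):
                    return False
            elif value != expected:
                return False
        return True

    return [record for record in records if matches(record)]


# Files generated with a low Stability setting, a possible source of artifacts.
low_stability = filter_files(files, Stability=lambda s: s < 0.5)

# Unaugmented clips from male voices.
clean_male = filter_files(files, Gender="male", Gain=0)

print([r["file"] for r in low_stability])
print([r["file"] for r in clean_male])
```

If you notice that the unnatural-sounding clips cluster in one of these groups, that is a good hint about which setting to adjust.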

Auto-Labeling

If you open any of the files in the Project Explorer, you will see the Data Studio automatically adds labels/segments for your Prompt as it generates the files.

By default, the Data Studio uses the Prompt text for the label, but you can change this in the File Settings (or you can disable this feature completely).

You can see the labels on your files in the Label Distribution columns of the Project Explorer.
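Conceptually, the auto-labeling behavior described above can be pictured like this. This is only a sketch: the `Segment` class and `auto_label` function are made up for illustration, and I am assuming the segment spans the whole generated file, which may not match what the Data Studio actually does.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: int   # index of the first sample in the segment
    end: int     # index one past the last sample
    label: str


def auto_label(prompt: str, num_samples: int,
               label: Optional[str] = None,
               enabled: bool = True) -> Optional[Segment]:
    """Create a segment for a generated file.

    The label defaults to the prompt text, can be overridden, and the
    feature can be switched off entirely (returns None in that case).
    """
    if not enabled:
        return None
    return Segment(0, num_samples, label if label is not None else prompt)


segments = [
    auto_label("yes", 16000),
    auto_label("no", 12000),
    auto_label("yes please", 20000, label="yes"),  # custom label override
    auto_label("background", 8000, enabled=False),  # auto-labeling disabled
]

# Count labels over the files that actually received a segment,
# similar in spirit to a label distribution column.
distribution = Counter(s.label for s in segments if s is not None)
print(distribution)
```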

Tip: Add More Voices

The Data Studio will load the voices available to your account. By default, ElevenLabs automatically adds a range of voices to your account to get you started, but there are many more voices available. Depending on your application, we recommend finding additional voices with different accents, ages, and use cases through the ElevenLabs Voice Library.

You can add additional community-built voices from the ElevenLabs Voice Library by visiting https://elevenlabs.io/app/voice-library.

  1. Open the Voice Library.

  2. Click Add To My Voices to add a voice to the voices available in your account.

  3. (Optional) You can also create your own voices using the ElevenLabs Voice Design/Voice Cloning features.

After adding new voices, click the Refresh button to load them into the Data Studio.
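If you want to check how diverse your voice list is outside the Data Studio, here is a minimal Python sketch that summarizes the voices on an account. It assumes the ElevenLabs `GET /v1/voices` endpoint with the `xi-api-key` header and a response shaped like `{"voices": [{"name": ..., "labels": {...}}]}`; please verify the field names against the ElevenLabs API reference. This is not how the Data Studio loads voices internally, and the sample payload below is invented.

```python
import json
import urllib.request
from collections import Counter

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> dict:
    """Download the voice list for an account (requires network access)."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_voices(payload: dict, label_key: str = "accent") -> Counter:
    """Count voices per label value, e.g. per accent, gender, or age.

    Voices that lack the label are counted under "unknown".
    """
    counts = Counter()
    for voice in payload.get("voices", []):
        labels = voice.get("labels") or {}
        counts[labels.get(label_key, "unknown")] += 1
    return counts


# Invented sample payload so the sketch runs without an API key.
sample = {
    "voices": [
        {"name": "Voice A", "labels": {"accent": "american", "gender": "female"}},
        {"name": "Voice B", "labels": {"accent": "british", "gender": "male"}},
        {"name": "Voice C", "labels": {"accent": "american", "gender": "male"}},
        {"name": "Voice D", "labels": {}},
    ]
}

print(summarize_voices(sample, "accent"))
print(summarize_voices(sample, "gender"))
```

To run it against your own account, pass the result of `fetch_voices("<your API key>")` to `summarize_voices`. A lopsided count is a sign that adding a few more voices from the Voice Library would make the dataset more balanced.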