Emotion - Text To Speech OpenAI API

The Emotion To Speech API converts text into emotionally expressive speech. It extends the basic text-to-speech functionality with three controls for emotional context: a vibe setting (vibe_id), a named emotion (emotion), and a free-form custom prompt (custom_prompt).

Emotion To Speech

POST https://api.ttsopenai.com/uapi/v1/text-to-speech-advanced

This endpoint converts text into speech with emotional expression. You can set the voice, speed, emotional vibe, and emotion, and supply a custom prompt for finer control over expressiveness.

Example Request

curl -X POST https://api.ttsopenai.com/uapi/v1/text-to-speech-advanced \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your api key>" \
  -d '{
    "model": "audio_stable",
    "voice_id": "OA001",
    "speed": 1,
    "input": "Hello, my name is OpenAI. I am excited to help you today!",
    "vibe_id": 1,
    "emotion": "excited",
    "custom_prompt": "Speak with enthusiasm and energy"
  }'
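
For readers calling the endpoint from Python, the curl request above can be sketched with the standard library alone. The helper names build_request and submit and the 30-second timeout are illustrative assumptions, not part of the API; only the URL, headers, and body fields come from the example.

```python
import json
import urllib.request

API_URL = "https://api.ttsopenai.com/uapi/v1/text-to-speech-advanced"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def submit(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body (hypothetical helper)."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the curl example.
payload = {
    "model": "audio_stable",
    "voice_id": "OA001",
    "speed": 1,
    "input": "Hello, my name is OpenAI. I am excited to help you today!",
    "vibe_id": 1,
    "emotion": "excited",
    "custom_prompt": "Speak with enthusiasm and energy",
}
request = build_request("<your api key>", payload)
```

Calling submit("<your api key>", payload) would perform the network call; building the request separately makes it easy to inspect the headers and body before sending.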

Request Attributes

model string

The model used for the conversion. Fixed value: audio_stable.

voice_id string

The voice used for the conversion. You can find the list of voice IDs in the Voice Library. The default value is OA001.

speed float

The speed of the speech. The value should be between 1 and 4. The default value is 1.

input string

The text to be converted into speech. The maximum length is 10,000 characters.

vibe_id number

The emotional vibe identifier used to control the emotional expression of the speech. This numeric value corresponds to predefined emotional settings.

emotion string

The specific emotion to be expressed in the speech. Examples include "happy", "sad", "excited", "calm", "angry", and "surprised".

custom_prompt string

A custom prompt that provides additional context or instructions for how the emotion should be expressed in the speech generation.
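
The documented limits can be checked on the client before a request is sent. The sketch below is one possible approach: build_payload is a hypothetical helper, it assumes both ends of the speed range are inclusive, and it assumes vibe_id, emotion, and custom_prompt may be omitted when not set, which this page does not confirm.

```python
from typing import Optional

MODEL = "audio_stable"        # fixed value per the attribute list
MAX_INPUT_CHARS = 10_000      # documented maximum input length
MIN_SPEED, MAX_SPEED = 1, 4   # documented range; inclusive bounds are an assumption


def build_payload(
    text: str,
    voice_id: str = "OA001",
    speed: float = 1,
    vibe_id: Optional[int] = None,
    emotion: Optional[str] = None,
    custom_prompt: Optional[str] = None,
) -> dict:
    """Validate the documented limits and return a request body."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    payload = {"model": MODEL, "voice_id": voice_id, "speed": speed, "input": text}
    # Assumption: the emotional controls are optional and can be left out.
    optional = {"vibe_id": vibe_id, "emotion": emotion, "custom_prompt": custom_prompt}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, build_payload("Good morning!", emotion="calm") yields a body with the default voice and speed plus the emotion field, while build_payload("Hi", speed=5) raises ValueError.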

Example Response

{
  "success": true,
  "result": {
    "uuid": "eef94c08-a806-11ef-b617-22023a24db09",
    "voice_id": "OA001",
    "speed": 1,
    "model": "audio_stable",
    "tts_input": "Hello, my name is OpenAI. I am excited to help you today!",
    "vibe_id": 1,
    "emotion": "excited",
    "custom_prompt": "Speak with enthusiasm and energy",
    "estimated_credit": 58,
    "used_credit": 58,
    "status": 1,
    "status_percentage": 50,
    "error_message": "",
    "speaker_name": "Alloy",
    "created_at": "2024-11-21T12:48:40",
    "updated_at": "2024-11-21T12:48:40"
  }
}

Response Attributes

success boolean

Indicates whether the request was successful.

result object

The result of the emotion-to-speech conversion.

result.uuid string

The unique identifier for the conversion.

result.voice_id string

The voice used for the conversion.

result.speed float

The speed of the speech.

result.model string

The model used for the conversion. Fixed value: audio_stable.

result.tts_input string

The text that was converted into speech.

result.vibe_id number

The emotional vibe identifier used for the conversion.

result.emotion string

The specific emotion expressed in the speech.

result.custom_prompt string

The custom prompt used for emotional expression guidance.

result.estimated_credit integer

The estimated number of credits the conversion will use.

result.used_credit integer

The actual number of credits used for the conversion.

result.status integer

The status of the conversion. Possible values are:

1: Converting
2: Completed
3: Error
11: Reworking
12: Joining Audio
13: Merging Audio
14: Downloading Audio
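
Client code is usually easier to read when these numeric codes are given names. The following is a minimal sketch; the enum and helper names are hypothetical, and treating only Completed and Error as final states is an assumption inferred from the labels rather than something this page states.

```python
from enum import IntEnum


class ConversionStatus(IntEnum):
    """Named constants for the documented result.status values."""
    CONVERTING = 1
    COMPLETED = 2
    ERROR = 3
    REWORKING = 11
    JOINING_AUDIO = 12
    MERGING_AUDIO = 13
    DOWNLOADING_AUDIO = 14


# Assumption: these two values mean no further progress will occur.
FINAL_STATUSES = frozenset({ConversionStatus.COMPLETED, ConversionStatus.ERROR})


def is_finished(status_code: int) -> bool:
    """Return True for a final status; raises ValueError on an undocumented code."""
    return ConversionStatus(status_code) in FINAL_STATUSES
```

With this in place, the example response's status of 1 maps to ConversionStatus.CONVERTING, and is_finished(1) is False.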

result.status_percentage integer

The percentage of the conversion that has been completed.

result.error_message string

The error message, if any.

result.speaker_name string

The name of the speaker.

result.created_at string

The date and time when the conversion was created, as an ISO 8601 timestamp (for example, 2024-11-21T12:48:40).

result.updated_at string

The date and time when the conversion was last updated, in the same format as result.created_at.
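
As an illustration of how the response attributes fit together, the sketch below decodes a response body into a small typed record. The Conversion class and parse_response function are hypothetical, only a subset of fields is kept, and raising RuntimeError when success is false is an assumption, since this page does not show what a failed response looks like.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Conversion:
    """A subset of the documented result fields."""
    uuid: str
    status: int
    status_percentage: int
    used_credit: int
    error_message: str
    created_at: datetime
    updated_at: datetime


def parse_response(body: str) -> Conversion:
    """Decode a JSON response body into a Conversion record."""
    data = json.loads(body)
    if not data.get("success"):
        # Assumption: a false "success" flag means "result" cannot be relied on.
        raise RuntimeError("the API reported an unsuccessful request")
    result = data["result"]
    return Conversion(
        uuid=result["uuid"],
        status=result["status"],
        status_percentage=result["status_percentage"],
        used_credit=result["used_credit"],
        error_message=result["error_message"],
        # The example timestamps carry no UTC offset, so these are naive datetimes.
        created_at=datetime.fromisoformat(result["created_at"]),
        updated_at=datetime.fromisoformat(result["updated_at"]),
    )


# Trimmed copy of the example response.
sample = """{"success": true, "result": {
  "uuid": "eef94c08-a806-11ef-b617-22023a24db09",
  "used_credit": 58, "status": 1, "status_percentage": 50,
  "error_message": "",
  "created_at": "2024-11-21T12:48:40",
  "updated_at": "2024-11-21T12:48:40"}}"""
conversion = parse_response(sample)
```

The example timestamps do not state a time zone, so the sketch leaves them as naive datetime values rather than guessing one.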
