Document

Generate lifelike speech from PDF, DOCX, PPTX, and other document formats.

The Text To Speech OpenAI (TTS) API allows you to convert files in various document formats into high-quality, natural-sounding speech. You can use this API to generate voiceovers for multimedia content, create narrations for e-books and documents, or turn subtitles into engaging audio experiences.

Text To Speech

POST https://api.ttsopenai.com/uapi/v1/document-to-speech

This endpoint converts a document into speech. You can customize the voice, speed, and model used for the conversion.

Example Request

curl -X POST https://api.ttsopenai.com/uapi/v1/document-to-speech \
  -H "Content-Type: multipart/form-data" \
  -H "x-api-key: <your api key>" \
  --form "model=tts-1" \
  --form "voice_id=OA001" \
  --form "speed=1" \
  --form "file=@/path/to/your/document.pdf" \
  --form "file_password=your_password"
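
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names encode_multipart, build_request, and submit_document are hypothetical, while the URL, header name, and form field names are taken from the curl example above. It assumes the response body is JSON, as in the example response.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://api.ttsopenai.com/uapi/v1/document-to-speech"


def encode_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        ).encode("utf-8"))
    parts.append((
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f'\r\n--{boundary}--\r\n'.encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, document_path, model="tts-1", voice_id="OA001",
                  speed=1, file_password=None):
    """Build (but do not send) the POST request for one document."""
    fields = {"model": model, "voice_id": voice_id, "speed": speed}
    # Only send file_password when the document is password-protected.
    if file_password is not None:
        fields["file_password"] = file_password
    path = Path(document_path)
    body, content_type = encode_multipart(
        fields, "file", path.name, path.read_bytes()
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "x-api-key": api_key},
    )


def submit_document(api_key, document_path, **options):
    """Send the request and return the decoded JSON response."""
    request = build_request(api_key, document_path, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting build_request from submit_document keeps the request construction easy to inspect without making a network call.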

Request Attributes

model string

The model used for the conversion. You can choose between tts-1 and tts-1-hd. The default value is tts-1.

voice_id string

The voice used for the conversion. You can find the list of voice IDs in the Voice Library. The default value is OA001.

speed float

The speed of the speech. The value should be between 1 and 4. The default value is 1.

file string($binary)

The document file to be converted. Supported formats include .docx, .xlsx, .pptx, .pdf, .epub, .mobi, .txt, .html, .odt, .ods, .odp, .azw, and .azw3. The maximum file size is 100 MB, and a file may contain at most 500,000 rows of data.

file_password string

The password for the document file, if it is password-protected.
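
The constraints above can be checked on the client before uploading. The following is a minimal sketch: validate_request is a hypothetical helper, the allowed values are copied from the attribute descriptions, and reading "100 MB" as 100,000,000 bytes is an assumption (the conservative interpretation). The row limit is not checked because the source does not say how rows are counted.

```python
from pathlib import Path

ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
SUPPORTED_EXTENSIONS = {
    ".docx", ".xlsx", ".pptx", ".pdf", ".epub", ".mobi", ".txt",
    ".html", ".odt", ".ods", ".odp", ".azw", ".azw3",
}
# Assumption: "100 MB" is read as 100 * 1000 * 1000 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1000 * 1000


def validate_request(model, speed, file_name, file_size_bytes):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if model not in ALLOWED_MODELS:
        problems.append(f"unsupported model: {model!r}")
    if not 1 <= speed <= 4:
        problems.append(f"speed must be between 1 and 4, got {speed}")
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")
    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is larger than {MAX_FILE_SIZE_BYTES} bytes")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.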

Example Response

{
  "success": true,
  "result": {
    "uuid": "4a7693ee-aa35-11ef-bfda-7eba07618aa0",
    "voice_id": "OA001",
    "speed": 1,
    "model": "tts-1",
    "tts_input": "5101447014.pdf",
    "estimated_credit": 0,
    "used_credit": 0,
    "status": 1,
    "status_percentage": 1,
    "error_message": "",
    "speaker_name": "Alloy",
    "created_at": "2024-11-24T07:25:33",
    "updated_at": "2024-11-24T07:25:33",
    "file_size": 98842
  }
}

Response Attributes

success boolean

Indicates whether the request was successful.

result object

The result of the document-to-speech conversion.

result.uuid string

The unique identifier for the conversion.

result.voice_id string

The voice used for the conversion.

result.speed float

The speed of the speech.

result.model string

The model used for the conversion.

result.tts_input string

The name of the document file submitted for conversion.

result.estimated_credit integer

The estimated number of credits the conversion will use.

result.used_credit integer

The actual number of credits used for the conversion.

result.status integer

The status of the conversion. Possible values are:

  • 1: Converting
  • 2: Completed
  • 3: Error
  • 11: Reworking
  • 12: Joining Audio
  • 13: Merging Audio
  • 14: Downloading Audio
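
The status codes listed above map naturally onto an enumeration. This is a minimal sketch: the ConversionStatus and is_finished names are hypothetical, and treating only Completed and Error as final states is an assumption, since the source does not say which states are terminal.

```python
from enum import IntEnum


class ConversionStatus(IntEnum):
    """Status codes as listed in the response attributes."""
    CONVERTING = 1
    COMPLETED = 2
    ERROR = 3
    REWORKING = 11
    JOINING_AUDIO = 12
    MERGING_AUDIO = 13
    DOWNLOADING_AUDIO = 14


# Assumption: every other status means the conversion is still in progress.
TERMINAL_STATUSES = frozenset(
    {ConversionStatus.COMPLETED, ConversionStatus.ERROR}
)


def is_finished(status_code):
    """Return True when the status code is a final state.

    Raises ValueError for a code that is not in the documented list.
    """
    return ConversionStatus(status_code) in TERMINAL_STATUSES
```

For example, is_finished(1) is False for the "Converting" status shown in the example response, while is_finished(2) is True.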

result.status_percentage integer

The percentage of the conversion that has been completed.

result.error_message string

The error message, if any.

result.speaker_name string

The name of the speaker.

result.created_at string

The date and time when the conversion was created, formatted as in the example response (2024-11-24T07:25:33).

result.updated_at string

The date and time when the conversion was last updated, in the same format as result.created_at.

result.file_size integer

The size of the document file in bytes.
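
The response attributes can be loaded into a typed structure for easier handling. This is a minimal sketch: the Conversion class and parse_conversion function are hypothetical names, the field list mirrors the attributes documented above, and the timestamp parsing assumes the format shown in the example response. The shape of an unsuccessful response is not documented, so the sketch simply raises when success is false.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class Conversion:
    uuid: str
    voice_id: str
    speed: float
    model: str
    tts_input: str
    estimated_credit: int
    used_credit: int
    status: int
    status_percentage: int
    error_message: str
    speaker_name: str
    created_at: datetime
    updated_at: datetime
    file_size: int


def parse_conversion(payload):
    """Convert a decoded JSON response into a Conversion instance."""
    if not payload.get("success"):
        raise ValueError("the request was not successful")
    known = {field.name for field in fields(Conversion)}
    # Ignore any undocumented keys so that extra fields do not break parsing.
    result = {k: v for k, v in payload["result"].items() if k in known}
    for key in ("created_at", "updated_at"):
        result[key] = datetime.fromisoformat(result[key])
    return Conversion(**result)
```

Passing the decoded example response to parse_conversion yields an object whose created_at is a datetime value and whose file_size is 98842.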


The website is jointly operated by A2ZAI LTD No:16078579 Registered address at 83 Green Lanes, London, England, N13 4BS
Copyright © 2025