Despeech Docs

Despeech offers a simple API for scheduling transcription jobs and retrieving their output. Each endpoint below requires an Authorization header carrying a Bearer token set to the API key shown in your dashboard:

POST /api/v1/transcribe

Accepts an argument 'url' indicating the source of the audio to transcribe. Optionally accepts a 'model' with which to transcribe. Responds with the id of the transcription job created.

Request:

curl -X POST https://despeech.com/api/v1/transcribe \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://myaudiofile.com/audio.mp3"}'

To enable speaker diarization, pass 'diarization' set to true in the request body. Optionally provide both 'min_speakers' and 'max_speakers' to constrain the number of speakers detected.

Diarization request:

curl -X POST https://despeech.com/api/v1/transcribe \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://myaudiofile.com/audio.mp3", "diarization": true, "min_speakers": 2, "max_speakers": 4}'

Response:

{
  "id": "beaa6e89-3935-4c72-9ed1-f06237832388-e1",
  "status": "IN_QUEUE"
}
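
The same request can be built from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names build_transcribe_request and submit are hypothetical, and requiring min_speakers and max_speakers as a pair is an assumption drawn from the "provide both" wording above rather than documented server behaviour.

```python
import json
import urllib.request

BASE_URL = "https://despeech.com/api/v1"


def build_transcribe_request(api_key, audio_url, model=None, diarization=False,
                             min_speakers=None, max_speakers=None):
    """Build (but do not send) a POST /transcribe request."""
    payload = {"url": audio_url}
    if model is not None:
        payload["model"] = model
    if diarization:
        payload["diarization"] = True
        # Assumption: the speaker bounds are only meaningful as a pair.
        if (min_speakers is None) != (max_speakers is None):
            raise ValueError("provide both min_speakers and max_speakers, or neither")
        if min_speakers is not None:
            if min_speakers > max_speakers:
                raise ValueError("min_speakers must not exceed max_speakers")
            payload["min_speakers"] = min_speakers
            payload["max_speakers"] = max_speakers
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the job id from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made; submit(build_transcribe_request(api_key, audio_url)) performs the actual call.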

POST /api/v1/status/{id}

Returns the status of the transcription, along with output if complete.

Request:

curl -X POST https://despeech.com/api/v1/status/beaa6e89-3935-4c72-9ed1-f06237832388-e1 \
  -H "Authorization: Bearer $API_KEY"

In-progress response:

{
  "id": "beaa6e89-3935-4c72-9ed1-f06237832388-e1",
  "status": "PROCESSING"
}

Completed response:

{
  "delayTime": 11829,
  "executionTime": 2668,
  "id": "beaa6e89-3935-4c72-9ed1-f06237832388-e1",
  "output": {
    "detected_language": "en",
    "device": "cuda",
    "model": "base",
    "transcription": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi magna velit, aliquam eget metus eget, suscipit tempus orci.",
    "translation": null,
    "segments": {
      ...
    }
  },
  "status": "COMPLETED",
  "workerId": "w7it92xoifddgx"
}
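
Because a job moves from IN_QUEUE through PROCESSING to COMPLETED, a client typically polls the status endpoint until the output appears. The sketch below shows one way to do that in Python. The function names, the polling interval and the timeout are all assumptions, and since only the three statuses shown in this document are known, any other status is treated as an error.

```python
import json
import time
import urllib.request

BASE_URL = "https://despeech.com/api/v1"


def fetch_status(api_key, job_id):
    """Call POST /status/{id} once and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_output(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until the job is COMPLETED, then return its output."""
    pending = {"IN_QUEUE", "PROCESSING"}  # statuses seen in the examples above
    deadline = time.monotonic() + timeout
    while True:
        body = get_status()
        status = body.get("status")
        if status == "COMPLETED":
            return body["output"]
        if status not in pending:
            # Any status not documented here is surfaced rather than retried.
            raise RuntimeError(f"unexpected job status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")
        sleep(interval)
```

A call such as wait_for_output(lambda: fetch_status(api_key, job_id)) ties the two together; passing the status getter as a callable keeps the loop testable without network access.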

Completed response (diarized):

{
  "delayTime": 11829,
  "executionTime": 5432,
  "id": "beaa6e89-3935-4c72-9ed1-f06237832388-e1",
  "output": {
    "segments": [
      { "start": 0.0, "end": 3.2, "text": "Lorem ipsum dolor sit amet.", "speaker": "SPEAKER_00" },
      { "start": 3.5, "end": 7.1, "text": "Consectetur adipiscing elit.", "speaker": "SPEAKER_01" }
    ]
  },
  "status": "COMPLETED",
  "workerId": "w7it92xoifddgx"
}
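
Diarized segments carry a speaker label per segment, so a readable transcript can be produced by merging consecutive segments from the same speaker. The following Python sketch illustrates this using only the fields shown in the example response above (text and speaker); format_turns is a hypothetical helper, not part of the API.

```python
def format_turns(segments):
    """Merge consecutive segments by the same speaker into labelled lines."""
    turns = []  # list of (speaker, [text, ...]) in spoken order
    for segment in segments:
        text = segment["text"].strip()
        if turns and turns[-1][0] == segment["speaker"]:
            # Same speaker is still talking: extend the current turn.
            turns[-1][1].append(text)
        else:
            turns.append((segment["speaker"], [text]))
    return [f"{speaker}: {' '.join(parts)}" for speaker, parts in turns]
```

Applied to the "segments" list of a completed diarized response, this yields one line per speaker turn, for example a line beginning "SPEAKER_00:" followed by that speaker's text.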

GET /api/v1/transcript/{id}

Returns the transcription.

Request:

curl -X GET https://despeech.com/api/v1/transcript/beaa6e89-3935-4c72-9ed1-f06237832388-e1 \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "transcript_id": "beaa6e89-3935-4c72-9ed1-f06237832388-e1",
  "transcript": {
    "detected_language": "en",
    "device": "cuda",
    "model": "base",
    "transcription": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi magna velit, aliquam eget metus eget, suscipit tempus orci.",
    "translation": null,
    "segments": {
      ...
    }
  }
}
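
For callers that only need the text, the transcript response can be reduced to its "transcription" string. The Python sketch below assumes the response shape shown above; the helper names are hypothetical and error handling is left out for brevity.

```python
import json
import urllib.request

BASE_URL = "https://despeech.com/api/v1"


def build_transcript_request(api_key, job_id):
    """Build (but do not send) a GET /transcript/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/transcript/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def transcript_text(body):
    """Pull the plain transcription string out of a parsed response body."""
    return body["transcript"]["transcription"]


def fetch_transcript_text(api_key, job_id):
    """Send the request and return only the transcription text."""
    request = build_transcript_request(api_key, job_id)
    with urllib.request.urlopen(request) as response:
        return transcript_text(json.load(response))
```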