

Overview

Sully.ai provides two approaches for converting patient conversations to text:
| Approach | Best For | Latency |
| --- | --- | --- |
| File Upload | Pre-recorded audio, batch processing, large files | Async (seconds to minutes) |
| Real-time Streaming | Live visits, immediate feedback, interactive transcription | Real-time (~200ms) |
Both approaches produce the same high-quality medical transcription output that can be passed to note generation.

File Upload

Upload pre-recorded audio files for asynchronous transcription. This approach is ideal for batch processing, large files, or when real-time feedback is not required.

Supported Formats

| Format | MIME Type | Extension |
| --- | --- | --- |
| WAV | audio/wav | .wav |
| MP3 | audio/mpeg | .mp3 |
| FLAC | audio/flac | .flac |
| OGG | audio/ogg | .ogg |
| WebM | audio/webm | .webm |
| MP4 | audio/mp4 | .mp4 |
| M4A | audio/mp4 | .m4a |
| AAC | audio/aac | .aac |
| Opus | audio/opus | .opus |
Maximum file size: 100MB. For larger files, consider splitting into segments or using real-time streaming.
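
If you need to split a long recording first, one possible approach (assuming ffmpeg is installed on the host) is to segment it before upload:
import { execFile } from 'child_process';
import { promisify } from 'util';

const run = promisify(execFile);

// Split a long recording into ~10-minute parts so each stays under the size limit
await run('ffmpeg', [
  '-i', 'long-visit.mp3',
  '-f', 'segment',
  '-segment_time', '600',
  '-c', 'copy',
  'visit-part-%03d.mp3',
]);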

Upload and Poll

File transcription is asynchronous. Submit your file, then poll for completion.
import SullyAI from '@sullyai/sullyai';
import * as fs from 'fs';

const client = new SullyAI();

// 1. Upload audio file
const transcription = await client.audio.transcriptions.create({
  audio: fs.createReadStream('patient-visit.mp3'),
});

console.log(`Transcription ID: ${transcription.transcriptionId}`);

// 2. Poll until complete
let result = await client.audio.transcriptions.retrieve(
  transcription.transcriptionId
);

while (result.status === 'STATUS_PROCESSING') {
  await new Promise((resolve) => setTimeout(resolve, 2000));
  result = await client.audio.transcriptions.retrieve(
    transcription.transcriptionId
  );
}

if (result.status === 'STATUS_ERROR') {
  throw new Error('Transcription failed');
}

console.log('Transcript:', result.payload?.transcription);

Dictation Formatting

To have the transcript of a prerecorded file formatted for dictation workflows, set the optional dictation field to true. If omitted, dictation defaults to false.
import { readFile } from 'fs/promises';

const audioBytes = await readFile('./patient-visit.mp3');
const formData = new FormData();

formData.append(
  'audio',
  new File([audioBytes], 'patient-visit.mp3', { type: 'audio/mpeg' })
);
formData.append('dictation', 'true');

await fetch('https://api.sully.ai/v2/audio/transcriptions', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.SULLY_API_KEY!,
    'X-Account-Id': process.env.SULLY_ACCOUNT_ID!,
  },
  body: formData,
});

Status Lifecycle

| Status | Description |
| --- | --- |
| pending | Request received, queued for processing |
| processing | Actively being transcribed |
| completed | Transcription ready in payload.transcription |
| failed | An error occurred |
For production applications, use webhooks instead of polling to receive notifications when transcription completes.
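
As a rough sketch of that pattern (an Express receiver; the payload fields shown, such as transcriptionId and status, are placeholders, so check the Webhooks page for the actual schema):
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical webhook receiver; payload field names are illustrative only
app.post('/webhooks/sully', (req, res) => {
  // Acknowledge quickly so the delivery is not retried
  res.sendStatus(200);

  const event = req.body;
  if (event.status === 'completed') {
    // The transcript is ready; no polling loop required
    console.log(`Transcription ${event.transcriptionId} is ready`);
  }
});

app.listen(3000);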

Real-time Streaming

Stream audio in real-time during patient visits for immediate transcription feedback. This approach uses WebSockets to send audio chunks and receive transcription segments as they are processed.

Connection Flow

1. Get token     POST /v1/audio/transcriptions/stream/token
2. Connect       wss://api.sully.ai/v1/audio/transcriptions/stream?...
3. Wait for      { "type": "status", "status": "connected" }
4. Send audio    { "audio": "<base64-encoded-audio>" }
5. Receive       status, transcript, and error messages
6. Close         ws.close()

Get a Streaming Token

Before connecting to the WebSocket, obtain a short-lived token:
const tokenResponse = await fetch(
  'https://api.sully.ai/v1/audio/transcriptions/stream/token',
  {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.SULLY_API_KEY!,
      'X-Account-Id': process.env.SULLY_ACCOUNT_ID!,
    },
  }
);

const { data: { token } } = await tokenResponse.json();

WebSocket URL

Connect to the streaming endpoint with your token and audio parameters:
wss://api.sully.ai/v1/audio/transcriptions/stream?sample_rate=16000&account_id={accountId}&api_token={token}
| Parameter | Required | Description |
| --- | --- | --- |
| sample_rate | No | Audio sample rate in Hz (e.g., 16000, 44100). If omitted, the streaming service currently defaults to 16000. For raw, headerless audio, send the actual sample rate explicitly. |
| account_id | Yes | Your Sully account ID |
| api_token | Yes | Token from the stream token endpoint |
| language | No | BCP47 language tag (e.g., en, es, multi) |
| dictation | No | Set to true to request dictation-oriented transcript formatting |
For raw, headerless audio, send both encoding and sample_rate. For containerized audio, omit encoding. If sample_rate is omitted, the current streaming service still defaults it to 16000.
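
For example, the connection URL can be assembled with URLSearchParams (the accountId and token variables come from your environment and the token endpoint; values here are illustrative):
const params = new URLSearchParams({
  account_id: accountId,   // required
  api_token: token,        // required, from the stream token endpoint
  sample_rate: '16000',    // recommended for raw PCM audio
  language: 'en',          // optional BCP47 tag
});

const ws = new WebSocket(
  `wss://api.sully.ai/v1/audio/transcriptions/stream?${params}`
);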

Message Format

Stream ready:
{
  "type": "status",
  "status": "connected",
  "timestamp": "2026-04-22T14:32:07.123Z"
}
Sending audio:
{ "audio": "<base64-encoded-audio-chunk>" }
Receiving transcription:
{
  "type": "transcript",
  "text": "The patient reports feeling tired",
  "isFinal": false,
  "is_final": false
}
Error message:
{
  "type": "error",
  "error": "An error occurred during transcription",
  "timestamp": "2026-04-22T14:32:12.123Z"
}
  • Wait for the status: connected message before sending audio
  • text: The transcribed text for the current segment
  • is_final: Canonical finality flag for the current segment
  • isFinal: Compatibility alias for is_final
  • type: "error" indicates a transcription problem. Some runtime errors are non-terminal, while other failures are followed by socket closure.

Basic WebSocket Connection

// Get streaming token first (see above)
const token = await getStreamingToken();
const accountId = process.env.SULLY_ACCOUNT_ID!;

// Connect to WebSocket
const ws = new WebSocket(
  `wss://api.sully.ai/v1/audio/transcriptions/stream?sample_rate=16000&account_id=${accountId}&api_token=${token}`
);

// Track transcription segments
const segments: string[] = [];
let currentIndex = 0;
let streamReady = false;

ws.onopen = () => {
  console.log('Socket opened, waiting for stream readiness...');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'status') {
    streamReady = data.status === 'connected';
    console.log(`Stream status: ${data.status}`);
    return;
  }

  if (data.type === 'error') {
    console.warn('Transcription stream error:', data.error);
    return;
  }

  if (data.type === 'transcript' && data.text) {
    segments[currentIndex] = data.text;
    const isFinal = data.is_final ?? data.isFinal ?? false;

    if (isFinal) {
      console.log(`Segment ${currentIndex}: ${data.text}`);
      currentIndex++;
    }
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = (event) => {
  console.log(`Connection closed: ${event.code} ${event.reason}`);
  const fullTranscript = segments.join(' ');
  console.log('Full transcript:', fullTranscript);
};

// Send audio data (from microphone, file, etc.)
function sendAudio(audioBuffer: ArrayBuffer) {
  if (!streamReady) {
    console.warn('Stream not ready yet; wait for the connected status message.');
    return;
  }

  // Note: spreading very large chunks can overflow the call stack;
  // convert byte-by-byte (as in the production example below) in that case.
  const base64Audio = btoa(
    String.fromCharCode(...new Uint8Array(audioBuffer))
  );
  ws.send(JSON.stringify({ audio: base64Audio }));
}
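
The sendAudio helper expects raw audio bytes. As a browser-side sketch (assuming raw 16 kHz PCM matches the sample_rate you connected with), microphone input could be captured and fed to it like this:
async function streamMicrophone() {
  const media = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(media);

  // ScriptProcessorNode is deprecated but keeps the example short;
  // prefer an AudioWorklet in production code.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    sendAudio(int16.buffer); // helper defined above
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}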

Production Streaming

Real-time audio streaming in production requires handling network interruptions, reconnection, and audio buffering. This section provides battle-tested patterns for reliable streaming.

Key Challenges

  1. Network interruptions - Mobile networks and WiFi can drop unexpectedly
  2. Token expiration - Streaming tokens have limited validity
  3. Audio continuity - Buffering audio during reconnection to prevent data loss
  4. State recovery - Resuming transcription context after reconnection
  5. Error frames - Some server error messages are non-terminal, while others precede disconnects

Reconnection with Exponential Backoff

Never reconnect immediately after a failure. Use exponential backoff with jitter to prevent thundering herd problems:
interface BackoffConfig {
  baseDelayMs: number;
  maxDelayMs: number;
  maxAttempts: number;
}

function calculateBackoff(
  attempt: number,
  config: BackoffConfig
): number {
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
  // Add jitter: random value between 0-25% of delay
  const jitter = cappedDelay * Math.random() * 0.25;
  return cappedDelay + jitter;
}
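
For instance, with a 1-second base and a 30-second cap, successive attempts back off roughly like this:
const config = { baseDelayMs: 1000, maxDelayMs: 30000, maxAttempts: 5 };

// Attempts 0..4 delay roughly 1s, 2s, 4s, 8s, 16s, each plus up to 25% jitter
for (let attempt = 0; attempt < config.maxAttempts; attempt++) {
  console.log(`attempt ${attempt}: ~${Math.round(calculateBackoff(attempt, config))}ms`);
}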

Production WebSocket Implementation

The following implementation handles reconnection, audio buffering, and error recovery:
This example reconnects on close events and connection failures. In your own implementation, surface type: "error" messages immediately, but do not assume every error frame is terminal. Some runtime errors are followed by later stream messages, while other failures are followed by socket closure.
import { EventEmitter } from 'events';

interface StreamConfig {
  accountId: string;
  apiKey: string;
  sampleRate: number;
  language?: string;
  maxReconnectAttempts?: number;
  baseReconnectDelayMs?: number;
  maxReconnectDelayMs?: number;
  audioBufferMaxSize?: number;
}

interface TranscriptionSegment {
  index: number;
  text: string;
  isFinal: boolean;
}

type ConnectionState =
  | 'disconnected'
  | 'connecting'
  | 'connected'
  | 'reconnecting';

class ProductionTranscriptionStream extends EventEmitter {
  private ws: WebSocket | null = null;
  private state: ConnectionState = 'disconnected';
  private reconnectAttempt = 0;
  private audioBuffer: string[] = [];
  private segments: string[] = [];
  private currentSegmentIndex = 0;
  private token: string | null = null;
  private abortController: AbortController | null = null;

  private readonly config: Required<StreamConfig>;

  constructor(config: StreamConfig) {
    super();
    this.config = {
      maxReconnectAttempts: 5,
      baseReconnectDelayMs: 1000,
      maxReconnectDelayMs: 30000,
      audioBufferMaxSize: 100, // Buffer up to 100 audio chunks
      language: 'en',
      ...config,
    };
  }

  async connect(signal?: AbortSignal): Promise<void> {
    if (signal?.aborted) {
      throw new Error('Connection aborted');
    }

    this.abortController = new AbortController();
    this.state = 'connecting';
    this.emit('stateChange', this.state);

    try {
      // Get fresh token
      this.token = await this.fetchToken();
      await this.establishConnection();
    } catch (error) {
      this.state = 'disconnected';
      this.emit('stateChange', this.state);
      throw error;
    }
  }

  private async fetchToken(): Promise<string> {
    const response = await fetch(
      'https://api.sully.ai/v1/audio/transcriptions/stream/token',
      {
        method: 'POST',
        headers: {
          'X-API-Key': this.config.apiKey,
          'X-Account-Id': this.config.accountId,
        },
        signal: this.abortController?.signal,
      }
    );

    if (!response.ok) {
      throw new Error(`Token fetch failed: ${response.status}`);
    }

    const { data } = await response.json();
    return data.token;
  }

  private async establishConnection(): Promise<void> {
    return new Promise((resolve, reject) => {
      const params = new URLSearchParams({
        sample_rate: this.config.sampleRate.toString(),
        account_id: this.config.accountId,
        api_token: this.token!,
      });

      if (this.config.language) {
        params.set('language', this.config.language);
      }

      const url = `wss://api.sully.ai/v1/audio/transcriptions/stream?${params}`;
      this.ws = new WebSocket(url);

      const connectionTimeout = setTimeout(() => {
        this.ws?.close();
        reject(new Error('Connection timeout'));
      }, 10000);

      this.ws.onopen = () => {
        clearTimeout(connectionTimeout);
        this.state = 'connected';
        this.reconnectAttempt = 0;
        this.emit('stateChange', this.state);
        this.emit('connected');

        // Buffered audio is flushed once the server reports readiness
        // (see the status handling in handleMessage below).
        resolve();
      };

      this.ws.onmessage = (event) => {
        this.handleMessage(event.data);
      };

      this.ws.onerror = (error) => {
        clearTimeout(connectionTimeout);
        this.emit('error', error);
      };

      this.ws.onclose = (event) => {
        clearTimeout(connectionTimeout);
        this.handleDisconnect(event);
        if (this.state === 'connecting') {
          reject(new Error(`Connection closed: ${event.code}`));
        }
      };
    });
  }

  private handleMessage(data: string): void {
    try {
      const message = JSON.parse(data);

      if (message.type === 'status') {
        if (message.status === 'connected') {
          // Server is ready: flush audio buffered during (re)connection
          this.flushAudioBuffer();
        }
        return;
      }

      if (message.type === 'error' || message.error) {
        this.emit('error', new Error(message.error));
        return;
      }

      if (message.text !== undefined) {
        this.segments[this.currentSegmentIndex] = message.text;

        // is_final is canonical; isFinal is kept as a compatibility alias
        const isFinal = message.is_final ?? message.isFinal ?? false;

        const segment: TranscriptionSegment = {
          index: this.currentSegmentIndex,
          text: message.text,
          isFinal,
        };

        this.emit('transcription', segment);

        if (isFinal) {
          this.currentSegmentIndex++;
        }
      }
    } catch (error) {
      this.emit('error', new Error(`Failed to parse message: ${data}`));
    }
  }

  private async handleDisconnect(event: CloseEvent): Promise<void> {
    const wasConnected = this.state === 'connected';
    this.ws = null;

    // Normal closure or intentional disconnect
    if (event.code === 1000 || this.state === 'disconnected') {
      this.state = 'disconnected';
      this.emit('stateChange', this.state);
      this.emit('disconnected', { code: event.code, reason: event.reason });
      return;
    }

    // Unexpected disconnect (including a failed reconnect attempt) - retry
    const shouldRetry = wasConnected || this.state === 'reconnecting';
    if (shouldRetry && this.reconnectAttempt < this.config.maxReconnectAttempts) {
      await this.attemptReconnect();
    } else {
      this.state = 'disconnected';
      this.emit('stateChange', this.state);
      this.emit('disconnected', {
        code: event.code,
        reason: event.reason,
        reconnectFailed: true,
      });
    }
  }

  private async attemptReconnect(): Promise<void> {
    this.state = 'reconnecting';
    this.emit('stateChange', this.state);

    const delay = this.calculateBackoff();
    this.emit('reconnecting', {
      attempt: this.reconnectAttempt + 1,
      maxAttempts: this.config.maxReconnectAttempts,
      delayMs: delay,
    });

    await this.sleep(delay);
    this.reconnectAttempt++;

    try {
      // Get fresh token for reconnection
      this.token = await this.fetchToken();
      await this.establishConnection();
    } catch (error) {
      this.emit('error', error);
      // Will trigger another reconnect attempt via onclose handler
    }
  }

  private calculateBackoff(): number {
    const exponentialDelay =
      this.config.baseReconnectDelayMs * Math.pow(2, this.reconnectAttempt);
    const cappedDelay = Math.min(
      exponentialDelay,
      this.config.maxReconnectDelayMs
    );
    const jitter = cappedDelay * Math.random() * 0.25;
    return Math.floor(cappedDelay + jitter);
  }

  sendAudio(audioData: ArrayBuffer | Uint8Array): void {
    const base64Audio = this.arrayBufferToBase64(audioData);

    if (this.state === 'connected' && this.ws?.readyState === WebSocket.OPEN) {
      // Send immediately if connected
      this.ws.send(JSON.stringify({ audio: base64Audio }));
    } else if (
      this.state === 'reconnecting' ||
      this.state === 'connecting'
    ) {
      // Buffer audio during reconnection
      this.bufferAudio(base64Audio);
    }
    // Drop audio if disconnected (not reconnecting)
  }

  private bufferAudio(base64Audio: string): void {
    this.audioBuffer.push(base64Audio);

    // Prevent unbounded buffer growth
    while (this.audioBuffer.length > this.config.audioBufferMaxSize) {
      this.audioBuffer.shift();
      this.emit('bufferOverflow');
    }
  }

  private flushAudioBuffer(): void {
    if (this.audioBuffer.length === 0) return;

    const bufferedCount = this.audioBuffer.length;
    this.emit('bufferFlush', { count: bufferedCount });

    for (const base64Audio of this.audioBuffer) {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ audio: base64Audio }));
      }
    }

    this.audioBuffer = [];
  }

  private arrayBufferToBase64(buffer: ArrayBuffer | Uint8Array): string {
    const bytes = buffer instanceof Uint8Array ? buffer : new Uint8Array(buffer);
    let binary = '';
    for (let i = 0; i < bytes.byteLength; i++) {
      binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }

  getTranscript(): string {
    return this.segments.join(' ');
  }

  getState(): ConnectionState {
    return this.state;
  }

  disconnect(): void {
    this.state = 'disconnected';
    this.abortController?.abort();
    this.ws?.close(1000, 'Client disconnect');
    this.ws = null;
    this.audioBuffer = [];
  }
}

// Usage example
const stream = new ProductionTranscriptionStream({
  accountId: process.env.SULLY_ACCOUNT_ID!,
  apiKey: process.env.SULLY_API_KEY!,
  sampleRate: 16000,
  language: 'en',
});

stream.on('stateChange', (state) => {
  console.log(`Connection state: ${state}`);
});

stream.on('transcription', (segment) => {
  if (segment.isFinal) {
    console.log(`[Final] ${segment.text}`);
  } else {
    console.log(`[Interim] ${segment.text}`);
  }
});

stream.on('reconnecting', ({ attempt, maxAttempts, delayMs }) => {
  console.log(`Reconnecting (${attempt}/${maxAttempts}) in ${delayMs}ms`);
});

stream.on('error', (error) => {
  console.error('Stream error:', error);
});

await stream.connect();

// Send audio from microphone, file, etc.
// stream.sendAudio(audioChunk);

// When done
// stream.disconnect();

Error Recovery Strategies

| Error | Recovery Strategy |
| --- | --- |
| Connection timeout | Retry with backoff, check network |
| Token expired (401) | Fetch new token, reconnect |
| Rate limited (429) | Use Retry-After header, increase backoff |
| Server error (5xx) | Retry with backoff |
| WebSocket error message | Surface to caller, keep listening for follow-up messages, reconnect if the socket closes |
| Invalid audio format | Check sample rate, encoding |
| Network disconnect | Reconnect with buffered audio |
Always implement a maximum reconnection limit. Infinite reconnection loops can drain device batteries and create unnecessary server load.

Language Support

Sully.ai supports transcription in multiple languages using BCP47 language tags.

Supported Languages

| Language | Tag | Regional Variants |
| --- | --- | --- |
| English | en | en-US, en-GB, en-AU |
| Spanish | es | es-US, es-ES, es-MX |
| Chinese | zh | zh-CN, zh-TW |
| French | fr | fr-FR, fr-CA |
| German | de | de-DE |
| Portuguese | pt | pt-BR, pt-PT |
| Japanese | ja | ja-JP |
| Korean | ko | ko-KR |

Multilingual Mode

For conversations that switch between languages, use language=multi:
// File upload with multilingual support
const transcription = await client.audio.transcriptions.create({
  audio: fs.createReadStream('multilingual-visit.mp3'),
  language: 'multi',
});

Language in Streaming

Specify language when connecting to the WebSocket:
wss://api.sully.ai/v1/audio/transcriptions/stream?sample_rate=16000&account_id={id}&api_token={token}&language=es
Audio in languages other than the specified language will be filtered out. Use multi if your conversations include multiple languages.

Choosing Upload vs Stream

Use this decision matrix to select the right approach:
| Criterion | File Upload | Real-time Stream |
| --- | --- | --- |
| Use Case | Pre-recorded audio, batch processing | Live patient visits |
| Latency Requirement | Seconds to minutes acceptable | Immediate feedback needed |
| File Size | Any size up to 100MB | N/A (continuous stream) |
| Network Reliability | Single request | Requires stable connection |
| Implementation Complexity | Simple (HTTP upload + polling) | Complex (WebSocket + reconnection) |
| Offline Support | Upload when online | Requires active connection |

When to Use File Upload

  • Processing recorded audio from devices or archives
  • Batch transcription of multiple files
  • Integration with systems that produce audio files
  • Environments with unreliable network connectivity (upload when stable)
  • Backend processing pipelines

When to Use Real-time Streaming

  • Live transcription during patient visits
  • Providing immediate visual feedback to clinicians
  • Interactive applications where users see text as they speak
  • Reducing perceived latency in clinical workflows
  • Mobile applications with microphone access
Many applications use both approaches: real-time streaming for live visits with immediate feedback, and file upload for processing any recordings that were captured offline.

Next Steps

Generate Notes

Convert transcriptions into structured clinical notes

Webhooks

Get notified when transcriptions complete

TypeScript SDK

Full SDK reference for Node.js applications

Python SDK

Full SDK reference for Python applications