Integrail.ai Help

Speech to Text

OpenAI Whisper 1

The Speech to Text (OpenAI Whisper 1) node transcribes audio files into text using OpenAI's Whisper speech recognition model. Its fields are:

  • Label: The name of the node, labeled OpenAI Whisper 1 by default. You can rename it based on your specific workflow or use case.

  • Audio: Required. The audio file to transcribe. Provide a file URL or upload the file directly. Supported formats include .mp3, and the maximum file size is 500 MB.

  • Prompt: Instructions or context that guide the transcription. For example, you can list names, acronyms, or technical terms that occur in the recording so that they are spelled correctly.

  • Temperature: Controls the randomness of the transcription output. A lower value (closer to 0) makes the output more predictable and consistent, while a higher value (closer to 2) makes it more varied. For transcription, where accuracy usually matters more than variety, a low value is generally the safer choice.

  • Fallback Outputs: An alternative output or action to use if the transcription fails, so that the workflow continues without disruption.

  • Results: The transcribed text. It is shown here and can be passed to subsequent nodes or used as needed.
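The constraints listed above can be checked before an audio file is handed to the node. The following is a minimal sketch only: `SpeechToTextInputs` and `validate_inputs` are hypothetical helpers, not part of Integrail, and the sketch assumes that 1 MB means 1024 × 1024 bytes and that Temperature ranges from 0 to 2, as the field descriptions suggest.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: "500 MB" is read as 500 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 500 * 1024 * 1024
# Only .mp3 is named on this page; the node may accept other formats too.
SUPPORTED_EXTENSIONS = {".mp3"}


@dataclass
class SpeechToTextInputs:
    """Hypothetical container mirroring the node's fields (not an Integrail type)."""
    audio: str                 # file path or file URL
    prompt: str = ""
    temperature: float = 0.0


def validate_inputs(inputs: SpeechToTextInputs, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the inputs look acceptable."""
    problems = []
    if not inputs.audio:
        problems.append("Audio is required.")
    else:
        # Take the extension from the path part so URL query strings are ignored.
        ext = PurePosixPath(urlparse(inputs.audio).path).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"Unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("Audio file exceeds 500 MB.")
    if not 0.0 <= inputs.temperature <= 2.0:
        problems.append("Temperature must be between 0 and 2.")
    return problems
```

For instance, `validate_inputs(SpeechToTextInputs(audio="https://example.com/meeting.mp3"), 1024)` returns an empty list, while a missing Audio value or an out-of-range Temperature each adds one message.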

Usage

Use this node to turn spoken content into text, for example to generate subtitles or transcribe meetings. It fits any workflow in which voice data or audio files must be converted into text for further processing.
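To illustrate how such a workflow might behave, the sketch below chains a transcription step into a downstream step and substitutes a fallback output when transcription fails, in the spirit of the Fallback Outputs field. All names here are hypothetical, and the transcription functions are stand-ins rather than calls to Integrail or OpenAI.

```python
from typing import Callable


def transcribe_with_fallback(
    transcribe: Callable[[str], str],
    audio: str,
    fallback_output: str,
) -> str:
    """Run the transcription step; return the fallback text if it raises."""
    try:
        return transcribe(audio)
    except Exception:
        # Treat any error as a failed transcription so the workflow can continue.
        return fallback_output


def first_sentence(transcript: str) -> str:
    """Placeholder downstream step: keep only the first sentence."""
    return transcript.split(".")[0].strip() + "."


def fake_transcribe(audio: str) -> str:
    """Stand-in for the node: always returns the same transcript."""
    return "Welcome everyone. Today we review the roadmap."


def failing_transcribe(audio: str) -> str:
    """Stand-in for a failed transcription."""
    raise RuntimeError("transcription failed")


ok = transcribe_with_fallback(fake_transcribe, "meeting.mp3", "[transcription unavailable]")
failed = transcribe_with_fallback(failing_transcribe, "meeting.mp3", "[transcription unavailable]")
print(first_sentence(ok))
print(failed)
```

Passing the transcription step in as a callable keeps the fallback logic independent of how the audio is actually transcribed, so the same wrapper works with any step that takes an audio reference and returns text.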


We’d love to hear from you! Reach out to documentation@integrail.ai


Last updated 6 months ago