Image to Text
The Image to Text node processes an image and generates descriptive text from its contents, using AI models designed for visual interpretation. Here’s a breakdown of the key settings:
Label: This is the name of the node, which by default is labeled Image to Text. You can rename it based on the specific function you are using it for.
Model: This field specifies the model used for processing images. The available models are listed in Resources. Pricing details are provided for input and output tokens to help you manage costs.
Image: This required field is where you input the image that the model will analyze. You can either enter a file URL or upload an image directly. Supported formats include .jpg, .jpeg, and .png, with a maximum file size of 500 MB.
User Prompt: This is a required field where you can add instructions or contextual information to guide the model's analysis of the image. For example, you can ask the model to generate a descriptive sentence or answer specific questions about the image.
System Prompt: You can include specific guidelines in this field to control how the model behaves or how detailed the output should be when describing the image.
Temperature: This controls the randomness or creativity of the model’s output. A higher temperature value produces more creative results, while a lower value makes the responses more predictable and focused.
Presence Penalty: This parameter discourages the model from repeating tokens or information that already appear in the output, which helps keep the text varied. Higher values push the model more strongly toward new content.
Max Tokens: This setting defines the maximum number of tokens the model can generate when describing the image. Lowering it limits the length of the output.
Top P: This controls the diversity of responses by adjusting how much of the probability mass the model considers. Lower values focus on more likely responses, while higher values allow for more varied descriptions.
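To make the Temperature, Presence Penalty, and Top P settings more concrete, here is a minimal Python sketch of how such sampling parameters are commonly applied to a model's raw token scores. It is an illustration only, not the node's actual implementation: the function names are hypothetical, and the presence penalty is assumed to follow the common convention of subtracting a flat amount from the score of every token that has already appeared.

```python
import math


def apply_presence_penalty(logits, seen_token_ids, penalty):
    """Subtract a flat penalty from tokens already present (assumed convention)."""
    return [
        score - penalty if i in seen_token_ids else score
        for i, score in enumerate(logits)
    ]


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.5)
narrow = top_p_filter(warm, 0.5)   # only the most likely token survives
wider = top_p_filter(warm, 0.9)    # the two most likely tokens survive
```

In this sketch, halving the temperature concentrates more probability on the top-scoring token, and raising Top P widens the set of tokens the model may choose from, which matches the behavior described in the settings above.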
Usage
This node is ideal for tasks where an image needs to be analyzed, and a descriptive text or answer is required based on the image content. This is particularly useful for image captioning, generating insights from visual data, or answering questions about images.
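If you prepare inputs for this node programmatically, a small pre-check can catch problems before the image reaches the model. The sketch below is a hypothetical helper, not part of the product: the function name and return shape are assumptions, and it only encodes the constraints stated in the settings (.jpg, .jpeg, or .png files, a 500 MB size limit, and a required User Prompt). It also assumes 1 MB means 1024 × 1024 bytes.

```python
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_inputs(image_ref, user_prompt, size_bytes=None):
    """Return a list of problems; an empty list means the inputs look acceptable.

    image_ref is a file URL or a local file name. size_bytes is optional
    because the size of a remote file may not be known in advance.
    """
    problems = []
    # For URLs, look only at the path so query strings do not hide the extension.
    path = urlparse(image_ref).path if "://" in image_ref else image_ref
    extension = Path(path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported image format: {extension or 'none'}")
    if size_bytes is not None and size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("image exceeds 500 MB")
    if not user_prompt.strip():
        problems.append("user prompt is required")
    return problems


# Example: a valid URL with a prompt, then an unsupported file with no prompt.
ok = check_inputs("https://example.com/cat.PNG?x=1", "Describe this image.")
bad = check_inputs("scan.gif", "   ")
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.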