
How to Use Audio Tags in AI Voice Generation

Updated this week

Audio Tags allow you to add expressive context to AI-generated speech. With this feature, you can guide the AI to deliver lines not only with accurate pronunciation but also with the right tone and emotion, such as excitement, sadness, anger, friendliness, or empathy.

This makes generated voices sound more natural, dynamic, and human-like. Instead of a neutral delivery for every line, you can shape the emotion and feeling of the voice to match your video’s message or mood.


Key Information

  • Audio Tags are supported anywhere you use Text-to-Speech (TTS) in VEED:

    • In Fabric, when generating a video

    • In the Editor, when adding a TTS voiceover

    • In any other workflow where speech is generated from text

  • Tags can be applied at the sentence or phrase level, allowing you to fine-tune tone and emotion line by line.


Beyond Emotions

Audio Tags go beyond simple emotions: they give you creative control over tone, pacing, and character delivery. Below are examples of what you can achieve:

  • Situational Awareness – Tags like [WHISPER], [SHOUTING], or [SIGH] make speech react to the scene, adding tension, softness, or anticipation.

  • Character Performance – Bring personality to your script with tags such as [pirate voice] or [French accent], transforming narration into performance.

  • Emotional Context – Use [sigh], [excited], or [tired] to add realism and depth to your storytelling without re-recording.

  • Narrative Intelligence – Control rhythm and pacing with [pause], [awe], or [dramatic tone] for storytelling that flows naturally.

  • Multi-Character Dialogue – Use [interrupting] or [overlapping] to simulate conversation or quick banter using a single voice model.

  • Delivery Control – Adjust pacing with [pause], [rushed], or [drawn out] for precise timing and emphasis.

  • Accent Emulation – Instantly change accents, e.g. [American accent], [British accent], or [Southern US accent], for global storytelling.


How to Use Audio Tags

  1. Open Your Project

    Open your project in Fabric, the Editor, or any tool that uses Text-to-Speech (TTS).

  2. Add or Write Your Script

    Write or paste your script in the TTS input field.

  3. Add Audio Tags

    Insert emotion or performance tags directly into your text. For example:

    • [angry] I can’t believe you did that!

    • [excited] That’s amazing news!

    • [sad] I’m really going to miss you.

    • [sorrowful] I couldn’t sleep that night. The air was too still, and the moonlight kept sliding through the blinds like it was trying to tell me something. [quietly] And suddenly, that’s when I saw it. [siren] [gunshot]

  4. Preview the Audio

    Play the audio to hear how the AI adjusts tone and emotion based on your tags.

  5. Adjust as Needed

    Mix emotions throughout your script to create expressive, engaging narration.

  6. Save and Export

    Once satisfied, export your project with the enhanced emotional voiceover.
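Because tags are just bracketed text placed in front of a sentence or phrase, a longer script can be assembled programmatically before pasting it into the TTS field. A minimal sketch in Python; the `tag_line` helper is illustrative only, not part of VEED:

```python
# Illustrative helper: prefix a script line with a bracketed audio tag.
# The bracket syntax matches the examples in step 3; the function itself
# is not a VEED API -- it only builds text to paste into the TTS field.
def tag_line(tag: str, text: str) -> str:
    return f"[{tag}] {text}"

# Assemble a multi-line script, one tagged line per sentence.
script = "\n".join([
    tag_line("excited", "That's amazing news!"),
    tag_line("quietly", "And suddenly, that's when I saw it."),
])
print(script)
```

Each line of the resulting text carries its own tag, which mirrors the sentence-level control described above.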


FAQs

  • Q: Which emotions are supported?

    • Supported tags include:

      • Emotional tone: [EXCITED], [NERVOUS], [FRUSTRATED], [TIRED]

      • Reactions: [GASP], [SIGH], [LAUGHS], [GULPS]

      • Volume & energy: [WHISPERING], [SHOUTING], [QUIETLY], [LOUDLY]

      • Pacing & rhythm: [PAUSES], [STAMMERS], [RUSHED]

  • Q: Can I mix different emotions in the same script?

    • Yes. You can apply different tags to individual sentences or phrases for dynamic and expressive delivery.

  • Q: What happens if I don’t use tags?

    • If no tags are used, the AI will generate speech with a neutral tone.
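Since tags are plain bracketed annotations in the text, you can strip them with a regular expression to preview the untagged version of a script, i.e. the text the AI would read with a neutral delivery. This is a hypothetical sketch, not a built-in VEED feature:

```python
import re

def strip_audio_tags(script: str) -> str:
    """Remove bracketed audio tags like [excited] or [SIGH] from a script."""
    # Delete each [tag] along with any trailing whitespace, then trim the ends.
    untagged = re.sub(r"\[[^\[\]]+\]\s*", "", script)
    return untagged.strip()

print(strip_audio_tags("[excited] That's amazing news! [sigh] I'm tired."))
# -> "That's amazing news! I'm tired."
```

Note this pattern would also remove any bracketed text that is not an audio tag, so it is only suitable for scripts where brackets are used exclusively for tags.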
