Autonomo AI API Documentation

Welcome to the API documentation for Autonomo AI's WhatsApp API and the Autonomo Concierge voice API.

Autonomo AI's Incoming WhatsApp Webhook

POST /webhook/whatsapp

This endpoint receives incoming WhatsApp messages for Autonomo AI, typically after they have been parsed or adapted by a gateway such as WAHA. It forwards the message text to an LLM service for processing and sends the generated reply back to the user via WhatsApp.

Security: This endpoint requires an X-API-Key header containing a valid API key. The key is internal to Autonomo AI's system.

Request Body (JSON):

The webhook expects a JSON payload containing message details. The required fields are message.from (the sender's ID) and message.text (the message content). This format is typically a simplified representation produced after initial parsing of a payload from a WhatsApp gateway such as WAHA.

{
  "message": {
    "from": "whatsapp:+1234567890",
    "text": "Hello, can you help me with something?"
  },
  "//": "Other fields like 'type', 'timestamp', etc., might be present but are not processed by this webhook if not explicitly documented."
}

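Request Headers:

  • X-API-Key (String): Your internal Autonomo AI API key. Required

Parameters (in JSON body):

  • message.from (String): Sender's WhatsApp ID (e.g., "whatsapp:+1234567890"). Required
  • message.text (String): Text content of the message. Required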

Responses:

200 OK: The webhook successfully received and authenticated the message. The reply will be sent asynchronously.

{
  "status": "ok"
}

400 Bad Request: The payload was invalid or missing required fields.

{
  "error": "Invalid payload"
}

401 Unauthorized: The X-API-Key header was missing or invalid.

{
  "success": false,
  "message": "Unauthorized: Invalid or missing API Key."
}
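The authentication and validation rules above can be expressed as a small, framework-agnostic handler. This is a minimal Python sketch, not Autonomo AI's actual implementation: the function name `handle_whatsapp_webhook` and the in-memory `VALID_API_KEYS` set are hypothetical, and the asynchronous hand-off to the LLM service is left as a comment.

```python
from __future__ import annotations

import hmac
from typing import Any

# HYPOTHETICAL: valid internal keys; a real deployment would load these from config.
VALID_API_KEYS = {"example-internal-key"}


def handle_whatsapp_webhook(headers: dict[str, str], body: Any) -> tuple[int, dict]:
    """Return (status_code, json_body) for a POST /webhook/whatsapp request."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    supplied = lowered.get("x-api-key", "").encode("utf-8")
    # compare_digest avoids content-dependent timing differences when checking keys.
    if not any(hmac.compare_digest(supplied, key.encode("utf-8")) for key in VALID_API_KEYS):
        return 401, {"success": False, "message": "Unauthorized: Invalid or missing API Key."}

    message = body.get("message") if isinstance(body, dict) else None
    if not isinstance(message, dict):
        return 400, {"error": "Invalid payload"}
    sender, text = message.get("from"), message.get("text")
    if not (isinstance(sender, str) and sender and isinstance(text, str) and text):
        return 400, {"error": "Invalid payload"}

    # The real service would now hand (sender, text) to the LLM and reply asynchronously.
    return 200, {"status": "ok"}
```

Authentication is checked before the body is inspected, so a request with a missing key gets 401 even if its payload is also malformed.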

WAHA Incoming Webhook Formats

This section describes the JSON payloads that Autonomo AI's backend expects to receive from the webhook of a configured WAHA instance, one for each type of WhatsApp event. The /webhook/whatsapp endpoint parses these payloads and may convert them to its internal message format before processing.

Text Message Event (event: "message")

Triggered when a user sends a standard text message to the WhatsApp number linked with WAHA.

Example Webhook Payload:

{
  "event": "message",
  "session": "default",
  "payload": {
    "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "timestamp": 1667561485,
    "from": "11111111111@c.us",
    "fromMe": false,
    "to": "11111111111@c.us",
    "body": "Hi there!",
    "hasMedia": false,
    "ack": 1,
    "vCards": [],
    "_data": {
      "//": "Internal WAHA data"
    }
  }
}

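Payload Fields:

  • event (String): Event type; "message" for incoming messages.
  • session (String): Name of the WAHA session that received the message.
  • payload.id (String): Unique ID of the message.
  • payload.timestamp (Number): Unix timestamp, in seconds, when the message was sent.
  • payload.from (String): Chat ID of the sender (e.g., "11111111111@c.us").
  • payload.fromMe (Boolean): true if the message was sent by the WAHA-linked account itself.
  • payload.to (String): Chat ID of the recipient.
  • payload.body (String): Text content of the message.
  • payload.hasMedia (Boolean): Whether the message carries media; false for plain text messages.
  • payload.ack (Number): Delivery/read acknowledgment status.
  • payload.vCards (Array): Contact cards attached to the message, if any.
  • payload._data (Object): Internal WAHA data.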
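Because /webhook/whatsapp works with the simplified message.from/message.text shape, a WAHA text event has to be mapped onto it. The sketch below shows one way to do that. The function name is hypothetical, and so is the mapping from WAHA chat IDs (11111111111@c.us) to the whatsapp:+<number> sender format, which is inferred only from the two examples in this document. Messages with fromMe set to true are skipped so the bot never replies to itself.

```python
from __future__ import annotations

from typing import Optional


def waha_to_internal(event: dict) -> Optional[dict]:
    """Convert a WAHA "message" webhook into the internal {"message": {...}} shape.

    Returns None for events this sketch does not handle.
    """
    if event.get("event") != "message":
        return None
    payload = event.get("payload") or {}
    # Skip messages sent by the linked account itself to avoid reply loops.
    if payload.get("fromMe"):
        return None
    chat_id = payload.get("from") or ""
    text = payload.get("body") or ""
    # Only one-to-one chats ("<digits>@c.us") with non-empty text are mapped here.
    if not chat_id.endswith("@c.us") or not text:
        return None
    # ASSUMPTION: the internal sender format is "whatsapp:+<digits>".
    number = chat_id[: -len("@c.us")]
    return {"message": {"from": f"whatsapp:+{number}", "text": text}}
```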

Message Reaction Event (event: "message.reaction")

Triggered when a user adds, removes, or changes an emoji reaction to a message.

Example Webhook Payload:

{
    "event": "message.reaction",
    "session": "default",
    "me": {
        "id": "79222222222@c.us",
        "pushName": "WAHA"
    },
    "payload": {
        "id": "false_79111111@c.us_11111111111111111111111111111111",
        "from": "79111111@c.us",
        "fromMe": false,
        "participant": "79111111@c.us",
        "to": "79111111@c.us",
        "timestamp": 1710481111.853,
        "reaction": {
            "text": "🙏",
            "messageId": "true_79111111@c.us_11111111111111111111111111111111"
        }
    },
    "engine": "WEBJS",
    "environment": {
        "version": "2024.3.3",
        "engine": "WEBJS",
        "tier": "PLUS",
        "browser": "/usr/bin/google-chrome-stable"
    }
}

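Payload Fields:

  • me (Object): The WAHA-linked account (id and pushName).
  • payload.id (String): ID of the reaction event.
  • payload.from (String): Chat ID of the user who sent the reaction.
  • payload.fromMe (Boolean): true if the reaction was made by the WAHA-linked account.
  • payload.participant (String): ID of the participant who reacted; relevant in group chats.
  • payload.to (String): Chat ID of the recipient.
  • payload.timestamp (Number): Unix timestamp of the reaction; may include fractional seconds.
  • payload.reaction.text (String): The reaction emoji; an empty string indicates that the reaction was removed.
  • payload.reaction.messageId (String): ID of the message that received the reaction.
  • engine, environment (String, Object): The WAHA engine and runtime environment that produced the event.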
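A handler for reaction events mostly needs to know which message was reacted to and whether the reaction was added or removed. This sketch assumes that WAHA sends an empty reaction.text when a reaction is removed; the `ReactionChange` type and `parse_reaction` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionChange:
    chat_id: str      # who reacted (payload.from)
    message_id: str   # the message that was reacted to (payload.reaction.messageId)
    emoji: str        # empty string when the reaction was removed
    removed: bool


def parse_reaction(event: dict) -> ReactionChange | None:
    """Extract a reaction change from a WAHA "message.reaction" webhook."""
    if event.get("event") != "message.reaction":
        return None
    payload = event.get("payload") or {}
    reaction = payload.get("reaction") or {}
    emoji = reaction.get("text") or ""
    return ReactionChange(
        chat_id=payload.get("from") or "",
        message_id=reaction.get("messageId") or "",
        emoji=emoji,
        # ASSUMPTION: an empty reaction.text means the reaction was removed.
        removed=emoji == "",
    )
```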

Media Message Event (event: "message")

Triggered when a user sends any type of media (image, video, audio, document) to the WhatsApp number linked with WAHA.

Example Webhook Payload:

{
  "event": "message",
  "session": "default",
  "payload": {
    "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "timestamp": 1667561485,
    "body": "Check this out (caption for the media)!",
    "from": "11111111111@c.us",
    "hasMedia": true,
    "media": {
      "url": "http://localhost:3000/api/files/true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.jpg",
      "mimetype": "image/jpeg",
      "filename": null,
      "error": null
    }
  }
}

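Payload Fields:

  • payload.id, payload.timestamp, payload.from: Message ID, Unix timestamp (seconds), and sender chat ID.
  • payload.body (String): Caption attached to the media, if any.
  • payload.hasMedia (Boolean): true for media messages.
  • payload.media.url (String): URL from which the media file can be downloaded from WAHA.
  • payload.media.mimetype (String): MIME type of the file (e.g., "image/jpeg").
  • payload.media.filename (String or null): Original filename, when available (typically for documents).
  • payload.media.error (Object or null): Error details if WAHA could not retrieve the media; otherwise null.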
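Before downloading a media file, it is worth checking that the event actually carries usable media. The helper below is a hypothetical sketch: it treats a non-null media.error as "file unavailable" (an assumption) and falls back to application/octet-stream when no MIME type is given.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class MediaInfo(NamedTuple):
    url: str
    mimetype: str
    caption: str
    filename: Optional[str]


def extract_media(event: dict) -> Optional[MediaInfo]:
    """Return media details from a WAHA "message" webhook, or None if unusable."""
    if event.get("event") != "message":
        return None
    payload = event.get("payload") or {}
    media = payload.get("media") or {}
    # ASSUMPTION: a non-null media.error means the file cannot be fetched.
    if not payload.get("hasMedia") or media.get("error") or not media.get("url"):
        return None
    return MediaInfo(
        url=media["url"],
        mimetype=media.get("mimetype") or "application/octet-stream",
        caption=payload.get("body") or "",
        filename=media.get("filename"),
    )
```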

Sending Messages via WAHA API (Outgoing)

To send messages and manage chat presence on WhatsApp, Autonomo AI uses the WAHA API. This section outlines the recommended process and the specific endpoints that keep interactions human-like and reduce the risk of being flagged as spam by WhatsApp.

Important Guidelines: How to Process Messages

When your bot processes messages, follow these steps to avoid being flagged as spam:

  1. Send a "seen" receipt before processing the message, using a POST /api/sendSeen request to the WAHA API.
  2. Start typing before sending a reply, then wait for a random interval that scales with the length of the message. Use a POST /api/startTyping request.
  3. Stop typing just before sending the message, using a POST /api/stopTyping request.
  4. Send the text message using a POST /api/sendText request.

Following these steps keeps your bot's behavior closer to that of a human user, in line with WhatsApp's guidelines, and reduces the risk of being blocked.
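The four-step process can be sketched as a single function. To keep it independent of any HTTP library, the caller supplies a `post(path, body)` callable. The function name `reply_like_a_human`, the injected `sleep`, and the typing-delay formula (about 0.05 s per character plus random jitter, capped at 8 s) are illustrative assumptions, not WhatsApp or WAHA requirements.

```python
from __future__ import annotations

import random
import time
from typing import Callable

# post(path, json_body) -> response; supplied by the caller (e.g. an HTTP client wrapper).
PostFn = Callable[[str, dict], object]


def reply_like_a_human(
    post: PostFn,
    session: str,
    chat_id: str,
    text: str,
    sleep: Callable[[float], None] = time.sleep,
    rng: random.Random | None = None,
) -> None:
    """Run the seen -> typing -> wait -> stop typing -> send sequence against WAHA."""
    rng = rng or random.Random()
    base = {"session": session, "chatId": chat_id}
    post("/api/sendSeen", base)                               # 1. mark the chat as read
    post("/api/startTyping", {**base, "presence": "typing"})  # 2. show "typing..."
    # ASSUMPTION: ~0.05 s per character plus 0-1.5 s jitter, capped at 8 s.
    delay = min(8.0, 0.5 + 0.05 * len(text) + rng.uniform(0.0, 1.5))
    sleep(delay)
    post("/api/stopTyping", {**base, "presence": "paused"})   # 3. stop typing
    post("/api/sendText", {**base, "message": text})          # 4. send the reply
```

Injecting `post` and `sleep` keeps the sequence easy to test without a live WAHA instance.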

Send Seen

POST /api/sendSeen

Sends a "seen" (read) receipt for all unread messages in the given chat. This mimics human behavior and helps avoid being blocked.

Request Body (JSON):

{
  "session": "default",
  "chatId": "11111111111@c.us"
}

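Parameters (in JSON body):

  • session (String): Name of the WAHA session (e.g., "default"). Required
  • chatId (String): ID of the chat to mark as seen (e.g., "11111111111@c.us"). Required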

Responses:

200 OK: Request to send seen receipt was accepted.

{
  "success": true,
  "message": "Seen receipt sent"
}

400 Bad Request: Invalid chatId or missing parameters.

{
  "success": false,
  "message": "Invalid chatId provided."
}
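Each WAHA call in this section is a JSON POST. A standard-library sketch for building such a request follows. The base URL http://localhost:3000 is an assumption taken from the media URL in the example webhook payload, and the optional X-Api-Key header only applies if your WAHA instance is configured to require a key.

```python
from __future__ import annotations

import json
import urllib.request

# ASSUMPTION: WAHA is reachable at this base URL.
WAHA_BASE_URL = "http://localhost:3000"


def build_waha_request(path: str, body: dict, api_key: str | None = None) -> urllib.request.Request:
    """Build a JSON POST request for a WAHA endpoint such as /api/sendSeen."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed if the WAHA instance is configured to require an API key.
        headers["X-Api-Key"] = api_key
    return urllib.request.Request(
        WAHA_BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_waha_request("/api/sendSeen", {"session": "default", "chatId": "11111111111@c.us"})
# urllib.request.urlopen(req) would send it; that network call is not made here.
```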

Start Typing

POST /api/startTyping

Sets the chat presence to "typing", indicating that the bot is preparing a response. This makes interactions feel more natural.

Request Body (JSON):

{
  "session": "default",
  "chatId": "111111111@c.us",
  "presence": "typing"
}

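Parameters (in JSON body):

  • session (String): Name of the WAHA session (e.g., "default"). Required
  • chatId (String): ID of the chat in which to show the typing indicator. Required
  • presence (String): Presence state to set; "typing". Required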

Responses:

200 OK: Typing presence initiated.

{
  "success": true,
  "message": "Typing presence set to 'typing'"
}

Stop Typing

POST /api/stopTyping

Changes the chat presence from "typing" to "paused". This is typically done just before sending the actual message.

Request Body (JSON):

{
  "session": "default",
  "chatId": "111111111@c.us",
  "presence": "paused"
}

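Parameters (in JSON body):

  • session (String): Name of the WAHA session (e.g., "default"). Required
  • chatId (String): ID of the chat in which to stop the typing indicator. Required
  • presence (String): Presence state to set; "paused". Required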

Responses:

200 OK: Typing presence stopped.

{
  "success": true,
  "message": "Typing presence set to 'paused'"
}

Send Text Message

POST /api/sendText

Sends a standard text message to a specified WhatsApp chat.

Request Body (JSON):

{
  "session": "default",
  "chatId": "11111111111@c.us",
  "message": "This is your reply from Autonomo AI!"
}

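Parameters (in JSON body):

  • session (String): Name of the WAHA session (e.g., "default"). Required
  • chatId (String): ID of the chat to send the message to (e.g., "11111111111@c.us"). Required
  • message (String): Text of the message to send. Required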

Responses:

200 OK: Message successfully queued for sending.

{
  "success": true,
  "message": "Message sent successfully",
  "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
}

400 Bad Request: Invalid chatId or missing message content.

{
  "success": false,
  "message": "Invalid message or chatId provided."
}

500 Internal Server Error: An error occurred on the WAHA server.

{
  "success": false,
  "message": "Failed to send message due to an internal server error."
}
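A caller usually wants the message ID on success and a clear distinction between client errors (400, fix the request) and server errors (500, possibly retry). The parser below sketches that. The exception class, the `retryable` flag, and the retry-on-5xx policy are hypothetical design choices, not part of the WAHA API.

```python
from __future__ import annotations

import json


class WahaSendError(Exception):
    """Raised when WAHA rejects or fails to send a message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        # HYPOTHETICAL policy: server-side (5xx) failures may succeed on retry.
        self.retryable = status >= 500


def parse_send_text_response(status: int, raw_body: bytes) -> str:
    """Return the sent message ID, or raise WahaSendError on any failure."""
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status == 200 and data.get("success") and data.get("id"):
        return data["id"]
    raise WahaSendError(status, data.get("message") or "Unknown error")
```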

Autonomo Concierge

Realtime Bidirectional Voice Stream

POST /voice/stream

Streams the user's microphone audio to the AI and returns synthesized speech in real time. The stream is implemented over HTTP/2 streaming or WebSocket.

Operation ID: streamVoice

Security: Requires X-Api-Key header.

Request Body:

Accepts raw audio data as WAV, WebM, or generic binary. Set the Content-Type header to one of the following:

Content-Type: audio/wav
Content-Type: audio/webm
Content-Type: application/octet-stream

(Binary Audio Data)

Request Headers:

  • X-Api-Key (String): Your API key. Required
  • X-Device-ID (String): Device identifier. Optional

Responses:

200 OK: Streamed AI voice response (audio or interleaved metadata).

Content Types:

  • audio/mpeg (Binary): Synthesized speech in MP3 format.
  • audio/opus (Binary): Synthesized speech in Opus format.
  • application/json (Object): Metadata about the AI response.
    {
      "type": "metadata",
      "conversation_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
      "timestamp": 1678886400,
      "ai_text": "How can I help you today?",
      "emotion": "neutral",
      "model": "gpt-4"
    }

400 Bad Request: Invalid audio format.

{
  "error": "Invalid audio format",
  "message": "The provided audio data could not be processed."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}

500 Internal Server Error: An internal server error occurred.

{
  "error": "Internal Server Error",
  "message": "An unexpected error occurred on the server."
}
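For the streaming endpoint, the client needs to pick one of the documented content types and send audio in small pieces as it is captured. The helpers below are a hypothetical sketch; the default chunk size of 3200 bytes (100 ms of 16 kHz, 16-bit mono PCM) is an assumption, since this endpoint does not specify a sample format.

```python
from __future__ import annotations

from pathlib import PurePath
from typing import Iterator

# Content types accepted by POST /voice/stream, keyed by file extension.
_CONTENT_TYPES = {".wav": "audio/wav", ".webm": "audio/webm"}


def content_type_for(filename: str) -> str:
    """Pick a documented Content-Type; unknown formats fall back to octet-stream."""
    return _CONTENT_TYPES.get(PurePath(filename).suffix.lower(), "application/octet-stream")


def iter_audio_chunks(audio: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield fixed-size chunks so audio can be sent as it is captured.

    ASSUMPTION: 3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM; tune for your codec.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

With the standard library, passing `iter_audio_chunks(...)` as the `body` of `http.client.HTTPSConnection.request` without a Content-Length header makes Python send it with HTTP/1.1 chunked transfer encoding. That is neither HTTP/2 nor WebSocket, so whether /voice/stream accepts it is an assumption to verify against your deployment.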

Upload Recorded Voice Clip

POST /voice

Non-streaming fallback endpoint: the client uploads a complete audio clip and receives an AI-generated speech reply.

Operation ID: uploadVoice

Security: Requires X-Api-Key header.

Request Body:

Uses multipart/form-data to upload the audio file and associated metadata.

--boundary
Content-Disposition: form-data; name="voice"; filename="recording.wav"
Content-Type: audio/wav

(Binary Audio Data)
--boundary
Content-Disposition: form-data; name="lang"

en-US
--boundary
Content-Disposition: form-data; name="user"

a1b2c3d4-e5f6-7890-1234-567890abcdef
--boundary--

Request Headers:

  • X-Api-Key (String): Your API key. Required

Parameters (in multipart/form-data):

  • voice (File): Audio file (WAV/WEBM/OPUS, 16 kHz mono). Required
  • lang (String): Language code (e.g., "en-US"). Optional; defaults to the system default language.
  • user (String): UUID of the user making the request. Optional

Responses:

200 OK: Binary audio (MP3/Opus) representing the AI's speech reply.

Content-Type: audio/mpeg

(Binary MP3 Audio Data)

400 Bad Request: Invalid input (e.g., wrong file format, missing fields).

{
  "error": "Bad Request",
  "message": "Invalid file format or missing required parameters."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}
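The multipart body shown in the request example for this endpoint can be produced with the standard library alone. This is a minimal sketch; the function name `build_voice_upload` is hypothetical, and the boundary is a random hex string so it is very unlikely to occur inside the audio data.

```python
from __future__ import annotations

import uuid


def build_voice_upload(
    audio: bytes,
    filename: str = "recording.wav",
    audio_type: str = "audio/wav",
    lang: str | None = None,
    user: str | None = None,
) -> tuple[bytes, str]:
    """Encode the multipart/form-data body for POST /voice.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex  # random, so it will not plausibly occur in the audio
    parts = [
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="voice"; filename="{filename}"\r\n'
            f"Content-Type: {audio_type}\r\n\r\n"
        ).encode("utf-8")
        + audio
        + b"\r\n"
    ]
    # Optional text fields are only included when provided.
    for name, value in (("lang", lang), ("user", user)):
        if value is not None:
            parts.append(
                (
                    f"--{boundary}\r\n"
                    f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                    f"{value}\r\n"
                ).encode("utf-8")
            )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type goes in the Content-Type header, alongside X-Api-Key, of the POST to /voice.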

Text Chat Fallback

POST /ai/chat

Text-based alternative for debugging or non-voice clients.

Operation ID: chatAI

Security: Requires X-Api-Key header.

Request Body (JSON):

{
  "session": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "text": "What is the weather like today?"
}

Request Headers:

  • X-Api-Key (String): Your API key. Required

Parameters (in JSON body):

  • session (String): Unique identifier for the conversation session (UUID). Required
  • text (String): The user's text message. Required

Responses:

200 OK: AI text reply and optional TTS link.

{
  "reply": "The weather today is sunny with a high of 75 degrees Fahrenheit.",
  "tts_url": "https://cdn.autonomo.codes/tts/reply_12345.mp3"
}

400 Bad Request: Invalid input (e.g., missing required fields).

{
  "error": "Invalid Input",
  "message": "Missing required fields 'session' or 'text'."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}
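A non-voice client can call /ai/chat with nothing but the standard library. In this sketch the base URL https://api.example.com is a placeholder (the API host is not documented here), and `build_chat_request`, `parse_chat_reply`, and `ChatReply` are hypothetical names. Because tts_url is optional, a missing key is returned as None.

```python
from __future__ import annotations

import json
import urllib.request
from typing import NamedTuple, Optional

# HYPOTHETICAL base URL; replace with your actual Autonomo AI API host.
API_BASE_URL = "https://api.example.com"


class ChatReply(NamedTuple):
    reply: str
    tts_url: Optional[str]  # present only when a TTS rendering is available


def build_chat_request(api_key: str, session: str, text: str) -> urllib.request.Request:
    """Build a POST /ai/chat request with the required X-Api-Key header."""
    return urllib.request.Request(
        API_BASE_URL + "/ai/chat",
        data=json.dumps({"session": session, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def parse_chat_reply(raw_body: bytes) -> ChatReply:
    """Parse a 200 OK body; a missing tts_url becomes None."""
    data = json.loads(raw_body)
    return ChatReply(reply=data["reply"], tts_url=data.get("tts_url"))
```

A new conversation can use a fresh `str(uuid.uuid4())` as its session value and reuse it for follow-up messages.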