Autonomo AI API Documentation

Welcome to the API documentation for Autonomo AI's WhatsApp API.

Autonomo AI's Incoming WhatsApp Webhook

POST /webhook/whatsapp

This endpoint receives incoming WhatsApp messages, possibly after a gateway such as WAHA has parsed or adapted them. It forwards the message content to an LLM service for processing and sends the generated reply back to the user via WhatsApp.

Security: This endpoint requires an X-API-Key header carrying a valid API key. The key is internal to Autonomo AI's system.

Request Body (JSON):

The webhook expects a JSON payload containing message details. The critical fields are message.from (sender's ID) and message.text (message content). This format is often a simplified representation after initial parsing from a WhatsApp gateway like WAHA.

{
  "message": {
    "from": "whatsapp:+1234567890",
    "text": "Hello, can you help me with something?"
  },
  "//": "Other fields like 'type', 'timestamp', etc., might be present but are not processed by this webhook if not explicitly documented."
}

Request Headers:

X-API-Key (String): Your secret API key for Autonomo AI. Required

Parameters (in JSON body):

message (Object): The main object containing message details. Required
message.from (String): The sender's unique identifier (e.g., a WhatsApp phone number prefixed with 'whatsapp:+'). Required
message.text (String): The content of the message sent by the user. Required
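
The required-field rules above can be checked before any further processing. The following is a minimal sketch, not Autonomo AI's actual validation code: the function name validate_webhook_payload is hypothetical, and rejecting empty strings is an assumption that the field list does not state.

```python
from typing import Any, Optional


def validate_webhook_payload(body: Any) -> Optional[str]:
    """Return None when the required fields are present, else a short reason.

    Hypothetical helper; rejecting empty strings is an assumption.
    """
    if not isinstance(body, dict):
        return "body must be a JSON object"
    message = body.get("message")
    if not isinstance(message, dict):
        return "missing 'message' object"
    # message.from and message.text are the two documented required fields.
    for field in ("from", "text"):
        value = message.get(field)
        if not isinstance(value, str) or not value:
            return f"missing or empty 'message.{field}'"
    return None
```

A non-None result would correspond to the 400 Bad Request case; extra keys such as "type" or "timestamp" are ignored rather than rejected.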

Responses:

200 OK: The webhook successfully received and authenticated the message. The reply will be sent asynchronously.

{
  "status": "ok"
}

400 Bad Request: The payload was invalid or missing required fields.

{
  "error": "Invalid payload"
}

401 Unauthorized: The X-API-Key header was missing or invalid.

{
  "success": false,
  "message": "Unauthorized: Invalid or missing API Key."
}
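
For testing the endpoint from a script, a request can be assembled with the Python standard library. This is a sketch under assumptions: the base URL is a placeholder, and build_webhook_request is a hypothetical helper rather than part of any Autonomo AI client.

```python
import json
import urllib.request


def build_webhook_request(base_url: str, api_key: str,
                          sender: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /webhook/whatsapp request."""
    payload = {"message": {"from": sender, "text": text}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/webhook/whatsapp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # header names are case-insensitive on the wire
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it. A 400 or 401 response is raised as urllib.error.HTTPError, whose read() method returns the JSON error body shown above.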

WAHA Incoming Webhook Formats

This section describes the JSON payloads that Autonomo AI's backend expects to receive from a configured WAHA instance's webhook for different types of WhatsApp events. Autonomo AI's /webhook/whatsapp endpoint parses these payloads and may convert them to its internal message format for processing.

Text Message Event (event: "message")

Triggered when a user sends a standard text message to the WhatsApp number linked with WAHA.

Example Webhook Payload:

{
  "event": "message",
  "session": "default",
  "payload": {
    "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "timestamp": 1667561485,
    "from": "11111111111@c.us",
    "fromMe": false,
    "to": "11111111111@c.us",
    "body": "Hi there!",
    "hasMedia": false,
    "ack": 1,
    "vCards": [],
    "_data": {
      "//": "Internal WAHA data"
    }
  }
}

Payload Fields:

event (String): Type of event, typically "message". Required
session (String): The WAHA session name, e.g., "default". Required
payload (Object): Contains the message details. Required
payload.id (String): Unique ID of the message. Required
payload.timestamp (Number): Unix timestamp of when the message was sent. Required
payload.from (String): Sender's WhatsApp ID (e.g., "11111111111@c.us"). Required
payload.fromMe (Boolean): Indicates if the message was sent by the WAHA instance itself. Required
payload.to (String): Recipient's WhatsApp ID. Required
payload.body (String): The content of the text message. Required
payload.hasMedia (Boolean): false for text messages. Required
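
One way to convert a WAHA text event into the internal message format is sketched below. The mapping from an "@c.us" ID to a "whatsapp:+" sender is an assumption inferred from the two examples in this document, as is the choice to skip messages sent by the WAHA instance itself. The function name waha_text_to_internal is hypothetical.

```python
from typing import Any, Dict, Optional

C_US_SUFFIX = "@c.us"


def waha_text_to_internal(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a WAHA text event to {"message": {"from", "text"}}, or None to skip."""
    if event.get("event") != "message":
        return None
    payload = event.get("payload") or {}
    # Assumption: ignore our own outgoing messages and anything carrying media.
    if payload.get("fromMe") or payload.get("hasMedia"):
        return None
    sender = payload.get("from")
    body = payload.get("body")
    if not isinstance(sender, str) or not sender.endswith(C_US_SUFFIX):
        return None
    if not isinstance(body, str):
        return None
    number = sender[: -len(C_US_SUFFIX)]
    # Assumption: the internal sender ID is "whatsapp:+" followed by the number.
    return {"message": {"from": f"whatsapp:+{number}", "text": body}}
```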

Message Reaction Event (event: "message.reaction")

Triggered when a user adds, removes, or changes an emoji reaction to a message.

Example Webhook Payload:

{
    "event": "message.reaction",
    "session": "default",
    "me": {
        "id": "79222222222@c.us",
        "pushName": "WAHA"
    },
    "payload": {
        "id": "false_79111111@c.us_11111111111111111111111111111111",
        "from": "79111111@c.us",
        "fromMe": false,
        "participant": "79111111@c.us",
        "to": "79111111@c.us",
        "timestamp": 1710481111.853,
        "reaction": {
            "text": "🙏",
            "messageId": "true_79111111@c.us_11111111111111111111111111111111"
        }
    },
    "engine": "WEBJS",
    "environment": {
        "version": "2024.3.3",
        "engine": "WEBJS",
        "tier": "PLUS",
        "browser": "/usr/bin/google-chrome-stable"
    }
}

Payload Fields:

event (String): Type of event, "message.reaction". Required
session (String): The WAHA session name. Required
payload (Object): Contains reaction details. Required
payload.id (String): ID of the reaction event. Required
payload.from (String): WhatsApp ID of the user who reacted. Required
payload.reaction (Object): Details of the reaction. Required
payload.reaction.text (String): The emoji character for the reaction (e.g., "🙏"). Required
payload.reaction.messageId (String): The ID of the message to which the reaction was applied. Required
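
The documented reaction fields can be pulled into a small record, as in the sketch below. The Reaction class and parse_reaction function are hypothetical. Treating an empty reaction.text as a removed reaction is an assumption that should be checked against the WAHA documentation for the engine in use.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Reaction:
    sender: str             # payload.from
    emoji: str              # payload.reaction.text
    target_message_id: str  # payload.reaction.messageId
    removed: bool           # assumption: empty emoji text means "removed"


def parse_reaction(event: Dict[str, Any]) -> Optional[Reaction]:
    """Return a Reaction for a well-formed message.reaction event, else None."""
    if event.get("event") != "message.reaction":
        return None
    payload = event.get("payload") or {}
    reaction = payload.get("reaction") or {}
    sender = payload.get("from")
    emoji = reaction.get("text")
    target = reaction.get("messageId")
    if not all(isinstance(v, str) for v in (sender, emoji, target)):
        return None
    return Reaction(sender, emoji, target, removed=(emoji == ""))
```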

Media Message Event (event: "message")

Triggered when a user sends any type of media (image, video, audio, document) to the WhatsApp number linked with WAHA.

Example Webhook Payload:

{
  "event": "message",
  "session": "default",
  "payload": {
    "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "timestamp": 1667561485,
    "body": "Check this out (caption for the media)!",
    "from": "11111111111@c.us",
    "hasMedia": true,
    "media": {
      "url": "http://localhost:3000/api/files/true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.jpg",
      "mimetype": "image/jpeg",
      "filename": null,
      "error": null
    }
  }
}

Payload Fields:

event (String): Type of event, "message". Required
session (String): The WAHA session name. Required
payload (Object): Contains the media message details. Required
payload.id (String): Unique ID of the message. Required
payload.timestamp (Number): Unix timestamp of when the message was sent. Required
payload.from (String): Sender's WhatsApp ID. Required
payload.body (String): The caption associated with the media, if any. Optional
payload.hasMedia (Boolean): true for media messages. Required
payload.media (Object): Details about the media file. Required
payload.media.url (String): URL to download the media file. Required
payload.media.mimetype (String): MIME type of the media (e.g., "image/jpeg", "video/mp4"). Required
payload.media.filename (String): Original filename of the media, if available. Optional
payload.media.error (String): Error message if media download failed. Optional
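
Because text and media messages share the "message" event type, a handler has to branch on payload.hasMedia. The sketch below summarizes a media event. The function name describe_media, the "document" catch-all category, and the rule that a file is downloadable only when a URL is present and media.error is null are all assumptions.

```python
from typing import Any, Dict, Optional


def describe_media(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Summarize a media message, or return None for non-media events."""
    payload = event.get("payload") or {}
    if event.get("event") != "message" or not payload.get("hasMedia"):
        return None
    media = payload.get("media") or {}
    mimetype = media.get("mimetype") or ""
    top_level = mimetype.split("/", 1)[0]
    # Assumption: anything that is not image/video/audio is handled as a document.
    kind = top_level if top_level in ("image", "video", "audio") else "document"
    url = media.get("url")
    return {
        "kind": kind,
        "url": url,
        "caption": payload.get("body"),  # optional caption
        "filename": media.get("filename"),
        "downloadable": bool(url) and media.get("error") is None,
    }
```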

Sending Messages via WAHA API (Outgoing)

To send messages and manage chat presence on WhatsApp, Autonomo AI leverages the WAHA API. This section outlines the recommended process and specific API endpoints to ensure compliant and human-like interaction and to avoid being flagged as spam by WhatsApp.

Important Guidelines: How to Process Messages

To reduce the risk of being flagged as spam, process each incoming message in the following order:

Send a seen receipt before processing the message, using a POST /api/sendSeen request to the WAHA API.
Start typing before sending the reply, using a POST /api/startTyping request, then wait for a random interval that scales with the length of the reply.
Stop typing just before sending the reply, using a POST /api/stopTyping request.
Send the text message using a POST /api/sendText request.

By following these steps, you can ensure that your bot processes messages in a way that’s compliant with WhatsApp’s guidelines and reduces the risk of being blocked.
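
The four steps can be sketched as a single function. This is an illustration under assumptions: the post callable stands in for whatever HTTP client is used, the delay constants (0.05 s per character, jitter of plus or minus 30%, bounds of 0.5 s and 8 s) are invented for the example, and the request bodies follow the field names documented in the sections below.

```python
import random
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: post(path, json_body) performs the HTTP POST to WAHA.
Post = Callable[[str, Dict[str, Any]], Any]


def typing_delay(text: str, rng: random.Random, per_char: float = 0.05,
                 minimum: float = 0.5, cap: float = 8.0) -> float:
    """Random delay in seconds that grows with message length (assumed constants)."""
    seconds = len(text) * per_char * rng.uniform(0.7, 1.3)
    return max(minimum, min(cap, seconds))


def reply_like_a_human(post: Post, session: str, chat_id: str, text: str,
                       sleep: Callable[[float], Any] = time.sleep,
                       rng: Optional[random.Random] = None) -> None:
    """Run sendSeen, startTyping, wait, stopTyping, sendText in order."""
    if rng is None:
        rng = random.Random()
    chat = {"session": session, "chatId": chat_id}
    post("/api/sendSeen", dict(chat))
    post("/api/startTyping", {**chat, "presence": "typing"})
    sleep(typing_delay(text, rng))
    post("/api/stopTyping", {**chat, "presence": "paused"})
    post("/api/sendText", {**chat, "message": text})
```

Injecting post and sleep keeps the sequence testable without a running WAHA instance; in production, post would wrap an HTTP client pointed at the WAHA base URL.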

Send Seen

POST /api/sendSeen

Sends a "seen" (read) receipt for all unread messages in a given chat. This helps mimic human behavior and reduces the risk of being blocked.

Request Body (JSON):

{
  "session": "default",
  "chatId": "11111111111@c.us"
}

Parameters (in JSON body):

session (String): The WAHA session name, typically "default". Required
chatId (String): The WhatsApp ID of the chat (e.g., "11111111111@c.us"). Required

Responses:

200 OK: Request to send seen receipt was accepted.

{
  "success": true,
  "message": "Seen receipt sent"
}

400 Bad Request: Invalid chatId or missing parameters.

{
  "success": false,
  "message": "Invalid chatId provided."
}

Start Typing

POST /api/startTyping

Sets the chat presence to "typing", indicating that the bot is preparing a response. This makes interactions feel more natural.

Request Body (JSON):

{
  "session": "default",
  "chatId": "111111111@c.us",
  "presence": "typing"
}

Parameters (in JSON body):

session (String): The WAHA session name. Required
chatId (String): The WhatsApp ID of the chat. Required
presence (String): Must be "typing". Required

Responses:

200 OK: Typing presence initiated.

{
  "success": true,
  "message": "Typing presence set to 'typing'"
}

Stop Typing

POST /api/stopTyping

Resets the chat presence from "typing" to "paused", typically just before sending the actual message.

Request Body (JSON):

{
  "session": "default",
  "chatId": "111111111@c.us",
  "presence": "paused"
}

Parameters (in JSON body):

session (String): The WAHA session name. Required
chatId (String): The WhatsApp ID of the chat. Required
presence (String): Must be "paused". Required

Responses:

200 OK: Typing presence stopped.

{
  "success": true,
  "message": "Typing presence set to 'paused'"
}

Send Text Message

POST /api/sendText

Sends a standard text message to a specified WhatsApp chat.

Request Body (JSON):

{
  "session": "default",
  "chatId": "11111111111@c.us",
  "message": "This is your reply from Autonomo AI!"
}

Parameters (in JSON body):

session (String): The WAHA session name. Required
chatId (String): The WhatsApp ID of the recipient chat. Required
message (String): The text content of the message to send. Required

Responses:

200 OK: Message successfully queued for sending.

{
  "success": true,
  "message": "Message sent successfully",
  "id": "true_11111111111@c.us_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
}

400 Bad Request: Invalid chatId or missing message content.

{
  "success": false,
  "message": "Invalid message or chatId provided."
}

500 Internal Server Error: An error occurred on the WAHA server.

{
  "success": false,
  "message": "Failed to send message due to an internal server error."
}
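
A caller has to decide what to do with each of the three documented outcomes. The sketch below maps a status code and response body to an action. Retrying on a 5xx status is an assumed policy, not something this document prescribes, and interpret_send_text is a hypothetical name.

```python
from typing import Any, Dict, Optional, Tuple


def interpret_send_text(status: int, body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (outcome, detail), where outcome is 'sent', 'retry', or 'rejected'.

    detail is the message ID on success, otherwise the error message.
    """
    if status == 200 and body.get("success") is True:
        return "sent", body.get("id")
    if status >= 500:
        # Assumption: server-side failures are transient and worth retrying.
        return "retry", body.get("message")
    # 400 and any other client error: resending the same request will not help.
    return "rejected", body.get("message")
```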

Autonomo Concierge

Realtime Bidirectional Voice Stream

POST /voice/stream

Streams human microphone audio to the AI and returns synthesized audio in real time. Implemented via HTTP/2 streaming or WebSocket.

Operation ID: streamVoice

Security: Requires X-Api-Key header.

Request Body:

Accepts audio data in WAV, WEBM, or generic binary format.

// Example payload would be raw binary audio data
Content-Type: audio/wav
// or
Content-Type: audio/webm
// or
Content-Type: application/octet-stream

(Binary Audio Data)


Request Headers:

X-Api-Key (String): Your API key. Required
X-Device-ID (String): Device identifier. Optional

Responses:

200 OK: Streamed AI voice response (audio or interleaved metadata).

Content Types:

audio/mpeg (Binary): Synthesized speech in MP3 format.
audio/opus (Binary): Synthesized speech in Opus format.
application/json (Object): Metadata about the AI response.

{
  "type": "metadata",
  "conversation_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "timestamp": 1678886400,
  "ai_text": "How can I help you today?",
  "emotion": "neutral",
  "model": "gpt-4"
}

400 Bad Request: Invalid audio format.

{
  "error": "Invalid audio format",
  "message": "The provided audio data could not be processed."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}

500 Internal Server Error: An internal server error occurred.

{
  "error": "Internal Server Error",
  "message": "An unexpected error occurred on the server."
}
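
Since the 200 response can carry either audio or JSON metadata, a client needs to branch on the content type of each part it receives. The sketch below assumes the transport layer has already split the stream into parts and knows each part's content type; this document does not specify that framing, so decode_stream_part is purely illustrative.

```python
import json
from typing import Any, Dict, Tuple, Union

AUDIO_TYPES = {"audio/mpeg", "audio/opus"}


def decode_stream_part(content_type: str,
                       data: bytes) -> Tuple[str, Union[bytes, Dict[str, Any]]]:
    """Classify one received part as ('audio', bytes) or ('metadata', dict)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in AUDIO_TYPES:
        return "audio", data
    if media_type == "application/json":
        meta = json.loads(data.decode("utf-8"))
        if isinstance(meta, dict) and meta.get("type") == "metadata":
            return "metadata", meta
        raise ValueError("JSON part is not a metadata object")
    raise ValueError(f"unexpected content type: {media_type}")
```

Audio parts would typically be appended to a playback buffer, while metadata parts (for example ai_text) can update a transcript view.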



        
Upload Recorded Voice Clip

POST /voice

Non-streaming fallback endpoint: client uploads a full audio clip and receives an AI-generated speech reply.

Operation ID: uploadVoice

Security: Requires X-Api-Key header.

Request Body:

Uses multipart/form-data to upload the audio file and associated metadata.

--boundary
Content-Disposition: form-data; name="voice"; filename="recording.wav"
Content-Type: audio/wav

(Binary Audio Data)
--boundary
Content-Disposition: form-data; name="lang"

en-US
--boundary
Content-Disposition: form-data; name="user"

a1b2c3d4-e5f6-7890-1234-567890abcdef
--boundary--

Request Headers:

X-Api-Key (String): Your API key. Required

Parameters (in multipart/form-data):

voice (File): Audio file (WAV/WEBM/OPUS, 16 kHz mono). Required
lang (String): Language code (e.g., "en-US"). Optional, defaults to system default.
user (String): UUID of the user making the request. Optional

Responses:

200 OK: Binary audio (MP3/Opus) representing the AI's speech reply.

Content-Type: audio/mpeg

(Binary MP3 Audio Data)

400 Bad Request: Invalid input (e.g., wrong file format, missing fields).

{
  "error": "Bad Request",
  "message": "Invalid file format or missing required parameters."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}
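
The multipart body shown above can be assembled with the standard library alone. This is a sketch, assuming the three documented form fields; build_voice_upload is a hypothetical helper, and most HTTP client libraries can produce the same body automatically.

```python
import uuid
from typing import List, Optional, Tuple


def build_voice_upload(audio: bytes, filename: str = "recording.wav",
                       audio_type: str = "audio/wav",
                       lang: Optional[str] = None,
                       user: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, request body) for POST /voice."""
    boundary = uuid.uuid4().hex  # random, so a collision with the audio is unlikely
    parts: List[bytes] = []
    # Required file part: "voice".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="voice"; filename="{filename}"\r\n'
        f"Content-Type: {audio_type}\r\n\r\n"
    )
    parts.append(head.encode("utf-8") + audio + b"\r\n")
    # Optional text parts are emitted only when a value is given.
    for name, value in (("lang", lang), ("user", user)):
        if value is not None:
            field = (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            )
            parts.append(field.encode("utf-8"))
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The returned content type goes in the Content-Type request header next to X-Api-Key, and the bytes are sent as the request body.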
        

        
Text Chat Fallback

POST /ai/chat

Text-based alternative for debugging or non-voice clients.

Operation ID: chatAI

Security: Requires X-Api-Key header.

Request Body (JSON):

{
  "session": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "text": "What is the weather like today?"
}

Request Headers:

X-Api-Key (String): Your API key. Required

Parameters (in JSON body):

session (String): Unique identifier for the conversation session (UUID). Required
text (String): The user's text message. Required

Responses:

200 OK: AI text reply and optional TTS link.

{
  "reply": "The weather today is sunny with a high of 75 degrees Fahrenheit.",
  "tts_url": "https://cdn.autonomo.codes/tts/reply_12345.mp3"
}

400 Bad Request: Invalid input (e.g., missing required fields).

{
  "error": "Invalid Input",
  "message": "Missing required fields 'session' or 'text'."
}

401 Unauthorized: Authentication failed.

{
  "error": "Unauthorized",
  "message": "Invalid or missing API key."
}
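
A debugging client for this endpoint needs little more than a request builder and a reply parser. The sketch below uses only the standard library. The base URL is a placeholder, both function names are hypothetical, and tts_url is treated as optional because the 200 response describes the TTS link that way.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_chat_request(base_url: str, api_key: str,
                       session: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /ai/chat request."""
    body = json.dumps({"session": session, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ai/chat",
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def parse_chat_reply(raw: bytes) -> Tuple[str, Optional[str]]:
    """Return (reply text, TTS URL or None) from a 200 response body."""
    data = json.loads(raw.decode("utf-8"))
    return data["reply"], data.get("tts_url")
```

A new session identifier can be generated with str(uuid.uuid4()) and reused across requests that belong to the same conversation.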