Telegram Automation

Retry Failed Scheduled Telegram Messages

Build resilient delivery with exponential backoff, jitter, idempotency, deduplication, DLQ, and flood-control aware throttling. Keep campaigns reliable 24/7.

Retry architecture for Telegram scheduled messages

Why Scheduled Messages Fail

Transient API/Network Errors

  • 5xx upstream errors
  • DNS / TLS / timeout hiccups
  • 429 Too Many Requests (flood control)

Permanent Errors

  • 400: bad request (invalid chat id, malformed message)
  • 403: forbidden (bot blocked, kicked, or not an admin)
  • chat migrated to a supergroup (400 with migrate_to_chat_id) or deactivated

Classify errors: retry only transient ones. For permanent errors, mark the job as failed and notify.
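
A minimal sketch of that decision, using the status codes from the two lists above (anything unrecognized is treated as permanent by default):

```python
# Transient statuses worth retrying; everything else is permanent.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}

def is_retryable(status: int) -> bool:
    # Retry only transient statuses; unknown codes are treated as permanent.
    return status in TRANSIENT_STATUSES
```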

Core Retry Strategy (Production-Ready)

Exponential Backoff + Full Jitter

Start with a small delay (e.g., 1–2 s) and double it on each attempt, adding random jitter, up to a cap (e.g., 2–5 min). Jitter prevents many retries from firing in synchronized bursts.
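
Full jitter fits in a few lines; the base and cap below are the illustrative values from above:

```python
import random

def full_jitter_delay(attempt: int, base_ms: int = 1000, cap_ms: int = 120_000) -> int:
    # Delay before retry `attempt` (1-based): uniform in [0, min(cap, base * 2**(attempt-1))).
    ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
    return random.randrange(ceiling)
```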

Idempotency & Deduplication

Attach an idempotency key to each logical message. Store success records keyed by (chat_id, key) and ignore repeats to prevent duplicates.
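
A minimal in-memory sketch of that dedup store; in production this would be a database table with a unique constraint on (chat_id, key):

```python
class DedupStore:
    # Records successful sends keyed by (chat_id, idempotency key).
    def __init__(self):
        self._sent = set()

    def already_sent(self, chat_id, key) -> bool:
        return (chat_id, key) in self._sent

    def mark_sent(self, chat_id, key) -> None:
        self._sent.add((chat_id, key))
```

Check `already_sent` before every attempt and call `mark_sent` only after a confirmed success, so a retry of an already-delivered message becomes a no-op.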

Queues & Dead-Letter Queue (DLQ)

Put sends into a queue (ready → retrying → succeeded/failed). After N attempts or a time window T, move the job to a DLQ for manual review or later replay.
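
The DLQ decision itself is small; N and T below are illustrative values:

```python
MAX_ATTEMPTS = 7          # N: attempt budget per job
MAX_WINDOW_S = 6 * 3600   # T: illustrative 6-hour delivery window

def next_state(attempts: int, first_enqueued_at: float, now: float) -> str:
    # After a failed attempt: send the job back to "retrying" or into the "dlq".
    if attempts >= MAX_ATTEMPTS or (now - first_enqueued_at) >= MAX_WINDOW_S:
        return "dlq"
    return "retrying"
```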

Rate-Limit Aware Throttling

Maintain global and per-chat tokens. On 429, honor retry-after (or a safe window), then resume with backoff. Spread bursts with jitter.
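
One common way to implement those tokens is a token bucket. A minimal sketch, assuming you keep one instance per chat_id plus one global instance and send only when both grant a token:

```python
import time

class TokenBucket:
    # Refills `rate` tokens per second up to `capacity`.
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, then spend if enough tokens remain.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```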

Circuit Breaker

If upstream error rate spikes, open the circuit, pause sends, and gradually close with a half-open probe to protect capacity.
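
A minimal circuit-breaker sketch, assuming a consecutive-failure threshold and a fixed cooldown before the half-open probe (both thresholds are illustrative):

```python
class CircuitBreaker:
    # Opens after `threshold` consecutive failures; after `cooldown_s`
    # it half-opens, letting probe requests through again.
    def __init__(self, threshold: int = 5, cooldown_s: float = 30.0):
        self.threshold, self.cooldown_s = threshold, cooldown_s
        self.failures = 0
        self.opened_at = None  # None => circuit closed

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a probe once the cooldown has elapsed.
        return (now - self.opened_at) >= self.cooldown_s

    def record_success(self):
        self.failures, self.opened_at = 0, None

    def record_failure(self, now: float):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now
```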

Observability & Alerts

Emit metrics for attempts, success, failure, 429s, backoff time. Add structured logs with job id, chat_id, idempotency key, attempt.
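
A minimal sketch of per-attempt instrumentation; the field and outcome names are illustrative:

```python
import json
import time
from collections import Counter

metrics = Counter()  # attempts per outcome: success, retry_429, retry_5xx, permanent

def log_attempt(job_id: str, chat_id: int, key: str, attempt: int, outcome: str) -> str:
    # Bump a counter per outcome and emit one structured log line per attempt.
    metrics[outcome] += 1
    return json.dumps({
        "ts": time.time(),
        "job_id": job_id,
        "chat_id": chat_id,
        "idempotency_key": key,
        "attempt": attempt,
        "outcome": outcome,
    })
```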

Code Samples with Backoff & Idempotency

Node.js (fetch) – Retry on 5xx/429 with Jitter

import crypto from "node:crypto";

const BOT = process.env.BOT_TOKEN!;
const API = `https://api.telegram.org/bot${BOT}/sendMessage`;

function sleep(ms:number){ return new Promise(r=>setTimeout(r,ms)); }
function jitter(base:number){ return Math.floor(Math.random()*base); }

class PermanentError extends Error {}

async function sendWithRetry({chatId, text, key}:{chatId:number|string, text:string, key:string}) {
  const maxAttempts = 7;
  let attempt = 0;
  let delay = 1000; // 1s

  while (attempt < maxAttempts) {
    attempt++;
    const body = new URLSearchParams({
      chat_id: String(chatId),
      text,
      // Telegram has no idempotency header; store `key` with chat_id in your
      // backend on success and skip sends whose (chat_id, key) already exists.
    });

    try {
      const res = await fetch(API, { method: "POST", body });
      if (res.ok) return await res.json();

      // 429 handling: respect retry-after when available
      if (res.status === 429) {
        const ra = Number(res.headers.get("retry-after") || "2");
        await sleep((ra * 1000) + jitter(500));
      } else if (res.status >= 500) {
        // transient 5xx -> exponential backoff with jitter
        await sleep(delay + jitter(delay));
        delay = Math.min(delay * 2, 120000); // cap 2 min
      } else {
        // 4xx permanent -> do not retry
        const err = await res.text();
        throw new PermanentError(`Permanent error ${res.status}: ${err}`);
      }
    } catch (e: any) {
      if (e instanceof PermanentError) throw e; // never retry permanent failures
      if (attempt === maxAttempts) throw e;
      // network timeouts/etc -> retry with backoff
      await sleep(delay + jitter(delay));
      delay = Math.min(delay * 2, 120000);
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}

// usage
sendWithRetry({
  chatId: 123456789,
  text: "Weekly report is ready 📊",
  key: crypto.randomUUID(),
}).catch(console.error);

Python (requests) – Backoff + DLQ Decision

import os, time, random, requests

BOT = os.environ["BOT_TOKEN"]
API = f"https://api.telegram.org/bot{BOT}/sendMessage"

def sleep_ms(ms): time.sleep(ms/1000)

class PermanentSendError(RuntimeError):
    pass

def send_with_retry(chat_id: int, text: str, key: str):
    # `key` is the idempotency key: persist (chat_id, key) on success and
    # skip jobs whose key is already recorded, to avoid duplicates.
    max_attempts = 7
    delay = 1000  # ms
    for attempt in range(1, max_attempts + 1):
        try:
            r = requests.post(API, data={"chat_id": chat_id, "text": text}, timeout=10)
            if r.ok:
                return r.json()
            if r.status_code == 429:
                ra = int(r.headers.get("retry-after", "2"))
                sleep_ms(ra * 1000 + random.randint(0, 500))
            elif 500 <= r.status_code < 600:
                sleep_ms(delay + random.randint(0, delay))
                delay = min(delay * 2, 120000)  # cap 2 min
            else:
                # 4xx permanent -> do not retry
                raise PermanentSendError(f"Permanent error {r.status_code}: {r.text}")
        except PermanentSendError:
            raise  # never retried
        except Exception:
            if attempt == max_attempts:
                # push to DLQ here
                raise
            sleep_ms(delay + random.randint(0, delay))
            delay = min(delay * 2, 120000)
    raise RuntimeError(f"Gave up after {max_attempts} attempts")

# usage
# send_with_retry(123456789, "Hello with resilience 🔁", key="order-987654")

Operational Playbook

Per-Chat & Global Throttles

Maintain two token buckets: one per chat_id, one global. This keeps hot groups from starving others and respects flood control.

Batching + Jitter

Send in small batches (e.g., 20–50) with random pause 1–3s between batches to stay under thresholds and reduce spam reports.
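
A minimal batching sketch using the illustrative batch size and pause range above:

```python
import random
import time

def send_in_batches(jobs, send_one, batch_size=30, pause_range=(1.0, 3.0)):
    # Send jobs in small batches, pausing a random 1-3 s between batches.
    for i in range(0, len(jobs), batch_size):
        for job in jobs[i:i + batch_size]:
            send_one(job)
        if i + batch_size < len(jobs):
            time.sleep(random.uniform(*pause_range))
```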

Categorize Failures

Tag errors: transient (retry), permanent (no retry), policy (needs permission/admin), content (too large/invalid).
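
A sketch of such a tagger; the description substrings below are illustrative heuristics, not an official API contract:

```python
def categorize(status: int, description: str = "") -> str:
    # Map a failed send to one of the four tags above.
    if status == 429 or status >= 500:
        return "transient"   # retry with backoff
    if status == 403:
        return "policy"      # bot blocked, kicked, or missing admin rights
    d = description.lower()
    if "too long" in d or "parse" in d or "entities" in d:
        return "content"     # message too large or invalid
    return "permanent"       # do not retry
```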

Alerting & SLOs

Alert on spikes in 429/5xx, rising latency, or DLQ growth. Track SLOs for delivery success and time-to-deliver.

Frequently Asked Questions

Which errors should trigger a retry for Telegram Bot API?

Retry on transient errors like 500/502/503/504 and 429 Too Many Requests (respecting retry-after). Do not retry permanent 4xx errors (e.g., chat not found, forbidden).

What backoff strategy is best?

Exponential backoff with full jitter is the safest for large fan-out sends. It spreads bursts and reduces thundering herd problems.

How do I avoid duplicate messages during retries?

Use idempotency keys per logical send and deduplicate by storing the last successful send for (chat_id, key). Ignore replays with the same key.

Do I need a Dead Letter Queue (DLQ)?

Yes. After N attempts or T time, move the job to a DLQ for manual inspection or later reprocessing.

What about rate limits?

Throttle globally and per-chat. When you receive 429, pause per the retry-after header or a safe default window and resume with backoff.

Make Your Schedules Unstoppable

Turn failures into recoveries with smart retries, throttling, and DLQ. Confident delivery for every campaign.