Why Scheduled Messages Fail
Transient API/Network Errors
- 5xx upstream errors
- DNS / TLS / timeout hiccups
- 429 Too Many Requests (flood control)
Permanent Errors
- 400: bad request (invalid chat/message)
- 403: forbidden (bot not an admin, blocked)
- 410: chat migrated / deactivated
Core Retry Strategy (Production-Ready)
Exponential Backoff + Full Jitter
Start with a small delay (e.g., 1–2 s) and double it on each attempt, with random jitter, up to a cap (e.g., 2–5 min). Jitter avoids synchronized retry bursts.
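As a small sketch (helper name and defaults are illustrative), full jitter draws the entire delay uniformly at random between zero and the exponential ceiling:
// Full jitter: pick the whole delay in [0, min(cap, base * 2^(attempt-1))].
function fullJitterDelay(attempt: number, baseMs = 1000, capMs = 120_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(Math.random() * ceiling);
}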
Idempotency & Deduplication
Attach an idempotency key to each logical message. Store success records keyed by (chat_id, key) and ignore repeats to prevent duplicates.
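A minimal sketch of that dedup check, with an in-memory Set standing in for your persistent store (names are illustrative):
// Skip sends whose (chat_id, key) pair already succeeded; persist `seen` in a real store.
const seen = new Set<string>();
async function sendOnce(chatId: number, key: string, send: () => Promise<void>): Promise<void> {
  const dedupeKey = `${chatId}:${key}`;
  if (seen.has(dedupeKey)) return;      // replay of an already-delivered message
  await send();                         // only record success after the API call succeeds
  seen.add(dedupeKey);
}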
Queues & Dead-Letter Queue (DLQ)
Put sends into a queue (ready → retrying → succeeded/failed). After N attempts or a retry window of T, move the job to a DLQ for manual review or later replay.
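One way to sketch that DLQ decision; field names and limits here are illustrative, not prescriptive:
type JobState = "ready" | "retrying" | "succeeded" | "failed";
interface Job { id: string; state: JobState; attempts: number; firstTriedAt: number; }
const MAX_ATTEMPTS = 7;
const MAX_WINDOW_MS = 60 * 60 * 1000; // 1-hour retry window
// After a failed attempt: keep retrying, or park the job in the DLQ for review/replay.
function routeFailure(job: Job, now = Date.now()): "retry" | "dlq" {
  return job.attempts >= MAX_ATTEMPTS || now - job.firstTriedAt >= MAX_WINDOW_MS ? "dlq" : "retry";
}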
Rate-Limit Aware Throttling
Maintain global and per-chat tokens. On 429, honor retry-after (or a safe window), then resume with backoff. Spread bursts with jitter.
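A token-bucket sketch of those throttles; the capacity and refill rate are placeholders you would tune to your own limits:
// Refill continuously at `ratePerSec`; a send may proceed only if a token is available.
class TokenBucket {
  private tokens: number;
  private last = Date.now();
  constructor(private capacity: number, private ratePerSec: number) { this.tokens = capacity; }
  tryTake(): boolean {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + ((now - this.last) / 1000) * this.ratePerSec);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
// e.g. check one global bucket plus one per-chat bucket before each sendMessage call.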
Circuit Breaker
If upstream error rate spikes, open the circuit, pause sends, and gradually close with a half-open probe to protect capacity.
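A minimal circuit-breaker sketch (thresholds are illustrative): open after consecutive failures, then allow a half-open probe once the cooldown passes.
type CircuitState = "closed" | "open" | "half-open";
class CircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 5, private cooldownMs = 30_000) {}
  canSend(): boolean {
    if (this.state === "open" && Date.now() - this.openedAt >= this.cooldownMs) this.state = "half-open";
    return this.state !== "open"; // closed and half-open both allow a send
  }
  onSuccess(): void { this.failures = 0; this.state = "closed"; }
  onFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold || this.state === "half-open") {
      this.state = "open";
      this.openedAt = Date.now();
    }
  }
}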
Observability & Alerts
Emit metrics for attempts, success, failure, 429s, backoff time. Add structured logs with job id, chat_id, idempotency key, attempt.
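For example, one structured log line per attempt keeps retries easy to trace and aggregate (field names and values here are ours, not a standard):
console.log(JSON.stringify({
  event: "send_attempt",
  job_id: "job_42",            // illustrative values
  chat_id: 123456789,
  idempotency_key: "order-987654",
  attempt: 3,
  status: 429,
  backoff_ms: 4000,
}));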
Code Samples with Backoff & Idempotency
Node.js (fetch) – Retry on 5xx/429 with Jitter
import crypto from "node:crypto";
const BOT = process.env.BOT_TOKEN!;
const API = `https://api.telegram.org/bot${BOT}/sendMessage`;
function sleep(ms:number){ return new Promise(r=>setTimeout(r,ms)); }
function jitter(base:number){ return Math.floor(Math.random()*base); }
class PermanentError extends Error {}

async function sendWithRetry({ chatId, text, key }: { chatId: number | string; text: string; key: string }) {
  const maxAttempts = 7;
  let delay = 1000; // start at 1s, double per attempt, cap at 2 min
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = new URLSearchParams({
      chat_id: String(chatId),
      text,
      // Telegram has no idempotency parameter: store `key` with chat_id in your
      // backend on success and skip any later send with the same (chat_id, key).
    });
    try {
      const res = await fetch(API, { method: "POST", body });
      if (res.ok) return await res.json();
      // 429 handling: respect retry-after when available
      if (res.status === 429) {
        const ra = Number(res.headers.get("retry-after") || "2");
        await sleep(ra * 1000 + jitter(500));
      } else if (res.status >= 500) {
        // transient 5xx
        await sleep(delay + jitter(delay));
        delay = Math.min(delay * 2, 120000); // cap 2 min
      } else {
        // other 4xx are permanent -> do not retry
        throw new PermanentError(`Permanent error ${res.status}: ${await res.text()}`);
      }
    } catch (e: any) {
      // permanent errors bubble up; network timeouts/etc retry with backoff
      if (e instanceof PermanentError || attempt === maxAttempts) throw e;
      await sleep(delay + jitter(delay));
      delay = Math.min(delay * 2, 120000);
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
// usage
sendWithRetry({
  chatId: 123456789,
  text: "Weekly report is ready 📊",
  key: crypto.randomUUID(),
}).catch(console.error);
Python (requests) – Backoff + DLQ Decision
import os, time, random, requests
BOT = os.environ["BOT_TOKEN"]
API = f"https://api.telegram.org/bot{BOT}/sendMessage"
def sleep_ms(ms): time.sleep(ms/1000)
def send_with_retry(chat_id: int, text: str, key: str):
    # store (chat_id, key) in your backend on success to dedupe replays
    max_attempts = 7
    delay = 1000  # ms; doubles per attempt, capped at 2 min
    for attempt in range(1, max_attempts + 1):
        try:
            r = requests.post(API, data={"chat_id": chat_id, "text": text}, timeout=10)
        except requests.RequestException:
            # network timeouts / connection errors -> retry with backoff
            if attempt == max_attempts:
                # push to DLQ here
                raise
            sleep_ms(delay + random.randint(0, delay))
            delay = min(delay * 2, 120000)
            continue
        if r.ok:
            return r.json()
        if r.status_code == 429:
            ra = int(r.headers.get("retry-after", "2"))
            sleep_ms(ra * 1000 + random.randint(0, 500))
        elif 500 <= r.status_code < 600:
            sleep_ms(delay + random.randint(0, delay))
            delay = min(delay * 2, 120000)
        else:
            # 4xx permanent -> do not retry
            raise RuntimeError(f"Permanent error {r.status_code}: {r.text}")
    # retries exhausted on 429/5xx -> push to DLQ here
    raise RuntimeError(f"Gave up after {max_attempts} attempts")
# usage
# send_with_retry(123456789, "Hello with resilience 🔁", key="order-987654")
Operational Playbook
- Per-Chat & Global Throttles
- Batching + Jitter
- Categorize Failures
- Alerting & SLOs
Frequently Asked Questions
Which errors should trigger a retry for Telegram Bot API?
Retry on transient errors like 500/502/503/504 and 429 Too Many Requests (respecting the retry-after). Do not retry on 400-type permanent errors (e.g., chat not found, forbidden).
What backoff strategy is best?
Exponential backoff with full jitter is the safest for large fan-out sends. It spreads bursts and reduces thundering herd problems.
How do I avoid duplicate messages during retries?
Use idempotency keys per logical send and deduplicate by storing the last successful send for (chat_id, key). Ignore replays with the same key.
Do I need a Dead Letter Queue (DLQ)?
Yes. After N attempts or T time, move the job to a DLQ for manual inspection or later reprocessing.
What about rate limits?
Throttle globally and per-chat. When you receive 429, pause per the retry-after header or a safe default window and resume with backoff.
Make Your Schedules Unstoppable
Turn failures into recoveries with smart retries, throttling, and a DLQ. Confident delivery for every campaign.
