SARAH Code — coding by voice, by chat, by intent

Architecture Comparison · 2026

SARAH AI Suite on NVIDIA Dual DGX B300 Servers
vs. OpenClaw / Hermes on a Public-Cloud VPS

Two ways to run an agentic AI platform. One owns the hardware, the memory, the storage, and the network. The other rents all four from a multi-tenant vendor and reaches them over the Public Internet. The architectures are not comparable — and the spec sheets prove it.


The Two Architectures

Same agentic workload — answer a customer call, look up the CRM, book the meeting, send the email. Two completely different stacks underneath.

Sovereign · Our Mini Data Center · GB300

SARAH AI Suite

SARAH Spark 2 Router on the customer premises · up to 400 GE backhaul to our Data Centre · Dual DGX B300 Servers there run the LLM for every call · audio never leaves your premises · zero Public-Internet hops.

Edge (on-prem): SARAH Spark 2 Router · voice path local · audio never leaves the premises
Backhaul: Up to 400 GE · Private Enterprise IP Network · physical fibre · zero Public-Internet hop
DC compute: 72× NVIDIA Blackwell Ultra · GB300 full rack · LLM inference served over the up-to-400-GE backhaul
DC VRAM: 20 TB HBM3e total · 3 GB dedicated per active conversation
Memory bandwidth: 576 TB/s aggregate · per-GPU HBM3e
Storage: Local NVMe at both ends · weights on-DC · per-call working set on-Spark
Vendor reach: Direct peering — Google Cloud, AWS, Azure, Cloudflare
Public-internet exposure: None. The platform is not addressable from the open web.
Tenant model: Single-tenant. The hardware is yours.
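A rough ceiling on concurrent calls follows from the two VRAM figures above: divide the pool by the per-call allocation. This sketch makes three assumptions. It uses decimal units (20 TB = 20,000 GB). It can optionally reserve the ~670 GB + ~244 GB of model weights listed under Model storage in the detailed specifications, as if they sat in the same HBM pool. It ignores activations, framework buffers and fragmentation. The result is an upper bound, not a measured capacity.

```python
# Upper-bound estimate of concurrent calls that can each hold a dedicated VRAM
# slice. Assumptions (not vendor figures): decimal units, model weights resident
# in the same HBM pool, no activation/framework overhead or fragmentation.

TOTAL_HBM_GB = 20_000              # "20 TB HBM3e total", decimal TB -> GB
PER_CALL_GB = 3                    # "3 GB dedicated per active conversation"
RESIDENT_WEIGHTS_GB = 670 + 244    # Deep Thinker + Doer, from the Model storage row


def max_concurrent_calls(total_gb: float, per_call_gb: float, reserved_gb: float = 0.0) -> int:
    """Return how many whole per-call slices fit after reserving `reserved_gb`."""
    if per_call_gb <= 0:
        raise ValueError("per_call_gb must be positive")
    usable = max(total_gb - reserved_gb, 0.0)
    return int(usable // per_call_gb)


if __name__ == "__main__":
    print(max_concurrent_calls(TOTAL_HBM_GB, PER_CALL_GB))                       # no reservation
    print(max_concurrent_calls(TOTAL_HBM_GB, PER_CALL_GB, RESIDENT_WEIGHTS_GB))  # weights set aside
```

Under those assumptions the pool holds at most 6,666 dedicated slices, or 6,362 once the weights are set aside. Any real figure will be lower.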

Tenant · Public Cloud · Shared

OpenClaw / Hermes on a VPS

Open-source agent framework on a rented GPU instance with Public-Internet ingress.

Compute: 1× shared GPU on a rented instance (A10 / A100 / H100 typical)
VRAM: 16–80 GB on instance · multi-tenant slot · no per-call dedication
Memory bandwidth: ~0.6–3.35 TB/s peak per GPU (A10 to H100 SXM) · contended with co-tenants
Storage: Cloud block storage · network-attached · ms latency
Network: Shared cloud fabric egressing the Public Internet for any external call
Vendor reach: Public-internet hop to every dependency, even same-cloud services without VPC peering
Public-internet exposure: Full attack surface · public IPs · DDoS vectors
Tenant model: Multi-tenant. Your conversation shares silicon with strangers.

Four Ways to Ask

Voice. Portal. WhatsApp. Telegram.

SARAH Code answers in the channel you opened. Start a feature by voice, watch the diff land in the portal, get the result on WhatsApp or Telegram. Same workspace, same memory, same engineer.

By voice

Call your SARAH number. Describe the change. "Add a refund button to the order page, only show it to admins." SARAH confirms scope and texts you when the diff is ready.

In the portal

An IDE-class workspace under your account. File tree, diff view, conversation log. Watch the work happen, accept or reject the change, push to your branch.

On WhatsApp

Message your SARAH number. Describe the task in plain language, attach screenshots or specs, get the diff back as a document. The fastest way to put SARAH Code in the hands of every operator on your team without onboarding them to anything new.

On Telegram

Type a task. Get a patch attachment back. Long tasks come with a progress note and a follow-up message when the work lands. Built for builders who live on their phone.

Long-Lived Workspace

One repo per customer. Forever.

Your SARAH Code workspace is a real git repository on persistent storage in our Chicago Data Center. It survives across sessions, channels, and devices. Your prior conversations, decisions, and code all live in the same place.

Persistent state

Pick up where you left off. No "what were we building again?" — SARAH already knows the answer.

Isolated

Filesystem and process isolation per customer. Your code is yours alone. We never train on it. We never cache it.

Portable

It is a real git repo. Push to GitHub, GitLab, your own Gitea — anywhere you want. We never lock you in.

SARAH Code vs DIY

Doing it the hard way.

Six layers a senior engineer ends up rebuilding the moment they pick up a raw CLI. SARAH Code ships them all in the first call.

Capability	SARAH Code	Direct CLI / DIY
Voice input	Native, sub-second confirmation, callback for long tasks	Type only. No phone, no callback, no hands-free.
Multi-channel handoff	Voice, portal, WhatsApp, and Telegram on the same workspace	One channel at a time. Context rebuilt by you each time.
Long-lived repo	Persistent per customer, hosted, backed up nightly	You install, configure, backup, restore.
Setup time	Zero. Call the number. Speak the task.	Install CLI, set keys, configure, learn.
Account + identity	One SARAH login covers voice, code, integrations, smart home	Per-tool accounts, per-tool billing, per-tool keys.
L1 support	Human support on the SARAH side, plus SARAH herself walks you through it	Docs and a community Discord.

Pricing

Per-seat, per-month.

Every tier is delivered over Private Enterprise IP Network connectivity. Zero public-internet hops between you and your workspace.

Pro

$5,000 / user / month

For teams who build software for a living.

10 GB dedicated VRAM · zero contention
Voice + portal + WhatsApp + Telegram
Long-lived per-user repo
Plan Mode + autonomous task runs
Custom permission profiles
Voice callback included
Private Enterprise IP Network delivery
Priority support

Enterprise

$10,000 / user / month

For organizations who need sovereign, isolated, regulated-industry-grade software engineering at scale.

20 GB dedicated VRAM · zero contention
Everything in Pro
Dedicated workspace volume
Private Enterprise IP Network with named-circuit option
Named technical contact + SLA
Co-branded portal option
Annual or multi-year contract
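The tiers above are quoted per user per month. For budgeting, here is a minimal sketch that annualizes those list prices. The team mix in the example is hypothetical. Discounts on annual or multi-year contracts are not modeled because the page does not state them.

```python
# Annualize the per-seat list prices quoted above. Seat counts are hypothetical
# inputs; contract discounts are not modeled.

MONTHLY_PRICE_USD = {"Pro": 5_000, "Enterprise": 10_000}


def annual_cost(tier: str, seats: int) -> int:
    """List-price cost in USD of `seats` users on `tier` for 12 months."""
    if seats < 0:
        raise ValueError("seats must be non-negative")
    return MONTHLY_PRICE_USD[tier] * seats * 12


if __name__ == "__main__":
    # Hypothetical team: 5 Pro seats and 2 Enterprise seats.
    total = annual_cost("Pro", 5) + annual_cost("Enterprise", 2)
    print(f"${total:,}")
```

At list price one Pro seat comes to $60,000 a year and one Enterprise seat to $120,000.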

Detailed Specifications

Eight layers, side by side.

Compute, memory, storage, network, security, sovereignty, cost. Every layer of an AI platform measured against its real-world counterpart.

Layer	SARAH AI Suite (NVIDIA Dual DGX B300 Servers)	OpenClaw / Hermes on a Public-Cloud VPS
Edge / DC architecture	On-prem SARAH Spark 2 Router handles the voice path locally · Dual DGX B300 Servers in our DC handle inference · up to 400 GE between them · audio never leaves the premises	Everything on one rented GPU in someone else's region · every stage contends for the same VRAM slot
GPU silicon	72× NVIDIA Blackwell Ultra · GB300 full rack · Lightmatter chips & switches · LLM-only workload	1× shared instance GPU · whatever the cloud vendor schedules you
VRAM (total)	20 TB HBM3e · single coherent pool	16–80 GB on the instance · ends at the box boundary
VRAM (per call)	3 GB dedicated · isolated to that conversation · zero contention	No per-call allocation · whatever the runtime scrapes from a shared pool
Memory bandwidth	576 TB/s aggregate	~0.6–3.35 TB/s peak per GPU (A10 to H100 SXM) · degrades under noisy-neighbour load
Model storage	Local NVMe · ~670 GB Deep Thinker + ~244 GB Doer · loaded once, served forever	Cloud block storage or Hugging Face Hub pull at boot · re-downloaded on instance restart
Per-call working memory	128K-token context window held in dedicated VRAM for the life of the call	Context window survives only as long as the shared GPU lets it
Backbone network	Up to 400 GE from SARAH Spark 2 Router to Dual DGX B300 Servers · Private Enterprise IP Network · physical fibre interconnect	Shared cloud-vendor fabric · TCP over the open internet for anything external
Public-internet exposure	None. The platform is unreachable from the open web by design.	Public IPs · open ports · part of the cloud-vendor's blast radius
External-vendor reach	Direct peering with Google Cloud, AWS, Azure, Cloudflare · private interconnect, no public hop	Public-internet egress to every service, even same-cloud APIs unless you build VPC peering yourself
Inference latency	Sub-400 ms first-word · streaming TTS · parallel sentence synthesis	Variable: cold-start + queue + cloud-network hops + shared GPU contention
Tenant model	Single-tenant · the silicon is physically yours	Multi-tenant · your conversation shares hardware with arbitrary strangers
Data sovereignty	100% on your premises (or on our Private Enterprise IP Network, PEIPN) · data never crosses borders unless you say so	Vendor terms govern what they do with your prompts and outputs
Cost model	Buy once, own forever · zero per-token meter · zero per-block charge	Per-token, per-second-GPU, per-egress-GB · the meter never stops
Vendor lock-in	None. The hardware and the software are yours; open-source LLMs fine-tuned in-house.	Cloud vendor + framework vendor + occasional model vendor — three locks per workflow
Failure domain	A single rack you can see · 394 restore points · 200 kW EMG off-grid power	A region in someone else's data centre. Their outage is your outage.
Compliance posture	SOC 2 / ISO 27001 / GDPR / CCPA / HIPAA / PCI DSS · examiner-ready audit trail	Inherits cloud-vendor SOC 2 + your own scaffolding · audit trail you have to build
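The per-call row pairs a 128K-token context with a 3 GB allocation. One way to check whether a given model fits is to compare the per-token budget with the standard key/value-cache size for a decoder-only transformer: 2 × layers × KV heads × head dimension × bytes per element. The sketch treats 3 GB as 3 × 10^9 bytes and 128K as 131,072 tokens. It ignores activations and other per-call buffers. The model dimensions in the example are hypothetical placeholders, not those of the Deep Thinker or Doer models.

```python
# Per-token memory budget for a 128K-token context held in a 3 GB slice,
# compared with the KV-cache footprint of a decoder-only transformer.
# The model dimensions passed in below are hypothetical placeholders.

CALL_BUDGET_BYTES = 3 * 10**9    # "3 GB dedicated", decimal GB assumed
CONTEXT_TOKENS = 128 * 1024      # "128K-token context window" = 131,072 tokens


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: float) -> float:
    """Bytes of keys plus values (hence the factor 2) cached per token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def full_context_fits(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: float) -> bool:
    """True if a full-length context's KV cache fits in the per-call budget."""
    needed = kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem) * CONTEXT_TOKENS
    return needed <= CALL_BUDGET_BYTES


if __name__ == "__main__":
    print(f"budget: {CALL_BUDGET_BYTES / CONTEXT_TOKENS:,.0f} bytes per token")
    # Hypothetical configs: (layers, KV heads, head_dim, bytes per element).
    for cfg in [(24, 4, 64, 1), (60, 8, 128, 1)]:
        print(cfg, f"{kv_bytes_per_token(*cfg):,.0f} B/token", full_context_fits(*cfg))
```

The budget works out to roughly 22.9 KB per token. Whether a particular model fits depends on its layer count, KV-head count, head dimension and cache precision.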

The Network Layer

Up to 400 GE connectivity to our Data Centre · via the SARAH Spark 2 Router · through to the Dual DGX B300 Servers.

Up to 400 GE backhaul between the on-prem SARAH Spark 2 Router and our Data Centre. Only the prompt and response text traverse the long-haul link; your audio never leaves your premises. No Public-Internet hop. No shared pipe.
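Here is a minimal sketch of that edge/DC split, written with Python's asyncio. The router transcribes and synthesizes speech locally and sends only text over the backhaul. It starts synthesizing each sentence of the streamed reply as soon as that sentence is complete, while still playing the clips in order. This is the "parallel sentence synthesis" in the latency row of the detailed specifications. `transcribe_locally`, `send_text_to_dc` and `synthesize_locally` are hypothetical stubs; the actual SARAH Spark 2 software is not described here.

```python
import asyncio
import re
from typing import AsyncIterator, List

# Hypothetical stubs for the on-prem speech components and the backhaul link.
# Only str values are handed to send_text_to_dc; audio bytes never leave this process.


async def transcribe_locally(audio: bytes) -> str:
    """Stub for on-prem speech recognition: pretends the audio is its own transcript."""
    await asyncio.sleep(0)
    return audio.decode("utf-8")


async def send_text_to_dc(prompt: str) -> AsyncIterator[str]:
    """Stub for the backhaul call: streams a canned text response in chunks."""
    for chunk in ["Your meeting is booked. ", "I sent the ", "invite by email. ", "Anything else?"]:
        await asyncio.sleep(0)
        yield chunk


async def synthesize_locally(sentence: str) -> bytes:
    """Stub for on-prem TTS; variable delay so clips can finish out of order."""
    await asyncio.sleep(0.001 * (len(sentence) % 5))
    return sentence.encode("utf-8")


SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


async def handle_turn(caller_audio: bytes) -> List[bytes]:
    prompt = await transcribe_locally(caller_audio)   # audio stays on-prem
    tasks: List[asyncio.Task] = []
    buffer = ""
    async for chunk in send_text_to_dc(prompt):       # only text crosses the link
        buffer += chunk
        *complete, buffer = SENTENCE_END.split(buffer)
        for sentence in complete:
            if sentence.strip():
                # Start synthesis as soon as a sentence is complete.
                tasks.append(asyncio.create_task(synthesize_locally(sentence.strip())))
    if buffer.strip():
        tasks.append(asyncio.create_task(synthesize_locally(buffer.strip())))
    # gather returns results in task order, so playback order matches the text.
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    for clip in asyncio.run(handle_turn(b"Book a meeting with Dana on Tuesday.")):
        print(clip.decode("utf-8"))
```

Synthesis tasks are created eagerly but awaited through `asyncio.gather`, which returns results in submission order regardless of which clip finishes first.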

Direct peering with the major hyperscalers

SARAH AI Suite's Private Enterprise IP Network terminates directly into the four interconnect fabrics that run most of the world's cloud workloads. When SARAH needs to read a Google Sheet, post to an S3 bucket, call an Azure Cognitive Services endpoint, or route through Cloudflare, none of those packets touch the open internet. They ride a private cross-connect.

GCP · Google Cloud · Direct peering
AWS · Amazon Web Services · Direct peering
AZ · Microsoft Azure · Direct peering
CF · Cloudflare · Direct peering

4 TB/E · Layer-2 fibre backbone
10 GE · Edge minimum
0 · Hops through the open internet
1 VLAN · Per client site · zero exposure to other tenants

Every client site runs in its own VLAN on the PEIPN. The physical fibre is shared with our other clients, but the Layer-2 boundary is yours alone — no broadcast, no ARP visibility, no inter-tenant traffic ever lands on your interface. Your private network ends at your premises, full stop.
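The Layer-2 boundary described above is what IEEE 802.1Q VLAN tagging provides. Each tagged Ethernet frame carries a 12-bit VLAN ID right after the source MAC address. A switch forwards a frame only to interfaces in that same VLAN. The sketch below parses the tag and applies that delivery rule. The VLAN IDs are hypothetical. It models only the tag check, not the switch configuration actually used on the PEIPN.

```python
import struct
from typing import NamedTuple, Optional

TPID_8021Q = 0x8100  # EtherType value marking an 802.1Q-tagged frame


class VlanTag(NamedTuple):
    pcp: int   # priority code point, 3 bits
    dei: int   # drop eligible indicator, 1 bit
    vid: int   # VLAN identifier, 12 bits


def parse_vlan_tag(frame: bytes) -> Optional[VlanTag]:
    """Return the 802.1Q tag of an Ethernet frame, or None if it is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])  # bytes after dst + src MAC
    if tpid != TPID_8021Q:
        return None
    return VlanTag(pcp=tci >> 13, dei=(tci >> 12) & 0x1, vid=tci & 0x0FFF)


def delivered_to_site(frame: bytes, site_vid: int) -> bool:
    """A frame reaches a site's interface only if it carries that site's VLAN ID."""
    tag = parse_vlan_tag(frame)
    return tag is not None and tag.vid == site_vid


def make_frame(vid: int, pcp: int = 0) -> bytes:
    """Build a minimal tagged header: zeroed MACs, 802.1Q tag, IPv4 EtherType."""
    tci = ((pcp & 0x7) << 13) | (vid & 0x0FFF)
    return bytes(12) + struct.pack("!HHH", TPID_8021Q, tci, 0x0800)


if __name__ == "__main__":
    SITE_A, SITE_B = 101, 202  # hypothetical per-site VLAN IDs
    print(delivered_to_site(make_frame(SITE_A), SITE_A))  # own VLAN
    print(delivered_to_site(make_frame(SITE_B), SITE_A))  # another site's VLAN
```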

The OpenClaw / Hermes VPS, by comparison, presents a public IP, TCP egress over a shared cloud fabric, a Public-Internet hop to every external dependency, and a full attack surface that the public web can probe at will. Same workload. Two universes of risk.

The Cost Reality

The meter is the point.

An open-source agent framework on a rented GPU is "free" the way a treadmill at a gym is free — you pay for everything attached to it. SARAH AI Suite does not have a meter to attach.

Cost item	SARAH AI Suite	OpenClaw / Hermes on a VPS
GPU instance time	Included · the silicon is yours	Per-second meter · 24/7 to keep the agent warm
Token throughput	No per-token meter · run it as hard as the silicon will go	Per-token bill if you use a hosted LLM behind the framework
Egress bandwidth	Direct peering · effectively flat-rate inside the PEIPN	Per-GB egress meter to every external destination
Storage I/O	Local NVMe · no IOPS bill	Per-GB-month + per-IOPS on cloud block storage
Idle cost	Zero. Idle silicon is silicon you already own.	The VPS is billing the moment you spin it up — even at 3am with nobody calling
Year-3 cost trajectory	Maintenance only ($300K/yr Enterprise · $3M/yr Hyperscale)	Same line items, same meters, three more years of inflation
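The idle-cost row comes down to simple arithmetic: an instance kept warm around the clock bills for every hour, whether or not a call arrives. The sketch below uses a hypothetical hourly rate and busy-hour count, not a quote from any cloud vendor. It shows how much of that bill pays for idle time. Per-token, egress and storage meters would come on top.

```python
# Share of an always-on GPU instance bill spent on idle hours.
# The hourly rate and busy-hour count are hypothetical inputs, not vendor quotes.

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_instance_cost(hourly_rate_usd: float) -> float:
    """Cost of keeping one instance running for an average month."""
    return hourly_rate_usd * HOURS_PER_MONTH


def idle_spend(hourly_rate_usd: float, busy_hours: float) -> float:
    """Portion of the monthly bill spent while no call is being served."""
    busy_hours = min(max(busy_hours, 0.0), HOURS_PER_MONTH)
    return hourly_rate_usd * (HOURS_PER_MONTH - busy_hours)


if __name__ == "__main__":
    rate = 2.50      # hypothetical USD per GPU-hour
    busy = 8 * 22    # hypothetical: 8 busy hours on each of 22 working days
    print(f"monthly: ${monthly_instance_cost(rate):,.2f}")
    print(f"idle:    ${idle_spend(rate, busy):,.2f}")
```

With these placeholder inputs, $1,385.00 of a $1,825.00 month pays for hours with nobody calling.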

OpenClaw and Hermes are good open-source agent frameworks. Run on a public-cloud VPS, they will get you a demo. They will not get you an enterprise. Once the conversation matters, the architecture decides everything — and a sovereign, GB300-class platform on a 4 TB/E private fibre network is a different category of system than a multi-tenant agent on a rented GPU.

200× · More memory bandwidth (GB300 rack aggregate vs. a single ~3 TB/s cloud GPU)
3 GB · Dedicated VRAM per call · zero contention
0 · Public-internet hops in the call path
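The 200× figure compares the whole rack with a single cloud GPU. The sketch below shows that arithmetic. It uses the 576 TB/s aggregate and the 72-GPU count from the specifications above. For the cloud parts it uses NVIDIA's published per-GPU peaks: A10 about 0.6 TB/s, A100 80 GB about 2.0 TB/s, and H100 SXM about 3.35 TB/s. All of these are peak spec-sheet numbers, not measured throughput.

```python
from typing import Tuple

# Peak memory-bandwidth ratios: the full GB300 rack (aggregate) and one of its
# GPUs versus a single cloud GPU. Peak spec-sheet numbers in TB/s.

RACK_AGGREGATE_TBPS = 576
RACK_GPUS = 72
CLOUD_GPU_PEAK_TBPS = {"A10": 0.6, "A100 80GB": 2.0, "H100 SXM": 3.35}


def ratios(single_gpu_tbps: float) -> Tuple[float, float]:
    """Return (rack aggregate / cloud GPU, one rack GPU / cloud GPU)."""
    per_gpu = RACK_AGGREGATE_TBPS / RACK_GPUS  # 8 TB/s per rack GPU
    return RACK_AGGREGATE_TBPS / single_gpu_tbps, per_gpu / single_gpu_tbps


if __name__ == "__main__":
    for name, bw in CLOUD_GPU_PEAK_TBPS.items():
        agg, per = ratios(bw)
        print(f"{name:10s} rack {agg:6.0f}x   per-GPU {per:5.1f}x")
```

On these peaks the rack-level ratio is about 960× against an A10, 288× against an A100 80 GB and 172× against an H100 SXM. GPU for GPU, each rack GPU (8 TB/s) has roughly 2.4× to 13× the bandwidth of those parts.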

Open a SARAH Code workspace.
