Pricing · platform
Your private AI environment, running 10× faster. Truly sovereign.
Coding assistant and inference endpoints with the latest open-source models (GLM 5.1, Qwen 3.6, Llama 3.3, DeepSeek V3.5) on dedicated NVIDIA B200 GPUs in Madrid. Your code and prompts never leave the perimeter.
10×
Faster than a MacBook M4 Max on the same model
3.2×
Faster than an RTX 6000 Ada workstation
95 ms
Time to first token (2k prompt)
3-5
Concurrent developers per slice
How the slice works
Your slice is yours. By hardware. All the time.
We use NVIDIA Multi-Instance GPU (MIG): the B200 is physically partitioned into isolated instances. Each slice has its own compute, HBM3e memory, cache, and bandwidth. You don't compete with anyone for cycles. Your 1/4 is always your 1/4, even when the rest of the GPU is maxed out.
- Hardware isolation (not time-slicing, not virtualization): SMs, memory, and cache are physically separated between slices.
- Guaranteed bandwidth: your share of HBM3e doesn't slow down if other customers saturate their slice.
- Reserved 24/7 with a monthly contract, or on-demand by the hour when you hit traffic peaks.
Each slice = isolated SMs + HBM3e + L2 cache + NVDEC/NVENC · no noisy neighbor
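If you want to verify the partitioning from inside your own environment, here is a minimal sketch using the NVML Python bindings (pip install nvidia-ml-py). The device index and output format are illustrative.

```python
# Minimal sketch: list the MIG instances NVML exposes on a GPU.
# Requires the NVML Python bindings (pip install nvidia-ml-py).
from pynvml import (
    NVMLError, nvmlInit, nvmlShutdown,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMaxMigDeviceCount,
    nvmlDeviceGetMigDeviceHandleByIndex, nvmlDeviceGetName,
    nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    gpu = nvmlDeviceGetHandleByIndex(0)               # first physical GPU
    for i in range(nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except NVMLError:
            continue                                  # slot not populated
        name = nvmlDeviceGetName(mig)
        name = name.decode() if isinstance(name, bytes) else name
        mem = nvmlDeviceGetMemoryInfo(mig)
        print(f"MIG {i}: {name}, {mem.total / 2**30:.0f} GiB dedicated")
finally:
    nvmlShutdown()
```

Each MIG instance reports its own dedicated memory: that is the hardware isolation, visible from userland.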
Real speed
Same models — only the place they run changes.
Tokens per second, single-user inference, Llama 3.3 70B and Qwen 3.6 Coder 32B. The gap isn't subtle — and it decides whether a coding assistant feels instant or frustrating.
Sources: NVIDIA MLPerf Inference v4.1 · Blackwell whitepaper · vLLM · Apple MLX · LocalLLaMA. Conservative numbers.
MacBook Pro M4 Max
128 GB unified · MLX · Q4
RTX 6000 Ada
48 GB · AWQ-4bit · workstation
1/4 B200 · GPU Solutions
MIG · 48 GB HBM3e · native FP8
LLM inference is memory-bandwidth bound, not FLOPS-bound. A 1/4 slice's share of HBM3e delivers ~2× the bandwidth of the RTX 6000 Ada's GDDR6 and ~4× the M4 Max's unified memory, which is why a B200 slice beats both on the same models. Large models (72B+, MoE) don't fit on workstations without quality loss. On B200 they fit at native FP8 precision.
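A back-of-envelope check of that claim: at batch 1, every generated token streams the full weight set from memory, so decode speed scales with memory bandwidth. The bandwidth figures below are public ballpark numbers we plug in as assumptions for illustration, not benchmark results.

```python
# First-order model: batch-1 decode speed scales with memory bandwidth,
# because each token streams the full weight set from memory.
# Bandwidth values in GB/s are public ballpark figures (assumptions).
bandwidth_gbs = {
    "M4 Max unified memory": 546,
    "RTX 6000 Ada GDDR6": 960,
    "1/4 B200 HBM3e share": 2000,   # roughly a quarter of ~8 TB/s
}
slice_bw = bandwidth_gbs["1/4 B200 HBM3e share"]
for name, bw in bandwidth_gbs.items():
    print(f"{name}: slice bandwidth advantage ~{slice_bw / bw:.1f}x")
# -> ~3.7x over M4 Max, ~2.1x over RTX 6000 Ada: the ~4x and ~2x above
```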
Why a dedicated slice
Your AI, inside your perimeter. No exceptions.
With a public API, your prompts can end up training the next model and your data crosses three continents before returning. With a dedicated slice in Madrid, nothing leaves. Same model, isolated environment, compliance by design. And on top of that, 10× faster.
What happens in your slice, stays in your slice
Privacy, compliance and sovereignty built in. Not add-ons.
Data in Spain, 100%
Prompts, embeddings and responses never leave Madrid. Zero CLOUD Act exposure, zero US sub-processors, zero international transfers for Legal to sign.
Private model and context
Your B200 slice is yours with MIG hardware isolation. Your inputs don't train the next model, and your throughput doesn't depend on the tenant next door. Nobody else touches your weights.
ISO 27001 + ENS Medium included
Your auditor gets the certificates directly. Your CISO closes due diligence without expanding the SoA. No extra audits, no ambiguous DPAs.
Dedicated endpoint, not shared
Private HTTPS with mTLS + VPN, only reachable from your IPs. No enforced rate limits, no inference queues. The latency is yours, 24/7. A sample call is sketched below.
InfiniBand co-location
Your pod, your storage and your tokens live in the same rack, wired over InfiniBand. Fewer hops, lower latency, zero cross-region egress. Your multi-step agent doesn't choke on the network.
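What the dedicated endpoint looks like from your side, as a sketch: a plain HTTPS POST with a client certificate. The URL, certificate paths, and payload shape here are placeholders for illustration, not our actual API.

```python
# Sketch: calling a private endpoint over mTLS with `requests`.
# URL, certificate paths, and payload shape are placeholders.
import requests

resp = requests.post(
    "https://llm.example.internal/v1/completions",  # your private endpoint
    cert=("client.crt", "client.key"),  # client cert + key (mTLS)
    verify="private-ca.pem",            # pin the private CA, not public roots
    json={"model": "llama-3.3-70b", "prompt": "Hello", "max_tokens": 64},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```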
The analogy
Madrid → New York is the same 5,750 km. By ship or by plane.
By ship
5,750 km
10 days
By plane
5,750 km
7 hours
Nobody pays for kilometers. You pay to arrive on time.
Same thing in AI
One million Llama 3.3 70B tokens. Depending on where it runs.
MacBook M4 Max · 12 t/s
1M tokens
23 hours
RTX 6000 Ada · 35 t/s
1M tokens
8 hours
1/4 B200 at GPU Solutions · 115 t/s
1M tokens
2.4 hours
Same work done. One tenth the time your team spends waiting.
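The same comparison as arithmetic, using the throughput figures above:

```python
# Hours to generate 1M tokens at each single-user speed.
for name, tps in [("MacBook M4 Max", 12),
                  ("RTX 6000 Ada", 35),
                  ("1/4 B200", 115)]:
    hours = 1_000_000 / tps / 3600
    print(f"{name}: {hours:.1f} h")   # -> 23.1 h, 7.9 h, 2.4 h
```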
And time pays for itself too
Operational savings are a side effect. They still cover the slice 6× over.
Team
10 devs
× 80 €/h
Idle time
30 min/day
× 220 workdays
Annual cost lost
88,000 €
1100 h/year idle
Annual 1/4 slice
14,280 €
1/4 slice reserved
Return on time
+ 73,720 €/year
6× the slice
The real reason to switch is sovereignty and compliance. Recovered time is the bonus that wins over Finance.
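The same math, spelled out with this page's numbers:

```python
# The worked example above, using this page's numbers.
devs, rate_eur_h = 10, 80
idle_h_per_day, workdays = 0.5, 220             # 30 min/day, 220 workdays

idle_hours = devs * idle_h_per_day * workdays   # 1,100 h/year
idle_cost = idle_hours * rate_eur_h             # 88,000 EUR/year
slice_cost = 1_190 * 12                         # 14,280 EUR/year reserved
print(f"Recovered: {idle_cost - slice_cost:,.0f} EUR/year "
      f"({idle_cost / slice_cost:.1f}x the slice)")
# -> Recovered: 73,720 EUR/year (6.2x the slice)
```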
Your data, your model, your latency. And your team stops waiting, too.
Mix them
Three modes. You build the combo.
Reserve a slice for your own model. Add hourly bursts when traffic spikes. And pull Token Factory tokens for a big model when you don't want to manage the GPU. All in the same cluster, all sovereign, each line billed separately — no surprises.
€/month · dedicated GPU
Fixed monthly fee for a 24/7 MIG slice. The GPU is yours: start and stop whenever without losing the assignment. Ideal for dev teams and stable production.
Best for stable production
€/hour · pay as you use
Spin up a slice or a full GPU and pay hourly until you shut it down. No commitment, no reservation needed. Available immediately via dashboard or API.
Best for spikes and POCs
€/1M tokens · Token Factory
Pay only for the tokens the model generates. No GPU management. Call the private HTTPS endpoint from your app. Perfect for variable-scale production inference.
Best for product inference
GPU Compute with MIG
From 1/4 to full cluster. Always dedicated.
Three MIG slice sizes (1/4, 1/2, full GPU), plus the HGX 8× cluster for training and enterprise workloads. Same API, same per-slice latency, scale from prototype to production without migration.
01 / Slice
1/4
B200
Coding assistant for 3-5 devs · light fine-tuning · models up to 70B with large context. The entry point.
Reserved
1,190 €/month
On-demand
1.95 €/hour
02 / Half
1/2
B200
Real production for 8-12 devs · 70B inference at native FP8 precision · training of small-to-mid models.
Reserved
2,290 €/month
On-demand
3.95 €/hour
03 / Full B200
1×
B200
72B models at FP8 full precision · high-throughput inference for teams of 15+ devs · distributed training.
Reserved
5,990 €/month
On-demand
7.90 €/hour
04 / HGX Cluster
8× B200
8× B200 with intra-node NVLink 5 and inter-node InfiniBand NDR · foundation model training · inference at scale · dedicated enterprise compliance.
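A quick way to pick between the reserved and on-demand modes, straight from the list prices above (shown here for the 1/4 slice):

```python
# Break-even between billing modes for the 1/4 slice, from the list
# prices above (1,190 EUR/month reserved vs 1.95 EUR/hour on-demand).
monthly, hourly = 1_190, 1.95
breakeven = monthly / hourly                  # ~610 h
print(f"Reserve if the slice runs > {breakeven:.0f} h/month "
      f"(~{breakeven / 730:.0%} of a 730 h month)")
# -> Reserve if the slice runs > 610 h/month (~84% of a 730 h month)
```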
Token Factory
The latest open-source models. Served fast.
We charge a bit more per million tokens. In return, your prompts and context never leave Madrid, and tokens are generated in the same cluster where your pod lives, wired over InfiniBand. More sovereignty and, because the tokens sit right next to you, more speed.
Prices in euros per million tokens, pay-as-you-go, public list for retail volume. Speed is single-user tokens/second on a 1/4 B200 slice; 1/2 and full slices scale proportionally. High volume or your own fine-tuned model on a dedicated slice? We deploy on a private endpoint at a negotiated rate. Ask us.
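The serving API isn't specified on this page; as a sketch, assuming an OpenAI-compatible endpoint (typical of vLLM-based stacks, which the benchmarks above cite), a Token Factory call from your app could look like this. Base URL, key, and model name are placeholders.

```python
# Hypothetical sketch: calling a Token Factory endpoint, assuming an
# OpenAI-compatible API (common for vLLM-based serving). The base URL,
# key handling, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://tokens.example.internal/v1",  # private endpoint
    api_key="YOUR_KEY",
)
chat = client.chat.completions.create(
    model="qwen-3.6-coder-32b",
    messages=[{"role": "user", "content": "Refactor this function..."}],
    max_tokens=256,
)
print(chat.choices[0].message.content)
```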
Where your code lives
Three places. One gives you all of them.
There's no always-right option. There's one that combines speed, privacy, and capacity — and two that force a trade-off.
Local, on your machine
Maximum physical privacy — nothing leaves the device — but bounded by RAM and bandwidth. Big models don't fit or run slowly. Your laptop is unusable during inference.
Gains privacy · loses speed and capacity
Third-party API
Fast, with powerful models, but every prompt travels to someone else's servers, with variable retention policies and jurisdiction that shifts per provider. Internal compliance will cost you hours.
Gains speed · loses privacy
Dedicated cluster in Madrid
B200 cluster speed with HBM3e, latest-gen models at native precision, hardware-level MIG isolation. Prompts and code are processed here. 100% Spanish data residency, ISO 27001 and ENS Medium certified.
Speed · privacy · capacity
All plans include
Tailored proposal
Every use case is different. Tell us what you want to do and we'll send you a concrete proposal in under 24 hours.