GPU Solutions


Your private AI environment, running 10× faster. Truly sovereign.

Coding assistant and inference endpoints with the latest open-source models (GLM 5.1, Qwen 3.6, Llama 3.3, DeepSeek V3.5) on dedicated NVIDIA B200 GPUs in Madrid. Your code and prompts never leave the perimeter.

10×

Faster than a MacBook M4 Max on the same model

3.2×

Faster than an RTX 6000 Ada workstation

95 ms

Time to first token (2k prompt)

3-5

Concurrent developers per slice

How the slice works

Your slice is yours. By hardware. All the time.

We use NVIDIA Multi-Instance GPU (MIG): the B200 is physically partitioned into isolated instances. Each slice has its own compute, HBM3e memory, cache, and bandwidth. You don't compete with anyone for cycles. Your 1/4 is always your 1/4, even when the rest of the GPU is maxed out.

  • Hardware isolation (not time-slicing, not virtualization): SMs, memory, and cache are physically separated between slices.
  • Guaranteed bandwidth: your share of HBM3e doesn't slow down if other customers saturate their slice.
  • Reserved 24/7 with a monthly contract, or on-demand by the hour when you hit traffic peaks.
NVIDIA B200 · MIG
192 GB HBM3e, partitioned into four 1/4 slices of 48 GB each
Dedicated compute · memory · cache · bandwidth ≈ 2 TB/s per slice

Each slice = isolated SMs + HBM3e + L2 cache + NVDEC/NVENC · no noisy neighbor
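
If you want to check the partitioning from inside your own environment, NVML exposes the MIG topology directly. A minimal sketch using the nvidia-ml-py bindings; it assumes your pod or VM can see the parent GPU, which depends on how the slice is exposed to you:

```python
# Sketch: inspect MIG partitioning via NVML (pip install nvidia-ml-py).
# Assumption: the environment exposes the parent B200; some deployments only
# expose the MIG instance itself, in which case index 0 *is* your slice.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

# Walk the MIG devices carved out of this GPU and show their dedicated memory.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # slot not in use
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(f"MIG slice {i}: {mem.total / 1e9:.0f} GB dedicated")

pynvml.nvmlShutdown()
```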

Real speed

Same models — only the place they run changes.

Tokens per second, single-user inference, on Llama 3.3 70B, Qwen 3.6 Coder 32B and GLM 5.1 235B. The gap isn't subtle, and it decides whether a coding assistant feels instant or frustrating.

Sources: NVIDIA MLPerf Inference v4.1 · Blackwell whitepaper · vLLM · Apple MLX · LocalLLaMA. Conservative numbers.

Metric | MacBook Pro M4 Max (128 GB unified · MLX · Q4) | RTX 6000 Ada (48 GB · AWQ-4bit · workstation) | 1/4 B200 · GPU Solutions (MIG · 48 GB HBM3e · native FP8)
Available memory | ≈ 96 GB usable | 48 GB GDDR6 | 48 GB HBM3e
Memory bandwidth | 546 GB/s | 960 GB/s | ≈ 2 TB/s
Peak compute | 34 TFLOPS FP16 | 365 TFLOPS FP8 | 1.1 PFLOPS FP8
Llama 3.3 70B | 12 tok/s | 36 tok/s | 115 tok/s
Qwen 3.6 Coder 32B | 48 tok/s | 88 tok/s | 320 tok/s
GLM 5.1 235B · MoE | 22 tok/s | 62 tok/s | 205 tok/s
TTFT · 2k prompt | 820 ms | 450 ms | 95 ms
Concurrent devs | 1 | 1-2 | 3-5
Context | Senior engineer laptop | Workstation ~€8,500 | From €750/month · no CapEx

LLM inference is memory-bandwidth bound, not FLOPS-bound. HBM3e delivers ~2× the bandwidth of RTX 6000 Ada's GDDR6 and ~4× the M4 Max's unified memory — that's why a B200 slice beats both on the same models. Large models (72B+, MoE) don't fit on workstations without quality loss. On B200 they fit at native FP8 precision.
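
A rough footprint check makes the capacity point concrete. The sketch below counts only weight bytes (no KV cache or activations), so real requirements are higher; figures are approximate:

```python
# Rough weight-only memory footprint per model and precision (GB).
# Ignores KV cache and activations, so treat these as lower bounds.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "Q4": 0.5}
MODELS_B = {"Qwen 3.6 Coder 32B": 32, "Llama 3.3 70B": 70}
DEVICES_GB = {"RTX 6000 Ada": 48, "1/4 B200": 48, "1/2 B200": 96, "Full B200": 192}

for name, params_b in MODELS_B.items():
    for prec, bpp in BYTES_PER_PARAM.items():
        weights_gb = params_b * bpp
        fits = [d for d, gb in DEVICES_GB.items() if weights_gb <= gb]
        print(f"{name} @ {prec}: ~{weights_gb:.0f} GB -> fits: {fits or 'none'}")
```

At FP8 a 70B model needs ~70 GB of weights alone, which is why it lands on a 1/2 or full B200 rather than a 48 GB workstation card, where only a 4-bit quantization fits.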

Why a dedicated slice

Your AI, inside your perimeter. No exceptions.

With a public API, your prompts train the next model and your data crosses three continents before returning. With a dedicated slice in Madrid, nothing leaves. Same model, isolated environment, compliance by design — and on top, 10× faster.

What happens in your slice, stays in your slice

Privacy, compliance and sovereignty built in. Not add-ons.

01

Data in Spain, 100%

Prompts, embeddings and responses never leave Madrid. Zero CLOUD Act exposure, zero US sub-processors, zero international transfers for Legal to sign.

02

Private model and context

Your B200 slice is yours with MIG hardware isolation. Your inputs don't train the next model, and your throughput doesn't depend on the tenant next door. Nobody else touches your weights.

03

ISO 27001 + ENS Medium included

Your auditor gets the certificates directly. Your CISO closes due diligence without expanding the SoA. No extra audits, no ambiguous DPAs.

04

Dedicated endpoint, not shared

Private HTTPS with mTLS + VPN, only reachable from your IPs. No enforced rate limits, no inference queues. The latency is yours, 24/7 (see the client sketch after this list).

05

InfiniBand co-location

Your pod, your storage and your tokens live in the same rack, wired over InfiniBand. Fewer hops, lower latency, zero cross-region egress. Your multi-step agent doesn't choke on the network.
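
As an illustration of what the dedicated endpoint in point 04 looks like from your side, here is a minimal Python call over mTLS. The hostname, certificate paths, and request schema are placeholders, not the actual GPU Solutions interface:

```python
# Minimal mTLS call to a private inference endpoint (hypothetical URL and schema).
# The client certificate proves your identity; the CA bundle pins the server.
import requests

resp = requests.post(
    "https://inference.example-slice.internal/v1/chat/completions",  # placeholder
    cert=("client.crt", "client.key"),   # your client certificate + private key
    verify="private-ca.pem",             # private CA that signed the endpoint cert
    json={
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "Summarise this function..."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```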

The analogy

Madrid → New York is the same 5,750 km. By ship or by plane.

By ship

5,750 km

10 days

By plane

5,750 km

7 hours

Nobody pays for kilometers. You pay to arrive on time.

Same thing in AI

One million Llama 3.3 70B tokens. Depending on where it runs.

MacBook M4 Max · 12 t/s

1M tokens

23 hours

RTX 6000 Ada · 36 t/s

1M tokens

8 hours

1/4 B200 at GPU Solutions · 115 t/s

1M tokens

2.4 hours

Same work done. One tenth the time your team spends waiting.
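
The arithmetic behind those figures is just tokens divided by sustained throughput:

```python
# Hours to generate 1M tokens at each sustained decode speed from the cards above.
for label, tok_per_s in {"MacBook M4 Max": 12, "RTX 6000 Ada": 36, "1/4 B200": 115}.items():
    hours = 1_000_000 / tok_per_s / 3600
    print(f"{label}: {hours:.1f} h")   # ~23.1 h, ~7.7 h, ~2.4 h
```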

And time pays for itself too

Operational savings are a side effect. They still cover the slice 5× over.

01

Team

10 devs

× €80/h

02

Idle time

30 min/day

× 220 workdays

03

Annual cost lost

€88,000

1,100 h/year idle

04

Annual 1/4 slice

€14,280

1/4 slice reserved

Return on time

+ €73,720/year

≈ 5× the slice cost
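
The same numbers, worked through:

```python
# Back-of-the-envelope ROI: idle developer time vs. the annual cost of a 1/4 slice.
devs, rate_eur_h = 10, 80
idle_h_per_day, workdays = 0.5, 220

idle_hours = devs * idle_h_per_day * workdays   # 1,100 h/year
idle_cost = idle_hours * rate_eur_h             # €88,000/year lost to waiting
slice_cost = 1_190 * 12                         # €14,280/year, reserved 1/4 slice
print(f"Idle cost €{idle_cost:,.0f} · slice €{slice_cost:,.0f} · net €{idle_cost - slice_cost:,.0f}")
```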

The real reason to switch is sovereignty and compliance. Recovered time is the bonus that wins over Finance.

Your data, your model, your latency. And your team stops waiting, too.

Mix them

Three modes. You build the combo.

Reserve a slice for your own model. Add hourly bursts when traffic spikes. And call Token Factory endpoints for a big model when you don't want to manage the GPU. All in the same cluster, all sovereign, each line billed separately, no surprises.

01 / Reserved · €/month

€/month · dedicated GPU

Fixed monthly fee for a 24/7 MIG slice. The GPU is yours: start and stop whenever without losing the assignment. Ideal for dev teams and stable production.

Best for stable production

02 / On-demand · €/hour

€/hour · pay as you use

Spin up a slice or a full GPU and pay hourly until you shut it down. No commitment, no reservation needed. Available immediately via dashboard or API.

Best for spikes and POCs

03 / Endpoints · €/1M tokens

€/1M tokens · Token Factory

Pay only for the tokens the model generates. No GPU management. Call the private HTTPS endpoint from your app. Perfect for variable-scale production inference.

Best for product inference
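
If the endpoint speaks an OpenAI-compatible API (an assumption here, not a confirmed detail of Token Factory), pointing an existing client at the private base URL is usually the only change in app code:

```python
# Sketch: swapping a public API for a private endpoint in application code.
# Assumes an OpenAI-compatible surface; base URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://tokens.example-madrid.internal/v1",  # placeholder private endpoint
    api_key="YOUR_PROJECT_KEY",
)
out = client.chat.completions.create(
    model="qwen-3.6-coder-32b",
    messages=[{"role": "user", "content": "Write a unit test for parse_invoice()."}],
)
print(out.choices[0].message.content)
```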

GPU Compute with MIG

From 1/4 to full cluster. Always dedicated.

Three MIG slice sizes (1/4, 1/2, full GPU), plus the HGX 8× cluster for training and enterprise workloads. Same API, same per-slice latency, scale from prototype to production without migration.

01 / Slice

1/4

B200

Memory · 48 GB HBM3e
Bandwidth · ≈ 2 TB/s

Coding assistant for 3-5 devs · light fine-tuning · models up to 70B with large context. The entry point.

Reserved

€1,190/month

On-demand

€1.95/hour

Get started

02 / Half

1/2

B200

Most popular
Memory · 96 GB HBM3e
Bandwidth · ≈ 4 TB/s

Real production for 8-12 devs · 70B inference at native FP8 precision · training of small-to-mid models.

Reserved

€2,290/month

On-demand

€3.95/hour

Talk to sales

03 / Full B200

1 ×

B200

Memory · 192 GB HBM3e
Bandwidth · 8 TB/s

72B models at FP8 full precision · high-throughput inference for teams of 15+ devs · distributed training.

Reserved

€5,990/month

On-demand

€7.90/hour

Talk to sales

04 / HGX Cluster

8 × B200

8× B200 with intra-node NVLink 5 and inter-node InfiniBand NDR · foundation model training · inference at scale · dedicated enterprise compliance.

Memory · 1.5 TB HBM3e
Bandwidth · 64 TB/s aggregate
Talk to sales

Token Factory

The latest open-source models. Served fast.

We charge a bit more per million tokens. In return, your prompts and context never leave Madrid, and tokens are generated in the same cluster where your pod lives, wired over InfiniBand. More sovereignty and, because the endpoint sits right next to your pod, more speed.

Model | Params | Context | Input / 1M | Output / 1M | Speed (1/4 B200)
GLM 5.1 (new) | 235B · MoE | 200k | €0.90 | €2.40 | 180 t/s
Qwen 3.6 | 72B | 256k | €0.70 | €1.80 | 140 t/s
Qwen 3.6 Coder (coding) | 32B | 256k | €0.40 | €1.10 | 320 t/s
Qwen 3.6 (fast) | 14B | 128k | €0.20 | €0.55 | 540 t/s
Llama 3.3 | 70B | 128k | €0.60 | €1.60 | 115 t/s
DeepSeek V3.5 (fast) | 236B · MoE | 128k | €0.45 | €1.20 | 220 t/s
Mistral Large 3 | 123B | 128k | €0.85 | €2.20 | 95 t/s

Prices in euros per million tokens, pay-as-you-go, public list for retail volume. Speed in tokens/second single-user on a 1/4 B200 slice; 1/2 and full scale proportionally. High volume or your own fine-tuned model on a dedicated slice? We deploy on a private endpoint at a negotiated rate — ask us.
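
To estimate a monthly bill from the list prices above, a quick sketch with an assumed input/output split:

```python
# Monthly Token Factory cost estimate from the list prices above (€ per 1M tokens).
PRICES = {  # model: (input €/1M, output €/1M)
    "Qwen 3.6 Coder 32B": (0.40, 1.10),
    "Llama 3.3 70B": (0.60, 1.60),
}

def monthly_cost(model, input_m_tokens, output_m_tokens):
    p_in, p_out = PRICES[model]
    return input_m_tokens * p_in + output_m_tokens * p_out

# Example: 300M input + 60M output tokens of coding assistance per month.
print(f"€{monthly_cost('Qwen 3.6 Coder 32B', 300, 60):,.2f}")  # €186.00
```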

Where your code lives

Three places. One gives you all of them.

There's no always-right option. There's one that combines speed, privacy, and capacity — and two that force a trade-off.

01 / On your laptop

Local, on your machine

Maximum physical privacy — nothing leaves the device — but bounded by RAM and bandwidth. Big models don't fit or run slowly. Your laptop is unusable during inference.

Speed: 15
Privacy: 70
Model capacity: 20

Gains privacy · loses speed and capacity

02 / Public API

Third-party API

Fast, with powerful models, but every prompt travels to someone else's servers, with variable retention policies and jurisdiction that shifts per provider. Internal compliance will cost you hours.

Speed: 80
Privacy: 15
Model capacity: 85

Gains speed · loses privacy

03 / Your slice at GPU Solutions · Balanced

Dedicated cluster in Madrid

B200 cluster speed with HBM3e, latest-gen models at native precision, VM-level isolation. Prompts and code are processed here. 100% Spanish data residency, ISO 27001 and ENS Medium certified.

Speed: 95
Privacy: 100
Model capacity: 100

Speed · privacy · capacity

All plans include

ISO 27001 + ENS Medium
100% data in Spain
VM-level isolation
Encrypted storage
Support in Spanish and English
No vendor lock-in

Tailored proposal

Every use case is different. Tell us what you want to do and we'll send you a concrete proposal in under 24 hours.

Request proposal