With Privatemode, you get on-prem-like privacy combined with cloud-like convenience and flexibility.
Privatemode is the first AI API service to offer true end-to-end privacy using hardware-based confidential computing. Simply run the Privatemode Encryption Proxy, which provides an OpenAI-compatible API while seamlessly encrypting your data.
Your data remains encrypted at all times—only the AI can process it within its secure, confidential computing environment.
docker run -p 8080:8080 \
  ghcr.io/edgelesssys/continuum/continuum-proxy:latest \
  --apiKey <your-api-token>
The proxy verifies the integrity of the Privatemode service using remote attestation based on confidential computing. It also encrypts all data before sending it and decrypts the data it receives.
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4",
    "messages": [
      {
        "role": "system",
        "content": "Hello Privatemode!"
      }
    ]
  }'
import requests


def run_continuum(url, port, prompt):
    endpoint = f"http://{url}:{port}/v1/chat/completions"
    payload = {
        "model": "ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4",
        "messages": [{"role": "system", "content": prompt}],
    }
    response = requests.post(endpoint, headers={"Content-Type": "application/json"}, json=payload)
    response.raise_for_status()  # Raise an error for bad responses
    return response.json()


if __name__ == "__main__":
    try:
        response = run_continuum("localhost", 8080, "Hello Privatemode")
        print(response["choices"][0]["message"]["content"])
    except Exception as e:
        print(f"Error: {e}")
import fetch from "node-fetch";

async function runPrivatemode(url, port, prompt) {
  const endpoint = `http://${url}:${port}/v1/chat/completions`;
  const payload = {
    model: "ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4",
    messages: [{ role: "system", content: prompt }],
  };

  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    if (!response.ok) {
      throw new Error(`Error ${response.status}: ${response.statusText}`);
    }
    return await response.json();
  } catch (error) {
    throw new Error(`Request failed: ${error.message}`);
  }
}

// Example usage
(async () => {
  try {
    const response = await runPrivatemode("localhost", 8080, "Hello Privatemode");
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.log(`Error: ${error.message}`);
  }
})();
The proxy is compatible with the OpenAI Chat Completions API.
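Because of this compatibility, existing OpenAI client code should only need a new base URL. The following is a minimal sketch, assuming the official openai Python package and the proxy running locally on port 8080; the api_key value here is a placeholder assumption, since the proxy itself was started with your Privatemode API token:

from openai import OpenAI

# Point the standard OpenAI client at the local Privatemode proxy.
# The api_key value is a placeholder: authentication is handled by
# the proxy, which holds your Privatemode API token.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

completion = client.chat.completions.create(
    model="ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4",
    messages=[{"role": "system", "content": "Hello Privatemode!"}],
)
print(completion.choices[0].message.content)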
You now have programmatic access to an end-to-end secure AI. Process your own sensitive data or provide trustworthy services to others—the possibilities are endless.
Privatemode enforces hardware-based runtime encryption and remote attestation at every stage, leveraging AMD SEV-SNP and Nvidia H100 confidential-computing features.
Access Llama 3.3 70B (quantized) or other high-performance models like DeepSeek R1 (coming soon).
Process more than 1,000 tokens per second with consistent, low-latency responses.
Seamlessly switch from OpenAI to our compatible chat API.
Monitor token usage in real time and pay only for what you use (see the sketch below).
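As one way to track consumption from code, OpenAI-style chat responses include a usage object with token counts. Assuming the proxy forwards this standard field (an assumption, not confirmed here), a minimal sketch reusing run_continuum from the Python example above:

# Minimal sketch: read token counts from an OpenAI-style response.
# Assumes the proxy's responses include the standard "usage" field.
response = run_continuum("localhost", 8080, "Hello Privatemode")
usage = response.get("usage", {})
print(f"prompt tokens:     {usage.get('prompt_tokens')}")
print(f"completion tokens: {usage.get('completion_tokens')}")
print(f"total tokens:      {usage.get('total_tokens')}")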
With Privatemode, you get a dynamically scalable API that offers the same confidentiality as an on-premise solution – minus the infrastructure costs and complexity.
Privatemode provides a turnkey technical answer to internal and external questions about data privacy, data security, and compliance in connection with AI.
With Privatemode, you can focus on building your product instead of building out your own AI model hosting capabilities.
With Privatemode, you can process even your sensitive data with AI.