Private ChatGPT for Your Team in a Day: Open WebUI + Ollama + LDAP Auth

Your team uses ChatGPT. Some people are pasting client contracts into it. Someone else dumped the company’s database schema in there to get SQL help. Legal has no idea this is happening, and neither does your CISO — yet.

This is the real reason to self-host your own AI stack. Not because it’s cheaper (though it is, once the hardware pays for itself). It’s because your data never leaves your network, you control which models run, and you can wire it straight into your existing directory service so access is tied to employment status like any other internal tool.

This guide gets you from zero to a production-hardened Open WebUI + Ollama deployment with LDAP/Active Directory authentication in a single working day. No cloud dependencies, no per-seat SaaS billing, no mystery about where your prompts end up.

Official repos you’ll need:

Ollama: https://github.com/ollama/ollama
Open WebUI: https://github.com/open-webui/open-webui

What You’re Building

The stack is straightforward:

Ollama — handles model downloads, inference, and the OpenAI-compatible API. Runs entirely on your hardware.
Open WebUI — the ChatGPT-like frontend. Conversation history, user management, model switching, RAG pipeline. Connects to Ollama over the internal Docker network.
Nginx — TLS termination and reverse proxy. Keeps Ollama’s API off the internet entirely.
Your existing LDAP/AD — Open WebUI talks to it directly at login time. No separate SSO service needed.

Ollama is never exposed externally. Open WebUI is the only surface reachable from the browser. Nginx enforces TLS and rate-limits every request that reaches Open WebUI.

Hardware Requirements

You need to be honest about this before buying anything. The GPU is the bottleneck, not the CPU.

Model size	VRAM needed	Minimum RAM	Comfortable RAM
7–8B (Mistral 7B, Llama 3.1 8B)	6–8 GB	16 GB	16 GB
13B (Llama 2 13B)	10–12 GB	32 GB	32 GB
32B (Qwen 2.5 32B)	20–24 GB	64 GB	64 GB
70B (Llama 3.3 70B)	40–48 GB	96 GB	128 GB

CPU-only inference works, but anything above 7B becomes genuinely painful for users. A single NVIDIA RTX 3090/4090 covers 7B and 13B models comfortably for a small team of 10–20 people. For larger teams or bigger models, look at used A100s or multi-GPU setups.
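The VRAM column can be sanity-checked with a rough rule of thumb: weights take about (parameters × bits per weight ÷ 8) bytes, plus some headroom for the KV cache and runtime buffers. The sketch below assumes a 20% overhead factor, which is a guess rather than a measured value, and estimate_vram_gb is a hypothetical helper name. The table above leaves more headroom than this, particularly for longer contexts.

```shell
#!/usr/bin/env bash
# estimate_vram_gb PARAMS_BILLIONS BITS_PER_WEIGHT [OVERHEAD_FACTOR]
# Weights need params * bits / 8 bytes. The overhead factor (assumed 1.2)
# is a rough allowance for KV cache and runtime buffers at short contexts.
estimate_vram_gb() {
  local params_b="$1" bits="$2" overhead="${3:-1.2}"
  awk -v p="$params_b" -v b="$bits" -v o="$overhead" \
    'BEGIN { printf "%.1f\n", p * b / 8 * o }'
}

estimate_vram_gb 7 4     # 4-bit 7B model, prints 4.2
estimate_vram_gb 70 4    # 4-bit 70B model, prints 42.0
```

Passing 16 as the second argument shows why unquantized models are rarely practical on consumer cards.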

Software prerequisites:

Docker Engine 24+ and Docker Compose v2
NVIDIA Container Toolkit if you’re using a GPU (nvidia-container-toolkit)
A domain with a valid TLS cert (Let’s Encrypt is fine)
LDAP/AD server reachable from the Docker host

Project Layout

ai-stack/
├── docker-compose.yml
├── .env
├── nginx/
│   ├── nginx.conf
│   └── ssl/
│       ├── fullchain.pem
│       └── privkey.pem
└── data/
    ├── ollama/
    └── openwebui/

Create it:

mkdir -p ai-stack/{nginx/ssl,data/ollama,data/openwebui}
cd ai-stack

The Docker Compose File

# docker-compose.yml
services:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    # Remove the deploy block entirely if you have no GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ./data/ollama:/root/.ollama
    networks:
      - ai-internal
    # Ollama listens on 11434 but we DO NOT expose this port externally.
    # Open WebUI reaches it via the internal Docker network only.

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    volumes:
      - ./data/openwebui:/app/backend/data
    networks:
      - ai-internal
    environment:
      # Point at Ollama over the internal network
      - OLLAMA_BASE_URL=http://ollama:11434

      # LDAP configuration
      - ENABLE_LDAP=${LDAP_ENABLED}
      - LDAP_SERVER_HOST=${LDAP_HOST}
      - LDAP_SERVER_PORT=${LDAP_PORT}
      - LDAP_USE_TLS=${LDAP_USE_TLS}
      - LDAP_CA_CERT_FILE=${LDAP_CA_CERT}
      - LDAP_ATTRIBUTE_FOR_USERNAME=${LDAP_UID_ATTR}
      - LDAP_APP_DN=${LDAP_BIND_DN}
      - LDAP_APP_PASSWORD=${LDAP_BIND_PASSWORD}
      - LDAP_SEARCH_BASE=${LDAP_SEARCH_BASE}
      - LDAP_SEARCH_FILTERS=${LDAP_SEARCH_FILTER}

      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - WEBUI_URL=https://${DOMAIN}

      # Disable open registration — LDAP users only
      - ENABLE_SIGNUP=false

      # Optional: default models for new users
      - DEFAULT_MODELS=${DEFAULT_MODELS}

  nginx:
    image: nginx:stable-alpine
    container_name: nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - open-webui
    networks:
      - ai-internal

networks:
  ai-internal:
    driver: bridge
    # Ordinary bridge network: containers on it are reachable from outside
    # only through the nginx container's published ports. It is not Docker's
    # `internal: true`, so Ollama can still reach out to download models.

The .env File

# .env — keep this out of git (add to .gitignore immediately)

DOMAIN=ai.yourcompany.com

# Generate with: openssl rand -hex 32
WEBUI_SECRET_KEY=replace_with_random_64_char_hex

# LDAP / Active Directory
LDAP_ENABLED=true
LDAP_HOST=ldap.yourcompany.com
LDAP_PORT=636
LDAP_USE_TLS=true
# Path inside the container if you mount a custom CA cert, otherwise leave empty
LDAP_CA_CERT=

# Bind account — use a read-only service account, never a domain admin
LDAP_BIND_DN=cn=svc-openwebui,ou=ServiceAccounts,dc=yourcompany,dc=com
LDAP_BIND_PASSWORD=your_service_account_password

LDAP_SEARCH_BASE=ou=Users,dc=yourcompany,dc=com
# Filter to an AD security group for granular access control
LDAP_SEARCH_FILTER=(memberOf=CN=AI-Users,ou=Groups,dc=yourcompany,dc=com)

# The LDAP attribute that becomes the username in Open WebUI
# Use 'uid' for OpenLDAP, 'sAMAccountName' for Active Directory
LDAP_UID_ATTR=sAMAccountName

DEFAULT_MODELS=llama3.2:latest

Gotcha — service account permissions: Never bind as a domain admin. Create a dedicated read-only service account with permissions scoped to search the user OU. If that account’s password leaks, the blast radius is a search query, not a domain compromise.
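Before the first boot, it can be worth checking that the .env file is locked down and that none of the values the compose file depends on were left blank. The following is a minimal sketch: check_env is a hypothetical helper, the key list is taken from the .env shown above, and the stat invocation assumes GNU coreutils (Linux).

```shell
#!/usr/bin/env bash
# check_env FILE: fail if FILE is not mode 600 or a required key is empty.
check_env() {
  local file="$1" key mode
  # GNU stat syntax; on macOS/BSD the equivalent is: stat -f '%Lp' FILE
  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ]; then
    echo "permissions on $file are $mode, expected 600" >&2
    return 1
  fi
  for key in DOMAIN WEBUI_SECRET_KEY LDAP_HOST LDAP_BIND_DN \
             LDAP_BIND_PASSWORD LDAP_SEARCH_BASE; do
    # Require "KEY=" followed by at least one character
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key is missing or empty in $file" >&2
      return 1
    fi
  done
}

# Usage:
#   chmod 600 .env && check_env .env && docker compose up -d
```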

Nginx Configuration

# nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    # Rate limiting: 10 req/s per IP, burst up to 20
    limit_req_zone $binary_remote_addr zone=webui:10m rate=10r/s;

    # Redirect all HTTP to HTTPS
    server {
        listen 80;
        server_name ai.yourcompany.com;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        server_name ai.yourcompany.com;

        ssl_certificate     /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;

        # Modern TLS only — drop anything below 1.2
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;
        ssl_session_cache shared:SSL:10m;

        # Security headers
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
        add_header X-Content-Type-Options nosniff;
        add_header X-Frame-Options SAMEORIGIN;
        add_header Referrer-Policy strict-origin-when-cross-origin;

        # Proxy to Open WebUI
        location / {
            limit_req zone=webui burst=20 nodelay;

            proxy_pass http://open-webui:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

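            # WebSocket upgrade headers, used by Open WebUI's real-time features
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Pass streamed tokens through as they arrive instead of buffering
            proxy_buffering off;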

            # Long timeout for slow model inference
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
    }
}

Gotcha — streaming timeout: The default Nginx proxy timeout is 60 seconds. A 70B model generating a long response can easily blow past that, and the user gets a broken connection mid-stream. Set proxy_read_timeout to at least 300s. For very large models on slower hardware, go higher.
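Once the stack is running, the security headers from the config above can be verified from any machine that can reach the proxy. This is a small sketch, not a full TLS audit: check_headers is a hypothetical helper that reads response headers on stdin, and the header list simply mirrors the add_header lines in nginx.conf.

```shell
#!/usr/bin/env bash
# check_headers: read HTTP response headers on stdin and fail if any of the
# security headers set in nginx.conf is missing. Matching is case-insensitive.
check_headers() {
  local headers name missing=0
  headers=$(cat)
  for name in Strict-Transport-Security X-Content-Type-Options \
              X-Frame-Options Referrer-Policy; do
    if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
      echo "missing header: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (hostname is the placeholder used throughout this guide):
#   curl -sI https://ai.yourcompany.com | check_headers
```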

First Boot and Model Setup

Bring the stack up:

docker compose up -d
docker compose logs -f open-webui

Wait until you see Application startup complete in the Open WebUI logs.

Pull your first model. Connect to the Ollama container directly:

# Pull Llama 3.2 3B — fast, fits in 4GB VRAM, good for testing
docker exec -it ollama ollama pull llama3.2

# Pull Llama 3.1 8B for better quality
docker exec -it ollama ollama pull llama3.1:8b

# Verify what's available
docker exec -it ollama ollama list

Model files land in ./data/ollama/models. A 7B model at Ollama’s default 4-bit quantization is roughly 4 GB and downloads in a few minutes on a fast connection.

LDAP Wiring — What Actually Happens

Open WebUI does not cache LDAP credentials. Every login attempt binds to your directory using the service account, searches for the user, and if found, verifies the password against the directory directly. This means:

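Disabling an account in AD takes effect at that user’s next login attempt
Sessions that are already open are a separate matter: their tokens stay valid until they expire, so check your JWT_EXPIRES_IN setting if offboarding has to cut access quickly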
The LDAP_SEARCH_FILTER is your access control list — scope it to an AD group and only members of that group can log in

For Active Directory, your filter will look like:

(memberOf=CN=AI-Users,OU=Groups,DC=yourcompany,DC=com)

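For OpenLDAP, group membership usually lives on the group entry (memberUid on a posixGroup) rather than on the user, so a filter applied to user entries needs the memberOf overlay enabled on the server:

(memberOf=cn=ai-users,ou=groups,dc=yourcompany,dc=com)

In current Open WebUI releases the filter appears to be ANDed with a match on LDAP_ATTRIBUTE_FOR_USERNAME at query time, so it should not need a username placeholder such as %s. Confirm this against the version you deploy.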

Gotcha — nested group membership in AD: AD’s memberOf attribute is not recursive by default at the LDAP level. If your user is in AI-Users via a nested group, the filter will fail. Use the memberOf:1.2.840.113556.1.4.1941:= LDAP_MATCHING_RULE_IN_CHAIN syntax to handle nesting: (memberOf:1.2.840.113556.1.4.1941:=CN=AI-Users,OU=Groups,DC=yourcompany,DC=com).
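To check whether a particular user matches the access filter before involving Open WebUI at all, you can AND the filter with a username match and run it with ldapsearch under the service account. The sketch below only builds the filter string; user_filter is a hypothetical helper, jdoe is a made-up username, and the DNs are the placeholders from the .env above.

```shell
#!/usr/bin/env bash
# user_filter ATTR USERNAME ACCESS_FILTER
# Prints an LDAP filter that matches one user AND the access filter.
# USERNAME is inserted as-is, so only pass trusted input.
user_filter() {
  printf '(&(%s=%s)%s)\n' "$1" "$2" "$3"
}

# Usage: bind as the service account (-W prompts for its password) and
# request no attributes (1.1), so only matching DNs are printed.
#   ldapsearch -H ldaps://ldap.yourcompany.com:636 \
#     -D "cn=svc-openwebui,ou=ServiceAccounts,dc=yourcompany,dc=com" -W \
#     -b "ou=Users,dc=yourcompany,dc=com" \
#     "$(user_filter sAMAccountName jdoe \
#        '(memberOf=CN=AI-Users,OU=Groups,DC=yourcompany,DC=com)')" 1.1
```

An empty result means either the bind account cannot see the user or the user does not satisfy the group filter, which narrows down most "Invalid credentials" reports.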

First Admin Account

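On the very first startup, navigate to https://ai.yourcompany.com and create the admin account manually; the first account created becomes the admin. This is the local fallback admin, so keep the credentials in your password manager. The compose file above already sets ENABLE_SIGNUP=false. If the signup form is refused on first boot, set it to true temporarily, create the account, then set it back. Apply the change with docker compose up -d, which recreates the container; a plain restart does not pick up changed environment variables. All later logins go through LDAP.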

The admin account can:

Manage which models are visible to which users/groups
Set per-user or per-group rate limits and context window sizes
Enable or disable features like web search, image generation, document RAG

GPU Passthrough — Common Pitfalls

If you have an NVIDIA GPU, install the toolkit before bringing the stack up:

# Debian/Ubuntu
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify the container sees the GPU:

docker exec -it ollama nvidia-smi

Gotcha — GPU not visible after toolkit install: If nvidia-smi works on the host but fails inside the container, you likely forgot to restart the Docker daemon after nvidia-ctk runtime configure. Also double-check that your Docker Compose version supports the deploy.resources.reservations.devices syntax — anything below Compose v2 silently ignores it.

Gotcha: model loads into RAM instead of VRAM. When a model doesn’t fit in VRAM, Ollama runs some or all of its layers on the CPU. This isn’t an error, but inference can be 10–20x slower. Check docker exec -it ollama ollama ps: the PROCESSOR column shows the split (100% GPU, or a CPU/GPU percentage pair), and the server logs report how many layers were offloaded to the GPU.
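The CPU-fallback check is easy to script for a cron job or monitoring probe. The sketch below assumes the current ollama ps column layout (NAME, ID, SIZE, PROCESSOR, UNTIL) and treats anything other than "100% GPU" as a fallback; cpu_fallback_models is a hypothetical helper name. Drop the -it flags when piping, since a TTY adds carriage returns to the output.

```shell
#!/usr/bin/env bash
# cpu_fallback_models: read `ollama ps` output on stdin and print the name of
# every loaded model whose row does not say "100% GPU".
# Exits non-zero if at least one such model is found.
cpu_fallback_models() {
  awk 'NR > 1 && NF > 0 && $0 !~ /100% GPU/ { print $1; found = 1 }
       END { exit (found ? 1 : 0) }'
}

# Usage:
#   docker exec ollama ollama ps | cpu_fallback_models \
#     || echo "at least one model is running partly on CPU" >&2
```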

Persistent Storage and Backups

The only two directories you need to back up:

./data/ollama/     — model weights (large, but re-downloadable if needed)
./data/openwebui/  — conversation history, user accounts, settings (irreplaceable)

The OpenWebUI data directory contains a SQLite database (webui.db). Back it up with:

# Safe while the container is running: .backup uses SQLite's online backup API
sqlite3 ./data/openwebui/webui.db ".backup '/backup/webui-$(date +%Y%m%d).db'"

Add that to a cron job. Daily is enough for most teams.
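One wrinkle with cron: a literal % in a crontab command is treated as a newline unless escaped, so the date format in the command above breaks if pasted in directly. Wrapping the backup in a script avoids that and leaves room for retention. The following is a sketch under assumptions: the function names, the /backup path, and the 14-day retention are all illustrative, and sqlite3 must be installed on the host.

```shell
#!/usr/bin/env bash
# backup_webui_db DB_PATH BACKUP_DIR: online copy of the SQLite DB into a
# dated file such as webui-20250101.db.
backup_webui_db() {
  local db="$1" dir="$2"
  mkdir -p "$dir"
  sqlite3 "$db" ".backup '$dir/webui-$(date +%Y%m%d).db'"
}

# prune_backups BACKUP_DIR DAYS: delete backups last modified more than
# DAYS days ago. Only touches files named webui-*.db directly in BACKUP_DIR.
prune_backups() {
  find "$1" -maxdepth 1 -name 'webui-*.db' -mtime +"$2" -delete
}

# Usage, e.g. at the end of a script that cron runs daily:
#   backup_webui_db ./data/openwebui/webui.db /backup && prune_backups /backup 14
```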

Model weights can be excluded from frequent backups — they’re large and re-downloadable. Just keep a note of which models you’re running (docker exec ollama ollama list).

Gotchas: The List

LDAP over TLS with self-signed certs. If your internal LDAP uses a certificate from your own CA, you need to mount that CA cert into the Open WebUI container and set LDAP_CA_CERT to its path. Without it, the TLS handshake fails silently and users just see "Invalid credentials" with nothing useful in the logs.

# In docker-compose.yml, under open-webui volumes:
- ./certs/internal-ca.pem:/certs/internal-ca.pem:ro

# In .env:
LDAP_CA_CERT=/certs/internal-ca.pem

Ollama listens on 0.0.0.0 inside the container. The official Docker image binds to all interfaces so that other containers can reach it. If you ever publish port 11434 in Docker Compose (maybe for debugging), it’s reachable from anywhere that can reach the host, with zero authentication. Never publish that port in production. Ollama has no auth layer.
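A debugging port mapping is easy to leave behind, so a pre-deploy check can help. The sketch below is a text heuristic over the compose file, not a YAML parser: publishes_ollama_port is a hypothetical helper that only recognizes short-form port entries such as "11434:11434", and it will miss the long-form ports syntax or single-quoted entries. Running docker compose config and reading the resolved ports is the more reliable check.

```shell
#!/usr/bin/env bash
# publishes_ollama_port FILE: exit 0 if FILE has a short-form ports entry
# whose container port is 11434, e.g.  - "11434:11434"  or
# - 127.0.0.1:11434:11434 . Environment lines such as
# - OLLAMA_BASE_URL=http://ollama:11434 do not match.
publishes_ollama_port() {
  grep -Eq '^[[:space:]]*-[[:space:]]*"?([0-9.]+:){0,2}11434(/(tcp|udp))?"?[[:space:]]*$' "$1"
}

# Usage:
#   if publishes_ollama_port docker-compose.yml; then
#     echo "WARNING: Ollama port 11434 is published" >&2
#   fi
```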

Open WebUI container restarts reset in-memory state, not DB state. Conversations and users persist in the SQLite DB in the mounted volume. When an upgraded image changes the schema, migrations run on startup, and rolling back to an older image afterward may not work. Back up webui.db and check the Open WebUI release notes before pulling main in production.

Model context windows and VRAM. Loading a 13B model with a 128K context window requires significantly more VRAM than the same model at 4K context. Open WebUI lets you set the context length per-session. Train your users to keep it reasonable or you’ll start seeing OOM errors on the GPU.

Production Hardening Checklist

Nginx rate limiting configured (done above)
HSTS header with includeSubDomains
Ollama port NOT published in Docker Compose
ENABLE_SIGNUP=false after admin account creation
LDAP service account is read-only, scoped to user OU
LDAP connection uses TLS (port 636, not 389)
.env file has 600 permissions, excluded from git
WEBUI_SECRET_KEY is random 64+ character hex
Daily backup of ./data/openwebui/webui.db
Log rotation configured for Docker container logs
Firewall: only ports 80 and 443 open externally

For log rotation, add this to /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}

Restart Docker after changing daemon config.

Day-Two Operations

Adding a new model: docker exec -it ollama ollama pull modelname. It appears in Open WebUI immediately. No restart needed.

Restricting model access by user role: Open WebUI has a role system — users can be Admins, regular Users, or pending. You can expose specific models only to certain roles from the admin panel under Workspace → Models.

Monitoring inference load: docker exec -it ollama ollama ps shows currently loaded models and how each is split between CPU and GPU. For proper metrics, Ollama’s HTTP API exposes endpoints such as /api/version and /api/ps (currently loaded models); wire them to Prometheus with a community exporter if you care about utilization dashboards.

Upgrading: Back up webui.db, pull the new image, bring down the stack, bring it back up. Open WebUI applies schema migrations automatically on startup. Watch the logs on first boot after an upgrade.

docker compose pull
docker compose down
docker compose up -d
docker compose logs -f open-webui

This stack handles a team of 20–50 comfortably on a single well-specced server. Your data stays on your hardware, access is tied to your existing directory, and you’re not dependent on any third-party uptime or pricing decisions. The whole thing costs whatever your server hardware costs, which after the first year is essentially nothing compared to per-seat SaaS pricing at scale.