Most Keycloak tutorials show you a single-node docker run command and call it a day. That works fine for a Saturday afternoon demo. The moment you point real users at it, you find out that Keycloak’s distributed session cache has opinions, your LDAP tree doesn’t map cleanly to its user model, and the default login page looks like it was designed in 2009. This article covers the whole path from that toy setup to something you’d actually trust with SSO for 50,000 users.
Official repo: https://github.com/keycloak/keycloak
The Architecture You’re Aiming For
Before touching a config file, commit the mental model: Keycloak nodes are interchangeable application servers. In-flight session and token state lives in an embedded Infinispan cache that replicates across cluster members, so any node can serve any request once the cluster has formed. The database (Postgres, ideally) holds realm config, user attributes from local storage, and persistent sessions. Your load balancer sits in front; sticky routing helps during the initial login redirects, but the rest of the session must not depend on it.
Get this wrong and you end up with intermittent 401s that reproduce only under load.
Database First, Always
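Keycloak will happily start against H2. H2 is a trap. The data lives in a file inside the container, so the first time the container is recreated your realm config is gone.
Use Postgres. Pin the schema migration to a specific Keycloak version. Never let two cluster nodes run migrations simultaneously: during upgrades, let exactly one node (or an init container that runs the migration as a job) apply the schema change, and start the others only after it finishes. The setting that controls this is the JPA migration-strategy SPI option (update, validate, or manual) rather than a KC_DB_MIGRATE variable; its exact name depends on the release, so check the upgrade guide for your version.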
# postgres.yml: run this before the Keycloak stack
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: ${KC_DB_PASSWORD}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U keycloak"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - kc_internal
volumes:
  pg_data:
networks:
  kc_internal:
    external: true
Clustered Keycloak with Docker Compose
The following compose file runs two Keycloak nodes. Infinispan cluster discovery uses the JGroups JDBC_PING protocol, selected here through the jdbc-ping cache stack: each node writes its member record to the Postgres database. This is reliable, requires zero multicast, and works across Docker networks, VMs, and cloud VPCs without special networking.
# keycloak-cluster.yml
services:
  kc1:
    image: quay.io/keycloak/keycloak:25.0.2
    # Plain `start` re-augments on boot. `start --optimized` only works with an
    # image pre-built via `kc.sh build` (KC_DB and providers are build-time options).
    command: start
    environment:
      # Database
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
      KC_DB_USERNAME: keycloak
      KC_DB_PASSWORD: ${KC_DB_PASSWORD}
      # Clustering: JDBC_PING uses the DB for member discovery, no multicast needed.
      # Verify that your release ships the built-in jdbc-ping stack; older releases
      # need a custom cache config file instead.
      KC_CACHE: ispn
      KC_CACHE_STACK: jdbc-ping
      # Note: the JGROUPS_DISCOVERY_* variables and java:jboss/... JNDI datasources
      # belong to the legacy WildFly image and have no effect on this Quarkus image.
      # Hostname: must be the public-facing URL, not the container hostname
      KC_HOSTNAME: https://auth.example.com
      KC_HOSTNAME_STRICT: "true"
      # TLS terminated at the load balancer. KC_PROXY is deprecated since Keycloak 24;
      # the replacement is KC_PROXY_HEADERS: xforwarded plus KC_HTTP_ENABLED: "true".
      KC_PROXY: edge
      # Admin credentials (first boot only; change via UI afterward)
      KEYCLOAK_ADMIN: ${KC_ADMIN_USER}
      KEYCLOAK_ADMIN_PASSWORD: ${KC_ADMIN_PASSWORD}
      # Logging
      KC_LOG_LEVEL: INFO
      KC_LOG: console
      # Performance tuning
      JAVA_OPTS_APPEND: >
        -Xms512m -Xmx1024m
        -XX:+UseG1GC
        -Djava.net.preferIPv4Stack=true
    volumes:
      - ./themes:/opt/keycloak/themes        # custom themes
      - ./providers:/opt/keycloak/providers  # custom SPIs / JARs
    networks:
      - kc_internal
      - kc_public
    # postgres is defined in postgres.yml, so load both files into one project:
    #   docker compose -f postgres.yml -f keycloak-cluster.yml up -d
    depends_on:
      postgres:
        condition: service_healthy
    restart: unless-stopped
  kc2:
    # Identical to kc1: Keycloak nodes are symmetric
    extends:
      service: kc1
  nginx:
    image: nginx:1.27-alpine
    volumes:
      - ./nginx/keycloak.conf:/etc/nginx/conf.d/default.conf:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    ports:
      - "443:443"
      - "80:80"
    networks:
      - kc_public
    depends_on:
      - kc1
      - kc2
    restart: unless-stopped
networks:
  kc_internal:
    external: true
  kc_public:
    driver: bridge
Nginx upstream config:
# nginx/keycloak.conf
upstream keycloak {
    # ip_hash for initial login redirect stickiness only
    ip_hash;
    server kc1:8080 fail_timeout=10s max_fails=3;
    server kc2:8080 fail_timeout=10s max_fails=3;
}

server {
    listen 443 ssl;
    http2 on;  # nginx 1.25.1+; replaces the deprecated "listen ... http2" form
    server_name auth.example.com;

    ssl_certificate     /etc/letsencrypt/live/auth.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/auth.example.com/privkey.pem;

    # Forward real client IP for Keycloak's audit logs
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header Host $host;

    location / {
        proxy_pass http://keycloak;
        proxy_connect_timeout 10s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}

server {
    listen 80;
    server_name auth.example.com;
    return 301 https://$host$request_uri;
}
Gotcha: KC_HOSTNAME must exactly match the URL your users hit, including the scheme. If it doesn’t, Keycloak will issue tokens with the wrong iss claim and every app will reject them immediately. Also, KC_PROXY: edge (deprecated since Keycloak 24 in favor of KC_PROXY_HEADERS: xforwarded) tells Keycloak to trust X-Forwarded-* headers. Without it, Keycloak sees all traffic as plain HTTP and breaks redirects.
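When apps start rejecting tokens, it helps to look at the iss claim directly. The helper below is a minimal debugging sketch, not part of Keycloak: the class name IssuerCheck and the regex-based extraction are assumptions made for illustration, and the signature is deliberately not verified, so never use it to accept a token.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper: reads the iss claim from a JWT payload.
class IssuerCheck {
    // Assumes the issuer is serialized as a plain JSON string without escapes.
    private static final Pattern ISS = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the iss claim. The signature is NOT verified (debugging only). */
    static String issuerOf(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a JWT");
        }
        // JWT segments are base64url-encoded, usually without padding.
        String payload = new String(
                Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = ISS.matcher(payload);
        if (!m.find()) {
            throw new IllegalArgumentException("no iss claim");
        }
        return m.group(1);
    }

    /** True if the token was issued under the URL the apps expect. */
    static boolean issuerMatches(String jwt, String expectedIssuer) {
        return expectedIssuer.equals(issuerOf(jwt));
    }
}
```

For the setup above, and a realm named your-realm, the expected issuer would be https://auth.example.com/realms/your-realm. A token carrying an internal address such as http://kc1:8080/realms/your-realm points at a hostname or proxy-header misconfiguration.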
Custom Themes
The default Keycloak UI screams "enterprise Java 2015." Your users shouldn’t have to suffer for it. Keycloak’s theme system is layered: you inherit from base or keycloak, override only what you need.
Theme structure for a login page:
themes/
└── my-company/
├── login/
│ ├── theme.properties # declares parent, imports
│ ├── resources/
│ │ ├── css/
│ │ │ └── login.css
│ │ └── img/
│ │ └── logo.png
│ └── templates/
│ └── login.ftl # overrides only the login form
└── account/
└── theme.properties # use keycloak.v3 for the new account console
themes/my-company/login/theme.properties:
parent=keycloak
import=common/keycloak
# Replace the default stylesheet, keep everything else from parent
styles=css/login.css
# Variables available in Freemarker templates
kcBodyClass=my-company-login
themes/my-company/login/templates/login.ftl — start by copying from the Keycloak source, then gut the parts you don’t need. The critical Freemarker variables:
<#-- login.ftl — stripped down to essentials -->
<#import "template.ftl" as layout>
<@layout.registrationLayout displayMessage=!messagesPerField.existsError('username','password') displayInfo=realm.password && realm.registrationAllowed && !registrationDisabled??; section>
<#if section = "header">
<img src="${url.resourcesPath}/img/logo.png" alt="${realm.displayName}" class="logo"/>
</#if>
<#if section = "form">
<form action="${url.loginAction}" method="post">
<input type="hidden" id="id-hidden-input" name="credentialId"
<#if auth.selectedCredential?has_content>value="${auth.selectedCredential}"</#if>/>
<div class="form-group">
<label for="username">${msg("usernameOrEmail")}</label>
<input id="username" name="username" type="text" autofocus autocomplete="username"
value="${(login.username!'')?html}"
class="<#if messagesPerField.existsError('username','password')>error</#if>"/>
</div>
<div class="form-group">
<label for="password">${msg("password")}</label>
<input id="password" name="password" type="password" autocomplete="current-password"
class="<#if messagesPerField.existsError('username','password')>error</#if>"/>
</div>
<input type="submit" value="${msg("doLogIn")}" class="btn-primary"/>
</form>
</#if>
</@layout.registrationLayout>
Assign the theme to your realm in Realm Settings → Themes → Login Theme.
Gotcha: Keycloak caches themes aggressively. During development, set KC_SPI_THEME_STATIC_MAX_AGE=-1 and KC_SPI_THEME_CACHE_THEMES=false. Forgetting to revert this in production is a performance tax you’ll pay on every page render.
User Federation with LDAP / Active Directory
"Just point it at LDAP" is how every federation story starts. Reality: your LDAP tree has structural quirks, group membership lives in memberOf attributes that need special handling, and service account permissions are rarely what Keycloak needs.
The cleanest path is a dedicated read-only bind account in LDAP with access scoped to exactly the OUs Keycloak needs. Never use a domain admin for this.
Key LDAP federation settings (via Admin UI or kcadm.sh):
# Create the LDAP federation using kcadm — reproducible in IaC
kcadm.sh create components -r your-realm \
-s name="corporate-ldap" \
-s providerId=ldap \
-s providerType=org.keycloak.storage.UserStorageProvider \
-s 'config.vendor=["ad"]' \
-s 'config.connectionUrl=["ldap://dc1.corp.example.com:389"]' \
-s 'config.bindDn=["CN=keycloak-svc,OU=ServiceAccounts,DC=corp,DC=example,DC=com"]' \
-s "config.bindCredential=[\"${LDAP_BIND_PASSWORD}\"]" \
-s 'config.usersDn=["OU=Users,DC=corp,DC=example,DC=com"]' \
-s 'config.userObjectClasses=["person,organizationalPerson,user"]' \
-s 'config.searchScope=["2"]' \
-s 'config.useTruststoreSpi=["ldapsOnly"]' \
-s 'config.connectionTimeout=["5000"]' \
-s 'config.readTimeout=["10000"]' \
-s 'config.pagination=["true"]' \
-s 'config.batchSizeForSync=["500"]' \
-s 'config.fullSyncPeriod=["604800"]' \
-s 'config.changedSyncPeriod=["3600"]' \
-s 'config.importEnabled=["true"]' \
-s 'config.syncRegistrations=["false"]'
Group mapper: map LDAP group membership (read from each group's member attribute) to Keycloak groups:
kcadm.sh create components -r your-realm \
-s name="group-mapper" \
-s providerId=group-ldap-mapper \
-s providerType=org.keycloak.storage.ldap.mappers.LDAPStorageMapper \
-s parentId=<ldap-component-id> \
-s 'config.groups.dn=["OU=Groups,DC=corp,DC=example,DC=com"]' \
-s 'config.group.object.classes=["group"]' \
-s 'config.membership.attribute.type=["DN"]' \
-s 'config.membership.ldap.attribute=["member"]' \
-s 'config.membership.user.ldap.attribute=["distinguishedName"]' \
-s 'config.mode=["READ_ONLY"]' \
-s 'config.user.roles.retrieve.strategy=["LOAD_GROUPS_BY_MEMBER_ATTRIBUTE_RECURSIVELY"]' \
-s 'config.drop.non.existing.groups.during.sync=["true"]'
Gotcha: AD’s member attribute only stores direct members by default. Recursive group membership (LOAD_GROUPS_BY_MEMBER_ATTRIBUTE_RECURSIVELY) hits AD hard on large directories. Either enable AD’s memberOf range retrieval or pre-flatten groups in an auxiliary OU. Also: if your AD uses referrals, set config.referral=["follow"]. Otherwise Keycloak silently drops users from child domains.
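Whether AD resolves nesting at query time or you pre-flatten it, the thing being computed is the transitive closure of group nesting. A minimal sketch of that computation, independent of Keycloak and LDAP: the class name GroupFlattener and the in-memory parentsOf map are assumptions standing in for data you would read from the directory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: flattens nested group membership.
class GroupFlattener {
    /**
     * Returns every group a user belongs to, directly or through nesting.
     * parentsOf maps a group to the groups it is itself a member of.
     */
    static Set<String> effectiveGroups(Set<String> directGroups,
                                       Map<String, Set<String>> parentsOf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directGroups);
        while (!queue.isEmpty()) {
            String group = queue.poll();
            // seen.add returns false for groups already visited, which also
            // protects against cyclic nesting.
            if (seen.add(group)) {
                queue.addAll(parentsOf.getOrDefault(group, Set.of()));
            }
        }
        return seen;
    }
}
```

Each level of nesting costs another round of lookups per user, which is why resolving this on every sync of a large directory gets expensive.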
For social login (Google, GitHub, etc.), add identity providers under Identity Providers in the realm. The only non-obvious part is the First Login Flow setting — by default it asks users to confirm their email even if it matches an existing local account. Override this with a custom first-broker-login flow if you want transparent account linking.
Custom Authentication Flows
This is where Keycloak’s power really shows — and where most people give up. Authentication flows are a directed graph of authenticators. You can add MFA, risk-based auth, custom challenges, or completely replace the login sequence.
Real use case: require TOTP only for users in a specific group, skip it for internal IPs.
- Go to Authentication → Flows → browser → Copy (never edit the built-in flows; they cannot be reset without a full realm reset).
- Name it browser-conditional-otp.
- In the copied flow, find the Browser - Conditional OTP sub-flow. Set it to CONDITIONAL.
- Add two conditions:
  - Condition - User Configured: REQUIRED (ensures TOTP is only required if the user has it set up)
  - Condition - User Role: REQUIRED, configure it to check realm-role:requires-2fa
- Bind this flow to the realm’s Browser Flow.
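The effect of the two conditions is easier to reason about as a predicate. The following is a simplified model of how a CONDITIONAL sub-flow is evaluated (it runs only when every condition holds), not Keycloak code; the class and method names are hypothetical.

```java
import java.util.Set;

// Simplified model of the Browser - Conditional OTP sub-flow described above.
class ConditionalOtpModel {
    /**
     * True if the OTP step would run for this user.
     * Both conditions are REQUIRED, so both must evaluate to true.
     */
    static boolean otpStepRuns(boolean otpConfigured, Set<String> realmRoles,
                               String requiredRole) {
        boolean userConfigured = otpConfigured;             // Condition - User Configured
        boolean hasRole = realmRoles.contains(requiredRole); // Condition - User Role
        return userConfigured && hasRole;
    }
}
```

One consequence worth noticing: under this model a user who holds requires-2fa but has not enrolled TOTP skips the OTP step entirely, so pair the flow with the Configure OTP required action if enrollment must be enforced.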
For conditional IP-based bypass, you need a custom authenticator SPI. The scaffold:
// src/main/java/com/example/keycloak/ConditionalIpAuthenticator.java
package com.example.keycloak;

import org.apache.commons.net.util.SubnetUtils;
import org.keycloak.authentication.AuthenticationFlowContext;
import org.keycloak.authentication.Authenticator;
import org.keycloak.models.AuthenticatorConfigModel;
import org.keycloak.models.KeycloakSession;
import org.keycloak.models.RealmModel;
import org.keycloak.models.UserModel;

public class ConditionalIpAuthenticator implements Authenticator {

    private static final String TRUSTED_CIDR_CONFIG = "trusted-cidr";

    @Override
    public void authenticate(AuthenticationFlowContext ctx) {
        String clientIp = ctx.getConnection().getRemoteAddr();
        // The config model is null until an admin saves a config for this execution
        AuthenticatorConfigModel cfg = ctx.getAuthenticatorConfig();
        String trustedCidr = (cfg == null || cfg.getConfig() == null)
                ? ""
                : cfg.getConfig().getOrDefault(TRUSTED_CIDR_CONFIG, "");

        if (!trustedCidr.isBlank() && isInCidr(clientIp, trustedCidr)) {
            // Skip MFA for trusted networks; set a session note for downstream checks
            ctx.getAuthenticationSession().setAuthNote("ip-trusted", "true");
            ctx.success();
        } else {
            ctx.attempted(); // Let the flow continue to the next authenticator
        }
    }

    @Override public void action(AuthenticationFlowContext ctx) {}
    @Override public boolean requiresUser() { return false; }
    @Override public boolean configuredFor(KeycloakSession s, RealmModel r, UserModel u) { return true; }
    @Override public void setRequiredActions(KeycloakSession s, RealmModel r, UserModel u) {}
    @Override public void close() {}

    private boolean isInCidr(String ip, String cidr) {
        // Real implementation: use Apache Commons Net SubnetUtils or similar
        // Never roll your own CIDR math in production
        try {
            SubnetUtils subnet = new SubnetUtils(cidr);
            subnet.setInclusiveHostCount(true); // include network and broadcast addresses
            return subnet.getInfo().isInRange(ip);
        } catch (IllegalArgumentException e) {
            return false; // malformed CIDR or non-IPv4 address: treat as untrusted
        }
    }
}
Build it as a JAR with the standard Keycloak SPI service file at META-INF/services/org.keycloak.authentication.AuthenticatorFactory, drop it into providers/, and restart the containers. The factory implementation wires it into the UI — see the Keycloak SPI docs for the boilerplate.
Gotcha: Custom authenticator JARs must be compiled against the same Keycloak version you’re running. The internal APIs shift between major versions. Pin keycloak-core and keycloak-server-spi in your pom.xml to match. Also: after adding a new JAR, Keycloak needs a rebuild before start --optimized will pick it up on the Quarkus distribution. Run docker exec kc1 /opt/keycloak/bin/kc.sh build or, better, rebuild your image, because a build done inside a running container is lost when the container is recreated.
Production Hardening Checklist
Session limits: Set SSO Session Max and SSO Session Idle in Realm Settings to sane values. Review the defaults (30 minutes idle, 10 hours max) rather than inheriting them; the 10-hour maximum is far too long for anything consumer-facing.
Brute force protection: Enable it. Set a lockout policy: Max Login Failures 5, Wait Increment 60 seconds, Max Wait 900 seconds. Do this in kcadm.sh so it’s in your IaC:
kcadm.sh update realms/your-realm \
-s bruteForceProtected=true \
-s failureFactor=5 \
-s waitIncrementSeconds=60 \
-s maxFailureWaitSeconds=900 \
-s minimumQuickLoginWaitSeconds=60
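To see what these numbers mean in practice: under the "by multiples" strategy described in the Keycloak docs, the wait time is the wait increment multiplied by the integer quotient of the failure count and the failure factor, capped at the maximum wait. A minimal sketch of that arithmetic, assuming that strategy is the one active in your realm (the linear strategy grows differently) and ignoring the quick-login check; the class name LockoutMath is hypothetical.

```java
// Hypothetical helper modelling the lockout wait time, not Keycloak code.
class LockoutMath {
    /**
     * Seconds a user stays locked out after the given number of failures.
     * Integer division means the wait only grows at multiples of failureFactor.
     */
    static int waitSeconds(int failures, int failureFactor,
                           int waitIncrementSeconds, int maxFailureWaitSeconds) {
        int wait = waitIncrementSeconds * (failures / failureFactor);
        return Math.min(wait, maxFailureWaitSeconds);
    }
}
```

With the values above, failures 5 through 9 yield a 60-second lockout, failure 10 yields 120 seconds, and the 900-second cap is reached at 75 failures.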
Token lifetimes: Access tokens should be short (5 minutes for APIs, 15 for web apps). Refresh tokens can be longer. Never issue access tokens that live for hours — that defeats the purpose of revocation.
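Audit logging: Keycloak emits login events, but by default they are not stored in the database at all, and once you turn on Save Events they are kept only for the expiration you configure. Export them to a log aggregator (Loki, Elasticsearch) instead: set KC_LOG_CONSOLE_OUTPUT=json and ship stdout. Successful logins are logged at DEBUG under the org.keycloak.events category, so raise that category's level if you want them in the stream.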
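Metrics: Enable the metrics endpoint with KC_METRICS_ENABLED=true. Scrape /metrics with Prometheus. The critical metrics are the login and failed-login counters (keycloak_logins_total and keycloak_failed_login_attempts_total here; the exact names depend on the Keycloak version and on whether an event-metrics extension is installed) and Infinispan cache hit rates. A sudden spike in login failures without a corresponding spike in lockouts suggests credential stuffing spread across many accounts and source IPs, which stays under the per-user lockout thresholds.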
Gotcha: The /metrics and /health endpoints are on port 9000 by default in Keycloak 25+ (the management interface). Don’t expose this port through your public load balancer. Bind it to an internal monitoring network only. I’ve seen installations that exposed admin metrics to the public internet and then wondered why their Grafana dashboard was public knowledge.
Realm Export and GitOps
Manual realm configuration through the UI doesn’t scale across environments. Export your realm:
# Export the realm to a single file. With --users realm_file, users and their
# hashed credentials are included, so treat the file as sensitive.
docker exec kc1 /opt/keycloak/bin/kc.sh export \
--dir /tmp/realm-export \
--realm your-realm \
--users realm_file
docker cp kc1:/tmp/realm-export/your-realm-realm.json ./realm-exports/
Commit this JSON to git. On deploy, import it:
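# Import on first boot: files in /opt/keycloak/data/import are read at startup
command: start --import-realm
Mount the export file into /opt/keycloak/data/import in the container and start with --import-realm (the Quarkus distribution has no KC_IMPORT variable). Keycloak will skip the import if the realm already exists, so re-running it causes no idempotency problems.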
The real value here is diffing realm changes in PRs. Clients, scopes, federation configs, flow assignments — all reviewable in git. No more "who changed the token lifetime and when."
Where Things Will Break
Two failure modes that hit almost everyone:
Clock skew. Keycloak signs tokens with a timestamp. If your application server’s clock is more than 30 seconds off from the Keycloak server’s clock, token validation fails with Token is not active. Run chronyc tracking on every node. Use the same NTP source everywhere. This sounds obvious until it’s 3am and you’re staring at 401s from a cloud instance that drifted after a live migration.
Infinispan split-brain. If a network partition between cluster nodes causes the cache to split, users will experience intermittent session loss. The JDBC_PING discovery helps but doesn’t prevent this entirely. Add a readiness probe that checks /health/ready (enable it with KC_HEALTH_ENABLED=true) and remove split-brain nodes from the load balancer rotation automatically. Kubernetes handles this well; Docker Compose needs a separate health-check script or an external orchestrator.
Both of these are "works fine in staging, fails in production" problems because staging usually runs single-node and on a single host. Test your HA setup under actual load before go-live.
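The clock-skew failure comes down to a time-window check, and most token-validation libraries let you widen that window with a leeway. A minimal sketch of the check itself, assuming a token with nbf and exp claims: the class name TokenTimeCheck and any leeway value you pass in are illustrative assumptions, not Keycloak defaults.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper showing the validity-window check behind "Token is not active".
class TokenTimeCheck {
    /**
     * True if 'now' (the validating server's clock) falls inside the token's
     * validity window, widened on both sides by 'leeway'.
     */
    static boolean isActive(Instant now, Instant notBefore, Instant expiry,
                            Duration leeway) {
        boolean afterStart = !now.isBefore(notBefore.minus(leeway));
        boolean beforeEnd = now.isBefore(expiry.plus(leeway));
        return afterStart && beforeEnd;
    }
}
```

An application server whose clock runs 20 seconds behind Keycloak sees a freshly issued token as not yet valid unless the leeway covers the gap. A leeway hides small drift, but it also extends the life of expired tokens by the same amount, so fixing NTP remains the real cure.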
Keycloak has a reputation for being complex, and honestly it earns that reputation. But once it’s running properly — clustered, themed, federated, with flows that match your actual security requirements — it’s remarkably solid. The configuration surface is large because identity is genuinely complex. The answer isn’t to reach for a SaaS IdP that hides the complexity; it’s to understand each layer and configure it deliberately.