Text Generation WebUI
Gradio web UI for running large language models like LLaMA, GPT-J, and others locally
Overview
Text Generation WebUI, developed by oobabooga, is a Gradio-based web interface for running large language models such as LLaMA, GPT-J, Alpaca, and Vicuna locally on your own hardware. Created as an open-source alternative to cloud-based AI services, it has become a popular choice for researchers, developers, and AI enthusiasts who want full control over their language model deployments without the privacy concerns or usage limits of commercial APIs.

The WebUI supports multiple model formats, including GGUF, GPTQ, AWQ, and EXL2, so quantized models can run efficiently on consumer-grade GPUs. This Docker configuration uses NVIDIA GPU acceleration for fast inference while leaving room to experiment with different models and parameters. The stack combines the WebUI's interactive interface with REST API endpoints, enabling both chat sessions in the browser and programmatic access from external applications.

Running models locally gives organizations and individual users complete ownership of their AI infrastructure: no per-token costs, full data privacy, and the freedom to fine-tune models on custom datasets. This setup is particularly valuable for teams building AI-powered applications, researchers experimenting with language models, and privacy-conscious users who must process sensitive text without sending it to third-party services.
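Because recent builds expose an OpenAI-compatible API when started with the --api flag, external tools can talk to the stack over plain HTTP. A minimal sketch, assuming the default API port from this recipe and a model already loaded (older releases used a different path such as /api/v1/generate, so check your version):

```bash
# Minimal sketch: chat completion against the OpenAI-compatible endpoint
# that recent text-generation-webui builds expose on the API port (5000 here).
# Assumes a model is already loaded in the WebUI; older releases used
# /api/v1/generate instead of /v1/chat/completions.
curl -s http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Explain quantization in one sentence."}],
        "max_tokens": 120
      }'
```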
Key Features
- Multi-format model support including GGUF, GPTQ, AWQ, and EXL2 quantized models for efficient GPU memory usage
- Dual interface design with Gradio web UI for interactive use and REST API endpoints for programmatic integration
- Advanced sampling-parameter controls, including temperature, top-p, top-k, and repetition penalty adjustments (see the API example after this list)
- Character card system for roleplay scenarios with persistent conversation contexts and custom personas
- LoRA (Low-Rank Adaptation) support for loading and switching between fine-tuned model variants
- Extension system allowing custom plugins for specialized text processing and generation tasks
- GPU memory optimization with automatic model sharding across multiple NVIDIA GPUs
- Preset management for saving and sharing optimal generation parameters for specific use cases
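The sampling controls above are also adjustable per request through the API. A hedged sketch against the OpenAI-compatible completions endpoint of recent builds; the field names mirror the WebUI's generation settings, but verify them against your version's API documentation:

```bash
# Sketch: override sampling parameters for a single completion request.
# temperature/top_p/top_k/repetition_penalty mirror the WebUI sliders;
# the exact accepted fields depend on the WebUI version.
curl -s http://localhost:5000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Write a haiku about GPUs.",
        "max_tokens": 60,
        "temperature": 0.9,
        "top_p": 0.95,
        "top_k": 40,
        "repetition_penalty": 1.15
      }'
```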
Common Use Cases
- AI research labs running experiments with different language models and fine-tuning approaches
- Software development teams integrating local LLM capabilities into applications without API dependencies
- Content creators and writers using AI assistance for brainstorming, editing, and creative writing projects
- Educational institutions teaching AI concepts while maintaining data privacy and controlling costs
- Healthcare organizations processing sensitive patient data with HIPAA compliance requirements
- Legal firms analyzing documents and generating legal content without exposing confidential information
- Game developers creating dynamic NPC dialogue systems and procedural narrative content
Prerequisites
- NVIDIA GPU with at least 8GB VRAM for running 7B parameter models, 16GB+ recommended for larger models
- Docker and Docker Compose with the NVIDIA Container Toolkit installed and configured (a quick verification command follows this list)
- NVIDIA drivers supporting CUDA 11.8 or newer for your GPU hardware
- At least 32GB system RAM for loading large models and handling inference operations
- 50GB+ free disk space for model storage, as popular models range from 4GB to 30GB each
- Understanding of language model concepts and familiarity with HuggingFace model repository structure
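Before starting the stack, it is worth confirming that Docker can actually see the GPU. A quick check, assuming the NVIDIA Container Toolkit is installed (the CUDA image tag below is just an example; any recent tag works):

```bash
# If GPU passthrough is configured correctly, this prints the nvidia-smi
# table from inside a throwaway container.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```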
For development & testing only. Review security settings, change default credentials, and test thoroughly before production use. See Terms.
docker-compose.yml
```yaml
services:
  text-generation-webui:
    image: ghcr.io/oobabooga/text-generation-webui:latest
    container_name: text-generation-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-7860}:7860"
      - "${API_PORT:-5000}:5000"
      - "${API_STREAM_PORT:-5005}:5005"
    volumes:
      - ./models:/app/models
      - ./loras:/app/loras
      - ./prompts:/app/prompts
      - ./presets:/app/presets
      - ./characters:/app/characters
      - ./extensions:/app/extensions
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - CLI_ARGS=${CLI_ARGS:---listen --api}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
.env Template
```env
# Text Generation WebUI Configuration
WEBUI_PORT=7860
API_PORT=5000
API_STREAM_PORT=5005

# CLI arguments
# --listen: Listen on all interfaces
# --api: Enable API
# --extensions: Load extensions
CLI_ARGS=--listen --api
```
Usage Notes
- Requires an NVIDIA GPU with CUDA support
- WebUI at http://localhost:7860
- API endpoint at http://localhost:5000
- Download models into the ./models folder from Hugging Face (see the example after this list)
- Supports GGUF, GPTQ, AWQ, and EXL2 formats
- Character cards for roleplay go in ./characters
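For the model-download note above, the Hugging Face CLI is a convenient way to pull a quantized file straight into the mounted ./models folder. A sketch; the repository and file names below are illustrative, so substitute the model you actually want:

```bash
# Download a single GGUF file into ./models (repo/file names are examples).
pip install -q huggingface_hub
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --local-dir ./models
```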
Quick Start
terminal
```bash
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  text-generation-webui:
    image: ghcr.io/oobabooga/text-generation-webui:latest
    container_name: text-generation-webui
    restart: unless-stopped
    ports:
      - "${WEBUI_PORT:-7860}:7860"
      - "${API_PORT:-5000}:5000"
      - "${API_STREAM_PORT:-5005}:5005"
    volumes:
      - ./models:/app/models
      - ./loras:/app/loras
      - ./prompts:/app/prompts
      - ./presets:/app/presets
      - ./characters:/app/characters
      - ./extensions:/app/extensions
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - CLI_ARGS=${CLI_ARGS:---listen --api}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Text Generation WebUI Configuration
WEBUI_PORT=7860
API_PORT=5000
API_STREAM_PORT=5005

# CLI arguments
# --listen: Listen on all interfaces
# --api: Enable API
# --extensions: Load extensions
CLI_ARGS=--listen --api
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
```
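Once the containers are up, a couple of sanity checks help confirm everything started correctly. This assumes the default ports from the .env template; the /v1/models route applies to recent builds with the OpenAI-compatible API:

```bash
# The container should report "running"; the API answers once a model backend is up.
docker compose ps
curl -s http://localhost:5000/v1/models
```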
One-Liner
Run this command to download and set up the recipe in one step:
terminal
```bash
curl -fsSL https://docker.recipes/api/recipes/text-generation-webui/run | bash
```
Troubleshooting
- CUDA out of memory errors: Reduce model size, enable CPU offloading in settings, or use more aggressive quantization like 4-bit GPTQ models
- Models fail to load with 'unsupported format' error: Ensure model files are in supported formats (GGUF, safetensors) and not corrupted during download
- WebUI shows 'No CUDA devices available': Verify the NVIDIA Container Toolkit installation, restart the Docker daemon, and check the NVIDIA_VISIBLE_DEVICES environment variable (diagnostic commands follow this list)
- API requests timeout or return empty responses: Adjust the CLI_ARGS to include proper API flags and increase timeout values in client applications
- Character cards not appearing in interface: Ensure JSON character files are properly formatted and placed in the mounted characters directory with correct permissions
- Extensions not loading properly: Check extension compatibility with current WebUI version and verify Python dependencies are installed in container
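When chasing the GPU and model-loading problems above, two commands usually narrow things down quickly. A sketch, assuming the container name from this recipe and that nvidia-smi is present in the image:

```bash
# Is the GPU visible from inside the running container?
# (Assumes the image ships nvidia-smi, as CUDA-based images typically do.)
docker exec text-generation-webui nvidia-smi

# Recent loader errors, CUDA OOMs, and extension tracebacks show up here.
docker compose logs --tail=100 text-generation-webui
```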
Components
oobabooga/text-generation-webui
Tags
#ai #llm #text-generation #llama #gpt #gpu
Category
AI & Machine Learning