Integrating artificial intelligence (AI) into business processes is no longer a luxury — it’s a necessity. At AMCON, we are dedicated to developing efficient and secure AI systems to optimize our internal workflows. A key focus lies in the development of a local Large Language Model (LLM) that operates with Retrieval-Augmented Generation (RAG) and ReACT agents. Data privacy is a top priority: all data remains on local servers and is never transmitted to the internet.

AI Terminology and Key Concepts
What are LLMs?
Large Language Models (LLMs) such as GPT-4, Llama 3.2, or Gemini 2.0 are AI systems designed to process and generate natural language. They can analyze text, create summaries, or even generate code. These models are based on transformer architectures and use mechanisms like attention to make context-aware predictions.
Embedding & RAG
Embedding models convert text into numerical vectors, enabling AI systems to detect relationships between words, sentences, or entire documents. RAG (Retrieval-Augmented Generation) enhances LLMs by incorporating external knowledge sources. This means that the model doesn’t rely solely on its pre-trained data but can retrieve relevant information from internal databases or documents. RAG works in two phases:
The retrieval phase, which identifies relevant documents.
The generation phase, where the AI formulates an informed response based on the retrieved content.
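To make the two phases concrete, here is a deliberately simplified sketch in which word overlap stands in for real embedding-based retrieval; the corpus, scoring, and prompt template are illustrative rather than production code.

```python
# Toy sketch of RAG's two phases; word-overlap scoring stands in for
# real embedding-based retrieval (see the embedding section below).
def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    # Phase 1 (retrieval): rank documents by overlap with the question.
    q_words = set(question.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(question: str, corpus: list[str]) -> str:
    # Phase 2 (generation): the LLM answers grounded only in retrieved text.
    context = "\n".join(retrieve(question, corpus))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```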
ReACT Agents
ReACT (Reasoning + Acting) is an approach that allows AI agents to do more than just respond to prompts. They are capable of complex reasoning and action execution. For example, they can intelligently evaluate data from tools like Jira or Confluence. By interleaving explicit reasoning steps with tool actions, these agents can autonomously prioritize requests and delegate tasks. They have access to a range of tools which they use as needed, such as querying APIs, searching internal data sources, or retrieving external information.
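At its core, such an agent runs a loop that alternates between a reasoning step and a tool call. The sketch below shows that loop in isolation; the `llm` callable, the Thought/Action/Observation protocol strings, and the tool registry are illustrative assumptions, not our actual agent implementation.

```python
# Hand-rolled ReACT loop: the model alternates between reasoning and
# acting until it emits a final answer or the step budget runs out.
def react_agent(question: str, llm, tools: dict, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model emits a Thought plus an Action, or a Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            name, _, arg = step.split("Action:")[-1].strip().partition(" ")
            observation = tools[name](arg)  # e.g. {"jira_search": ..., "confluence_get": ...}
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```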
Use Case: Local LLM at AMCON

Technological Implementation
The system is built on several key components:
Data & Retrieval
Documents and content from Jira and Confluence are systematically processed and prepared for the LLM. This is achieved using a combination of regular expressions for text extraction and transformer-based models for semantic analysis.
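As an illustration of the regex-based extraction step, the sketch below strips markup from a Confluence HTML export before the text moves on to semantic analysis; the patterns are simplified assumptions, and a full HTML parser would cover more edge cases.

```python
# Simplified regex cleanup for Confluence HTML exports (illustrative).
import re

def clean_confluence_html(raw: str) -> str:
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", raw, flags=re.S)  # drop scripts/styles
    text = re.sub(r"<[^>]+>", " ", text)      # strip remaining tags
    text = re.sub(r"&[a-z]+;", " ", text)     # crude removal of HTML entities
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace
```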
Embedding & RAG
Documents are transformed into vector representations, enabling the model to access relevant information effectively. For this, we use the multilingual-e5-large embedding model.
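A sketch of this indexing step, assuming the sentence-transformers and chromadb packages and invented chunk contents; note that e5 models expect "passage:" and "query:" prefixes on documents and questions respectively.

```python
# Embed chunks with multilingual-e5-large and index them in ChromaDB
# (our vector store, see below); collection name and chunks are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/multilingual-e5-large")
chunks = ["Release checklist for the dispatch module.", "VPN setup guide for new laptops."]

client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection("docs")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode([f"passage: {c}" for c in chunks]).tolist(),
)

# Query side: embed the question with the "query:" prefix.
hits = collection.query(
    query_embeddings=embedder.encode(["query: How do I set up the VPN?"]).tolist(),
    n_results=1,
)
print(hits["documents"][0])
```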
Backend & Model Access
An internal API connects various applications with the AI system. We use FastAPI to build the API interface and ONNX for efficient model inference.
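The shape of such an endpoint might look like the sketch below, which puts a FastAPI route in front of an ONNX Runtime session; the model path, tensor names, and route are placeholders rather than our internal API.

```python
# Minimal inference endpoint: FastAPI in front of an ONNX Runtime session.
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
session = ort.InferenceSession("models/encoder.onnx")  # placeholder model path

class EmbedRequest(BaseModel):
    input_ids: list[int]
    attention_mask: list[int]

@app.post("/embed")
def embed(req: EmbedRequest) -> dict:
    feed = {  # tensor names depend on how the model was exported
        "input_ids": np.array([req.input_ids], dtype=np.int64),
        "attention_mask": np.array([req.attention_mask], dtype=np.int64),
    }
    outputs = session.run(None, feed)  # None = return all model outputs
    return {"embedding": outputs[0][0].tolist()}
```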
The ReACT agent can communicate with the APIs of Confluence and Jira to retrieve relevant information on demand. This is facilitated through the Atlassian Python API, which is tailored for both Jira and Confluence.
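The tools exposed to the agent can be thin wrappers around this library, as in the sketch below; the base URLs, token, JQL query, and field handling are illustrative.

```python
# Agent-facing wrappers around the Atlassian Python API (illustrative).
from atlassian import Confluence, Jira

jira = Jira(url="https://jira.internal.example", token="<personal-access-token>")
confluence = Confluence(url="https://confluence.internal.example", token="<personal-access-token>")

def search_open_tickets(project: str) -> list[str]:
    result = jira.jql(f'project = "{project}" AND status != Done', limit=10)
    return [issue["fields"]["summary"] for issue in result["issues"]]

def get_page_text(page_id: str) -> str:
    page = confluence.get_page_by_id(page_id, expand="body.storage")
    return page["body"]["storage"]["value"]  # raw storage-format HTML
```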
Frontend
A user-friendly web interface for seamless interaction with the AI model is currently under development.
Technologies in Use
Vector storage & retrieval: ChromaDB
Language models: Llama 3.2 and Gemma
Frameworks: LangChain and LangGraph orchestrate complex interactions between agents, data sources, and retrieval components.
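As an example of how these pieces fit together, here is a minimal LangGraph sketch of a retrieve-then-answer pipeline; both node functions are stubs standing in for the real ChromaDB query and the local model call.

```python
# Minimal LangGraph pipeline: retrieve, then answer (node bodies stubbed).
from typing import TypedDict
from langgraph.graph import END, StateGraph

class RagState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: RagState) -> dict:
    # Stub: the real node queries ChromaDB for relevant chunks.
    return {"context": "retrieved passages would go here"}

def answer(state: RagState) -> dict:
    # Stub: the real node calls the local Llama/Gemma model.
    return {"answer": f"Answer grounded in: {state['context']}"}

graph = StateGraph(RagState)
graph.add_node("retrieve", retrieve)
graph.add_node("answer", answer)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", END)
app = graph.compile()

print(app.invoke({"question": "Where is the vacation policy?"}))
```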
Development Challenges
Data Privacy and Security
Because all data must remain on local servers, we had to ensure that the system never contacts external services. Robust access controls are essential so that only authorized users can view sensitive information. We implement this using OAuth 2.0 and role-based access control (RBAC).
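A minimal sketch of that RBAC layer as a FastAPI dependency, assuming OAuth 2.0 bearer tokens; the `verify_token` helper is a stub, and the role and route names are invented for illustration.

```python
# Role-based access control as a FastAPI dependency (illustrative).
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

def verify_token(token: str) -> dict:
    # Stub: in production this validates the OAuth 2.0 token (signature,
    # expiry, audience) against the identity provider and returns its claims.
    return {"sub": "demo-user", "roles": ["knowledge_reader"]}

def require_role(role: str):
    def checker(token: str = Depends(oauth2_scheme)) -> str:
        claims = verify_token(token)
        if role not in claims.get("roles", []):
            raise HTTPException(status_code=403, detail="Insufficient role")
        return claims["sub"]
    return checker

@app.get("/documents")
def list_documents(user: str = Depends(require_role("knowledge_reader"))):
    return {"user": user, "documents": []}
```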
Model Optimization
Running an LLM locally requires substantial hardware resources. We optimized the model to strike a balance between performance and efficiency. Techniques such as quantization (e.g. via Hugging Face's bitsandbytes) and knowledge distillation were used to reduce memory and compute requirements.
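As one example, 4-bit quantization at load time can look like the sketch below, using transformers with a bitsandbytes configuration; the model ID is illustrative.

```python
# Load a causal LM with 4-bit bitsandbytes quantization (illustrative model ID).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/quality
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available GPUs automatically
)
```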
Integration with Existing Systems
Seamlessly integrating with Jira and Confluence posed a challenge due to their diverse data formats, so developing specialized retrievers was critical. By combining keyword-based and semantic search methods, we ensure efficient and precise information retrieval.
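A simplified sketch of that hybrid scoring, blending normalized BM25 keyword scores with embedding similarity; the blend weight and the rank_bm25 package are assumptions for illustration.

```python
# Hybrid retrieval sketch: blend BM25 keyword scores with embedding similarity.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["Jira workflow for incident tickets", "Confluence page on vacation policy"]
bm25 = BM25Okapi([d.lower().split() for d in docs])
embedder = SentenceTransformer("intfloat/multilingual-e5-large")
doc_vecs = embedder.encode([f"passage: {d}" for d in docs], normalize_embeddings=True)

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    kw = np.array(bm25.get_scores(query.lower().split()))
    kw = kw / (kw.max() or 1.0)  # scale keyword scores into [0, 1]
    q = embedder.encode(f"query: {query}", normalize_embeddings=True)
    sem = doc_vecs @ q           # cosine similarity (vectors are normalized)
    return alpha * kw + (1 - alpha) * sem
```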
Outlook: The Future of AI at AMCON
In the coming years, we will focus on continuously improving and expanding the system:
Inference Optimization
Accelerating response times and improving precision through specialized model architectures and more efficient processing pipelines.
Expanded Agent Functionality
Development and implementation of new AI agents tailored to specific business tasks, including:
Code Improvement Agent: Analyzes existing code, identifies optimization opportunities, and suggests more efficient solutions.
Onboarding Agent: Assists new employees by providing relevant documentation and company policies during their onboarding process.
Project Management Assistant: Supports task coordination, resource allocation, and tracking of project progress.
Templating Agent: Automates the creation of templates and standard documents to boost operational efficiency.
Next-Generation Language Models
Integration of newer and more powerful LLMs, such as future versions of Llama, DeepSeek, and domain-specific embedding models.
Adaptive Tool Use by Agents
Agents will be equipped with a flexible toolchain, allowing them to dynamically select and apply tools such as API queries, semantic search, or external data retrieval, depending on the task.
Enhanced Retrieval Methods
Implementation of hybrid search techniques that combine semantic embedding with classical indexing for more accurate and context-aware information retrieval.
Scalability and Infrastructure
Evaluation and adoption of new hardware and software solutions to scale the AI architecture, including decentralized processing and specialized AI accelerators.
Improved Integration with Business Processes
Ensuring seamless integration of the AI system into existing workflows and applications to further increase operational efficiency.
Conclusion
With the development of a local LLM system, AMCON has taken a significant step toward AI-driven process optimization. Through the use of RAG and ReACT agents, company knowledge can be leveraged more effectively — without compromising data privacy.
The future of AI at AMCON lies in the continuous advancement of these technologies to create maximum value for employees. At the same time, we remain closely aligned with proven DevOps and MLOps strategies to streamline the development cycle and ensure continuous improvement.