Aoban Paper

A Technical Overview & Release of Aoban 2.3-50M-HeavyL

Abstract

This paper presents Aoban 2.3-50M-HeavyL, a transformer-based language model developed as part of the Aoban AI research initiative. The model focuses on high-speed processing of casual language, internet slang, and general conversational text while maintaining a compact parameter footprint. Aoban 2.3-50M-HeavyL was trained on a diverse dataset of Discord messages and hand-written conversations.

Aoban 2.3-50M-HeavyL is designed to prioritize adaptability, expressive generation, and real-time interaction rather than strict factual reasoning. By leveraging a moderately deep transformer architecture with optimized attention mechanisms, the model aims to balance performance, efficiency, and creative flexibility.

Model Release

We officially announce the release of Aoban 2.3-50M-HeavyL, a 50-million-parameter Heavy Layer (HeavyL) model intended for experimental deployment, research, and creative systems. This version expands upon earlier Aoban releases by significantly improving contextual persistence, slang comprehension, and stylistic variation.

Aoban 2.3-50M-HeavyL is an experimental research model. Outputs may be inconsistent, hallucinatory, or stylistically unstable.

Architecture

Aoban 2.3-50M-HeavyL is built upon the Transformer architecture introduced in “Attention Is All You Need” (Vaswani et al., 2017). The model relies entirely on self-attention mechanisms, allowing it to capture long-range dependencies without recurrence or convolution.

The architecture consists of 10 transformer layers, each configured with 8 self-attention heads and a 512-dimensional hidden representation. This design enables parallel processing of tokens and lets each attention head specialize in a different representational subspace.

The designation HeavyL reflects the model’s emphasis on denser internal representations per layer rather than extreme depth. This approach favors fast inference and expressive internal states over very deep stacking.
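As a concrete illustration, the sketch below instantiates a decoder-style transformer with this configuration (10 layers, 8 heads, 512-dimensional hidden states) in PyTorch. The vocabulary size, feed-forward width, context length, and weight tying are assumptions made for the example rather than published Aoban details; with those assumed values the parameter count lands close to the stated 50 million.

    import torch
    import torch.nn as nn

    # Illustrative sketch: layer count, head count, and hidden size follow the paper;
    # vocab size, feed-forward width, context length, and weight tying are assumptions.
    VOCAB_SIZE = 32_000   # assumed
    D_MODEL    = 512      # hidden representation size (per the paper)
    N_HEADS    = 8        # self-attention heads per layer (per the paper)
    N_LAYERS   = 10       # transformer layers (per the paper)
    D_FF       = 2_048    # feed-forward width, assumed (4 * d_model)
    MAX_LEN    = 1_024    # context length, assumed

    class AobanLikeLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
            self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)
            block = nn.TransformerEncoderLayer(
                d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=D_FF,
                batch_first=True, norm_first=True,
            )
            self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYERS)
            self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)
            self.lm_head.weight = self.tok_emb.weight  # weight tying, assumed

        def forward(self, token_ids):
            seq_len = token_ids.size(1)
            # Causal mask: each position may only attend to itself and earlier tokens.
            causal = torch.triu(
                torch.full((seq_len, seq_len), float("-inf"), device=token_ids.device),
                diagonal=1,
            )
            pos = torch.arange(seq_len, device=token_ids.device)
            x = self.tok_emb(token_ids) + self.pos_emb(pos)
            x = self.blocks(x, mask=causal)
            return self.lm_head(x)  # next-token logits

    model = AobanLikeLM()
    print(sum(p.numel() for p in model.parameters()))  # roughly 48M with these assumed values

Counting parameters this way also shows where the budget goes under these assumptions: roughly a third sits in the (tied) token embedding and the remainder in the ten transformer blocks, consistent with the HeavyL emphasis on dense per-layer representations over depth.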

Tokenization

The model uses Byte Pair Encoding (BPE) for tokenization. BPE allows Aoban 2.3-50M-HeavyL to flexibly represent slang, abbreviations, creative spellings, emojis, and mixed-language input without requiring an excessively large vocabulary.

This tokenization strategy is particularly effective for informal and internet-based text, where strict word boundaries are often inconsistent or intentionally violated.
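To make this concrete, the sketch below trains a small byte-level BPE vocabulary with the Hugging Face tokenizers library and encodes a slang-heavy message. The library choice, the corpus file name, and the 32,000-token vocabulary are illustrative assumptions; the paper does not state which tooling or vocabulary size Aoban 2.3-50M-HeavyL actually uses.

    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.trainers import BpeTrainer
    from tokenizers.pre_tokenizers import ByteLevel

    # Illustrative sketch: "chat_corpus.txt", the vocab size, and the special tokens
    # are assumptions, not published Aoban training details.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = ByteLevel()  # byte-level pieces cover emoji and odd spellings
    trainer = BpeTrainer(
        vocab_size=32_000,
        special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
    )
    tokenizer.train(files=["chat_corpus.txt"], trainer=trainer)

    enc = tokenizer.encode("ngl that update was kinda mid fr 💀")
    print(enc.tokens)  # unseen slang and emoji fall back to smaller subword or byte pieces

Because byte-level BPE can always fall back to raw bytes, no input is ever out of vocabulary, which is what makes it a good fit for the inconsistent word boundaries described above.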

Training Philosophy

Aoban 2.3-50M-HeavyL was trained with a focus on research rather than task-specific optimization. The goal was not to maximize benchmark performance but to study how a language model behaves when trained on specific kinds of data.

As a result, the model exhibits strong creative tendencies and rapid response generation, but may underperform on structured reasoning or deterministic tasks.

Capabilities & Limitations

The model excels at handling casual dialogue, slang-heavy inputs, and general message processing at high speed. However, it may struggle with simple conversational grounding tasks such as greetings, intent clarification, or strict instruction following.

For these reasons, lighter Aoban models (such as Aoban 2.1) may be better suited for structured interaction pipelines, while HeavyL is intended for experimentation and expressive output.

Conclusion

Aoban 2.3-50M-HeavyL represents a key milestone in the evolution of the Aoban AI ecosystem. It serves as a research platform for exploring expressive transformer behavior within a constrained parameter budget and provides insight into tradeoffs between speed, depth, and conversational flexibility.