AI Planet has introduced Buddhi-128K-Chat-7B, an open-source chat model distinguished by its expansive 128,000-token context window. This advancement enables the model to process and retain extensive contextual information, enhancing its performance in tasks requiring deep context understanding.
Source: Internet
Model Architecture
Buddhi-128K-Chat-7B is fine-tuned from the Mistral-7B Instruct v0.2 base model, selected for its superior reasoning capabilities. The Mistral-7B architecture incorporates features such as Grouped-Query Attention and a Byte-fallback BPE tokenizer, originally supporting a maximum of 32,768 position embeddings. To extend this to 128K, the Yet another Rope Extension (YaRN) technique was employed, modifying positional embeddings to accommodate the increased context length.
Read on →