Welcome to Ecne Project!
Chatbot revamp in-progress
by Dundell2 on 2024-12-28
I am in the process of revamping the chatbot interface.
This includes:
- Fixing memory issues (1/2 done): Optimizing context memory to remove leaks that cause OOM errors with very large prompts/chat histories.
- Document uploading (1/4 done): Creating a process to upload a document and parse its text into context.
- Voice activation (1/2 done): A new hands-free option with an assignable trigger phrase.
- Speech-to-speech: Remaking my voice feature into a simpler, more sustainable process.
- Ogma's dual models: Finally bringing Ogma back under QwQ + Coder, just like Ecne, but at 12 t/s with potentially 32k context.
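The document-uploading step has to fit arbitrary text into a bounded context window. A common approach is to split the parsed text into overlapping chunks that each fit a fixed budget. The sketch below is illustrative, not the project's actual code: `chunk_document` is a hypothetical name, and it uses a simple whitespace word count as a stand-in for real token counting.

```python
def chunk_document(text, max_words=200, overlap=20):
    # Split parsed document text into overlapping chunks that fit a
    # fixed word budget (a stand-in for a real token budget).
    # Overlap keeps sentences that straddle a boundary visible in
    # both neighboring chunks.
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# Example: a 500-word document yields three chunks, each sharing
# its first 20 words with the end of the previous chunk.
doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_document(doc)
```

Each chunk can then be injected into the chatbot's context, either all at once for small documents or selected by relevance for large ones.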
Chaining LLM calls +
by Dundell2 on 2024-12-11
Implementing chained LLM calls, setting up a new deployment of QwQ or Coder 32B at IQ3 quantization for testing, evaluating 14B mini-model options for an EcneMini deployment, and switching entirely to Apache 2.0-licensed models instead of mixing in research-only licenses.
The chained LLM calls follow a simple Architect + Coder design, producing a finished developer workflow from a single request. This may be implemented in both Ogma and Ecne, with QwQ as the architect and Qwen 2.5 32B Coder Instruct as the code provider. Testing has shown roughly a 5% boost in output quality, along with a detailed plan and instructions for the user to follow. (The downside is that it takes about twice as long.) In the starting phase, this option will be a simple "Chain" button to activate.
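The Architect + Coder chain described above can be sketched as two sequential model calls, where the architect's plan is fed into the coder's prompt. This is a minimal sketch under stated assumptions: `chain`, `fake_architect`, and `fake_coder` are hypothetical names, and the stubs stand in for real calls to the QwQ and Qwen 2.5 32B Coder backends.

```python
def chain(request, architect, coder):
    # Stage 1: the architect model drafts a step-by-step plan.
    plan = architect(f"Write a step-by-step implementation plan for: {request}")
    # Stage 2: the coder model implements that plan.
    code = coder(f"Implement the following plan.\n\nPlan:\n{plan}\n\nRequest: {request}")
    # The user receives both the plan and the finished code,
    # which is why this mode also yields usable instructions.
    return {"plan": plan, "code": code}

# Stubs standing in for the real model backends.
def fake_architect(prompt):
    return "1. Parse the input. 2. Compute the result. 3. Return it."

def fake_coder(prompt):
    return "def solve(x):\n    return x * 2"

result = chain("double a number", fake_architect, fake_coder)
```

The doubled latency noted above follows directly from the structure: the request cannot reach the coder until the architect has finished.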
Ogma Upgrade (QwQ)
by Dundell2 on 2024-11-30
Ogma has been upgraded to the QwQ 32B model, the newest Apache 2.0-licensed release from Qwen. The model is experimental, but its thought processing is top tier. It is currently set as Ogma's model for testing, combining its reasoning with Ecne's coding capabilities. Ogma's backend has also been upgraded with speculative decoding, which has provided roughly 45% faster inference, and with Q8 context quantization, doubling Ogma's window to 32k tokens.
Please note: due to Ogma's modest speed and very limited batched-request pool, he will be limited to select users during development. A rework of a third, faster, high-batch model that will be officially integrated into gameplay is in progress.
Speculative Decoding inbound
by Dundell2 on 2024-11-26
Speculative decoding has been implemented on Ecne and will soon be implemented on Ogma. The process uses a smaller "draft" model to speed up inference requests. Tests by others suggest a 50-60% speedup for a design like Ogma's; on Ecne it has provided roughly a 2x boost in inference speed. Full quality checks are still needed to confirm this is an acceptable feature.
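The core idea of speculative decoding is that the cheap draft model proposes several tokens ahead, and the large target model only verifies them, accepting the agreeing prefix and correcting the first mismatch. This is why quality is preserved: the output always matches what the target model alone would produce. The toy sketch below illustrates that accept/verify loop; all names are illustrative, and real backends (e.g. llama.cpp's draft-model support) do this over token probability distributions rather than exact matches.

```python
def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    # target_next / draft_next: callables mapping a token list to the next token.
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # The cheap draft model proposes k tokens ahead.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # The target model verifies the proposal; accept the agreeing prefix.
        accepted = 0
        for tok in proposal:
            if target_next(out) == tok:
                out.append(tok)
                accepted += 1
            else:
                break
        # On a mismatch the target supplies the correct token itself, so the
        # output is identical to what the target alone would generate.
        if accepted < len(proposal):
            out.append(target_next(out))
    return out[:len(prompt) + max_new]

# Toy models: the target generates a fixed sequence; the draft is
# imperfect and guesses wrong at every third position.
TARGET_SEQ = list("abcdefghij")
def target_next(ctx):
    return TARGET_SEQ[len(ctx)]
def draft_next(ctx):
    return "x" if len(ctx) % 3 == 0 else TARGET_SEQ[len(ctx)]

out = speculative_decode(target_next, draft_next, [], 5)  # -> ['a','b','c','d','e']
```

Even with the draft wrong a third of the time, the output matches the target's own sequence exactly; the speedup comes from the target verifying several draft tokens per pass instead of generating one token at a time.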