Finetuning RWKV 14B with QLoRA in 4-bit

It was surprisingly easy to get this working, and I think that's a good thing. First I looked at existing LoRA implementations for RWKV, which I discovered through the very helpful RWKV Discord. The link I found in the Discord landed me at "How to Train Your Raven", shout out…

Trying to steer LLM output towards correctness using MIPS

What if, at each step of the generation, you could find the candidate token that, when combined with the previous outputs, produces the smallest cosine distance to the input prompt? The idea is that this would align the output more closely with the input prompt and prevent the model from going off task.…
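
A minimal sketch of that selection step, under loud assumptions: the sequence embedding is approximated here as the plain sum of per-token vectors (a toy stand-in, not the post's actual embedding), the candidates are assumed to be a small shortlist such as the model's top-k tokens, and the names cosine_distance, pick_token, and token_vecs are hypothetical. It scans the shortlist by brute force instead of using a MIPS index.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b); returns 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0
    return 1.0 - float(np.dot(a, b) / denom)

def pick_token(prompt_vec, output_vec, candidate_ids, token_vecs):
    """Pick the candidate that keeps the running output closest to the prompt.

    ASSUMPTION: adding a token's vector to output_vec approximates the
    embedding of "previous outputs + that token".
    """
    best_id, best_dist = None, float("inf")
    for tid in candidate_ids:
        combined = output_vec + token_vecs[tid]
        dist = cosine_distance(prompt_vec, combined)
        if dist < best_dist:
            best_id, best_dist = tid, dist
    return best_id, best_dist

# Toy usage: three 2-d token vectors, a prompt vector, and one token already emitted.
token_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
prompt_vec = np.array([1.0, 1.0])
output_vec = token_vecs[0].copy()
chosen, dist = pick_token(prompt_vec, output_vec, [0, 1, 2], token_vecs)
print(chosen, dist)
```

In a real decoding loop the shortlist would come from the model's logits at each step, and output_vec would be updated with the chosen token's vector before the next step.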