Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value (KV) cache by 20x without model changes, cutting GPU memory costs and reducing time-to-first-token by up to 8x for multi-turn AI applications.
This approach can be viewed as a memory plug-in for large models, offering a fresh direction for solving the ...
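The snippet doesn't say how the 20x is achieved, but transform coding conventionally means: apply a decorrelating transform to the data, quantize the resulting coefficients, and invert both steps on read. A minimal numpy sketch under those assumptions (the PCA-style transform, 8-bit quantization, keep_dims, and tensor shapes are illustrative guesses, not Nvidia's published design):

```python
import numpy as np

def compress_kv(kv, keep_dims=16, bits=8):
    """Transform-code a KV tensor: project onto its principal components,
    keep the top `keep_dims` coefficients, and quantize them to
    `bits`-bit integers. Returns everything needed to decode."""
    mean = kv.mean(axis=0)
    centered = kv - mean
    # Orthogonal decorrelating transform learned from the cache (PCA via SVD).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:keep_dims]                      # (keep_dims, head_dim)
    coeffs = centered @ basis.T                 # (tokens, keep_dims)
    # Uniform scalar quantization of the transform coefficients.
    scale = np.abs(coeffs).max() / (2 ** (bits - 1) - 1)
    q = np.round(coeffs / scale).astype(np.int8)
    return q, scale, basis, mean

def decompress_kv(q, scale, basis, mean):
    """Invert the quantization and the transform to approximate the cache."""
    return (q.astype(np.float32) * scale) @ basis + mean

# Toy cache with low-rank structure (KV activations are highly redundant):
# 1024 cached tokens, head_dim = 128.
rng = np.random.default_rng(0)
kv = (rng.standard_normal((1024, 16)) @ rng.standard_normal((16, 128))
      + 0.01 * rng.standard_normal((1024, 128))).astype(np.float32)

q, scale, basis, mean = compress_kv(kv)
approx = decompress_kv(q, scale, basis, mean)

orig_bytes = kv.nbytes
comp_bytes = q.nbytes + basis.nbytes + mean.nbytes + 4  # +4 for the scale
print(f"compression ratio: {orig_bytes / comp_bytes:.1f}x")
print(f"mean reconstruction error: {np.abs(kv - approx).mean():.4f}")
```

On this toy cache the sketch lands near a 21x ratio, the same order as the headline figure; a production system would presumably amortize the transform and fuse dequantization into the attention kernel rather than round-tripping through numpy.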
Enterprise AI teams are moving beyond single-turn assistants and into systems expected to remember preferences, preserve ...
The evaluation framework was developed to address a critical bottleneck in the AI industry: the absence of consistent, transparent methods to measure memory quality. Today's agents rely on a ...
Nota AI, an AI model optimization technology company, announced that it has developed a next-generation quantization technology that significantly compresses the size of Solar, a ...
Large language models (LLMs) like GPT and PaLM are transforming how we work and interact, powering everything from programming assistants to universal chatbots. But here’s the catch: running these ...
A new technical paper titled “Hardware-based Heterogeneous Memory Management for Large Language Model Inference” was published by researchers at KAIST and Stanford University. “A large language model ...
South Korean operator SK Telecom (SKT) claimed it can solve memory supply chain issues using SK Hynix wares as it continues ...
A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...
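The abstract is cut off, but dynamic KV cache placement boils down to a policy question: which cache blocks earn a slot in fast memory, and when do they spill to a slower tier. A toy sketch, assuming an exponentially decayed attention-mass score and a fixed fast-tier budget (both are assumptions for illustration, not the paper's algorithm):

```python
import numpy as np

class TieredKVCache:
    """Toy two-tier KV placement: blocks with high recent attention mass
    stay in the fast tier (a GPU HBM stand-in); the rest spill to the
    slow tier (a host DRAM stand-in)."""

    def __init__(self, fast_capacity_blocks):
        self.fast_capacity = fast_capacity_blocks
        self.fast = {}   # block_id -> (kv_block, score)
        self.slow = {}

    def insert(self, bid, kv_block):
        self.fast[bid] = (kv_block, 1.0)   # new blocks start hot
        self._rebalance()

    def update_scores(self, attn_by_block):
        """attn_by_block: block_id -> attention mass this decode step.
        An exponential moving average blends recency with frequency."""
        for tier in (self.fast, self.slow):
            for bid, (blk, score) in tier.items():
                tier[bid] = (blk, 0.9 * score + attn_by_block.get(bid, 0.0))
        self._rebalance()

    def _rebalance(self):
        """Demote the coldest fast blocks, promote the hottest slow ones."""
        merged = {**self.slow, **self.fast}
        ranked = sorted(merged, key=lambda b: merged[b][1], reverse=True)
        hot = set(ranked[:self.fast_capacity])
        self.fast = {b: merged[b] for b in ranked if b in hot}
        self.slow = {b: merged[b] for b in ranked if b not in hot}

cache = TieredKVCache(fast_capacity_blocks=2)
for bid in ("blk0", "blk1", "blk2"):
    cache.insert(bid, np.zeros((64, 128)))
cache.update_scores({"blk0": 0.7, "blk2": 0.3})   # blk1 receives no attention
print(sorted(cache.fast), sorted(cache.slow))      # ['blk0', 'blk2'] ['blk1']
```

The real research question hides in update_scores and _rebalance: migration between tiers has a bandwidth cost, so a practical policy must weigh predicted reuse against the price of moving blocks.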
In long conversations, chatbots accumulate large "conversation memories" in the form of KV caches. KVzip selectively retains only the information useful for any future question, autonomously verifying and compressing its ...
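The description is truncated, but selective retention of this kind generally reduces to scoring each cached token's importance and evicting the low scorers. A hedged sketch, assuming attention mass from a handful of probe queries as the importance signal (the scoring rule and keep_ratio are illustrative; KVzip's self-verification procedure is not spelled out in the snippet):

```python
import numpy as np

def kv_evict(keys, values, queries, keep_ratio=0.3):
    """Importance-based KV eviction: score each cached token by the peak
    attention it receives from recent queries, then keep the top share
    of tokens in their original order."""
    d = keys.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)          # (n_queries, n_tokens)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over the cache
    score = attn.max(axis=0)                        # peak importance per token
    k = max(1, int(keep_ratio * len(score)))
    keep = np.sort(np.argsort(score)[-k:])          # preserve token order
    return keys[keep], values[keep], keep

rng = np.random.default_rng(1)
K = rng.standard_normal((256, 64))    # 256 cached tokens, head_dim 64
V = rng.standard_normal((256, 64))
Q = rng.standard_normal((8, 64))      # recent probe queries
K2, V2, kept = kv_evict(K, V, Q)
print(f"kept {len(kept)}/256 cached tokens")
```

"Autonomously verifying" presumably means checking that answers reconstructed from the pruned cache still match; the sketch above covers only the retention half.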
Last year, MIT published a paper titled "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task." It runs over 200 pages, but you can read it yourself.