
Differentiable Adaptive Merging is accelerating SLMs for enterprises


Model merging is a fundamental AI process that enables organizations to reuse and combine existing trained models to achieve specific goals.

There are various ways that enterprises can use model merging today, but many approaches are complex. A new approach known as Differentiable Adaptive Merging (DAM) aims to address those challenges, offering a way to combine AI models while potentially reducing computational costs.

Arcee, a company focusing on efficient, specialized small language models, is leading the charge on DAM research. The company, which raised funding in May 2024, has evolved from providing model training tools to becoming a full-fledged model delivery platform with both open-source and commercial offerings.

How DAM creates a new path forward for model merging

Merging can help companies combine models that are each specialized in a different area to create a new model that is capable in both.

The basic concept of merging is well understood for structured data and databases. Merging models, however, is more abstract, as their internal representations are not as interpretable.

Thomas Gauthier-Caron, research engineer at Arcee and one of the authors of the DAM research, explained to VentureBeat that traditional model merging has often relied on evolutionary algorithms, an approach that can be slow and unpredictable. DAM takes a different path by leveraging established machine learning (ML) optimization techniques.

Gauthier-Caron explained that DAM aims to solve the problem of complexity in the model merging process. The company’s existing library, MergeKit, is useful for merging different models, but it is complex due to the various methods and parameters involved.

“We were wondering, can we make this easier, can we get the machine to optimize this for us, instead of us being in the weeds tweaking all of these parameters?” Gauthier-Caron said.

Instead of simply mixing the models directly, DAM adjusts how much each model contributes. It assigns scaling coefficients to each column of the models’ weight matrices and automatically learns the best values for those coefficients by testing how well the combined model performs, comparing its output with that of the original models and then adjusting the coefficients to improve the result.
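To make that concrete, below is a minimal sketch of the idea in PyTorch: learnable per-column scaling coefficients blend two same-architecture models, and the coefficients are trained so the merged model’s outputs stay close to those of the source models. The toy models, the KL-divergence loss and the training loop are illustrative assumptions, not Arcee’s DAM implementation.

```python
# Illustrative sketch of per-column merge coefficients (not Arcee's DAM code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def merge_weights(weights, coeffs):
    """Combine weight matrices using per-column scaling coefficients.

    weights: list of (out_features, in_features) tensors, one per source model
    coeffs:  list of (in_features,) learnable vectors, one per source model
    """
    merged = torch.zeros_like(weights[0])
    for w, c in zip(weights, coeffs):
        merged = merged + w * c  # broadcasting scales each column of w
    return merged


# Toy stand-ins for two fine-tuned checkpoints sharing one architecture.
torch.manual_seed(0)
model_a = nn.Linear(16, 8)
model_b = nn.Linear(16, 8)
for p in list(model_a.parameters()) + list(model_b.parameters()):
    p.requires_grad_(False)  # the source models stay frozen

# One coefficient vector per source model, initialized to an even split.
coeffs = [nn.Parameter(torch.full((16,), 0.5)) for _ in range(2)]
optimizer = torch.optim.Adam(coeffs, lr=1e-2)

# Unlabeled probe inputs; no labels are needed.
probe = torch.randn(256, 16)

for step in range(200):
    merged_w = merge_weights([model_a.weight, model_b.weight], coeffs)
    merged_b = (model_a.bias + model_b.bias) / 2  # biases averaged for simplicity
    merged_out = F.log_softmax(probe @ merged_w.T + merged_b, dim=-1)

    # Keep the merged model's output distribution close to each source model's.
    loss = torch.zeros(())
    for src in (model_a, model_b):
        with torch.no_grad():
            target = F.softmax(src(probe), dim=-1)
        loss = loss + F.kl_div(merged_out, target, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("learned per-column coefficients for model A:", coeffs[0].detach()[:4])
```

In this sketch, the source models themselves supply the training targets, so only unlabeled probe inputs are required rather than additional labeled data.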

According to the research, DAM performs competitively with or better than existing methods like evolutionary merging, DARE-TIES and Model Soups. The technology represents a significant departure from existing approaches, according to Gauthier-Caron. He described evolutionary merging as a slow process, where it’s not entirely clear up front how good the result will be or how long the merge process should run.

Merging is not a Mixture of Experts approach

Data scientists combine models in many different ways. Among the increasingly popular approaches is the Mixture of Experts (MoE).

Gauthier-Caron emphasized that model merging with DAM is something very different from MoE. He explained that MoE is a specific architecture that can be used to train language models.

Model merging starts from the point where an organization already has trained models. Because training those models usually costs a lot of money, engineers aim to reuse them rather than train new ones.

Practical applications and benefits of DAM for enterprise AI

One of DAM’s key advantages is its ability to combine specialized models efficiently. 

One example offered by Gauthier-Caron is an organization that wants to combine a Japanese-language model with a math model. The goal of that combination is to produce a model that is good at math in Japanese, without the need to retrain. That’s one area where DAM can potentially excel.
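For illustration, the per-column idea sketched earlier could be extended across every weight matrix of two same-architecture checkpoints, which is the shape such a “math in Japanese” merge would take. The helper names, the averaging of non-matrix tensors and the placeholder model names are assumptions for this sketch, not Arcee’s implementation.

```python
# Hedged sketch: extend per-column merging across two full checkpoints.
import torch
import torch.nn as nn


def init_merge_coeffs(state_dicts):
    """For every 2-D weight matrix, create one per-column coefficient vector per model."""
    names = [n for n, w in state_dicts[0].items() if w.ndim == 2]
    return {
        name: [
            nn.Parameter(torch.full((state_dicts[0][name].shape[1],), 1.0 / len(state_dicts)))
            for _ in state_dicts
        ]
        for name in names
    }


def merged_state_dict(state_dicts, coeffs):
    """Blend checkpoints column by column; tensors without coefficients are averaged."""
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        if name in coeffs:
            merged[name] = sum(w * c for w, c in zip(tensors, coeffs[name]))
        else:
            merged[name] = sum(tensors) / len(tensors)
    return merged


# Hypothetical usage (model names are placeholders, not real checkpoints):
# dicts = [japanese_model.state_dict(), math_model.state_dict()]
# coeffs = init_merge_coeffs(dicts)
# merged = merged_state_dict(dicts, coeffs)  # coeffs trained as in the earlier sketch
```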

The technology is particularly relevant for enterprise adoption of generative AI, where efficiency and cost considerations are paramount. Helping enterprises operate more efficiently at reduced cost is a key goal for Arcee overall, which is why DAM research is important to the company and, ultimately, to its users.

“Enterprise adoption of gen AI boils down to efficiency, availability, scalability and cost,” Mark McQuade, co-founder and CEO of Arcee told VentureBeat.