Conversation

Greg Wilson (gvwilson@mastodon.social)'s status on Friday, 19-Jan-2024 00:10:54 JST

Irugalbandara et al 2024: "A Trade-off Analysis of Replacing Proprietary LLMs with Open Source SLMs in Production" https://arxiv.org/abs/2312.14972 "We present a systematic evaluation methodology for…Small Language Models (SLMs) and their tradeoffs when replacing a proprietary LLM APIs…across 9 SLMs and 29 variants, we observe competitive quality-of-results for our use case, significant performance consistency improvement, and a cost reduction of 5x-29x when compared to OpenAI GPT-4." #nwit
Attachments
1. arxiv.org
  
  A Trade-off Analysis of Replacing Proprietary LLMs with Open Source SLMs in Production
  
  Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to create AI-enabled experiences in their products. Along with the benefits of ease of use and shortened time to production, this reliance on proprietary APIs has downsides in terms of model control, performance reliability, up-time predictability, and cost. At the same time, there has been a flurry of open source small language models (SLMs) that have been made available for commercial use. However, their readiness to replace existing capabilities remains unclear, and a systematic approach to test these models is not readily available. In this paper, we present a systematic evaluation methodology for, and characterization of, modern open source SLMs and their trade-offs when replacing proprietary LLM APIs for a real-world product feature. We have designed SLaM, an automated analysis tool that enables the quantitative and qualitative testing of product features utilizing arbitrary SLMs. Using SLaM, we examine both the quality and the performance characteristics of modern SLMs relative to an existing customer-facing OpenAI-based implementation. We find that across 9 SLMs and 29 variants, we observe competitive quality-of-results for our use case, significant performance consistency improvement, and a cost reduction of 5x-29x when compared to OpenAI GPT-4.
