I am trying to find information on a very specific use case for mobile applications.
You may have heard of some of these apps:
- General chats: Chaton, ChatBox, Ask AI, Chat & Ask AI, Chat Nova, etc.
- Image GenAI: Face Dance, Wonder AI, Remini, etc.
Most of these apps are basically improved versions of the most popular GenAI solutions. I like to call them “ChatGPT on steroids” or “DALL-E on steroids”.
Several sources say that some of these apps essentially rely on existing technologies such as GPT-3.5 or GPT-4 (for the general chat apps) or widely used models such as Stable Diffusion (for the image GenAI apps).
Still, something bothers me: if these apps were just “wrappers” around existing commercial GenAI models, the cost of calling a commercial API would probably be enormous, given the large number of users and the time they spend in these apps.
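One way to size that concern is to estimate the API bill per user as messages × tokens × price. The sketch below is a minimal illustration, assuming per-token pricing with separate input and output rates; the usage profile and prices are hypothetical placeholders, not quoted rates, so check the provider's current pricing page before relying on the numbers.

```python
def monthly_api_cost_per_user(
    messages_per_day: float,
    input_tokens_per_message: float,
    output_tokens_per_message: float,
    input_price_per_million: float,
    output_price_per_million: float,
    days_per_month: float = 30.0,
) -> float:
    """Estimate one user's monthly API bill under per-token pricing."""
    # Cost of a single request: prompt tokens plus generated tokens.
    # For chat, the prompt usually includes earlier turns, so input tokens
    # tend to grow with conversation length.
    per_message = (
        input_tokens_per_message * input_price_per_million
        + output_tokens_per_message * output_price_per_million
    ) / 1_000_000
    return per_message * messages_per_day * days_per_month


# Hypothetical usage profile and placeholder prices (USD per million tokens).
cost = monthly_api_cost_per_user(
    messages_per_day=30,
    input_tokens_per_message=1_500,
    output_tokens_per_message=300,
    input_price_per_million=0.50,
    output_price_per_million=1.50,
)
print(f"${cost:.2f} per user per month")  # prints "$1.08 per user per month"
```

The total bill is this figure multiplied by the number of active users, so it grows linearly with both the user base and how much each user chats; heavier users or longer conversation histories push it up quickly.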
Would anyone here have experience building similar B2C products that rely heavily on a commercial GenAI API such as OpenAI's GPT API? I'd like to better understand the rationale for using a commercial API rather than self-hosting a model like Llama 2 or 3. I assume there are plenty of parameters to consider: cost, but also technical complexity, time to market, operational constraints, etc.
This is a very interesting topic. Using commercial APIs like GPT-4 allows for quick and robust integration, but the costs can become substantial as you scale. Self-hosted models like Llama can reduce these costs but require significant setup and maintenance. APIs scale and stay reliable without any infrastructure work on your side, which makes them ideal for the early stages, while self-hosting demands a skilled team to run operations. Some companies adopt a hybrid approach, starting with APIs and gradually shifting to self-hosted models to balance cost and performance. I hope this insight is helpful.
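The gradual shift described above could, for example, be run as a percentage rollout: a stable hash of the user id decides whether that user's requests go to the self-hosted model, while the commercial API serves everyone else and doubles as a fallback. This is a minimal sketch of one possible design, not how any particular app does it; `route`, `in_rollout`, and the stub backends are hypothetical names, and real backends would wrap an actual API client or inference server.

```python
import hashlib
from typing import Callable

# A backend is any function that turns a prompt into a reply
# (hypothetical: wrap your API client or inference server in one).
Backend = Callable[[str], str]


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user belongs to the self-hosted cohort.

    Hashing keeps each user on the same backend across requests; raising
    `percent` from 0 to 100 moves traffic over gradually.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


def route(
    user_id: str,
    prompt: str,
    self_hosted: Backend,
    commercial_api: Backend,
    percent: int,
) -> str:
    """Send the prompt to the self-hosted model for users in the rollout."""
    if in_rollout(user_id, percent):
        try:
            return self_hosted(prompt)
        except Exception:
            # Sketch only: in practice, catch your client's timeout and
            # connection errors rather than every exception.
            return commercial_api(prompt)
    return commercial_api(prompt)


if __name__ == "__main__":
    # Stub backends standing in for real clients.
    def local_model(prompt: str) -> str:
        return "local: " + prompt

    def api_model(prompt: str) -> str:
        return "api: " + prompt

    print(route("user-42", "Hello", local_model, api_model, percent=0))    # prints "api: Hello"
    print(route("user-42", "Hello", local_model, api_model, percent=100))  # prints "local: Hello"
```

Keeping the API as a fallback means an outage or overload on the self-hosted side degrades into a higher API bill rather than failed requests.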
Thanks for the insight, that’s very interesting. After doing some additional research, it does make sense to proceed that way.
I made a quick calculation. Popular chat apps tend to spend about $5 / user per month in API costs when using the GPT-3.5 API, mostly for conversation. These apps tend to charge $7.50 / week (roughly $32.50 / month). So even after the marketing cost of acquiring users, we can assume these apps are indeed profitable just by calling the GPT API. I also noticed that many of them limit usage to fewer than 50 messages / day, obviously to save on API costs.
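The arithmetic can be checked in a few lines. This minimal sketch uses the figures above ($7.50 / week, about $5 / user / month in API cost); the optional app-store commission is an added assumption (Apple and Google typically keep 15% to 30% of in-app subscription revenue), and marketing costs are left out.

```python
def monthly_margin_per_user(
    weekly_price: float,
    api_cost_per_month: float,
    store_fee: float = 0.0,
) -> float:
    """Gross monthly margin per subscriber, before marketing costs.

    `store_fee` is the fraction of revenue kept by the app store (assumption).
    """
    # 52 weeks / 12 months: about 4.33 billing weeks per month.
    monthly_revenue = weekly_price * 52 / 12
    return monthly_revenue * (1 - store_fee) - api_cost_per_month


print(f"{monthly_margin_per_user(7.5, 5.0):.2f}")        # prints "27.50"
print(f"{monthly_margin_per_user(7.5, 5.0, 0.30):.2f}")  # prints "17.75"
```

Whatever margin remains is what has to pay back user acquisition, so it bounds how much an app can spend to win each subscriber for a given retention period.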
However, I am having trouble figuring out the right user threshold for shifting to a self-hosted model. Do you have any clues about this? It looks like hosting Llama 2 costs about €1,000 / month on a basic virtual machine with an entry-level GPU. But what I can’t figure out is how many machines, or what GPU tier, you’d need to handle many simultaneous requests without added latency.
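A rough way to approach both questions: the break-even user count is the monthly self-hosting bill divided by the per-user API cost, and the machine count can be estimated with Little's law (requests in flight = arrival rate × time per request), given how many concurrent requests one machine can serve. This is a minimal sketch; the peak load, response time, and per-machine concurrency below are hypothetical and should come from your own traffic data and a load test of your model on your GPU.

```python
import math


def break_even_users(self_host_monthly_cost: float, api_cost_per_user: float) -> float:
    """Number of users at which self-hosting costs the same as the API bill."""
    return self_host_monthly_cost / api_cost_per_user


def machines_needed(
    peak_requests_per_second: float,
    seconds_per_request: float,
    concurrent_requests_per_machine: int,
    target_utilization: float = 0.7,
) -> int:
    """Rough GPU machine count for peak load, via Little's law.

    Little's law: requests in flight = arrival rate x time each request takes.
    `target_utilization` leaves headroom so queues (and latency) stay short.
    """
    in_flight = peak_requests_per_second * seconds_per_request
    usable_slots = concurrent_requests_per_machine * target_utilization
    return max(1, math.ceil(in_flight / usable_slots))


# Figures from the post: ~€1,000 / month per GPU VM, ~$5 / user / month via the API
# (currencies treated as equal for simplicity).
print(break_even_users(1_000, 5.0))  # prints 200.0

# Hypothetical load: 20 requests/s at peak, 4 s per response,
# 8 concurrent requests per machine (measure this for your model and GPU).
machines = machines_needed(20, 4.0, 8)
print(machines)                                 # prints 15
print(break_even_users(machines * 1_000, 5.0))  # prints 3000.0
```

In practice, derive the peak load from your actual user base, size the machines for that peak, and compare machines × machine cost with users × API cost; the engineering and operations time that self-hosting needs comes on top of the hardware bill.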