Add Custom Billing Units for Pricing Along with Price Per Request
complete
Ahmed Ibrahim
Sulu.sh offers several advantages over other API platforms, such as pay-as-you-go pricing and instant payouts.
However, applying the pay-as-you-go concept to AI services such as Text-to-Speech and Speech-to-Text, where billing is typically based on custom units like minutes or tokens, presents challenges.
Let's take the Speech-to-Text API as an example. The typical approach is to offer an API built on infrastructure from providers like Google Speech-to-Text, OpenAI, or even a model hosted on private servers.
I price my API by the number of hours or minutes of media the uploaded file contains, since that is what my actual cost as an API provider depends on. If I were to price it by the number of requests instead, one of two scenarios could occur, and both might force me to stop providing the API.
In the first scenario, I would set the price per request at the maximum possible cost to avoid losses in the worst case. Users with minimal usage would then pay far more than their requests actually cost, which makes the API unattractive to them.
In the second scenario, I would set the price based on the average or a lower estimate. A user could then send only a few requests, each with a file lasting many minutes or hours, causing my costs to skyrocket and making it unsustainable for me to keep offering the API.
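To make the two scenarios concrete, here is a small worked comparison in Python. All figures (a cost of 1 cent per audio minute, a 120-minute upload limit, a 10-minute average file) are hypothetical values chosen for illustration, not real provider prices, and the per-minute scheme is shown at cost with no markup.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical figures for illustration only; not real provider prices.
COST_CENTS_PER_MINUTE = 1   # assumed upstream Speech-to-Text cost per audio minute
MAX_MINUTES_PER_FILE = 120  # assumed upload limit (worst case per request)
AVG_MINUTES_PER_FILE = 10   # assumed typical file length


@dataclass
class Outcome:
    revenue: int  # cents billed to the user
    cost: int     # cents paid to the upstream provider

    @property
    def margin(self) -> int:
        return self.revenue - self.cost


def bill(durations: List[int], price_for: Callable[[int], int]) -> Outcome:
    """Total revenue and cost for a user whose requests last `durations` minutes."""
    revenue = sum(price_for(minutes) for minutes in durations)
    cost = sum(COST_CENTS_PER_MINUTE * minutes for minutes in durations)
    return Outcome(revenue, cost)


def flat_max(minutes: int) -> int:
    """Scenario 1: every request is priced as if it were the longest allowed file."""
    return COST_CENTS_PER_MINUTE * MAX_MINUTES_PER_FILE


def flat_avg(minutes: int) -> int:
    """Scenario 2: every request is priced at the average file length."""
    return COST_CENTS_PER_MINUTE * AVG_MINUTES_PER_FILE


def per_minute(minutes: int) -> int:
    """Custom billing unit: the price follows the minutes actually processed."""
    return COST_CENTS_PER_MINUTE * minutes


light_user = [2, 3]          # two short clips
heavy_user = [120, 120, 90]  # a few long recordings

print(bill(light_user, flat_max))    # Outcome(revenue=240, cost=5)
print(bill(heavy_user, flat_avg))    # Outcome(revenue=30, cost=330)
print(bill(heavy_user, per_minute))  # Outcome(revenue=330, cost=330)
```

With flat pricing, either the light user overpays by 235 cents or the provider loses 300 cents on the heavy user; billing per minute keeps revenue aligned with cost for both.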
These examples apply to the entire new generation of APIs that use custom billing units, such as:
- AI vision (billed by image size, i.e., the number of pixels)
- Text-to-Speech (billed by the number of characters)
- AI text (billed by the number of processed tokens)
- AI text-and-image (billed by the number of processed tokens and images)
- Multimodal models that handle several content types, such as text, images, voice, and video, in a single request, meaning one standard request can consume multiple billing units.
The last example highlights the need to consume multiple billing units in a single request. In my opinion, this concept is still new and not yet popular, because most of these models are still in beta or not yet publicly available.
However, the other use cases above are already popular and in high demand: the AI boom has led companies and developers to integrate AI into their applications, and they are looking for API providers that offer accurate pay-as-you-go billing.
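For the multimodal case, the charge for one request is simply the sum over every unit it consumed. The sketch below assumes the provider keeps a per-unit price table; the unit names and prices are hypothetical, and `charge_for` is an illustrative helper, not part of any Sulu API.

```python
from decimal import Decimal
from typing import Dict

# Hypothetical per-unit prices chosen by the API provider (illustrative only).
UNIT_PRICES: Dict[str, Decimal] = {
    "tokens": Decimal("0.00002"),
    "images": Decimal("0.01"),
    "minutes": Decimal("0.006"),
}


def charge_for(usage: Dict[str, int], prices: Dict[str, Decimal] = UNIT_PRICES) -> Decimal:
    """Total charge for one request that consumed several billing units."""
    unknown = set(usage) - set(prices)
    if unknown:
        raise ValueError(f"no price configured for units: {sorted(unknown)}")
    # Decimal avoids floating-point rounding surprises in money calculations.
    return sum((prices[unit] * qty for unit, qty in usage.items()), Decimal("0"))


# One multimodal request consuming text tokens, images, and audio minutes.
print(charge_for({"tokens": 1500, "images": 2, "minutes": 3}))  # 0.06800
```

Rejecting units without a configured price, rather than silently billing them at zero, keeps a misspelled unit name from turning into free usage.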
As an API provider, the most reliable approach I have found is to include the billing units consumed by each request in a header of the response I return to Sulu, which then bills the user for that amount. For example, in Speech-to-Text, if the user uploads a 10-minute file, I would add this custom header to the response:
"billing": "minutes=10"
If an API supports multiple custom billing units, the header could list all of them:
"billing": "minutes=10, characters=400"
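A provider-side sketch of the proposed header format: one function builds the `minutes=10, characters=400` value from the measured usage, and another parses it back, roughly as a billing gateway would. The header name `billing` and the `unit=qty` syntax come from the proposal above; the helper names, the use of `Decimal` to allow fractional quantities such as 2.5 minutes, and the validation rules are assumptions, not Sulu's documented behavior.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict


def format_billing_header(usage: Dict[str, Decimal]) -> str:
    """Serialize consumed units as 'unit=qty, unit=qty' (format from the proposal)."""
    return ", ".join(f"{unit}={qty}" for unit, qty in usage.items())


def parse_billing_header(value: str) -> Dict[str, Decimal]:
    """Parse 'minutes=10, characters=400' into a unit -> quantity mapping."""
    usage: Dict[str, Decimal] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        unit, sep, raw_qty = entry.partition("=")
        unit = unit.strip()
        if not sep or not unit:
            raise ValueError(f"malformed billing entry: {entry!r}")
        try:
            qty = Decimal(raw_qty.strip())
        except InvalidOperation:
            raise ValueError(f"invalid quantity in {entry!r}") from None
        if not qty.is_finite() or qty < 0:
            raise ValueError(f"quantity must be a non-negative number: {entry!r}")
        if unit in usage:
            raise ValueError(f"duplicate billing unit: {unit!r}")
        usage[unit] = qty
    return usage


# Provider side: attach the header to the response returned through Sulu.
headers = {"billing": format_billing_header({"minutes": Decimal(10), "characters": Decimal(400)})}
print(headers)  # {'billing': 'minutes=10, characters=400'}

# Gateway side (hypothetical): read the header back before billing the user.
print(parse_billing_header(headers["billing"]))
```

Rejecting negative, non-numeric, or duplicated entries matters here because the header directly determines what the user is charged.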
Samuel Alarco Cantos
complete
It has been a long time, but we have implemented this feature in our new API product, and it is rolling out as we speak. If you are still interested, we would love to demo it for you.
Ahmed Ibrahim
Hello,
Thank you for continuing to develop such a great platform.
I want to highlight something important, especially for AI APIs. In addition to providing custom billing units, there is another need.
The API owner should be able to call a specific Sulu endpoint to charge the user for custom units.
This requirement arises from the following scenarios:
Some APIs are asynchronous: the developer calls my API, and it immediately returns a response containing a job ID. The developer then uses this ID with another endpoint of the same API product to check the status of the result.
Why use this pattern instead of keeping the request open and returning the result once it is ready?
Because some APIs, such as AI video generation, AI audio cloning, and newer trends like reasoning and deep search models, can take up to 30 minutes to process; deep search and AI video creation are typical examples.
For some of these APIs, billing depends on the final result. For example, the tokens consumed by an AI text or image API are calculated from the output, so the amount to charge is only known once the job finishes, long after the initial request has returned its ID.
As a provider of AI APIs, I need to ensure that I can at least cover the cost of my APIs when users consume them.
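A minimal in-memory sketch of the asynchronous flow described above: `submit` returns a job ID right away, `status` is what the separate status endpoint would return, and the charge is reported only when the worker finishes and the output-based usage is known. The `report_usage` callable is a hypothetical stand-in for the Sulu charging endpoint being requested here; its name and signature are assumptions, and counting whitespace-separated words is only a placeholder for real token metering.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the Sulu "charge the user" endpoint proposed above.
# Sulu's real API (if it exists) may look completely different.
ReportUsage = Callable[[str, Dict[str, int]], None]  # (customer_id, usage) -> None


@dataclass
class Job:
    customer_id: str
    status: str = "pending"
    result: Optional[str] = None
    charged: bool = False


class AsyncJobs:
    """In-memory job store: submit returns an ID, billing happens on completion."""

    def __init__(self, report_usage: ReportUsage) -> None:
        self._jobs: Dict[str, Job] = {}
        self._report_usage = report_usage

    def submit(self, customer_id: str) -> str:
        """Accept a long-running task and immediately return a job ID."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(customer_id)
        return job_id

    def status(self, job_id: str) -> Dict[str, Optional[str]]:
        """What the separate status endpoint would return to the developer."""
        job = self._jobs[job_id]
        return {"status": job.status, "result": job.result}

    def complete(self, job_id: str, output: str) -> None:
        """Called by the worker when processing ends; bills from the final output."""
        job = self._jobs[job_id]
        if job.charged:
            return  # never charge the same job twice
        job.status, job.result = "done", output
        # Placeholder metering: whitespace-separated words stand in for real tokens.
        usage = {"output_tokens": len(output.split())}
        self._report_usage(job.customer_id, usage)
        job.charged = True


charges = []
jobs = AsyncJobs(lambda customer, usage: charges.append((customer, usage)))
job_id = jobs.submit("customer-123")
print(jobs.status(job_id))  # {'status': 'pending', 'result': None}
jobs.complete(job_id, "a short generated answer")
print(charges)  # [('customer-123', {'output_tokens': 4})]
```

The `charged` flag makes completion idempotent, so a worker that retries after a crash does not bill the user twice for the same job.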
Another challenge with synchronous APIs is that you would need to keep a worker or cloud function running and maintain an open connection for 30 minutes or longer. This approach is inefficient, as it consumes server resources and can lead to timeouts or connection failures, making it impractical for long-running AI tasks.
I understand that this is an advanced and complex solution to implement, but it would make the platform stand out compared to similar services in the market, especially with the evolving demands of AI technology.
Thank you, and I look forward to seeing custom billing units in production.
Samuel Alarco Cantos
in progress
This is amazing, Ahmed. Thank you very much for going into so much detail. It is now very clear to me how Sulu can differentiate itself with this, and that it is clearly needed for this new generation of APIs, as you rightly point out. I will share this post with the whole team. This was already on our roadmap, but it is becoming more and more clear how urgent it is.
Will get to work on this ASAP.
Ahmed Ibrahim
I updated the post to include a more accurate example and additional details to demonstrate the need for this type of integration.
Samuel Alarco Cantos
I think this is a really cool idea, and is perfect for what Sulu is building.
Samuel Alarco Cantos
under review