Sulu.sh offers several advantages over other APIs, such as pay-as-you-go pricing and instant payouts.
However, applying pay-as-you-go to AI services such as Text-to-Speech and Speech-to-Text, where billing is typically based on custom units like minutes or tokens, presents challenges.
Let's take the Speech-to-Text API as an example. The typical approach is to offer an API built on infrastructure from providers like Google Speech-to-Text, OpenAI, or even a model hosted on private servers.
As the provider of such an API, I price it by the number of hours or minutes of media in the uploaded file, because that is what determines my actual cost. If I priced it per request instead, one of two scenarios could occur, and either might force me to stop providing the API.
In the first scenario, I set the price per request at the maximum possible cost so that I never lose money, even in the worst case. Users with small files would then pay far more than their usage warrants, which makes the API unattractive to them.
In the second scenario, I set the price per request based on the average cost or a lower estimate. A user could then send requests that each contain many minutes or hours of media, causing my costs to skyrocket and making the API unsustainable to offer.
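The two scenarios can be made concrete with a small calculation. This is only a sketch: every number below (upstream cost, file lengths) is hypothetical and chosen to keep the arithmetic easy, and amounts are in whole cents to avoid rounding issues.

```python
# Hypothetical figures, for illustration only.
COST_PER_MINUTE_CENTS = 1   # what the provider pays upstream per audio minute
MAX_MINUTES = 120           # longest file the API accepts
AVERAGE_MINUTES = 5         # typical file length


def provider_cost_cents(minutes: int) -> int:
    """Upstream cost of processing one file of the given length."""
    return minutes * COST_PER_MINUTE_CENTS


def margin_cents(price_cents: int, minutes: int) -> int:
    """What the provider keeps (or loses, if negative) on one request."""
    return price_cents - provider_cost_cents(minutes)


# Scenario 1: flat per-request price set to the worst-case cost.
worst_case_price = provider_cost_cents(MAX_MINUTES)
# A user with a 1-minute file pays the full worst-case price.
overcharge_short_file = margin_cents(worst_case_price, minutes=1)

# Scenario 2: flat per-request price set to the average cost.
average_price = provider_cost_cents(AVERAGE_MINUTES)
# A user who uploads a maximum-length file costs far more than they pay.
loss_long_file = margin_cents(average_price, minutes=MAX_MINUTES)

# Per-minute pricing: the price follows the cost, so the margin never goes negative.
per_minute_margin = margin_cents(provider_cost_cents(MAX_MINUTES), MAX_MINUTES)

print(worst_case_price, overcharge_short_file)  # price and overcharge in cents
print(average_price, loss_long_file)            # price and loss in cents
print(per_minute_margin)
```

With these made-up numbers, the flat worst-case price charges a 1-minute user 120 cents for 1 cent of work, while the flat average price loses 115 cents on every maximum-length file.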
These examples apply to the entire new generation of APIs that use custom billing units, such as:
  • AI vision (billed based on image size, i.e., the number of pixels),
  • Text-to-Speech (billed based on the number of characters),
  • AI text (billed based on the number of processed tokens),
  • AI text-image (billed based on the number of processed tokens and images),
  • Multimodal models that handle multiple content types, such as text, image, voice, and video, in a single request. This means one standard request can consume multiple billing units.
The last example highlights the need to consume multiple billing units in a single request. In my opinion, this concept is very new and not yet popular because most of these models are still in beta or not yet available to the public.
However, the other use cases mentioned above are already popular and in high demand. The AI era has pushed companies and developers to integrate AI into their applications, so they look for API providers that offer accurate pay-as-you-go billing.
As an API provider, the most stable approach I have found is to report the billing units consumed by each request in a header of the response I return to Sulu, which then bills the user for that amount. For example, in Speech-to-Text, if the user uploads a file with 10 minutes of audio to process, I add this custom header to the response:
"billing": "minutes=10"
If the API supports multiple custom billing units, a single response could report several of them:
"billing": "minutes=10, characters=400"
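As a rough sketch of how the header value above could be produced on the provider side and read back on the billing side: the function names `format_billing_header` and `parse_billing_header` are hypothetical, and the parsing rules (comma-separated `unit=amount` pairs, non-negative amounts) are my assumption based on the two examples, not a format that Sulu defines.

```python
def format_billing_header(units: dict[str, float]) -> str:
    """Build the value of the proposed "billing" response header.

    `units` maps a billing unit name to the amount consumed by this request.
    """
    parts = []
    for name, amount in units.items():
        # Unit names must not contain the separators used by the format.
        if not name or "=" in name or "," in name:
            raise ValueError(f"invalid billing unit name: {name!r}")
        if amount < 0:
            raise ValueError(f"negative amount for unit {name!r}")
        parts.append(f"{name}={amount}")
    return ", ".join(parts)


def parse_billing_header(value: str) -> dict[str, float]:
    """Read the header back into a mapping of unit name to amount."""
    units: dict[str, float] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or empty segment
        name, separator, amount = part.partition("=")
        if not separator:
            raise ValueError(f"missing '=' in billing segment: {part!r}")
        units[name.strip()] = float(amount)  # raises ValueError if not numeric
    return units


# Provider side: attach the consumed units to the outgoing response headers.
response_headers = {"billing": format_billing_header({"minutes": 10, "characters": 400})}

# Billing side: recover the units from the header value.
consumed = parse_billing_header(response_headers["billing"])
```

Keeping every unit in one header means a multi-unit request needs no extra fields, and a single-unit API is just the special case with one pair.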