
What is the cost?
In some situations, price is very important, especially when a task will be repeated many times. While a single answer may cost only a fraction of a cent, those fractions add up quickly. On big data assembly lines, downgrading to a cheap option can make the difference between financial success and failure.
In other jobs, the price won’t matter. Maybe the prompt will only be run a few times. Maybe the price is much lower than the value of the job. Scrimping on the LLM makes little sense in these cases because spending extra for a bigger, fancier model won’t break the budget.
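The cost math above is easy to sketch. The per-token prices below are purely hypothetical placeholders, not any vendor's actual rates, but they show how the gap between a cheap and a fancy model widens at pipeline scale:

```python
# Back-of-envelope monthly cost estimate for one LLM pipeline stage.
# All prices here are hypothetical, chosen only to illustrate the spread.
def monthly_cost(calls_per_day, tokens_per_call, price_per_million_tokens, days=30):
    """Rough monthly spend, assuming a flat per-token price."""
    total_tokens = calls_per_day * tokens_per_call * days
    return total_tokens / 1_000_000 * price_per_million_tokens

# A high-volume pipeline: 100,000 calls/day, ~2,000 tokens per call.
cheap = monthly_cost(100_000, 2_000, 0.50)   # assumed $0.50 per million tokens
fancy = monthly_cost(100_000, 2_000, 15.00)  # assumed $15.00 per million tokens
print(f"cheap: ${cheap:,.0f}/mo  fancy: ${fancy:,.0f}/mo")
```

At that volume the difference is thousands of dollars a month; for a prompt run a handful of times, the same gap is pocket change, which is the whole point of the two preceding paragraphs.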
Was the model trained on synthetic data?
Some LLMs are trained on synthetic data created by other models. When the process goes well, the model absorbs no false information. When it goes wrong, errors compound and the model loses precision. Some draw an analogy to the way copies of copies of copies grow blurry and lose detail. Others compare the process to audio feedback between an amplifier and a microphone.
Is the training set copyrighted?
Some LLM creators cut corners when building their training sets by including pirated books. Anthropic, for example, has announced a settlement to a class action lawsuit over books that are still under copyright. Other lawsuits are still pending. The claim is that the models may reproduce something close to the copyrighted material when prompted the right way. If your use case could surface plagiarized or pirated material, look for assurances about how the training set was assembled.

