Of course there are. There's the size of the context window and how many tokens it allows. There's the number of parameters in the model, usually counted in billions. There are metrics comparing the efficacy of techniques like RAG or training approaches like RLHF. There's the lag between query and response, such as how many hundreds of milliseconds it takes to receive an audio reply to a voice prompt, and how many modalities (text, image, audio) can be used to engage it.
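The latency metric is easy to measure yourself. Here's a minimal sketch using only the standard library; `fake_model_call` is a hypothetical stand-in for whatever API client you actually use:

```python
import time

def measure_latency_ms(fn, *args):
    """Time a single call and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Hypothetical stand-in for a real model API call; swap in your own client.
def fake_model_call(prompt):
    return f"echo: {prompt}"

reply, ms = measure_latency_ms(fake_model_call, "hello")
```

For voice interfaces you'd typically care about time-to-first-audio rather than total completion time, so you'd start the timer at the prompt and stop at the first chunk received.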
On top of that, most releases come with a model card that lays out how the system was set up and what to expect from the results. There are human preference rankings of how capable a model is, and scores calculated from benchmark tests designed for PhDs or mathematicians. You can also measure how good it is at writing and repairing its own code.
https://huggingface.co/learn/nlp-course/chapter4/4
Details matter here, but for now think of it as a fancier Google, with most of these capabilities already available to consumers today.