I recently stumbled upon this site that publishes really good benchmarks for all the different models.
For example, here is how DeepSeek does against all the various benchmarks.