Slowdown?

Hi all,
I’ve been experiencing slow inference for the last 24 hours or so. The tool I developed for the hackathon used to run in under 5 minutes on the slow computer I’ve been using (and most times under 3), but it now takes over 10 minutes and produces many more errors.

It was so fast previously that I sometimes hit the rate limits during testing, on a tool that makes at most 9 LLM calls :slight_smile:

Is anyone else experiencing a slowdown over the last day or so?


Hello Stephen,

We will look into the slow inference issue you’re facing and will update you soon. If you have any additional information or observations, please feel free to share them with us.
Thank you

Best regards,
Rohit Vyawahare


Thanks Rohit,

I’m trying to do some more testing at my end, but I’m also doing my day job on the other computer for a few hours. If I find anything interesting, I’ll keep you up to date.

Steve

Hello Stephen,

Thank you for your message. To better understand the issue you’re experiencing, could you kindly confirm whether you’ve been running the models with your community email address or a different ID? Additionally, could you specify which model(s) you’re noticing the slowness with?

Your feedback will help us investigate the issue more effectively.

Best regards,
Rohit


@stephen.parkes Can you let me know the context lengths of your prompts?

Remove the max_tokens parameter.

I also encountered a significant slowdown, and even timeout issues, this week with Samba API code I hadn’t touched in weeks. I was able to pin it down to a `max_tokens` value that was well below the Llama 3.1 70B limit. It worked fine again once I commented out `max_tokens`.
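
For reference, here’s a minimal sketch of the change, assuming SambaNova’s OpenAI-compatible chat endpoint and the `openai` Python client; the base URL, model name, and environment variable below are illustrative, so adjust them to whatever your code already uses:

```python
# Minimal sketch, assuming SambaNova's OpenAI-compatible chat endpoint.
# The base_url, model name, and env var are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],  # hypothetical env var name
    base_url="https://api.sambanova.ai/v1",   # assumed endpoint
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-70B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Hello"}],
    # max_tokens=512,  # commenting this out restored normal latency for me
)
print(response.choices[0].message.content)
```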