[Feature Request]: support non-stream chat completion.

We need to support non-stream completion in the API as well. Currently, if `stream=True` is not specified, the API returns error 400:

code:

    completion = client.chat.completions.create(
      model="Meta-Llama-3.1-8B-Instruct",
      messages=[
        {"role": "user", "content": prompt}
      ],
      # stream=True,
      **kwargs,
    )

stacktrace:

    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'code': None, 'message': 'Current API supports stream mode. Add "stream": true in the payload', 'param': None, 'type': 'unsupported_config_error'}

This might break some existing agent applications and not provide a smooth user experience.
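Until non-stream mode is supported server-side, one possible client-side workaround (a sketch, not an official fix; the helper name `complete_non_stream` is my own, and `client` is the same OpenAI SDK client as in the snippet above) is to always request a stream and join the delta chunks into a single string:

    def complete_non_stream(client, model, messages, **kwargs):
        # Request a stream (currently the only mode the API accepts)
        # and accumulate the content deltas into one response string.
        stream = client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True,
            **kwargs,
        )
        parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta is not None:  # the final chunk may carry no content
                parts.append(delta)
        return "".join(parts)

This won't help libraries like llama_index that call the non-streaming endpoint internally, but it keeps simple scripts working.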


Thanks for flagging. We will have some updates to share on this shortly! Are you in the process of building an agentic AI app?

Hi, I’m not building anything serious, just playing around.

I noticed this could break some llama_index functionality, as it calls the LLM with the non-streaming API under the hood.

I was just testing this today, and non-stream worked fine on the 405B.


Verified. Thank you!