[Feature Request]: support non-stream chat completion.

We need to support non-stream completion in the API as well. Currently, if `stream=True` is not specified, the API returns error 400:

code:

    completion = client.chat.completions.create(
      model="Meta-Llama-3.1-8B-Instruct",
      messages=[
        {"role": "user", "content": prompt}
      ],
      # stream=True,
      **kwargs,
    )

stacktrace:

    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'code': None, 'message': 'Current API supports stream mode. Add "stream": true in the payload', 'param': None, 'type': 'unsupported_config_error'}

This might break some existing agent applications and not provide a smooth user experience.
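Until non-stream mode is supported server-side, one possible client-side workaround (a sketch, not an official fix; the helper name `complete_non_stream` is my own, and `client` is the same OpenAI SDK client as in the snippet above) is to always request a stream and join the delta chunks into a single string:

    def complete_non_stream(client, model, messages, **kwargs):
        # Request a stream (currently the only mode the API accepts)
        # and accumulate the content deltas into one response string.
        stream = client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True,
            **kwargs,
        )
        parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta is not None:  # the final chunk may carry no content
                parts.append(delta)
        return "".join(parts)

This won't help libraries like llama_index that call the non-streaming endpoint internally, but it keeps simple scripts working.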


Thanks for flagging. We will have some updates to share on this shortly! Are you in the process of building an agentic AI app?

Hi, I’m not building anything serious, just playing around.

I noticed this could break some llama_index functionality, as it calls the LLM with the non-streaming API under the hood.

I was just testing this today, and non-stream worked fine on the 405B.


Verified. Thank you!