New TIL: How streaming LLM APIs work
I put together some notes after poking around with the OpenAI, Anthropic and Google Gemini streaming APIs
@velaia my guess is that OpenAI did that first because they were worried prompts would be too long to send over GET, then everyone else followed their lead

@simon @velaia Nice. I wonder if, once the QUERY method is accepted, we will be able to run server-sent events with large payload requests. https://www.ietf.org/archive/id/draft-ietf-httpbis-safe-method-w-body-02.html

@simon Nice. Little note: on a recent curl, you can POST JSON with `curl --json <string>`, saving the header setting. Update: `--no-buffer` always `fflush()`es the output in curl, so it might still be beneficial.

Updated my TIL with example JavaScript code for streaming events from a fetch() POST API (using an async iterator function): https://til.simonwillison.net/llms/streaming-llm-apis#user-content-bonus--2-processing-streaming-events-in-javascript-with-fetch
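The linked TIL section has Simon's actual code; as a rough sketch of the pattern that comment describes (an async generator wrapping a streamed fetch() POST), something like the following. The endpoint, payload shape, `[DONE]` sentinel and `delta.content` field follow OpenAI's documented chat completions streaming API; the model name is just an example, and it assumes a runtime such as Node 18+ where `fetch()` is built in and `response.body` is async-iterable.

```javascript
// Sketch: yield parsed SSE events from a streaming fetch() POST.
async function* sseEvents(url, options) {
  const response = await fetch(url, options);
  const decoder = new TextDecoder();
  let buffer = "";
  // response.body streams the raw bytes as they arrive
  for await (const chunk of response.body) {
    buffer += decoder.decode(chunk, { stream: true });
    // SSE events are separated by a blank line
    let index;
    while ((index = buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = buffer.slice(0, index);
      buffer = buffer.slice(index + 2);
      for (const line of rawEvent.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const data = line.slice("data: ".length);
        if (data === "[DONE]") return; // OpenAI's end-of-stream sentinel
        yield JSON.parse(data);
      }
    }
  }
}

// Usage: print each token as it arrives
for await (const event of sseEvents("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini", // example model name
    messages: [{ role: "user", content: "Tell me a joke" }],
    stream: true,
  }),
})) {
  process.stdout.write(event.choices[0]?.delta?.content ?? "");
}
```

Because the generator hides the buffering and SSE framing, the consuming loop reads like iterating over any other sequence, which is the appeal of the async iterator approach.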
@simon nice breakdown of the requests and expected responses.