New TIL: How streaming LLM APIs work
I put together some notes after poking around with the OpenAI, Anthropic and Google Gemini streaming APIs
@velaia my guess is that OpenAI did that first because they were worried prompts would be too long to send over GET, then everyone else followed their lead

@simon @velaia Nice. I wonder if, once the QUERY method is accepted, we will be able to run server-sent events with large payload requests. https://www.ietf.org/archive/id/draft-ietf-httpbis-safe-method-w-body-02.html

@simon Nice. Little note: on a recent curl, you can POST JSON with `curl --json <string>`, saving the header setting. Update: `--no-buffer` always `fflush()`es the output in curl, so it might still be beneficial.

Updated my TIL with example JavaScript code for streaming events from a fetch() POST API (using an async iterator function): https://til.simonwillison.net/llms/streaming-llm-apis#user-content-bonus--2-processing-streaming-events-in-javascript-with-fetch
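The linked TIL section has Simon's actual code; as a rough sketch of the pattern that comment describes (an async generator wrapping a streamed fetch() POST), something like the following. The endpoint, payload shape, `[DONE]` sentinel and `delta.content` field follow OpenAI's documented chat completions streaming API; the model name is just an example, and it assumes a runtime such as Node 18+ where `fetch()` is built in and `response.body` is async-iterable.

```javascript
// Sketch: yield parsed SSE events from a streaming fetch() POST.
async function* sseEvents(url, options) {
  const response = await fetch(url, options);
  const decoder = new TextDecoder();
  let buffer = "";
  // response.body streams the raw bytes as they arrive
  for await (const chunk of response.body) {
    buffer += decoder.decode(chunk, { stream: true });
    // SSE events are separated by a blank line
    let index;
    while ((index = buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = buffer.slice(0, index);
      buffer = buffer.slice(index + 2);
      for (const line of rawEvent.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const data = line.slice("data: ".length);
        if (data === "[DONE]") return; // OpenAI's end-of-stream sentinel
        yield JSON.parse(data);
      }
    }
  }
}

// Usage: print each token as it arrives
for await (const event of sseEvents("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini", // example model name
    messages: [{ role: "user", content: "Tell me a joke" }],
    stream: true,
  }),
})) {
  process.stdout.write(event.choices[0]?.delta?.content ?? "");
}
```

Because the generator hides the buffering and SSE framing, the consuming loop reads like iterating over any other sequence, which is the appeal of the async iterator approach.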
@simon nice breakdown of the requests and expected responses.