JSON River – Parse JSON incrementally as it streams in

(github.com)

130 points | by rickcarlino 5 days ago

22 comments

rictic 4 hours ago
Hi HN! Didn't expect this to be on the front page today! I should really release all the optimizations that've been landing lately, the version on github is about twice as fast as what's released on npm.
I wrote it while prototyping streaming rendering of UIs defined by JSON generated by LLMs. Using constrained generation you can essentially hand the model a JSON serializable type, and it will always give you back a value that obeys that type, but the big models are slow enough that incremental rendering makes a big difference in the UX.
I'm pretty proud of the testing that's gone into this project. It's fairly exhaustively tested. If you can find a value that it parses differently than JSON.parse, or a place where it disobeys the 5+1 invariants documented in the README I'd be impressed (and thankful!).
This API, where you get a series of partial values, is designed to be easy to render with any of the `UI = f(state)` libraries like React or Lit, though you may need to short circuit some memoization or early exiting since whenever possible jsonriver will mutate existing values rather than creating new ones.
[-]
- stevage 1 hour ago
  Suggestion: make it clearer in the readme what happens with malformed input.
  I can imagine it being useful to have a mode where you never emit strings until they are final, also. I don't entirely understand why strings are emitted incrementally but numbers aren't.
  [-]
  - rictic 1 minute ago
    Good feedback! Just updated the README with the following:
    > The parse function also matches JSON.parse's behavior for invalid input. If the input stream cannot be parsed as the start of a valid JSON document, then parsing halts and an error is thrown. More precisely, the promise returned by the next method on the AsyncIterable rejects with an Error. Likewise if the input stream closes prematurely.
    As for why strings are emitted incrementally, it's just that I was often dealing with long strings produced slowly by LLMs. JSON encoded numbers can be big in theory, but there's no practical reason to do so as almost everyone decodes them as 64bit floats.
  - xp84 31 minutes ago
    Seems useful to me in the context of something like a progressively rendered UI. A large block of text appearing a few characters at a time would be fine, but a number that represents something like a display metric (say, a position, or font-size) going from 0 to 0.5 or from 1 to 1000, would result in goofy gyrations on-screen that don't make any sense. Or imagine if it was just fields in the app's data.
    Name: John Smith. Birth Year: A.D. 1 [Customer is a Senior: 2,024 years old]
    Name: John Smith. Birth year: A.D. 19 [Customer is a Senior: 2,006 years old]
    Name: John Smith. Birth year: A.D. 199 [Customer is a Senior: 1,826 years old]
    Name: John Smith. Birth year: 1997
syx 5 hours ago
For those wondering about the use case, this is very useful when enabling streaming for structured output in LLM responses, such as JSON responses. For my local Raspberry Pi agent I needed something performant, I've been using streaming-json-js [1], but development appears to have been a bit dormant over the past year. I'll definitely take a look at your jsonriver and see how it compares!
[1] https://github.com/karminski/streaming-json-js
[-]
- rokkamokka 4 hours ago
  For LLMs I recommend just doing NDJSON, that is, newline delimited json. It's much simpler to implement
  [-]
  - rictic 3 hours ago
    Do any LLMs support constrained generation of newline delimited json? Or have you found that they're generally reliable enough that you don't need to do constrained sampling?
    [-]
    - sprobertson 1 hour ago
      not for the standard hosted APIs using structured output or function calling, best you can get is an array
  - stevage 1 hour ago
    I love NDJSON in general. I use it a lot for spatial data processing (GDAL calls it GeoJSONSeq).
- cjonas 5 hours ago
  Particularly for ReAct-style agents that use a "final" tool call to end the run.
simonw 5 hours ago
If anyone needs to do this in Python I've had success with both ijson and jiter - notes here: https://til.simonwillison.net/json/ijson-stream and https://simonwillison.net/2024/Sep/22/jiter/
carterschonwald 4 hours ago
Oh fun, I wrote a similar library in 2015 for Haskell. There is an annoying gotcha to deal with: there are sequences of valid characters that can be parsed incorrectly if you’re doing incremental chunks, namely if “0.0” is split across two input chunks you can get a token stream with two valid float literals rather than 1! Namely “0” and “.0”, which is just a really annoying wart of json float syntax.
[-]
- rictic 4 hours ago
  Yeah, getting numbers correct was one of the trickier wrinkles in the project. https://github.com/rictic/jsonriver/blob/5515be978bb564e9bdc...
- tracnar 4 hours ago
  Don't you need to wait for some kind of delimiter (like ",", "]", "}", newline, EOF) before parsing anything other than a string?
- yonatan8070 4 hours ago
  An "off the top of my head" solution to this would be not to yield tokens until a terminating character (comma, \n, }).
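  A minimal sketch of that idea, assuming a stream that contains only numbers separated by delimiters. `createNumberReader` and its `push`/`end` methods are hypothetical names for illustration, not jsonriver's implementation, and `Number()` is looser than the JSON number grammar:

```javascript
// Hypothetical helper: buffers number characters across chunk boundaries and
// only emits a number once a terminating character (or end of input) is seen.
function createNumberReader() {
  let pending = ''; // characters of a number that may continue in the next chunk
  const isNumberChar = (ch) => /[0-9eE+\-.]/.test(ch);

  return {
    // Feed one chunk; returns the numbers that were terminated inside it.
    push(chunk) {
      const done = [];
      for (const ch of chunk) {
        if (isNumberChar(ch)) {
          pending += ch;
        } else if (pending !== '') {
          // Any other character (',', ']', '}', whitespace) ends the literal.
          done.push(Number(pending));
          pending = '';
        }
      }
      return done;
    },
    // End of input also terminates a pending number.
    end() {
      const done = pending === '' ? [] : [Number(pending)];
      pending = '';
      return done;
    },
  };
}

const reader = createNumberReader();
const first = reader.push('0');     // nothing yet: "0" might still grow
const second = reader.push('.0,1'); // "0.0" is complete once the comma arrives
const rest = reader.end();          // the trailing "1" is complete at end of input
```

  With "0.0" split as "0" and ".0", the first chunk yields nothing, so the reader never reports two separate literals.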
jlundberg 1 hour ago
I really like just encoding each object as JSON and then concatenating them with a newline between each.
This allows parsing and streaming without any special libraries and allows for an unlimited amount of data (with objects being reasonably sized).
I usually give these files the .jsonlines suffix when stored on disk.
It also allows for batch processing without requiring huge amounts of memory.
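A rough sketch of reading such a stream chunk by chunk, assuming one JSON value per line. `createNdjsonReader` is a made-up name for illustration. Splitting on "\n" is safe because raw newlines are not allowed inside JSON strings:

```javascript
// Hypothetical reader for newline-delimited JSON arriving in arbitrary chunks.
function createNdjsonReader() {
  let buffer = ''; // text after the last newline seen so far

  return {
    // Feed one chunk; returns the values whose lines were completed by it.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last piece has no newline yet, keep it
      return lines
        .filter((line) => line.trim() !== '')
        .map((line) => JSON.parse(line));
    },
    // At end of input, a final line without a trailing newline is still a value.
    end() {
      const last = buffer.trim();
      buffer = '';
      return last === '' ? [] : [JSON.parse(last)];
    },
  };
}

const ndjson = createNdjsonReader();
const batch1 = ndjson.push('{"id":1}\n{"id"'); // first object only
const batch2 = ndjson.push(':2}\n');           // second object, now complete
const batch3 = ndjson.end();                   // nothing left over
```

Only one line is held in memory at a time, which is where the batch-processing benefit comes from.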
magicalhippo 5 days ago
I wrote a more traditional JSON parser for my microcontroller project. You could iterate over elements and it would return "needs more data" if it was unable to proceed. You could then call it again after fetching more. Then just simple state machines to consume the objects.
The benefit with that was that you didn't need enough memory to hold the whole deserialized JSON object.
This seems to be more oriented towards interactivity, which is an interesting use-case I hadn't thought about.
[-]
- rickcarlino 5 days ago
  I found this because I am interested in streaming responses that populate a user interface quickly, or use spinners if it is loading still
Xmd5a 2 hours ago
I wrote something similar that can also produce JSON incrementally from other streaming data sources. It combines a streaming JSON parser with streaming strings and a streaming regex engine.
Concretely, it means I can call an LLM, wrap its output stream in a streaming string, and treat it like a regular string. No need for print loops, it’s all handled behind the scenes. I can chain transformations (joining strings, splitting them with regexes, capturing substrings, etc.) and serialize the results into JSON progressively, building lazy sequences or maps on the fly.
The benefit is that I can start processing and emitting structured data immediately, without waiting for the LLM’s full response. Filtered output can be shown to users as it arrives, with near-zero added latency (aside from regex lookaheads).
chrchr 3 hours ago
I did something like this for Python [1]. The application I worked on at the time had a feature allowing users to import and export their data as a JSON document, and users often had enough data to make this cumbersome, especially with serialization and deserialization overhead. My implementation can also generate JSON documents as they stream out, from Python generators. The incremental JSON parsing was a little difficult to use, but incremental generation was an immediate win. We generated JSON documents from database results row-by-row and streamed the output to the web server, never producing the entire document in memory.
[1] https://github.com/chrchr/flojay
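The generation side can be sketched in JavaScript with a generator. This is only an analogue of the idea, not flojay's API, and `streamJsonArray` and `fakeRows` are hypothetical names:

```javascript
// Hypothetical sketch: serialize an iterable of rows as a JSON array,
// one chunk per row, without ever building the whole document in memory.
function* streamJsonArray(rows) {
  yield '[';
  let first = true;
  for (const row of rows) {
    // Each row is serialized on its own; a comma precedes all but the first.
    yield (first ? '' : ',') + JSON.stringify(row);
    first = false;
  }
  yield ']';
}

// Stand-in for database results arriving row by row.
function* fakeRows() {
  yield {id: 1, name: 'a'};
  yield {id: 2, name: 'b'};
}

const chunks = [...streamJsonArray(fakeRows())];
const text = chunks.join(''); // joined here only to check the result
```

In a real server each chunk would be written to the response as it is produced rather than collected into an array.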
holdenc137 5 hours ago
I don't get it (and I'd call this cumulative not incremental)
Why not at least wait until the key is complete - what's the use in a partial key?
[-]
- xg15 3 hours ago
  Doesn't it do exactly that?
  > As a consequence of 1 and 5, we only add a property to an object once we have the entire key and enough of the value to know that value's type.
- rictic 4 hours ago
  Cumulative is a good term too. I come from the browser world where it's typically called incremental parsing, e.g. when web browsers parse and render HTML as it streams in over the wire. I was doing the same thing with JSON from LLMs.
- simonw 5 hours ago
  If you're building a UI that renders output from a streaming LLM you might get back something which looks like this:
```
  {"role": "assistant", "text": "Here's that Python code you aske
```
  Incomplete parsing with incomplete strings is still useful in order to render that to your end user while it's still streaming in.
  [-]
  - trevor-e 3 hours ago
    In this example the value is incomplete, not the key.
  - cozzyd 5 hours ago
    incomplete strings could be fun in certain cases
    {"cleanup_cmd":"rm -rf /home/foo/.tmp" }
    [-]
    - stronglikedan 3 hours ago
      If any part of that value actually made it, unchecked, to execution, then you have bigger problems than partial JSON keys/values.
    - rictic 5 hours ago
      Yeah, another fun one is string enums. Could treat "DeleteIfEmpty" as "Delete".
      [-]
      - Waterluvian 4 hours ago
        I imagine if you reason about incomplete strings as a sort of “unparsed data” where you might store or transport or render it raw (like a string version of printing response.data instead of response.json()), but not act on it (compare, concat, etc), it’s a reasonably safe model?
        I’m imagining it in my mental model as being typed “unknown”. Anything that prevents accidental use as if it were a whole string… I imagine a more complex type with an “isComplete” flag of sorts would be more powerful but a bit of a blunderbuss.
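        A small sketch of the "isComplete" flag idea. `StreamedString` is a hypothetical wrapper, and nothing here comes from jsonriver, which yields plain strings:

```javascript
// Hypothetical wrapper: rendering is always allowed, comparison only when final.
class StreamedString {
  constructor(text, isComplete) {
    this.text = text;
    this.isComplete = isComplete;
  }

  // Showing the raw text so far is safe at any time.
  render() {
    return this.text;
  }

  // Acting on the value is refused until the string is known to be whole.
  equals(other) {
    if (!this.isComplete) {
      throw new Error('string is still streaming; cannot compare yet');
    }
    return this.text === other;
  }
}

const partial = new StreamedString('Delete', false);
const finished = new StreamedString('DeleteIfEmpty', true);

let refused = false;
try {
  partial.equals('Delete'); // would wrongly match the enum value "Delete"
} catch (err) {
  refused = true;
}
```

        The incomplete "Delete" can be rendered but not compared, so it cannot be mistaken for the enum value before the rest of "DeleteIfEmpty" arrives.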
AaronFriel 5 hours ago
Oh, this is quite similar to an online parser I'd written a few years ago[1]. I have some worked examples on how to use it with the now-standard Chat Completions API for LLMs to stream and filter structured outputs (aka JSON). This is the underlying technology for a "Copilot" or "AI" application I worked on in my last role.
Like yours, I'm sure, these incremental or online parser libraries are orders of magnitude faster[2] than alternatives for parsing LLM tool calls for the very simple reason that alternative approaches repeatedly parse the entire concatenated response, which requires buffering the entire payload, repeatedly allocating new objects, and for an N token response, you parse the first token N times! All of the "industry standard" approaches here are quadratic, which is going to scale quite poorly as LLMs generate larger and larger responses to meet application needs, and users want low latency outputs.
One of the most useful features of this approach is filtering LLM tool calls on the server and passing through a subset of the parse events to the client. This makes it relatively easy to put moderation, metadata capture, and other requirements in a single tool call, while still providing low latency streaming UI. It also avoids the problem with many moderation APIs where for cost or speed reasons, one might delegate to a smaller, cheaper model to generate output in a side-channel of the normal output stream. This not only doesn't scale, but it also means the more powerful model is unaware of these requirements, or you end up with a "flash of unapproved content" due to moderation delays, etc.
I found that it was extremely helpful to work at the level of parse events, but recognize that building partial values is also important, so I'm working on something similar in Rust[3], but taking a more holistic view and building more of an "AI SDK" akin to Vercel's, but written in Rust.
[1] https://github.com/aaronfriel/fn-stream
[2] https://github.com/vercel/ai/pull/1883
[3] https://github.com/aaronfriel/jsonmodem
(These are my own opinions, not those of my employer, etc. etc.)
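To make the quadratic claim concrete, here is a small sketch that only counts characters scanned and does no actual parsing. `reparseCost` and `incrementalCost` are hypothetical names, and the 1000 four-character chunks are made-up input:

```javascript
// Characters scanned when the whole concatenated prefix is parsed again
// after every chunk (the "reparse everything" approach).
function reparseCost(chunks) {
  let total = 0;
  let received = 0;
  for (const chunk of chunks) {
    received += chunk.length;
    total += received; // the full prefix is scanned again
  }
  return total;
}

// Characters scanned when each chunk is consumed exactly once.
function incrementalCost(chunks) {
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.length;
  }
  return total;
}

// 1000 chunks of 4 characters each, standing in for LLM tokens.
const tokens = Array.from({length: 1000}, () => 'abcd');
const quadratic = reparseCost(tokens);  // 4 * (1 + 2 + ... + 1000) = 2002000
const linear = incrementalCost(tokens); // 4 * 1000 = 4000
```

For N equal-sized chunks the first count grows as N(N+1)/2 and the second as N, which is the gap described above.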
rixed 58 minutes ago
So SAX, but for json?
The more things change, the more they stay the same.
eric-p7 3 hours ago
"has no dependencies, and uses only standard features of JavaScript so it works in any JS environment."
Then I see a Node style import and npm. When did Node/NPM stop being dependencies and become standardized by JavaScript? Where's my raw es6 module?
[-]
- jcla1 2 hours ago
  FWIW the import syntax is now part of standard JS, according to the ECMAScript 2026 specification:
  https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
  And node seems to be used only as a dev dependency, to test, benchmark and build/package the project. If you're so inclined you can use the project's code as-is elsewhere, i.e. in the browser.
- rictic 2 hours ago
  Bare module specifiers aren't just for Node! Deno and browsers support import maps e.g.
  The library doesn't use any APIs beyond those in the JS standard, so I'm pretty confident it will work everywhere, but happy to publish in more places and run more tests. Any in particular that you'd like to see?
  [-]
  - o11c 1 hour ago
    Mostly unrelated, but does anyone know how you are supposed to make path-less module specifiers work for Node if you are not using npm but rather system-installed JS packages (Debian etc. install node-* packages into /usr/share/nodejs/)? With `require` it just works, but with `import` it errors and suggests passing the absolute path (even though it clearly knows what path ...).
    For some reason everybody in the JS world takes "download and execute random software from the Internet" as the only way to do things.
    [-]
    - rictic 1 hour ago
      Try import maps, something like:
      { "imports": { "express": "/usr/share/nodejs/express/index.js", "another-module": "/usr/share/nodejs/another-module/index.js" } }
      Then run node like: `node --import-map=./import-map.json app.js`
      The Debian approach of having global versions of libraries seems like it's solving a different problem than the ones I have. I want each application to track and version its own dependencies, so that upgrading a dependency for one doesn't break another, and so that I can go back to an old project and be reasonably confident it'll still work. That ultimately led me to nix.
      [-]
      - o11c 58 minutes ago
        I have a simpler solution to the latter problem: if upgrading a dependency package breaks anything (barring multi-year deprecation, limited-time experimental previews, etc.), I blacklist it and never install that package ever again. After all, they are clearly lacking on either their testing infrastructure or their development guidelines.
        It's amazing how much the quality of installed software improves when you do this. Something our industry desperately needs.
mattvr 4 hours ago
You could also use JSON Merge Patch (RFC 7396) [1] for a similar use case.
(The downside of JSON Merge Patch is that it doesn't support concatenating string values, so you must send a value like `{"msg": "Hello World"}` as one message; you can't join `{"msg": "Hello"}` with `{"msg": " World"}`.)
[1] https://github.com/pierreinglebert/json-merge-patch
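A compact sketch of the merge algorithm from RFC 7396, written from the RFC's pseudocode rather than taken from the linked library. `mergePatch` and `isObject` are names chosen here, and this variant copies instead of mutating the target:

```javascript
// True for plain JSON objects (not arrays, not null).
function isObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Apply a JSON Merge Patch to a target and return the patched value.
function mergePatch(target, patch) {
  // A non-object patch (string, number, array, ...) replaces the target outright.
  if (!isObject(patch)) return patch;

  const result = isObject(target) ? {...target} : {};
  for (const [name, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[name]; // null means "remove this member"
    } else {
      result[name] = mergePatch(result[name], value);
    }
  }
  return result;
}

let doc = {};
doc = mergePatch(doc, {msg: 'Hello'});
doc = mergePatch(doc, {msg: ' World'}); // replaces the string, does not append
const removed = mergePatch({a: 1, b: 2}, {a: null});
```

After the second patch `doc.msg` is just " World", which is the limitation described above: a string member is always replaced whole.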
keleftheriou 3 hours ago
Thanks for sharing!
Roughly how does it compare with https://github.com/promplate/partial-json-parser-js ?
seanalltogether 5 hours ago
Maybe I'm wrong but it seems like you would only want to parse partial values for objects and arrays, but not strings or numbers. Objects and arrays can be unbounded so it makes sense to process what you can, when you can, whereas a string or number usually is not.
[-]
- rictic 5 hours ago
  Numbers, booleans, and nulls are atomic with jsonriver: you get them all at once, only when they're complete.
  For my use case I wanted a streaming parse of strings. I was rendering JSON produced by an LLM to incrementally render a UI, and some of the strings were long enough (descriptions) that it was nice to see them render incrementally.
- everforward 5 hours ago
  It could be useful if you're doing something with the string that operates sequentially anyways (i.e. block-by-block AES, or SHA sums).
  I _think_ the intended use of this is for people with bad internet connections so your UI can show data that's already been received without waiting for a full response. I.e. if their connection is 1KB/s and you send an 8KB JSON blob that's mostly a single text field, you can show them the first kilobyte after a second rather than waiting 8 seconds to get the whole blob.
  At first I thought maybe it was for handling gigantic JSON blobs that you don't want to entirely load into memory, but the API looks like it still loads the whole thing into memory.
- xg15 3 hours ago
  There is json that has very long string literals. Usually, it's either long-ish text or HTML content, or base64-encoded binary data.
  So I'd definitely count strings as "unbounded" as well.
- AaronFriel 5 hours ago
  If you're generating long reports, code, etc. with an LLM, partial strings matter quite a lot for user experience.
zahlman 4 hours ago
> If you gave this to jsonriver one byte at a time it would yield this sequence of values:
Does it create a new value each time, or just mutate the existing one and keep yielding it?
[-]
- rictic 4 hours ago
  It mutates the existing value and yields it again (unless the toplevel value is a string, because strings are immutable in JS).
alganet 5 hours ago
Interesting approach.
I would expect an object JSON stream to be more like a SAX parser though. It's familiar, fast and simple.
Any thoughts on not choosing the SAX approach?
[-]
- rictic 4 hours ago
  SAX is often better if you don't need the full final result, especially if you can throw away most of the data after it's been processed. The nice part about this API is that you just get a DeepPartial<FinalResult> so the code to handle a partial result is basically the same as the code to handle the final result.
- benatkin 5 hours ago
  I think this is a lot like etree in python's streaming approach for XML, but with a simpler API, and incremental text parsing. With etree in python, you can access the incomplete tree data and not have to worry about events. So it's missing the SAX API part of a SAX approach, but is built like some real world libraries that use the SAX approach, which end up having a hybrid of events and trees.
  [-]
  - alganet 5 hours ago
    It seems to be convenient for some cases. A large object with many keys, for example.
    I don't see it as particularly convenient if I want to stream a large array of small independent objects and read each one of them once, then discard it. The incrementally parsed array would get bigger and bigger, eventually containing all the objects I wanted to discard. I would also need to move my array pointer to the last element at each increment.
    jq and JSON.sh have similar incremental "mini-object-before-complete" approaches to parsing JSON. However, they do include some tools to shape those mini-objects (pruning, selecting, and so on). Also, they're tuned for pipes (new line is the event), which caters to shell and text-processing tools. I wonder what would be the analogue for that in a higher language.
    [-]
    - benatkin 4 hours ago
      This is more versatile than it seems at first glance. Under invariants, it shows that you have arrays/objects only being mutated, so you have stable references. You could use a WeakSet to observe new children of an item coming in. You also may not even need manage this directly - you could debounce and just re-render a UI component by returning a modified virtual DOM. Or if you had a visualization in d3, it would automatically notice which ones are new.
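      A sketch of the WeakSet idea, using a hand-mutated array to stand in for the partial values a parser would yield. `takeNewChildren` is a hypothetical helper, and it only tracks object and array children because a WeakSet cannot hold primitives:

```javascript
// Children already handed to the UI; entries vanish once the objects are collected.
const seen = new WeakSet();

// Returns the children of `items` that have not been seen on a previous pass.
function takeNewChildren(items) {
  const fresh = [];
  for (const item of items) {
    if (typeof item === 'object' && item !== null && !seen.has(item)) {
      seen.add(item);
      fresh.push(item);
    }
  }
  return fresh;
}

// Simulated stream: the same array and row objects are mutated in place,
// relying on the stable references described above.
const list = [];
const row1 = {title: 'A'};
list.push(row1);
const firstPass = takeNewChildren(list);  // contains row1 only

row1.title = 'Alpha';                     // an existing child grows
const row2 = {title: 'B'};
list.push(row2);                          // a new child arrives
const secondPass = takeNewChildren(list); // contains row2 only
```

      Because row1 keeps its identity while its title grows, it is reported once and only row2 shows up on the second pass.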
      [-]
      - alganet 3 hours ago
        It does sound very practical indeed.
EGreg 2 hours ago
I recently also wrote a streaming JSON parser in PHP. In case anyone is interested, I would love to get your feedback. It’s designed to work independently or with the rest of our system.
https://github.com/Qbix/Platform/blob/main/platform/classes/...
codesnik 5 hours ago
I can't imagine a use case. OK, you receive incremental updates, which could be useful, but how do you find out that the JSON object has actually been received in full?
[-]
- Supermancho 5 hours ago
  I first used this kind of parser for pulling multi-gig JSON files without waiting for the full file before processing.
  [-]
  - rictic 4 hours ago
    Funnily enough, this was one of the first users of jsonriver at google. A team needed to parse more JSON than most JS VMs will allow you to fit into a single string, so they had no choice but to use a streaming parser.
- philipallstar 5 hours ago
  When its closing brace or square bracket appears.
  EDIT: this is totally wrong and the question is right.
  [-]
  - rising-sky 5 hours ago
    Actually, not quite how this works. You always get valid JSON, as in this sequence from the readme:
```json
    {"name": "Al"}
    {"name": "Ale"}
```
    So the braces are always closed.
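    One naive way to get values like these from a truncated document is to close whatever is still open and hand the result to JSON.parse. This is explicitly not how jsonriver works, since it re-parses the whole prefix every time. `closePrefix` is a hypothetical helper that only handles prefixes ending inside a string or right after a complete value:

```javascript
// Hypothetical helper: append the closing quote and brackets a prefix is missing.
// Prefixes ending mid-key, mid-number, mid-literal or mid-\u escape are not handled.
function closePrefix(prefix) {
  const closers = []; // closing brackets owed, innermost last
  let inString = false;
  let escaped = false;

  for (const ch of prefix) {
    if (inString) {
      if (escaped) escaped = false;          // character after a backslash
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      closers.push('}');
    } else if (ch === '[') {
      closers.push(']');
    } else if (ch === '}' || ch === ']') {
      closers.pop();
    }
  }

  let text = prefix;
  if (inString) {
    if (escaped) text = text.slice(0, -1);   // drop a dangling backslash
    text += '"';
  }
  return text + closers.reverse().join('');
}

const v1 = JSON.parse(closePrefix('{"name": "Al'));
const v2 = JSON.parse(closePrefix('{"name": "Ale'));
const v3 = JSON.parse(closePrefix('[1, [2'));
```

    Here `v1` and `v2` match the two values in the readme sequence quoted above.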
jauntywundrkind 4 hours ago
It's no longer active, but Oboe.js did great stuff for a decade+ in this field! It has some very nice APIs for consuming. https://github.com/jimhigson/oboe.js/
It's less about incrementally parsing objects, and more about picking paths and shapes out from a feed. If you're doing something like array/newline delimited json, it's a great tool for reading things out as they arrive. Also great for example for feed parsing.
quotemstr 3 hours ago
Awesome. You know what would be EVEN COOLER?
Given a schema and a JSON message prefix, parse the complete message but substitute missing field values with Promise objects. Likewise, represent lists as lazy sequences. Add a pubsub system.
florians 5 hours ago
Noteworthy: Contributions by Claude
[-]
- rictic 4 hours ago
  Is true. I wrote a ton of tests, testing just about everything I can think of, including using a reverse parser I wrote to exhaustively generate the simplest 65k json values, ensuring that it succeeds with the same values and fails on the same cases as JSON.parse.
  Then added benchmarks and started doing optimization, getting it ~10x faster than my initial naive implementation. Then I threw agents at it, and between Claude, Gemini, and Codex we were able to make it an additional 2x faster.