Added reader for JSONEachRow format. Updated documentation and examples #2871
chernser wants to merge 9 commits into
Triage
Category: feature • Risk: high
Summary
This PR adds JSONEachRow format support to client-v2 and jdbc-v2 via a pluggable JSON parser abstraction (JsonParser / JsonParserFactory), with bundled factories for Jackson and Gson (both as provided scope). To accommodate text-format readers alongside binary ones, the existing ClickHouseBinaryFormatReader interface is gutted — all typed accessors are moved to a new ClickHouseFormatReader base interface, ClickHouseBinaryFormatReader becomes a thin marker sub-interface, and a new ClickHouseTextFormatReader specialisation is added. ResultSetImpl is updated to accept either reader type, StatementImpl routes JSONEachRow responses to the new JSONEachRowFormatReader, and ConnectionImpl gains reflection-based factory instantiation from a new jdbc_json_parser_factory JDBC URL property. ClickHouseDataType.DATA_TYPE_TO_CLASS is promoted from package-private to public for schema inference. The diff is 5 793 additions / 648 deletions across clickhouse-data/, client-v2/, and jdbc-v2/.
What this impacts
- client-v2 reader API: ClickHouseBinaryFormatReader is now a sub-interface; callers still compile, but any code implementing or subclassing it directly must now satisfy ClickHouseFormatReader
- jdbc-v2/ResultSetImpl: protected reader field type widened from ClickHouseBinaryFormatReader to ClickHouseFormatReader; binary-incompatible for any subclasses
- jdbc-v2/ResultSetImpl constructors: parameter type changed, breaking source/binary compatibility for direct instantiation
- ClickHouseDataType.DATA_TYPE_TO_CLASS visibility change (package-private → public) in com.clickhouse.data
- New JDBC connection property jdbc_json_parser_factory; new client setting json_disable_number_quoting
- Reflection-based factory instantiation at connection-creation time in ConnectionImpl
Concerns
- Cross-module refactor (high rule): touches clickhouse-data/, client-v2/, and jdbc-v2/ (≥ 3 modules).
- Readers and Writers (high rule): ClickHouseBinaryFormatReader, the central reader interface for client-v2, has its entire method surface restructured.
- Public API shape changed (high rule): ResultSetImpl protected field and two public constructors changed from ClickHouseBinaryFormatReader to ClickHouseFormatReader; this is a binary-incompatible change for consumers that subclass ResultSetImpl or call its constructors directly.
- Type system (high rule): ClickHouseDataType.DATA_TYPE_TO_CLASS visibility change in com.clickhouse.data.
- Large diff (high rule): 5 793 additions far exceeds the 400-line threshold; reviewer may wish to request a split (reader refactor vs. new feature).
- Security (reflection): ConnectionImpl.instantiateJsonParserFactory loads and instantiates an arbitrary class name from JDBC URL properties; the context-classloader approach is reasonable, but the code path should be verified for multi-tenant environments.
- Schema inference hazard: JSONEachRowFormatReader infers column types from the first row only; an empty result set yields an empty schema and subsequent accessors may misbehave silently.
- The embedded Cursor AI bot labelled this "medium risk"; the repository's triage rules place it firmly at high.
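To make the schema inference hazard concrete, here is a minimal sketch of first-row inference. It is not the PR's implementation (which goes through SchemaUtils and ClickHouseDataType.DATA_TYPE_TO_CLASS); the class FirstRowSchema and the Java-class-to-type-name mapping are assumptions chosen only for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: infers a column-name -> type-name
// mapping from the first decoded row only.
final class FirstRowSchema {
    private FirstRowSchema() {}

    static Map<String, String> infer(Map<String, Object> firstRow) {
        if (firstRow == null) {
            // Empty result set: nothing to infer from, so the schema is empty.
            return Collections.emptyMap();
        }
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : firstRow.entrySet()) {
            schema.put(e.getKey(), typeOf(e.getValue()));
        }
        return Collections.unmodifiableMap(schema);
    }

    // Assumed mapping for illustration; a real reader would need a richer one.
    static String typeOf(Object value) {
        if (value == null) return "Nullable(Nothing)";
        if (value instanceof Boolean) return "Bool";
        if (value instanceof Integer || value instanceof Long) return "Int64";
        if (value instanceof Float || value instanceof Double) return "Float64";
        return "String";
    }
}
```

With this shape, a typed getter called on a column name after an empty result has no schema entry to consult, which is the case a reviewer may want covered by an explicit test.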
Required reviewer action
- At least one human reviewer required.
chernser
commented
Jun 16, 2026
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Reviewed by Cursor Bugbot for commit 2091837. Configure here.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 9f4287b. Configure here.
Quality Gate failed
Failed conditions
13.9% Duplication on New Code (required ≤ 3%)
Summary
General
Client V2 mainly supports binary formats. However, there is demand and a practical need to also support formats from the JSON family, because JSON is popular and represents complex structured data effectively.
There is no dedicated JSON reader, because any application can make a request via the client and read the input stream with its preferred JSON parser. However, creating such a reader would help bring JSON parsing to JDBC. As the interface is already defined, only type mapping and some glue code are required.
The goal of this PR is to add a harness for text format readers. A new common interface is created to abstract over readers. Dedicated interfaces for binary and text formats will carry format-specific methods.
The client does not intend to bundle every JSON parsing library, so all classes are implemented in isolation: nothing references them by default.
The new JSON reader contains an important piece of code that adapts primitive types to Java ones. This conversion is required, for example, in JDBC by ResultSetImpl.
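As a rough illustration of the type adaptation described above, the sketch below coerces a value produced by a JSON parser (a Number or a quoted numeric String) into the Java type a typed getter needs. The class JsonValueCoercion and its method names are hypothetical and are not taken from the PR.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not part of the PR: adapts JSON-decoded values to Java numerics.
final class JsonValueCoercion {
    private JsonValueCoercion() {}

    // Accepts any Number or a numeric String (e.g. a quoted 64-bit integer).
    static BigDecimal toBigDecimal(Object value) {
        if (value instanceof BigDecimal) return (BigDecimal) value;
        if (value instanceof BigInteger) return new BigDecimal((BigInteger) value);
        if (value instanceof Number || value instanceof String) {
            // Going through the decimal text avoids binary floating-point noise.
            return new BigDecimal(value.toString().trim());
        }
        throw new IllegalArgumentException("Cannot convert to a number: " + value);
    }

    // Throws ArithmeticException instead of silently truncating or overflowing.
    static long toLong(Object value) {
        return toBigDecimal(value).longValueExact();
    }
}
```

Failing loudly on a fractional or out-of-range value is a design choice of this sketch; a JDBC-facing reader may prefer to map such failures to SQLException.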
Client Support
The client is the main component for implementing JSON support. Support should take the form of an extension or plug-in, with no direct references to any JSON library. The user will configure a library instance, and the client should have a way to use it. Therefore the following problems should be addressed:
- We will support two libraries: GSON (https://github.com/google/gson) and Jackson (https://github.com/FasterXML/jackson). Both libraries have a root class that accepts configuration and can be customized for user needs, and both root classes create a parser or reader that is bound to an InputStream.
- The JSON parser will be used in the reader, and in this case instantiation is an application task. In general, text format support is not a goal for the client, so there is no dedicated method for creating readers for non-ClickHouse formats. This solves the customization problem, as the parser is instantiated by the user.
- As both libraries create IO-stream-bound entities to work with JSON, it is convenient to provide a kind of factory. This class becomes an abstraction used to create a wrapper for the JSON parsing library. Another abstraction is the JsonParser interface, which the reader uses to iterate through rows.
JDBC
The JDBC driver is often used when minimal custom code is expected, and it is the place where we have to provide a choice between JSON processing libraries. This should be implemented by providing the class name of a JsonParserFactory implementation, which solves the instantiation problem. In addition, the user may specify their own implementation class name if customization is required. Instantiation will be performed at the connection creation phase.
The JDBC driver will create a JSON reader if a JsonParserFactory is defined.
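The factory-by-class-name idea from the JDBC section can be sketched as follows. RowParser, RowParserFactory, EmptyRowParserFactory, and ParserFactoryLoader are stand-in names invented for this example; the actual JsonParser / JsonParserFactory signatures in the PR may differ. The fallback from the context class loader to the loader's own class loader is also an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical stand-ins for the PR's parser abstractions.
interface RowParser {
    Map<String, Object> nextRow() throws IOException; // null when input is exhausted
}

interface RowParserFactory {
    RowParser create(InputStream in) throws IOException;
}

// Trivial factory used only to demonstrate loading; it never yields rows.
final class EmptyRowParserFactory implements RowParserFactory {
    @Override
    public RowParser create(InputStream in) {
        return () -> null;
    }
}

final class ParserFactoryLoader {
    private ParserFactoryLoader() {}

    // Instantiates a factory from a class name, e.g. one read from a connection property.
    static RowParserFactory load(String className) {
        try {
            Class<?> raw = resolve(className);
            if (!RowParserFactory.class.isAssignableFrom(raw)) {
                throw new IllegalArgumentException(className + " is not a RowParserFactory");
            }
            return raw.asSubclass(RowParserFactory.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    // Prefer the thread context class loader, then fall back to this class's loader.
    private static Class<?> resolve(String name) throws ClassNotFoundException {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        if (ctx != null) {
            try {
                return Class.forName(name, true, ctx);
            } catch (ClassNotFoundException ignored) {
                // fall through to the defining loader
            }
        }
        return Class.forName(name, true, ParserFactoryLoader.class.getClassLoader());
    }
}
```

Checking the loaded class against the factory interface before instantiating it keeps an arbitrary class name from being constructed, which matters because the name comes from user-supplied connection properties.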
Note
Medium Risk
Large new public API and query-time server setting injection; behavior is opt-in for quoting and binary readers remain the default path, with extensive tests mitigating regression risk.
Overview
Introduces JSONEachRow support in client-v2 via a pluggable text-format stack, while refactoring format readers so binary and text encodings share one API.
Reader model: Typed accessors move to new ClickHouseFormatReader; ClickHouseBinaryFormatReader and ClickHouseTextFormatReader specialize binary vs text. JSONEachRowFormatReader streams rows through a JsonParser / JsonParserFactory, with bundled Jackson and Gson factories (Jackson/Gson deps are provided so the core client does not hard-depend on either library).
JSONEachRow behavior: The reader infers TableSchema from the first row (SchemaUtils + public, unmodifiable ClickHouseDataType.DATA_TYPE_TO_CLASS), exposes the same typed getters as binary readers where applicable, and defers parse errors so already-buffered valid rows are not dropped on a bad following line.
Query integration: New json_disable_number_quoting config; when set for JSONEachRow queries, Client#query applies ClickHouse output_format_json_quote_* server settings so large integers, floats, and decimals can be emitted as JSON numbers. Explicit server settings are otherwise left alone.
Coverage includes broad unit/integration tests for the reader, parsers, schema inference, and the opt-in quoting behavior.
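The query integration described above might look roughly like the sketch below. JsonNumberQuoting is a hypothetical helper, and the reading that an explicitly supplied setting always wins (putIfAbsent) is an assumption about the intended behavior, not something confirmed by the diff. The three setting names are existing ClickHouse server settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: derives the server settings for a query.
final class JsonNumberQuoting {
    private JsonNumberQuoting() {}

    static Map<String, String> apply(Map<String, String> serverSettings,
                                     String format,
                                     boolean disableNumberQuoting) {
        // Copy so the caller's settings map is never mutated.
        Map<String, String> out = new LinkedHashMap<>(serverSettings);
        if (disableNumberQuoting && "JSONEachRow".equals(format)) {
            // Assumption: a value the user set explicitly is left untouched.
            out.putIfAbsent("output_format_json_quote_64bit_integers", "0");
            out.putIfAbsent("output_format_json_quote_64bit_floats", "0");
            out.putIfAbsent("output_format_json_quote_decimals", "0");
        }
        return out;
    }
}
```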
Reviewed by Cursor Bugbot for commit 9f4287b. Bugbot is set up for automated code reviews on this repo. Configure here.