ClickHouse Connect Driver API | ClickHouse Docs (2023)

Note: Passing keyword arguments is recommended for most API methods, given the number of possible arguments, many of which are optional.

Client Initialization

The clickhouse_connect.driver.client class provides the primary interface between a Python application and the ClickHouse database server. Use the clickhouse_connect.get_client function to obtain a Client instance, which accepts the following arguments:

Connection Arguments

Parameter | Type | Default | Description
interface | str | http | Must be http or https.
host | str | None | The hostname or IP address of the ClickHouse server. If not set, localhost will be used.
port | int | 8123 or 8443 | The ClickHouse HTTP or HTTPS port. If not set, defaults to 8123, or to 8443 if secure=True or interface=https.
username | str | None | The ClickHouse user name. If not set, the default ClickHouse user will be used.
password | str | <empty string> | The password for username.
database | str | None | The default database for the connection. If not set, ClickHouse Connect will use the default database for username.
secure | bool | False | Use https/TLS. This overrides inferred values from the interface or port arguments.
dsn | str | None | A string in standard DSN (Data Source Name) format. Other connection values (such as host or user) will be extracted from this string if not set otherwise.
compress | bool or str | True | Enable compression for ClickHouse HTTP inserts and query results. See Additional Options (Compression).
query_limit | int | 0 (unlimited) | Maximum number of rows to return for any query response. Set this to zero to return unlimited rows. Note that large query limits may result in out-of-memory exceptions if results are not streamed, as all results are loaded into memory at once.
query_retries | int | 2 | Maximum number of retries for a query request. Only "retryable" HTTP responses will be retried. command or insert requests are not automatically retried by the driver, to prevent unintended duplicate requests.
connect_timeout | int | 10 | HTTP connection timeout in seconds.
send_receive_timeout | int | 300 | Send/receive timeout for the HTTP connection in seconds.
client_name | str | None | client_name prepended to the HTTP User Agent header. Set this to track client queries in the ClickHouse system.query_log.
send_progress | bool | True | Deprecated as of v0.5.9, does nothing.
pool_mgr | obj | <default PoolManager> | The urllib3 library PoolManager to use. For advanced use cases requiring multiple connection pools to different hosts.
http_proxy | str | None | HTTP proxy address (equivalent to setting the HTTP_PROXY environment variable).
https_proxy | str | None | HTTPS proxy address (equivalent to setting the HTTPS_PROXY environment variable).

HTTPS/TLS Arguments

Parameter | Type | Default | Description
verify | bool | True | Validate the ClickHouse server TLS/SSL certificate (hostname, expiration, etc.) if using HTTPS/TLS.
ca_cert | str | None | If verify=True, the file path to the Certificate Authority root used to validate the ClickHouse server certificate, in .pem format. Ignored if verify is False. This is not necessary if the ClickHouse server certificate is a globally trusted root as verified by the operating system.
client_cert | str | None | File path to a TLS client certificate in .pem format (for mutual TLS authentication). The file should contain a full certificate chain, including any intermediate certificates.
client_cert_key | str | None | File path to the private key for the client certificate. Required if the private key is not included in the client certificate file.
server_host_name | str | None | The ClickHouse server hostname as identified by the CN or SNI of its TLS certificate. Set this to avoid SSL errors when connecting through a proxy or tunnel with a different hostname.
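
As a rough sketch of how the TLS arguments above might be combined for mutual TLS, the helper below only assembles keyword arguments. The host name and file paths are placeholders, and build_tls_kwargs is a hypothetical helper that is not part of ClickHouse Connect.

```python
from typing import Optional


def build_tls_kwargs(host: str,
                     ca_cert: Optional[str] = None,
                     client_cert: Optional[str] = None,
                     client_cert_key: Optional[str] = None) -> dict:
    """Assemble get_client keyword arguments for an HTTPS connection."""
    kwargs = {'host': host, 'secure': True, 'verify': True}
    if ca_cert:
        # Only needed when the server certificate is not signed by a root
        # that the operating system already trusts
        kwargs['ca_cert'] = ca_cert
    if client_cert:
        kwargs['client_cert'] = client_cert
        if client_cert_key:
            # Only needed when the key is not bundled in the certificate file
            kwargs['client_cert_key'] = client_cert_key
    return kwargs


kwargs = build_tls_kwargs('clickhouse.example.internal',
                          ca_cert='/etc/ssl/private_ca.pem',
                          client_cert='/etc/ssl/client_chain.pem',
                          client_cert_key='/etc/ssl/client_key.pem')

# With a reachable server, the dictionary would be passed on like this:
#   import clickhouse_connect
#   client = clickhouse_connect.get_client(**kwargs)
```

Keeping the arguments in a dictionary makes it easy to log or test the connection settings before any network call is made.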

Settings Argument

Finally, the settings argument to get_client is used to pass additional ClickHouse settings to the server for each client request. Note that in most cases, users with readonly=1 access cannot alter settings sent with a query, so ClickHouse Connect will drop such settings from the final request and log a warning.

Setting | Description
buffer_size | Buffer size (in bytes) used by the ClickHouse server before writing to the HTTP channel.
session_id | A unique session ID to associate related queries on the server. Required for temporary tables.
compress | Whether the ClickHouse server should compress the POST response data. This setting should only be used for "raw" queries.
decompress | Whether the data sent to the ClickHouse server must be decompressed. This setting should only be used for "raw" inserts.
quota_key | The quota key associated with these requests. See the ClickHouse server documentation on quotas.
session_check | Used to check the session status.
session_timeout | Number of seconds of inactivity before the session identified by the session ID times out and is no longer considered valid. Defaults to 60 seconds.
wait_end_of_query | Buffers the entire response on the ClickHouse server. This setting is required to return summary information. It is set automatically when send_progress=True.

For other ClickHouse settings that can be sent with each query, see the ClickHouse documentation.

Client Creation Examples

  • Without any parameters, a ClickHouse Connect client will connect to the default HTTP port on localhost with the default user and no password:
import clickhouse_connect

client = clickhouse_connect.get_client()
client.server_version
Out[2]: '22.10.1.98'
  • Connecting to a secure (https) external ClickHouse server:
import clickhouse_connect

client = clickhouse_connect.get_client(host='play.clickhouse.com', secure=True, port=443, user='play', password='clickhouse')
client.command('SELECT timezone()')
Out[2]: 'Etc/UTC'
  • Connecting with a session ID and other custom connection parameters and ClickHouse settings:
import clickhouse_connect

client = clickhouse_connect.get_client(host='play.clickhouse.com',
                                       user='play',
                                       password='clickhouse',
                                       port=443,
                                       session_id='example_session_1',
                                       connect_timeout=15,
                                       database='github',
                                       settings={'distributed_ddl_task_timeout': 300})
client.database
Out[2]: 'github'

Common Method Arguments

Several client methods use one or both of the common parameters and settings arguments. These keyword arguments are described below.

Parameters Argument

ClickHouse Connect Client query* and command methods accept an optional parameters keyword argument used for binding Python expressions to a ClickHouse value expression. Two sorts of binding are available.

Server Side Binding

ClickHouse supports server side binding for most query values, where the bound value is sent separately from the query as an HTTP query parameter. ClickHouse Connect will add the appropriate query parameters if it detects a binding expression of the form {<name>:<datatype>}. For server side binding, the parameters argument should be a Python dictionary.

  • Server side binding with a Python dictionary, a datetime value, and a string value
import datetime

my_date = datetime.datetime(2022, 10, 1, 15, 20, 5)

parameters = {'table': 'my_table', 'v1': my_date, 'v2': "a string with a single quote'"}
client.query('SELECT * FROM {table:Identifier} WHERE date >= {v1:DateTime} AND string ILIKE {v2:String}', parameters=parameters)

# Generates the following query on the server
# SELECT * FROM my_table WHERE date >= '2022-10-01 15:20:05' AND string ILIKE 'a string with a single quote\''
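
To illustrate what "sent separately from the query" means, here is a minimal sketch of the ClickHouse HTTP convention for query parameters: each bound value travels as a param_<name> URL parameter while the SQL text keeps its {name:Type} placeholders. The URL, the build_request_url helper, and the datetime text format are assumptions for illustration. ClickHouse Connect builds the real request internally, and this is not its implementation.

```python
import datetime
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request_url(base_url: str, query: str, parameters: dict) -> str:
    """Encode a query and its bound values as ClickHouse HTTP URL parameters."""
    url_params = {'query': query}
    for name, value in parameters.items():
        if isinstance(value, datetime.datetime):
            # Assumed text format for a DateTime value
            value = value.strftime('%Y-%m-%d %H:%M:%S')
        url_params[f'param_{name}'] = str(value)
    return f'{base_url}/?{urlencode(url_params)}'


url = build_request_url(
    'http://localhost:8123',
    'SELECT * FROM {table:Identifier} WHERE date >= {v1:DateTime}',
    {'table': 'my_table', 'v1': datetime.datetime(2022, 10, 1, 15, 20, 5)})

# The placeholders stay in the SQL text; the values arrive as separate parameters
sent = parse_qs(urlsplit(url).query)
print(sent['param_table'], sent['param_v1'])
```

Because the server substitutes the values itself, it can apply the declared type (Identifier, DateTime, String) when interpreting each one.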

Client Side Binding

ClickHouse Connect also supports client side parameter binding, which can allow more flexibility in generating templated SQL queries. For client side binding, the parameters argument should be a dictionary or a sequence. Client side binding uses Python "printf" style string formatting for parameter substitution.

Note that, unlike server side binding, client side binding does not work for database identifiers such as database, table, or column names, since Python style formatting cannot distinguish between the different types of strings, and they need to be formatted differently (backticks or double quotes for database identifiers, single quotes for data values).

  • Example with a Python dictionary, a datetime value, and string escaping
import datetime

my_date = datetime.datetime(2022, 10, 1, 15, 20, 5)

parameters = {'v1': my_date, 'v2': "a string with a single quote'"}
client.query('SELECT * FROM some_table WHERE date >= %(v1)s AND string ILIKE %(v2)s', parameters=parameters)

# Generates the following query:
# SELECT * FROM some_table WHERE date >= '2022-10-01 15:20:05' AND string ILIKE 'a string with a single quote\''
  • Example with a Python sequence (tuple), a Float64, and an IPv4Address
import ipaddress

parameters = (35200.44, ipaddress.IPv4Address(0x443d04fe))
client.query('SELECT * FROM some_table WHERE metric >= %s AND ip_address = %s', parameters=parameters)

# Generates the following query:
# SELECT * FROM some_table WHERE metric >= 35200.44 AND ip_address = '68.61.4.254'
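
The substitution step can be approximated with plain Python. The sketch below is a simplified stand-in, not the library's implementation: format_value and bind are hypothetical helpers, and the quoting rules cover only the value types used in the examples above.

```python
import datetime
import ipaddress


def format_value(value) -> str:
    """Render one Python value as a SQL literal (simplified quoting rules)."""
    if isinstance(value, str):
        # Escape backslashes first, then single quotes
        escaped = value.replace('\\', '\\\\').replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, datetime.datetime):
        text = value.strftime('%Y-%m-%d %H:%M:%S')
        return f"'{text}'"
    if isinstance(value, ipaddress.IPv4Address):
        return f"'{value}'"
    return str(value)


def bind(query: str, parameters) -> str:
    """Apply printf-style substitution for a dictionary or a sequence."""
    if isinstance(parameters, dict):
        return query % {key: format_value(val) for key, val in parameters.items()}
    return query % tuple(format_value(val) for val in parameters)


print(bind('SELECT * FROM some_table WHERE metric >= %s AND ip_address = %s',
           (35200.44, ipaddress.IPv4Address(0x443d04fe))))
```

Every string is rendered as a single-quoted data value, which is why this style of binding cannot produce correctly quoted identifiers.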

Settings Argument

All key ClickHouse Connect Client methods that send SQL to the server accept an optional settings keyword argument used to pass ClickHouse server user settings for the included SQL statement. The settings argument should be a dictionary. Each item should be a ClickHouse setting name and its associated value. Note that values will be converted to strings when sent to the server as query parameters.

As with client level settings, ClickHouse Connect will drop any settings that the server marks as readonly=1, with an associated log message. Settings that apply only to queries via the ClickHouse HTTP interface are always valid. Those settings are described under the get_client API.

Example of using ClickHouse settings:

settings = {'merge_tree_min_rows_for_concurrent_read': 65535,
            'session_id': 'session_1234',
            'use_skip_indexes': False}
client.query("SELECT event_type, sum(timeout) FROM event_errors WHERE event_time > '2022-08-01' GROUP BY event_type", settings=settings)

Client command Method

Use the Client.command method to send SQL queries to the ClickHouse server that normally return no data, or that return a single simple value rather than a full dataset. This method takes the following parameters:

Parameter | Type | Default | Description
cmd | str | Required | A ClickHouse SQL statement that returns a single value or a single row of values.
parameters | dict or iterable | None | See parameters description.
data | str or bytes | None | Optional data to include with the command as the POST body.
settings | dict | None | See settings description.
use_database | bool | True | Use the client database (specified when creating the client). False means the command will use the default ClickHouse server database for the connected user.
  • command can be used for DDL statements
client.command('CREATE TABLE test_command (col_1 String, col_2 DateTime) Engine MergeTree ORDER BY tuple()')
client.command('SHOW CREATE TABLE test_command')
Out[6]: 'CREATE TABLE default.test_command\\n(\\n `col_1` String,\\n `col_2` DateTime\\n)\\nENGINE = MergeTree\\nORDER BY tuple()\\nSETTINGS index_granularity = 8192'
  • command can also be used for simple queries that return only a single row
result = client.command('SELECT count() FROM system.tables')
result
Out[7]: 110
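
Since command accepts the common parameters argument, a single-value lookup can be wrapped in a small function. This is a sketch under assumptions: table_row_count is a hypothetical helper, and client is expected to be an existing Client instance connected to a server that contains the named table.

```python
def table_row_count(client, table: str):
    """Return the row count of a table using Client.command."""
    # {table:Identifier} is bound on the server, so the table name is not
    # spliced into the SQL text by hand
    return client.command('SELECT count() FROM {table:Identifier}',
                          parameters={'table': table})


# Usage with a live connection (not executed here):
#   import clickhouse_connect
#   client = clickhouse_connect.get_client()
#   table_row_count(client, 'test_command')
```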

Client query Method

The Client.query method is the primary way to retrieve a single "batch" dataset from the ClickHouse server. It uses the Native ClickHouse format over HTTP to transmit large datasets (up to approximately one million rows) efficiently. This method takes the following parameters:

Parameter | Type | Default | Description
query | str | Required | The ClickHouse SQL SELECT or DESCRIBE query.
parameters | dict or iterable | None | See parameters description.
settings | dict | None | See settings description.
query_formats | dict | None | Datatype formatting specification for result values. See Advanced Usage (Read Formats).
column_formats | dict | None | Datatype formatting per column. See Advanced Usage (Read Formats).
encoding | str | None | Encoding used to decode ClickHouse String columns into Python strings. Python defaults to UTF-8 if not set.
use_none | bool | True | Use the Python None type for ClickHouse nulls. If False, use a datatype default (such as 0) for ClickHouse nulls. Note: defaults to False for numpy/Pandas for performance reasons.
column_oriented | bool | False | Return the results as a sequence of columns rather than a sequence of rows. Helpful for transforming Python data to other column oriented data formats.
query_tz | str | None | A timezone name from the zoneinfo database. This timezone will be applied to all datetime or Pandas Timestamp objects returned by the query.
column_tzs | dict | None | A dictionary of column name to timezone name. Like query_tz, but allows specifying different timezones for different columns.
use_na_values | bool | True | Use Pandas missing types such as pandas.NA and pandas.NaT for ClickHouse NULL values. Only relevant to the query_df and query_df_stream methods.
context | QueryContext | None | A reusable QueryContext object can be used to encapsulate the above method arguments. See Advanced Queries (QueryContexts).
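
A minimal sketch of a typical call, combining the parameters and settings arguments and reading the rows back. The event_errors table and its columns are borrowed from the settings example earlier on this page and are assumed to exist; recent_error_totals is a hypothetical helper.

```python
def recent_error_totals(client, since: str) -> dict:
    """Return {event_type: total_timeout} for events after `since`."""
    result = client.query(
        'SELECT event_type, sum(timeout) AS total '
        'FROM event_errors WHERE event_time > {since:DateTime} '
        'GROUP BY event_type',
        parameters={'since': since},
        settings={'use_skip_indexes': False})
    # result_rows is a sequence of rows; each row is a sequence of column values
    return {event_type: total for event_type, total in result.result_rows}


# Usage with a live connection (not executed here):
#   totals = recent_error_totals(client, '2022-08-01 00:00:00')
```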

The QueryResult Object

The base query method returns a QueryResult object with the following public properties:

  • result_rows -- A matrix of the returned data in the form of a sequence of rows, where each row element is a sequence of column values.
  • result_columns -- A matrix of the returned data in the form of a sequence of columns, where each column element is a sequence of the row values for that column.
  • column_names -- A tuple of strings representing the column names in the result_set.
  • column_types -- A tuple of ClickHouseType instances representing the ClickHouse data type for each column in the result_columns.
  • query_id -- The ClickHouse query_id (useful for examining the query in the system.query_log table).
  • summary -- Any data returned by the X-ClickHouse-Summary HTTP response header.
  • first_item -- A convenience property for retrieving the first row of the response as a dictionary (keys are column names).
  • first_row -- A convenience property to return the first row of the result.
  • column_block_stream -- A generator of query results in column oriented format. This property should not be referenced directly (see below).
  • row_block_stream -- A generator of query results in row oriented format. This property should not be referenced directly (see below).
  • rows_stream -- A generator of query results that yields a single row per invocation. This property should not be referenced directly (see below).

The *_stream properties return a Python context that can be used as an iterator for the returned data. They should only be accessed indirectly using the Client *_stream methods. In a future release, the QueryResult object returned by the main Client query method will have consumed the stream and will contain the entire populated result_set, to provide a clean separation between completed "batch" results retrieved via the Client query method and streaming results retrieved via the Client query_*_stream methods.

Full details of streaming query results (using StreamContext objects) are described in Advanced Queries (Streaming Queries).

Note: The streaming behavior from v0.5.0-v0.5.3 that uses the QueryResult object as a Python context was deprecated in v0.5.4 and will be removed in a future release. The QueryResult methods stream_column_blocks, stream_row_blocks, and stream_rows should not be used and are included for backward compatibility only.
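
How the row oriented and column oriented views relate can be sketched with plain Python stand-ins. The variable names below mirror the QueryResult properties, but the values are made-up sample data and no QueryResult object is involved.

```python
column_names = ('event_type', 'total')
result_rows = [('timeout', 12), ('refused', 3), ('reset', 7)]

# Pivot rows into columns: one sequence per column, as in result_columns
result_columns = [list(column) for column in zip(*result_rows)]

# first_row is the first row; first_item is that row keyed by column name
first_row = result_rows[0]
first_item = dict(zip(column_names, first_row))

print(result_columns)
print(first_item)
```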

Specialized Client Query Methods

There are three specialized versions of the main query method:

  • query_np -- This version returns a Numpy array instead of a ClickHouse Connect QueryResult.
  • query_df -- This version returns a Pandas DataFrame instead of a ClickHouse Connect QueryResult.
  • query_arrow -- This version returns a PyArrow Table. It uses the ClickHouse Arrow format directly, so it accepts only three arguments in common with the main query method: query, parameters, and settings. In addition, there is an additional argument, use_strings, which determines whether the Arrow Table will render ClickHouse String types as strings (if True) or bytes (if False).

Client Streaming Query Methods

The ClickHouse Connect Client provides several methods for retrieving data as a stream (implemented as a Python generator):

  • query_column_block_stream -- Returns query data in blocks as a sequence of columns using native Python objects
  • query_row_block_stream -- Returns query data as a block of rows using native Python objects
  • query_rows_stream -- Returns query data as a sequence of rows using native Python objects
  • query_np_stream -- Returns each ClickHouse block of query data as a Numpy array
  • query_df_stream -- Returns each ClickHouse block of query data as a Pandas DataFrame

Each of these methods returns a StreamContext object that must be opened via a with statement to start consuming the stream. See Advanced Queries (Streaming Queries) for details and examples.
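
The with requirement can be sketched as follows. sum_column is a hypothetical helper, and client is assumed to be an existing Client instance; the point is only that rows are consumed one at a time inside the with block rather than loaded as a single batch.

```python
def sum_column(client, query: str, column_index: int = 0):
    """Add up one column without holding the full result in memory."""
    total = 0
    # The stream only starts once the StreamContext is opened by `with`
    with client.query_rows_stream(query) as stream:
        for row in stream:
            total += row[column_index]
    return total


# Usage with a live connection (not executed here):
#   sum_column(client, 'SELECT number FROM system.numbers LIMIT 1000000')
```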

Client insert Method

For the common use case of inserting multiple records into ClickHouse, there is the Client.insert method. It takes the following parameters:

Parameter | Type | Default | Description
table | str | Required | The ClickHouse table to insert into. The full table name (including database) is permitted.
data | Sequence of Sequences | Required | The matrix of data to insert, either a sequence of rows, each of which is a sequence of column values, or a sequence of columns, each of which is a sequence of row values.
column_names | Sequence of str, or str | '*' | A list of column_names for the data matrix. If '*' is used, ClickHouse Connect will execute a "pre-query" to retrieve all of the column names for the table.
database | str | '' | The target database of the insert. If not specified, the database for the client will be assumed.
column_types | Sequence of ClickHouseType | None | A list of ClickHouseType instances. If neither column_types nor column_type_names is specified, ClickHouse Connect will execute a "pre-query" to retrieve all the column types for the table.
column_type_names | Sequence of ClickHouse type names | None | A list of ClickHouse datatype names. If neither column_types nor column_type_names is specified, ClickHouse Connect will execute a "pre-query" to retrieve all the column types for the table.
column_oriented | bool | False | If True, the data argument is assumed to be a sequence of columns (and no "pivot" is needed to insert the data). Otherwise data is interpreted as a sequence of rows.
settings | dict | None | See settings description.
insert_context | InsertContext | None | A reusable InsertContext object can be used to encapsulate the above method arguments. See Advanced Inserts (InsertContexts).

This method does not return a value. An exception will be thrown if the insert fails for any reason.
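
As a sketch of the two data layouts, the helpers below insert the same records once as rows and once as columns. The test_command table and its col_1 and col_2 columns come from the command example earlier on this page; insert_rows and insert_columns are hypothetical helpers, and client is assumed to be an existing Client instance.

```python
import datetime


def insert_rows(client, rows):
    """Insert row oriented data; each inner sequence is one row."""
    client.insert('test_command', rows, column_names=['col_1', 'col_2'])


def insert_columns(client, columns):
    """Insert column oriented data; each inner sequence is one column."""
    client.insert('test_command', columns,
                  column_names=['col_1', 'col_2'],
                  column_oriented=True)


rows = [['first', datetime.datetime(2022, 10, 1, 15, 20, 5)],
        ['second', datetime.datetime(2022, 10, 2, 9, 0, 0)]]
# The same data pivoted into columns
columns = [list(column) for column in zip(*rows)]
```

Naming the columns explicitly avoids the "pre-query" for column names that the default '*' would trigger.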

There are two specialized versions of the main insert method:

  • insert_df -- Instead of a Python Sequence of Sequences data argument, the second parameter of this method requires a df argument that must be a Pandas DataFrame instance. ClickHouse Connect automatically processes the DataFrame as a column oriented data source, so the column_oriented parameter is not required or available.
  • insert_arrow -- Instead of a Python Sequence of Sequences data argument, this method requires an arrow_table. ClickHouse Connect passes the Arrow table unmodified to the ClickHouse server for processing, so only the database and settings arguments are available in addition to table and arrow_table.

Note: A Numpy array is a valid Sequence of Sequences and can be used as the data argument to the main insert method, so a specialized method is not required.

File Inserts

The clickhouse_connect.driver.tools module includes the insert_file method, which allows inserting data directly from the file system into an existing ClickHouse table. Parsing is delegated to the ClickHouse server. insert_file accepts the following parameters:

Parameter | Type | Default | Description
client | Client | Required | The driver.Client used to perform the insert.
table | str | Required | The ClickHouse table to insert into. The full table name (including database) is permitted.
file_path | str | Required | The native file system path to the data file.
fmt | str | CSV, CSVWithNames | The ClickHouse input format of the file. CSVWithNames is assumed if column_names is not provided.
column_names | Sequence of str | None | A list of column_names in the data file. Not required for formats that include column names.
database | str | None | Database of the table. Ignored if the table is fully qualified. If not specified, the insert will use the client database.
settings | dict | None | See settings description.

For files with inconsistent data or date/time values in an unusual format, settings that apply to data imports (such as input_format_allow_errors_num and input_format_allow_errors_ratio) are recognized by this method.

import clickhouse_connect
from clickhouse_connect.driver.tools import insert_file

client = clickhouse_connect.get_client()
insert_file(client, 'example_table', 'my_data.csv',
            settings={'input_format_allow_errors_ratio': .2,
                      'input_format_allow_errors_num': 5})

Raw API

For use cases that do not require transformation between ClickHouse data and native or third-party data types and structures, the ClickHouse Connect client provides two methods for direct use of the ClickHouse connection.

Client raw_query Method

The Client.raw_query method allows direct usage of the ClickHouse HTTP query interface using the client connection. The return value is an unprocessed bytes object. It offers a convenient wrapper with parameter binding, error handling, retries, and settings management using a minimal interface:

Parameter | Type | Default | Description
query | str | Required | Any valid ClickHouse query.
parameters | dict or iterable | None | See parameters description.
settings | dict | None | See settings description.
fmt | str | None | ClickHouse output format for the resulting bytes. (ClickHouse uses TSV if not specified.)
use_database | bool | True | Use the database assigned to the clickhouse-connect Client for the query context.

It is the caller's responsibility to handle the resulting bytes object. Note that Client.query_arrow is just a thin wrapper around this method using the ClickHouse Arrow output format.
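
One way the caller might handle the returned bytes is to request a self-describing text format and parse it with the standard library. This is a sketch under assumptions: query_as_dicts is a hypothetical helper, client is an existing Client instance, and the result is assumed to be small enough to decode in one piece.

```python
import csv
import io


def query_as_dicts(client, query: str) -> list:
    """Fetch a result as CSVWithNames bytes and parse it with the csv module."""
    raw = client.raw_query(query, fmt='CSVWithNames')
    # raw_query returns bytes; decoding and parsing are up to the caller
    reader = csv.DictReader(io.StringIO(raw.decode('utf-8')))
    return list(reader)


# Usage with a live connection (not executed here):
#   query_as_dicts(client, 'SELECT name, engine FROM system.tables LIMIT 5')
```

Every value comes back as a string here, since no conversion to Python types takes place on this path.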

Client raw_insert Method

The Client.raw_insert method allows direct inserts of bytes objects or bytes object generators using the client connection. Because it does no processing of the insert payload, it is highly performant. The method provides options to specify settings and the insert format:

Parameter | Type | Default | Description
table | str | Required | Either the simple or the database-qualified table name.
column_names | Sequence[str] | None | Column names for the insert block. Required if the fmt parameter does not include names.
insert_block | str, bytes, Generator[bytes], BinaryIO | Required | Data to insert. Strings will be encoded with the client encoding.
settings | dict | None | See settings description.
fmt | str | None | ClickHouse input format of the insert_block bytes. (ClickHouse uses TSV if not specified.)

It is the caller's responsibility to ensure that the insert_block is in the specified format. ClickHouse Connect uses these raw inserts for file uploads and PyArrow Tables, delegating parsing to the ClickHouse server.
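
A bytes generator for insert_block could look like the sketch below, which encodes rows as CSV a few at a time. csv_blocks and load_rows are hypothetical helpers, the test_command table and its columns come from the command example earlier on this page, and client is assumed to be an existing Client instance.

```python
import csv
import io


def csv_blocks(rows, rows_per_block: int = 2):
    """Yield CSV-encoded bytes blocks, a few rows at a time."""
    for start in range(0, len(rows), rows_per_block):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator='\n')
        writer.writerows(rows[start:start + rows_per_block])
        yield buffer.getvalue().encode('utf-8')


def load_rows(client, rows):
    """Send the generated blocks through Client.raw_insert as CSV."""
    # Plain CSV carries no header, so the column names are passed explicitly
    client.raw_insert('test_command',
                      column_names=['col_1', 'col_2'],
                      insert_block=csv_blocks(rows),
                      fmt='CSV')


blocks = list(csv_blocks([['a', 1], ['b', 2], ['c', 3]]))
print(blocks)
```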

Article information

Author: Kelle Weber

Last Updated: 03/04/2023
