Note: Passing keyword arguments is recommended for most API methods given the number of possible arguments, many of which are optional.
Client Initialization
The clickhouse_connect.driver.client class provides the primary interface between a Python application and the ClickHouse database server. Use the clickhouse_connect.get_client function to obtain a Client instance, which accepts the following arguments:
Connection Arguments
Parameter | Type | Default | Description |
---|---|---|---|
interface | str | http | Must be http or https. |
host | str | None | The hostname or IP address of the ClickHouse server. If not set, localhost will be used. |
port | int | 8123 or 8443 | The ClickHouse HTTP or HTTPS port. If not set, defaults to 8123, or to 8443 if secure=True or interface=https. |
username | str | None | The ClickHouse user name. If not set, the ClickHouse user named "default" will be used. |
password | str | <empty string> | The password for username. |
database | str | None | The default database for the connection. If not set, ClickHouse Connect will use the default database for username. |
secure | bool | False | Use https/TLS. This overrides inferred values from the interface or port arguments. |
dsn | str | None | A string in standard DSN (Data Source Name) format. Other connection values (such as host or user) will be extracted from this string if not set otherwise. |
compress | bool or str | True | Enable compression for ClickHouse HTTP inserts and query results. See Additional Options (Compression) |
query_limit | int | 0 (unlimited) | Maximum number of rows to return for any query response. Set this to zero to return unlimited rows. Note that large query limits may cause out-of-memory exceptions if results are not streamed, since all results are loaded into memory at once. |
query_retries | int | 2 | Maximum number of retries for a query request. Only "retryable" HTTP responses will be retried. command or insert requests are not automatically retried by the driver, to prevent unintended duplicate requests. |
connect_timeout | int | 10 | HTTP connection timeout in seconds. |
send_receive_timeout | int | 300 | Send/receive timeout for the HTTP connection in seconds. |
client_name | str | None | client_name prepended to the HTTP User Agent header. Set this to track client queries in the ClickHouse system.query_log. |
send_progress | bool | True | Deprecated as of v0.5.9, does nothing. |
pool_mgr | object | <default PoolManager> | The urllib3 library PoolManager to use. For advanced use cases that require multiple connection pools to different hosts. |
http_proxy | str | None | HTTP proxy address (equivalent to setting the HTTP_PROXY environment variable). |
https_proxy | str | None | HTTPS proxy address (equivalent to setting the HTTPS_PROXY environment variable). |
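The port default described in the table can be written as a small rule. The helper below is a hypothetical illustration (default_port is not a ClickHouse Connect function); it only mirrors the documented behavior and shows how the connection arguments might be collected as keywords:

```python
def default_port(secure=False, interface=None):
    """Return the port the table above describes when none is given.

    Hypothetical helper, not part of ClickHouse Connect: it mirrors the
    documented rule (8443 for https/TLS, otherwise 8123).
    """
    if secure or interface == 'https':
        return 8443
    return 8123


# Keyword arguments keep the call readable; the names come from the table.
connection_args = {
    'host': 'localhost',
    'port': default_port(secure=True),
    'secure': True,
    'connect_timeout': 10,
}
# A client would then be created with:
#   clickhouse_connect.get_client(**connection_args)
```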
HTTPS/TLS Arguments
Parameter | Type | Default | Description |
---|---|---|---|
verify | bool | True | Validate the ClickHouse server TLS/SSL certificate (hostname, expiration, etc.) if using HTTPS/TLS. |
ca_cert | str | None | If verify=True, the file path to the Certificate Authority root used to validate the ClickHouse server certificate, in .pem format. Ignored if verify is False. This is not necessary if the ClickHouse server certificate is a globally trusted root as verified by the operating system. |
client_cert | str | None | File path to a TLS client certificate in .pem format (for mutual TLS authentication). The file should contain a full certificate chain, including any intermediate certificates. |
client_cert_key | str | None | File path to the private key for the client certificate. Required if the private key is not included in the client certificate file. |
server_host_name | str | None | The ClickHouse server hostname as identified by the CN or SNI of its TLS certificate. Set this to avoid SSL errors when connecting through a proxy or tunnel with a different hostname. |
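As a rough sketch of how these arguments fit together for mutual TLS, the hypothetical helper below (mutual_tls_args is an illustrative name, and the file paths are placeholders) collects the documented keyword arguments into a dictionary that could be unpacked into get_client:

```python
def mutual_tls_args(host, ca_cert, client_cert, client_cert_key=None):
    """Collect get_client keyword arguments for a mutual TLS connection.

    Hypothetical helper; the keys are the parameter names listed above.
    """
    args = {
        'host': host,
        'secure': True,          # use https/TLS
        'verify': True,          # validate the server certificate
        'ca_cert': ca_cert,      # CA root in .pem format
        'client_cert': client_cert,
    }
    # Only needed when the private key lives in a separate file.
    if client_cert_key is not None:
        args['client_cert_key'] = client_cert_key
    return args


# Placeholder host and paths, for illustration only.
tls_args = mutual_tls_args('clickhouse.example.com', 'ca.pem', 'client.pem', 'client.key')
# client = clickhouse_connect.get_client(**tls_args)
```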
Settings Argument
Finally, the settings argument to get_client is used to pass additional ClickHouse settings to the server for each client request. Note that in most cases, users with readonly=1 access cannot alter settings sent with a query, so ClickHouse Connect will drop such settings from the final request and log a warning.
Setting | Description |
---|---|
buffer_size | Buffer size (in bytes) used by the ClickHouse server before writing to the HTTP channel. |
session_id | A unique session id to associate related queries on the server. Required for temporary tables. |
compress | Whether the ClickHouse server should compress the POST response data. This setting should only be used for "raw" queries. |
decompress | Whether the data sent to the ClickHouse server must be decompressed. This setting should only be used for "raw" inserts. |
quota_key | The quota key associated with these requests. See the ClickHouse server documentation on quotas. |
session_check | Used to check the session status. |
session_timeout | Number of seconds of inactivity before the session identified by the session id times out and is no longer considered valid. Defaults to 60 seconds. |
wait_end_of_query | Buffers the entire response on the ClickHouse server. This setting is required to return summary information. It is set automatically when send_progress=True. |
For other ClickHouse settings that can be sent with each query, see the ClickHouse documentation.
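A minimal sketch of building the session-related settings from the table, assuming they are passed through the settings argument as described above. session_settings is a hypothetical helper name, and the client calls are shown only as comments because they need a running server:

```python
import uuid


def session_settings(timeout_seconds=60):
    """Build HTTP session settings for get_client(settings=...)."""
    return {
        # A unique id ties related requests to one server-side session,
        # which temporary tables require.
        'session_id': f'example_{uuid.uuid4().hex}',
        # Seconds of inactivity before the session expires.
        'session_timeout': timeout_seconds,
    }


settings = session_settings(timeout_seconds=120)
# client = clickhouse_connect.get_client(settings=settings)
# client.command('CREATE TEMPORARY TABLE scratch (id UInt32)')
```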
Client Creation Examples
- Without any parameters, a ClickHouse Connect client will connect to the default HTTP port on localhost with the default user and no password:
import clickhouse_connect
client = clickhouse_connect.get_client()
client.server_version
Out[2]: '22.10.1.98'
- Connecting to a secure (https) external ClickHouse server
import clickhouse_connect
client = clickhouse_connect.get_client(host='play.clickhouse.com', secure=True, port=443, user='play', password='clickhouse')
client.command('SELECT timezone()')
Out[2]: 'Etc/UTC'
- Connecting with a session id and other custom connection parameters and ClickHouse settings.
import clickhouse_connect
client = clickhouse_connect.get_client(host='play.clickhouse.com',
                                       user='play',
                                       password='clickhouse',
                                       port=443,
                                       session_id='example_session_1',
                                       connect_timeout=15,
                                       database='github',
                                       settings={'distributed_ddl_task_timeout': 300})
client.database
Out[2]: 'github'
Common Method Arguments
Several client methods use one or both of the common parameters and settings arguments. These keyword arguments are described below.
Parameters Argument
ClickHouse Connect Client query* and command methods accept an optional parameters keyword argument used for binding Python expressions to a ClickHouse value expression. Two types of binding are available.
Server Side Binding
ClickHouse supports server side binding for most query values, where the bound value is sent separately from the query as an HTTP query parameter. ClickHouse Connect will add the appropriate query parameters if it detects a binding expression of the form {<name>:<datatype>}. For server side binding, the parameters argument should be a Python dictionary.
- Server side binding with Python dictionary, DateTime value, and string value
import datetime
my_date = datetime.datetime(2022, 10, 1, 15, 20, 5)
parameters = {'table': 'my_table', 'v1': my_date, 'v2': "a string with a single quote'"}
client.query('SELECT * FROM {table:Identifier} WHERE date >= {v1:DateTime} AND string ILIKE {v2:String}', parameters=parameters)
# Generates the following query on the server
# SELECT * FROM my_table WHERE date >= '2022-10-01 15:20:05' AND string ILIKE 'a string with a single quote\''
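Server side binding is convenient to wrap in a small function so that the table name and values always travel as parameters instead of being concatenated into the SQL text. The sketch below assumes a client created with get_client; rows_since and its argument names are hypothetical, and the date and string columns are the ones used in the example above:

```python
import datetime


def rows_since(client, table, since, pattern):
    """Query rows from `table` on or after `since` whose string column matches `pattern`."""
    # {name:Type} placeholders are resolved by the ClickHouse server.
    query = ('SELECT * FROM {table:Identifier} '
             'WHERE date >= {since:DateTime} AND string ILIKE {pattern:String}')
    parameters = {'table': table, 'since': since, 'pattern': pattern}
    return client.query(query, parameters=parameters)


# Usage (requires a reachable server):
#   result = rows_since(client, 'my_table', datetime.datetime(2022, 10, 1), '%error%')
```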
Client Side Binding
ClickHouse Connect also supports client side parameter binding, which can allow more flexibility in generating templated SQL queries. For client side binding, the parameters argument should be a dictionary or a sequence. Client side binding uses Python "printf" style string formatting for parameter substitution.
Note that, unlike server side binding, client side binding does not work for database identifiers such as database, table, or column names, since Python style formatting cannot distinguish between the different types of strings, and they need to be formatted differently (backticks or double quotes for database identifiers, single quotes for data values).
- Example with Python dictionary, DateTime value, and string escaping
import datetime
my_date = datetime.datetime(2022, 10, 1, 15, 20, 5)
parameters = {'v1': my_date, 'v2': "a string with a single quote'"}
client.query('SELECT * FROM some_table WHERE date >= %(v1)s AND string ILIKE %(v2)s', parameters=parameters)
# Generates the following query:
# SELECT * FROM some_table WHERE date >= '2022-10-01 15:20:05' AND string ILIKE 'a string with a single quote\''
- Example with Python Sequence (Tuple), Float64, and IPv4Address
import ipaddress
parameters = (35200.44, ipaddress.IPv4Address(0x443d04fe))
client.query('SELECT * FROM some_table WHERE metric >= %s AND ip_address = %s', parameters=parameters)
# Generates the following query:
# SELECT * FROM some_table WHERE metric >= 35200.44 AND ip_address = '68.61.4.254'
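To make the substitution step concrete, here is a simplified illustration of printf-style binding in plain Python. It is not the library's implementation: quote_value and bind_client_side are hypothetical names, and only a few value types are handled. It does show why every value is rendered as a SQL literal before it is substituted:

```python
import datetime
import ipaddress


def quote_value(value):
    """Render a Python value as a SQL literal (simplified illustration)."""
    if value is None:
        return 'NULL'
    if isinstance(value, str):
        # Escape backslashes first, then single quotes.
        escaped = value.replace('\\', '\\\\').replace("'", "\\'")
        return "'" + escaped + "'"
    if isinstance(value, datetime.datetime):
        return "'" + value.strftime('%Y-%m-%d %H:%M:%S') + "'"
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return "'" + str(value) + "'"
    return str(value)


def bind_client_side(query, parameters):
    """Substitute parameters into the query using printf-style formatting."""
    if isinstance(parameters, dict):
        return query % {key: quote_value(val) for key, val in parameters.items()}
    return query % tuple(quote_value(val) for val in parameters)


final_query = bind_client_side(
    'SELECT * FROM some_table WHERE metric >= %s AND ip_address = %s',
    (35200.44, ipaddress.IPv4Address(0x443d04fe)))
```

Because the values are always quoted as data, this style cannot produce a correctly quoted table or column name, which is the limitation noted above.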
Settings Argument
All key ClickHouse Connect Client methods that execute SQL accept an optional settings keyword argument, used to pass ClickHouse server user settings for the included SQL statement. The settings argument should be a dictionary. Each item should be a ClickHouse setting name and its associated value. Note that values will be converted to strings when sent to the server as query parameters.
As with client level settings, ClickHouse Connect will drop any settings that the server marks as readonly=1, with an associated log message. Settings that apply only to queries via the ClickHouse HTTP interface are always valid. Those settings are described under the get_client API.
Example of using ClickHouse settings:
settings = {'merge_tree_min_rows_for_concurrent_read': 65535,
            'session_id': 'session_1234',
            'use_skip_indexes': False}
client.query("SELECT event_type, sum(timeout) FROM event_errors WHERE event_time > '2022-08-01'", settings=settings)
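Per-query settings can be layered on top of a shared dictionary without modifying it. The following sketch assumes a client created with get_client; query_with_time_limit is a hypothetical helper, and max_execution_time is used as an example of a ClickHouse server setting:

```python
def query_with_time_limit(client, sql, max_seconds, base_settings=None):
    """Run a query with a per-query execution time limit."""
    # Copy so the caller's dictionary is left untouched.
    settings = dict(base_settings or {})
    settings['max_execution_time'] = max_seconds
    return client.query(sql, settings=settings)


# Usage (requires a reachable server):
#   shared = {'use_skip_indexes': False}
#   result = query_with_time_limit(client, 'SELECT count() FROM event_errors', 30, shared)
```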
Client command Method
Use the Client.command method to send SQL queries to the ClickHouse Server that normally return no data, or that return a single simple value rather than a full dataset. This method takes the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
cmd | str | Required | A ClickHouse SQL statement that returns a single value or a single row of values. |
parameters | dict or iterable | None | See parameters description. |
data | str or bytes | None | Optional data to include with the command as the POST body. |
settings | dict | None | See settings description. |
use_database | bool | True | Use the client database (specified when creating the client). False means the command will use the default ClickHouse Server database for the connected user. |
- command can be used for DDL statements
client.command('CREATE TABLE test_command (col_1 String, col_2 DateTime) Engine MergeTree ORDER BY tuple()')
client.command('SHOW CREATE TABLE test_command')
Out[6]: 'CREATE TABLE default.test_command\\n(\\n `col_1` String,\\n `col_2` DateTime\\n)\\nENGINE = MergeTree\\nORDER BY tuple()\\nSETTINGS index_granularity = 8192'
- command can also be used for simple queries that return only a single row
result = client.command('SELECT count() FROM system.tables')
result
Out[7]: 110
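A single-value command combines naturally with server side binding. The sketch below assumes a client created with get_client; count_rows is a hypothetical helper name:

```python
def count_rows(client, table):
    """Return the number of rows in `table` as an int."""
    result = client.command('SELECT count() FROM {table:Identifier}',
                            parameters={'table': table})
    # Convert defensively in case the value arrives as a string.
    return int(result)


# Usage (requires a reachable server):
#   count_rows(client, 'test_command')
```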
Client query Method
The Client.query method is the primary way to retrieve a single "batch" dataset from the ClickHouse Server. It uses the Native ClickHouse format over HTTP to transmit large datasets (up to approximately one million rows) efficiently. This method takes the following parameters.
Parameter | Type | Default | Description |
---|---|---|---|
query | str | Required | The ClickHouse SQL SELECT or DESCRIBE query. |
parameters | dict or iterable | None | See parameters description. |
settings | dict | None | See settings description. |
query_formats | dict | None | Datatype formatting specification for result values. See Advanced Usage (Read Formats) |
column_formats | dict | None | Datatype formatting per column. See Advanced Usage (Read Formats) |
encoding | str | None | Encoding used to decode ClickHouse String columns into Python strings. Python defaults to UTF-8 if not set. |
use_none | bool | True | Use the Python None type for ClickHouse nulls. If False, use a datatype default (such as 0) for ClickHouse nulls. Note: defaults to False for numpy/Pandas for performance reasons. |
column_oriented | bool | False | Return the results as a sequence of columns rather than a sequence of rows. Helpful for transforming Python data to other column oriented data formats. |
query_tz | str | None | A timezone name from the zoneinfo database. This timezone will be applied to all datetime or Pandas Timestamp objects returned by the query. |
column_tzs | dict | None | A dictionary of column name to timezone name. Like query_tz, but allows specifying different timezones for different columns. |
use_na_values | bool | True | Use Pandas missing types such as pandas.NA and pandas.NaT for ClickHouse NULL values. Only relevant to the query_df and query_df_stream methods. |
context | QueryContext | None | A reusable QueryContext object can be used to encapsulate the above method arguments. See Advanced Queries (QueryContexts) |
The QueryResult Object
The base query method returns a QueryResult object with the following public properties:
- result_rows -- A matrix of the returned data in the form of a Sequence of rows, where each row element is a sequence of column values.
- result_columns -- A matrix of the returned data in the form of a Sequence of columns, where each column element is a sequence of the row values for that column
- column_names -- A tuple of strings representing the column names in the result_set
- column_types -- A tuple of ClickHouseType instances representing the ClickHouse data type for each column in the result_columns
- query_id -- The ClickHouse query_id (useful for examining the query in the system.query_log table)
- summary -- Any data returned by the X-ClickHouse-Summary HTTP response header
- first_item -- A convenience property for retrieving the first row of the response as a dictionary (keys are column names)
- first_row -- A convenience property to return the first row of the result
- column_block_stream -- A generator of query results in column oriented format. This property should not be referenced directly (see below).
- row_block_stream -- A generator of query results in row oriented format. This property should not be referenced directly (see below).
- rows_stream -- A generator of query results that yields a single row per invocation. This property should not be referenced directly (see below).
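As a small sketch of working with these properties, the hypothetical helper below (rows_as_dicts is an illustrative name) pairs column_names with each entry of result_rows, extending the idea behind the first-row-as-dictionary convenience property to every row:

```python
def rows_as_dicts(client, sql):
    """Return every result row as a dictionary keyed by column name."""
    result = client.query(sql)
    return [dict(zip(result.column_names, row)) for row in result.result_rows]


# Usage (requires a reachable server):
#   for item in rows_as_dicts(client, 'SELECT name, engine FROM system.tables LIMIT 5'):
#       print(item['name'], item['engine'])
```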
The *_stream properties return a Python context that can be used as an iterator for the returned data. They should only be accessed indirectly using the Client *_stream methods. In a future release, the QueryResult object returned by the main Client query method will have consumed the stream and will contain the entire populated result_set, to provide a clean separation between completed "batch" results retrieved via the Client query method and streaming results retrieved via the Client query_*_stream methods.
The complete details of streaming query results (using StreamContext objects) are described in Advanced Queries (Streaming Queries).
Note: The streaming behavior from v0.5.0-v0.5.3 that uses the QueryResult object as a Python context is deprecated as of v0.5.4 and will be removed in a future release. The QueryResult methods stream_column_blocks, stream_row_blocks, and stream_rows should not be used and are included only for backward compatibility.
Specialized Client Query Methods
There are three specialized versions of the main query method:
- query_np -- This version returns a Numpy Array instead of a ClickHouse Connect QueryResult.
- query_df -- This version returns a Pandas Dataframe instead of a ClickHouse Connect QueryResult.
- query_arrow -- This version returns a PyArrow Table. It uses the ClickHouse Arrow format directly, so it accepts only three arguments in common with the main query method: query, parameters, and settings. In addition, there is an additional argument, use_strings, which determines whether the Arrow Table will render ClickHouse String types as strings (if True) or bytes (if False).
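The reduced argument set of query_arrow can be seen in a short wrapper. This is a sketch only, assuming a client created with get_client and PyArrow installed; fetch_arrow is a hypothetical name:

```python
def fetch_arrow(client, sql, parameters=None, settings=None):
    """Fetch a query result as a PyArrow Table with String columns as strings."""
    # Only query, parameters, and settings are shared with Client.query;
    # use_strings is specific to query_arrow.
    return client.query_arrow(sql, parameters=parameters, settings=settings,
                              use_strings=True)


# Usage (requires a reachable server and PyArrow):
#   table = fetch_arrow(client, 'SELECT name FROM system.tables')
```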
Client Streaming Query Methods
The ClickHouse Connect Client provides multiple methods for retrieving data as a stream (implemented as a Python generator):
- query_column_block_stream -- Returns query data in blocks as a sequence of columns using native Python objects
- query_row_block_stream -- Returns query data as a block of rows using native Python objects
- query_rows_stream -- Returns query data as a sequence of rows using native Python objects
- query_np_stream -- Returns each ClickHouse block of query data as a Numpy array
- query_df_stream -- Returns each ClickHouse block of query data as a Pandas Dataframe
Each of these methods returns a StreamContext object that must be opened via a with statement to start consuming the stream. See Advanced Queries (Streaming Queries) for details and examples.
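A minimal sketch of consuming a row stream, assuming a client created with get_client. sum_first_column is a hypothetical helper; it relies only on the stream being opened with a with statement and iterated row by row, as described above:

```python
def sum_first_column(client, sql):
    """Add up the first column of a result without loading all rows at once."""
    total = 0
    # The stream must be opened with a `with` statement before iterating.
    with client.query_rows_stream(sql) as stream:
        for row in stream:
            total += row[0]
    return total


# Usage (requires a reachable server):
#   sum_first_column(client, 'SELECT number FROM system.numbers LIMIT 1000')
```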
Client insert Method
For the common use case of inserting multiple records into ClickHouse, there is the Client.insert method. It takes the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
table | str | Required | The ClickHouse table to insert into. The full table name (including database) is permitted. |
data | Sequence of Sequences | Required | The matrix of data to insert, either a Sequence of rows, each of which is a sequence of column values, or a Sequence of columns, each of which is a sequence of row values. |
column_names | Sequence of str, or str | '*' | A list of column_names for the data matrix. If '*' is used, ClickHouse Connect will execute a "pre-query" to retrieve all of the column names for the table. |
database | str | '' | The target database of the insert. If not specified, the database for the client will be assumed. |
column_types | Sequence of ClickHouseType | None | A list of ClickHouseType instances. If neither column_types nor column_type_names is specified, ClickHouse Connect will execute a "pre-query" to retrieve all the column types for the table. |
column_type_names | Sequence of ClickHouse type names | None | A list of ClickHouse datatype names. If neither column_types nor column_type_names is specified, ClickHouse Connect will execute a "pre-query" to retrieve all the column types for the table. |
column_oriented | bool | False | If True, the data argument is assumed to be a Sequence of columns (and no "pivot" will be necessary to insert the data). Otherwise data is interpreted as a Sequence of rows. |
settings | dict | None | See settings description. |
insert_context | InsertContext | None | A reusable InsertContext object can be used to encapsulate the above method arguments. See Advanced Inserts (InsertContexts) |
This method does not return a value. An exception will be thrown if the insert fails for any reason.
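The difference between row oriented and column oriented data can be illustrated with a plain Python pivot. The sketch assumes a client created with get_client; insert_rows_as_columns is a hypothetical helper, and the table and column names in the usage comment are placeholders:

```python
def insert_rows_as_columns(client, table, rows, column_names):
    """Pivot row data into columns and insert it with column_oriented=True."""
    # zip(*rows) turns [(a1, b1), (a2, b2)] into [(a1, a2), (b1, b2)].
    columns = [list(column) for column in zip(*rows)]
    client.insert(table, columns, column_names=column_names, column_oriented=True)
    return columns


# Usage (requires a reachable server and an existing table):
#   rows = [('key1', 100), ('key2', 200)]
#   insert_rows_as_columns(client, 'example_table', rows, ['key', 'value'])
```

Passing the rows directly with the default column_oriented=False would insert the same data; the pivot is only worthwhile when the data is already held by column.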
There are two specialized versions of the main insert method:
- insert_df -- Instead of a Python Sequence of Sequences data argument, the second parameter of this method requires a df argument that must be a Pandas Dataframe instance. ClickHouse Connect automatically processes the Dataframe as a column oriented datasource, so the column_oriented parameter is not required or available.
- insert_arrow -- Instead of a Python Sequence of Sequences data argument, this method requires an arrow_table. ClickHouse Connect passes the Arrow table unmodified to the ClickHouse server for processing, so only the database and settings arguments are available in addition to table and arrow_table.
Note: A Numpy array is a valid Sequence of Sequences and can be used as the data argument to the main insert method, so a specialized method is not required.
File Inserts
The clickhouse_connect.driver.tools module includes the insert_file method, which allows inserting data directly from the file system into an existing ClickHouse table. Parsing is delegated to the ClickHouse server. insert_file accepts the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
client | Client | Required | The driver.Client used to perform the insert |
table | str | Required | The ClickHouse table to insert into. The full table name (including database) is permitted. |
file_path | str | Required | The native file system path to the data file |
fmt | str | CSV, CSVWithNames | The ClickHouse Input Format of the file. CSVWithNames is assumed if column_names is not provided |
column_names | Sequence of str | None | A list of column_names in the data file. Not required for formats that include column names |
database | str | None | Database of the table. Ignored if the table is fully qualified. If not specified, the insert will use the client database |
settings | dict | None | See settings description. |
For files with inconsistent data or date/time values in an unusual format, settings that apply to data imports (such as input_format_allow_errors_num and input_format_allow_errors_ratio) are recognized by this method.
import clickhouse_connect
from clickhouse_connect.driver.tools import insert_file
client = clickhouse_connect.get_client()
insert_file(client, 'example_table', 'my_data.csv',
            settings={'input_format_allow_errors_ratio': .2,
                      'input_format_allow_errors_num': 5})
Raw API
For use cases that do not require transformation between ClickHouse data and native or third-party data types and structures, the ClickHouse Connect client provides two methods for direct use of the ClickHouse connection.
Client raw_query Method
The Client.raw_query method allows direct usage of the ClickHouse HTTP query interface using the client connection. The return value is an unprocessed bytes object. It offers a convenient wrapper with parameter binding, error handling, retries, and settings management using a minimal interface:
Parameter | Type | Default | Description |
---|---|---|---|
query | str | Required | Any valid ClickHouse query |
parameters | dict or iterable | None | See parameters description. |
settings | dict | None | See settings description. |
fmt | str | None | ClickHouse Output Format for the resulting bytes. (ClickHouse uses TSV if not specified) |
use_database | bool | True | Use the database assigned to the clickhouse-connect Client for the query context |
It is the caller's responsibility to handle the resulting bytes object. Note that Client.query_arrow is just a thin wrapper around this method using the ClickHouse Arrow output format.
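As an illustration of handling the raw bytes, the sketch below requests CSV output and decodes it. It assumes a client created with get_client and UTF-8 output; export_csv is a hypothetical helper name:

```python
def export_csv(client, sql):
    """Return the query result as CSV text with a header row."""
    # fmt selects the ClickHouse output format of the returned bytes.
    payload = client.raw_query(sql, fmt='CSVWithNames')
    return payload.decode('utf-8')


# Usage (requires a reachable server):
#   text = export_csv(client, 'SELECT name, engine FROM system.tables LIMIT 3')
```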
Client raw_insert Method
The Client.raw_insert method allows direct inserts of bytes objects or bytes object generators using the client connection. Because it does no processing of the insert payload, it is highly performant. The method provides options to specify settings and the insert format:
Parameter | Type | Default | Description |
---|---|---|---|
table | str | Required | Either the simple or the database qualified table name |
column_names | Sequence[str] | None | Column names for the insert block. Required if the fmt parameter does not include names |
insert_block | str, bytes, Generator[bytes], BinaryIO | Required | Data to insert. Strings will be encoded with the client encoding. |
settings | dict | None | See settings description. |
fmt | str | None | ClickHouse Input Format of the insert_block bytes. (ClickHouse uses TSV if not specified) |
It is the caller's responsibility to ensure that the insert_block is in the specified format. ClickHouse Connect uses these raw inserts for file uploads and PyArrow Tables, delegating parsing to the ClickHouse server.
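A minimal sketch of a raw insert, assuming a client created with get_client and CSV lines that are already correctly formatted. insert_csv_lines is a hypothetical helper, and the table and column names in the usage comment are placeholders:

```python
def insert_csv_lines(client, table, column_names, lines):
    """Send pre-formatted CSV lines to the server without client-side parsing."""
    payload = ('\n'.join(lines) + '\n').encode('utf-8')
    # Plain CSV carries no header, so the column names are passed explicitly.
    client.raw_insert(table, column_names=column_names,
                      insert_block=payload, fmt='CSV')
    return len(payload)


# Usage (requires a reachable server and an existing table):
#   insert_csv_lines(client, 'example_table', ['key', 'value'], ['"key1",100', '"key2",200'])
```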