Connecting to SparkSQL
Specify the following to establish a connection with SparkSQL:
- Server: Set this to the host name or IP address of the server hosting SparkSQL.
- Port: Set this to the port for the connection to the SparkSQL instance.
Authenticating to SparkSQL
To authenticate to SparkSQL, set the following:
- TransportMode: The transport mode to use to communicate with the SparkSQL server. Accepted entries are BINARY and HTTP. BINARY is selected by default.
- AuthScheme: The authentication scheme used. Accepted entries are PLAIN, LDAP, NOSASL, and KERBEROS. PLAIN is selected by default.
Securing SparkSQL Connections
To enable TLS/SSL in the provider, set UseSSL to True.
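As a rough illustration, the connection, authentication, and TLS/SSL properties above could be combined into one connection string. The semicolon-separated Key=Value layout, the host name, and the port number are assumptions made for this sketch; check how your edition of the provider expects properties to be passed.

```python
# Hypothetical values; replace them with your own server details.
props = {
    "Server": "spark.example.com",  # assumed host name
    "Port": 10000,                  # assumed port
    "TransportMode": "BINARY",      # the default named in the text
    "AuthScheme": "PLAIN",          # the default named in the text
    "UseSSL": True,                 # enables TLS/SSL
}


def build_connection_string(properties):
    """Join properties as Key=Value pairs separated by semicolons (assumed format)."""
    return ";".join(f"{key}={value}" for key, value in properties.items())


connection_string = build_connection_string(props)
print(connection_string)
```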
Connecting to Databricks
To connect to a Databricks cluster, set the properties as described below. Note: You can find the required values in your Databricks instance by navigating to 'Clusters', selecting the desired cluster, and opening the JDBC/ODBC tab under 'Advanced Options'.
- Server: Set to the Server Hostname of your Databricks cluster.
- Port: Set to 443.
- TransportMode: Set to HTTP.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- UseSSL: Set to True.
- AuthScheme: Set to PLAIN.
- User: Set to 'token'.
- Password: Set to your personal access token. You can obtain a token by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab.
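The Databricks settings above might be assembled as follows. This is only a sketch: the Key=Value;... layout is an assumed format, the angle-bracket values are placeholders that you must replace, and the DATABRICKS_TOKEN environment variable is a hypothetical name used here to keep the token out of source code.

```python
import os

databricks_props = {
    "Server": "<server-hostname>",  # placeholder: Server Hostname from the JDBC/ODBC tab
    "Port": 443,
    "TransportMode": "HTTP",
    "HTTPPath": "<http-path>",      # placeholder: HTTP Path from the JDBC/ODBC tab
    "UseSSL": True,
    "AuthScheme": "PLAIN",
    "User": "token",                # the literal string 'token'
    # Hypothetical variable name; falls back to a placeholder if it is unset.
    "Password": os.environ.get("DATABRICKS_TOKEN", "<personal-access-token>"),
}

# Assumed format: semicolon-separated Key=Value pairs.
connection_string = ";".join(f"{key}={value}" for key, value in databricks_props.items())
```

Reading the token from the environment rather than hard-coding it is a design choice of this sketch, not a requirement of the provider.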
Using Kerberos
This section shows how to use the provider to authenticate to SparkSQL using Kerberos.
Authenticating with Kerberos
To authenticate to SparkSQL using Kerberos, set the following properties:
- AuthScheme: Set this to KERBEROS
- KerberosKDC: Set this to the host name or IP address of your Kerberos KDC machine.
- KerberosRealm: Set this to the realm of the Hive Kerberos principal. This is the part after the '@' symbol in the hive.metastore.kerberos.principal value of the hive-site.xml file. For instance, if the value is hive/_HOST@EXAMPLE.COM, the realm is EXAMPLE.COM.
- KerberosSPN: Set this to the service and host of the Hive Kerberos principal. This is the part before the '@' symbol in the hive.metastore.kerberos.principal value of the hive-site.xml file. For instance, if the value is hive/_HOST@EXAMPLE.COM, the SPN is hive/_HOST. If '_HOST' is specified, the driver attempts to identify the host using a reverse DNS lookup. If the reverse DNS lookup fails, you may need to specify the host explicitly.
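The split of the principal into KerberosSPN and KerberosRealm can be sketched in a few lines. The function name split_principal and the KDC host name are hypothetical; only the property names and the example principal come from the text above.

```python
def split_principal(principal):
    """Split 'service/host@REALM' into (spn, realm) at the last '@'."""
    spn, separator, realm = principal.rpartition("@")
    if not separator or not spn or not realm:
        raise ValueError(f"not a full Kerberos principal: {principal!r}")
    return spn, realm


# Example value of hive.metastore.kerberos.principal from hive-site.xml.
spn, realm = split_principal("hive/_HOST@EXAMPLE.COM")

kerberos_props = {
    "AuthScheme": "KERBEROS",
    "KerberosKDC": "kdc.example.com",  # hypothetical KDC host
    "KerberosRealm": realm,            # EXAMPLE.COM
    "KerberosSPN": spn,                # hive/_HOST
}
```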
Retrieve the Kerberos Ticket
You can use one of the following three options to retrieve the required Kerberos ticket.
MIT Kerberos Credential Cache File
This option enables you to use the MIT Kerberos Ticket Manager to get tickets. Note that you won't need to set the User or Password connection properties with this option.
- Ensure that an environment variable named KRB5CCNAME exists.
- Set the KRB5CCNAME environment variable to the path of your credential cache file (for instance, C:\krb_cache\krb5cc_0). MIT Kerberos Ticket Manager creates this file when it generates your ticket.
- To obtain a ticket, open the MIT Kerberos Ticket Manager application, click Get Ticket, enter your principal name and password, then click OK. If the request succeeds, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.
- Once the credential cache file exists, the provider uses it to obtain the Kerberos ticket for connecting to SparkSQL.
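Before connecting, it can help to confirm that KRB5CCNAME is set and points at an existing file. The check below is a hypothetical helper written for this sketch and is not part of the provider.

```python
import os


def credential_cache_path(environ=os.environ):
    """Return the KRB5CCNAME path if it is set and the file exists, else None."""
    path = environ.get("KRB5CCNAME")
    if path and os.path.isfile(path):
        return path
    return None


cache = credential_cache_path()
if cache is None:
    print("KRB5CCNAME is unset or does not point to an existing file")
else:
    print(f"Using credential cache: {cache}")
```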
Keytab File
If the KRB5CCNAME environment variable has not been set, you can retrieve a Kerberos ticket using a keytab file. To do this, set the User property to the desired username and set the KerberosKeytabFile property to the path of the keytab file associated with that user.
User and Password
If neither the KRB5CCNAME environment variable nor the KerberosKeytabFile property has been set, you can retrieve a ticket using a user and password combination. To do this, set the User and Password properties to the credentials that you use to authenticate with SparkSQL.
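The order in which the three options are described above can be summarized in a small function. This mirrors only the documented order; the provider's internal logic is not shown in this guide, and the function name ticket_source is hypothetical.

```python
def ticket_source(environ, props):
    """Name the ticket option implied by the settings, in the documented order."""
    if environ.get("KRB5CCNAME"):
        return "credential cache"
    if props.get("User") and props.get("KerberosKeytabFile"):
        return "keytab"
    if props.get("User") and props.get("Password"):
        return "user and password"
    return None  # none of the three options is configured


# Hypothetical settings: no cache variable, but a user and a keytab path.
source = ticket_source({}, {"User": "analyst", "KerberosKeytabFile": "/etc/security/analyst.keytab"})
print(source)
```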
Customizing the SSL Configuration
By default, the provider attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify another certificate, set the SSLServerCert property; see that property's documentation for the accepted formats.
Connecting Through a Firewall or Proxy
HTTP Proxies
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword in addition to ProxyServer and ProxyPort.
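A sketch of the HTTP proxy settings described above, with authentication as an optional extra. The helper name, host, port, and credentials are hypothetical, and BASIC is an assumed ProxyAuthScheme value; consult the property's documentation for the schemes that are actually accepted.

```python
def proxy_properties(server, port, auth_scheme=None, user=None, password=None):
    """Build properties for an explicitly configured HTTP proxy."""
    props = {
        "ProxyAutoDetect": False,  # do not rely on the system proxy
        "ProxyServer": server,
        "ProxyPort": port,
    }
    if auth_scheme is not None:
        # Authentication settings are added on top of the server and port.
        props["ProxyAuthScheme"] = auth_scheme
        props["ProxyUser"] = user
        props["ProxyPassword"] = password
    return props


anonymous = proxy_properties("proxy.example.com", 8080)
authenticated = proxy_properties("proxy.example.com", 8080, "BASIC", "proxy_user", "proxy_password")
```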
Other Proxies
Set the following properties:
- To use a proxy-based firewall, set FirewallType, FirewallServer, and FirewallPort.
- To tunnel the connection, set FirewallType to TUNNEL.
- To authenticate, specify FirewallUser and FirewallPassword.
- To authenticate to a SOCKS proxy, additionally set FirewallType to SOCKS5.
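For example, an authenticated SOCKS proxy might be configured as below. The host, port, and credentials are hypothetical, and the Key=Value;... layout is an assumed format.

```python
firewall_props = {
    "FirewallType": "SOCKS5",
    "FirewallServer": "socks.example.com",  # hypothetical host
    "FirewallPort": 1080,                   # hypothetical port
    "FirewallUser": "proxy_user",           # hypothetical credentials
    "FirewallPassword": "proxy_password",
}

# Assumed format: semicolon-separated Key=Value pairs.
connection_fragment = ";".join(f"{key}={value}" for key, value in firewall_props.items())
print(connection_fragment)
```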
Troubleshooting the Connection
To show provider activity from query execution to network traffic, use Logfile and Verbosity. The examples of common connection errors below show how to use these properties to get more context. Contact the support team for help tracing the source of an error or circumventing a performance issue.
- Authentication errors: Typically, recording a Logfile at Verbosity 4 is necessary to get full details on an authentication error.
- Queries time out: A server that takes too long to respond will exceed the provider's client-side timeout. Often, setting the Timeout property to a higher value will avoid a connection error. Another option is to disable the timeout by setting the property to 0. Setting Verbosity to 2 will show where the time is being spent.
- The certificate presented by the server cannot be validated: This error indicates that the provider cannot validate the server's certificate through the chain of trust. If you are using a self-signed certificate, there is only one certificate in the chain.
To resolve this error, first verify for yourself that the certificate can be trusted, then tell the provider that you trust it. You can do this either by adding the certificate to the trusted system store or by setting SSLServerCert.
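The logging and timeout suggestions above can be captured as a small lookup. The function name, the log file path, and the Timeout value of 120 are hypothetical; this guide does not state the unit of Timeout, so check the property's documentation before choosing a value.

```python
def troubleshooting_properties(symptom, logfile="provider.log"):
    """Suggest extra connection properties for a symptom described in the text."""
    if symptom == "authentication":
        # Verbosity 4 is typically needed for full authentication details.
        return {"Logfile": logfile, "Verbosity": 4}
    if symptom == "timeout":
        # Verbosity 2 shows where time is spent; Timeout=0 would disable the timeout.
        return {"Logfile": logfile, "Verbosity": 2, "Timeout": 120}
    raise ValueError(f"no suggestion for symptom: {symptom!r}")


print(troubleshooting_properties("authentication"))
print(troubleshooting_properties("timeout"))
```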