Establishing a Connection
Specify the following to establish a connection with Apache Hive:
- Server: Set this to the host name or IP address of the server hosting HiveServer2.
- Port: Set this to the port for the connection to the HiveServer2 instance.
Authenticating to Apache Hive
To authenticate to Apache Hive, set the following:
- TransportMode: The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP. BINARY is selected by default.
- AuthScheme: The authentication scheme used. Accepted entries are PLAIN, LDAP, NOSASL, and KERBEROS. PLAIN is selected by default.
Securing Apache Hive Connections
To enable TLS/SSL in the provider, set UseSSL to True.
Using Kerberos
This section shows how to use the provider to authenticate to Apache Hive using Kerberos.
Authenticating with Kerberos
To authenticate to Apache Hive using Kerberos, set the following properties:
- AuthScheme: Set this to KERBEROS
- KerberosKDC: Set this to the host name or IP Address of your Kerberos KDC machine.
- KerberosSPN: Set this to the service and host of the Hive Kerberos Principal. This will be the value prior to the '@' symbol (for instance, hive/_HOST) of the hive.metastore.kerberos.principal value (for instance, hive/_HOST@EXAMPLE.COM) of the hive-site.xml file. If '_HOST' is specified, the driver will attempt to identify the host using a reverse DNS lookup. If a reverse DNS lookup fails, it may be required to explicitly specify the host.
Retrieve the Kerberos Ticket
You can use one of the following three options to retrieve the required Kerberos ticket.
MIT Kerberos Credential Cache File
This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. Note that you won't need to set the User or Password connection properties with this option.
- Ensure that you have an environment variable created called KRB5CCNAME.
- Set the KRB5CCNAME environment variable to a path pointing to your credential cache file (for instance, C:\krb_cache\krb5cc_0 or /tmp/krb5cc_0). This file will be created when generating your ticket with MIT Kerberos Ticket Manager.
- To obtain a ticket, open the MIT Kerberos Ticket Manager application, click Get Ticket, enter your principal name and password, then click OK. If successful, ticket information will appear in Kerberos Ticket Manager and will now be stored in the credential cache file.
- Now that the credential cache file has been created, the provider will use the cache file to obtain the kerberos ticket to connect to Apache Hive.
As an alternative to setting the KRB5CCNAME environment variable, you can directly set the file path using the KerberosTicketCache property. When set, the provider will use the specified cache file to obtain the kerberos ticket to connect to Apache Hive.
Keytab File
If the KRB5CCNAME environment variable has not been set, you can retrieve a Kerberos ticket using a Keytab File. To do this, set the User property to the desired username and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
User and Password
If both the KRB5CCNAME environment variable and the KerberosKeytabFile property have not been set, you can retrieve a ticket using a User and Password combination. To to do this, set the User and Password properties to the user/password combo that you use to authenticate with Apache Hive.
Cross-Realm Authentication
More complex Kerberos environments may require cross-realm authentication where multiple realms and KDC servers are used (e.g. where one realm/KDC is used for user authentication and another realm/KDC used for obtaining the service ticket).
In such an environment, the KerberosRealm and KerberosKDC properties can be set to the values required for user authentication. The KerberosServiceRealm and KerberosServiceKDC properties can be set to the values required to obtain the service ticket.
Configuring the SQL Dialect
The provider enables standards-based access to Apache Hive through SQL-92 or the Apache Hive syntax. Set QueryPassthrough to true to bypass the SQL engine of the provider and execute HiveQL queries directly to Apache Hive.
Connecting via HTTP
To connect to HTTP, set TransportMode to 'HTTP' and set the Port and HTTPPath properties to the values required by your Apache Hive instance.
Connecting Through a Firewall or Proxy
Connecting through a proxy is available when connecting to Hive over HTTP and thus the provider must first be configured to connect via HTTP.
To connect through the Windows system proxy, you do not need to set any additional connection properties aside from configuring the connection to Apache Hive to be made over HTTP. To connect to other proxies, set ProxyAutoDetect to false and in addition set the following.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
To connect to other proxies, set FirewallType, FirewallServer, and FirewallPort. To tunnel the connection, set FirewallType to TUNNEL. To authenticate to a SOCKS proxy, set FirewallType to SOCKS5. Additionally, specify FirewallUser and FirewallPassword.
Troubleshooting the Connection
To show provider activity from query execution to network traffic, use Logfile and Verbosity. The examples of common connection errors below show how to use these properties to get more context. Contact the support team for help tracing the source of an error or circumventing a performance issue.
- Connection errors: Ensure that you have properly set TransportMode, Port, and if necessary HTTPPath to the values configured in the Apache Hive configuration file (hive-site.xml).
- Authentication errors: Ensure that you have correctly set AuthScheme along with any corresponding properties, to the values configured in the Apache Hive configuration file (hive-site.xml).
- Queries time out: A server that takes too long to respond will exceed the provider's client-side timeout. Often, setting the Timeout property to a higher value will avoid a connection error. Another option is to disable the timeout by setting the property to 0. Setting Verbosity to 2 will show where the time is being spent.
Configuring the HiveQL Statements
Apache Hive supports multiple ways to perform similar operations. The options below allow you to configure which HiveQL statement is issued to perform an operation.
- UseShowDatabasesQuery: When this property is set to true, the provider will issue a "SHOW DATABASES" query (instead of using the Thrift API) to retrieve the available schemas.
- UseShowTablesQuery: When this property is set to true, the provider will issue a "SHOW TABLES" query (instead of using the Thrift API) to retrieve the available tables.
- UseInsertSelectSyntax: This option is specified via the Other property. When set to true, the provider will perform INSERT queries using the INSERT INTO SELECT syntax: INSERT INTO TABLE [table] SELECT T.* FROM (....) AS T. If set to false (default), the INSERT INTO VALUES syntax will be used: INSERT INTO TABLE [table] VALUES (....). The provider will automatically determine which syntax is supported by your Hive server but setting this option to true will force the INSERT INTO SELECT syntax to be used.