Establishing a Connection
To connect with the provider's data modeling defaults, set the URI property to the JSON resource location and authenticate to the data source. The following sections provide connection and authentication guides for the available data sources. See Connecting to JSON Data Sources to control the data representation.
Connecting to JSON Data Sources
Below is an overview of the authentication properties corresponding to each data source:
Service provider | URI formats | InitiateOAuth | OAuthClientId | OAuthClientSecret | OAuthAccessToken | OAuthAccessTokenSecret | User | Password | AuthScheme | AzureAccount | AzureAccessKey | AWSAccessKey | AWSSecretKey | Region |
Local | localPath
file://localPath |
Connecting to Local Files
Set the URI to a JSON file.
Below is an example connection string:
URI=C:\folder1\file.json; |
Connecting to HTTP JSON Streams
Set the URI to the HTTP or HTTPS URL of the JSON resource you want to access as a table. Set AuthScheme to use the following authentication types. The provider also supports OAuth authentication; see Using OAuth for a guide through the process.
-
HTTP
To use HTTP Basic or Digest, set the User and Password. Set CustomHeaders if you need access to the request headers. Set CustomUrlParams to modify the URL query string.
-
Windows (NTLM)
Set the Windows User and Password to connect and set AuthScheme to "NTLM".
-
Kerberos and Kerberos Delegation
To authenticate with Kerberos, set the User and Password and set AuthScheme to NEGOTIATE. To use Kerberos Delegation, set AuthScheme to KERBEROSDELEGATION.
For example:
URI=http://www.host1.com/streamname1;AuthScheme=BASIC;User=admin;Password=admin |
Connecting to Amazon S3
Set the URI to a JSON document in a bucket. Additionally, set the following properties to authenticate:
- AWSAccessKey: Set this to an Amazon Web Services Access Key (a username).
- AWSSecretKey: Set this to an Amazon Web Services Secret Key.
For example:
URI=s3://bucket1/folder1/file.json; AWSAccessKey=token1; AWSSecretKey=secret1; Region=OHIO; |
Optionally, specify Region.
Connect to Azure Blob Storage
Set the URI to the name of your container and the name of the blob. Additionally, set the following properties to authenticate:
- AzureAccount: Set this to the account associated with the Azure blob.
- AzureAccessKey: Set the to the access key associated with the Azure blob.
For example:
URI=azureblob://mycontainer/myblob; AzureAccount=myAccount; AzureAccessKey=myKey; |
Connecting to Google Drive
Set the URI to the path to a JSON file. To authenticate to Google APIs, use the OAuth authentication standard. See Connecting to Google Drive for an authentication guide. You can connect with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as shown in the connection string below:
URI=gdrive://folder1/file.json;InitiateOAuth=GETANDREFRESH; |
Connecting to Box
Set the URI to the path to a JSON file. To authenticate to Box, use the OAuth authentication standard. See Connecting to Box for an authentication guide. You can authenticate with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as shown in the connection string below:
URI=box://folder1/file.json; InitiateOAuth=GETANDREFRESH; |
Connecting to Dropbox
Set the URI to the path to a JSON file. To authenticate to Dropbox, use the OAuth authentication standard. See Connecting to Dropbox for an authentication guide. You can authenticate with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as shown in the connection string below:
URI=dropbox://folder1/file.json; InitiateOAuth=GETANDREFRESH; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthcliensecret1; CallbackUrl=http://localhost:12345; |
Connecting to FTP
Set the URI to the address of the server followed by the path to a JSON file. To authenticate, set User and Password.
URI=ftps://localhost:990/folder1/file.json; User=user1; Password=password1; |
Securing JSON Connection
By default, the provider attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify another certificate, see the SSLServerCert property for the available formats to do so.
Configuring Automatic Schema Discovery
By default, the provider models JSON as tables through Automatic Schema Discovery. The available connection properties give you high-level control over the detected schemas. The following sections outline the provider's defaults and link to ways to further customize.
- FlattenArrays: Set this property to the number of the elements in a primitive array that you want to return as column values. You can also use this property with FlattenObjects to extract the properties of objects nested in arrays -- arrays are returned as aggregates by default.
-
FlattenObjects: By default, this is true; that is, the properties of objects and nested objects are returned as columns. When you set FlattenArrays, objects nested in the specified array elements are also flattened and returned as columns.
Fine-Tuning Data Access
You can use the following properties to gain greater control over how the provider parses the JSON data into rows. You can also customize the schemas detected based on the connection string.
- RowScanDepth: This property determines the number of rows that will be scanned to detect column data types when generating table metadata.
- JSONValue: Explicitly specify the paths to nested object arrays, instead of detecting them during the row scan.
-
GenerateSchemaFiles: This property enables you to persist table metadata in static schema files that are easy to customize, to persist your changes to column data types, for example. Set this property to "OnStart" to generate schema files for all tables in your database at connection. Or, set this property to "OnUse" to generate schemas as you execute SELECT queries to tables. The resulting schemas are based on the connection properties you use to configure Automatic Schema Discovery.
To use the resulting schema files, set the Location property to the folder containing the schemas. See Customizing Schemas for more information.
Parsing and Merging Multiple Files Into A Single Table
The provider supports reading and parsing multiple files within a single directory and merging the data into a single result set.
To enable this feature, the URI property can be set to a directory with a file mask (e.g. C:\MyDataFiles\*.json) or a directory (e.g. C:\MyDataFiles). When a directory is specified as the URI, IncludeFiles will be used to identify the file types to include.
This functionality is also available on cloud based service providers such as Google Drive (e.g. gdrive://remotepath/*.json), Amazon S3 (e.g. s3://remotepath/*.json), FTP (ftp://server:port/remotepath/*.json), etc.
When setting URI to a directory, be sure to include an ending forward slash (e.g. s3://remotepath/) to denote that its a directory.
To retrieve all files (including those without a file extension), you can use a file mask of '*' (e.g. s3://remotepath/*) or set IncludeFiles to include a '*' entry. To include just files without an extension, you can set $rpIncludeFiles; to include a 'NOEXT' entry.
When parsing a batch of files, the files are put into a queue. Each file will be retrieved and parsed in a streaming fashion with each row pushed as it is parsed. Therefore files will never be stored in memory or stored in a temporary location on disk, thus limiting the amount of memory usage required.
Metadata Discovery
When reading multiple files, the provider will use the first file identified to discovery metadata. A single file is used for metadata discovery for performance reasons. In cases where the first identified file contains partial JSON data (as compared to the other files in the directory), columns/data for the other files may not be returned. This is due to the provider not being able to properly identify all the columns available in the selected files from the single file used for metadata discovery.
To work around these cases, a "MetadataDiscoveryURI" option can be set via the Other property (e.g. MetadataDiscoveryURI=file:///C:\MyDataFiles\main.json). When a MetadataDiscoveryURI is specified, the provider will use the specified URI to discover tables and columns. Once the metadata has been discovered, the URI value will be used to retrieve and parse the JSON data.
Error Handling
When parsing multiple files, there may be cases where a file is not able to be parsed (such as invalid JSON, a network connectivity issue, etc.). In such cases, the provider will not throw an exception message but rather will log the error in a temporary table called ErrorInfo#TEMP. This is done so that failures don't occur in the middle of reading a large batch of files and having to start over. Instead the temporary table (ErrorInfo#TEMP) can be queried after the initial query has finished. This will allow you to identify if there were any files that failed to parse, thus allowing you to retry the failed requests and merging them with your primary result set.
To retrieve the error list, you can issue the following query: SELECT * FROM ErrorInfo#TEMP. This table contains two columns: URI and Description. URI contains the URI for the single file that failed (which can be set via the URI property to retry). Description contains the description as to why the file failed to parse. In the case that no errors occurred, an empty result set will be returned.
Navigating Subdirectories
The provider supports navigating subdirectories when parsing local files. This functionality is exposed by using the '*' wildcard character in the directory name of the URI value.
For example, suppose you have a 'Data' directory that contains folders with unique names (such as a date value) each of which contain similar JSON data files. You can read all these files into a single table by setting URI to 'file:///C:\Data\*\*.json' or you can set it to the directory without a file mask 'file:///C:\Data\*'.
Partial directory matching is also supported. For example, to retrieve all JSON files within folders starting with '2018' the following URI value can be used: file:///C:\Data\2018*\*.json.
Note: Using wildcards in directory names is not applicable when connecting to cloud resources.
Customizing the SSL Configuration
By default, the provider attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify another certificate, see the SSLServerCert property for the available formats to do so.
Connecting Through a Firewall or Proxy
HTTP Proxies
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
In addition, to authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
Other Proxies
Set the following properties:
- To use a proxy-based firewall, set FirewallType, FirewallServer, and FirewallPort.
- To tunnel the connection, set FirewallType to TUNNEL.
- To authenticate, specify FirewallUser and FirewallPassword.
- To authenticate to a SOCKS proxy, additionally set FirewallType to SOCKS5.
Troubleshooting the Connection
To show provider activity from query execution to network traffic, use Logfile and Verbosity. The examples of common connection errors below show how to use these properties to get more context. Contact the support team for help tracing the source of an error or circumventing a performance issue.
- Authentication errors: Typically, recording a Logfile at Verbosity 4 is necessary to get full details on an authentication error.
- Queries time out: A server that takes too long to respond will exceed the provider's client-side timeout. Often, setting the Timeout property to a higher value will avoid a connection error. Another option is to disable the timeout by setting the property to 0. Setting Verbosity to 2 will show where the time is being spent.
- The certificate presented by the server cannot be validated: This error indicates that the provider cannot validate the server's certificate through the chain of trust. If you are using a self-signed certificate, there is only one certificate in the chain.
To resolve this error, you must verify yourself that the certificate can be trusted and specify to the provider that you trust the certificate. One way you can specify that you trust a certificate is to add the certificate to the trusted system store; another is to set SSLServerCert.
Using OAuth
This section shows how to use the provider to authenticate to any data source that supports OAuth.
OAuth requires the authenticating user to interact with JSON using the browser. The provider facilitates this in various ways as described in the following sections.
Before following the procedures below, you need to register an OAuth app with the service to obtain the OAuthClientId and OAuthClientSecret.
Authenticate to JSON from a Desktop Application
After setting the following connection properties, you are ready to connect:
- OAuthVersion: Set this to the OAuth Version, either 1.0 or 2.0.
- OAuthRequestTokenURL: Required for OAuth 1.0. In OAuth 1.0 this is the URL where the app makes a request for the request token.
- OAuthAuthorizationURL: Required for OAuth 1.0 and 2.0. This is the URL where the user logs into the service and grants permissions to the application. In OAuth 1.0 if permissions are granted the request token is authorized.
- OAuthAccessTokenURL: Required for OAuth 1.0 and 2.0. This is the URL where the request for the access token is made. In OAuth 1.0 the authorized request token is exchanged for the access token.
- OAuthRefreshTokenURL: Required for OAuth 2.0. In OAuth 2.0 this is the URL where the refresh token is exchanged for a new access token when the old one expires. Note that for your data source this may be the same as the access token URL.
- OAuthClientId: Set this to the client Id in your app settings. This is also called the consumer key.
- OAuthClientSecret: Set this to the client secret in your app settings. This is also called the consumer secret.
- CallbackURL: Set this to http://localhost:33333. If you specified a redirect URL in your app settings, this must match.
- InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the access token in the connection string.
- Extracts the access token from the callback URL and authenticates requests.
- Refreshes the access token when it expires.
- Saves OAuth values in OAuthSettingsLocation to be persisted across connections.
Authenticate to JSON from a Web Application
When connecting via a Web application, or if the provider is not authorized to open a browser window, use the provided stored procedures to get and manage the OAuth token values.
Note: You can extend the stored procedure schemas to set defaults for the OAuth URLs or other connection string properties.
Set Up the OAuth Flow
Provide the OAuth URLs to authenticate in the Web flow.
- OAuthRequestTokenURL: Required for OAuth 1.0. In OAuth 1.0 this is the URL where the app makes a request for the request token.
- OAuthAuthorizationURL: Required for OAuth 1.0 and 2.0. This is the URL where the user logs into the service and grants permissions to the application. In OAuth 1.0 if permissions are granted the request token is authorized.
- OAuthAccessTokenURL: Required for OAuth 1.0 and 2.0. This is the URL where the request for the access token is made. In OAuth 1.0 the authorized request token is exchanged for the access token.
- OAuthRefreshTokenURL: Required for OAuth 2.0. In OAuth 2.0 this is the URL where the refresh token is exchanged for a new access token when the old one expires. Note that for your data source this may be the same as the access token URL.
Get an Access Token
In addition to the OAuth URLs, set the following additional connection properties to obtain the OAuthAccessToken:
- OAuthClientId: Set this to the client Id in your app settings. This is also called the consumer key.
- OAuthClientSecret: Set this to the client secret in your app settings. This is also called the consumer secret.
- OAuthVersion: Set this to the OAuth version, either 1.0 or 2.0.
You can then call stored procedures to complete the OAuth exchange:
- Call the GetOAuthAuthorizationURL stored procedure. Set the AuthMode input to WEB and set the CallbackURL input to the Redirect URI you specified in your app settings. The stored procedure returns the URL to the OAuth endpoint.
- Log in and authorize the application. You are redirected back to the callback URL.
-
Call the GetOAuthAccessToken stored procedure. Set the AuthMode input to WEB.
In OAuth 1.0, set the Verifier input to the "oauth_verifier" parameter. Extract the verifier code from the callback URL. Additionally, set the AuthToken and AuthSecret to the values returned by GetOAuthAccessToken.
In OAuth 2.0, set the Verifier input to the "code" parameter in the query string of the callback URL.
Connect to Data and Refresh the Token
The OAuthAccessToken returned by GetOAuthAccessToken has a limited lifetime. To automatically refresh the token, set the following on the first data connection. Alternatively, use the RefreshOAuthAccessToken stored procedure to manually refresh the token.
OAuth Endpoints
- OAuthRequestTokenURL
- OAuthAuthorizationURL
- OAuthAccessTokenURL
- OAuthRefreshTokenURL
OAuth Tokens and Keys
- OAuthClientId
- OAuthClientSecret
- OAuthRefreshToken
- OAuthAccessToken
Initiate OAuth
- OAuthVersion: Set this to 1.0 or 2.0.
- InitiateOAuth: Set this to REFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
- OAuthSettingsLocation: Set this to the path where the provider will save the OAuth values, to be persisted across connections.
On subsequent data connections, set the following:
- InitiateOAuth
- OAuthSettingsLocation
- OAuthRequestTokenURL
- OAuthAuthorizationURL
- OAuthAccessTokenURL
- OAuthRefreshTokenURL