Authenticating to DynamoDB
To authorize Amazon DynamoDB requests, provide the credentials for an administrator account or for an IAM user with custom permissions: set AccessKey to the access key ID and set SecretKey to the secret access key.
Note: Though you can connect as the AWS account administrator, it is recommended to use IAM user credentials to access AWS services.
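For example, a minimal connection sketch, assuming the provider accepts the common semicolon-delimited connection-string format (the keys shown are the standard AWS documentation placeholders):
AccessKey=AKIAIOSFODNN7EXAMPLE;SecretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY;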
Obtaining the Access Key
To obtain the credentials for an IAM user, follow the steps below:
- Sign into the IAM console.
- In the navigation pane, select Users.
- To create or manage the access keys for a user, select the user and then select the Security Credentials tab.
To obtain the credentials for your AWS root account, follow the steps below:
- Sign into the AWS Management console with the credentials for your root account.
- Select your account name or number and select My Security Credentials in the menu that is displayed.
- Click Continue to Security Credentials and expand the Access Keys section to manage or create root account access keys.
Authenticating from an EC2 Instance
If you are using the Lyftron Provider for Amazon DynamoDB from an EC2 Instance and have an IAM Role assigned to the instance, you can use the IAM Role to authenticate. To do so, set UseEC2Roles to true and leave AccessKey and SecretKey empty. The Lyftron Provider for Amazon DynamoDB will automatically obtain your IAM Role credentials and authenticate with them.
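For example, assuming the same connection-string format as above, the credentials are omitted entirely:
UseEC2Roles=true;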
Authenticating as an AWS Role
In many situations it may be preferable to use an IAM role for authentication instead of the direct security credentials of an AWS root user. You can use an AWS role instead by specifying the RoleARN, which causes the Lyftron Provider for Amazon DynamoDB to attempt to retrieve credentials for the specified role. If you are connecting to AWS (instead of already being connected, such as on an EC2 instance), you must also specify the AccessKey and SecretKey of an IAM user that can assume the role. Roles cannot be used when specifying the AccessKey and SecretKey of an AWS root user.
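A minimal sketch, assuming the same connection-string format (the ARN and keys are placeholders):
AccessKey=MyIAMUserAccessKey;SecretKey=MyIAMUserSecretKey;RoleARN=arn:aws:iam::123456789012:role/MyRole;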
Authenticating with MFA
For users and roles that require multi-factor authentication, specify the MFASerialNumber and MFAToken connection properties. This causes the Lyftron Provider for Amazon DynamoDB to submit the MFA credentials in a request to retrieve temporary authentication credentials. Note that the duration of the temporary credentials can be controlled via TemporaryTokenDuration (default 3600 seconds).
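For example, assuming the same connection-string format (all values are placeholders; TemporaryTokenDuration is optional):
AccessKey=MyAccessKey;SecretKey=MySecretKey;MFASerialNumber=arn:aws:iam::123456789012:mfa/MyUser;MFAToken=123456;TemporaryTokenDuration=7200;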
Connecting to DynamoDB
In addition to the AccessKey and SecretKey properties, you can optionally set Domain and Region. Set Region to the region where your Amazon DynamoDB data is hosted. Set Domain if you want to use a domain name you have associated with AWS.
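For example, to point the provider at data hosted in US West (Oregon), assuming standard AWS region identifiers are accepted (placeholder credentials):
AccessKey=MyAccessKey;SecretKey=MySecretKey;Region=us-west-2;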
Querying The Data
DynamoDB has two distinct operations for reading data from tables: query and scan. The Lyftron Provider for Amazon DynamoDB automatically attempts to determine which operation to use based on the WHERE clause of the SQL statement.
Query vs. Scan
In DynamoDB, a query is used when some of the data can be filtered before results are returned. This is done by the use of partition keys and sort keys that are defined on the table to perform the filter. It is typically much faster than a scan.
A scan is performed when anything other than a partition key or a sort key is used to filter the data. In a scan, data is filtered only after it is returned. This is much slower, since it causes DynamoDB to return everything and simply exclude nonmatching results from the response. In the worst case it returns full pages of data that are empty because nothing matched.
Secondary Indexes
Secondary indexes in DynamoDB behave like tables of their own that are accessible only through a given table. A secondary index behaves like a view, either with predefined columns or with all columns accessible. The main difference is that different partition and sort keys may be selected for them. There is no restriction on which keys to use: they may be the same as the partition and sort keys for the table itself, the same as ones from a different secondary index, or completely unique to that index.
The provider by default will attempt to determine if a secondary index should be used to perform a query operation. This is done by analyzing the WHERE clause of the SQL statement. If it is determined that a query should be used and that a specific index should be used for that query, then the secondary index will automatically be selected. This behavior can be disabled by setting AutoDetectIndex to false.
A secondary index can be explicitly specified by using the SecondaryIndex pseudo column. This overrides AutoDetectIndex and is used for both query and scan operations. For example:
SELECT * FROM Table WHERE SecondaryIndex = 'IndexName' AND ...
Query Examples
A query operation will be submitted to DynamoDB if the WHERE clause includes a single PartitionKey with an '=' operator and optionally up to one SortKey using any DynamoDB supported operator. For instance:
SELECT * FROM Table WHERE PartitionKey = 'x'
SELECT * FROM Table WHERE PartitionKey = 'x' AND SortKey > 'y' AND SortKey < 'z'
Additionally, if AutoDetectIndex is set to true, the provider will attempt to automatically determine an index based on the specific keys used. For instance:
SELECT * FROM Table WHERE SecondaryIndexPartitionKey = 'x'
The preceding statement results in a query against the secondary index, provided it is the only index that uses the specified partition key.
A secondary index may be explicitly specified via the SecondaryIndex pseudo column. This causes the specified index to be used regardless of whether a query or scan is submitted:
SELECT * FROM Table WHERE SecondaryIndex = 'IndexName'
Parallel Scans
For better performance when scanning results, parallel scans may be used. Parallel scans use multiple threads to execute the same SELECT statement, with each thread responsible for retrieving an equal share of the data. Because the threads retrieve data independently, parallel scans can improve performance significantly. However, the gains are limited by CPU performance, the number of cores, bandwidth constraints, and the read units provisioned for the table. A machine with more cores and more bandwidth may benefit from additional threads, whereas a machine with fewer cores and less bandwidth available will see no benefit.
Amazon DynamoDB offers no equivalent for parallel scans when querying results. Queries can only be executed in a single thread.
By default, parallel scans are enabled. They can be tuned or disabled by updating the ThreadCount connection property.
Please be aware that using parallel scans will consume your allocated read units for a given table at a much faster rate than using a single thread. If you are getting throttle exceptions, it is recommended to decrease the ThreadCount or set it to 1.
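For example, to fall back to a single-threaded scan when throttling occurs (same assumed connection-string format as above):
ThreadCount=1;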
Accessing NoSQL Tables
The provider implements Automatic Schema Discovery that is highly configurable. The following sections outline the provider's defaults and link to ways to customize them further.
Flattening Nested JSON
By default, the provider projects columns over the properties of objects and returns arrays as JSON strings. You can use the following properties to access array elements, including objects nested in arrays.
- FlattenArrays: Set this property to the number of array elements that you want to return as column values. You can also use this property with FlattenObjects to extract the properties of objects nested in arrays (see the sketch after this list).
- FlattenObjects: By default, this is true; that is, the properties of objects and nested objects are returned as columns. When you set FlattenArrays, objects nested in the specified array elements are also flattened and returned as columns.
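As a sketch of how the two properties interact, suppose FlattenArrays is set to 2 for an item containing offices: ["Chapel Hill", "London", "New York"]. Following the naming convention shown in Automatic Schema Discovery, the first two elements surface as columns of their own (the Companies table name is hypothetical):
SELECT [offices_0], [offices_1] FROM Companies -- the third element, "New York", is not flattened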
Other mechanisms for accessing nested objects are detailed in NoSQL Database.
Inferring the Data Type
You can use the following properties to configure automatic data type detection, which is enabled by default; a configuration sketch follows the list.
- TypeDetectionScheme: You can use this property to enable or disable automatic type detection based on the value specified in RowScanDepth.
- RowScanDepth: This property determines the number of rows that will be scanned to determine column data types.
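For example, to widen the sample used for type detection (the RowScan scheme name is an assumption; check the provider's property reference for the accepted values):
TypeDetectionScheme=RowScan;RowScanDepth=100;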
Fine Tuning Data Access
You can use the following properties to gain greater control over Amazon DynamoDB API features and the strategies the provider uses to surface them:
- GenerateSchemaFiles: This property enables you to persist table metadata in static schema files that are easy to customize (to change column data types, for example). You can set this property to "OnStart" to generate schema files for all tables in your database at connection, or you can generate schemas as you execute SELECT queries to tables. The resulting schemas are based on the connection properties you use to configure Automatic Schema Discovery. To use the resulting schema files, set the Location property to the folder containing the schemas.
- NumberColumnMode: This property enables you to specify an SQL data type, such as double, decimal, or string, for the Amazon DynamoDB "number" type.
- InsertMode: Inserts in Amazon DynamoDB have several key differences from SQL inserts. An insert in Amazon DynamoDB is an upsert; that is, a row that already exists is replaced by the incoming row. Also, the primary key must be specified. By default, the provider performs an upsert based on the primary key. You can use this connection property to throw an error instead of overwriting existing records.
- UseSimpleNames: Amazon DynamoDB supports attribute names with special characters that many database-oriented tools do not support. In addition, Amazon DynamoDB table names can include dots and dashes; the provider interprets dots within table names as hierarchy separators that enable you to drill down to nested fields, similar to XPath. You can use this property to replace any nonalphanumeric character with an underscore.
- SeparatorCharacter: You can use this property to more easily access nested fields when Querying Documents and Lists; specify the hierarchy separator with this property. By default, this is the '.' (dot) character. (A sketch follows this list.)
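As a hypothetical sketch, if your attribute names themselves contain dots, you might switch the separator to a slash and drill down with it instead:
SeparatorCharacter=/;
SELECT [Customer/Address/Street] FROM MyTable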
Fine Tuning Performance
You can use the following connection properties to avoid "maximum throughput exceeded" errors.
Using Paging Effectively
You can use the Pagesize property to optimize use of your provisioned throughput, based on the size of your items and Amazon DynamoDB's 1MB page size. Set this property to the number of items to return.
Generally, a smaller page size reduces spikes in throughput that cause throttling. A smaller page size also inserts pauses between requests. This interval evens out the distribution of requests and allows more requests to be successful by avoiding throttling.
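For example, to trade fewer, larger requests for smaller, more frequent ones (the value is illustrative only):
Pagesize=50;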
You can also take advantage of secondary indexes. Specify partition keys and sort keys in the WHERE clause to improve the performance of query operations; based on these inputs, the provider automatically detects the secondary index to use for a query operation.
See Querying The Data for more information on scan operations vs. query operations.
Setting a Retry Interval
You can set the following properties to retry queries instead of throwing a temporary error such as "maximum throughput exceeded" (an example follows the list):
- RetryWaitTime: The minimum number of milliseconds the provider will wait to retry a request.
- MaximumRequestRetries: The maximum number of times to retry a request.
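For example (the values are illustrative, not recommended defaults):
RetryWaitTime=2000;MaximumRequestRetries=5;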
Connecting Through a Firewall or Proxy
To connect through the Windows system proxy, set only the Amazon DynamoDB authentication properties and, if necessary, Region and Domain. To connect to other proxies, set ProxyAutoDetect to false and, in addition, set the following properties.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
To connect through other firewall types, set FirewallType, FirewallServer, and FirewallPort. To tunnel the connection, set FirewallType to TUNNEL. To authenticate to a SOCKS proxy, set FirewallType to SOCKS5. Additionally, specify FirewallUser and FirewallPassword.
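For example, a sketch of a connection through an authenticated HTTP proxy, assuming the same connection-string format (host, port, and credentials are placeholders):
ProxyAutoDetect=false;ProxyServer=192.168.1.100;ProxyPort=8080;ProxyAuthScheme=BASIC;ProxyUser=proxyuser;ProxyPassword=proxypassword;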
Troubleshooting the Connection
To show provider activity from query execution to HTTP calls, use Logfile and Verbosity. The examples of common connection errors below show how to use these properties to get more context. Contact the support team for help tracing the source of an error or circumventing a performance issue.
- Authentication errors: Typically, recording a Logfile at Verbosity 4 is necessary to get full details on an authentication error (see the example after this list).
- Queries time out: A server that takes too long to respond will exceed the provider's client-side timeout. Often, setting the Timeout property to a higher value will avoid a connection error. Another option is to disable the timeout by setting the property to 0. Setting Verbosity to 2 will show where the time is being spent.
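For example, to capture a verbose log while reproducing an authentication error (the log path is a placeholder):
Logfile=C:\logs\dynamodb.log;Verbosity=4;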
NoSQL Database
Amazon DynamoDB is a schemaless database that provides high performance, availability, and scalability. These features are not necessarily incompatible with a standards-compliant query language like SQL-92. This section describes the schemes the provider offers to bridge the gap between relational SQL and a document database.
The provider models schemaless Amazon DynamoDB tables as relational tables and translates SQL queries into Amazon DynamoDB queries to get the requested data. The provider offers two ways to model Amazon DynamoDB tables as relational tables: Automatic Schema Discovery and Custom Schema Definitions.
The Automatic Schema Discovery scheme automatically finds the data types in an Amazon DynamoDB table by scanning a configured number of rows of the table. You can use RowScanDepth, FlattenArrays, and FlattenObjects to control the relational representation of the tables in Amazon DynamoDB.
Optionally, you can use Custom Schema Definitions to project your chosen relational structure on top of an Amazon DynamoDB table. This allows you to define your chosen column names, their data types, and the locations of their values in the Amazon DynamoDB table.
Automatic Schema Discovery
The provider automatically infers a relational schema by inspecting a series of Amazon DynamoDB items in a table. You can use the RowScanDepth property to define the number of items the provider will scan to do so. The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties.
If FlattenObjects is set, all nested objects will be flattened into a series of columns. For example, consider the following item:
{
  "id": 12,
  "name": "Lohia Manufacturers Inc.",
  "address": {"street": "Main Street", "city": "Chapel Hill", "state": "NC"},
  "offices": ["Chapel Hill", "London", "New York"],
  "annual_revenue": 35600000
}
With FlattenObjects set, this item is represented by the following columns:
Column Name | Data Type | Example Value
id | Integer | 12
name | String | Lohia Manufacturers Inc.
address_street | String | Main Street
address_city | String | Chapel Hill
address_state | String | NC
offices | String | ["Chapel Hill", "London", "New York"]
annual_revenue | Double | 35600000
If FlattenObjects is not set, the address_street, address_city, and address_state columns are not broken apart. Instead, a single address column of type string represents the entire object; its value would be {street: "Main Street", city: "Chapel Hill", state: "NC"}. See JSON Functions for more details on working with JSON aggregates.
The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short, for example the coordinates below:
"coord" : [ -73.856077, 40.848447 ] |
Column Name | Data Type | Example Value |
coord_0 | Float | -73.856077 |
coord_1 | Float | 40.848447 |
It is best to leave other unbounded arrays as they are and piece out the data for them as needed using JSON Functions.
Vertical Flattening
It is possible to retrieve an array of objects as if it were a separate table. Take the following JSON structure from the restaurants collection for example:
{ "restaurantid" : "30075445" , "address" : { "building" : "1007" , "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave" , "zipcode" : "10462" }, "borough" : "Bronx" , "cuisine" : "Bakery" , "grades" : [{ "date" : 1393804800000, "grade" : "A" , "score" : 2 }, { "date" : 1378857600000, "grade" : "A" , "score" : 6 }, { "date" : 1358985600000, "grade" : "A" , "score" : 10 }, { "date" : 1322006400000, "grade" : "A" , "score" : 9 }, { "date" : 1299715200000, "grade" : "B" , "score" : 14 }], "name" : "Morris Park Bake Shop" , } |
The grades array can be queried as if it were a separate table:
SELECT * FROM [restaurants.grades]
date | grade | score | restaurantid | _index
1393804800000 | A | 2 | 30075445 | 1
1378857600000 | A | 6 | 30075445 | 2
1358985600000 | A | 10 | 30075445 | 3
You may also want to include information from the base restaurants table. You can do this with a join.
SELECT restaurants.name, [restaurants.grades].*
FROM [restaurants]
JOIN [restaurants.grades]
  ON restaurants.restaurantid = [restaurants.grades].restaurantid
WHERE restaurants.restaurantid = 30075445
  AND [restaurants.grades].restaurantid = 30075445
name | restaurantid | date | grade | score | _index
Morris Park Bake Shop | 30075445 | 1393804800000 | A | 2 | 1
Morris Park Bake Shop | 30075445 | 1378857600000 | A | 6 | 2
Morris Park Bake Shop | 30075445 | 1358985600000 | A | 10 | 3
Morris Park Bake Shop | 30075445 | 1322006400000 | A | 9 | 4
Morris Park Bake Shop | 30075445 | 1299715200000 | B | 14 | 5
JSON Functions
The provider can return JSON structures as column values. The provider enables you to use standard SQL functions to work with these JSON structures. The examples in this section use the following array:
[ { "grade" : "A" , "score" : 2 }, { "grade" : "A" , "score" : 6 }, { "grade" : "A" , "score" : 10 }, { "grade" : "A" , "score" : 9 }, { "grade" : "B" , "score" : 14 } ] |
JSON_EXTRACT
The JSON_EXTRACT function can extract individual values from a JSON object. The following query returns the values shown below based on the JSON path passed as the second argument to the function:
SELECT Name,
  JSON_EXTRACT(grades, '[0].grade') AS Grade,
  JSON_EXTRACT(grades, '[0].score') AS Score
FROM Students;
Column Name | Example Value
Grade | A
Score | 2
JSON_COUNT
The JSON_COUNT function returns the number of elements in a JSON array within a JSON object. The following query returns the number of elements specified by the JSON path passed as the second argument to the function:
SELECT Name, JSON_COUNT(grades, '[x]') AS NumberOfGrades FROM Students;
Column Name | Example Value
NumberOfGrades | 5
JSON_SUM
The JSON_SUM function returns the sum of the numeric values of a JSON array within a JSON object. The following query returns the total of the values specified by the JSON path passed as the second argument to the function:
SELECT Name, JSON_SUM(grades, '[x].score') AS TotalScore FROM Students;
Column Name | Example Value
TotalScore | 41
JSON_MIN
The JSON_MIN function returns the lowest numeric value of a JSON array within a JSON object. The following query returns the minimum value specified by the JSON path passed as the second argument to the function:
SELECT Name, JSON_MIN(grades, '[x].score') AS LowestScore FROM Students;
Column Name | Example Value
LowestScore | 2
JSON_MAX
The JSON_MAX function returns the highest numeric value of a JSON array within a JSON object. The following query returns the maximum value specified by the JSON path passed as the second argument to the function:
SELECT Name, JSON_MAX(grades, '[x].score') AS HighestScore FROM Students;
Column Name | Example Value
HighestScore | 14
DynamoDB Queries
Because Amazon DynamoDB is a NoSQL data source, queries need to be handled a bit differently than they are in standard relational databases.
Value-Sensitive Queries
The lack of a required data type for a given column means that you could store different types of data in a single column. For instance, one row could have a String called EmailAddresses and another could have a StringSet also called EmailAddresses. For these and other kinds of cases, the provider largely determines what data type to use based on the values in the query.
For instance, say you have an Items table where the PartNumber could store either a String or a Number. To get back a part with the PartNumber of the number value 12345, you would issue the following query:
SELECT Name, Location, Quantity, PartNumber FROM Items WHERE PartNumber = 12345
Alternatively, the PartNumber could have been stored as the string "12345". To get back a part with the PartNumber of the literal string 12345, issue the following query:
SELECT Name, Location, Quantity, PartNumber FROM Items WHERE PartNumber = '12345'
Detected Column Data Type
If the intended data type is not obvious from the value alone, the provider compares it against the autodetected column. For instance, if you want to insert a Coordinates value into the Locations table, your insert would look like:
INSERT INTO Locations (Address, Coordinates) VALUES ('123 Fake Street', '[40.7127, 74.0059]')
If a Coordinates column was not autodetected when scanning the Locations table, the data type of the inserted value is used. In this case, the provider can still resolve the insert to a NumberSet, but doing so incurs a bit more overhead.
Querying Documents and Lists
Amazon DynamoDB documents and lists are supported by the Lyftron Provider for Amazon DynamoDB. You can access documents and lists directly at the root level, or use the '.' character as a hierarchy divider to drill down to nested documents and lists.
Reporting Values in Documents and Lists
When data types are autodetected, they are reported down to the lowest level that can be reliably detected. For instance, a document called Customer with a child called Address and a child on Address called Street would be represented by the column Customer.Address.Street.
However, this process does not apply to Lists since a list could have any number of entries. Once a List or a Set is detected, additional values are not reported as being available in the table schema.
Getting Back Unreported Values
If there are attributes that frequently do not have a value and thus are not autodetected, these can still be retrieved by specifying the correct path to them. For instance, to get the Special attribute from the Customer document:
SELECT [Customer.Address.Street], [Customer.Special] FROM MyTable
List entries can be retrieved in the same way by specifying the index in the path. For instance:
SELECT [MyList.0], [MyList.1.Email], [MyList.1.Age] FROM MyTable
Inserting Documents and Lists
Inserts in Amazon DynamoDB require that the full object be specified. Insert a document or list at the root by passing in the full JSON aggregate. For instance:
INSERT INTO MyTable (PrimaryKey, EmailAddresses, Address, MyList)
VALUES (
  'uniquekey',
  '["user@email.com", "user2@email2.com"]',
  '{"Street":"123 Fake Street", "City":"Chapel Hill", "Zip":"27713"}',
  '[{"S":"somestr"},{"NS":[1,2]},{"N":4}]'
)
Updating Documents and Lists
Updates are supported using the same syntax that is available during selects. Documents and Lists can be specified using the '.' character to specify hierarchy. For instance:
UPDATE MyTable
SET [EmailAddress.0] = 'user@email.com',
    [EmailAddress.1] = 'user2@email2.com',
    [Address.Street] = '123 Fake Street',
    [Address.City] = 'Chapel Hill',
    [Address.Zip] = '27713',
    [MyList.0] = 'somestr',
    [MyList.1] = '[1,2]',
    [MyList.2] = 4
WHERE PrimaryKey = 'uniquekey'
Custom Schema Definitions
In addition to Automatic Schema Discovery, the provider also allows you to statically define the schema for your Amazon DynamoDB table. Consider a schema for the restaurants data set.
Below is an example item from the table:
{ "address" :{ "building" : "461" , "coord" :[ -74.138492, 40.631136 ], "street" : "Port Richmond Ave" , "zipcode" : "10302" }, "borough" : "Staten Island" , "cuisine" : "Other" , "grades" :[ ], "name" : "Indian Oven" , "restaurant_id" : "50018994" } |
Defining a Custom Schema
You can define a custom schema to extract nested properties as their own columns. The following schema uses the other:path property to define where the data for a particular column should be retrieved from. Using this model you can flatten arbitrary levels of hierarchy. The tablename attribute specifies the table to parse and gives you the flexibility to use multiple schemas for the same table. If tablename is not specified, the filename determines the table that is parsed.
In Custom Schema Example, you will find the complete schema that contains the example above.
<api:script xmlns:api="http://apiscript.com/ns?v1">
  <api:info title="StaticRestaurants" description="Custom Schema for the DynamoDB restaurants data set.">
    <!-- Column definitions -->
    <attr name="id"        xs:type="integer" other:internaltype="N" other:path="restaurant_id" iskey="true" other:keytype="HASH" />
    <attr name="name"      xs:type="string"  other:internaltype="S" other:path="name" />
    <attr name="borough"   xs:type="string"  other:internaltype="S" other:path="borough" />
    <attr name="cuisine"   xs:type="string"  other:internaltype="S" other:path="cuisine" />
    <attr name="building"  xs:type="string"  other:internaltype="S" other:path="address.building" />
    <attr name="street"    xs:type="string"  other:internaltype="S" other:path="address.street" />
    <attr name="latitude"  xs:type="double"  other:internaltype="N" other:path="address.coord.0" />
    <attr name="longitude" xs:type="double"  other:internaltype="N" other:path="address.coord.1" />

    <input name="rows@next" desc="Internal attribute used for paging through data." />
  </api:info>

  <api:set attr="tablename" value="restaurants" />
</api:script>
Custom Schema Example
This section contains a complete schema. The info section enables a relational view of an Amazon DynamoDB table. For more details, see Custom Schema Definitions. The schema below allows the SELECT, INSERT, UPDATE, and DELETE commands as implemented in the GET, POST, MERGE, and DELETE sections of the schema.
Use the tablename attribute to specify the name of the table you want to parse. You can use the tablename attribute to define multiple schemas for the same table.
If tablename is not specified, the filename determines the table that is parsed.
Use the pathseparator attribute to specify the character or characters that separate the fields in the path to a given column. This is only used when parsing the other:path input. If not specified, the default is the '.' character.
Note: Amazon DynamoDB is case sensitive. Your tablename and specified paths must match the case of how your fields appear in Amazon DynamoDB.
Copy the rows@next input as-is into your schema. The operations, such as dynamodbadoSelect, are internal implementations and can also be copied as is.
<api:script xmlns:api="http://apiscript.com/ns?v1">
  <api:info title="StaticRestaurants" description="Custom Schema for the DynamoDB restaurants data set.">
    <!-- Column definitions -->
    <attr name="id"        xs:type="integer" other:internaltype="N" other:path="restaurant_id" iskey="true" other:keytype="HASH" />
    <attr name="name"      xs:type="string"  other:internaltype="S" other:path="name" />
    <attr name="borough"   xs:type="string"  other:internaltype="S" other:path="borough" />
    <attr name="cuisine"   xs:type="string"  other:internaltype="S" other:path="cuisine" />
    <attr name="building"  xs:type="string"  other:internaltype="S" other:path="address.building" />
    <attr name="street"    xs:type="string"  other:internaltype="S" other:path="address.street" />
    <attr name="latitude"  xs:type="double"  other:internaltype="N" other:path="address.coord.0" />
    <attr name="longitude" xs:type="double"  other:internaltype="N" other:path="address.coord.1" />

    <input name="rows@next" desc="Internal attribute used for paging through data." />
  </api:info>

  <api:set attr="tablename" value="restaurants" />
  <api:set attr="pathseparator" value="." />

  <api:script method="GET">
    <api:call op="dynamodbadoSelect">
      <api:push />
    </api:call>
  </api:script>

  <api:script method="POST">
    <api:call op="dynamodbadoAdd">
      <api:push />
    </api:call>
  </api:script>

  <api:script method="MERGE">
    <api:call op="dynamodbadoUpdate">
      <api:push />
    </api:call>
  </api:script>

  <api:script method="DELETE">
    <api:call op="dynamodbadoDelete">
      <api:push />
    </api:call>
  </api:script>
</api:script>