How to Read an Item From a DynamoDB Table


DynamoDB is sometimes considered just a simple key-value store, but nothing could be further from the truth. DynamoDB can handle complex access patterns, from highly-relational data models to time series data or even geospatial data.

In this post, we'll see how to model one-to-many relationships in DynamoDB. One-to-many relationships are at the core of nearly all applications. In DynamoDB, you have a few different options for representing one-to-many relationships.

We'll cover the basics of one-to-many relationships, then we'll review five different strategies for modeling one-to-many relationships in DynamoDB:

  1. Denormalization by using a complex attribute
  2. Denormalization by duplicating data
  3. Composite primary key + the Query API action
  4. Secondary index + the Query API action
  5. Composite sort keys with hierarchical data

Let's get started!


Basics of one-to-many relationships

A one-to-many relationship occurs when a particular object is the owner or source for a number of sub-objects. A few examples include:

  • Workplace: A single office will have many employees working there; a single manager may have many direct reports.
  • E-commerce: A single customer may make multiple orders over time; a single order may be comprised of multiple items.
  • Software-as-a-Service (SaaS) accounts: An organization will purchase a SaaS subscription; multiple users will belong to one organization.

With one-to-many relationships, there's one core problem: how do I fetch information about the parent entity when retrieving one or more of the related entities?

In a relational database, there's essentially one way to do this: using a foreign key in one table to refer to a record in another table, and using a SQL join at query time to combine the two tables.

There are no joins in DynamoDB. Instead, there are a number of strategies for one-to-many relationships, and the approach you take will depend on your needs.

In this post, we will cover five strategies for modeling one-to-many relationships with DynamoDB:

  • Denormalization by using a complex attribute
  • Denormalization by duplicating data
  • Composite primary key + the Query API action
  • Secondary index + the Query API action
  • Composite sort keys with hierarchical data

We will cover each strategy in depth below: when you would use it, when you wouldn't, and an example. The end of the post includes a summary of the five strategies and when to choose each one.

Denormalization by using a complex attribute

Database normalization is a core component of relational database modeling and one of the hardest habits to break when moving to DynamoDB.

You can read the basics of normalization elsewhere, but there are a number of areas where denormalization is helpful with DynamoDB.

The first way we'll use denormalization with DynamoDB is by having an attribute that uses a complex data type, like a list or a map. This violates the first tenet of database normalization: to get into first normal form, each attribute value must be atomic. It cannot be broken down any further.

Let's see this by way of an example. Imagine we have an e-commerce site where there are Customer entities that represent people that have created an account on our site. A single Customer can have multiple mailing addresses to which they may ship items. Perhaps I have one address for my home, another address for my workplace, and a third address for my parents (a relic from the time I sent them a belated anniversary present).

In a relational database, you would model this with two tables using a foreign key to link the tables together, as follows:

DynamoDB normalization with Customers and Addresses

Notice that each record in the Addresses table includes a CustomerId, which identifies the Customer to which this Address belongs. You can use the join operation to follow the pointer to the record and find information about the Customer.

DynamoDB works differently. Because there are no joins, we need to find a different way to assemble data from two different types of entities. In this case, we can add a MailingAddresses attribute on our Customer item. This attribute is a map and contains all addresses for the given customer:

DynamoDB denormalization with Customers and Addresses

Because MailingAddresses contains multiple values, it is no longer atomic and thus violates the principles of first normal form.
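To make this concrete, here is a minimal sketch of what such a denormalized Customer item might look like. The MailingAddresses map comes from the example above; the other attribute names and address fields are illustrative assumptions:

```python
# Hypothetical denormalized Customer item: all mailing addresses live in a
# single map attribute on the item, rather than in a separate Addresses table.
customer = {
    "CustomerId": "CUST#alexdebrie",
    "Name": "Alex DeBrie",
    "MailingAddresses": {
        "Home": {"Street": "111 1st St", "City": "Omaha", "ZipCode": "68022"},
        "Work": {"Street": "222 2nd Ave", "City": "Omaha", "ZipCode": "68124"},
    },
}

# All address access happens in the context of the Customer item, e.g.
# listing the saved addresses on the checkout page:
saved_addresses = list(customer["MailingAddresses"])
```

A single read of the Customer item returns every address; there is no second table to query.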

There are two factors to consider when deciding whether to handle a one-to-many relationship by denormalizing with a complex attribute:

  • Do you have any access patterns based on the values in the complex attribute?

    All data access in DynamoDB is done via primary keys and secondary indexes. You cannot use a complex attribute like a list or a map in a primary key. Thus, you won't be able to make queries based on the values in a complex attribute.

    In our example, we don't have any access patterns like "Fetch a Customer by his or her mailing address". All use of the MailingAddresses attribute will be in the context of a Customer, such as displaying the saved addresses on the order checkout page. Given these needs, it's fine for us to save them in a complex attribute.

  • Is the amount of data in the complex attribute unbounded?

    A single DynamoDB item cannot exceed 400KB of data. If the amount of data that is contained in your complex attribute is potentially unbounded, it won't be a good fit for denormalizing and keeping together on a single item.

    In this example, it's reasonable for our application to put limits on the number of mailing addresses a customer can store. A maximum of 20 addresses should satisfy almost all use cases and avoid issues with the 400KB limit.

    But you could imagine other places where the one-to-many relationship might be unbounded. For example, our e-commerce application has a concept of Orders and Order Items. Because an Order could have an unbounded number of Order Items (you don't want to tell your customers there's a maximum number of items they can order!), it makes sense to split Order Items separately from Orders.

If the answer to either of the questions above is "Yes", then denormalization with a complex attribute is not a good fit to model that one-to-many relationship.

Denormalization by duplicating data

In the strategy above, we denormalized our data by using a complex attribute. This violated the principles of first normal form for relational modeling. In this strategy, we'll continue our crusade against normalization.

Here, we'll violate the principles of second normal form by duplicating data across multiple items.

In all databases, each record is uniquely identified by some sort of key. In a relational database, this might be an auto-incrementing primary key. In DynamoDB, this is the primary key.

To get to second normal form, each non-key attribute must depend on the whole key. This is a confusing way to say that data should not be duplicated across multiple records. If data is duplicated, it should be pulled out into a separate table. Each record that uses that data should refer to it via a foreign key reference.

Imagine we have an application that contains Books and Authors. Each Book has an Author, and each Author has some biographical data, such as their name and birth year. In a relational database, we would model the data as follows:

Books & Authors normalization

Note: In reality, a book can have multiple authors. For simplification of this example, we're assuming each book has exactly one author.

This works in a relational database as you can join those two tables at query time to include the author's biographical data when retrieving details about the book.

But we don't have joins in DynamoDB. So how can we solve this? We can ignore the rules of second normal form and include the Author's biographical data on each Book item, as shown below.

DynamoDB Books & Authors denormalization

Notice that there are multiple Books that contain the biographical data for the Author Stephen King. Because this data won't change, we can store it directly on the Book item itself. Whenever we retrieve the Book, we will also get information about the parent Author item.
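As a sketch, the denormalized Book items might look like the following. The attribute names are assumptions based on the description above:

```python
# Each Book item duplicates the author's (essentially immutable) biographical
# data, so a single read returns the book and its author details together.
books = [
    {"Title": "The Shining", "Author": "Stephen King",
     "AuthorBirthYear": "1947", "ReleaseYear": "1977"},
    {"Title": "IT", "Author": "Stephen King",
     "AuthorBirthYear": "1947", "ReleaseYear": "1986"},
]

# No join needed: the parent Author's data rides along with each Book.
book = books[0]
author_bio = (book["Author"], book["AuthorBirthYear"])
```

The trade-off is that if the birth year ever had to change, every Book item carrying the duplicated value would need to be updated.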

There are two main questions you should ask when considering this strategy:

  • Is the duplicated information immutable?

  • If the data does change, how often does it change and how many items include the duplicated information?

In our example above, we've duplicated biographical data that isn't likely to change. Because it's essentially immutable, it's OK to duplicate it without worrying about consistency issues when that data changes.

Even if the information you're duplicating does change, you may still decide to duplicate it. The big factors to consider are how frequently the data changes and how many items include the duplicated information.

If the information changes fairly infrequently and the denormalized items are read a lot, it may be OK to duplicate to save money on all of those subsequent reads. When the duplicated data does change, you'll need to work to ensure it's changed in all those items.

Which leads us to the second factor: how many items contain the duplicated data. If you've only duplicated the information across three items, it can be easy to find and update those items when the data changes. If that data is copied across thousands of items, it can be a real chore to find and update each of those items, and you run a greater risk of data inconsistency.

Essentially, you're balancing the benefit of duplication (in the form of faster reads) against the costs of updating the data. The costs of updating the data include both factors above. If the costs of either factor are low, then almost any benefit is worth it. If the costs are high, the opposite is true.

Composite primary key + the Query API action

The next strategy to model one-to-many relationships, and probably the most common one, is to use a composite primary key plus the Query API to fetch an object and its related sub-objects.

A key concept in DynamoDB is the notion of item collections. Item collections are all the items in a table or secondary index that share the same partition key. When using the Query API action, you can fetch multiple items within a single item collection. This can include items of different types, which gives you join-like behavior with much better performance characteristics.

Let's use one of the examples from the beginning of this section. In a SaaS application, Organizations will sign up for accounts. Then, multiple Users will belong to an Organization and take advantage of the subscription.

Because we'll be including different types of items in the same table, we won't have meaningful attribute names for the attributes in our primary key. Rather, we'll use generic attribute names, like PK and SK, for our primary key.

We have two types of items in our table: Organizations and Users. The patterns for the PK and SK values are as follows:

Entity         PK             SK
Organizations  ORG#<OrgName>  METADATA#<OrgName>
Users          ORG#<OrgName>  USER#<UserName>

The table below shows some example items:

DynamoDB table Organizations and Users

In this table, we've added five items: two Organization items for Microsoft and Amazon, and three User items for Bill Gates, Satya Nadella, and Jeff Bezos.

Outlined in red is the item collection for items with the partition key of ORG#MICROSOFT. Notice how there are two different item types in that collection. In green is the Organization item type in that item collection, and in blue is the User item type in that item collection.

This primary key design makes it easy to solve four access patterns:

  1. Retrieve an Organization. Use the GetItem API call and the Organization's name to make a request for the item with a PK of ORG#<OrgName> and an SK of METADATA#<OrgName>.

  2. Retrieve an Organization and all Users within the Organization. Use the Query API action with a key condition expression of PK = ORG#<OrgName>. This would retrieve the Organization and all Users within it, as they all have the same partition key.

  3. Retrieve only the Users within an Organization. Use the Query API action with a key condition expression of PK = ORG#<OrgName> AND begins_with(SK, "USER#"). The use of the begins_with() function allows us to retrieve only the Users without fetching the Organization object as well.

  4. Retrieve a specific User. If you know both the Organization name and the User's username, you can use the GetItem API call with a PK of ORG#<OrgName> and an SK of USER#<UserName> to fetch the User item.
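The Query-based patterns can be sketched in memory with plain Python. The item shapes mirror the table above; the prefix match stands in for the real Query API's key condition expression:

```python
# In-memory stand-in for the table's item collections. Real code would call
# the DynamoDB GetItem/Query APIs; here we emulate the key matching to show
# how the PK/SK patterns line up.
items = [
    {"PK": "ORG#MICROSOFT", "SK": "METADATA#MICROSOFT", "Type": "Organization"},
    {"PK": "ORG#MICROSOFT", "SK": "USER#BILLGATES", "Type": "User"},
    {"PK": "ORG#MICROSOFT", "SK": "USER#SATYANADELLA", "Type": "User"},
    {"PK": "ORG#AMAZON", "SK": "METADATA#AMAZON", "Type": "Organization"},
    {"PK": "ORG#AMAZON", "SK": "USER#JEFFBEZOS", "Type": "User"},
]

def query(pk, sk_prefix=""):
    """Emulates: Query with PK = :pk AND begins_with(SK, :prefix)."""
    return [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]

org_and_users = query("ORG#MICROSOFT")        # access pattern 2: Org + Users
users_only = query("ORG#MICROSOFT", "USER#")  # access pattern 3: Users only
```

One partition key fetches the parent and all its children in a single request; adding a sort-key prefix narrows the result to just the child items.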

While all four of these access patterns can be useful, the second access pattern (retrieve an Organization and all Users within the Organization) is most interesting for this discussion of one-to-many relationships. Notice how we're emulating a join operation in SQL by locating the parent object (the Organization) in the same item collection as the related objects (the Users). We are pre-joining our data by arranging them together at write time.

This is a pretty common way to model one-to-many relationships and will work for a number of situations.

Secondary index + the Query API action

A similar pattern for one-to-many relationships is to use a global secondary index and the Query API to fetch many. This pattern is almost the same as the previous one, but it uses a secondary index rather than the primary keys on the main table.

You may need to use this pattern instead of the previous one because the primary keys in your table are reserved for another purpose. It could be some write-specific purpose, such as to ensure uniqueness on a particular property, or it could be because you have hierarchical data with a number of levels.

For the latter situation, let's go back to our most recent example. Imagine that in your SaaS application, each User can create and save various objects. If this were Google Drive, it might be a Document. If this were Zendesk, it might be a Ticket. If it were Typeform, it might be a Form.

Let's use the Zendesk example and go with a Ticket. For our case, let's say that each Ticket is identified by an ID that is a combination of a timestamp plus a random hash suffix. Further, each Ticket belongs to a particular User in an Organization.

If we wanted to find all Tickets that belong to a particular User, we could try to intersperse them with the existing table format from the previous strategy, as follows:

DynamoDB table Tickets in Users

Notice the two new Ticket items outlined in red.

The problem with this is that it really jams up my prior use cases. If I want to retrieve an Organization and all its Users, I'm also retrieving a bunch of Tickets. And since Tickets are likely to vastly exceed the number of Users, I'll be fetching a lot of useless data and making multiple pagination requests to handle our original use case.

Instead, let's try something different. We'll do three things:

  1. We'll model our Ticket items to be in a separate item collection altogether in the main table. For the PK and SK values, we'll use a pattern of TICKET#<TicketId>, which will allow for direct lookups of the Ticket item.

  2. Create a global secondary index named GSI1 whose keys are GSI1PK and GSI1SK.

  3. For both our Ticket and User items, add values for GSI1PK and GSI1SK. For both items, the GSI1PK attribute value will be ORG#<OrgName>#USER#<UserName>.

For the User item, the GSI1SK value will be USER#<UserName>.

For the Ticket item, the GSI1SK value will be TICKET#<TicketId>.
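As a sketch, here is how a User item and one of its Ticket items might look with these GSI1 attributes. The key patterns follow the steps above; the concrete names and the timestamp-style TicketId are illustrative assumptions:

```python
# User item: lives in its Organization's item collection in the base table,
# and carries GSI1 keys pointing at its own per-user partition.
user = {
    "PK": "ORG#MICROSOFT", "SK": "USER#BILLGATES",
    "GSI1PK": "ORG#MICROSOFT#USER#BILLGATES",
    "GSI1SK": "USER#BILLGATES",
}

# Ticket item: its own item collection in the base table (direct lookups),
# but the same GSI1 partition as its parent User.
ticket = {
    "PK": "TICKET#2020-03-01-abc123", "SK": "TICKET#2020-03-01-abc123",
    "GSI1PK": "ORG#MICROSOFT#USER#BILLGATES",
    "GSI1SK": "TICKET#2020-03-01-abc123",
}

# Because both items share a GSI1 partition key, a single Query against
# GSI1 returns the User together with the User's Tickets.
same_gsi1_partition = user["GSI1PK"] == ticket["GSI1PK"]
```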

Now our base table looks as follows:

DynamoDB tables Users and Tickets separate base table

Notice that our Ticket items are no longer interspersed with their parent Users in the base table. Further, the User items now have additional GSI1PK and GSI1SK attributes that will be used for indexing.

If we look at our GSI1 secondary index, we see the following:

DynamoDB table Users and Tickets GSI

This secondary index has an item collection with both the User item and all of the user's Ticket items. This enables the same access patterns we discussed in the previous section.

One final note before moving on: notice that I've structured it so that the User item is the last item in the partition. This is because the Tickets are sorted by timestamp. It's likely that I'll want to fetch a User and the User's most recent Tickets, rather than the oldest tickets. As such, I order it so that the User is at the end of the item collection, and I can use the ScanIndexForward=False property to indicate that DynamoDB should start at the end of the item collection and read backwards.
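A small sketch of that read pattern, emulating ScanIndexForward=False by sorting the GSI1 item collection in descending sort-key order (the item values are hypothetical):

```python
# A per-user GSI1 item collection: Tickets sorted by timestamp, with the
# User item last because "USER#..." sorts after "TICKET#...".
gsi1_items = [
    {"GSI1PK": "ORG#MSFT#USER#BILL", "GSI1SK": "TICKET#2019-01-12-x1"},
    {"GSI1PK": "ORG#MSFT#USER#BILL", "GSI1SK": "TICKET#2020-03-04-x2"},
    {"GSI1PK": "ORG#MSFT#USER#BILL", "GSI1SK": "USER#BILL"},
]

def query_desc(partition, limit=None):
    """Emulates Query on GSI1 with ScanIndexForward=False: read the item
    collection from the end (descending sort-key order)."""
    matches = sorted(
        (i for i in gsi1_items if i["GSI1PK"] == partition),
        key=lambda i: i["GSI1SK"],
        reverse=True,
    )
    return matches[:limit]

# Fetch the User plus their most recent Ticket in one request.
recent = query_desc("ORG#MSFT#USER#BILL", limit=2)
```

Reading backwards means the User item comes first, followed by Tickets from newest to oldest, which is exactly the "user profile plus latest activity" shape the text describes.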

Composite sort keys with hierarchical data

In the last two strategies, we saw some data with a couple levels of hierarchy: an Organization has Users, which create Tickets. But what if you have more than two levels of hierarchy? You don't want to keep adding secondary indexes to enable arbitrary levels of fetching throughout your hierarchy.

A common example in this area is around location-based data. Let's keep with our workplace theme and imagine you're tracking all the locations of Starbucks around the world. You want to be able to filter Starbucks locations on arbitrary geographic levels: by country, by state, by city, or by zip code.

We could solve this problem by using a composite sort key. This term is a little confusing, because we're already using a composite primary key on our table. The term composite sort key means that we'll be smashing a bunch of properties together in our sort key to allow for different search granularity.

Let's see how this looks in a table. Below are a few items:

DynamoDB Starbucks locations

In our table, the partition key is the country where the Starbucks is located. For the sort key, we include the State, City, and ZipCode, with each level separated by a #. With this pattern, we can search at four levels of granularity using only our primary key!

The patterns are:

  1. Find all locations in a given country. Use a Query with a key condition expression of PK = <Country>, where Country is the country you want.

  2. Find all locations in a given country and state. Use a Query with a key condition expression of PK = <Country> AND begins_with(SK, '<State>#').

  3. Find all locations in a given country, state, and city. Use a Query with a key condition expression of PK = <Country> AND begins_with(SK, '<State>#<City>').

  4. Find all locations in a given country, state, city, and zip code. Use a Query with a key condition expression of PK = <Country> AND begins_with(SK, '<State>#<City>#<ZipCode>').
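These four granularity levels can be sketched with a begins_with-style prefix match over some hypothetical location items:

```python
# In-memory stand-in for the Starbucks table: partition key is the country,
# sort key is the composite <State>#<City>#<ZipCode>.
locations = [
    {"PK": "USA", "SK": "NE#OMAHA#68022"},
    {"PK": "USA", "SK": "NE#OMAHA#68124"},
    {"PK": "USA", "SK": "NE#LINCOLN#68508"},
    {"PK": "USA", "SK": "CA#SANFRANCISCO#94158"},
]

def find_locations(country, sk_prefix=""):
    """Emulates: Query with PK = :country AND begins_with(SK, :prefix)."""
    return [l for l in locations
            if l["PK"] == country and l["SK"].startswith(sk_prefix)]

all_usa = find_locations("USA")               # country level
nebraska = find_locations("USA", "NE#")       # country + state
omaha = find_locations("USA", "NE#OMAHA#")    # country + state + city
```

Lengthening the sort-key prefix narrows the search one hierarchy level at a time, all against the same primary key.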

This composite sort key pattern won't work for all scenarios, but it can be great in the right situation. It works best when:

  • You have many levels of hierarchy (>2), and you have access patterns for different levels within the hierarchy.

  • When searching at a particular level in the hierarchy, you want all subitems in that level rather than just the items in that level.

For example, recall our SaaS example when discussing the primary key and secondary index strategies. When searching at one level of the hierarchy (find all Users), we didn't want to dip deeper into the hierarchy to find all Tickets for each User. In that case, a composite sort key would return a lot of extraneous items.

If you want a detailed walkthrough of this example, I wrote up the full Starbucks example on DynamoDBGuide.com.

Conclusion

In this post, we discussed five different strategies you can implement when modeling data in a one-to-many relationship with DynamoDB. The strategies are summarized in the table below.

Strategy | Notes | Relevant examples
Denormalize + complex attribute | Good when nested objects are bounded and are not accessed directly | User mailing addresses
Denormalize + duplicate | Good when duplicated data is immutable or infrequently changing | Books & Authors; Movies & Roles
Primary key + Query API | Most common. Good for multiple access patterns on the two entity types. | Most one-to-many relationships
Secondary index + Query API | Similar to primary key strategy. Good when primary key is needed for something else. | Most one-to-many relationships
Composite sort key | Good for very hierarchical data where you need to search at multiple levels of the hierarchy | Starbucks locations

Consider your needs when modeling one-to-many relationships and determine which strategy works best for your situation.

If you have questions or comments on this piece, feel free to leave a note below or email me directly.


Source: https://www.alexdebrie.com/posts/dynamodb-one-to-many/
