Firestore Queries And Data Modeling Course
Core Concepts
What is Firestore?
Firebase realtime database was the first database management system created by Firebase.
The pain points to it were:
You couldn't select a subset of data in a node; pulling data from a node pulled all the data.
Querying and sorting of data is limited; required lots of client-side code to handle queries and sorting and filtering.
Cloud Firestore was created to address these issues. It's a document-oriented database (like MongoDB).
Collections and documents
All data is stored in documents. Collections are just containers for documents.
Note: You can nest sub-collections underneath documents. And when you read a parent document, you actually don't pull any data from its sub-collections.
Firestore is denormalized
Relational databases are normalized: they break down data into the smallest parts possible, so there's no data duplication.
Mapping of terms between SQL and Firestore:
Note: There are no JOIN
s in Firestore. You can "join" data client-side, but this doesn't happen directly with the database. This is by design: JOIN
s aren't efficient.
Firestore is schemaless
You can add data to any document on the fly, and it doesn't have to match the data types in other documents in the collection.
The fundamental idea with Firestore is to model your data in a way that doesn't require a JOIN
. You are prerendering data for read efficiency.
Mental Framework for Modeling Data
Aim to model your data based on how it's going to be consumed by the app.
Think about the screens/views in your app, then design your data model that's the path of least resistance to get that view.
In contrast: SQL is designed via preset rules.
Note: Given how open-ended Firestore can be, there is usually more than one way to model your data. You have to consider the trade-offs of each.
Data Types
Firestore has the typical primitive data types: strings, number, booleans, null.
Other data types include:
Geopoints (latitude and longitude)
References to other documents
Map (like an object)
Array (ordered data)
Techniques for Data Modeling
Embedding: adding data directly into a document.
This is the first technique you should consider because it's the most cost-effective and performant.
You only need 1 document read!
Note: It's only practical if the data is small because documents are limited to 1MB in size. Plus queries to the document will download the entire document.
Root collection: creating a collection at root and creating documents in there.
This is normalizing your data.
Great for maintaining relationships across documents.
Great for querying across multiple properties.
Subcollection: scopes data to individual document.
Great for when you need to set up a one-to-many relationship.
The only limitation is that queries across parents don't include subcollections (e.g. filtering a collection).
Note: When you query a document, you have to explicitly request its subcollections!
Bucketing: a collection of documents that embed relational data.
Great for decreasing number of queries to get data.
Reading Data
Rule of thumb: If your query code is simple, you probably have a good data model. If it's complicated, you probably need to rethink the data model.
Note: Under the hood, the Firebase SDK performs pipelining, which combines requests and runs them concurrently.
Massaging Data
Some useful methods for massaging data server-side include:
where('key', '>=', today)
You'll use this the most!
Note: Range operators like
>=
can only be used once, while equality operators can be used infinitely.Note 2: There is no inequality operator, so you have to combine an above and below query
orderBy('key', 'desc')
Note: If some documents don't contain your chosen key, they'll be filtered out.
limit(20)
startAfter
andendBefore
(exclusive) plusstartAt
andendAt
(inclusive)Great for pagination
Pro tip:
You can chain these methods. These chains work like the AND operator.
However, there is no OR operator. To mimic OR, just create multiple queries.
Best practice: It's often much easier to perform many small queries and combine them client-side than to try to perform one giant query.
Bonus: where
has a special array-contains
operator that checks if an array contains a value, returning the array if there's a match.
Indexes
What are indices? I don't get it...
In any case, Firestore will throw an error if it recommends you create an index.
Security
The most important thing to keep in mind when it comes to security is that you cannot partially read a document. So whatever access you provide, they have access to all of the data.
Some models of security:
Best practice: If you want a data set to have its own security rules, it's best to place it into its own collection.
Relational Data Modeling
Cardinality
Ask yourself: how many items can be in a set? A few? Hundreds? Billions?
If only a few, it makes the most sense to embed in the parent document.
If hundreds, it makes sense to create a bucket document with the same document ID.
If billions, opt for a collection because it can scale indefinitely.
One-to-one
One-to-one relationships are the easiest to model. There are 3 ways of creating the relationship:
Embed data inside document.
If the relationship is truly unique, this should work.
Create 2 documents in different collections using the same document ID. (Like buckets.)
Pro tip: Using the same document ID enforces the one-to-one relationship as well. It makes it impossible to accidentally create a duplicate relationship.
Create a reference ID pointing to the other document.
Note: This is less efficient because it requires an extra query: getting the reference ID.
Code implementation:
One-to-many
One-to-many relationships are the most common. There are 3 ways to model it:
Embed an array of maps
Great when you have a small number of items in a set and don't need to query the data.
If you need to query the data, you have to read all parent documents and then query client-side, which is expensive.
So, if data needs to be queried, opt for:
Subcollection
However, you can only query scoped to the specific document
So, if you need to query across parent documents, opt for:
Root collection
Here, you reference the relevant document ID in your documents, creating the relationship.
Many-to-many
Many-to-many relationships are the most complex to model.
Create an middle-man collection
In a middle-man collection, the 2 document IDs are keys in the document.
The original documents don't have data about the relationship. They have to go to the middle-man collection to get the data.
Note: If you want to create a unique relationship (e.g. user can only heart a post once), make the document ID into a composite ID, which combines the 2 document IDs together.
Embedded map with document IDs as keys
You're essentially creating a reference point for related data in the document.
The downside to this is that when we're querying the data, we're pulling in data from the parent document as well, which could be unnecessarily expensive.
Embedded array with document IDs as items
This is similar to an embedded map but uses an array instead.
Note: You can only use the
array-contains
operator once at a time.
Advanced Techniques for Efficiency
Data duplication
Data duplication is about duplicating data at different places in the database to decrease the number of reads required to get the data. (This is a no-no in SQL but normal in NoSQL.)
An example of data duplication is a one-to-many relationship where a user can have multiple comments. Each comment is given the user's document ID for reference. However, if we want to avoid reading the user's document to get more data on them, we can duplicate some of that data in the comment document itself.
Data duplication makes the most sense when you have information with lots of reads but few writes: you're decreasing number of reads required to gather data, but you're increasing number of places you have to write when that data changes.
Pro tip:
Good candidates for data duplication are things like usernames, which almost never change. When they do change, however, you may have to update them in 1000s of documents.
Bad candidates are things like
Aggregation via cloud functions
We can create cloud functions like onCreate
, onDelete
, onUpdate
, and more that run when an event happens.
The most common use case for this is incrementing and decrementing counters (e.g. like count). By creating aggregate data, you decrease the number of reads required to get that data, increasing performance.
Last updated