How to Design a Multi-Tenant App
The primary barrier to adopting cloud services offered by OTTs is trust, or rather the lack of it. Businesses recognize the value of their data, which encompasses information about their products, customers, employees, suppliers, and more. Because the cloud is centered around data, cloud applications offer centralized, network-based access to that data with lower overhead costs than locally installed applications. When creating a SaaS data architecture, cloud architects must build a solution that is secure and robust enough to satisfy customers who are apprehensive about relinquishing control of essential business data to a third party, while also being efficient and cost-effective to manage. The degree of isolation that is appropriate for a SaaS application’s data architecture can vary significantly.
Separate Databases
The simplest way to isolate data is to store each tenant’s data in a separate database. While computing resources and application code are typically shared among tenants on a server, each tenant has its own data set that remains logically isolated from the data of every other tenant. A metadata association links each database to its corresponding tenant, and database security measures prevent any tenant from accessing other tenants’ data, whether accidentally or intentionally. Giving each tenant its own database makes it easy to extend the application’s data model to meet that tenant’s specific requirements, and restoring a tenant’s data from backups after a failure is relatively straightforward.
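The metadata association between tenants and databases can be sketched as a small catalog. This is a minimal illustration, not a production design: it uses Python’s built-in sqlite3 module with in-memory databases as a stand-in for a real database server, and the TenantCatalog class, the tenant IDs, and the orders table are all hypothetical names introduced for the example.

```python
import sqlite3


class TenantCatalog:
    """Hypothetical helper that maps each tenant ID to its own database."""

    def __init__(self):
        # Metadata association: tenant ID -> database location.
        self._locations = {}
        # One cached connection per tenant, opened on first use.
        self._connections = {}

    def register(self, tenant_id, location):
        """Record where a tenant's database lives."""
        self._locations[tenant_id] = location

    def connect(self, tenant_id):
        """Return the connection for one tenant's database only."""
        if tenant_id not in self._locations:
            raise KeyError(f"unknown tenant: {tenant_id}")
        if tenant_id not in self._connections:
            location = self._locations[tenant_id]
            self._connections[tenant_id] = sqlite3.connect(location)
        return self._connections[tenant_id]


catalog = TenantCatalog()
# Each ":memory:" connection is a separate, private SQLite database.
catalog.register("acme", ":memory:")
catalog.register("globex", ":memory:")

for tenant in ("acme", "globex"):
    catalog.connect(tenant).execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)"
    )

# A write through one tenant's connection never touches the other database.
catalog.connect("acme").execute("INSERT INTO orders (item) VALUES ('widget')")
```

Because every query goes through a connection that was looked up by tenant ID, application code has no path to another tenant’s rows; in a real deployment the database server’s own access controls would enforce the same boundary.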
This method of separating tenant data into individual databases is the “premium” approach (similar to AWS reserved instances) and incurs higher costs for maintaining equipment and backing up tenant data. Hardware expenses are also higher, since the number of tenants that can be hosted on a database server is limited by the number of databases the server can support. It is therefore best suited to customers who are willing to pay more for increased security and customization. For instance, customers in finance, medicine, or government records management may have stringent data isolation needs and may not even consider an application that does not give each tenant a separate database.
Shared Database, Separate Schemas
An alternative approach to data isolation is to house multiple tenants in a single database, with each tenant’s tables grouped into a schema that is created when the tenant subscribes to the service. The provisioning subsystem creates a separate set of tables for each tenant and associates it with the tenant’s schema, which becomes the default schema for the tenant’s account. The tenant can then access its data by specifying just the table name instead of using the SchemaName.TableName convention, so a single set of SQL statements can serve all tenants.
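The provisioning step and the default-schema trick can be sketched as follows. This is an illustrative sketch that assumes a PostgreSQL-style database, where CREATE SCHEMA and the search_path setting are available; it only builds the SQL text and does not connect to a server. The function names, the tenant_ prefix, and the orders table are hypothetical.

```python
import re
from typing import List

# Only allow simple identifiers, because schema names cannot be passed
# as bound query parameters and must be interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def schema_name(tenant_id: str) -> str:
    """Derive a safe schema name from a tenant ID (hypothetical convention)."""
    name = f"tenant_{tenant_id.lower()}"
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return name


def provisioning_sql(tenant_id: str) -> List[str]:
    """Statements to run once, when the tenant subscribes."""
    schema = schema_name(tenant_id)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"CREATE TABLE {schema}.orders "
        "(id SERIAL PRIMARY KEY, item TEXT NOT NULL)",
    ]


def session_sql(tenant_id: str) -> str:
    """Statement to run at the start of each tenant session."""
    return f"SET search_path TO {schema_name(tenant_id)}"


# After session_sql() has run, this unqualified query reads the current
# tenant's table, so the same statement works for every tenant.
SHARED_QUERY = "SELECT id, item FROM orders"
```

Validating the tenant ID before interpolating it matters here: a schema name is part of the statement text, so an unchecked value would be an injection risk.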
This approach provides a moderate level of logical data isolation for security-minded tenants, although not as much as a fully isolated system, and it can support a larger number of tenants per database server. Like the isolated approach, the separate-schema approach is relatively simple to implement, and tenants can extend the data model as easily as with separate databases. Its main disadvantage is that restoring tenant data after a failure is more complicated. With separate databases, restoring a single tenant means restoring that tenant’s database from the most recent backup. With separate schemas, restoring the whole database would overwrite every tenant’s data with the backup, so restoring one tenant may require restoring the database to a temporary server and then importing that tenant’s tables into the production server, which can be complex and time-consuming.
The separate-schema approach is suitable for applications that use a relatively small number of database tables per tenant, typically around 100 or fewer. It can support more tenants per server than the separate-database approach, which makes the application more affordable as long as customers accept having their data stored alongside that of other tenants.
Choosing an Approach
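The suitability of the three approaches (separate databases, separate schemas, and a fully shared schema in which all tenants’ rows live in the same tables and are distinguished by a tenant ID) depends on various business and technical factors, and each offers its own benefits and trade-offs. The following are some of the considerations to take into account: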
Economic Considerations
Applications designed for a shared approach typically require a larger development effort than those designed for a more isolated approach, because a shared architecture is more complex to build. Consequently, initial costs are typically higher. However, since shared applications can accommodate more tenants per server, ongoing operational costs are usually lower. Business and economic factors may constrain your choice. Although a shared-schema approach can be more cost-effective over the long run, it requires a substantial development effort before it generates any revenue. If your budget cannot support that effort, or you need to bring your application to market more quickly than a large-scale development effort allows, consider a more isolated approach.
Security Considerations
Because your application will store sensitive tenant data, customers will have high expectations about security, and your data architecture will play a crucial role in your service level agreements (SLAs) and the data safety guarantees you can offer. Contrary to popular belief, physical isolation is not the only means of providing adequate security. A shared approach can also offer strong data safety, but it requires more sophisticated design patterns.
Tenant Considerations
The number, nature, and requirements of your prospective tenants will all influence your data architecture decision. To estimate future use, consider how many tenants you plan to target, how much storage space each tenant’s data will occupy, and how many concurrent end-users each tenant is likely to support. A more shared approach may be appropriate if you expect a large number of tenants. However, if you anticipate that some or all tenants will store large amounts of data, the separate-database approach may be more suitable.
A more isolated approach may also be preferable if the average tenant is expected to support many concurrent end-users, or if you plan to offer value-added services such as per-tenant backup and restore. Ultimately, your decision will also be shaped by the business and economic considerations described above: if a larger development effort is not feasible, or you need to bring your application to market quickly, a more isolated approach may be the best option.
Data Encryption Consideration
One way to enhance the security of tenant data is to encrypt it within the database, so that it remains protected even if it falls into the wrong hands. Cryptographic methods fall into two types: symmetric and asymmetric. In symmetric cryptography, the same key is used to encrypt and decrypt data, while in asymmetric cryptography, two keys, one public and one private, are used.
Public keys are shared with all parties interested in communicating with the key holder, while private keys are kept secure. For instance, to send an encrypted message to B, A would obtain B’s public key through an agreed-upon means and use it to encrypt the message. The encrypted message can only be decrypted by someone possessing B’s private key, which is kept secure.
Symmetric encryption has a key-distribution problem: to send a message to B, A would have to deliver the symmetric key separately, which poses the risk of interception by a third party. Public-key cryptography avoids that problem but requires far more computing power, which makes it impractical for a SaaS application in which every piece of stored data is encrypted. A better approach is a key wrapping system that combines the benefits of both: the data itself is encrypted with a fast symmetric key, and that symmetric key is in turn encrypted (wrapped) with the public key of an asymmetric pair, so only the holder of the private key can recover it.
Scaling Consideration
When building enterprise software that serves thousands of people simultaneously, scalability is a significant challenge. For a SaaS application, scalability is even more critical because the application must support the data of all its customers. When developing a scaling strategy, it is crucial to distinguish between scaling the application and scaling the data. Databases can be scaled up (by moving to a more powerful server) or scaled out (by partitioning onto multiple servers). Scaling a shared database and scaling dedicated databases require different strategies.
Replication and partitioning are the primary methods used to scale out a database. Replication involves copying all or part of a database to another location and keeping the copy or copies synchronized with the original. Single-master replication, in which only the original database can be written to, is easier to manage than multi-master replication, in which some or all of the copies can also be written to and a synchronization mechanism is needed to reconcile changes between the different copies of the data.
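The single-master case can be illustrated with a toy in-memory model. This sketch is only meant to show the direction of data flow (writes go to one primary, copies are refreshed from it); it uses plain dictionaries rather than a real database, and the SingleMasterStore class and its methods are hypothetical names.

```python
class SingleMasterStore:
    """Toy model of single-master replication using in-memory dicts."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Changes accepted by the primary but not yet copied to replicas.
        self._pending = []

    def write(self, key, value):
        """All writes go to the primary only."""
        self.primary[key] = value
        self._pending.append((key, value))

    def synchronize(self):
        """Apply pending changes, in order, to every replica."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def read(self, key, replica_index=0):
        """Reads are served by a replica and may lag behind the primary."""
        return self.replicas[replica_index].get(key)


store = SingleMasterStore(replica_count=2)
store.write("plan", "premium")
stale = store.read("plan")   # replica has not been synchronized yet
store.synchronize()
fresh = store.read("plan")   # replica now matches the primary
```

Because only the primary accepts writes, synchronization is a one-way copy and there are no conflicting versions to reconcile, which is the reason this arrangement is easier to manage than multi-master replication.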
A database can be partitioned by relocating entire tables or by splitting one or more tables into smaller ones, either horizontally or vertically. Horizontal partitioning divides the database into two or more smaller databases with the same schema and structure but fewer rows in each table. Vertical partitioning divides one or more tables into smaller tables with the same number of rows, each containing a subset of the columns from the original table.
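The difference between the two kinds of split can be shown on a small in-memory table. This is a minimal sketch that represents a table as a list of dictionaries; the function names and the column names used in the usage example are hypothetical.

```python
def partition_horizontally(rows, key):
    """Split rows into groups that share the same value of `key`.

    Every group keeps all columns but holds only a subset of the rows.
    """
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def partition_vertically(rows, primary_key, column_groups):
    """Split a table into narrower tables, one per column group.

    Every resulting table keeps all rows (and the primary key, so the
    pieces can be joined again) but only a subset of the columns.
    """
    tables = []
    for columns in column_groups:
        table = [
            {primary_key: row[primary_key], **{col: row[col] for col in columns}}
            for row in rows
        ]
        tables.append(table)
    return tables


customers = [
    {"id": 1, "tenant_id": "acme", "name": "Ann", "notes": "long text"},
    {"id": 2, "tenant_id": "globex", "name": "Bob", "notes": "long text"},
    {"id": 3, "tenant_id": "acme", "name": "Cy", "notes": "long text"},
]

by_tenant = partition_horizontally(customers, "tenant_id")
core, bulky = partition_vertically(customers, "id", [["tenant_id", "name"], ["notes"]])
```

Here by_tenant holds one list of full rows per tenant, while core and bulky each hold all three rows with different columns.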
When scaling databases, replication and partitioning are often used in combination with one another.
Tenant‐Based Horizontal Partitioning Consideration
Scaling a shared database is necessary when it no longer meets required performance KPIs due to high concurrency, large database size, or operational maintenance issues. Horizontal partitioning based on tenant ID is the simplest way to scale out a shared database. SaaS shared databases are particularly suited to horizontal partitioning since each tenant has its own data set, making it easy to target and move individual tenant data. However, it’s essential to plan carefully and not assume that all tenants have similar demands. To avoid creating overtaxed or underused partitions, it’s crucial to partition the database based on metrics that accurately reflect tenants’ needs, such as database size or the total number of active end-user accounts. Repartitioning may be necessary periodically as tenants evolve and change how they work. The chosen partitioning strategy should be feasible to execute without affecting production systems, and it’s essential to build support for monitoring the application to survey and report on partitioning decisions accurately.
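One possible way to avoid overtaxed or underused partitions is to place tenants by a measured load metric rather than by tenant count. The sketch below uses a simple greedy heuristic (largest tenant first, always onto the least-loaded partition); this heuristic is an assumption chosen for illustration, not a method prescribed here, and the assign_tenants function, the tenant names, and the sizes are hypothetical.

```python
import heapq


def assign_tenants(tenant_sizes, partition_count):
    """Assign each tenant to a partition, balancing total size.

    tenant_sizes maps tenant ID -> a load metric such as database size
    or number of active end-user accounts. Returns tenant ID -> partition
    index.
    """
    # Min-heap of (current load, partition index).
    heap = [(0, index) for index in range(partition_count)]
    heapq.heapify(heap)

    placement = {}
    # Place the largest tenants first so small ones can fill the gaps.
    ordered = sorted(tenant_sizes.items(), key=lambda item: item[1], reverse=True)
    for tenant, size in ordered:
        load, index = heapq.heappop(heap)  # least-loaded partition
        placement[tenant] = index
        heapq.heappush(heap, (load + size, index))
    return placement


# Illustrative sizes only: one large tenant and three smaller ones.
sizes = {"acme": 90, "globex": 40, "initech": 30, "umbrella": 20}
placement = assign_tenants(sizes, partition_count=2)
```

With these example sizes, the large tenant ends up alone on one partition and the three smaller tenants share the other, giving both partitions the same total load. Rerunning the assignment with fresh metrics gives a starting point for the periodic repartitioning described above.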
Single Tenant Scaleout Consideration
If some tenants store and use very large amounts of data, you may end up dedicating an entire server to a single database that serves one tenant. This situation poses scalability challenges similar to those faced by architects of traditional single-tenant applications. With a large database on a dedicated server, the easiest way to accommodate growth is to scale up. However, if the database continues to grow, moving it to an ever more powerful server may eventually become cost-prohibitive, and you will have to scale out by partitioning the database across one or more additional servers.
Scaling out a dedicated database is different from scaling out a shared one. With a shared database, the most effective method is to move complete sets of tenant data from one database to another, so the nature of the data model matters relatively little. When scaling out a database dedicated to a single tenant, however, you need to analyze the kinds of data being stored to determine the best way to partition it.