This document provides a reference architecture for a multi-tier application that runs on Compute Engine VMs and Spanner in a global topology in Google Cloud. The document also provides guidance to help you build an architecture that uses other Google Cloud infrastructure services. It describes the design factors that you should consider when you build a global architecture for your cloud applications. The intended audience for this document is cloud architects.
This architecture is aligned with the global deployment archetype. We recommend this archetype for applications that serve users across the world and need high availability and robustness against outages in multiple regions. This architecture supports elastic scaling at the network, application, and database levels. It lets you align costs with usage without having to compromise on performance, availability, or scalability.
Architecture
The following diagram shows an architecture for an application that runs on infrastructure that's distributed globally across multiple Google Cloud regions.
In this architecture, a global load balancer distributes incoming requests to web servers in appropriate regions based on their availability, capacity, and proximity to the source of the traffic. A cross-regional internal load balancing layer handles distribution of traffic from the web servers to the appropriate application servers based on their availability and capacity. The application servers write data to, and read from, a synchronously replicated database that's available in all the regions.
The architecture includes the following Google Cloud resources:
Component | Purpose |
---|---|
Global external load balancer |
The global external load balancer receives and distributes user requests to the application. The global external load balancer advertises a single anycast IP address, but the load balancer is implemented as a large number of proxies on Google Front Ends (GFEs). Client requests are directed to the GFE that's closest to the client. Depending on your requirements, you can use a global external Application Load Balancer or a global external proxy Network Load Balancer. For more information, see Choose a load balancer. To protect your application against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies. |
Regional managed instance groups (MIGs) for the web tier |
The web tier of the application is deployed on Compute Engine VMs that are part of regional MIGs. These MIGs are the backends for the global load balancer. Each MIG contains Compute Engine VMs in three different zones. Each of these VMs hosts an independent instance of the web tier of the application. |
Cross-region internal load balancing layer |
Internal load balancers with cross-regional backends handle the distribution of traffic from the web tier VMs in any region to the application tier VMs across all the regions. Depending on your requirements, you can use a cross-region internal Application Load Balancer or a cross-region internal proxy Network Load Balancer. For more information, see Choose a load balancer. |
Regional MIGs for the application tier |
The application tier is deployed on Compute Engine VMs that are part of regional MIGs. These MIGs are the backends for the internal load balancing layer. Each MIG contains Compute Engine VMs in three different zones. Each VM hosts an independent instance of the application tier. |
Spanner multi-region instance |
The application writes data to and reads from a multi-region Spanner instance. The multi-region configuration in this architecture includes four read-write replicas in separate zones across two regions, and a witness replica in a third region. |
Virtual Private Cloud (VPC) network and subnets |
All the resources in the architecture use a single VPC network. The VPC network has the following subnets:
Instead of using a single VPC network, you can create a separate VPC network in each region and connect the networks by using Network Connectivity Center. |
Products used
This reference architecture uses the following Google Cloud products:
- Compute Engine: A secure and customizable compute service that lets you create and run VMs on Google's infrastructure.
- Cloud Load Balancing: A portfolio of high-performance, scalable, global, and regional load balancers.
- Spanner: A highly scalable, globally consistent, relational database service.
Design considerations
This section provides guidance to help you use this reference architecture to develop an architecture that meets your specific requirements for system design; security, privacy, and compliance; reliability; cost optimization; operational efficiency; and performance.
System design
This section provides guidance to help you to choose Google Cloud regions for your global deployment and to select appropriate Google Cloud services.
Region selection
When you choose the Google Cloud regions where your applications must be deployed, consider the following factors and requirements:
- Availability of Google Cloud services in each region. For more information, see Products available by location.
- Availability of Compute Engine machine types in each region. For more information, see Regions and zones.
- End-user latency requirements.
- Cost of Google Cloud resources.
- Cross-regional data transfer costs.
- Regulatory requirements.
Some of these factors and requirements might involve trade-offs. For example, the region with the lowest resource cost might not be the region that provides the lowest latency to your users. For more information, see Best practices for Compute Engine regions selection.
Compute infrastructure
The reference architecture in this document uses Compute Engine VMs for certain tiers of the application. Depending on the requirements of your application, you can choose from other Google Cloud compute services:
- Containers: You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
- Serverless: If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run.
The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. For more information about choosing appropriate compute services for your workloads in Google Cloud, see Hosting Applications on Google Cloud.
Storage services
The architecture shown in this document uses regional Persistent Disk volumes for the VMs. Regional Persistent Disk volumes provide synchronous replication of data across two zones within a region. Data in Persistent Disk volumes is not replicated across regions.
Other storage options for multi-regional deployments include Cloud Storage dual-region or multi-region buckets. Objects that are stored in a dual-region or multi-region bucket are stored redundantly in at least two separate geographic locations. Metadata is written synchronously across regions, and data is replicated asynchronously. For dual-region buckets, you can use turbo replication, which ensures faster replication across regions. For more information, see Data availability and durability.
To store files that are shared across multiple VMs in a region, such as across all the VMs in the web tier or application tier, you can use a Filestore Enterprise instance. The files that you store in a Filestore Enterprise instance are replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs.
When you design storage for your multi-regional workloads, consider the functional characteristics of the workloads, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.
Database services
The reference architecture in this document uses Spanner, a fully managed, horizontally scalable, globally distributed, and synchronously replicated database. We recommend a multi-region Spanner configuration for mission-critical deployments that require strong cross-region consistency. Spanner supports synchronous cross-region replication without downtime for failover, maintenance, or resizing.
For information about other managed database services that you can choose from based on your requirements, see Google Cloud databases. When you choose and configure the database for a multi-regional deployment, consider your application's requirements for cross-region data consistency, and be aware of the performance and cost trade-offs.
External load balancing options
An architecture that uses a global external load balancer, such as the architecture in this document, supports certain features that help you to enhance the reliability of your deployments. For example, if you use the global external Application Load Balancer, you can implement edge caching by using Cloud CDN.
If your application requires Transport Layer Security (TLS) to be terminated in a specific region, or if you need the ability to serve content from specific regions, you can use regional load balancers with Cloud DNS to route traffic to different regions. For information about the differences between regional and global load balancers, see the following documentation:
- Global versus regional load balancing in "Choose a load balancer"
- Modes of operation in "External Application Load Balancer overview"
Security, privacy, and compliance
This section describes factors that you should consider when you use this reference architecture to design and build a global topology in Google Cloud that meets the security, privacy, and compliance requirements of your workloads.
Protection against external threats
To protect your application against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies. Each policy is a set of rules that specifies the conditions to evaluate and the actions to take when those conditions are met. For example, a rule could specify that if the source IP address of incoming traffic matches a specific IP address or CIDR range, then the traffic must be denied. You can also apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
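As a simplified illustration (not the Cloud Armor API), the following Python sketch models how priority-ordered rules with source-IP match conditions resolve to an allow or deny action. The rule set, IP ranges, and function names are hypothetical:

```python
import ipaddress

# Hypothetical rules: evaluated in priority order (lowest number first);
# the first rule whose source-IP condition matches determines the action.
RULES = [
    {"priority": 1000, "src_ranges": ["198.51.100.0/24"], "action": "deny"},
    {"priority": 2000, "src_ranges": ["203.0.113.0/24"], "action": "allow"},
]
DEFAULT_ACTION = "allow"  # models the policy's lowest-priority default rule

def evaluate(source_ip: str) -> str:
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if any(ip in ipaddress.ip_network(cidr) for cidr in rule["src_ranges"]):
            return rule["action"]
    return DEFAULT_ACTION

print(evaluate("198.51.100.7"))  # deny
print(evaluate("192.0.2.10"))    # allow (falls through to the default rule)
```

A real security policy supports much richer match expressions, rate limiting, and the preconfigured WAF rules mentioned above; this sketch shows only the priority-ordered, first-match evaluation model.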
External access for VMs
In the reference architecture that this document describes, the Compute Engine VMs don't need inbound access from the internet. Don't assign external IP addresses to the VMs. Google Cloud resources that have only a private, internal IP address can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.
To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Secure Web Proxy or Cloud NAT.
Service account privileges
For the Compute Engine VMs in the architecture, instead of using the default service account, we recommend that you create dedicated service accounts and specify the resources that each service account can access. The default service account includes a broad range of permissions that this architecture doesn't need, whereas you can tailor dedicated service accounts to have only the permissions that the VMs require. For more information, see Limit service account privileges.
SSH security
To enhance the security of SSH connections to the Compute Engine VMs in this architecture, implement Identity-Aware Proxy (IAP) TCP forwarding with OS Login. IAP lets you control network access based on user identity and Identity and Access Management (IAM) policies. OS Login lets you control Linux SSH access based on user identity and IAM policies. For more information about managing network access, see Best practices for controlling SSH login access.
More security considerations
When you build the architecture for your workload, consider the platform-level security best practices and recommendations that are provided in the Enterprise foundations blueprint and Google Cloud Well-Architected Framework: Security, privacy, and compliance.
Reliability
This section describes design factors that you should consider when you use this reference architecture to build and operate reliable infrastructure for a global deployment in Google Cloud.
MIG autoscaling
When you run your application on multiple regional MIGs, the application remains available during isolated zone outages or region outages. The autoscaling capability of stateless MIGs lets you maintain application availability and performance at predictable levels.
To control the autoscaling behavior of your stateless MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling for stateless MIGs. Stateful MIGs can't be autoscaled. For more information, see Autoscaling groups of instances.
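As a rough sketch of the sizing logic, the target-utilization rule can be modeled as scaling the group so that average utilization moves toward the target, clamped to the MIG's size limits. The function name and limits below are illustrative, and the real autoscaler also applies stabilization periods and scale-in controls:

```python
import math

def recommended_size(current_vms: int, observed_cpu: float, target_cpu: float,
                     min_vms: int = 1, max_vms: int = 10) -> int:
    """Approximate target-utilization sizing: if observed utilization is
    above the target, grow the group proportionally; if below, shrink it."""
    desired = math.ceil(current_vms * observed_cpu / target_cpu)
    return max(min_vms, min(max_vms, desired))

# 4 VMs at 90% average CPU, targeting 60%: scale out to 6 VMs.
print(recommended_size(4, observed_cpu=0.90, target_cpu=0.60))  # 6
# 4 VMs at 30% average CPU, targeting 60%: scale in to 2 VMs.
print(recommended_size(4, observed_cpu=0.30, target_cpu=0.60))  # 2
```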
MIG size limit
When you decide the size of your MIGs, consider the default and maximum limits on the number of VMs that can be created in a MIG. For more information, see Add and remove VMs from a MIG.
VM autohealing
Sometimes the VMs that host your application might be running and available, but there might be issues with the application itself. The application might freeze, crash, or not have sufficient memory. To verify whether an application is responding as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see About repairing VMs for high availability.
VM placement
In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against zone outages.
To improve the robustness of the architecture, you can create a spread placement policy and apply it to the MIG template. When the MIG creates VMs, it places the VMs within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. For more information, see Create and apply spread placement policies to VMs.
VM capacity planning
To make sure that capacity for Compute Engine VMs is available when VMs need to be provisioned, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or shared across multiple projects. For more information about reservations, see Choose a reservation type.
Stateful storage
A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your persistent disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.
Data durability
You can use Backup and DR to create, store, and manage backups of the Compute Engine VMs. Backup and DR stores backup data in its original, application-readable format. When required, you can restore your workloads to production directly from long-term backup storage, without needing to prepare or move data.
Compute Engine provides the following options to help you to ensure the durability of data that's stored in Persistent Disk volumes:
- You can use snapshots to capture the point-in-time state of Persistent Disk volumes. The snapshots are stored redundantly in multiple regions, with automatic checksums to ensure the integrity of your data. Snapshots are incremental by default, so they use less storage space and you save money. Snapshots are stored in a Cloud Storage location that you can configure. For more recommendations about using and managing snapshots, see Best practices for Compute Engine disk snapshots.
- To ensure that data in Persistent Disk remains available if a zone outage occurs, you can use Regional Persistent Disk or Hyperdisk Balanced High Availability. Data in these disk types is replicated synchronously between two zones in the same region. For more information, see About synchronous disk replication.
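The storage savings from incremental snapshots, described in the first option above, can be sketched with some illustrative arithmetic. The disk size and daily change rate below are assumptions, not measured values:

```python
def snapshot_storage_gib(disk_gib: float, changed_gib_per_snapshot: list[float]) -> float:
    """Approximate storage for an incremental snapshot schedule: the first
    snapshot stores the full disk, and each later snapshot stores only the
    blocks that changed since the previous snapshot."""
    return disk_gib + sum(changed_gib_per_snapshot)

# Eight daily snapshots of a 100 GiB disk that changes ~5 GiB per day:
incremental = snapshot_storage_gib(100.0, [5.0] * 7)
full_copies = 100.0 * 8  # what eight full copies would consume instead

print(incremental)  # 135.0
print(full_copies)  # 800.0
```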
Database reliability
Data that's stored in a multi-region Spanner instance is replicated synchronously across multiple regions. The Spanner configuration that's shown in the preceding architecture diagram includes the following replicas:
- Four read-write replicas in separate zones across two regions.
- A witness replica in a third region.
A write operation to a multi-region Spanner instance is acknowledged after at least three replicas—in separate zones across two regions—have committed the operation. If a zone or region failure occurs, Spanner has access to all of the data, including data from the latest write operations, and it continues to serve read and write requests.
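The majority-quorum rule described above can be sketched in a few lines. The replica names are hypothetical, and Spanner's actual commit protocol (Paxos-based, with leader election) is considerably more involved:

```python
def write_commits(acks: set[str], voting_replicas: set[str]) -> bool:
    """A write commits once a majority of voting replicas acknowledge it."""
    quorum = len(voting_replicas) // 2 + 1
    return len(acks & voting_replicas) >= quorum

# Hypothetical names for the configuration described above: four read-write
# replicas across two regions (r1, r2) plus one witness in a third region.
VOTERS = {"r1-a", "r1-b", "r2-a", "r2-b", "witness"}

print(write_commits({"r1-a", "r1-b", "r2-a"}, VOTERS))  # True: 3 of 5 acknowledged
print(write_commits({"r1-a", "r2-a"}, VOTERS))          # False: only 2 of 5
```

Because the quorum is 3 of 5, the system can lose any single zone or even a full region and still assemble a majority, which is why reads and writes continue to be served during such outages.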
Spanner uses disaggregated storage, in which compute and storage resources are decoupled. You don't have to move data when you add compute capacity for high availability (HA) or scaling. The new compute resources fetch the data they need from the closest Colossus node, Google's distributed file system. This decoupling makes failover and scaling faster and less risky.
Spanner provides external consistency, which is a stricter property than serializability for transaction-processing systems. For more information, see the following:
- Spanner: TrueTime and external consistency
- Demystifying Spanner multi-region configurations
- Inside Spanner and the CAP Theorem
More reliability considerations
When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in the following documentation:
- Google Cloud infrastructure reliability guide
- Patterns for scalable and resilient apps
- Designing resilient systems
- Google Cloud Well-Architected Framework: Reliability
Cost optimization
This section provides guidance to optimize the cost of setting up and operating a global Google Cloud topology that you build by using this reference architecture.
VM machine types
To help you optimize the resource utilization of your VM instances, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads with predictable resource requirements, you can customize the machine type to your needs and save money by using custom machine types.
VM provisioning model
If your application is fault tolerant, then Spot VMs can help to reduce your Compute Engine costs for the VMs in the application and web tiers. The cost of Spot VMs is significantly lower than the cost of regular VMs. However, Compute Engine might preempt (stop or delete) Spot VMs at any time to reclaim capacity.
Spot VMs are suitable for batch jobs that can tolerate preemption and don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs might not be able to scale out (that is, create VMs) automatically to the specified target size until the required capacity becomes available again.
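To get a feel for the potential savings, the following sketch compares on-demand and Spot costs for a tier of VMs. The hourly rate and the discount are assumed values for illustration only, not actual Compute Engine prices:

```python
def monthly_cost(vm_count: int, hours: float, on_demand_rate: float,
                 spot_discount: float) -> tuple[float, float]:
    """Return (on-demand cost, Spot cost) for a group of identical VMs,
    assuming a flat percentage discount for Spot capacity."""
    on_demand = vm_count * hours * on_demand_rate
    spot = on_demand * (1 - spot_discount)
    return on_demand, spot

# Assumed, illustrative numbers: 10 VMs, a 730-hour month, $0.10/hour
# on-demand, and a 70% Spot discount.
on_demand, spot = monthly_cost(vm_count=10, hours=730,
                               on_demand_rate=0.10, spot_discount=0.7)
print(round(on_demand, 2), round(spot, 2))  # 730.0 219.0
```

Weigh these savings against the preemption risk described above: if the MIG can't replace preempted Spot VMs because zonal capacity is limited, the group may run below its target size until capacity returns.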
VM resource utilization
The autoscaling capability of stateless MIGs enables your application to handle increases in traffic gracefully, and it helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.
Database cost
Spanner helps ensure that your database costs are predictable. The compute capacity that you specify (as a number of nodes or processing units) determines the maximum storage capacity of the instance, and read and write throughput scales linearly with compute capacity. You pay only for the capacity that you provision. When you need to align costs with the needs of your workload, you can adjust the size of your Spanner instance.
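The linear relationship between compute capacity, throughput, and cost can be sketched as follows. The throughput and pricing constants are assumed values for illustration, not published Spanner limits or prices; the only real figure is that one node equals 1,000 processing units:

```python
NODE_PROCESSING_UNITS = 1000     # 1 Spanner node = 1,000 processing units
READS_PER_SEC_PER_NODE = 10_000  # assumed throughput per node (illustrative)
HOURLY_RATE_PER_NODE = 1.0       # assumed hourly price (illustrative)

def capacity(nodes: int) -> dict:
    """Everything scales linearly with the provisioned node count."""
    return {
        "processing_units": nodes * NODE_PROCESSING_UNITS,
        "reads_per_sec": nodes * READS_PER_SEC_PER_NODE,
        "hourly_cost": nodes * HOURLY_RATE_PER_NODE,
    }

# Doubling the instance size doubles throughput -- and cost -- linearly,
# which is what makes the spend predictable as you resize.
print(capacity(3))
print(capacity(6))
```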
Third-party licensing
When you migrate third-party workloads to Google Cloud, you might be able to reduce cost by bringing your own licenses (BYOL). For example, to deploy Microsoft Windows Server VMs, instead of using a premium image that incurs additional cost for the third-party license, you can create and use a custom Windows BYOL image. You then pay only for the VM infrastructure that you use on Google Cloud. This strategy helps you continue to realize value from your existing investments in third-party licenses. If you decide to use the BYOL approach, then the following recommendations might help to reduce cost:
- Provision the required number of compute CPU cores independently of memory by using custom machine types. By doing this, you limit the third-party licensing cost to the number of CPU cores that you need.
- Reduce the number of vCPUs per core from 2 to 1 by disabling simultaneous multithreading (SMT).
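For third-party software that's licensed per vCPU, the effect of the second recommendation can be sketched with simple arithmetic. The per-vCPU rate is an assumed value for illustration only:

```python
def vcpus(cores: int, threads_per_core: int) -> int:
    """Each vCPU is one hardware thread, so the vCPU count is
    cores multiplied by threads per core."""
    return cores * threads_per_core

def per_vcpu_license_cost(cores: int, threads_per_core: int, rate: float) -> float:
    """For per-vCPU licensing, halving threads per core halves the
    licensed vCPU count for the same physical cores."""
    return vcpus(cores, threads_per_core) * rate

# Assumed rate of $50 per licensed vCPU, on an 8-core VM:
print(per_vcpu_license_cost(8, threads_per_core=2, rate=50.0))  # 800.0
print(per_vcpu_license_cost(8, threads_per_core=1, rate=50.0))  # 400.0
```

Note the trade-off: disabling SMT halves the per-vCPU licensing cost for the same core count, but it also removes the second hardware thread per core, which can reduce throughput for workloads that benefit from SMT.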
If you deploy a third-party database like Microsoft SQL Server on Compute Engine VMs, then you must consider the license costs for the third-party software. When you use a managed database service like Cloud SQL, the database license costs are included in the charges for the service.
More cost considerations
When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Cost optimization.
Operational efficiency
This section describes the factors that you should consider when you use this reference architecture to design and build a global Google Cloud topology that you can operate efficiently.
VM configuration updates
To update the configuration of the VMs in a MIG (such as the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using the update method that you choose: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.
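As a simplified model of the automatic (rolling) update method: with a maximum-unavailable setting of N, the MIG replaces VMs in batches of at most N, so availability during the update trades off against update duration. The real updater also supports surging extra VMs and waits for health checks between batches; this sketch only counts the batches:

```python
import math

def rolling_update_batches(group_size: int, max_unavailable: int) -> int:
    """Batches needed to replace every VM when at most `max_unavailable`
    VMs may be taken offline and recreated at a time."""
    return math.ceil(group_size / max_unavailable)

# A 9-VM MIG updated 3 VMs at a time finishes in 3 batches, but two thirds
# of the group stays in service throughout.
print(rolling_update_batches(9, max_unavailable=3))  # 3
# The most conservative setting (1 at a time) takes 9 batches.
print(rolling_update_batches(9, max_unavailable=1))  # 9
```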
VM images
For your VMs, instead of using Google-provided public images, we recommend that you create and use custom OS images that contain the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version. You must regularly update your custom images to include the security updates and patches that are provided by the OS vendor.
Deterministic instance templates
If the instance templates that you use for your MIGs include startup scripts to install third-party software, make sure that the scripts explicitly specify software-installation parameters such as the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server (the `apache2` package), then make sure that the script specifies the exact `apache2` version that should be installed, such as version `2.4.53`. For more information, see Deterministic instance templates.
Migration to Spanner
You can migrate your data to Spanner from other databases like MySQL, SQL Server, and Oracle Database. The migration process depends on factors like the source database, the size of your data, downtime constraints, and complexity of the application code. To help you plan and implement the migration to Spanner efficiently, we provide a range of Google Cloud and third-party tools. For more information, see Migration overview.
Database administration
With Spanner, you don't need to configure or monitor replication or failover. Synchronous replication and automatic failover are built-in. Your application experiences zero downtime for database maintenance and failover. To further reduce operational complexity, you can configure autoscaling. With autoscaling enabled, you don't need to monitor and scale the instance size manually.
More operational considerations
When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Well-Architected Framework: Operational excellence.
Performance optimization
This section describes the factors that you should consider when you use this reference architecture to design and build a global topology in Google Cloud that meets the performance requirements of your workloads.
Network performance
For workloads that need low inter-VM network latency within the application and web tiers, you can create a compact placement policy and apply it to the MIG template that's used for those tiers. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. While a compact placement policy helps improve inter-VM network performance, a spread placement policy can help improve VM availability as described earlier. To achieve an optimal balance between network performance and availability, when you create a compact placement policy, you can specify how far apart the VMs must be placed. For more information, see Placement policies overview.
Compute Engine has a per-VM limit for egress network bandwidth. This limit depends on the VM's machine type and whether traffic is routed through the same VPC network as the source VM. For VMs with certain machine types, to improve network performance, you can get a higher maximum egress bandwidth by enabling Tier_1 networking.
Compute performance
Compute Engine offers a wide range of predefined and customizable machine types for the workloads that you run on VMs. Choose an appropriate machine type based on your performance requirements. For more information, see Machine families resource and comparison guide.
VM multithreading
Each virtual CPU (vCPU) that you allocate to a Compute Engine VM is implemented as a single hardware multithread. By default, two vCPUs share a physical CPU core. For applications that involve highly parallel operations or that perform floating-point calculations (such as genetic sequence analysis and financial risk modeling), you can improve performance by reducing the number of threads that run on each physical CPU core. For more information, see Set the number of threads per core.
VM multithreading might have licensing implications for some third-party software, like databases. For more information, read the licensing documentation for the third-party software.
Network Service Tiers
Network Service Tiers lets you optimize the network cost and performance of your workloads. You can choose Premium Tier or Standard Tier. Premium Tier delivers traffic on Google's global backbone to achieve minimal packet loss and low latency. Standard Tier delivers traffic through peering, internet service providers (ISPs), or transit networks at the edge point of presence (PoP) that's closest to the region where your Google Cloud workload runs. To optimize performance, we recommend using Premium Tier. To optimize cost, we recommend using Standard Tier.
The architecture in this document uses a global external load balancer with an external IP address and backends in multiple regions. This architecture requires you to use Premium Tier, which uses Google's highly reliable global backbone to help you achieve minimal packet loss and low latency.
If you use regional external load balancers and route traffic to regions by using Cloud DNS, then you can choose Premium Tier or Standard Tier depending on your requirements. The pricing for Standard Tier is lower than Premium Tier. Standard Tier is suitable for traffic that isn't sensitive to packet loss and that doesn't have low latency requirements.
Spanner performance
When you provision a Spanner instance, you specify the compute capacity of the instance in terms of the number of nodes or processing units. Monitor the resource utilization of your Spanner instance, and scale the capacity based on the expected load and your application's performance requirements. You can scale the capacity of a Spanner instance manually or automatically. For more information, see Autoscaling overview.
With a multi-region configuration, Spanner replicates data synchronously across multiple regions. This replication enables low-latency read operations from multiple locations. The trade-off is higher latency for write operations, because the quorum replicas are spread across multiple regions. To minimize the latency for read-write transactions in a multi-region configuration, Spanner uses leader-aware routing (enabled by default).
For recommendations to optimize the performance of your Spanner instance and databases, see the following documentation:
- Performance best practices for multi-region configurations
- Schema design best practices
- Bulk loading best practices
- Data Manipulation Language best practices
- SQL best practices
Caching
If your application serves static website assets and if your architecture includes a global external Application Load Balancer, then you can use Cloud CDN to cache regularly accessed static content closer to your users. Cloud CDN can help to improve performance for your users, reduce your infrastructure resource usage in the backend, and reduce your network delivery costs. For more information, see Faster web performance and improved web protection for load balancing.
More performance considerations
When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Performance optimization.
What's next
- Learn more about the Google Cloud products used in this reference architecture: Compute Engine, Cloud Load Balancing, and Spanner.
- Learn about replication and consistency in Spanner.
- Get started with migrating your workloads to Google Cloud.
- Explore and evaluate deployment archetypes that you can choose to build architectures for your cloud workloads.
- Review architecture options for designing reliable infrastructure for your workloads in Google Cloud.
- Deploy programmable GFEs using Google Cloud Armor, load balancing, and Cloud CDN.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Authors:
- Kumar Dhanagopal | Cross-Product Solution Developer
- Samantha He | Technical Writer
Other contributors:
- Ben Good | Solutions Architect
- Daniel Lees | Cloud Security Architect
- Gleb Otochkin | Cloud Advocate, Databases
- Justin Makeig | Product Manager
- Mark Schlagenhauf | Technical Writer, Networking
- Sekou Page | Outbound Product Manager
- Steve McGhee | Reliability Advocate
- Victor Moreno | Product Manager, Cloud Networking