Organizations are adopting multicloud strategies at a rapid pace, for reasons such as avoiding vendor lock-in or using best-of-breed features of different clouds. Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure remain the top three clouds in the cloud infrastructure and platform services category (referred to as public clouds in this article). At the same time, other public clouds such as Alibaba Cloud, IBM Cloud, and Oracle Cloud Infrastructure (OCI) are catching up fast.
While organizations plan to have two or more public clouds, many of them also have an on-premises presence. With multiple cloud platforms, including hybrid cloud, there is an increased need for consistent management and governance across all of them. To take advantage of this opportunity, many vendors, including cloud providers, offer multicloud management solutions. At the same time, management and governance implementation is very specific to an organization’s structure, internal systems, and controls. You need to carefully identify which solutions can meet your needs.
This article helps you in that identification. It explores different feature areas of multicloud management solutions and provides considerations and optimal ways to implement those features in different contexts.
Types of multicloud management solutions
According to Gartner (see reference in the next section), more than 90 vendors offer multicloud management solutions (MCMS). Many of these solutions work not only across public clouds but also with hybrid clouds. This article groups multicloud management solutions into the following three types.
Cloud Management Platforms (comprehensive solutions) from cloud agnostic vendors
These are comprehensive multicloud management solutions from vendors that do not provide their own cloud. They are generally referred to as Cloud Management Platforms (CMPs) because they support a wide range of multicloud management features. Examples include Flexera CMP, Scalr CMP, and Morpheus CMP. Key characteristics of these solutions are:
- They support most major cloud providers (at a minimum the top providers: AWS, Azure, and GCP)
- They support most features expected from a Multi Cloud Management Platform (MCMP)
Solutions from cloud providers
These are solutions from the cloud providers themselves, for example Azure Arc, Azure Cost Management, Google Anthos, and IBM Multi Cloud Management. Key characteristics of these solutions are:
- They natively support specific feature areas of their own cloud and also support some features for some other clouds. For example, Azure Arc supports management of Kubernetes clusters and servers in Azure, on-premises, or in other clouds.
- They do not support all multicloud management features. They cover only a few feature areas, such as cost management, monitoring, or Kubernetes cluster management.
- Their support for other clouds is also limited. For example, Google Cloud Monitoring (formerly known as Stackdriver) supports only AWS in addition to GCP. Similarly, Azure Cost Management supports only AWS in addition to Azure, though it has a roadmap to support GCP and other clouds.
Specialized feature specific solutions
These solutions cater only to a specific feature area and work across clouds. Key characteristics of these solutions are:
- They are mainly available in these feature areas: infrastructure automation (e.g. Ansible, Terraform), cost management (e.g. Apptio Cloudability), monitoring (e.g. Datadog, Elasticsearch observability, Grafana), and backup (e.g. Veeam, Commvault)
- They are generally best-of-breed solutions in their feature area
- Organizations can build their own custom solutions (instead of buying a vendor solution) for some feature areas, depending on the complexity of their business needs.
Feature Areas
As multicloud gains momentum, vendors are adding more functionality for multicloud management. This functionality can be grouped into higher-level feature areas. This article uses the eight feature areas that Gartner defines as the required functional areas of a CMP in its 2020 magic quadrant for CMPs. If you don’t have a Gartner subscription, you can download a free copy offered by Flexera. These functional areas are:
- Provisioning and orchestration
- Service request
- Inventory and classification
- Monitoring and analytics
- Cost management and workload optimization
- Cloud migration, backup and disaster recovery
- Security, compliance and identity management
- Packaging and delivery
Considerations for feature areas in your context
Gartner listed seven CMP vendors in its 2020 magic quadrant: CloudBolt, Flexera, HyperGrid, Morpheus Data, Scalr, Snow Software-Embotics, and VMware. Each vendor has its strengths and limitations. Organizations that adopt a CMP may not use every feature area it provides; they may use only the features that are most relevant to their cloud environment. At the same time, they may also use other types of solutions (solutions from cloud providers or specialized feature-specific solutions). The featured diagram above depicts the landscape of multicloud management solutions with the various feature areas and types of solutions.
Many organizations have been using public cloud for many years and already have very mature cloud management and governance processes. They may already be using some solutions that have multicloud capabilities. As multicloud adoption increases in your organization, you may need additional multicloud management features. At the same time, you need to ensure that any solution can adapt to your management and governance processes. The following sections describe each feature area and provide key considerations for identifying a solution or tool for that area.
Provisioning and orchestration
Some examples of features in this area are end-user provisioning portals, provisioning templates, and provisioning automation and workflows. Key challenges and considerations in this area are:
- 3rd party provisioning portals lag behind in supporting new features and services. Cloud providers release new features and services so quickly that even their own portals (also called consoles) cannot keep up. That is why APIs or scripting options (CLI or PowerShell) are often available first for newly released features and services, which appear in the providers’ portals or consoles a little later. For the same reason, 3rd party provisioning portals mostly lag behind the releases from the 1st party cloud providers. The length of that delay varies from vendor to vendor. It depends on the investment the vendor is making in that area and on the vendor’s relationship with the cloud providers’ engineering teams.
- Your custom provisioning controls may not be available in 3rd party portals. Most organizations enforce some security, monitoring, and governance controls for cloud services provisioned in production environments. Your information security or cloud management teams may mandate that these controls be enforced at the time of provisioning. However, because these controls are specific to your organization, it may not be possible to enforce them through 3rd party provisioning portals, or doing so might require extensive customization.
Tools like Ansible and Terraform are already very popular in the industry, and your organization may already be using one of them to automate infrastructure provisioning. With Ansible playbooks or Terraform configurations, you typically use either Ansible’s native modules or Terraform’s cloud provider modules. If a tool-specific module is not available, you have the option to call the cloud’s native templates, such as Azure Resource Manager (ARM) templates or Google Cloud Deployment Manager templates. These approaches give you a lot of flexibility to add security and governance controls specific to your organization. They also allow you to write a new playbook or configuration file whenever you need to automate a newly released feature or service. You need to evaluate whether the cloud management solution can provide similar flexibility to implement your organization-specific controls.
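To make the idea of organization-specific provisioning controls concrete, here is a minimal Python sketch of a pre-provisioning check that a custom pipeline might run before calling Ansible or Terraform. The tag names, approved regions, and the ProvisioningRequest type are all hypothetical and stand in for whatever policy your own teams mandate.

```python
from dataclasses import dataclass, field

# Hypothetical organization-specific controls; replace with your own policy.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}
ALLOWED_REGIONS = {
    "aws": {"us-east-1", "eu-west-1"},
    "azure": {"eastus", "westeurope"},
    "gcp": {"us-central1", "europe-west1"},
}


@dataclass
class ProvisioningRequest:
    cloud: str
    region: str
    resource_type: str
    tags: dict = field(default_factory=dict)
    encrypted: bool = False


def check_controls(request: ProvisioningRequest) -> list:
    """Return a list of control violations; an empty list means the request may proceed."""
    violations = []

    # Control 1: only approved clouds and regions.
    allowed = ALLOWED_REGIONS.get(request.cloud)
    if allowed is None:
        violations.append(f"cloud '{request.cloud}' is not approved")
    elif request.region not in allowed:
        violations.append(f"region '{request.region}' is not approved for {request.cloud}")

    # Control 2: mandatory tags for ownership and cost tracking.
    missing = sorted(REQUIRED_TAGS - set(request.tags))
    if missing:
        violations.append("missing required tags: " + ", ".join(missing))

    # Control 3: production resources must be encrypted at rest.
    if request.tags.get("environment") == "production" and not request.encrypted:
        violations.append("production resources must enable encryption at rest")

    return violations


request = ProvisioningRequest(
    cloud="azure",
    region="eastus",
    resource_type="vm",
    tags={"owner": "team-a", "environment": "production"},
)
for violation in check_controls(request):
    print(violation)
```

When you evaluate a CMP, one useful question is whether checks of this kind can be plugged into its provisioning flow without extensive customization.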
Service request
Some examples of features in this area are service catalogs, catalog spending limits, and request approval workflows. Considerations in this area are:
- Most organizations need to customize the service catalog to their business-specific rules and workflows. It might also need to integrate with their existing service request portals. You need to carefully assess whether the tool you plan to use is customizable or extensible enough to meet your business needs.
- Service request features need close integration with provisioning and orchestration features. If you use custom provisioning tools or scripts, you need to ensure that the CMP, or any other tool you use for service requests, can call those custom scripts.
- To configure approval workflows, the tool may need to refer to your existing organization hierarchy or access control database. In that case, you need to check whether the tool can integrate with the database that holds your organization hierarchy or access control rules.
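As an illustration of why approval workflows depend on your organization hierarchy, the following Python sketch resolves the chain of approvers for a request by walking up a reporting hierarchy until someone's spending limit covers the cost. The hierarchy, the limits, and the function name are hypothetical; in practice the data would come from your HR system or directory service.

```python
# Hypothetical org hierarchy: employee -> manager.
MANAGER_OF = {
    "dev1": "lead1",
    "lead1": "director1",
    "director1": "cio",
}

# Hypothetical approval limits (monthly cost in USD) per approver.
APPROVAL_LIMIT = {"lead1": 500, "director1": 5000, "cio": float("inf")}


def approval_chain(requester: str, monthly_cost: float) -> list:
    """Walk up the hierarchy until an approver's limit covers the request.

    Returns the approvers in the order they must sign off. Raises ValueError
    if the hierarchy ends before anyone with a sufficient limit is found.
    """
    chain = []
    current = requester
    while current in MANAGER_OF:
        current = MANAGER_OF[current]
        chain.append(current)
        if APPROVAL_LIMIT.get(current, 0) >= monthly_cost:
            return chain
    raise ValueError(f"no approver found for {requester} at {monthly_cost}")


print(approval_chain("dev1", 300))
print(approval_chain("dev1", 2000))
```

A tool that cannot read this kind of hierarchy from your existing systems would force you to maintain a second copy of it inside the tool.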
Inventory and classification
Some examples of features in this area are resource discovery and inventory, configuration policies and change monitoring, cloud-platform-native tagging integration, and untagged resource detection and actions. Accurate discovery and inventory of all your cloud resources is an essential part of governance. Most cloud providers offer APIs and tools to retrieve their inventory, and MCMS use those APIs and tools to provide a single centralized dashboard. This centralized view of your resources, organized by ownership or department, is the key benefit of this feature area. Considerations in this area are:
- It is easy to use resource APIs to get the inventory of a single subscription or account of a cloud provider. However, when you have hundreds of subscriptions or accounts in your environment, it is hardly feasible to rely on such APIs to query across all clouds and all subscriptions or accounts. You need a dedicated service that continuously tracks the inventory and stores it in its own index or database, so that it can return results faster. This use case is so important and complex that many cloud providers offer a separate centralized inventory and query service, such as Azure Resource Graph, Google Cloud Asset Inventory, and AWS Config advanced queries. You need to carefully assess the performance and scalability of the CMP or tool that you plan to use for this feature.
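The difference between querying provider APIs on demand and querying a local index can be sketched in a few lines of Python. In this minimal example, each "collector" is a hypothetical stand-in for a call to one account's or subscription's inventory API; the results are stored once in a local SQLite index, and later questions (such as "which resources have no owner tag?") are answered from the index rather than from the clouds. A real service would refresh the index continuously instead of building it once.

```python
import sqlite3
from typing import Callable, Dict, Iterable

# A collector returns the resources of one account or subscription.
# Real collectors would call each provider's inventory API.
Collector = Callable[[], Iterable[Dict]]


def build_index(collectors: Dict[str, Collector]) -> sqlite3.Connection:
    """Pull inventory from every account once and store it in a local index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resources (account TEXT, cloud TEXT, id TEXT, type TEXT, owner TEXT)"
    )
    for account, collect in collectors.items():
        for res in collect():
            owner = res.get("tags", {}).get("owner")  # None when untagged
            db.execute(
                "INSERT INTO resources VALUES (?, ?, ?, ?, ?)",
                (account, res["cloud"], res["id"], res["type"], owner),
            )
    db.commit()
    return db


def untagged(db: sqlite3.Connection) -> list:
    """Query the local index (not the cloud APIs) for resources with no owner tag."""
    rows = db.execute(
        "SELECT account, id FROM resources WHERE owner IS NULL ORDER BY account, id"
    )
    return rows.fetchall()


# Hypothetical sample data standing in for two accounts on two clouds.
collectors = {
    "aws-111": lambda: [
        {"cloud": "aws", "id": "i-1", "type": "vm", "tags": {"owner": "team-a"}},
        {"cloud": "aws", "id": "i-2", "type": "vm"},
    ],
    "az-sub-1": lambda: [{"cloud": "azure", "id": "vm-9", "type": "vm", "tags": {}}],
}
index = build_index(collectors)
print(untagged(index))
```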
Monitoring and analytics
Some examples of features in this area are centralized log collection, monitoring dashboards, and alerts. This is another important and challenging area for multicloud management. All cloud providers have native logging and monitoring tools with advanced capabilities, such as Azure Monitor (which now incorporates Application Insights and Log Analytics), Google Cloud Monitoring (formerly Stackdriver), and AWS CloudWatch. To build centralized logging or monitoring dashboards, you may need to export or stream data from each cloud to your centralized tool. Considerations in this area are:
- Size of log data: Logs can generate a huge amount of data and require substantial storage. You already pay the cloud provider for the logs generated in that cloud. Generally, there are two types of costs: ingestion (the volume of log data generated and ingested) and retention (the duration for which each GB of logs is retained). If you export or stream all those logs to another centralized tool, you will pay for the log data in that tool as well.
- Latency: Latency refers to the time difference between when data is created on the monitored system and when it becomes available for analysis in your tool. Even native cloud monitoring or logging tools have latency. For example, the typical latency to ingest log data for Azure Monitor is between 2 and 5 minutes, as mentioned in the Azure documentation. Getting logs from the cloud-native source to your centralized tool adds further latency. In some cases, near real-time access to logs and monitoring data is paramount. Examples are SIEM tools (for detecting and remediating security breaches) and alerts (for autoscaling or other actions). You need to compare your latency requirements with what the tool you plan to use can deliver.
- Direct query vs local index and cost: As mentioned above, you can incur considerable cost for ingesting and storing log data. If you use a tool that exports all logs and metrics from the cloud source to its own storage to build a local index, you pay two providers: the native cloud provider and your 3rd party tool provider. In addition, you need to set up and manage additional infrastructure to export or stream logs. Examples of this infrastructure are Event Hubs and/or Functions in Azure, CloudWatch log destinations with Kinesis streams in AWS, and log sinks with Pub/Sub in GCP. Having a local centralized copy of logs has its advantages. However, if your requirements are mostly about centralized dashboards, you can use tools that query the cloud source directly rather than storing all logs and metrics locally. This way, you pay for log storage only at its source, and you do not have to set up additional log streaming infrastructure for each cloud. Grafana is one such open source platform, which allows you to query, visualize, and alert on metrics and logs no matter where they are stored. It has built-in support for most cloud providers in addition to many other data sources, including Elasticsearch, Prometheus, and Splunk.
- Feature-specific 3rd party solutions: This is one feature area where you will find many specialized 3rd party tools (commercial as well as open source) in addition to CMPs. Some examples are Datadog, Elasticsearch observability, Grafana, Logz.io, and Sumo Logic. These tools have their own specializations and differentiators, such as security analytics and built-in machine learning. You need to evaluate those specializations against your requirements, along with the other considerations mentioned above, to select the right tool for you.
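The cost trade-off between exporting all logs to a centralized tool and querying them at the source can be estimated with simple arithmetic. The Python sketch below assumes a deliberately simplified pricing model (a flat ingestion price per GB and a flat retention price per GB per month) and uses placeholder prices; real provider pricing has tiers, free allowances, and data transfer charges that are not modeled here.

```python
def monthly_log_cost(gb_per_day: float, ingest_per_gb: float,
                     retain_per_gb_month: float, retention_days: int) -> float:
    """Estimate the steady-state monthly cost of one log store.

    Ingestion is billed on the volume arriving in a 30-day month; retention is
    billed on the volume held at any time (gb_per_day * retention_days).
    """
    ingestion = gb_per_day * 30 * ingest_per_gb
    retention = gb_per_day * retention_days * retain_per_gb_month
    return ingestion + retention


def compare(gb_per_day: float, source: dict, central: dict) -> dict:
    """Compare 'query at source' with 'export everything to a central tool'.

    source and central are dicts with the keys ingest_per_gb,
    retain_per_gb_month and retention_days (all hypothetical inputs).
    """
    at_source = monthly_log_cost(gb_per_day, **source)
    in_central = monthly_log_cost(gb_per_day, **central)
    return {
        "query_at_source": at_source,
        # Exporting does not remove the cost at the source; it adds to it.
        "export_to_central": at_source + in_central,
    }


# Placeholder prices for illustration only; look up your providers' real rates.
source = {"ingest_per_gb": 0.50, "retain_per_gb_month": 0.03, "retention_days": 30}
central = {"ingest_per_gb": 0.50, "retain_per_gb_month": 0.03, "retention_days": 90}
print(compare(100, source, central))
```

The estimate leaves out the cost of the streaming infrastructure itself, so the real gap between the two options is likely to be wider than this sketch suggests.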
Cost management and workload optimization
Some examples of features in this area are cost tracking reports and dashboards, budget definition and alert policies, and custom pricing for internal cross-charging. This was one of the first feature areas in multicloud management solutions. Public clouds make it easy and frictionless to provision resources with almost limitless capacity. As a result, organizations without enough governance find their cloud consumption costs climbing very fast. This feature area is crucial for central cost governance through multiple levels of budget definition and detailed reporting. Considerations in this area are:
- Customizability: Large organizations need to cross-charge costs to the consuming departments. Generally, a central IT team provides access to cloud resources and additional services around those resources, such as governance, information security, support, and automation. IT teams charge the cost of these additional services, on top of the cloud usage cost, to the consuming departments. Different organizations have different rules for calculating the cost of additional services. Some charge a standard markup (a fixed percentage) on the amount of cloud consumption. Others have more complex rules, e.g. different support costs for PaaS and IaaS (as IaaS requires patching and additional software agents for malware protection, backup, etc.). Some organizations may also want to charge a different fixed amount for each service rather than a percentage of consumption. You need to evaluate whether the cost management tool provides enough customization options to support these different kinds of markup.
- Extensibility: In a multicloud environment, you may also use components or resources procured from vendors other than the cloud providers. For example, you might use network appliances (like an F5 BIG-IP load balancer) with Bring Your Own License (BYOL). You might also provide higher-level offerings built on cloud resources (e.g. business-specific lab environments). For all such components and offerings, which are not billed by any of your cloud providers, you would want your centralized cost management tool to manage billing as well. You need to check whether the tool you are evaluating can be extended to components and offerings outside the supported clouds.
- Scalability: If your organization has hundreds of subscriptions or accounts and thousands of resources, the amount of billing (cost) data generated becomes huge. If you have been using cloud for years and want to import all that data into your cost management tool, that may not be easy, so you need to check whether the tool supports it. Based on the amount of data you have, you should also verify that the tool performs as expected (response time of dashboards and queries).
- Access control: Cost and billing data is sensitive. You may want to enable individual departments and teams to view and analyse their own data while ensuring that they cannot access data unrelated to them. Hence, how a tool implements user management and access control is critical. Tools might let you create local users and roles to enable role-based access control (RBAC). However, in a large organization, local users and groups do not scale. You need integration with your identity provider (IdP), so that you can grant access to the users and groups defined in your IdP.
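The customizability point above can be made concrete with a small Python sketch of a cross-charging calculation that combines a percentage markup with a fixed fee, both varying by service model. The rates, fee amounts, and the UsageLine type are hypothetical; they only illustrate the kind of rules a cost management tool would need to let you express.

```python
from dataclasses import dataclass


@dataclass
class UsageLine:
    department: str
    service_model: str   # "iaas" or "paas"
    cloud_cost: float    # amount billed by the cloud provider


# Hypothetical markup rules; each organization would define its own.
PERCENT_MARKUP = {"iaas": 0.15, "paas": 0.08}      # support cost as a share of usage
FIXED_FEE_PER_LINE = {"iaas": 20.0, "paas": 0.0}   # e.g. agents for malware and backup


def cross_charge(lines) -> dict:
    """Return the total charge per department: cloud cost + percentage markup + fixed fee."""
    totals = {}
    for line in lines:
        charge = (line.cloud_cost * (1 + PERCENT_MARKUP[line.service_model])
                  + FIXED_FEE_PER_LINE[line.service_model])
        totals[line.department] = totals.get(line.department, 0.0) + charge
    # Round only at the end to avoid accumulating rounding errors.
    return {dept: round(total, 2) for dept, total in totals.items()}


lines = [
    UsageLine("research", "iaas", 1000.0),
    UsageLine("research", "paas", 500.0),
    UsageLine("sales", "paas", 200.0),
]
print(cross_charge(lines))
```

If a tool supports only a single global markup percentage, rules like the per-model fixed fee here would have to be applied outside the tool.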
Cloud migration, backup and disaster recovery
Generally, migration, backup, and disaster recovery (DR) are three different feature areas. However, backup methodologies are often used for migration from on-premises to cloud, and recovery of backed-up data is used for disaster recovery. These factors seem to be the reasons why Gartner combined the three into one area in the CMP magic quadrant. Even so, Gartner listed primarily backup-related features in this area, for example backup for different storage types: object storage, block storage, and compute. This is a relatively new area for CMPs, so features here are still maturing. Considerations in this area are:
- Some CMPs rely on integration with 3rd party tools to provide features in this area. Some CMPs allow you to back up using native cloud capabilities. At the same time, very mature specialized backup vendors such as Veeam, Commvault, and Veritas also provide multicloud backup solutions, as do enterprise storage vendors such as Dell EMC and NetApp. Reliable backup is critical to your business, so you need to carefully evaluate whether the tool or solution meets all your backup and recovery needs. If you have to choose between a less reliable single tool for all backups and more reliable separate tools for each cloud, you should choose the latter.
- You may need different types of backup solutions, for example on-premises to cloud, backup of cloud resources to the same or a different location, cross-cloud backups for better business continuity management, and long-term data retention. You might also have SaaS solutions that you want to back up using the same tool or solution. You need to check whether the tool or solution you are evaluating can meet all the types of backup needs that you have.
Security, compliance and identity management
Some examples of features in this area are role-based access control (RBAC), cloud-platform-native console SSO, IAM and network policies, security event notifications, security configuration management, and compliance scores. This is a very wide area, and a single platform or tool may not cover all the features in their entirety. Some features might also be covered by, overlap with, or depend on features in other areas covered earlier. For example, security event notification can be covered by a log monitoring and analytics tool like Sumo Logic, and a compliance score might depend on the tagging features of the inventory and classification area. Considerations in this area are:
- Because this feature area is so broad, you might use multiple tools, native as well as specialized, for different requirements. CMPs might integrate with those tools rather than provide the features themselves. For example, you might already be using a centralized solution for IAM (e.g. Active Directory, LDAP, Azure AD, Google Identity, Okta, Ping Identity) or SIEM (e.g. traditional tools like Splunk, IBM QRadar, and McAfee SIEM, or cloud-native tools like Azure Sentinel and Sumo Logic). Similarly, for key and secrets management, you might use either the cloud providers’ native features or a specialized multicloud solution like HashiCorp Vault. You need to ensure that the CMP can integrate with these specialized tools.
Packaging and delivery
This feature area is more about the packaging and delivery capabilities of CMPs than about their multicloud management capabilities. Some examples are integration with cloud providers, exposing APIs through SSL, authentication through SAML-based SSO, self-service support resources, support plans, and 24/7 incident support. Hence the considerations in this area are the generic ones that you apply when evaluating any vendor product.
Summary
Multicloud management solutions have evolved a lot in the recent past and provide extensive features. CMPs provide a comprehensive package of multiple feature areas supporting most public clouds. At the same time, cloud providers have started offering solutions in some feature areas that enable you to manage those features in hybrid and multicloud environments. In addition, there are specialized solutions that provide industry-leading capabilities in their area of specialization. For your organization, a single vendor solution or CMP may or may not meet all requirements. If you are a large organization that is already mature in cloud adoption, you may need specific solutions for individual feature areas. Some of these areas can leverage vendor solutions or cloud-native offerings, while others might need custom development. You need to assess your multicloud management requirements in each feature area and then choose the right solution or solutions for your organization.
Originally Published at Sanjay's blog