03Aug

Tagging and Cost Allocation

ABSTRACT

Tagging cloud resources for segmentation lays the foundation for showback and chargeback as well as for automations around cloud governance and waste elimination. Cloud invoices typically only support very basic groupings of costs, such as by account, project or subscription.

Tagging is the way to further separate your cloud costs and align them with your business: cost centers, business units, teams, applications and microservices. All major cloud providers support the tagging of most resources, with some exceptions such as network traffic.

This white paper explores best practices around tagging your cloud resources to gain greater visibility into your costs.

Tagging Standards

Consistency is key when it comes to tagging, so the first thing you’ll need to do is build a set of standards for your business. When designing a tagging standard, using fewer mandatory tags generally works better.

In our experience, many businesses have a multi-cloud configuration, and each provider has its own limitations on character counts and types for a tag’s key and value properties.

Implementing the same tagging standard across providers requires working each vendor’s limitations into your standard.

HOW TO CREATE A STANDARD:

Start by determining how your business looks at cost from a finance perspective
Build a minimum set of mandatory tag keys
Socialize your draft tagging standard with engineering leaders to collect feedback
Be sure to manage intersections with any existing tags

Note that tag keys and values are case-sensitive on some providers, so you will also need to plan for engineers using the wrong character case or misspelling tags.

A tag is simply a key and value pair.
For example, “environment” : “production”.
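To make the standard checkable, it can be expressed as data and tested against a workload's tags. The following is a minimal sketch only: the mandatory keys, the allowed values and the length limits are hypothetical placeholders, and real limits should come from each provider's documentation.

```python
# Hypothetical tagging standard: mandatory keys and, optionally, allowed values.
MANDATORY_TAGS = {
    "environment": {"production", "staging", "development"},
    "cost_center": None,  # None means any non-empty value is accepted
}

# Illustrative limits only; take the real ones from each provider's documentation.
MAX_KEY_LENGTH = 63
MAX_VALUE_LENGTH = 63


def normalize(tags):
    """Lowercase and trim keys and values so 'Environment' matches 'environment'."""
    return {k.strip().lower(): v.strip().lower() for k, v in tags.items()}


def validate(tags):
    """Return a list of violations; an empty list means the tags are compliant."""
    problems = []
    normalized = normalize(tags)
    for key, value in normalized.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key too long: {key}")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for key: {key}")
    for key, allowed in MANDATORY_TAGS.items():
        value = normalized.get(key)
        if not value:
            problems.append(f"missing mandatory tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"unexpected value for {key}: {value}")
    return problems
```

Normalizing before comparing is one way to absorb character-case mistakes; misspelled keys still show up as missing mandatory tags.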

Tag Enforcement

There are different methods for enforcing tagging in the cloud. The strictest option is to deny workload deployment when tags do not follow the standard. This will ensure more consistent tag use, but will also introduce delays and even frustration.

A more collaborative approach is to produce a tagging compliance report, in which each workload owner is assigned a compliance percentage goal. This allows leadership and engineers to build tagging compliance roadmaps into their sprint planning.
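A compliance report of this kind can be sketched in a few lines. The example assumes a hypothetical inventory export in which every resource records the workload it belongs to and its tags; the field names and the goal value are illustrative, not taken from any provider's API.

```python
from collections import defaultdict


def compliance_report(resources, mandatory_keys):
    """Return the percentage of fully tagged resources per workload."""
    totals = defaultdict(int)
    compliant = defaultdict(int)
    for resource in resources:
        workload = resource["workload"]
        totals[workload] += 1
        # A resource is compliant when every mandatory key has a non-empty value.
        if all(resource["tags"].get(key) for key in mandatory_keys):
            compliant[workload] += 1
    return {w: round(100.0 * compliant[w] / totals[w], 1) for w in totals}


def below_goal(report, goal_percent):
    """List the workloads whose compliance is under the agreed goal."""
    return sorted(w for w, pct in report.items() if pct < goal_percent)
```

The output of `below_goal` is the kind of list a team could carry into sprint planning.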

Managing Tag Changes

Over time, your staff will experience turnover. Teams and their workloads will be split, merged or reassigned to match organizational changes. This means that your tags will change over time, too.

For example, a workload that carried the tag “application” : “mobile” from January through June may be split in July into workloads tagged “application” : “mobile_api” and “application” : “mobile_cache”. When applying budgets or forecasts, the two new tag values will have no spend history in the billing data. This necessitates an external system in which tag transitions are modeled manually, so that reporting stays consistent.
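One simple way to model such a transition is a lineage table that maps every historical and current tag value to a stable reporting group. The sketch below assumes billing rows with hypothetical `month`, `application` and `cost` fields; the table itself would be maintained by hand in the external system.

```python
from collections import defaultdict

# Hypothetical lineage table: old and new tag values roll up to one reporting group.
TAG_LINEAGE = {
    "mobile": "mobile",
    "mobile_api": "mobile",
    "mobile_cache": "mobile",
}


def spend_by_lineage(billing_rows):
    """Sum cost per (month, reporting group) so history survives a tag split."""
    totals = defaultdict(float)
    for row in billing_rows:
        # Values without a lineage entry are reported under their own name.
        group = TAG_LINEAGE.get(row["application"], row["application"])
        totals[(row["month"], group)] += row["cost"]
    return dict(totals)
```

With this roll-up, a budget or forecast for the “mobile” group sees one continuous series across the rename.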

Cost Allocation

Cloud costs need to be allocated to their responsible owners by mapping tags to employees. But since people leave, new people join and existing staff are reassigned, we should avoid assigning owner tags directly to cloud workloads.

This is because changing a cloud tag is a relatively heavy lift. The tag information may be stored in the source code, which then has to go through a merge and approval process, after which the workload needs to be redeployed – all without impacting your customers.

Ideally, we map cloud tags to business owners in a database where those owners can be quickly reassigned without requiring this process.
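As an illustration, the mapping can live in a single table keyed by tag key and value. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names and owner names are assumptions made for the example.

```python
import sqlite3


def build_owner_db():
    """Create an in-memory table that maps a tag key and value to an owner."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tag_owner (tag_key TEXT, tag_value TEXT, owner TEXT, "
        "PRIMARY KEY (tag_key, tag_value))"
    )
    return conn


def assign_owner(conn, tag_key, tag_value, owner):
    """Assign or reassign an owner without touching the cloud resource."""
    conn.execute(
        "INSERT OR REPLACE INTO tag_owner (tag_key, tag_value, owner) "
        "VALUES (?, ?, ?)",
        (tag_key, tag_value, owner),
    )


def owner_for(conn, tag_key, tag_value):
    """Return the current owner for a tag, or None when it is unmapped."""
    row = conn.execute(
        "SELECT owner FROM tag_owner WHERE tag_key = ? AND tag_value = ?",
        (tag_key, tag_value),
    ).fetchone()
    return row[0] if row else None
```

Reassigning an owner is then a single row update, with no merge, approval or redeployment of the workload.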

For showback and chargeback, it’s important to show cloud cost at a high level, such as VP, business unit or cost center. But it’s also useful to be able to drill down to an individual team lead or engineer for troubleshooting, such as when determining the root cause of a budget overrun.

Managing Shared Costs

Once a tag combination has been classified as a shared cost, there is nothing preventing engineers from using this tag combination for other workloads. This can result in scope creep of shared costs and requires regular reviews to determine if the workloads are tagged correctly.

SOME COMMON TYPES OF SHARED COSTS ARE:

Shared resources, such as network or shared storage
Platform services, such as Kubernetes or logging infrastructure
Enterprise level support
Enterprise level discounts
Licensing, such as third-party Software as a Service (SaaS) costs

COMMONLY-USED SHARED COST MODELS ARE:

Proportional: Based on the relative percentage of direct costs of cloud workloads
Even split: Total shared cost is evenly split across cloud workloads
Fixed: A business-defined percentage per cloud workload, with all percentages adding up to 100%
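The three models above can be sketched as small functions. This is an illustration under simple assumptions: costs are plain numbers, workloads are identified by name, and the function names are ours.

```python
def proportional_split(shared_cost, direct_costs):
    """Apportion shared cost by each workload's share of total direct cost."""
    total = sum(direct_costs.values())
    if total == 0:
        raise ValueError("direct costs must not sum to zero")
    return {w: shared_cost * cost / total for w, cost in direct_costs.items()}


def even_split(shared_cost, workloads):
    """Apportion shared cost equally across the given workloads."""
    share = shared_cost / len(workloads)
    return {w: share for w in workloads}


def fixed_split(shared_cost, percentages):
    """Apportion shared cost by business-defined percentages that total 100."""
    if abs(sum(percentages.values()) - 100.0) > 1e-9:
        raise ValueError("fixed percentages must add up to 100")
    return {w: shared_cost * pct / 100.0 for w, pct in percentages.items()}
```

For a shared cost of 1,000 and direct costs of 300 and 100, the proportional model assigns 750 and 250, while the even split assigns 500 each.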

Managing Untaggable and Untagged Resources

Fully tagging all cloud resources is not realistic given the velocity of innovation, and even if it were, some cloud services, such as those related to data transfer, do not support tagging. Untaggable and untagged resources need to be allocated using the same principles outlined above, in Managing Shared Costs.
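As a sketch of how this might look in practice, the function below pools every billing row that lacks a given tag and spreads that pool proportionally over the tagged spend. The row layout (`tags`, `cost`) is a hypothetical billing export, and the proportional model is just one of the possible choices.

```python
def allocate_with_untagged(billing_rows, tag_key):
    """Return cost per tag value, including a proportional share of untagged cost."""
    direct = {}
    unallocated = 0.0
    for row in billing_rows:
        value = row["tags"].get(tag_key)
        if value:
            direct[value] = direct.get(value, 0.0) + row["cost"]
        else:
            # Untaggable and untagged rows land in one pool to be apportioned.
            unallocated += row["cost"]
    total_direct = sum(direct.values())
    if total_direct == 0:
        raise ValueError("no tagged spend to apportion against")
    return {v: c + unallocated * c / total_direct for v, c in direct.items()}
```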

FinOps Maturity Phases

Your FinOps practice will likely have different maturity levels for different functional areas. For example, tagging may be in the Operate phase while forecasting is in the Inform phase. Over time, all areas will mature, but not all need to graduate to the Operate phase.

For example, an annual process with low impact on the business may stay in the Inform phase indefinitely, and that’s fine. Further optimization of this type of process would consume resources that can be better prioritized elsewhere.

Tagging and Cost Allocation by Maturity Level

INFORM PHASE

Tagging will be done at a higher level (for example, VP, business unit or cost center)
Cost allocation will be performed by simply apportioning the cloud bill by account, project or subscription
Shared costs will be apportioned without further detail until costs become more substantial
Cloud native tools and spreadsheets are predominantly used

OPTIMIZE PHASE

Tagging will have a well-defined strategy that reaches the application or service level
Cost allocation will use metadata to identify engineering leads responsible for specific workloads
Shared costs will use well-established mechanisms like shared pools to apportion costs fairly to business owners
A combination of native and third-party tools will be used to accomplish various tasks
Key performance indicators (KPIs) for cost allocation are being introduced, but may not be automated yet

OPERATE PHASE

Tagging will be as granular as the business requires and utilize automation for deployment, governance and management of changes over time
Cost allocation will be automated and able to manage shared costs as well as untaggable and untagged resources
Few costs will remain unidentified, and little to no manual tracking will be required
A combination of native, third-party and in-house tools is used to accomplish all required business activities
KPIs are automated and regularly reviewed

Measuring Success Using KPIs

KPIs help you drive business goals, and they allow you to identify trends over time. Using historical KPI trends, thresholds can be set to identify outliers.

SOME COMMONLY-USED KPIS FOR TAGGING AND COST ALLOCATION ARE:

Percentage of taggable resources with tags
Percentage of tags complying with tagging standard
Percentage of total cloud spend that is allocated
Percentage of shared cost that is allocated
Percentage of untaggable and untagged cost that is allocated

A typical target for the Inform phase is 80%, while the Operate phase will target above 90%. More mature practices will also utilize automation to notify workload owners when non-compliant deployments are made.
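Two of the KPIs listed above can be computed as follows. This is a minimal sketch that assumes a hypothetical resource inventory with `taggable`, `tags`, `cost` and `allocated` fields; the KPI names are ours.

```python
def percentage(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def tagging_kpis(resources):
    """Compute tag coverage and allocated-spend KPIs from a resource inventory."""
    taggable = [r for r in resources if r["taggable"]]
    tagged = [r for r in taggable if r["tags"]]
    total_cost = sum(r["cost"] for r in resources)
    allocated_cost = sum(r["cost"] for r in resources if r["allocated"])
    return {
        "taggable_resources_tagged_pct": percentage(len(tagged), len(taggable)),
        "spend_allocated_pct": percentage(allocated_cost, total_cost),
    }


def kpis_below_target(kpis, target_pct):
    """List the KPIs that fall short of the target for the current phase."""
    return sorted(name for name, value in kpis.items() if value < target_pct)
```

Running `kpis_below_target` on a schedule is one simple starting point for the automated notifications mentioned above.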