VM Metrics (preview)

Introduction

Game developers sometimes need access to system level metrics (CPU/RAM/etc.) for the Virtual Machines that are created as part of their Builds. These metrics can provide unique insights to the game developer during the development of their multiplayer game server. Given these metrics, a game developer can make informed decision about utilizing VM resources.

Performance metrics can support various development scenarios:

  1. Using CPU and memory utilization data to measure the resource needs of multiplayer servers, so game developer can properly calculate the optimal number of game servers on a specific Virtual Machine SKU (type)
  2. Using network counters to detect an irregular network environment, such as attempted DDoS attacks or other network congestion

PlayFab Multiplayer Servers service supports a limited number of system metrics via the VM Metrics (preview) feature.

Important

This feature is currently experimental and offered as a preview. This means that parts of it can change at any time. We are significantly iterating on the experience and asking customers for feedback.

Usage

VM metrics for a Build can be enabled in two ways, depending on how a Build is created:

  1. Using Game Manager, you can enable the "Virtual machine metrics preview" checkbox on the "New Build" Game Manager page.
  2. Using the PlayFab Multiplayer Servers API, you can set the property "IsEnabled" to true in the following API objects:

When VM metrics feature is enabled for a Build, it will remain enabled for the entire lifetime of the Build. You can't enable/disable VM metrics for a Build after it has been created.

Windows

On Windows, VM metrics collection is a feature of the existing PlayFab container/process orchestrator called VmAgent. VmAgent periodically (every 10 seconds) runs a task that will query the following system performance counters:

  1. Available MBytes
  2. % Processor Time
  3. % User Time
  4. Disk Reads/sec (for drive D:)
  5. Disk Writes/sec (for drive D:)
  6. Bytes Received/sec
  7. Bytes Sent/sec

Collected counter values are sent to our internal metrics collector running on the VM. The collector aggregates and sends them to our internal backend so they can be presented to the user on Game Manager.

Linux

On Linux, we're using the open-source telegraf agent for collecting and processing metrics. Telegraf collects metrics every 10 seconds and emits them to our internal collector agent every 60 seconds. For reference, below you can see the contents of the telegraf.conf configuration file we're using, feel free to consult the official telegraf docs for further details.

We're also using an internal utility called telegraf-geneva-processor that emits the diff value for counter level metrics (like net_bytes_recv) metric. Emitting the diff value instead of the actual counter value results in better visualization in the provided Game Manager graphs.

[agent]
  interval = "10s" 
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  omit_hostname = true
  
[global_tags]
  titleID = "TITLE_ID"
  buildID = "BUILD_ID"
  vmID = "VM_ID"

# consult man proc for details
# https://github.com/influxdata/telegraf/tree/master/plugins/inputs/cpu
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  name_prefix = "telegraf_"
  fieldpass = ["usage_system", "usage_user"]

[[inputs.mem]] # https://www.linuxatemyram.com/
  fieldpass = ["available_percent"]
  name_prefix = "telegraf_"

[[inputs.net]] # /proc/net/dev
  fieldpass = ["bytes_sent", "bytes_recv"]
  name_prefix = "telegraf_"
  interfaces = ["eth0"]
  tagexclude = ["interface"]
  
# https://github.com/influxdata/telegraf/tree/master/plugins/inputs/diskio
[[inputs.diskio]] # /proc/diskstats
  fieldpass = ["reads", "writes"] # number of reads and writes on sdb device  
  devices = ["sdb"] # sdb device contains everything (including container storage) apart from /mnt
  name_prefix = "telegraf_" # which is the place where some of our shared folders are

# grab the allocated percentage from VmAgent. Be aware that this must be in influx format  
[[inputs.http]]
  urls = ["http://localhost:56001/v1/metrics/allocatedpercentage"]
  name_prefix = "telegraf_"
  data_format = "influx"
  tagexclude = ["url"]

# send all telegraf data to internal collector
[[outputs.socket_writer]]
  address = "unix:///var/etw/mdm_influxdb.socket"
  data_format = "influx"
  
# write this data to a file. This might be removed in the future  
[[outputs.file]]
  files = ["/tmp/PerformanceMetrics.csv"]
  data_format = "influx"
  rotation_max_size = "100MB"
  rotation_max_archives = 5
  
[[processors.execd]]
  command = [
    "/usr/bin/telegraf-geneva-processor", 
    "-configFile=/etc/telegraf/telegraf.geneva.processor.conf"
  ]

This telegraf.conf configuration enables the telegraf agent to collect the below metrics:

  1. cpu_usage_system
  2. cpu_usage_user
  3. memory_available_percent
  4. net_bytes_recv_diff (network bytes received for eth0)
  5. net_bytes_sent_diff (network bytes sent for eth0)
  6. diskio_reads_diff (number of reads for sdb)
  7. diskio_writes_diff (number of writes for sdb)

Similar to Windows, telegraf sends the collected counter values to our internal metrics collector. The collector aggregates and sends these values to our internal backend so they can be presented to the user on Game Manager.

Allocation Percentage

In both Windows and Linux VMs, we're emitting a metric called Allocation Percentage. The value of this metric is calculated by dividing the number of Active servers with the Total number of servers on the VM. This metric is meant to be used when evaluating and interpreting the reported system metric values. This is because the value of the system metrics will probably be different on a VM with lots of Active servers compared to a VM with lots of StandingBy servers.

Viewing VM metrics

When you enable the VM metrics feature for a new Build, metrics will be emitted as soon as the Build is successfully deployed. You can use the "Virtual Machines" (https://developer.playfab.com/en-US/<YOUR_TITLE_ID>/multiplayer/server/virtual-machines) page on Game Manager to get a link to display VM metrics for a specified VM.

You can also access the VM metrics by selecting your build, going to the servers tab, and hitting the menu inline with the VM you would like to see the metrics on, and select "View Metrics"

View VM Metrics

How can I submit feedback for this preview feature?

Join us on Discord on the #multiplayer-servers channel, we would love to get in touch and hear your thoughts about this feature!