I was recently faced with an interesting problem. A company wanted to cost the migration of thousands of VMs to Azure using a lift and shift approach (also known as rehost). Due to the short deadline, we were not able to get our hands on detailed data. All we were provided with was a machine name, a CPU core count, RAM, and a description field that was sometimes populated. Utilisation, storage and network usage were notably missing. We knew we couldn’t cost the migration accurately due to these unknowns, but we had enough data to cost the VMs themselves, as we had access to CPU core count and RAM. I must also add that the VMs varied greatly in their hardware specifications.
Microsoft offers a pricing calculator, but it only supports manual input, which disqualified it for our use case. A few Microsoft employees have written web applications that automate the pricing of VMs by importing Excel spreadsheets or CSV files. The ones I tried only offered USD as a currency and choked on anything bigger than a few hundred VMs. The output file used the en-us culture, so it had to be post-processed before being opened in Excel. I didn’t have the time to review and select a commercial solution (Azure Migrate requires creating a VM on-premises, which was not possible). At the end of the day I came up with a semi-automated process that did the trick, but I felt that not much work would be required to empower teams to price VMs based on a limited data set.
Requirements
I wanted to build something that would fit my use case (AUD and en-au) but could also be used by people anywhere:
- Support thousands of VMs
- Support all currencies
- Support all cultures
- Ability to automate refreshing of the pricing
- Timeboxed to a weekend
Note: you can find the code on GitHub.
What I came up with
My first goal was to retrieve the pricing from Azure. I initially considered the Resource RateCard (part of the Billing API) but the banner below didn’t fill me with confidence:

The Billing API requires authentication and parameters to be passed in, which would have increased the complexity of the solution. I knew one place would have mostly up-to-date pricing: the Virtual Machines Pricing page. This page displays all the available instances for a specific region. It is also possible to select a culture, currency and operating system. There was one problem though: the data is available as HTML markup instead of an API.
Puppeteer
Puppeteer allows you to control Chrome, or as the project puts it:

> Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
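To give a feel for the approach, here is a minimal sketch of such a crawl. The URL pattern and the CSS selector are assumptions made for illustration; the real page structure differs:

```javascript
// Sketch of crawling the Virtual Machines Pricing page with Puppeteer.
// The URL pattern and the CSS selector below are hypothetical.

// Turn a displayed price such as "$0.123/hour" into a number.
function parsePrice(text) {
  const match = text.replace(/,/g, '').match(/\d+(?:\.\d+)?/);
  return match ? parseFloat(match[0]) : NaN;
}

async function scrapePricing(culture, os) {
  // Lazy require so parsePrice can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Hypothetical URL: culture is part of the path, OS is a page filter.
  await page.goto(
    `https://azure.microsoft.com/${culture}/pricing/details/virtual-machines/${os}/`
  );
  // Hypothetical selector for the pricing rows.
  const rows = await page.$$eval('table tbody tr', trs =>
    trs.map(tr => tr.innerText)
  );
  await browser.close();
  return rows;
}

// Usage: scrapePricing('en-au', 'windows').then(rows => /* write to disk */);
```

Keeping the price parsing in its own small function makes that part testable without launching a browser.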
It feels more natural to write a crawler in JavaScript (JavaScript is the language of the web, after all). In my limited experience, Puppeteer is also significantly faster than Selenium WebDriver. I didn’t bother creating an npm package, so you’ll have to clone the repository and follow the instructions. This is the kind of output you can expect from the tool:
I made the assumption that a single culture and currency would be used throughout a pricing session, which is why I only encoded the region and operating system in the generated file names. Calling this tool can easily be automated, as it doesn’t require any configuration and generates files on disk. You could run it at regular intervals and publish the artefacts.
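As an illustration, a nightly refresh could be a single cron entry (the paths and script name here are made up):

```shell
# Hypothetical cron entry: regenerate the pricing files at 02:00 every
# night and publish them to a shared location. Paths are illustrative.
0 2 * * * cd /opt/azure-pricing && node index.js && cp out/* /srv/artefacts/
```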
Once we’ve got our hands on the pricing, all we need to do is size the VMs and cost them.
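Conceptually, the costing step picks, for each VM, the cheapest instance whose core count and RAM both fit. A sketch of that matching logic, with made-up pricing data:

```javascript
// Pick the cheapest instance that fits a VM's cores and RAM.
// The instance shape and figures below are illustrative, not real
// Azure pricing.
function cheapestFit(vm, instances) {
  const candidates = instances.filter(
    i => i.cores >= vm.cores && i.ram >= vm.ram
  );
  if (candidates.length === 0) return null; // no instance is big enough
  return candidates.reduce((best, i) =>
    i.pricePerHour < best.pricePerHour ? i : best
  );
}

const instances = [
  { name: 'A1', cores: 1, ram: 1.75, pricePerHour: 0.06 },
  { name: 'D2', cores: 2, ram: 7, pricePerHour: 0.14 },
  { name: 'D4', cores: 4, ram: 14, pricePerHour: 0.28 },
];

console.log(cheapestFit({ cores: 2, ram: 4 }, instances).name); // "D2"
```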
Coster
The Coster is a .NET Core console application. Again, I didn’t bother pushing a NuGet package, so you’ll have to build from source. You’ll need the pricing files generated by the Parser. The input expected by the Coster is a CSV file with the following format:
Once done, the Coster will write a CSV file:
Conclusion
Let me know if you’re using these tools and I’ll tidy up and publish packages on npm and NuGet. I can’t say I’ve tested them extensively, so be ready for some rough edges!