Ben reached out to me with his idea and asked if I could help develop a script to collect basic configuration information from a vSphere environment to help test out his idea. I was immediately intrigued with his idea and saw the huge potential value that Ben’s unique solution could bring to the virtualization community. The coolest thing about this project is that we were able to put together a working prototype within a week’s time!
Note: Also be sure to check out Ben's article vOpenData - Crunching everyone's data fun for fun and knowledge and his perspective on how he was able to quickly develop a prototype leveraging a PaaS solution.
What is vOpenData?vOpenData is an open community project that grew from the question "What is the average VMDK size for deployed virtual machines?” We wanted to create an open community database that is purely driven by users submitting their virtual infrastructure configurations. Leveraging the powerful virtualization community and applying simple analytics we are able to provide various trending statistics and data for virtualized environments. This is 100% community driven and the results will be available for everyone to view and hopefully you will contribute to the overall dataset!
What information do we collect?We made an effort to not collect specific information such as hostnames or even display names that could be used to identify a particular organization. Instead, we are using UUIDs which are automatically generated by the virtualization platform to uniquely identify a particular object. This allows us to keep track of changes in the our database when a new data set is uploaded from an existing environment. In addition we are collecting various configuration data and you can find a complete list in the Data FAQs
More info on the data we collect is here: Data FAQs
What will this data be used for?We are planning on using this data to create some interesting statistics and data modeling for the community to use in capacity planning and analysis. Most of this data will be made available through a dashboard or reports and eventually through an API to be mixed into other applications.
What about privacy concerns?Though the data that is collected is already anonymized and non-identifying, please ensure that you are abiding by the privacy policies of your organization when uploading this data. If you are concerned about the data, it is recommended that you audit the zip contents before uploading which are just CSV files. We only ask that you do not modify the schema at all.
How do to get started?
Step 1 - Check out the sexy vOpenData Public Dashboard here to get a glimpse of some of the information you will find by submitting your configuration data.
PowerCLI or vSphere SDK for Perl script which you will run against a vCenter Server which will produces a compressed zip file containing several CSV files. Instructions are available on the download page. You may rename the default file name vopendata-stats.zip to something else, as long as you do not modify the contents of the file.
Step 3 - Open a browser and go to http://www.vopendata.org and sign up for new account.
Step 4 - Click on the “Infrastructures” tab at the upper left hand corner. An Infrastructure is a logical view that can help you organize the data you have collected. You can associate a single vCenter Server with an infrastructure or you can combine multiple vCenter Server data sets into a single infrastructure. The choice is really up to you on how you would like to visualize your data and whether you would like to map that to the physical location of your virtual infrastructure.
We hope you frequent the vOpenData site regularly as the community uploads more and more data and see how statistics are trending over time. We would also like thank the following people who were part of our early alpha program and assisted with both testing as well as code contributions: Frederic Martin, Raphaël SCHITZ, Timo Sugliani and of course my Automation colleague Alan Renouf! If you would like to learn more about the vOpenData project, we have also submitted a session for VMworld 2013 4976 - vOpenData - Crunching Everyone's Data For Fun And Knowledge, be sure to vote for it!
You can follow @vopendata on Twitter for new updates and notifications as well as both Ben Thomas at @wazoo and William Lam at @lamw