Datacenters are a source of fascination for me. The startups I worked for early in my career targeted enterprise-class customers, primarily in telecommunications and financial services, which counted downtime in “millions of dollars per minute.” I particularly enjoyed the technical challenge of delivering products that would “measure up.” At one point, one of those companies had the distinction of being the only sub-$100M company to successfully ship products to the Japanese telecom company NTT DoCoMo.
It’s no surprise that I find such technical challenges irresistible, but the dynamics of enterprise datacenters are just as fascinating. For one thing, enterprise datacenters are populated with equipment from the industry “A-list” – HP, EMC, Cisco, and a few other companies of similar profile. These products are more expensive, but enterprise CIOs have long held that the stuff never fails and is therefore worth the price.
I used to alternate between puzzlement and annoyance at the fact that some of the A-list companies had a reputation for reliability and uptime that my startups couldn’t quite achieve, even though we seemed to use identical components and manufacturing techniques. My last blog post covered the differences I’ve since found in server-class hardware, but I also realized that the companies using A-list equipment differed in another way: they had robust monitoring systems and load management software to shield their customers from potential downtime.
Another perspective I gained from experience relates to datacenter economics. I’ve observed that – while the hardware costs for enterprise datacenters are staggering – by the time facilities, power, personnel, and support costs are factored in, hardware accounts for less than 10% of the overall budget.
So how does this experience translate to Axcient? From day one, we decided to design and maintain our own datacenters rather than architect a solution around a public cloud. While our target market is SMBs, the cloud components of our service are so central to our core functionality that our reliability requirements closely parallel those of enterprise datacenters. Add to this the economics of delivering an enterprise-class solution to the SMB market, and we face formidable technical challenges along multiple dimensions: we must scale to multiple petabytes without a large staff, all while maintaining a small datacenter footprint and enterprise-class uptime. Once again, our partnership with HP provides a good starting point for making this possible.
Start with the most reliable hardware…
The HP IBRIX scale-out NAS solution scales smoothly to 16 petabytes in a single namespace. The self-healing design of IBRIX eliminates the need for offline file system repairs, a significant benefit for highly available multi-petabyte file systems. Current-generation HP blade servers are extremely dense, as are certain IBRIX storage arrays. This equipment carries a high price tag, but in aggregate it occupies a very small datacenter footprint and adds relatively little management overhead.
…Then create great monitoring and load management software
My early experience with enterprise datacenters taught me that, regardless of the quality of the network equipment, servers, and storage, hardware still fails. Given this fact, one critical piece of functionality is the software the Axcient team has built to monitor our datacenter infrastructure. It stitches together the monitoring APIs exposed by our server, storage, and networking infrastructure. It’s a straightforward approach, but it is critically important: a highly refined monitoring system lets us manage our infrastructure effectively with a vanishingly small staff.
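As a rough illustration of what stitching monitoring APIs together can look like, here is a minimal Python sketch. It is not Axcient’s software: the `Reading` record, the `poll_all` and `alerts` helpers, and the three probe functions are hypothetical stand-ins, assuming each vendor’s monitoring API has already been wrapped in a probe that returns normalized readings. One design point worth noting is that a probe which cannot reach its API is reported as an alert rather than silently skipped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reading:
    """One normalized health reading, whatever vendor API it came from."""
    source: str      # subsystem name, e.g. "server", "storage", "network"
    component: str   # hypothetical component identifier
    healthy: bool
    detail: str = ""


# A probe wraps one (hypothetical) vendor monitoring API and returns readings.
Probe = Callable[[], List[Reading]]


def poll_all(probes: Dict[str, Probe]) -> List[Reading]:
    """Call every probe and merge the results into a single list."""
    readings: List[Reading] = []
    for name, probe in probes.items():
        try:
            readings.extend(probe())
        except Exception as exc:
            # An unreachable monitoring API is itself treated as a failure.
            readings.append(Reading(name, "<probe>", False, f"probe failed: {exc}"))
    return readings


def alerts(readings: List[Reading]) -> List[Reading]:
    """Keep only the readings that need operator attention."""
    return [r for r in readings if not r.healthy]


# Hypothetical probes standing in for real server, storage, and network APIs.
def server_probe() -> List[Reading]:
    return [Reading("server", "blade-07", True)]


def storage_probe() -> List[Reading]:
    return [Reading("storage", "segment-12", False, "rebuild in progress")]


def network_probe() -> List[Reading]:
    raise ConnectionError("switch API timeout")


problems = alerts(poll_all({
    "server": server_probe,
    "storage": storage_probe,
    "network": network_probe,
}))
for p in problems:
    print(p.source, p.component, p.detail)
```

Here the storage probe reports a degraded component and the network probe fails outright, so both surface as alerts while the healthy blade server does not. Normalizing every vendor’s output into one record type is one way a small team can watch heterogeneous hardware from a single place.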
The final and most vital component is Axcient’s proprietary load management software. It is necessarily complex, because it must handle our end users’ variable, unpredictable workloads. Most of our offsite activity occurs in the evening hours: on a typical evening we ramp up from a few hundred to many thousands of offsite tasks, scanning billions of files in the process. Meanwhile, the datacenter seeding process for new systems runs 24×7, and Axcient Cloud Continuity failovers can happen at any time of day or night. Regardless of workload, our load management algorithms keep our datacenter infrastructure running smoothly, even while hardware components are being serviced or replaced.
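Axcient’s actual algorithms are proprietary and are not described here, so the following is only a sketch of one textbook approach to the same problem: a greedy longest-processing-time-first placement that sends each task to the currently least-loaded node and skips nodes that are out for service. The task costs, node names, and the `assign_tasks` function are all hypothetical.

```python
import heapq
from typing import Dict, List, Set


def assign_tasks(costs: List[int], nodes: List[str],
                 in_service: Set[str]) -> Dict[str, List[int]]:
    """Place task indices on nodes, largest task first, least-loaded node first.

    Nodes listed in `in_service` (being serviced or replaced) get no work.
    """
    available = [n for n in nodes if n not in in_service]
    if not available:
        raise RuntimeError("no nodes available for placement")

    # Min-heap of (current load, node name); the root is the least-loaded node.
    heap = [(0, n) for n in available]
    heapq.heapify(heap)
    placement: Dict[str, List[int]] = {n: [] for n in available}

    # Longest-processing-time-first: handling the biggest tasks first tends
    # to give a more even final balance than placing tasks in arrival order.
    for t in sorted(range(len(costs)), key=lambda t: costs[t], reverse=True):
        load, node = heapq.heappop(heap)
        placement[node].append(t)
        heapq.heappush(heap, (load + costs[t], node))
    return placement


costs = [5, 3, 8, 2, 7]  # hypothetical relative task costs
plan = assign_tasks(costs, ["n1", "n2", "n3"], in_service={"n2"})
for node, tasks in plan.items():
    print(node, tasks, sum(costs[t] for t in tasks))
```

With node `n2` out for service, the five tasks are spread across `n1` and `n3` with total loads of 13 and 12. A production scheduler would also need to rebalance as tasks finish and as serviced nodes return, which this sketch ignores.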
Datacenter design is a formidable engineering challenge, but when it’s done correctly, it can deliver enterprise-level functionality at a price point suited to the SMB marketplace.
Try out the Axcient service for yourself by signing up for our FREE 30-day trial.