Cloud computing is a model where big corporations host and manage an IT infrastructure (and offer services to clients, obviously). These big corporations have to invest heavily in big and powerful data centers to be able to host Internet-scale applications.
At work, we have been investigating and discussing Azure, which is Microsoft’s future cloud offering, in a Special Interest Group (led by Yves). The most important business cases where it would make sense to use Azure are those where customers don’t want to or cannot invest in infrastructure to try out an idea that could potentially become popular.
The concept of Fail Fast or Scale Fast is important in this respect. Start-ups can put some innovative features online. If they catch on: superb, scale by adding more nodes to handle the traffic. If they don’t catch on: too bad, take them back offline. Other interesting cases include services with heavily spiked load, for example a concert-ticket sales application. Usually all tickets for popular concerts are sold out within 24 hours of sales opening. To handle these spikes, companies have to keep a lot of excess capacity around, which sits unused most of the time and still can hardly handle the spikes when they occur. In such a case the service could just add a large amount of capacity for a short period of time.
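To make the spike argument a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. All numbers (requests per second, capacity per node, the shape of the week) are made up for illustration, and `nodes_needed` and `node_hours` are hypothetical helpers, not part of any cloud API.

```python
import math

def nodes_needed(requests_per_second, capacity_per_node):
    """Smallest node count that can serve the load (always at least one)."""
    return max(1, math.ceil(requests_per_second / capacity_per_node))

def node_hours(hourly_load, capacity_per_node, elastic=True):
    """Total node-hours for a list of hourly loads.

    elastic=True  -> pay only for the nodes needed in each hour.
    elastic=False -> provision for the peak hour and keep it all the time.
    """
    per_hour = [nodes_needed(load, capacity_per_node) for load in hourly_load]
    if elastic:
        return sum(per_hour)
    return max(per_hour) * len(per_hour)

# Assumed week: 24 hours of ticket-sale rush, 144 quiet hours.
week = [5000] * 24 + [50] * 144          # requests per second, per hour
elastic = node_hours(week, capacity_per_node=100)
fixed = node_hours(week, capacity_per_node=100, elastic=False)
print(elastic, fixed)                    # 1344 8400
```

With these assumed numbers, provisioning for the peak costs roughly six times as many node-hours as scaling up only for the day of the rush.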
In the case of Azure, Microsoft built a software fabric on top of a bunch of connected systems. This allows computing nodes (virtualized servers running the application) to be redistributed and managed within the data centers as Microsoft sees fit (for example to optimize the temperature within a data center). There is no guarantee that any given node will stay up at any point in time. Software developers have to take this into account when designing their software. There are no transactions (in the classical sense), and a classical relational database is hard to use. The application has to be designed up front to be scalable. Reliability and availability are created through the use of replication, partitioning and smart routing.
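As an illustration of how replication, partitioning and routing can hide failing nodes, here is a small Python sketch. It is a toy model under my own assumptions, not how the Azure fabric actually works; `Node`, `Fabric` and the hash-based placement are hypothetical names and choices.

```python
import hashlib

class Node:
    """A toy compute/storage node that can go down at any time."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

class Fabric:
    """Partitions keys over nodes and keeps `replicas` copies of each key."""
    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def _owners(self, key):
        # Partitioning: a stable hash picks the first owner,
        # the following nodes in the ring hold the replicas.
        digest = hashlib.sha256(key.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Replication: write to every owner that is currently up.
        written = 0
        for node in self._owners(key):
            if node.up:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for " + key)
        return written

    def get(self, key):
        # Routing: try the owners in order and skip nodes that are down.
        for node in self._owners(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)

fabric = Fabric([Node("n%d" % i) for i in range(4)], replicas=2)
fabric.put("ticket-42", "sold")
fabric._owners("ticket-42")[0].up = False   # simulate a node failure
print(fabric.get("ticket-42"))              # still served by the replica
```

The sketch ignores everything that makes this hard in practice (re-replication after a failure, conflicting writes, nodes coming back with stale data), which is exactly why the application has to be designed for it from the start.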
What about turning it all upside down?
Think about this: instead of limiting the fabric to the corporate data centers, why not take advantage of the “entire” Internet in a peer-to-peer kind of grid? We are all sitting in front of machines that are largely overpowered most of the time, so why not let a cloud provider use this excess capacity in return for a small fee or other compensation to cover the energy bill? You could allow a portion of your own system to be taken up by a virtual machine that is managed by the cloud provider, so you don’t have to worry about it. It’s there, eating away your idle CPU cycles, or not. The cloud provider pays by use, nothing less, nothing more. If Microsoft, for example, went this way, they would not have that much extra work to do. Their own processing nodes can fail too, and the fabric is already very capable of handling such cases. They would only have to support more types of hardware (Virtual PC already runs on a lot of systems out of the box), create some infrastructure (set up the p2p network), and invent some smart algorithms to take advantage of locality in the network, plus basic resource management.
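To give an idea of what such a “smart algorithm” could look like in its most naive form, here is a Python sketch that picks nearby peers with enough idle CPU. Everything in it is an assumption made for illustration: the `Peer` fields, the `min_idle` threshold and the greedy nearest-first strategy.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    latency_ms: float   # assumed measure of network locality
    idle_cpu: float     # idle capacity in "CPU cores", e.g. 0.5 = half a core

def pick_peers(peers, cpu_needed, min_idle=0.2):
    """Greedily pick the closest peers until enough idle CPU is collected.

    Peers with less than `min_idle` spare capacity are left alone,
    so the owner of the machine does not notice anything.
    """
    candidates = sorted(
        (p for p in peers if p.idle_cpu >= min_idle),
        key=lambda p: p.latency_ms,
    )
    chosen, total = [], 0.0
    for peer in candidates:
        if total >= cpu_needed:
            break
        chosen.append(peer)
        total += peer.idle_cpu
    if total < cpu_needed:
        raise ValueError("not enough idle capacity in the network")
    return chosen

peers = [
    Peer("far-but-idle", latency_ms=80, idle_cpu=1.0),
    Peer("close", latency_ms=10, idle_cpu=0.5),
    Peer("busy", latency_ms=20, idle_cpu=0.125),
    Peer("nearby", latency_ms=30, idle_cpu=0.75),
]
print([p.name for p in pick_peers(peers, cpu_needed=1.0)])
# ['close', 'nearby']
```

A real scheduler would also have to deal with peers disappearing mid-job and with measuring usage for the pay-by-use part, which this sketch leaves out.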
Something similar already exists in other contexts. Think about large-scale botnets of 50,000+ nodes (used to relay spam, run DDoS attacks, …). Think about BOINC.
Why not try to take advantage of all those wasted CPU cycles? I think this can have huge advantages in the future. Because the load is distributed, cloud providers would have to worry less about cooling, energy and availability (provided the network and its management can be made truly distributed and self-healing). A possible problem is that available capacity is not so easy to predict, but they could restrict the use of off-premise nodes to the cheapest best-effort SLAs (no hard guarantees in the service contracts).
Another thought: the cloud provider could try to move the load that users are generating back to their own machines, if they are sharing resources. If someone is using a web application, why not host the compute-intensive load on their own machine, making the use of the service cheaper? This is where I come full circle.
Any feedback is hugely appreciated.