And people wonder why I LOVE our host...
Hi Paul
My name is Adam Smith. If you've not read one of my emails before, I'm the technical director at Vidahost and the person responsible if things go wrong.
At Vidahost we are aware that our success over the past 11 years is due almost entirely to customer referral and word of mouth, and we know that our high service level is the primary reason for this. Therefore, when we fail to meet those service levels, I feel it is important to tell you why.
Today (Tuesday 12th August) we experienced a network routing issue between approximately 9AM and 11:30AM. This meant that some visitors in the UK and elsewhere in the world were unable to reach our network, including customer websites, our own website, email services and other services provided by Vidahost. It’s important to point out that this issue was sporadic and only affected a small number of people. For the majority of the world, services were accessible throughout.
To demonstrate, below is the graph for one of our many connections to the public internet. You will see a small drop in traffic whilst the issue was occurring and 2 additional ‘blips’ which are due to the router reloads (more details below), but otherwise traffic levels were only slightly affected. The arc you see is unrelated and is part of our daily traffic flow, which naturally drops overnight.

Please rest assured that we dealt with this with the utmost urgency, and let me reiterate that maintaining our reputation for uptime and reliability is of the highest concern to us. We understand that, for many of you, your online business is your livelihood.
For those who are interested, the technical explanation...
The global routing table is a ‘map’ of each possible destination on the internet. Every large network operator (such as ourselves, or the ISP you use to connect to the internet) holds a copy of this, or multiple copies in our case. This is what enables every computer on the internet to reach every other computer.
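As a rough illustration of how a router uses that ‘map’, here is a minimal sketch in Python. The three prefixes and the next-hop names are made up for the example (they are documentation address ranges, not real Vidahost routes), and a real router does this in dedicated hardware rather than in a loop. The idea is simply that the most specific matching route wins.

```python
import ipaddress

# Hypothetical toy routing table: prefix -> next hop.
# The prefixes and next-hop names are illustrative assumptions only.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("198.51.100.0/24"): "peer-a",
    ipaddress.ip_network("198.51.100.128/25"): "peer-b",
}


def lookup(address):
    """Return the next hop for an address using longest-prefix match."""
    addr = ipaddress.ip_address(address)
    # Collect every route whose range contains the address.
    matches = [net for net in ROUTES if addr in net]
    # The most specific route (longest prefix) takes priority.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

For example, `lookup("198.51.100.200")` picks the /25 route over the /24 that also contains it, while an address outside both ranges falls back to the default route.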
Over the past two decades the routing table has been increasing in size, due to new IPv4 addresses being used and existing IPv4 address ranges being split (meaning that 2 consecutive ranges might have different paths). Today it hit 512,000 routes. This is a magic number as it’s an inbuilt limit in many common routers and switches.
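To make the arithmetic concrete, here is a small sketch, again in Python, of how splitting one address range turns a single route into two, and how little headroom is left as the table approaches the limit. The 512,000 figure is the one quoted above; the example prefix and the function names are assumptions for illustration, and exact limits differ between devices.

```python
import ipaddress

# Limit as quoted in this email; actual hardware limits vary by device.
ROUTE_LIMIT = 512_000


def split_range(prefix):
    """Split one address range into its two halves (one route becomes two)."""
    net = ipaddress.ip_network(prefix)
    return list(net.subnets(prefixlen_diff=1))


def headroom(route_count, limit=ROUTE_LIMIT):
    """Number of further routes that fit before the limit is reached."""
    return limit - route_count
```

Splitting `198.51.100.0/24` gives `198.51.100.0/25` and `198.51.100.128/25`, each of which can then be announced with its own path. With a table of 511,000 routes, `headroom(511_000)` leaves room for only 1,000 more.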
We had pre-empted this. Most of our routers/switches already have a higher limit and we have recently spent £250,000 on network upgrades to improve the rest of our network. These had not yet been installed as we believed we had room to spare. However, last night there was a sudden increase in the number of routes being announced to the world and at 9AM we hit the limit.
Due to human factors it took us approximately half an hour to find the cause of the issue. At that point we applied a fix on the only router we believed was affected. This took effect after a reboot (which caused approximately 60 seconds of packet loss, as indicated on the graph) and the majority of people who could not access our network were then able to. However, some users were still reporting problems, so we continued to investigate. We believed the issue might lie elsewhere, as customers were also reporting issues reaching websites such as eBay and Skype. It turns out that another of our Cisco routers had also hit its routing limit, despite the lack of any log entries to indicate this. The same configuration change was applied to that router and, at that point, the remaining people who were still having problems were able to access their sites again.
It seems many other high-profile ISPs also suffered the same issue today and most have now fixed their own networks.
Over the next 2 weeks we will be replacing all the affected routers with brand new Juniper devices which can hold enough routes to cover us for the next decade.