The server upgrade started on time last night, and all the websites are currently being backed up for transfer to the new server.
Unfortunately, we do not know how much longer it will take to complete: it could be minutes, or it could be several more hours.
We suggest that you avoid updating your sites during this time, as any changes you make may be lost during the changeover.
We will do what we can to help eliminate any losses once the transfer is complete, and there should be only very minor disruption.
Your website is still live and running during the process, so there should be little or no actual downtime.
I will keep you updated as we go.
We’re looking good right now for the upgrade to start at 11pm Eastern Time today (09/03/2010).
We have been working on slimming down the amount of data on the server to help the transfer go as smoothly and quickly as possible.
The actual transfer is being done by some techy geniuses at the datacenter where the servers live and so, if they are busy with an emergency for another client, we’ll wait until they’re done and they can give us their full attention.
At the moment though, I haven’t heard anything from them suggesting that the move will not start as planned.
I’ll keep you updated.
Due to the issues we were having with the LAMP server, we have been simultaneously working on two solutions.
The first solution was to work with the Techy Geniuses who maintain the servers to find a fix for the frequent periods of downtime. The fix was implemented a week ago and the server has been performing very well since then, with no reports of downtime.
This, though, was only ever going to be a temporary measure, as we have also been working on the second solution, which was to purchase and test a new, more powerful server and move all of our clients from LAMP over to it.
The new server, which is called Genesis, has been performing very well in our tests and so we are going to go ahead with the move this Friday night.
There should be virtually no disruption to your websites during the move:
- All usernames and passwords will remain the same
- All domain details will remain the same, meaning all internal site links will still function
- All email will be copied across
The only disruption you may notice is that comments on blog posts and emails received during the transfer may be lost, which is why we are doing it on Friday night, to minimise any possible losses.
We are not certain how long the transfer will take, but we are estimating that it will take around ten hours.
We will keep you updated here at day3.co.uk news.
The transfer is scheduled to start at 11pm Eastern Time on Friday, September 3rd, 2010.
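For anyone outside the Eastern time zone, the start time and the rough ten-hour estimate above can be converted with a short script. This is only a sketch: it assumes fixed offsets of UTC-4 for Eastern Daylight Time and UTC+1 for British Summer Time on that date, and the ten hours is our estimate, not a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets in effect on 3 September 2010.
EDT = timezone(timedelta(hours=-4), "EDT")  # US Eastern Daylight Time
BST = timezone(timedelta(hours=1), "BST")   # British Summer Time

# Scheduled start: 11pm Eastern Time, Friday September 3rd 2010.
start = datetime(2010, 9, 3, 23, 0, tzinfo=EDT)

# "Around ten hours" is an estimate only; the real duration may differ.
estimated_end = start + timedelta(hours=10)


def show(label, moment):
    """Print a moment in Eastern time and in UK time side by side."""
    fmt = "%a %d %b %Y %H:%M %Z"
    uk = moment.astimezone(BST)
    print(f"{label}: {moment.strftime(fmt)} / {uk.strftime(fmt)}")


show("Start", start)
show("Estimated end", estimated_end)
```

Under those assumptions, the transfer starts at 04:00 on Saturday morning UK time and would finish at around 14:00 UK time.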
To all of our customers,
Server reliability is our top priority.
As a website owner/operator, your first requirement is that your website is available to your visitors as much of the time as possible. We understand that. We need our websites to be up and available as often as possible too.
Over the past couple of months, we have received a number of emails from clients whose websites have been ‘down’ for a period. These periods of downtime, though fairly short, can be frustrating and disconcerting, and we want to reassure you that we are working to deliver the best possible server performance for you.
The downtime falls into one of five categories:
- Scheduled Downtime. This is time that the server is down for scheduled maintenance/upgrades. We have not had any periods of scheduled maintenance recently and will notify you in advance if we do.
- Nightly Backups. Every night, between 1am and 5am Pacific time, we run an R1Soft Backup of the entire server. This backup enables us to rebuild the server should a serious error occur. It appears that, as the backup runs, it locks the files it is currently backing up to ensure that they are backed up correctly. This can result in 10-15 minutes of downtime per domain. We are working to minimize this downtime while still maintaining complete backups that can be reliably restored from.
- Resource Draining. The physical resources of the server are shared between all of the domains using that server. If a domain is compromised, or its owner runs a script that severely drains the available resources, any or all of the websites can appear ‘down’. Generally, this type of downtime simply necessitates a server reboot or a restart of one or more pieces of software. This is the most common cause of downtime and is a symptom of sharing a server with other websites. Downtime due to resource drains is generally limited to fifteen minutes or less.
- Hardware Failure. The most problematic cause of server downtime is hardware failure. Although we use the best possible hardware in our servers, nothing is perfect and something will inevitably break eventually. In this eventuality, the technicians will try to replace the faulty component but, if that is not possible, will replace the server with a new one and restore from the latest R1Soft backup. The process can take anything from a few minutes for a component replacement to several hours if a restore is needed.
- Site Issues. Frequently, customers’ websites are down not because of a server issue but because of an issue with the individual website itself. While we try to help with these issues, we cannot fix every problem and cannot devote an extended amount of time to troubleshooting client site issues.
Most of the ‘website down’ emails we have received lately have been due to either the nightly backup running or a resource drain issue.
While the actual percentage of the time that our servers are up and running is very high, we understand the frustration that comes from having your website unavailable at different times so we are currently working on the best solution to minimize these issues.
We hope to announce, within the next few weeks, a plan to deliver even better server performance than we do now.
On Monday, November 30th, emergency maintenance will be performed on the core router providing connectivity to a major portion of the network.
Cisco Systems have identified a service-impacting bug in the current IOS version running on this router. A vendor-recommended upgrade has been provided as a resolution and needs to be implemented immediately.
Original Issue Reference:
With the introduction of 4-byte ASNs (Autonomous System Numbers), a newly devised method of injecting a specifically crafted prefix into the global route table allows for the BGP process to be reset on older IOS code.
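For readers curious about the difference between the older 2-byte ASNs and the newer 4-byte ones, the sketch below shows only the number ranges involved. It does not reproduce the bug or the crafted prefix described above, and the function names are our own, chosen purely for illustration.

```python
# Illustrative only: number ranges for 2-byte and 4-byte ASNs.
MAX_2BYTE_ASN = 0xFFFF        # 65535, the largest 2-byte ASN
MAX_4BYTE_ASN = 0xFFFFFFFF    # 4294967295, the largest 4-byte ASN
AS_TRANS = 23456              # reserved placeholder ASN shown to 2-byte-only routers


def fits_in_two_bytes(asn):
    """Return True if the ASN can be represented in the older 2-byte field."""
    return 0 <= asn <= MAX_2BYTE_ASN


def to_asdot(asn):
    """Format an ASN in 'asdot' notation: plain if 2-byte, else 'high.low'."""
    if not 0 <= asn <= MAX_4BYTE_ASN:
        raise ValueError(f"ASN out of range: {asn}")
    if fits_in_two_bytes(asn):
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


def seen_by_two_byte_router(asn):
    """ASN value presented to a router that only understands 2-byte ASNs."""
    return asn if fits_in_two_bytes(asn) else AS_TRANS


for example in (64512, 65536, 196608):
    print(example, to_asdot(example), seen_by_two_byte_router(example))
```

Routers that only understand 2-byte ASNs see the placeholder value in place of any larger number, which is one reason older and newer software have to handle route announcements carefully.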
During this maintenance, customers will notice a complete loss of connectivity between 10:30 PM PST and 11:30 PM PST (Monday, November 30th). While the upgrade is scheduled to last 1 hour, we expect only 15 to 20 minutes of downtime as the routers reload and fully converge.
If you have any questions or concerns regarding the above, please let us know and we will be more than happy to address them.
We do appreciate your patience and welcome any feedback.