Has anyone on Host155.rimuhosting.com been able to recover the data rimu lost?


ericds
08-24-2007, 06:20 PM
Rimu decided not to post this publicly on the server maintenance page, which I find disappointing: http://rimuhosting.com/maintenance.jsp?server_maint_oid=43396931

But they lost all the data on Host155 and we were in limbo for close to 24 hours before we were told that our data was gone and that they set us up with a fresh vps.

From a Rimu email:

"We hit a problem with your host server, host155.

This problem has resulted in us losing your VPS filesystem data. We will be doing a clean reinstall of your server.

The details are as follows...

== Problem Details

Host 155 had stopped responding. After a restart it gave out machine check exceptions. This indicates a hardware problem. We had not seen any indications of problems up to that point. e.g. no disk problem email alerts, no recent restarts.

We swapped the disks into a completely new server. That server refused to see some of the disks/took an age to boot up/ended up with unusably poor disk IO.

We tried putting the disks in external usb enclosures and reassembling the raid array. It did not take (mdadm refused to assemble the disk, when we forced it we just got ext3 errors).

For our host servers we have a raid array and a separate backup disk. We tried getting the data off of the backup disk by mounting it internally in the new server and via a usb enclosure. We had no success with either method. We were getting hardware errors on it.

It seems like some issue has therefore resulted in breaking a disk from the raid array _and_ the backup drive at about the same time.

At this point we have no alternative method of getting back your VPS filesystem."

They are giving us one month free and a 64 MB memory upgrade, which seems like a joke after losing all of our data and the extended downtime.

Has this ever happened to anyone else? Has anyone on host155 gotten help recovering their data? Is there anything we can do? Should we continue to use Rimu?

Thanks, Eric

timharig
08-25-2007, 01:45 AM
I would agree that it was extremely poor form not to have announced this publicly. It has been my observation that they do not post maintenance notices for a lot of things that otherwise go unnoticed. It rather confirms my suspicion that Rimuhosting is becoming large enough that it acts like a large, money-hungry conglomerate, and I am afraid that their customer service will fall off as they grow. It is my fear that they are overextending themselves and tending to put out fires rather than prevent them from happening.

My experience with RAID controllers is that they tend to take everything with them when they go bad. What bothers me is that Rimuhosting keeps its backup disk in the same machine that it uses to host. I consider this extremely poor form. One might have assumed that backups would be stored on some NAS in the local data center. The VPS hosts could also back up to other VPS hosts. Having data backed up on three different hosts in a rotation would lead to decent data integrity. It is also quite possible, given that Rimuhosting operates out of four data centers, that they could do periodic off-site backups to one of their other data centers.
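To make the rotation idea concrete, here is a rough Python sketch of how a backup target could be picked each day. It is only an illustration of the scheme described above, not anything Rimuhosting actually runs. Every host name other than host155 is made up, and the function name `backup_target` and the `copies` parameter are my own assumptions.

```python
from datetime import date


def backup_target(source_host, hosts, day, copies=3):
    """Pick the host that receives the backup of source_host on a given day.

    The targets are the `copies` hosts that follow source_host in the
    list (wrapping around), so a host never backs up to itself. The day
    number selects one of them, so any `copies` consecutive days cover
    every target once.
    """
    if copies > len(hosts) - 1:
        raise ValueError("not enough other hosts for the requested copies")
    start = hosts.index(source_host)
    ring = [hosts[(start + 1 + i) % len(hosts)] for i in range(copies)]
    return ring[day.toordinal() % copies]


# Hypothetical host list; only host155 comes from this thread.
hosts = ["host153", "host154", "host155", "host156", "host157"]
for d in (date(2007, 8, 22), date(2007, 8, 23), date(2007, 8, 24)):
    print(d, backup_target("host155", hosts, d))
```

With a layout like this, losing one machine (RAID array and local disk together) still leaves copies on other hosts that are at most a few days old.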

I personally find the downtime to be even more inexcusable given the technologies that they are working with. These are not traditional shared accounts, so I can well understand if they feel backups are the consumer's problem. I even tend to agree with them to a certain extent, where it does not affect downtime. Downtime, on the other hand, is totally out of our control. Given that they are using Xen, which makes it easy to quickly move VPSs (in some cases while they are still running) to new hosts if problems are encountered, I tend to find it inexcusable when these problems arise.

Ideally, when all of this started, and before they ever decided to restart the host server, they could have loaded your VPS backups (remember that these should not have been on the affected machine) onto spare hardware, or distributed them across hosts with extra available space, in preparation for things going wrong. This could be done lightning fast if the backups were already distributed between different hosts where they could be run. Then, when the problems were encountered, they could have redirected all of the affected VPSs to the backup instances of those VPSs. When they notified you of the circumstances, you could have updated their backups with whatever backups you had made since their last backup was taken. In the worst-case scenario the systems are only down a few minutes while they are being redirected. Having a slightly out-of-date VPS running is better than having a VPS out for 24 hours.

Having off-site backups not only makes the backups more secure from large disasters, but also extends this concept to downtimes that affect entire data centers. Having an out-of-date VPS running in another, maybe less optimal, location is also better than having a VPS down for 24 hours.

I have watched several of these problems that Rimuhosting thought would be simple to fix cause outages that lasted for a day or more. Sometimes these problems don't just cause blanket outages; unresolved problems cause the systems to be taken up and down several times over long periods until Rimuhosting can get them resolved. These repeated outages can be just as disruptive as a single long one. With a little bit of careful planning, many of these downtimes can be avoided.

Again, I am very sorry that these problems have affected you and the other host 155 VPSs.

retep
09-05-2007, 10:06 AM
To put it mildly we consider losing customer data to be about the worst thing that we can let happen. In RimuHosting's 4 years of operation I cannot recall a problem of this severity occurring before. Not one that has defeated both our raid array and backup disk.

We offer our deepest apologies on this.

We operate several hundred servers. Statistics and Murphy's law make it inevitable that from time to time there will be problems. We have some very smart sysadmins who work hard to ensure that any outages are kept to a minimum and are resolved as quickly as possible.

We have a maintenance/outage system that lets us record outages for individual systems and notify affected customers so they can be aware of the problem and kept up to date.

ekerin
09-10-2007, 06:10 PM
I assume the backup FTP space we can use is on a different server than our normal VPS(s)?

retep
09-10-2007, 10:20 PM
The server we use for the backupftp site is in the NY data center we use. It is on a separate server from any shared VPS host server.

i.e. yes, it is on a different server from any customer's server.