Host16 Issues


retep
06-21-2004, 02:58 AM
About 20 hours ago host16 became 'troublesome'.

The data center staff tried to restart the server. But they did not succeed in getting it to power back up.

The hard drives were swapped to our spare server at the data center. We had to overcome some OS issues there (some problem with SMP kernels, uni-proc servers and the server's network controllers).

I installed our most recent and stable 2.6 kernel. And the server started back up.

A few hours later it required a restart. And a few hours after that (just now) it required a restart again.

Looking at the error logs I suspect there is an issue with one or both of the hard drives. I am working to schedule drive replacements.

That may take a few hours. Until then, I fear there may need to be another server restart or two.

So, in summary, you have all new hardware (except the hard drives) and a new kernel. The problems are still occurring. There are errors in /var/log/messages indicating drive problems. I'm working to get new drives.

I'm posting here so that you can follow what is going on and ask any questions you need to. And so we can deal with this in an open manner.

Anonymous
06-21-2004, 03:10 AM
Peter,

Do we expect the server to be up anytime soon?

Thanks,
Ariel

Anonymous
06-21-2004, 03:22 AM
Peter,

It's been my experience in events like this to simply get a new machine swapped in. I have 30 machines in data centers across the country for my customers.

What I've learned is that if you're backing things up, and you feel reasonably confident of your backups -- tell the data center to place a new machine into the mix, and get that up and running.

Downtime is not a problem if it's infrequent. Downtime is hell if it's consistently a problem and makes your application unreliable. I'd rather be down for a long period of time than be down more than once.

I've had problems at ServerBeach with machines, had them swap drives, move things and in the end it was a motherboard... it's best to just get a NEW machine put into the situation to replace the old one. Your SLA at these places covers the hardware -- use that to your advantage.

Gary

retep
06-21-2004, 03:41 AM
Hi Ariel. Your server is back up.

Gary, I had a spare server and swapped to that. So it's all new hardware except the original disks. And I'm working on getting new disks right now.

(Just changing this forum to be registered user posts only. You can register from the Forum drop down menu up top there)

asvb278
06-21-2004, 03:58 AM
Peter,

Thanks for the prompt response.

retep
06-21-2004, 04:13 AM
Looks like it has gone down again.

At this point it looks like the server is not going to stay up for very long. And unfortunately the hard drives are still several hours away from being ready.

I am going to see about moving each VPS to a different server. That will require allocating a new IP to each VPS, but everything else should be exactly the same.

Any thoughts about that (while I'm doing some planning for it)?

retep
06-21-2004, 12:55 PM
We are in the middle of moving your VPSs to their new home, host32. We sent out an email a while ago to everyone with the new IP details. The copying process is not going as well as we would like. The host server stops responding every so often and needs a reboot. Then we have to resume the VPS download.

We are doing a VPS at a time. Currently we have 1 VPS up and running on the new host. And 4 or 5 that are 'nearly there' if we can just get the last bit of data from the old host server.

We are also trying a few different techniques for moving data over, to see which gives us the highest throughput.

retep
06-21-2004, 02:26 PM
We were making poor progress bringing across the VPS images. We are trying to bring across the backups instead.

These seem to be coming across a lot better (and knock on wood, with few crashes on the old host server).

Using the backups will help us significantly reduce the duration of this maintenance window. However, it will mean the VPS images are up to one week old.

retep
06-21-2004, 04:55 PM
The new method is proving more successful. I've downloaded most of the file systems. The ones still coming down are the larger file systems. They are, per the Move # in the email I sent to people: 23, 5, 10, 13, 21, 19 and 9.

The next setups I have queued up are Move #'s: 3 1 6 4 12 14 2 8 18. I will send out an email to affected people when they are up and running.

carl
06-21-2004, 06:47 PM
The following setups have been completed: 24, 18, 8, 5

retep
06-21-2004, 10:55 PM
For the customers that we needed to go back to the backups, here are the dates the backups were taken. The Move # will be per the original email we sent out.

Move# Date
1 17 June
2 14 June
3 15 June
4 15 June
5 14 June
6 14 June
7 20 June
8 14 June
9 14 June
10 15 June
11 15 June
12 20 June
13 14 June
14 16 June
15 14 June
16 15 June
17 15 June
18 18 June
19 18 June
20 15 June
21 19 June
22 17 June
23 17 June

I believe Carl is working on the last 4 moves now.

retep
06-22-2004, 03:18 AM
OK. The last handful of file systems are coming across now.

We had a problem with the larger ones. We would rsync them over to the new host. But host16 would die. After a restart we would resume the rsync. But the checksumming it did to see if anything had changed took too long (i.e. host16 would die before the rsync completed).

So now we're using the swiss army knife of unix commands: dd:
ssh host16 "dd ibs=1 if=/filesystem.tar.gz skip=961977856" >> filesystem.tar.gz

Where 'skip' is where we are resuming the download. This approach is working well. So we are finishing up the last few file systems now.
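A rough sketch of how the 'skip' value can be worked out, for anyone wanting to try the same trick: the resume offset is just the number of bytes that already arrived. Everything below is an assumption for illustration. Local temp files stand in for host16 and the tarball, and the file names and sizes are made up. In the real case the dd would run on the far end of the ssh, as in the command above.

```shell
# Sketch only: local files stand in for the remote host.
workdir=$(mktemp -d)
src="$workdir/source.bin"    # hypothetical stand-in for /filesystem.tar.gz on host16
dst="$workdir/partial.bin"   # hypothetical local partial download

# Make a 200000-byte test file and pretend the first transfer died at 123457 bytes.
head -c 200000 /dev/urandom > "$src"
head -c 123457 "$src" > "$dst"

# The resume offset is the size of what we already have.
offset=$(wc -c < "$dst" | tr -d ' ')

# ibs=1 makes 'skip' count single bytes; append the remainder to the partial file.
dd ibs=1 if="$src" skip="$offset" 2>/dev/null >> "$dst"

# Check that the reassembled file matches the source byte for byte.
cmp "$src" "$dst" && echo "resume ok"
```

Reading one byte at a time is slow on big files. GNU dd should be able to do the same with a larger block size via iflag=skip_bytes, but check your dd's man page first as not every version has it.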

Then, if you request it here, I can restart your old VPSs on host16 in order for you to grab any files you may need from it.

Flooda
06-22-2004, 02:22 PM
lxsplit (http://packages.gentoo.org/packages/?category=app-misc;name=lxsplit) works well for splitting too, although it's mostly used on usenet for trading large files. It's multiplatform too, except on win/dos it's called hjsplit.

dub
06-22-2004, 04:18 PM
What's the progress guys?

Jeff Mincey
06-22-2004, 07:22 PM
I wish to register my request for time on host16 to salvage some files which didn't make the trip on the VPS backup. Now that VPS transfers are completed I hope host16 will be a bit more stable and perhaps remain running long enough to allow for an scp or sftp operation. We shall see.

I gather this will call for some synchronization between the client and Rimuhosting, and perhaps clients who have this need would want to schedule a specific time.

Peter, please advise how you want to handle this -- whether by creating a concrete schedule or by some other means.

Thanks.

dub
06-22-2004, 07:34 PM
With all due respect Jeff, though your point is still very valid, if the transfers were completed then my avatar would load. I imagine I am not the only one still queued.

Jeff Mincey
06-22-2004, 07:47 PM
To dub, my apology; apparently I am mistaken on the sequence of things. Let me then rephrase to say simply that once all VPS's have been successfully transferred from host16 to their new home, then I should like to have a stab at salvaging some files that slipped between the cracks. I'm sure I'm not alone in this wish, but of course first things first.

I certainly don't want to do anything on my old VPS on host16 which might add to its instability and disrupt the very fragile operations Peter and his crew are undertaking now. After all, they are having enough problems as it is. I just thought perhaps this forum could be a place to document the need for follow-up on host16 -- so Peter could gauge the scope of that need and how many clients would be interested in that.

retep
06-23-2004, 08:31 AM
Hi Jeff. Thanks.

I've created a startup file that will restart selected VPSs after I power cycle the server. Yours is up now (refer to your original setup email if you forget the IP address).

host16 (soon to be renamed host666) is still requiring a reboot every 30 minutes or so (if it's active during that time).

I've got one customer who I'm still transferring over (thanks for being so understanding flooda!). Because of that I'm monitoring its status and restarting it when necessary.

FYI, rsync is a really cool and efficient way to grab your data.

E.g. on your 'new' server run this to copy over a directory:

rsync --rsh=ssh --compress --recursive --partial oldip:/some/directory .
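Since host16 may drop out partway through a copy, a small retry wrapper can save some re-typing. This is only a sketch: the function name retry and the MAX_TRIES and PAUSE settings are made up for illustration and are not part of rsync.

```shell
# retry CMD...: rerun a command until it succeeds or we run out of attempts.
# retry, MAX_TRIES and PAUSE are illustrative names, not rsync options.
retry() {
    max_tries=${MAX_TRIES:-20}   # give up after this many attempts
    pause=${PAUSE:-60}           # seconds to wait between attempts
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$max_tries" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$pause"
    done
}
```

It would be used by putting retry in front of the command above, i.e. retry rsync --rsh=ssh --compress --recursive --partial oldip:/some/directory . This probably suits directories of smaller files best, since each rerun has to re-check what has already been copied.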

retep
06-24-2004, 01:16 AM
OK. Everyone is off host16 now. Phew. I've started up a couple of VPSs per their owner's request. If anyone else needs something from their old VPS, let me know and I'll start it up as well.