11/27
Confuci.us has been up and stable for
17 days, 19:01:35 as of 6am today. I will be off island until
12/12, available by cell phone and email most of the time.
11/13 6PM
No news actually is good news! Everything seems stable [very few exceptions] at
Navisite and we're moving forward getting all databases working and moving over
SSL certificates etc.. on Advanced Confucius...
Q & A about the new server, click here.
Monday 8AM
All but two accounts have been imported into the "new" new server. I am focused
today on continuing to try and make the new server as user friendly as possible
so that your transition will be as un-traumatic as possible... I am also
trying to write or gather documentation on how to do some of the more common
tasks [like getting into webmail, getting into your control panel etc. ...]
You will be pleased to note when you do get switched over that I am vastly
increasing the sizes of your accounts, increasing your monthly bandwidth quota
and offering more features [like manual backup and restores you can configure]
etc... We seem to be stable up at Navisite, which, despite last week's abysmal
failures, is actually a stable company. I am going to take advantage of that to
contact everyone directly later today and apprise you all of this update
[there are many that aren't aware of Pono.us]. I
will also give everyone an idea of how the transition from Navisite will take
place and what to expect, including how to retrieve mail that may be on the
Navisite server after the transition...
If your site is down or you are having any problems please
call me and let me know ASAP.
I am here in the office 808-878-3625
Gill
12:45 Sunday
[Wow, I wish my sites did this
well all the time. This page has had 44,600+ hits in 9 days]
Click here
to see if your user name is on the list of clients that have been
successfully
imported into the new server at Liquidweb.
6AM Sunday
Almost everyone appears to be up at Navisite, at least everyone I have had time
to check is... I have gotten perhaps 80%+ of all accounts moved over to the new
server. Am working with tech support there to make the environment as similar as
possible... Will get back to you later this morning..
8:50 PM 11/10
I am continuing to import everyone's data into the "new" new server. So far I
would guess I have 75% of your domains. In some cases I have already switched
the name servers and your pages will be coming on line tomorrow. I will get
an early start tomorrow and pick up where I left off... We are very much on
target to flip the switch for everyone Monday, perhaps even tomorrow...
1:40 pm
click here
[no longer] to
see a list of user names of accounts that have successfully imported into the
new server and are being configured for access...
10:30 AM Sat.
My copy process is going very very well. As accounts arrive I am switching the name servers up at the registrar [if I have that info] to point to the "new" new server and people are coming on line! Quite a few sites are down again at Navisite and I am on hold awaiting help...
8:45 AM HST
It is working! I have been working with an import script
on the new server trying to get it working to import all of my accounts, user
names, passwords, databases... etc. in one fell swoop and it is working!! This
is the biggest/best news in the past 12 hours! If this process takes all
day today and tomorrow, which it is likely to take, we can count on being on the
"new" new server about an hour after it completes!!
7:30 am Sat.
About 10% of us are down. I am goosing everyone I can find to get those on-line.
I am also making progress on the new server. I can't wait to get out of here!!
4:50 pm 11/9
We've been up and solid for about 4 hours. I am diligently working on the new server, creating accounts for us all and moving data over as quickly as possible. At the speed the downloads are going it may take most of the weekend to get everything for everyone onto it, but I am targeting Monday [sooner if possible] to flip the switch over to the new machine. I am configuring the new server to be as close as possible to the one you've become familiar with, setting things like webmail up to work the same way, etc... If you have specific needs call or mail me at gillybird@hawaii.rr.com
12:15
We are up again, amazing what you can get done for a case of Vodka! I am hoping,
besides doing their jobs right, that might be an incentive to keep us up!!
I am working to transfer clients over to the new server as we speak. Thanks for your patience.
11:5 am Friday
It was a long grueling week and I was [WAS] thinking it was over. Alas, it is
not over yet ;{
I hope to speak to each of you as soon as I can, thank you for your
understanding, or at least trying to understand…
5AM Friday
I am so sorry to awake and find us down. The new server has just been completed
and is available. I am opening accounts for everyone on it right now and will
begin moving people onto it today! Please be patient, I hope we are back up soon
on Navisite but as soon as I can move each of you I will be doing so... I am
available by phone, but it slows me down so don't call unless there is something
you need immediate help with.
Thank you
4 PM HST
Email
Here is my understanding of what you can expect. Email is sent out by someone to
you. The sender's email server [usually their ISP] is configurable, but most
servers tend to keep retrying mail that cannot be delivered for 5 days. At the
end of that 5 days it will send a bounce message to the original sender saying
that it has tried to deliver the message for five days and failed. Hopefully
that will inspire the original sender to resend. Mail that was sent during that
5 days, while it may not be sitting on your account right this minute, may well
be redelivered from the sender's ISP at any time [up to 5 days]. Different
ISPs will resend undelivered emails at different intervals...
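To make the retry window described above concrete, here is a minimal sketch in Python. The 5-day limit comes from the explanation above; the 15-minute first retry, the doubling of the wait and the 4-hour cap are assumptions chosen only for illustration, since every sending server is configured differently. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def retry_schedule(first_attempt, give_up_after=timedelta(days=5),
                   first_delay=timedelta(minutes=15),
                   max_delay=timedelta(hours=4)):
    """Return the times at which a sending server might retry delivery.

    The intervals are illustrative assumptions, not any real server's
    defaults.
    """
    attempts = []
    delay = first_delay
    when = first_attempt + delay
    deadline = first_attempt + give_up_after
    while when <= deadline:
        attempts.append(when)
        delay = min(delay * 2, max_delay)  # wait longer each time, up to a cap
        when += delay
    return attempts


def still_retrying(sent_at, now, give_up_after=timedelta(days=5)):
    """True if the sender's server should still be retrying this message."""
    return now - sent_at < give_up_after
```

Under these assumptions, a message first sent at noon on 3 November would still be inside the retry window at noon on 7 November, but not at noon on 9 November, which is why asking people to resend anything important is the safe course.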
Regardless of the fact that mail from days past may well
still come in, I want to encourage everyone to send email to all persons who may
have tried to email you in the past 6 days and ask them to resend anything
important.
This experience has taught me a lot
of things, not least of which is that I don't have off network [non-your domain]
email addresses for many of you. Many of you learned how important such an
address is as well. So, if you haven't got one, get one [gmail, yahoo, road
runner or Hawaiiantel...] If you suspect I might not have an off-network email
for you please email me back and let me know by clicking here:
offsite@confuci.us and
just writing your alternative email address in the body of the message. I want
to update everyone's accounts.
Please remember that we are still in QA and have not yet
passed final testing.
But, if I may be allowed an expression of relief,
WHEW, it's about time !!!
2:15 PM HST
Confuci.us and therefore your domain is complete and is being moved to QA
[Quality Assurance]. When we are through the quality assurance phase they move
us to client/management and user acceptance testing... Even though the server
may be up through that whole process it may take many hours before it is
pronounced 100% available and fully on line.
Some email may start trickling in at any point; when we get out of QA we're
all hoping for a flood of emails!!!
Please understand that email may come in completely out of sequence, if you
start getting things from yesterday, don't presume that things from earlier
won't be following...
I hope this is a solid return, either way I am positioning the new server to be
available ASAP, if not as a permanent home then as a mirror and immediately
available backup site so there is no downtime ever again!
Please, if you weren't aware of it, as we go through final testing, see http://pono.us
for off-site updates that I am going to keep updating for the foreseeable
future.
This isn't over yet, but the end has come back into sight at least..
Gill
1:54 PM HST
When we are through the quality assurance phase
they move us to client/management and user acceptance testing... Even though the
server may be up through that whole process it may take many hours before it is
pronounced 100% available and fully on line. At any point through this process
email may start flowing. I have already started getting a few dozen... Not until
we are fully up will everything come through...
11:40 HST
I just got 3-4 emails and can trace route into my
boxes! No pages yet, but the first pulse I have heard in DAYS!!! Confuci.us is complete and is being moved to QA [Quality Assurance]. Some email may start trickling in at any point; when we get out of QA we're all hoping for a flood of emails!!! Please understand that email may come in completely out of sequence; if you start getting things from yesterday, don't presume that things from earlier won't be following...
11:20 HST
Their top Unix guy is working on my box as we speak. That said, absolutely
nothing else has changed ;{
From Navisite at 2:45 EST:
We are on track bringing up more servers on-line and approximately 90% of
servers are currently up in the environment. All remaining servers have been
touched and we have a status on them within our analysis.
9:15
They have "fast-tracked" my server, there are now 4 guys working on my
case!
Strike that last post, it is 8PM, I am still on the phone with the guy in charge but so far we can't get into my box directly at all... keep the faith, but, I am glad the other server is being built today!!
6:15 AM
I believe we may be back up by 8AM if this keeps going as it is, I am hearing back from someone every 15-20 minutes!
4 AM 11/8
I am on the phone with someone from Navisite who is acting as an intermediary
for someone actually working on my case. I am hearing live updates coming from the
technician into him... I have a real sense of movement. I am afraid to leave my
desk for a coffee...
6 PM HST 11/7
I have bought a new server! After several days of research and after Navisite's 'Failure to Launch' I have found a new company, Liquidweb, that can build a server for me that meets or exceeds all my/our needs. It will be coming on line tomorrow sometime. Immediately thereafter I will start populating it with everyone's accounts. At that point you will all get emails explaining how to upload to your new location, or at least giving you the details of the account [as I will be doing the uploading]. Once that process is underway I will change my name server information up at the registrar. What this means is that all of your domains are currently set to go to NS.CONFUCI.US and NS2.CONFUCI.US, and your various registrars are sending all of that traffic to my hosting account at Navisite [formerly Alabanza]. Once I make the change, all that traffic will forward instead to Liquidweb, where the new server will be waiting.
Our first hope continues to be that Navisite will get us on line and stable before I change the name servers over to Liquidweb. If that is the case we will stay there. That would result in no lost email and nothing but a bad memory. But put aside your concerns about Navisite, as I will then set the new server up as a mirror of Navisite and split the name server information so that your registrar will send people to Navisite and, if that isn't available, they will automatically be sent to Liquidweb. That process will be transparent to everyone and take virtually no time longer than a normal hop to any website.
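The "primary first, mirror as fallback" idea can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not how name servers or registrars actually implement it; the function names and the host names in the usage note are hypothetical.

```python
import socket


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(servers, check=is_reachable):
    """Return the first server, in priority order, that passes the check."""
    for server in servers:
        if check(server):
            return server
    return None  # nothing answered
```

Calling `pick_server(["primary.example", "mirror.example"])` would return the primary whenever it answers on port 80 and the mirror only when the primary does not. The `check` parameter is there so the ordering can be tried out without touching the network.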
This doesn't get us completely out of the woods. If we do have to move to the new server there will be the downtime that it takes for the actual move and name server propagation. There will also be some downtime for getting databases, licenses and SSL certificates transferred. I know this doesn't affect most of you, but those of you who use such things will have that additional delay in full restoration, as we will have to wait for Navisite to come back on line for me to get access to that data and move it.
For those of you who believe in the powers of mind or the powers that be, please speak to those powers about getting through this with the least resistance, notably Navisite getting its act together before we actually pull the plug...
Again, I would like to thank all of you who have been so supportive. This move could not have become a bigger disaster nor come at a worse time for some of you and I can't tell you how much I appreciate the restraint many of you used when you were telling me how disastrous this was for you. I felt your anguish and was going through plenty of my own so I truly did understand.
I will continue to be on the phone and on line with them overnight tonight and tomorrow and will update this page as details or progress become available.
2:30 pm 11/7
Aloha,
While there is currently more hope that we may be
up today I feel a need to draw a line in the sand and be proactive about getting
us back up elsewhere. To that end I am currently opening a new
account at a different company and preparing to move us tonight to another
location. That said, I am hoping and praying that they can get us back up today
before 6PM HST.
If they can get us up then all the
email should come streaming back in and all would be well. If not there will be
another delay of 24-36 hours to get the DNS changed from Alabanza [my previous
parent company] to Liquid Web [the new location]. A potentially bigger
concern is email. What is waiting on the server will be
available as soon as Navisite gets us back on line, even if we initiate a move
to a new hosting company you will be able to access your old site via its IP
address. If we do have to move I will be gathering up all of everybody’s data,
mail, web pages and databases and making it available as soon as I can.
This move means we will have
to upload your web sites to the new location, as much as humanly possible and
for those of you whose web sites I have here on my computer, I will upload those
immediately. Those of you who have developed your own web sites or used other
designers, please contact them as soon as possible and advise them that they
will need to upload the site as soon as I can provide you with the new details…
I am expecting a call from a senior networking guy
at Cisco in less than an hour who may be able to help. That said, I am still on
the phone with other hosts [14-16 hours a day] and still working with the
Navisite technicians, as they are available, to try and bring us back on line.
Our best possible hope is that we come back up
where we are. Navisite is a good company and those sites that have come on-line
have been performing better than ever…
Gill
808-878-3625
3 pm EST from the top
At this point we have made significant progress eliminating issues that have
been affecting all clients. Both name server issues as well as high network
utilization have been mitigated. For the past 24 hours we have been
focused on individual server configuration issues so that we are able to bring
up complete client environments. We are making steady progress with this effort
and the overall momentum is picking up.
It is worth noting that for the clients that have been brought back online, we have been hearing that the performance of the environment has improved vs. the Baltimore environment. This was part of the rationale for the migration.
Just spoke to my account rep and to senior support. I am in direct email contact with one right now and am moving as quickly as I can towards a resolution... Am feeling more optimistic than I have for days...!
All of those servers that we have selectively validated and brought back on line are operating efficiently. For those that we have brought back on line, we are getting feedback that their overall performance levels have improved significantly – as our teams continuously are bringing an increasing number of customers back on line.
I am continuing to research a company that can host us all,
this outage is not only unprecedented it is outrageous.
Some of the sites that were up yesterday are down and we still are not up yet!
November 06, 2007 (Computerworld) -- Approximately 165,000 Web sites have been offline since Saturday, thanks to a failed data center migration involving Andover, Mass.-based Web hosting company NaviSite Inc.
The problems started Saturday when NaviSite attempted to migrate and replace hundreds of servers operated by Baltimore-based Alabanza Corp., a Web hosting company acquired by NaviSite in August.
According to NaviSite spokesman Rathin Sinha, NaviSite decided to physically move 200 of the 850 servers operated by Alabanza to NaviSite's data center in Andover and then virtually migrate the data from the rest of the older servers to new boxes, also in Andover.
NaviSite let its customers know that their sites would be down for a while on Saturday, with the migration expected to be finished that day, Sinha said. But when NaviSite attempted to transfer the data from the 650 servers still in Baltimore it ran into a number of synchronization failures that kept multiplying.
As Saturday progressed, NaviSite realized it would probably miss its completion deadline; as a result, company officials decided to physically transfer another 200 servers from Baltimore to Andover to help reduce the scope of the virtual migration and speed up the data transfer.
But then NaviSite ran into more problems. As the hosts came up, their URLs did not, so although customers could access their Web sites from their IP address, they could not do so using their URLs, Sinha said.
"That was unanticipated," he said.
As NaviSite tried to solve that problem, the network became overloaded because of all the customers trying to get online, Sinha said. "What happened was first the URL could not match with the IP address and then IP did not match with the machine, so it took some time, and all this time we have a highly trafficked overloaded network," he said. "If there is one little problem, they multiply because there is a lot of dependencies."
Although Sinha said a "big chunk" of sites are back online, he could not say when everything might be back to normal. He also couldn't say how much this failed migration would cost -- NaviSite is a publicly-traded company.
To put it mildly, one of NaviSite's customers, Cynthia Brumfield, president of Emerging Media Dynamics Inc., an analyst firm in Washington, seems to be furious.
In an interview, Brumfield said she's going into her fourth day without access to her Web sites. And she said she doesn't believe the way NaviSite is spinning the story. While NaviSite said it has brought a large number of Web sites back online, she claims it hasn't.
"According to people who have talked to NaviSite's tech personnel, they were ill equipped for the relocation and ignorant of how to accomplish even basic tasks," she said in a blog post. "At this point, NaviSite's poorly planned data center consolidation has slipped from mere incompetence to outrageous indifference to its customers' needs and should be grounds for legal action, if not government sanctions of some kind."
Brumfield said that, in effect, NaviSite yanked the servers for 200,000 Web sites, put them on trucks and then didn't know what to do once the servers arrived in Andover. "But what's worse, NaviSite had informed its clients of a completely different timetable and process for the server relocation than the one implemented," she said in the blog posting.
In the interview, Brumfield said that because all of her backup files are also stored on Alabanza's servers, she has no choice but to hop on a plane to Boston on Wednesday and drive to Andover to retrieve her data. And she said she's bringing a video camera with her to document NaviSite's response to her request.
This is the most trying business thing I have ever gone through. I just got off yet another Bridge call with little or no satisfaction. They are working, as always, as hard as they can and lots of people are coming back on line... Just not our turn yet.... I very much appreciate your support and fully understand if you can wait no longer. I will be glad to help you move on to another server if you wish. Call me at 808-878-3625 if you would like to do this. That process takes 15 minutes to set up but between 2 and 72 hours for that info to propagate around the world…
The primary concentration in resolving outstanding issues has been to bring up all of the name servers - as we continue to collaborate with Cisco to handle Address Resolution Protocol (ARP) issues. As we were bringing more name servers online, ARP requests were overloading the network. As a result, we have been focusing squarely on ensuring the overall stability of the network.
Our two primary focuses continue to be bringing all of the name servers back online as we collaborate with Cisco to correct high levels of ARPs on the network. As more hosts were being brought on line, a significant number of ARP requests were overloading the network, so we were tasked with resolving stability issues.
We are committed to providing you with the latest information on the migration and will provide additional updates on this page - as they become available.
11/06/07 9:48 AM HST
I am currently on a bridge call with the head techie and VP
of network ops of Navisite.
It appears the problems are name server [NS] and routing related and while they
are working on them it is just taking a lot of time... They are more or
less promising us that by end of today we should all be on and stable. I wish I
were confident this was realistic.
11/06/07 8:25 AM HST
Our two primary focuses currently are to bring all of the name servers back online as well as collaborating with Cisco to correct high levels of ARPs on the network. As more hosts were being brought on line, a significant number of ARP requests were overloading the network, so we were tasked with resolving stability issues.
Overnight, we were troubleshooting and resolved name server issues as many more hosts were being brought on line. This required a network configuration change in the midst of resolving specific DNS issues.
We have continued to hand off completed environments to clients – at a much faster pace. We are doing this across multiple environments so that we are satisfied that standards for Quality Assurance (QA) have been met. Routing issues were resolved by assigning dedicated engineers on a client-by-client basis to support the process.
A full complement of engineering resources have been committed to completing the overall migration and have been working around the clock to resolve overall service issues and to resolve specific problems. We did have some improved service-level commitments in mind that have included:
Again, we regret the circumstances and apologize for significant inconvenience and remain positive that the long-term decision to migrate will ultimately be in our collective best interests.
And, we will continue to update our progress.
11/06/07 6AM HST
I have been on the phone since 4:30 and no movement. I have about 15% of us back up. No immediate commitment...
11/06/2007 9 a.m. EST
We just resolved the network routing issue that was referenced in the last
update. We will now start working on bringing all of the name servers back on
line as well as working with Cisco to correct the high levels of ARP'ng on the
network. These two items are the primary success criteria in bringing the
remaining hosts on line.
11/06/2007 8 a.m. EST
As we were bringing more and more hosts on last night, we were creating a
significant number of ARP requests on the network. The number of ARP requests is
primarily due to the fact that we were still resolving name servers issues in
part of the environment. The number of ARP requests ultimately overloaded the
network which is why you were seeing instability.
In the process of troubleshooting the over utilization issue with Cisco, a
routing error was made which has caused most of the environment to be
inaccessible. We are working to reverse this change out with Cisco and will then
work both the name server and ARP'ng issue.
Ok, OK, I am back. [10:30 pm] I saw a glimmer of hope from the living room
[about a dozen sites just came up] and realized I should give you all something
to try so you can partake in the fun. {ha}
Here is one of the primary tools I have been using through this process,
VisualRoute. You can download it here [directly from me]. This is a 15-day
trial; download it and double-click the file to install it. When complete, open
the program and type in your domain name or your IP address, then click Start.
The program will trace the route to your actual
domain from your ISP. It will graphically lay it out both on a map and below
that, on a chart. Here is the challenge, repeat the process until your domain
shows up in a pink/coral colored box, when it does, the odds are 98% that your
website will then come up in a browser and that your email will work [I don't
know what percentage of likelihood the email functionality might be]. I used
this tool to get more than 90% of you back on line this morning. This doesn't
mean it will work for sure in the same way, but it did with the issue this
morning [Monday at 5-7AM] as it resolves a firewall & DNS issue that was
occurring, so what's to lose?
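For anyone who would rather not install anything, a rough equivalent of the "is my domain reachable yet?" check can be sketched in Python. This is not what VisualRoute does [it traces the whole route and maps it]; it only checks whether a domain name resolves to an IP address and whether the web [80] and mail [25] ports accept a connection. The function name, the choice of ports and the report layout are all assumptions for illustration.

```python
import socket


def check_site(domain, resolve=socket.gethostbyname, ports=(80, 25),
               timeout=5.0, connect=socket.create_connection):
    """Report whether a domain resolves and which ports accept connections."""
    report = {"domain": domain, "ip": None, "open_ports": []}
    try:
        report["ip"] = resolve(domain)
    except OSError:
        return report  # the name does not resolve yet
    for port in ports:
        try:
            conn = connect((report["ip"], port), timeout)
        except OSError:
            continue  # refused or timed out; try the next port
        conn.close()
        report["open_ports"].append(port)
    return report
```

Running `check_site("yourdomain.com")` with your own domain in place of the placeholder returns a small dictionary; an empty `ip` suggests the name is still not resolving, while an IP with no open ports suggests the name resolves but the server itself is not answering.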
Sorry folks, it is only 9pm but I have been cut off of the last Bridge / 'tele-conference' of the day and they won't pick up until Tuesday AM EST. We are going into 72 hours in a bit and I can't stay awake for more broken promises. I just left a conversation with the VP of network operations at Navisite; he anticipates 95% of servers to be on-line by dawn EST, the next 4% by end of business Tuesday and the remaining few, well, he didn't give a time table.
In truth, about 10-15% of Confuci.us has come up again in the last 25 minutes... I assume, if you are reading this, you are not in that group. Please, be patient. If you cannot,
I am taking calls 24 hrs. If you wish to move to a different host I will help you do that any time you call. Please understand that the process takes a minimum of several hours and an average of 18 hours or so. That said, being 'down' for +/- 72 hours is COMPLETELY un-acceptable and I fully understand. Many of you know how to move to another server yourselves; if you decide to do so, please let me know you have made that decision. I would like to communicate directly with you. For those of you willing to stick it out with me, I totally appreciate your support and assure you that you will be compensated for your losses [and your allegiance]. I assure you I feel your pain [more than you can imagine].
I have to crash, please call if there is anything I can address that isn't covered here and check back in tomorrow. I hope to be back with an
update by 4 - 5AM Tuesday HST, but exhaustion may win the fight...
Confuci.us is down again
We are making continued progress in troubleshooting specific issues and bringing up individual hosts - as we fix them. The root cause of the DNS issue that we have been experiencing - and which has prevented us from bringing up the additional hosts - has been identified.
We will be making a network configuration change at 11 p.m. EST that will take approximately 15-30 minutes to resolve the issue. We expect this will solve both email and name resolution issues that have been identified, and we will continue our troubleshooting efforts.
The many stages of "server-downess"
I have now passed through embarrassment, through anger, I've left frustration long behind. Now I am going through a little self defense mode.... So, in defense of Confuci.us, in the past 3 years Confuci.us has been down for a total of 19 hours [ Sept. 18th 2003] . There was a fire in the building and the fire department shut down the entire block. Immediately after power was restored my parent company purchased $200K worth of generators and removed themselves from the grid. There was another fire, also coincidentally in Baltimore, that torched 3 trunks on the internet in a tunnel about 10 miles from Confuci.us. We were down, along with several million sites, for 3-4 hours.
This move represents the single biggest "data center move" failure in internet history. It is affecting +/- 220,000 web sites...
Our engineering staff has been working all day to resolve the remaining issues
preventing all sites from becoming live. We continue to engage our full
complement of engineering and support staffs to troubleshoot the balance of the
issues and expect that all sites will become fully available shortly. We will
provide further status reports as they become available.
Again, we want to extend our thanks for your
continued patience - as we move rapidly to finalize the migration. We are
continuing to hand off completed environments to clients, and now at a faster
pace – as we are addressing multiple environments by resolving issues through
the Quality Assurance (QA) process.
We are assigning dedicated engineers on a
client-by-client basis for thorough Quality Assurance, and then handing the
server environment back to the client. This is going to be our focus for the
next couple of hours.
On a more specific note, a few of you have indicated
that you are having routing issues through the 207.x.x.x address space to your
environments. This is an address that we had temporarily set up during the data
transfer process, and the symptom that you are seeing is an anomaly. We are
aware of the root cause and are now going through the environment to correct.
[posted: 8:35 AM HST]
Here is a story posted on:
http://www.lightreading.com
Navisite planned for some reason to
"consolidate" their datacenters, so with barely any warning, and day after day
of delays in implementation because they hadn't worked the bugs out, they barged
ahead and did the transfer, blithely promising their clients that the downtime
would be reasonably short, if sadly inevitable.
The result has been a total fiasco. This is the schedule for when "everything
would be back up and running" according to the emails sent to their "Valued
Customers".
Saturday 12 noon [These are EST]
Saturday 4 pm
Saturday 6 pm
Saturday 10 pm
Sunday 6 am
Sunday Noon
Sunday 5 pm - Midnight
...and mind you, the sites have all pretty much been down since 2 am Eastern
time Saturday morning (or earlier), November 3rd.
Literally THOUSANDS of critical web sites, some the entire livelihoods of
families, businesses, and organizations were simultaneously wiped out and every
deadline for bringing them back online has gone unmet. Navisite should have
admitted defeat and reverted back to the Alabanza data center when they missed
the 4pm deadline after missing the 12 noon deadline.
This is outrageous and reckless behavior by this company. Customers are
infuriated as many of them are small hosting providers with hundreds or
thousands of their own clients, all of whom are being wiped out simultaneously
with no recourse. This is causing serious damage to both Navisite's reputation
and the hapless web hosting firms that have been caught up in their maw after
previously enjoying long-standing productive relationships with Alabanza.
Navisite better get out its checkbook and start voluntarily offering settlements
or else it's probably going to have quite a few lawsuits on its hands.
Whoever was in charge of making this call needs to be FIRED, at a minimum, for
Navisite to retain any credibility at all. I don't know what the upside was to
doing this "migration", but it's been horribly botched and significant economic
damage has been the result, not to mention the ruining of reputations in the
independent hosting industry and the devastation of many businesses who rely on
their web sites for their livelihood.
During a conference call this afternoon (36 hours into the disaster) with scores
of irate customers who were watching their businesses crumble before their eyes,
Navisite offered up a PR guy instead of one of their executives in a shameful
display of corporate cowardice. If I were a shareholder, I'd want the entire
management team ousted.
What a total and complete fiasco. Outrageous, unbelievable, horrifying.
A complete unmitigated disaster. It will be interesting to see how much this
winds up costing them.
I've spent hours with the technical support department, or
waiting for them on the phone, and participating in a tele-conference with them.
I can only hope their latest guesstimate is accurate.
My sincere apologies, I will find a way to make this
up to you all..
Gill