July 24, 2008

Beat this uptime


We had an interesting thread unfold on an internal list, which I thought I would open up to our readership.  Someone was foraging around the network and came across some impressive server uptime (all server names changed to keep infosec happy):

server-x% uptime
7:13pm up 500 day(s),  3:17,  53 users,  load average: 0.08, 0.11, 0.11

to which someone else countered with

server-y$ uptime
23:45:15 up 700 days,  8:31,  3 users,  load average: 0.00, 0.00, 0.00

The irony behind this server is that it has outlasted the business unit it apparently supported.


However, the winner so far is:

WS-C5000 Software, Version McpSW: 3.1(2) NmpSW: 3.1(2a)
Copyright (c) 1995-1998 by Cisco Systems
NMP S/W compiled on Feb 20 1998, 18:56:57
MCP S/W compiled on Feb 20 1998, 19:05:51

System Bootstrap Version: 2.4(1)

Hardware Version: 2.1 Model: WS-C5000 Serial #: 007584271
. .
. Uptime is 2618 days, 9 hours, 11 minutes


7+ years—guess there is something to that investment protection thing after all.


So what is the best system uptime in your data center?  The response with the best uptime gets a Cisco fleece.

Posted by Omar Sultan at 01:11AM PST


Tags: catalyst data center

16 Comments

James V. Jul 24, 2008

>So what is the best system
>uptime in your data center?

I can’t beat 7+ years. 

IOS (tm) RSP Software (RSP-DSV-M), Version 11.2(15)P

Router uptime is 5 years, 39 weeks, 5 days, 1 hour, 3 minutes

Andrew Wales Jul 24, 2008

% uptime
  1:11pm up 3307 day(s), 23:42,  1 user,  load average: 0.24, 0.17, 0.16
%

Over nine years, how about that?  Last rebooted for y2k patching.

Timothy Sipples Jul 25, 2008

It should be noted that the results of an “uptime” command represent an IT-centric view of the world, and a binary one that doesn’t capture any performance slowdowns, for example. It’s fun, but it’s not a business-centric view, and we IT people are kidding ourselves if we think so.

With that background, we have some customers running “a certain renowned type of business server” that:

- upgrade hardware models and features
- upgrade software (including operating systems and middleware, even whole database versions)
- upgrade applications
- reconfigure network connections
- add storage
- relocate and reconfigure data centers

and otherwise change any server component you care to imagine, yet still deliver verified continuous business service that fully complies with all business performance requirements.

And they’ve been doing that continuously for over a decade.

The poster upthread is correct. It’s not a particular engineering challenge to install a gadget, leave it be, let it run at a low pulse, and get some years from an “uptime” command at the console. My refrigerator does that, and it’s more sophisticated. ( grin ) The real trick is to keep business activities humming continuously while everything possible changes. And it can be done with a combination of careful planning, exceptional technology, and competent people.

Omar Sultan Jul 25, 2008

Wow—Andrew is in the lead.  His server had better uptime than my first marriage. But James is no slouch at almost six years.

When I started this, I used the generic term “system”. Do you think it’s fair to put servers and networking gear in the same category?  Is 5 years of uptime for a server as impressive as 5 years for a switch?

Omar
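Comparing the entries so far means putting two output styles on one scale: Unix `uptime` reports days plus an HH:MM field, while the Cisco gear reports years, weeks, days, hours, and minutes. A minimal sketch of that normalization follows. The function name `uptime_seconds`, the `entries` labels, and the 365-day year are assumptions made for illustration; nothing in the thread says how the devices themselves define a year.

```python
import re

# Assumed unit lengths in seconds. Treating a "year" as 365 days is an
# illustrative assumption, not something the devices' output confirms.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}

def uptime_seconds(text):
    """Convert one uptime line (Unix or Cisco style) to seconds."""
    total = 0
    # Cisco style, and the day count of Unix style: "<number> <unit>" pairs.
    for count, unit in re.findall(r"(\d+)\s+(year|week|day|hour|minute)", text):
        total += int(count) * UNIT_SECONDS[unit]
    # Unix style adds hours and minutes as a bare "HH:MM" field between commas.
    clock = re.search(r",\s+(\d{1,2}):(\d{2}),", text)
    if clock:
        total += int(clock.group(1)) * 3600 + int(clock.group(2)) * 60
    return total

# Uptime lines copied from the post and the first two comments.
entries = {
    "James": "Router uptime is 5 years, 39 weeks, 5 days, 1 hour, 3 minutes",
    "Andrew": "1:11pm up 3307 day(s), 23:42,  1 user,  load average: 0.24, 0.17, 0.16",
    "Original post": "Uptime is 2618 days, 9 hours, 11 minutes",
}

# Longest uptime first.
ranking = sorted(entries, key=lambda name: uptime_seconds(entries[name]), reverse=True)
for name in ranking:
    print(f"{name}: {uptime_seconds(entries[name]) / 86400:.1f} days")
```

Under those assumptions Andrew's server ranks first, the Catalyst 5000 from the post second, and James's router third.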

Omar Sultan Jul 25, 2008

Timothy:

I do not disagree with anything you said—heck, my first real job included managing DEC VAXes and FEPs.  In fact, much of what you say about "a certain renowned type of business server"  ( smile ) can be said about our Nexus 7000—I would venture that they were both designed with similar principles in mind.

I agree that service uptime is the correct measure and that is dependent on system architecture and operations.  Components will always fail—the goodness of the architecture and operations dictate how gracefully the system handles that.

That being said, the Law of Entropy is alive and well in the data center, so anytime you see numbers like these, the people that keep these systems up and running deserve a nod….and perhaps some free clothing.

Paul Thomas - CCIE Jul 25, 2008

Wanted to chime in with the best uptime I could come up with in 10 min of searching.

WS-C6506 Software, Version NmpSW: 6.1(1b)
Copyright (c) 1995-2000 by Cisco Systems
NMP S/W compiled on Nov 9 2000, 22:11:25

System Bootstrap Version: 5.3(1)

Hardware Version: 3.0 Model: WS-C6506 Serial #: GUTTED

       DRAM                    FLASH                   NVRAM
Module Total   Used    Free    Total   Used    Free    Total Used  Free
------ ------- ------- ------- ------- ------- ------- ----- ----- -----
1       65408K  36910K  28498K  16384K   8741K   7643K  512K  271K  241K

Uptime is 2722 days, 20 hours, 34 minutes

Brian Cantor Jul 25, 2008

switch (enable) sho ver
WS-C5000 Software, Version McpSW: 3.2(2) NmpSW: 3.2(2)
Copyright (c) 1995-1998 by Cisco Systems
NMP S/W compiled on Aug 7 1998, 11:43:53
MCP S/W compiled on Aug 07 1998, 11:47:44

System Bootstrap Version: 2.1

Hardware Version: 2.0 Model: WS-C5000 Serial #: 004993573

Module Ports Model      Serial #  Hw   Fw      Fw1     Sw
------ ----- ---------- --------- ---- ------- ------- -------
1      2     WS-X5009   004993573 2.0  2.1     2.1(4)  3.2(2)
2      12    WS-X5213   001900817 1.2  1.4             3.2(2)
3      12    WS-X5213   001900731 1.2  1.4             3.2(2)
4      12    WS-X5213   001900917 1.2  1.4             3.2(2)

       DRAM                    FLASH                   NVRAM
Module Total   Used    Free    Total   Used    Free    Total Used  Free
------ ------- ------- ------- ------- ------- ------- ----- ----- -----
1       20480K   8316K  12164K   4096K   3584K    512K  256K  117K  139K

Uptime is 3450 days, 19 hours, 45 minutes

kgraham Aug 8, 2008

...can be said about our Nexus 7000 —
  I would venture that they were both
  designed with similar principles in
  mind.

Does 497 days sound familiar?
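For readers who have not run into it: the 497-day figure most likely points at a 32-bit uptime counter that ticks 100 times per second, as SNMP sysUpTime does. Such a counter wraps back to zero after roughly 497 days. The comment does not say which counter on which platform is meant, so the counter width and tick rate below are assumptions, and the arithmetic is only a sketch of where the number comes from.

```python
# Assumptions for illustration: a 32-bit unsigned counter incremented
# every 10 ms (100 ticks per second), as with SNMP TimeTicks.
COUNTER_BITS = 32
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 86400

# Seconds until the counter overflows and starts again from zero.
wrap_seconds = 2 ** COUNTER_BITS / TICKS_PER_SECOND
wrap_days = wrap_seconds / SECONDS_PER_DAY

print(f"Counter wraps after about {wrap_days:.1f} days")
```

Any system that derives its displayed uptime from such a counter cannot report more than about 497 days without extra bookkeeping.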

    That being said, the Law of
    Entropy is alive and well in
    the data center, so anytime
    you see numbers like these,
    the people that keep these
    systems up and running deserve
    a nod….

Hardly. They deserve a serious questioning as to how they are properly managing their systems. This is usually the result of gross neglect, not solid administration policies. Huge uptimes are always a neat novelty, but novelties don’t have a place in production.

  * Is the system configuration and functionality similar to currently available hardware? Can existing functionality be duplicated quickly and efficiently without compromising availability?
  * How will resources be allocated to repair the inevitable failure of this component? Have budgets been pre-allocated for a rapid replacement?
  * What methodology is used to identify ongoing security risks and vulnerabilities to the code being run?
  * Are current networking best practices applicable to this device? If you’ve made an exception, where else do you let standardization slip?
  * Are current security best practices even applicable? (I’m guessing you’re not running ssh or snmpv3; what additional compensating infrastructure have you had to introduce to continue to facilitate this device?)
  * With no current shipping product running this operating system, how much additional time has been allocated to ensure that staff are able to adequately manage it and ensure consistency with the rest of the environment?
  * Did you explicitly call out this device in your most recent security audits? If you omitted it because it was a nifty toy, how many other nifty toys are you excluding?

Omar Sultan Aug 8, 2008

kgraham:

This thread was never meant to be an exhaustive study of operational best practices. I only asked folks for a snippet of information, so I am not sure it’s all that fair to extrapolate that into an overall discussion of operational rigor.

That being said, you do bring up a good point.  There is an interesting dichotomy at work.  While we demand that systems never go down, we also expect them to be dynamic and adapt to changing needs, so a system (server, switch, whatever) can no longer be a snapshot of the day it was last booted.

As Timothy pointed out earlier, this is one of the reasons IBM (or DEC in their day) were successful—bulletproof hardware with an OS that supports this dichotomy.  As I mentioned in the initial post, the N7K follows a similar path.

I have to admit, the security angle is not one I had paid much attention to, but you bring up good points.  I guess the current hubbub over DNS vulnerabilities is an example of that.

kgraham Aug 8, 2008

Omar, yep, it’s just a reminder that in most of the common cases I’m familiar with, long uptime is indicative of sloppy, not adept, operational practices. There’s the geeky “how long can I keep this cool thing going” and the conservative management “don’t touch it if it’s working”, both of which are widely accepted but harmful to long-term reliable operation. (I’ll take a wild guess and say that the two unix boxes and the Cat5k cited originally hadn’t seen a kernel patch or confirmed a clean startup process in a very long time.)

As the IT industry starts to rediscover clustering and partitioning (I think I’ve heard the word “virtualization” mentioned a lot this time around), the definition of the “uptime” counter is again thoroughly misleading. Some of the earlier posts mentioned it as well, but there’s a slippery slope just past anything more than “time since the running kernel initialized itself”. 12.2SX phrases it well, referring to “continuous operation of this forwarding plane” while not obfuscating what “we all know” is the uptime:

test6504 uptime is 2 years, 5 weeks, 4 days, 19 hours, 37 minutes
Uptime for this control processor is 19 weeks, 6 days, 20 hours, 2 minutes

...though replace this with a VSS or 3750 stack, or dive in further, especially with distributed components and virtualized layers, it really becomes impossible to express in a number (which really _is_ cool).
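One way to picture the point about distributed components is to track uptime per component and report the spread, not a single figure. The sketch below is hypothetical: the `ComponentUptime` class, the `summarize` helper, and the component labels are invented for illustration, and the two values are the test6504 figures above rounded down to whole days with a year assumed to be 365 days.

```python
from dataclasses import dataclass

@dataclass
class ComponentUptime:
    name: str    # hypothetical label for a part of the system
    days: int    # whole days since that part last restarted

def summarize(components):
    """Return (shortest, longest): the range a single uptime number hides."""
    ordered = sorted(components, key=lambda c: c.days)
    return ordered[0], ordered[-1]

chassis = [
    # "2 years, 5 weeks, 4 days", assuming a 365-day year and dropping hours.
    ComponentUptime("forwarding plane", 2 * 365 + 5 * 7 + 4),
    # "19 weeks, 6 days", dropping hours.
    ComponentUptime("control processor", 19 * 7 + 6),
]

shortest, longest = summarize(chassis)
print(f"{shortest.name}: {shortest.days} days")
print(f"{longest.name}: {longest.days} days")
```

With a stack or a virtualized layer the list simply grows, and the gap between the shortest and longest entries is what a single counter cannot express.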

Omar Sultan Aug 15, 2008

kgraham:

You make some great points—I do agree with your last point that these metrics will lose relevance as we start to measure different things with the evolution of infrastructure.

In general, we find the hardest part of data center transformation is changing behavior—the same seems to be true in this area.  What do you think would help change behavior and people’s thinking?

Andy Kosela Sep 19, 2008

I don’t think anyone can beat HP’s OpenVMS - 10 years of uptime!

http://www.openvms.org/stories.php?story=06/01/08/4531954

Alessandro Asson Oct 2, 2008

>So what is the best system
>uptime in your data center?

Our old internet border gateway:

IOS (tm) 7200 Software (C7200-IS-M), Version 12.2(15)T7,  RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.

ROM: System Bootstrap, Version 12.2(8r)B, RELEASE SOFTWARE (fc1)
BOOTLDR: 7200 Software (C7200-KBOOT-M), Version 12.2(4)BW, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)

b01.cineca.net uptime is 4 years, 13 weeks, 5 hours, 29 minutes
System returned to ROM by reload at 10:40:08 MET-DST Sun Jul 4 2004
System restarted at 10:42:16 MET-DST Sun Jul 4 2004

Alessandro Asson Oct 2, 2008

>So what is the best system
>uptime in your data center?

Our old Internet border gateway is:

b01.cineca.net#sh ver
IOS (tm) 7200 Software (C7200-IS-M), Version 12.2(15)T7,  RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.

ROM: System Bootstrap, Version 12.2(8r)B, RELEASE SOFTWARE (fc1)
BOOTLDR: 7200 Software (C7200-KBOOT-M), Version 12.2(4)BW, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)

b01.cineca.net uptime is 4 years, 13 weeks, 5 hours, 18 minutes
System returned to ROM by reload at 10:40:08 MET-DST Sun Jul 4 2004
System restarted at 10:42:16 MET-DST Sun Jul 4 2004

Andy Pettica Sep 13, 2009

Lotta love 11.2, token-tastic.  10 years and counting….

Cisco Internetwork Operating System Software
IOS (tm) 4500 Software (C4500-IS-M), Version 11.2(17), RELEASE SOFTWARE (fc1)
Copyright (c) 1986-1999 by cisco Systems, Inc.
Compiled Mon 04-Jan-99 18:18 by ashah
Image text-base: 0x600088A0, data-base: 0x60604000
ROM: System Bootstrap, Version 5.2(7) [rchiao 7], RELEASE SOFTWARE (fc1)
BOOTFLASH: 4500 Bootstrap Software (C4500-XBOOT), Version 10.2(7), RELEASE SOFTWARE (fc1)
XXXXXXXX uptime is 10 years, 6 weeks, 7 hours, 33 minutes
System restarted by reload at 02:44:34 AEST Fri Aug 6 1999
System image file is "flash:c4500-is-mz.112-17", booted via flash
cisco 4700 (R4K) processor (revision B) with 32768K/16384K bytes of memory.
Processor board ID 02169217
R4600 processor, Implementation 32, Revision 2.0 (Level 2 Cache)
G.703/E1 software, Version 1.0.
Bridging software.
X.25 software, Version 2.0, NET2, BFE and GOSIP compliant.
2 Token Ring/IEEE 802.5 interface(s)
128K bytes of non-volatile configuration memory.
8192K bytes of processor board System flash (Read/Write)
4096K bytes of processor board Boot flash (Read/Write)
     
Configuration register is 0x2102

Evan Anderson Dec 27, 2009

I can’t beat 7+ years, or the 10+ years above, but I did just recently retire two Catalyst 3550-48 switches that showed the following on their last “show ver” output:

TICNTC013550S02 uptime is 6 years, 3 weeks, 6 days, 22 minutes

TICNTC013550S01 uptime is 6 years, 3 weeks, 5 days, 23 hours, 51 minutes

These were in a UPS-backed wiring closet installed in December 2003 in support of a VoIP deployment. I checked my trouble ticket logs and found that no tickets were ever logged re: these switches, either. As far as I can tell they’re fully functional (no bad ports, etc).

1 Trackback

Adventures in systems land Jul 24, 2008

Now here’s an interesting uptime challenge: the utilization is so low as to not warrant the electricity they’ve used. Rather than an uptime boast, these systems seem like a great opportunity for a green datacenter consolidation that saves the electricity! ...
