
Emergency Maintenance

Emergency Maintenance: Redundant Internet testing - CWRU4 upgrade

Problem:   Redundant Internet testing - CWRU4 upgrade
Cause:     Primary switch IOS upgrade
Affects:   See note
Started:   07/02/2009 11:30 PM
Resolved:  07/03/2009 01:00 AM

Notes:

This work requires a reboot of the primary switch. The campus network is running on the backup link, so this upgrade will not affect it. The only connection that will be affected is the UH link to CASE. The work should take no more than 30 minutes if it goes as planned.


Created: 07/02/2009 23:26:27 by roo

Updates:


Emergency Maintenance: Shutdown of blog server

Problem:   Shutdown of blog server
Cause:     Replacement of a failed redundant power supply
Affects:   Blog users
Started:   07/02/2009 05:00 AM
Resolved:  07/02/2009 06:00 AM

Notes:

We are shutting the system down to replace the power supply so that we can eliminate the potential for the system going down during high load hours. We are reserving the entire hour but expect that the power supply replacement will take somewhat less time.


Created: 07/01/2009 17:15:17 by dak

Updates:


Emergency Maintenance: Adjust VPN / Wireless Firewall Default Routes

Problem:   Adjust VPN / Wireless Firewall Default Routes
Cause:     Scheduled Maintenance For Failover
Affects:   VPN Users and Guest Wireless
Started:   07/02/2009 10:30 PM
Resolved:  07/03/2009 03:00 AM

Notes:

The default routes of VPN and guest wireless devices must be adjusted to allow their current default router to be upgraded during failover testing. ~ Ozanich
Continued development of VPN and guest wireless to align them with failover conditions. ~ Ozanich


Created: 07/01/2009 12:30:52 by jxo63

Updates: 07/02/2009 22:34:37 by jxo63


Emergency Maintenance: Backup system is down

Problem:   Backup system is down
Cause:     ?
Affects:   no backup
Started:   06/30/2009 02:06 PM
Resolved:  06/30/2009 03:00 PM

Notes:

restarting


Created: 06/30/2009 14:07:14 by dxi16

Updates: 06/30/2009 14:11:00 by dxi16


Emergency Maintenance: Backup system is down

Problem:   Backup system is down
Cause:     ?
Affects:   no backup
Started:   06/29/2009 11:05 AM
Resolved:  06/29/2009 12:05 PM

Notes:

restarting


Created: 06/29/2009 11:07:01 by dxi16

Updates:



Problem Report

Problem Report: Veale-m1-e1

Problem:   Veale-m1-e1
Cause:     Unknown
Affects:   Wired, wireless and phone connection in Veale
Started:   07/01/2009 03:00 PM
Resolved:  07/02/2009 07:00 AM

Notes:

[2009, July 2nd, Thursday, 08:45 AM]
I restored power to the Emerson/Liebert UPS, then moved
the analog voice gateway, Veale-M1-VG248, and the Cisco
Catalyst switch, Veale-M1-E1, back onto the UPS. I moved
the two power supplies over one at a time.
[2009, July 2nd, Thursday, 07:00 AM]
I restored power to the analog voice gateway,
Veale-M1-VG248, and to the Cisco Catalyst switch,
Veale-M1-E1. An HVAC technician is now working to
restore power to the Emerson/Liebert UPS, Veale-M1-U1.
[2009, July 1st, Wednesday, 10:00 PM]
The switch rebooted at around three o'clock this
afternoon when the UPS was upgraded. It returned to
normal operation but failed again about three hours
later. Investigating.


Created: 07/01/2009 22:16:21 by roo

Updates: 07/02/2009 07:06:42 by euw, 07/02/2009 08:41:10 by euw


Problem Report: Voice mail is down

Problem:   Voice mail is down
Cause:     unknown
Affects:   all voice mail service
Started:   06/26/2009 05:13 PM
Resolved:  06/26/2009 06:43 PM

Notes:

Rebooting the server restored connectivity. An engineer is investigating the cause of the problem.

Voice mail is unavailable. The cause is unknown at this time. Engineers have been notified.


Created: 06/26/2009 17:15:50 by man27

Updates: 06/26/2009 18:43:17 by wxc16


Problem Report: Glaser cisco switch

Problem:   Glaser cisco switch
Cause:     rebooted for unknown reasons
Affects:   wired/wireless data, I/P phones
Started:   06/26/2009 02:11 PM
Resolved:  06/26/2009 02:14 PM

Notes:

There was no room for the log file, so the switch did not write it out.


Created: 06/26/2009 14:51:36 by jhm

Updates:


Problem Report: Pathology-p3-e1 cooling issue

Problem:   Pathology-p3-e1 cooling issue
Cause:     HVAC issues
Affects:   network equipment
Started:   06/22/2009 02:58 PM
Resolved:  

Notes:

According to Facility, HVAC to the SER had to be shut down to fix flooding in the building.
There is minimal impact at the moment, but prolonged high temperature in the SER will affect the network equipment and could cause an outage of wired, wireless, phone and security panel service in Pathology.


Created: 06/23/2009 09:04:48 by roo

Updates:


Problem Report: ccsbppo-m1-e1

Problem:   ccsbppo-m1-e1
Cause:     UPS failure
Affects:   Wired, wireless and phones in ccsbppo
Started:   06/21/2009 05:40 PM
Resolved:  06/22/2009 10:30 AM

Notes:

Replaced two switch power supplies, switch is now functional. Wired, wireless and phones restored.
   

Replaced CCSB/PPO-M1-U1's PD-002.

Investigating and troubleshooting.


Created: 06/22/2009 08:34:20 by roo

Updates: 06/22/2009 09:48:17 by euw, 06/22/2009 10:39:42 by roo


Problem Report: Bingham Hub

Problem:   Bingham Hub
Cause:     Cooling problem in Hub
Affects:   Wired, wireless, phones, security panels for several buildings on south side
Started:   06/20/2009 02:48 AM
Resolved:  

Notes:

2009, June 24, 08:15 AM, we restored the back-up link
between the Bingham Hub and the KSL Data Center.

2009, June 24, 06:30 AM, we restored the main link
between the Bingham Hub and the Crawford Data Center,
which restored all network connectivity served by the
Bingham Hub for the South Campus. However, the back-up
link between the Bingham Hub and the KSL Data Center
is still down and needs further investigation.

June 24, 04:45 AM Lost several building networks because the AC failed again. Facility on the way.

June 22, 03:00 PM Hub experiencing cooling issues again. Plant services have been called. The SER AC keeps shutting down.

11:52 PM: The line cards in the hubs have started experiencing temperature failures again. Called Plant services to look into it. It looks like the AC keeps tripping off.

03:30 AM update: Plant services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.
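
For anyone scanning switch logs for similar events, the recovery message above can be picked apart with a short script. This is only a sketch: it assumes the usual Cisco message layout of %FACILITY-SUBFACILITY-SEVERITY-MNEMONIC, and the pattern and the name parse_temp_alarm are illustrative, not part of any tool used here.

```python
import re

# The log line quoted above, copied verbatim.
LINE = (
    "Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: "
    "module 9 outlet temperature crossed threshold #1(=60C). "
    "It has returned to normal operating temperature range."
)

# Assumed layout: %FACILITY-[SUBFACILITY-]SEVERITY-MNEMONIC: message text
PATTERN = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?:(?P<sub>[A-Z0-9_]+)-)?"
    r"(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
    r"\s+module (?P<module>\d+) .*?"
    r"threshold #(?P<threshold>\d+)\(=(?P<limit_c>\d+)C\)"
)

def parse_temp_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("severity", "module", "threshold", "limit_c"):
        fields[key] = int(fields[key])
    return fields

alarm = parse_temp_alarm(LINE)
print(alarm["mnemonic"], "module", alarm["module"], "limit", alarm["limit_c"], "C")
```

Severity 4 is the warning level in Cisco syslog numbering, so a filter on severity alone would not distinguish an alarm from its recovery; the mnemonic has to be checked as well.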

Services are coming back online.
Still monitoring.

Investigating.
Called Plant services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 0009.11f7.e830 to 0009.11f7.e83f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  2 000c.ceb5.a900 to 000c.ceb5.a90f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  3 000c.ceb5.aa40 to 000c.ceb5.aa4f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  4 0003.feac.7772 to 0003.feac.7779   2.0    7.2(1)       3.5(1)       Ok
  5 000c.ce63.e864 to 000c.ce63.e867   2.1    7.7(1)       12.2(18)SXF1 Ok
  9 000d.6550.b866 to 000d.6550.b869   1.1    12.2(14r)S5  12.2(18)SXF1 MinFail

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   1 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004CL 1.0     MinFail
   2 Distributed Forwarding Card WS-F6K-DFC3A       SAD072300XR 1.0     MinFail
   3 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004BU 1.0     MinFail
   5 Policy Feature Card 3       WS-F6K-PFC3A       SAD072100G1 1.1     Ok
   5 MSFC3 Daughterboard         WS-SUP720          SAD072100JS 1.2     Ok
   9 Distributed Forwarding Card WS-F6700-DFC3A     SAD074805CH 1.0     MinFail

bingham-h0-e1>
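
The module table above can be summarized quickly to see which line cards are affected. The following is a minimal sketch, assuming each row has the form module number, MAC range, Hw, Fw, Sw, status; the regular expression and the name failing_modules are illustrative only.

```python
import re

# Rows copied from the first `show mod` table above.
SHOW_MOD = """\
  1 0009.11f7.e830 to 0009.11f7.e83f 1.0 7.2(1) 8.5(0.46)RFW MinFail
  2 000c.ceb5.a900 to 000c.ceb5.a90f 1.0 7.2(1) 8.5(0.46)RFW MinFail
  3 000c.ceb5.aa40 to 000c.ceb5.aa4f 1.0 7.2(1) 8.5(0.46)RFW MinFail
  4 0003.feac.7772 to 0003.feac.7779 2.0 7.2(1) 3.5(1) Ok
  5 000c.ce63.e864 to 000c.ce63.e867 2.1 7.7(1) 12.2(18)SXF1 Ok
  9 000d.6550.b866 to 000d.6550.b869 1.1 12.2(14r)S5 12.2(18)SXF1 MinFail
"""

# Assumed row layout: module, "MAC to MAC", Hw, Fw, Sw, status.
ROW = re.compile(
    r"^\s*(?P<mod>\d+)\s+"
    r"(?P<mac_from>[0-9a-f.]{14})\s+to\s+(?P<mac_to>[0-9a-f.]{14})\s+"
    r"(?P<hw>\S+)\s+(?P<fw>\S+)\s+(?P<sw>\S+)\s+(?P<status>\S+)\s*$"
)

def failing_modules(text):
    """Return the module numbers whose status is anything other than 'Ok'."""
    failing = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group("status") != "Ok":
            failing.append(int(match.group("mod")))
    return failing

print(failing_modules(SHOW_MOD))  # [1, 2, 3, 9]
```

Modules 4 and 5 report Ok; the four modules in MinFail are the ones carrying Distributed Forwarding Cards in the sub-module table.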


Created: 06/20/2009 02:52:45 by roo

Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw


Problem Report: Email to Hotmail is blocked for 24 hours

Problem:   Email to Hotmail is blocked for 24 hours
Cause:     Compromised account spamming hotmail
Affects:   Any email sent to a hotmail address
Started:   06/17/2009 12:09 AM
Resolved:  06/18/2009 12:09 AM

Notes:

A user's account was compromised (phishing scam) and used to send spam to Hotmail. Hotmail then blocked all email from Case for 24 hours. I told them the problem was resolved on our end, but they said they cannot do anything to speed up the unblocking.

Since the blocks were not lifted at 9:09 our time, I assume they meant Pacific time. :( Times updated.


Created: 06/18/2009 08:16:22 by emr

Updates: 06/18/2009 09:19:10 by emr


Problem Report: ERP Financials server outage

Problem:   ERP Financials server outage 
Cause:     Hardware error
Affects:   Provided Financial ERP services
Started:   06/15/2009 08:00 AM
Resolved:  

Notes:

Received a report that the financials process server was unavailable. Server Engineering staff responded onsite and found the server hung and reporting a hard disk error.

Power-cycled the server and the services became available after the reboot completed. Services ran in a degraded state as the rebuild of the disk was running.

The rebuild of the hard disk failed. The hard drive was replaced and the rebuild was attempted again, but it also failed.

Continuing to troubleshoot the issue.


Created: 06/17/2009 11:57:04 by rak7

Updates:


Problem Report: Michelson-M1-E1 UPS Failure

Problem:   Michelson-M1-E1 UPS Failure
Cause:     unknown
Affects:   No network, no wireless and no phones
Started:   06/17/2009 11:10 AM
Resolved:  

Notes:

2:30 PM: Switch has been restored. Phones and wireless restored. Lost 2 data cards and 2 voice cards. In the process of recovering 96 faceplates.

Engineers are aware and are currently working on the issue.


Created: 06/17/2009 11:25:40 by dmw132

Updates: 06/17/2009 15:33:57 by roo


Problem Report: UPS Failure

Problem:   UPS Failure
Cause:     Switch Failure
Affects:   No network, no wireless and no phones
Started:   06/17/2009 11:10 AM
Resolved:  06/17/2009 11:38 AM

Notes:

Engineers are aware and are currently working on the issue.

Forgot to put Michelson-M1-E1


Created: 06/17/2009 11:12:56 by dmw132

Updates: 06/17/2009 11:38:00 by dmw132


Problem Report: SSL VPN Services interrupt

Problem:   SSL VPN Services interrupt
Cause:     unknown
Affects:   SSL VPN Services (Cisco AnyConnect Client)
Started:   06/12/2009 08:00 AM
Resolved:  06/12/2009 10:30 AM

Notes:

SSL VPN Service was disabled in the VPN server. Corrected configuration. Problem resolved.


Created: 06/12/2009 11:45:49 by wxc16

Updates:


Problem Report: One of the Firewalls possibly failed this morning at about 05:45 AM.

Problem:   One of the Firewalls possibly failed this morning at about 05:45 AM.  
Cause:     unknown
Affects:   Traffic between on-campus and off-campus may have been affected.  
Started:   06/10/2009 05:45 AM
Resolved:  06/10/2009 06:55 AM

Notes:

The firewall appears to be fine now, and the VPN as well (confirmed from off campus). Web sites are also working fine now.


Investigating.

ASA5540-1-Active-outside, ASA5540-2-StdBy-outside, FW1-ACTIVE-OUTSIDE-GUEST, FW1-STDBY-OUTSIDE-GUEST, CHECKPOINT-CASC-INSIDE


Created: 06/10/2009 06:37:49 by euw

Updates: 06/10/2009 06:55:19 by lxc152


Problem Report: Scholars house segmented from CCN

Problem:   Scholars house segmented from CCN
Cause:     Suspecting hardware failure.  
Affects:   No users are in this building for the summer.  
Started:   06/08/2009 04:39 PM
Resolved:  

Notes:

Suspecting supervisor module failure.

Investigating.


Created: 06/08/2009 16:57:59 by jhm

Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw


Problem Report: KSL Production Database was experiencing issues

Problem:   KSL Production Database was experiencing issues
Cause:     Unknown
Affects:   KSL Library applications
Started:   06/05/2009 09:00 AM
Resolved:  06/05/2009 09:20 AM

Notes:

The KSL Library application was experiencing problems this morning due to unknown issues. Working with the application development team, we decided to reboot the database to clear up the issue.


Created: 06/05/2009 09:32:00 by rxg263

Updates:


Problem Report: ISP Internet Issues

Problem:   ISP Internet Issues
Cause:     ISP not advertising routes to us
Affects:   intermittent connectivity
Started:   06/04/2009 04:25 PM
Resolved:  06/04/2009 10:05 PM

Notes:

Our ISP appears not to be advertising internet routes into our primary internet path. A soft reset of their routing tables appears to have failed. We are running partially on our backup path. OneCleveland engineers are investigating the problem. Sensitive traffic, such as video and some web traffic, will most likely be affected.
Changes made last night to OneCleveland's network by their provider, Global Crossing, have caused a change in the dynamics of their routing, which is affecting us. They will attempt to correct their routing issues at around 9:00-10:00 PM tonight. ~ Ozanich
OneCleveland has corrected their routing issue. Our routing tables are correctly populated and our traffic is symmetric again. ~ Ozanich


Created: 06/04/2009 16:29:27 by jxo63

Updates: 06/04/2009 17:27:25 by jxo63, 06/05/2009 04:05:12 by jxo63


Problem Report: LDAP Replica (ldap-replica7) is down

Problem:   LDAP Replica (ldap-replica7) is down
Cause:     Seems to be a disk problem
Affects:   Only people pointed directly at ldap-replica7.cwru.edu
Started:   05/22/2009 11:35 AM
Resolved:  

Notes:

[5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back, so the LDAP replica is back in operation. However, we have called the vendor to have a look at the system and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when we have a definitive closure to the problem.

The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.

The system is one of several redundant LDAP replicas. The only applications affected will be those who are pointed directly at this particular LDAP replica.


Created: 05/22/2009 12:24:57 by dak

Updates: 05/22/2009 13:27:28 by dak



Scheduled Maintenance

Scheduled Maintenance: blackboard.case.edu downtime

Problem:   blackboard.case.edu downtime
Cause:     system maintenance
Affects:   all Blackboard users
Started:   06/24/2009 03:00 AM
Resolved:  06/24/2009 06:00 AM

Notes:

We are taking Blackboard down to get a clean, consistent copy of all data for our test environment.


Created: 06/23/2009 11:00:31 by prj

Updates:


Scheduled Maintenance: Electrical Systems Testing

Problem:   Electrical Systems Testing
Cause:     Data Center Commissioning
Affects:   All Systems
Started:   06/23/2009 03:00 AM
Resolved:  06/23/2009 06:00 AM

Notes:

Case Construction Administration will be conducting a full load test of the KSL and Crawford Data Center Electrical Systems and Generators.

Successful completion of the test will result in no system downtime. However, there is a risk for a full power outage in either data center during this maintenance window.


Created: 06/22/2009 15:12:46 by smo7

Updates:


Scheduled Maintenance: Redundant Internet Connection Test

Problem:   Redundant Internet Connection Test
Cause:     Test of the redundant internet connection
Affects:   Only multicast will be unavailable.
Started:   06/29/2009 03:00 AM
Resolved:  07/01/2009 03:00 AM

Notes:

This is a test of the redundant internet connection. We will be sending all outgoing Internet traffic through the secondary OneCommunity connection in Crawford. Multicast will not be available for the duration of the test.


Created: 06/10/2009 16:06:05 by man27

Updates:

