Emergency Maintenance
Emergency Maintenance: Redundant Internet testing - CWRU4 upgrade
Problem: Redundant Internet testing - CWRU4 upgrade Cause: Primary switch IOS upgrade Affects: See note Started: 07/02/2009 11:30 PM Resolved: 07/03/2009 01:00 AM
Notes:
A reboot of the primary switch is required for this work. The campus network is running on the backup link, so this upgrade will not affect the campus network. The only connection that will feel the impact is the UH link to CASE. This work should last no more than 30 minutes if it goes as planned.
Created: 07/02/2009 23:26:27 by roo
Updates:
Emergency Maintenance: Shutdown of blog server
Problem: Shutdown of blog server Cause: Replacement of a failed redundant power supply Affects: Blog users Started: 07/02/2009 05:00 AM Resolved: 07/02/2009 06:00 AM
Notes:
We are shutting the system down to replace the power supply so that we can eliminate the potential for the system going down during high load hours. We are reserving the entire hour but expect that the power supply replacement will take somewhat less time.
Created: 07/01/2009 17:15:17 by dak
Updates:
Emergency Maintenance: Adjust VPN / Wireless Firewall Default Routes
Problem: Adjust VPN / Wireless Firewall Default Routes Cause: Scheduled Maintenance For Failover Affects: VPN Users and Guest Wireless Started: 07/02/2009 10:30 PM Resolved: 07/03/2009 03:00 AM
Notes:
The default routes of VPN and guest wireless devices must be adjusted to allow their current default router to be upgraded during failover testing. ~ Ozanich
Continued development of VPN and guest wireless to align them with fail over conditions. ~ Ozanich
Created: 07/01/2009 12:30:52 by jxo63
Updates: 07/02/2009 22:34:37 by jxo63
Emergency Maintenance: Backup system is down
Problem: Backup system is down Cause: ? Affects: no backup Started: 06/30/2009 02:06 PM Resolved: 06/30/2009 03:00 PM
Notes:
restarting
Created: 06/30/2009 14:07:14 by dxi16
Updates: 06/30/2009 14:11:00 by dxi16
Emergency Maintenance: Backup system is down
Problem: Backup system is down Cause: ? Affects: no backup Started: 06/29/2009 11:05 AM Resolved: 06/29/2009 12:05 PM
Notes:
restarting
Created: 06/29/2009 11:07:01 by dxi16
Updates:
Problem Report
Problem Report: Veale-m1-e1
Problem: Veale-m1-e1 Cause: Unknown Affects: Wired, wireless and phone connection in Veale Started: 07/01/2009 03:00 PM Resolved: 07/02/2009 07:00 AM
Notes:
[2009, July 2nd., Thursday, 08:45 AM]
I restored the power to the Emerson/Liebert UPS; and
I moved the power to the analog voice gateway,
Veale-M1-VG248, back to the Emerson/Liebert UPS; and
I moved the power to the Cisco Catalyst Switch,
Veale-M1-E1, back to the Emerson/Liebert UPS, and
I moved the power for the two power supplies
one at a time.
[2009, July 2nd., Thursday, 07:00 AM]
I restored the power to the analog voice gateway,
Veale-M1-VG248, and to the Cisco Catalyst Switch,
Veale-M1-E1. An HVAC technician is now trying
to restore the power to the Emerson/Liebert UPS,
Veale-M1-U1.
[2009, July 1st., Wednesday, 10:00 PM]
The switch rebooted at around three o'clock
this afternoon when the UPS was upgraded.
Although the switch returned to normal operation,
it failed three hours later. Investigating.
Created: 07/01/2009 22:16:21 by roo
Updates: 07/02/2009 07:06:42 by euw, 07/02/2009 08:41:10 by euw
Problem Report: Voice mail is down
Problem: Voice mail is down Cause: unknown Affects: all voice mail service Started: 06/26/2009 05:13 PM Resolved: 06/26/2009 06:43 PM
Notes:
Rebooting the server restored connectivity. An engineer is investigating the cause of the problem.
Voice mail is unavailable. The cause is unknown at this time. Engineers have been notified.
Created: 06/26/2009 17:15:50 by man27
Updates: 06/26/2009 18:43:17 by wxc16
Problem Report: Glaser cisco switch
Problem: Glaser cisco switch Cause: rebooted for unknown reasons Affects: wired/wireless data, I/P phones Started: 06/26/2009 02:11 PM Resolved: 06/26/2009 02:14 PM
Notes:
There was no room for the log file, so it was not written out.
Created: 06/26/2009 14:51:36 by jhm
Updates:
Problem Report: Pathology-p3-e1 cooling issue
Problem: Pathology-p3-e1 cooling issue Cause: HVAC issues Affects: network equipment Started: 06/22/2009 02:58 PM Resolved:
Notes:
According to Facility, HVAC to the SER had to be shut down to fix flooding in the building.
There is minimal impact at the moment, but prolonged high temperature in the SER will affect the network equipment, potentially causing outages of wired, wireless, phone and security panels in Pathology.
Created: 06/23/2009 09:04:48 by roo
Updates:
Problem Report: ccsbppo-m1-e1
Problem: ccsbppo-m1-e1 Cause: UPS failure Affects: Wired, wireless and phones in ccsbppo Started: 06/21/2009 05:40 PM Resolved: 06/22/2009 10:30 AM
Notes:
Replaced two switch power supplies, switch is now functional. Wired, wireless and phones restored.
Replaced CCSB/PPO-M1-U1's PD-002.
Investigating and troubleshooting.
Created: 06/22/2009 08:34:20 by roo
Updates: 06/22/2009 09:48:17 by euw, 06/22/2009 10:39:42 by roo
Problem Report: Bingham Hub
Problem: Bingham Hub Cause: Cooling problem in Hub Affects: Wired, wireless, phones, security panels for several buildings on south side Started: 06/20/2009 02:48 AM Resolved:
Notes:
2009, June 24, 08:15 AM, we restored the back-up link
between the Bingham Hub and the KSL Data Center.
2009, June 24, 06:30 AM, we restored the main link
between the Bingham Hub and the Crawford Data Center,
which restored all network connectivity for the
South Campus through the Bingham Hub; however, the
back-up link between the Bingham Hub and the KSL
Data Center is still down and needs further
investigation.
June 24, 04:45 AM Lost several building networks because the AC failed again. Facility on the way.
June 22, 03:00 PM Hub experiencing cooling issues again. Plant services have been called. The SER AC keeps shutting down.
11:52 PM: The line cards in the hubs have started experiencing temperature failure again. Called Plant Services to look into it. Looks like the AC keeps tripping off.
03:30 AM Update: Plant Services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.
Services are coming back online.
Still monitoring.
Investigating.
Called Plant services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
1   0009.11f7.e830 to 0009.11f7.e83f   1.0    7.2(1)       8.5(0.46)RFW MinFail
2   000c.ceb5.a900 to 000c.ceb5.a90f   1.0    7.2(1)       8.5(0.46)RFW MinFail
3   000c.ceb5.aa40 to 000c.ceb5.aa4f   1.0    7.2(1)       8.5(0.46)RFW MinFail
4   0003.feac.7772 to 0003.feac.7779   2.0    7.2(1)       3.5(1)       Ok
5   000c.ce63.e864 to 000c.ce63.e867   2.1    7.7(1)       12.2(18)SXF1 Ok
9   000d.6550.b866 to 000d.6550.b869   1.1    12.2(14r)S5  12.2(18)SXF1 MinFail
Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
1    Distributed Forwarding Card WS-F6K-DFC3A       SAD072004CL 1.0     MinFail
2    Distributed Forwarding Card WS-F6K-DFC3A       SAD072300XR 1.0     MinFail
3    Distributed Forwarding Card WS-F6K-DFC3A       SAD072004BU 1.0     MinFail
5    Policy Feature Card 3       WS-F6K-PFC3A       SAD072100G1 1.1     Ok
5    MSFC3 Daughterboard         WS-SUP720          SAD072100JS 1.2     Ok
9    Distributed Forwarding Card WS-F6700-DFC3A     SAD074805CH 1.0     MinFail
bingham-h0-e1>
Created: 06/20/2009 02:52:45 by roo
Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw
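The status column in the `show mod` output in the Bingham Hub report above can also be checked by script. The following is a minimal Python sketch and not part of the original report: the regular expression and the function name `modules_not_ok` are assumptions made for illustration, and the sample rows are copied from the first table of that output. It lists every module whose status is anything other than Ok.

```python
import re

# Rows copied from the first table of the `show mod` output in the report.
SHOW_MOD = """\
1 0009.11f7.e830 to 0009.11f7.e83f 1.0 7.2(1) 8.5(0.46)RFW MinFail
2 000c.ceb5.a900 to 000c.ceb5.a90f 1.0 7.2(1) 8.5(0.46)RFW MinFail
3 000c.ceb5.aa40 to 000c.ceb5.aa4f 1.0 7.2(1) 8.5(0.46)RFW MinFail
4 0003.feac.7772 to 0003.feac.7779 2.0 7.2(1) 3.5(1) Ok
5 000c.ce63.e864 to 000c.ce63.e867 2.1 7.7(1) 12.2(18)SXF1 Ok
9 000d.6550.b866 to 000d.6550.b869 1.1 12.2(14r)S5 12.2(18)SXF1 MinFail
"""

# Hypothetical pattern for one module row: number, MAC range, Hw, Fw, Sw, Status.
ROW = re.compile(
    r"^(?P<mod>\d+)\s+"
    r"(?P<mac_first>[0-9a-f.]{14})\s+to\s+(?P<mac_last>[0-9a-f.]{14})\s+"
    r"(?P<hw>\S+)\s+(?P<fw>\S+)\s+(?P<sw>\S+)\s+(?P<status>\S+)\s*$"
)


def modules_not_ok(text):
    """Return (module number, status) for every row whose status is not 'Ok'."""
    flagged = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("status") != "Ok":
            flagged.append((int(match.group("mod")), match.group("status")))
    return flagged


if __name__ == "__main__":
    for mod, status in modules_not_ok(SHOW_MOD):
        print(f"module {mod}: {status}")
```

Run against the rows above, this flags modules 1, 2, 3 and 9, the same cards shown as MinFail in the output. Lines that do not look like a module row (headers, separators, the sub-module table) are skipped.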
Problem Report: Email to Hotmail is blocked for 24 hours
Problem: Email to Hotmail is blocked for 24 hours Cause: Compromised account spamming hotmail Affects: Any email sent to a hotmail address Started: 06/17/2009 12:09 AM Resolved: 06/18/2009 12:09 AM
Notes:
A user's account was compromised (phishing scam) and used to send spam to Hotmail. Hotmail then blocked all email from Case for 24 hours. I told them the problem was resolved on our end, but they said they cannot do anything to speed up the unblocking.
Since the blocks were not lifted at 9:09 our time, I assume they meant Pacific time. :( Times updated.
Created: 06/18/2009 08:16:22 by emr
Updates: 06/18/2009 09:19:10 by emr
Problem Report: ERP Financials server outage
Problem: ERP Financials server outage Cause: Hardware error Affects: Provided Financial ERP services Started: 06/15/2009 08:00 AM Resolved:
Notes:
Received a report that the financials process server was unavailable. Server Engineering staff responded onsite and found the server reporting a hard disk error and hung.
Power-cycled the server and the services became available after the reboot completed. Services ran in a degraded state as the rebuild of the disk was running.
The rebuild of the hard disk failed. The hard drive was replaced and the rebuild was attempted again, but it also failed.
Continuing to troubleshoot the issue.
Created: 06/17/2009 11:57:04 by rak7
Updates:
Problem Report: Michelson-M1-E1 UPS Failure
Problem: Michelson-M1-E1 UPS Failure Cause: unknown Affects: No network, no wireless and no phones Started: 06/17/2009 11:10 AM Resolved:
Notes:
2:30 PM: Switch has been restored. Phones and wireless restored. Lost 2 data cards and 2 voice cards. In the process of recovering 96 faceplates.
Engineers are aware and are currently working on the issue.
Created: 06/17/2009 11:25:40 by dmw132
Updates: 06/17/2009 15:33:57 by roo
Problem Report: UPS Failure
Problem: UPS Failure Cause: Switch Failure Affects: No network, no wireless and no phones Started: 06/17/2009 11:10 AM Resolved: 06/17/2009 11:38 AM
Notes:
Engineers are aware and are currently working on the issue.
Forgot to put Michelson-M1-E1
Created: 06/17/2009 11:12:56 by dmw132
Updates: 06/17/2009 11:38:00 by dmw132
Problem Report: SSL VPN Services interrupt
Problem: SSL VPN Services interrupt Cause: unknown Affects: SSL VPN Services (Cisco AnyConnect Client) Started: 06/12/2009 08:00 AM Resolved: 06/12/2009 10:30 AM
Notes:
SSL VPN Service was disabled in the VPN server. Corrected configuration. Problem resolved.
Created: 06/12/2009 11:45:49 by wxc16
Updates:
Problem Report: One of the Firewalls possibly failed this morning at about 05:45 AM.
Problem: One of the Firewalls possibly failed this morning at about 05:45 AM. Cause: unknown Affects: Traffic between on-campus and off-campus may have been affected. Started: 06/10/2009 05:45 AM Resolved: 06/10/2009 06:55 AM
Notes:
The firewall appears to be fine now, and VPN as well; confirmed from off campus. Web sites are also working fine now.
Investigating.
ASA5540-1-Active-outside, ASA5540-2-StdBy-outside, FW1-ACTIVE-OUTSIDE-GUEST, FW1-STDBY-OUTSIDE-GUEST, CHECKPOINT-CASC-INSIDE
Created: 06/10/2009 06:37:49 by euw
Updates: 06/10/2009 06:55:19 by lxc152
Problem Report: Scholars house segmented from CCN
Problem: Scholars house segmented from CCN Cause: Suspecting hardware failure. Affects: No users are in this building for the summer. Started: 06/08/2009 04:39 PM Resolved:
Notes:
Suspecting supervisor module failure.
Investigating.
Created: 06/08/2009 16:57:59 by jhm
Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw
Problem Report: KSL Production Database was experiencing issues
Problem: KSL Production Database was experiencing issues Cause: Unknown Affects: KSL Library applications Started: 06/05/2009 09:00 AM Resolved: 06/05/2009 09:20 AM
Notes:
The KSL Library application was experiencing problems this morning due to unknown issues. After working with the application development team, we decided to reboot the database to clear up the issue.
Created: 06/05/2009 09:32:00 by rxg263
Updates:
Problem Report: ISP Internet Issues
Problem: ISP Internet Issues Cause: ISP not advertising routes to us Affects: intermittent connectivity Started: 06/04/2009 04:25 PM Resolved: 06/04/2009 10:05 PM
Notes:
Our ISP appears not to be advertising internet routes into our primary internet path. A soft reset of their routing tables appears to have failed. We are running partially on our backup path. OneCleveland engineers are investigating the problem. Sensitive traffic, such as video and some web traffic, will most likely be affected.
Changes made last night to OneCleveland's network by their provider, Global Crossing, have caused a change in the dynamics of their routing, which is affecting us. They will attempt to correct their routing issues at around 9:00-10:00 PM tonight. ~ Ozanich
OneCleveland has corrected their routing issue. Our routing tables are correctly populated and our traffic is symmetric again. ~ Ozanich
Created: 06/04/2009 16:29:27 by jxo63
Updates: 06/04/2009 17:27:25 by jxo63, 06/05/2009 04:05:12 by jxo63
Problem Report: LDAP Replica (ldap-replica7) is down
Problem: LDAP Replica (ldap-replica7) is down Cause: Seems to be a disk problem Affects: Only people pointed directly at ldap-replica7.cwru.edu Started: 05/22/2009 11:35 AM Resolved:
Notes:
[5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back, so the LDAP replica is back in operation. However, we have called the vendor to have a look at the system and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when we have a definitive closure to the problem.
The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.
The system is one of several redundant LDAP replicas. The only applications affected will be those that are pointed directly at this particular LDAP replica.
Created: 05/22/2009 12:24:57 by dak
Updates: 05/22/2009 13:27:28 by dak
Scheduled Maintenance
Scheduled Maintenance: blackboard.case.edu downtime
Problem: blackboard.case.edu downtime Cause: system maintenance Affects: all Blackboard users Started: 06/24/2009 03:00 AM Resolved: 06/24/2009 06:00 AM
Notes:
We are taking Blackboard down to get a clean, consistent copy of all data for our test environment.
Created: 06/23/2009 11:00:31 by prj
Updates:
Scheduled Maintenance: Electrical Systems Testing
Problem: Electrical Systems Testing Cause: Data Center Commissioning Affects: All Systems Started: 06/23/2009 03:00 AM Resolved: 06/23/2009 06:00 AM
Notes:
Case Construction Administration will be conducting a full load test of the KSL and Crawford Data Center Electrical Systems and Generators.
Successful completion of the test will result in no system downtime. However, there is a risk for a full power outage in either data center during this maintenance window.
Created: 06/22/2009 15:12:46 by smo7
Updates:
Scheduled Maintenance: Redundant Internet Connection Test
Problem: Redundant Internet Connection Test Cause: Test of the redundant internet connection Affects: Only multicast will be unavailable. Started: 06/29/2009 03:00 AM Resolved: 07/01/2009 03:00 AM
Notes:
This is a test of the redundant internet connection. We will be sending all outgoing Internet traffic through the secondary OneCommunity connection in Crawford. Multicast will not be available for the duration of the test.
Created: 06/10/2009 16:06:05 by man27
Updates:
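Every entry on this page carries a summary line with Started and Resolved fields. The following is a minimal Python sketch, assuming that line format and a hypothetical helper name `outage_duration`, of how an outage length could be computed from such a line. Entries whose Resolved field is empty are treated as still open, and timestamps are taken as local time with no time zone handling.

```python
import re
from datetime import datetime

# Timestamp layout used in the summary lines, e.g. "07/01/2009 03:00 PM".
STAMP = "%m/%d/%Y %I:%M %p"

# Hypothetical pattern: Started is required, Resolved may be left empty.
FIELDS = re.compile(
    r"Started:\s*(?P<started>\d{2}/\d{2}/\d{4} \d{2}:\d{2} [AP]M)"
    r"\s*Resolved:\s*(?P<resolved>\d{2}/\d{2}/\d{4} \d{2}:\d{2} [AP]M)?"
)


def outage_duration(summary):
    """Return Resolved minus Started as a timedelta, or None if still open."""
    match = FIELDS.search(summary)
    if match is None:
        raise ValueError("no Started/Resolved fields found")
    if match.group("resolved") is None:
        return None  # the entry has not been resolved yet
    started = datetime.strptime(match.group("started"), STAMP)
    resolved = datetime.strptime(match.group("resolved"), STAMP)
    return resolved - started


if __name__ == "__main__":
    # Summary line copied from the Veale-m1-e1 problem report.
    veale = (
        "Problem: Veale-m1-e1 Cause: Unknown "
        "Affects: Wired, wireless and phone connection in Veale "
        "Started: 07/01/2009 03:00 PM Resolved: 07/02/2009 07:00 AM"
    )
    print(outage_duration(veale))  # prints 16:00:00
```

The regular expression looks only at the two time fields, so the free-text Problem, Cause and Affects fields do not need to be parsed.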
