Emergency Maintenance
Emergency Maintenance: Language Learning Center server needs to be rebooted
Problem: Language Learning Center server needs to be rebooted
Cause: Unknown
Affects: users of LLC web services
Started: 11/20/2009 03:00 PM
Resolved: 11/20/2009 06:10 PM
Notes:
Services required for management are slow to respond or non-responsive. No indication that the services provided by the server are affected. Restart scheduled for 6:05pm.
Created: 11/20/2009 17:13:37 by dmm61
Updates:
Emergency Maintenance: Squire Valley Farm - Time Warner Maintenance
Problem: Squire Valley Farm - Time Warner Maintenance
Cause: mandatory upgrade
Affects: Squire Valley Farm
Started: 11/23/2009 12:00 AM
Resolved: 11/23/2009 06:00 AM
Notes:
Time Warner Cable will be performing emergency maintenance. Expected downtime is only 20 minutes, but they are reserving the entire window.
Created: 11/20/2009 10:24:38 by jxo63
Updates:
Emergency Maintenance: PeopleSoft Schedulers SADEV, SADMO, SAIPD and SATST not running on the Windows Server Kessel.
Problem: PeopleSoft Schedulers SADEV, SADMO, SAIPD and SATST not running on the Windows Server Kessel.
Cause: Unknown
Affects: SADEV, SADMO, SAIPD and SATST Windows schedulers
Started: 11/19/2009 02:35 PM
Resolved: 11/19/2009 02:45 PM
Notes:
The PeopleSoft instances did not start up properly. The server was rebooted to clear up the issue.
Server came back up and services are running properly now.
Created: 11/19/2009 14:45:24 by tpw9
Updates:
Emergency Maintenance: ITS-Services Needs to Be Restarted
Problem: ITS-Services Needs to Be Restarted
Cause: Apache
Affects: ITS-Services
Started: 11/16/2009 09:48 AM
Resolved: 11/16/2009 09:50 AM
Notes:
Apache needs to be bounced to hopefully clear up some problems.
Created: 11/16/2009 09:49:50 by jms18
Updates:
Emergency Maintenance: Maintenance on KSLP
Problem: Maintenance on KSLP
Cause: database contention
Affects: KSL Production Library will be unavailable for 30 minutes
Started: 11/16/2009 05:00 AM
Resolved: 11/16/2009 05:30 AM
Notes:
Contention within the Fedora application, which resides in the KSLP database, has been severely impacting performance. Maintenance will be performed Monday 11/16 from 5:00 am to 5:30 am to address this issue.
During that time the KSLP database will not be available.
Created: 11/13/2009 15:34:51 by rxg263
Updates:
Emergency Maintenance: Development ERP Windows Server Needs to Be Rebooted.
Problem: Development ERP Windows Server Needs to Be Rebooted.
Cause: Equipment maintenance
Affects: Unknown
Started: 10/12/2009 12:00 PM
Resolved: 10/12/2009 12:30 PM
Notes:
The development PeopleSoft server Akita was having issues with the Symantec
Problem Report
Problem Report: Mail to off-campus locations
Problem: Mail to off-campus locations
Cause: It may be some sort of network issue, since this seems to be affecting all off-campus mail
Affects: All off-campus mail, and possibly other off-campus services
Started: 11/22/2009 09:50 PM
Resolved:
Notes:
[11/23/09 12:00AM] - After much searching by Network Engineering, it appears that the problem is some sort of issue with Google blocking connections with our mail servers. We have opened a priority trouble call with them. Although we haven't heard back from them, many of the servers are starting to empty their mail queues, so we are hopeful that they may be working on correcting the problem.
Network Engineering is working on the issue with OneCommunity.
Created: 11/22/2009 21:56:10 by dak
Updates: 11/22/2009 23:56:25 by dak
Problem Report: Backup DHCP server for VoIP offline
Problem: Backup DHCP server for VoIP offline
Cause: Disk Failure
Affects: No one
Started: 11/20/2009 03:48 PM
Resolved:
Notes:
The Backup Server Roo (VoIP) suffered a boot disk failure.
Server Engineers are aware of the problem and looking at the server.
Created: 11/20/2009 16:09:42 by dnd
Updates: 11/20/2009 16:12:28 by dnd
Problem Report: Non-ERP Databases down
Problem: Non-ERP Databases down
Cause: Server crash - Reason unknown at this time
Affects: See list below
Started: 11/20/2009 09:44 AM
Resolved: 11/20/2009 12:00 PM
Notes:
[11/20/2009 12:30PM] - Services seem to have been moved back to the other server, and most, if not all, of the services have been restored.
The non-ERP Oracle server has crashed (again). Server Engineering is looking into the issue but we currently have no ETA for the server or databases to be up at this time.
Affected applications include (but are not limited to):
Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter
Created: 11/20/2009 09:55:02 by dak
Updates: 11/20/2009 12:40:44 by dak
Problem Report: Non-ERP Oracle databases down
Problem: Non-ERP Oracle databases down
Cause: unknown (software panic/crash)
Affects: many applications (see list below)
Started: 11/19/2009 05:03 PM
Resolved: 11/19/2009 07:30 PM
Notes:
UPDATE: all databases restarted by 7:15pm. Key applications (Blackboard, etc.) are being verified and/or re-started.
The non-ERP Oracle server has crashed (again...but not a hardware problem this time). ETA for service to be restored is 7:30pm.
Affected applications include (but are not limited to):
Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter
Created: 11/19/2009 18:22:30 by jan3
Updates: 11/19/2009 19:19:09 by jan3
Problem Report: Astra room scheduling application down
Problem: Astra room scheduling application down
Cause: system drive full
Affects: Room scheduling across campus
Started: 11/19/2009 11:15 AM
Resolved: 11/19/2009 12:10 PM
Notes:
Server started crashing as system drive filled after OS patching. Server Engineering caught the problem and cleared space, but system still had to be physically cycled to restore service.
Service restored.
Created: 11/19/2009 15:21:12 by tev1
Updates:
Problem Report: KSL Database was bounced to resolve an issue with Fedora
Problem: KSL Database was bounced to resolve an issue with Fedora
Cause: Application contention on a Fedora table
Affects: All KSL applications using the KSLP database
Started: 11/19/2009 11:58 AM
Resolved: 11/19/2009 11:59 AM
Notes:
Update: The Fedora problem returned, and the database was bounced again from 1:56 pm to 1:57 pm. The KSL staff will refrain from making updates to the table that is at the root of the problem.
There was locking contention around a table in the Fedora Application. At the request of the KSL application support staff the database KSLP was bounced to resolve the issue.
IMPACT
The problem affected Fedora users from 11:32 am to 11:59 am.
All KSL users would have been impacted for 1 minute from 11:58 am to 11:59 am while the database was bounced.
Created: 11/19/2009 12:07:31 by rxg263
Updates: 11/19/2009 14:45:55 by rxg263
Problem Report: DB8 sftp to secure ftp server failures
Problem: DB8 sftp to secure ftp server failures
Cause: The wrong IP address was provided for DB8
Affects: sftp from DB8
Started: 11/18/2009 08:29 AM
Resolved: 11/18/2009 08:29 AM
Notes:
The firewall required an additional push to add the correct IP address of DB8.
Created: 11/18/2009 08:31:30 by lxc152
Updates:
Problem Report: Campus Attendant Console is down
Problem: Campus Attendant Console is down
Cause: Campus Operator Answering Services is degraded (368-2000)
Affects: Campus 368-2000 answering time.
Started: 11/17/2009 01:00 PM
Resolved: 11/17/2009 04:58 PM
Notes:
Operator error found. Corrected problem.
Engineer is investigating the problem. Attendant Console software appears unstable on operator's PC.
Callers to the 368-2000 main campus number may experience delays.
Created: 11/17/2009 16:00:04 by wxc16
Updates: 11/17/2009 16:50:33 by wxc16
Problem Report: Non-ERP Oracle databases down
Problem: Non-ERP Oracle databases down
Cause: system configuration issue
Affects: (many systems; see Notes for detailed list)
Started: 11/11/2009 05:00 PM
Resolved: 11/11/2009 08:28 PM
Notes:
UPDATE: databases were restored to service by 8:20pm. Applications are now being re-started (if needed).
After the hardware failures this morning (and in previous weeks), we moved the Oracle databases to a different machine. This machine was not yet fully configured to handle a production workload. We are now correcting the configuration issue that caused this outage; ETA to restore service is around 8:00pm.
Affected applications include (but are not limited to):
Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter
Created: 11/11/2009 18:43:05 by jan3
Updates: 11/11/2009 20:28:40 by jan3
Problem Report: DB7 is down
Problem: DB7 is down
Cause: Hardware Problems
Affects: Multiple Applications
Started: 11/11/2009 02:32 AM
Resolved: 11/11/2009 12:45 PM
Notes:
Resolved: All Databases have been successfully failed over to DB8.
Update at 10:00 am: While attempting to restore all databases on DB7, the server crashed again. Server Engineering is now in the process of moving all of the databases to DB8. There is no ETA.
DB7 crashed at 2:30 am due to hardware problems. Server Engineering is investigating. ETA is 9:30 am.
This affects the following Databases and applications:
Advp, Apexp, appworxp, astrap, aurorap, blackboard, Degree Audit Reporting System, dashboard, dental school, IP Self Registration for computers, isisp, KSL Oracle Database, Mail Group, New My Case Portal, Oracle Internet Directory, onbasep (Financial Aid Application), oncorep, onregp, personp, Pinnacle Telephone Application, prosamp, prtlp, sccp, Serena Change Man, t2Parking, Unified Messaging, and WebEvent.
Because the Oracle Internet Directory is on DB7, any database connections using LDAP may also be affected. These include:
Password reset/change
Account activation
Mail List Manager Sympa
Any system registration, static IP, hostname and other IP changes
Case WARN sign-up (actual notifications not affected)
ITS/Help Desk internal tools (lookup, changing user, etc.).
Created: 11/11/2009 07:42:42 by rxg263
Updates: 11/11/2009 09:57:33 by rxg263, 11/11/2009 12:44:40 by rxg263
Problem Report: KSL CRAC DC-3, CRAC DC-4, CRAC DC-5 are not working; temperature is at 89 degrees.
Problem: KSL CRAC DC-3, CRAC DC-4, CRAC DC-5 are not working; temperature is at 89 degrees.
Cause: Main pump is inoperable.
Affects: All servers in the server room.
Started: 11/11/2009 03:15 AM
Resolved: 11/11/2009 09:52 AM
Notes:
KSL CRAC DC-3, CRAC DC-4, and CRAC DC-5 are not working; temperature is at 89 degrees. Facilities staff have been on site since 4:30 a.m. trying to get to the root of the problem. The technician is still here and will bring a portable A/C unit into the Data Center, if one is available, to help cool the area. Liebert has been called. The CRAC units were up and running again at 9:45 a.m.; there was a low-coolant problem, and the main pump is working again.
Created: 11/11/2009 06:41:09 by lab17
Updates: 11/11/2009 06:50:10 by lab17, 11/11/2009 08:28:06 by lab17
Problem Report: Novell server Pulitzer down
Problem: Novell server Pulitzer down
Cause: unknown
Affects: all users of Pulitzer
Started: 11/07/2009 02:00 PM
Resolved: 11/08/2009 01:45 AM
Notes:
Hard crash; power-cycled the system.
Created: 11/08/2009 01:56:32 by jan3
Updates:
Problem Report: backup system down
Problem: backup system down
Cause: primary server crashed (root cause unknown)
Affects: users of the Legato Networker system
Started: 11/06/2009 11:45 PM
Resolved: 11/07/2009 02:05 PM
Notes:
System rebooted.
Created: 11/07/2009 14:17:58 by jan3
Updates:
Problem Report: Database error on LDAP replica (ldap-replica7)
Problem: Database error on LDAP replica (ldap-replica7)
Cause: Unknown
Affects: Direct users of LDAP and downstream applications such as Single Sign On
Started: 11/07/2009 12:10 AM
Resolved: 11/07/2009 10:15 AM
Notes:
The LDAP replica was complaining about problems with its internal user record database and was not serving information to any application querying it. One of the affected applications was the Single Sign On (SSO) service which would hang waiting for information that would never come. Stopping and restarting the replica caused it to rebuild its internal database which appears to have corrected the problem, in addition to clearing up the downstream issues (SSO).
We will continue to monitor the system throughout the weekend to make sure the issue does not recur.
Created: 11/07/2009 10:51:58 by dak
Updates:
Problem Report: Google Mail Issue
Problem: Google Mail Issue
Cause: Google Minor Service Outage
Affects: May not have affected anyone on Case campus
Started: 11/01/2009 01:15 PM
Resolved: 11/01/2009 08:30 PM
Notes:
Google experienced a service disruption that affected less than 0.001% of Gmail users. While the disruption was occurring, affected users were unable to access their mail.
While we have received no reports of outages for the Case campus, we are posting this report to notify our clients that a VERY minor disruption was experienced by Google.
Created: 11/02/2009 07:01:21 by dak
Updates:
Problem Report: Campus Unable to Log In
Problem: Campus Unable to Log In
Cause: unknown
Affects: users unable to access any systems
Started: 11/01/2009 02:11 AM
Resolved: 11/01/2009 02:40 AM
Notes:
[11/01/2009 9:30 AM] - By the time we had a chance to look into the issue, it had resolved itself. We are guessing that it had something to do with the time change and computers not quite being in sync as far as their clocks are concerned. Verified with the Help Desk that the issue was resolved (they were able to reach the Case-provided tools again).
The Help Desk called to notify me that users across campus were receiving "Your session has expired." errors on all systems. Notified Dave Kovacic, who is currently looking into the issue.
Created: 11/01/2009 02:13:11 by jxo63
Updates: 11/01/2009 09:36:21 by dak
Problem Report: Database Server (DB7) is down
Problem: Database Server (DB7) is down
Cause: It looks like another bad memory module
Affects: Several databases and the services that use them, see below
Started: 10/30/2009 11:55 AM
Resolved: 10/30/2009 04:30 PM
Notes:
The databases are back up and most services are back as well.
The system has been rebooted and is on its way back up; however, it may take some time to bring all the databases back up.
Affected applications include:
Advp, Apexp, appworxp, astrap, aurorap, blackboard, Degree Audit Reporting System, dashboard, dental school, IP Self Registration for computers, isisp, KSL Oracle Database, Mail Group, New My Case Portal, Oracle Internet Directory, onbasep (Financial Aid Application), oncorep, onregp, personp, Pinnacle Telephone Application, prosamp, prtlp, sccp, Serena Change Man, t2Parking, Unified Messaging, and WebEvent.
Because the Oracle Internet Directory is on DB7, any database connections using LDAP may also be down.
This includes:
Password change/reset
Account activation
System registration tools
Mail list manager (Sympa)
Created: 10/30/2009 12:23:53 by dak
Updates: 10/30/2009 16:24:48 by dak
Problem Report: CDC CRAC 2 ALARM
Problem: CDC CRAC 2 ALARM
Cause: unknown
Affects: CDC
Started: 10/19/2009 10:00 AM
Resolved:
Notes:
Facility Maintenance was contacted this morning.
Created: 10/19/2009 13:28:02 by euw
Updates:
Problem Report: Degraded Internet connectivity
Problem: Degraded Internet connectivity
Cause: Internet routing problem in the Global Crossing network
Affects: all Internet services (web browsing, email)
Started: 10/01/2009 02:24 PM
Resolved:
Notes:
UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.
UPDATE: 10/1 18:41 ISP is continuing to work on the problem. End users may have problems reaching the Internet. Off-campus users may also have problems reaching the Case network. No ETA at this point.
The problem is beyond our Internet provider.
The problem appears to be on the Global Crossing network.
Our ISP has a ticket open with Global Crossing on this issue.
Local Case engineers are monitoring this issue.
Degraded access to CNN and MSNBC has been noted.
We will advise as more information becomes available.
Created: 10/01/2009 14:30:11 by lxc152
Updates: 10/01/2009 18:45:30 by wxc16, 10/01/2009 18:48:46 by wxc16, 10/01/2009 19:20:35 by wxc16
Problem Report: Degraded Internet Connectivity
Problem: Degraded Internet Connectivity
Cause: Unexpected ISP maintenance issue
Affects: Case Network Connectivity to the Internet
Started: 09/30/2009 04:16 PM
Resolved:
Notes:
UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.
[10/1 14:25, 9/30 17:01, 9/30 16:37] Engineers are aware of an intermittent Internet connectivity issue. No problem was found with the network connection inside campus. End users may experience problems connecting to the Internet. It appears to be a problem at the ISP's end. Engineers are contacting ISP support.
Created: 09/30/2009 16:18:55 by wxc16
Updates: 09/30/2009 16:41:29 by wxc16, 09/30/2009 17:05:38 by wxc16, 10/01/2009 14:20:46 by lxc152, 10/01/2009 14:30:16 by wxc16, 10/01/2009 19:20:43 by wxc16
Problem Report: VPN Unavailable
Problem: VPN Unavailable
Cause: unknown
Affects: any VPN User
Started: 09/25/2009 07:37 AM
Resolved:
Notes:
Engineers are investigating.
Created: 09/25/2009 07:38:37 by man27
Updates:
Problem Report: Wireless Network Outage
Problem: Wireless Network Outage
Cause: Unknown
Affects: Campus Wireless Network
Started: 08/31/2009 09:00 AM
Resolved:
Notes:
An engineer received reports of a wireless outage throughout campus and is investigating.
Created: 08/31/2009 13:48:57 by wxc16
Updates:
Problem Report: Fiji House
Problem: Fiji House
Cause: Power event
Affects: data, telephony
Started: 08/11/2009 03:20 AM
Resolved:
Notes:
The VG248 did not come back after the power was recycled.
The switch is up, but its configuration was lost. No one is in the house yet.
Engineering got the analog phones working. Data is still down.
Created: 08/11/2009 10:21:57 by jhm
Updates: 08/11/2009 11:00:54 by jhm
Problem Report: Network packet loss
Problem: Network packet loss
Cause: Unknown
Affects: All Internet Traffic
Started: 07/30/2009 09:23 AM
Resolved:
Notes:
Engineers are investigating the problem.
Users are experiencing intermittent degradation in Internet connection speed.
Created: 07/30/2009 09:31:06 by lxc152
Updates: 07/30/2009 09:37:35 by man27
Problem Report: Case VPN Services - zero network connectivity after VPN session is established.
Problem: Case VPN Services - zero network connectivity after VPN session is established.
Cause: Unknown
Affects: Case VPN Services
Started: 07/20/2009 09:00 AM
Resolved:
Notes:
Users may experience zero network connectivity after establishing a Case VPN session. We suspect the VPN server's threat detection erroneously dropped the returning traffic to the user. The VPN server's threat detection has been restarted. Engineers continue to monitor.
Workaround:
Disconnect the VPN session and reconnect.
Created: 07/24/2009 18:11:27 by wxc16
Updates:
Problem Report: Case Voicemail Performance Degraded
Problem: Case Voicemail Performance Degraded
Cause: Unknown
Affects: Degraded voicemail services
Started: 07/20/2009 09:00 AM
Resolved: 08/03/2009 05:00 PM
Notes:
The voicemail system's hardware has been replaced, and its software has been upgraded to version 2.4. The delayed LDAP responses are no longer seen with the new software version.
The voicemail system has returned to normal performance. Engineers will continue to monitor the system.
Users may experience delays when retrieving voicemail messages by telephone: up to 30 seconds after entering the passcode before the system responds and plays the messages.
Workaround:
1) Hang up and retry
2) Retrieve voicemail via email.
Sorry for the inconvenience. Engineers are working to resolve this issue.
Created: 07/24/2009 17:33:39 by wxc16
Updates: 07/30/2009 17:14:39 by wxc16, 08/07/2009 22:19:50 by wxc16
Problem Report: Sympa Mailing List Server is down
Problem: Sympa Mailing List Server is down
Cause: Database server is down
Affects: All mailing lists and admin aliases
Started: 07/15/2009 10:50 AM
Resolved: 07/15/2009 03:30 PM
Notes:
The mailing list server is down because the database server crashed, and Sympa cannot run without a database.
The database server group is working on restoring the database server.
Update: The DB was restored and mail was flowing again at 2:15pm. All queued messages were delivered by 3:30pm. We are now running normally.
Created: 07/15/2009 12:06:49 by emr
Updates: 07/15/2009 15:47:12 by emr
Problem Report: Unix Server DB7 is having problems
Problem: Unix Server DB7 is having problems
Cause: Unknown Hardware Problem
Affects: All Non PeopleSoft Production Oracle Databases
Started: 07/15/2009 10:45 AM
Resolved: 07/15/2009 02:15 PM
Notes:
The Unix server DB7 is currently experiencing problems and rebooted at approx. 10:45 am. This currently affects 40 Oracle databases, including Blackboard, Advanced Contributions, all APEX systems, APPWORX, Degree Audit, Dental School, KSL, ISIS, Pinnacle, Portal, MyCaseP, ONBASE, ONCORE, TeamTrack, T2Park, and Serena. Server Engineering is currently working on the issue.
The server was rebooted and all disk file systems had to be manually mounted. All of the databases were up at 2:15 pm. Diagnostic files have been sent to the vendor.
The server vendor has determined that a catastrophic memory failure in the system's main memory caused the CPU panic and subsequent crash. We will schedule an emergency maintenance outage during the maintenance window as soon as we are in contact with the vendor's server engineer.
Created: 07/15/2009 11:41:13 by rxg263
Updates: 07/15/2009 12:04:11 by rxg263, 07/15/2009 12:10:27 by rxg263, 07/15/2009 15:10:23 by dxw134
Problem Report: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Problem: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Cause: unknown
Affects: KSL Data Center
Started: 07/07/2009 05:30 AM
Resolved:
Notes:
Facility Maintenance has been notified and apprised of the situation.
Created: 07/07/2009 11:55:30 by euw
Updates:
Problem Report: Pathology-p3-e1 cooling issue
Problem: Pathology-p3-e1 cooling issue
Cause: HVAC issues
Affects: network equipment
Started: 06/22/2009 02:58 PM
Resolved: 07/21/2009 09:27 AM
Notes:
Resolved.
According to Facilities, HVAC to the SER had to be shut down to fix flooding in the building.
There is minimal impact at the moment, but prolonged high temperature in the SER will affect the network equipment, potentially causing an outage of wired, wireless, phone, and security panels in Pathology.
Created: 06/23/2009 09:04:48 by roo
Updates: 07/21/2009 09:27:12 by roo
Problem Report: Bingham Hub
Problem: Bingham Hub
Cause: Cooling problem in Hub
Affects: Wired, wireless, phones, security panels for several buildings on the south side
Started: 06/20/2009 02:48 AM
Resolved: 07/21/2009 10:09 AM
Notes:
Resolved and closed.
2009, June 24, 08:15 AM: We restored the back-up link between the Bingham Hub and the KSL Data Center.
2009, June 24, 06:30 AM: We restored the main link between the Bingham Hub and the Crawford Data Center, which restored all network connections and connectivity for the South Campus with regard to the Bingham Hub; however, the back-up link between the Bingham Hub and the KSL Data Center is still down and will need to be investigated further.
June 24, 04:45 AM: Lost several building networks because the AC failed again. Facilities is on the way.
June 22, 03:00 PM: The hub is experiencing cooling issues again. Plant Services has been called. The SER AC keeps shutting down.
11:52 PM: The line cards in the hub have started experiencing temperature failures again. Called Plant Services to look into it. It looks like the AC keeps tripping off.
03:30 AM update: Plant Services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Investigating.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.
Services are coming back online.
Still monitoring.
Called Plant Services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 0009.11f7.e830 to 0009.11f7.e83f 1.0 7.2(1) 8.5(0.46)RFW MinFail
2 000c.ceb5.a900 to 000c.ceb5.a90f 1.0 7.2(1) 8.5(0.46)RFW MinFail
3 000c.ceb5.aa40 to 000c.ceb5.aa4f 1.0 7.2(1) 8.5(0.46)RFW MinFail
4 0003.feac.7772 to 0003.feac.7779 2.0 7.2(1) 3.5(1) Ok
5 000c.ce63.e864 to 000c.ce63.e867 2.1 7.7(1) 12.2(18)SXF1 Ok
9 000d.6550.b866 to 000d.6550.b869 1.1 12.2(14r)S5 12.2(18)SXF1 MinFail
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Distributed Forwarding Card WS-F6K-DFC3A SAD072004CL 1.0 MinFail
2 Distributed Forwarding Card WS-F6K-DFC3A SAD072300XR 1.0 MinFail
3 Distributed Forwarding Card WS-F6K-DFC3A SAD072004BU 1.0 MinFail
5 Policy Feature Card 3 WS-F6K-PFC3A SAD072100G1 1.1 Ok
5 MSFC3 Daughterboard WS-SUP720 SAD072100JS 1.2 Ok
9 Distributed Forwarding Card WS-F6700-DFC3A SAD074805CH 1.0 MinFail
bingham-h0-e1>
Created: 06/20/2009 02:52:45 by roo
Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw, 07/21/2009 10:09:21 by roo
Problem Report: ERP Financials server outage
Problem: ERP Financials server outage
Cause: Hardware error
Affects: Financial ERP services
Started: 06/15/2009 08:00 AM
Resolved:
Notes:
Received a report that the Financials process server was unavailable. Server Engineering staff responded onsite; the server was reporting a hard disk error and had hung.
Power-cycled the server, and the services became available after the reboot completed. Services ran in a degraded state while the disk rebuild was running.
The rebuild of the hard disk failed; the hard drive was replaced, and the rebuild was attempted again and failed.
Continuing to troubleshoot the issue.
Created: 06/17/2009 11:57:04 by rak7
Updates:
Problem Report: Scholars House segmented from CCN
Problem: Scholars House segmented from CCN
Cause: Suspecting hardware failure.
Affects: No users are in this building for the summer.
Started: 06/08/2009 04:39 PM
Resolved:
Notes:
Suspecting supervisor module failure.
Investigating.
Created: 06/08/2009 16:57:59 by jhm
Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw
Problem Report: LDAP Replica (ldap-replica7) is down
Problem: LDAP Replica (ldap-replica7) is down
Cause: Seems to be a disk problem
Affects: Only people pointed directly at ldap-replica7.cwru.edu
Started: 05/22/2009 11:35 AM
Resolved:
Notes:
[5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back, so the LDAP replica is back in operation. We have called the vendor to have a look at the system, however, and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when we have definitive closure on the problem.
The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.
The system is one of several redundant LDAP replicas. The only applications affected will be those pointed directly at this particular LDAP replica.
Created: 05/22/2009 12:24:57 by dak
Updates: 05/22/2009 13:27:28 by dak
Scheduled Maintenance
Scheduled Maintenance: blackboard.case.edu downtime
Problem: blackboard.case.edu downtime
Cause: system maintenance
Affects: all Blackboard users
Started: 11/25/2009 03:00 AM
Resolved: 11/25/2009 04:30 AM
Notes:
We will be clearing out some old files to free up disk space.
Created: 11/20/2009 12:34:55 by prj
Updates:
Scheduled Maintenance: Cisco Catalyst C-3750 Router, Diner-M1-E1, to be power-cycled.
Problem: Cisco Catalyst C-3750 Router, Diner-M1-E1, to be power-cycled.
Cause: power cycle needed to (hopefully) restore its full operation
Affects: all end users in the Diner
Started: 11/13/2009 11:00 AM
Resolved: 11/13/2009 12:00 PM
Notes:
Several of the router interfaces have failed lately, so we hope that power-cycling the entire router will restore their full operation; otherwise, the Cisco Catalyst C-3750 router, Diner-M1-E1, may need to be replaced with comparable or better equipment. The network outage should last only a few minutes, during which wired and wireless data and voice will be affected.
Created: 11/12/2009 18:18:30 by euw
Updates:
Scheduled Maintenance: Mail list archives going offline
Problem: Mail list archives going offline
Cause: Disk Expansion
Affects: Mailing list archives
Started: 11/13/2009 05:00 AM
Resolved: 11/13/2009 06:00 AM
Notes:
The file system that holds the mailing list archives is almost full. We will be expanding it on Friday morning. During the maintenance we will need to turn off archiving and access to the archives.
Mailing list processing should not be affected.
If all goes well and the expansion preserves the original files, we should be back online within the 1-hour maintenance window. If, however, the expansion wipes out the files, we will need to restore them before we resume archiving. That could take up to 8 hours.
Archive requests will be queued during the downtime and applied once the system is back online.
Created: 11/09/2009 14:32:44 by emr
Updates:
Scheduled Maintenance: Veale-M1-E1; Module 3, WS-X6548-GE-TX; Bus Asic #0 transient Pb error. Recovered. (0x0002, 0x0000): Module needs troubleshooting or TAC
Problem: Veale-M1-E1; Module 3, WS-X6548-GE-TX; Bus Asic #0 transient Pb error. Recovered. (0x0002, 0x0000): Module needs troubleshooting or TAC
Cause: unknown
Affects: up to forty-eight direct local area network (data) connections
Started: 11/11/2009 08:00 AM
Resolved: 11/11/2009 09:00 AM
Notes:
1. Install a WS-X6148V-GE-TX module into the empty Slot 3.
2. Move all patches to Module 3 from Module 2.
3. The forty-eight directly connected end users may experience a brief service interruption of possibly only a few minutes while the patches are being moved this morning.
4. We apologize for any inconvenience this brief network outage may cause.
5. Of the forty-eight direct network connections, forty-three are for general use and five serve the five vending machines.
Created: 11/06/2009 14:44:05 by euw
Updates: 11/09/2009 10:18:14 by euw, 11/09/2009 13:45:13 by euw
