Emergency Maintenance

Emergency Maintenance: Language Learning Center server needs to be rebooted

Problem:   Language Learning Center server needs to be rebooted
Cause:     Unknown
Affects:   users of LLC web services
Started:   11/20/2009 03:00 PM
Resolved:  11/20/2009 06:10 PM

Notes:

Services required for management are slow to respond or non-responsive. No indication that the services provided by the server are affected. Restart scheduled for 6:05pm.


Created: 11/20/2009 17:13:37 by dmm61

Updates:


Emergency Maintenance: Squire Valley Farm - Time Warner Maintenance

Problem:   Squire Valley Farm - Time Warner Maintenance
Cause:     mandatory upgrade
Affects:   Squire Valley Farm
Started:   11/23/2009 12:00 AM
Resolved:  11/23/2009 06:00 AM

Notes:

Time Warner Cable will be performing emergency maintenance. Expected downtime is only 20 minutes - they are reserving the entire window.


Created: 11/20/2009 10:24:38 by jxo63

Updates:


Emergency Maintenance: PeopleSoft Schedulers SADEV, SADMO, SAIPD and SATST not running on the Windows Server Kessel.

Problem:   PeopleSoft Schedulers SADEV, SADMO, SAIPD and SATST not running on the Windows Server Kessel.
Cause:     Unknown
Affects:   SADEV, SADMO, SAIPD and SATST Windows schedulers
Started:   11/19/2009 02:35 PM
Resolved:  11/19/2009 02:45 PM

Notes:

The PeopleSoft Instances did not start up properly. Server was rebooted to clear up the issue.

Server came back up and services are running properly now.


Created: 11/19/2009 14:45:24 by tpw9

Updates:


Emergency Maintenance: ITS-Services Needs to Be Restarted

Problem:   ITS-Services Needs to Be Restarted
Cause:     Apache
Affects:   ITS-Services
Started:   11/16/2009 09:48 AM
Resolved:  11/16/2009 09:50 AM

Notes:

Apache needs to be bounced to hopefully clear up some problems.


Created: 11/16/2009 09:49:50 by jms18

Updates:


Emergency Maintenance: Maintenance on KSLP

Problem:   Maintenance on KSLP
Cause:     database contention
Affects:   KSL Production Library will be unavailable for 30 minutes
Started:   11/16/2009 05:00 AM
Resolved:  11/16/2009 05:30 AM

Notes:

Contention within the FEDORA application, which resides in the KSLP database, is severely impacting performance. Maintenance will be performed Monday 11/16 from 5:00 am to 5:30 am to address this issue.

During that time the KSLP database will not be available.


Created: 11/13/2009 15:34:51 by rxg263

Updates:


Emergency Maintenance: Development ERP Windows Server Needs to Be Rebooted.

Problem:   Development ERP Windows Server Needs to Be Rebooted.
Cause:     Equipment maintenance
Affects:   Unknown
Started:   10/12/2009 12:00 PM
Resolved:  10/12/2009 12:30 PM

Notes:

The development Peoplesoft server Akita was having issues with the Symantec

Problem Report

Problem Report: Backup DHCP server for VoIP offline

Problem:   Backup DHCP server for VoIP offline
Cause:     Disk Failure
Affects:   No one
Started:   11/20/2009 03:48 PM
Resolved:  

Notes:

The Backup Server Roo (VoIP) suffered a boot disk failure.
Server Engineers are aware of the problem and looking at the server.


Created: 11/20/2009 16:09:42 by dnd

Updates: 11/20/2009 16:12:28 by dnd


Problem Report: Non-ERP Databases down

Problem:   Non-ERP Databases down
Cause:     Server crash - Reason unknown at this time
Affects:   See list below
Started:   11/20/2009 09:44 AM
Resolved:  11/20/2009 12:00 PM

Notes:

[11/20/2009 12:30PM] - Services seem to have been moved back to the other server and most if not all the services have been restored.

The non-ERP Oracle server has crashed (again). Server Engineering is looking into the issue but we currently have no ETA for the server or databases to be up at this time.

Affected applications include (but are not limited to):

Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter


Created: 11/20/2009 09:55:02 by dak

Updates: 11/20/2009 12:40:44 by dak


Problem Report: Non-ERP Oracle databases down

Problem:   Non-ERP Oracle databases down
Cause:     unknown (software panic/crash)
Affects:   many applications (see list below)
Started:   11/19/2009 05:03 PM
Resolved:  11/19/2009 07:30 PM

Notes:

UPDATE: all databases restarted by 7:15pm. Key applications (Blackboard, etc.) are being verified and/or re-started.


The non-ERP Oracle server has crashed (again...but not a hardware problem this time). ETA for service to be restored is 7:30pm.

Affected applications include (but are not limited to):

Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter


Created: 11/19/2009 18:22:30 by jan3

Updates: 11/19/2009 19:19:09 by jan3


Problem Report: Astra room scheduling application down

Problem:   Astra room scheduling application down
Cause:     system drive full
Affects:   Room scheduling across campus
Started:   11/19/2009 11:15 AM
Resolved:  11/19/2009 12:10 PM

Notes:

Server started crashing as system drive filled after OS patching. Server Engineering caught the problem and cleared space, but system still had to be physically cycled to restore service.

Service restored.
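
A full system drive like the one described above can often be caught before it causes crashes. The following is a minimal sketch in Python, not the check Server Engineering actually uses; the function names and the 10% threshold are illustrative assumptions.

```python
import shutil

def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total

def is_low_on_space(path, min_free=0.10):
    """Return True when less than `min_free` of the filesystem is free.

    The 10% default is an assumed threshold, not a value from this report.
    """
    return free_fraction(path) < min_free

if __name__ == "__main__":
    # Check the filesystem holding the current directory.
    if is_low_on_space("."):
        print("WARNING: less than 10% free space remaining")
    else:
        print("Free space OK: {:.1%} free".format(free_fraction(".")))
```

Run on a schedule after OS patching, a check like this would flag a filling drive while there is still room to clear space.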


Created: 11/19/2009 15:21:12 by tev1

Updates:


Problem Report: KSL Database was bounced to resolve an issue with Fedora

Problem:   KSL Database was bounced to resolve an issue with Fedora
Cause:     Application contention on a Fedora table
Affects:   All KSL applications using the KSLP database
Started:   11/19/2009 11:58 AM
Resolved:  11/19/2009 11:59 AM

Notes:

Update: The Fedora problem returned and the database was bounced again from 1:56 pm to 1:57 pm. The KSL staff will refrain from making updates to the table that is the root cause of the problem.

There was locking contention around a table in the Fedora Application. At the request of the KSL application support staff the database KSLP was bounced to resolve the issue.

IMPACT
The problem affected Fedora users from 11:32 am to 11:59 am.

All KSL users would have been impacted for 1 minute from 11:58 am to 11:59 am while the database was bounced.


Created: 11/19/2009 12:07:31 by rxg263

Updates: 11/19/2009 14:45:55 by rxg263


Problem Report: DB8 sftp to secure ftp server failures

Problem:   DB8 sftp to secure ftp server failures
Cause:     The wrong IP address was provided for db8
Affects:   sftp from db8
Started:   11/18/2009 08:29 AM
Resolved:  11/18/2009 08:29 AM

Notes:

The firewall required an additional push to add the correct IP address of db8.


Created: 11/18/2009 08:31:30 by lxc152

Updates:


Problem Report: Campus Attendant Console is down

Problem:   Campus Attendant Console is down
Cause:     Campus Operator Answering Services is degraded (368-2000)
Affects:   Campus 368-2000 answering time.
Started:   11/17/2009 01:00 PM
Resolved:  11/17/2009 04:58 PM

Notes:

Operator error found. Corrected problem.

Engineer is investigating the problem. Attendant Console software appears unstable on operator's PC.

Callers calling 368-2000 main campus number may experience delay.


Created: 11/17/2009 16:00:04 by wxc16

Updates: 11/17/2009 16:50:33 by wxc16


Problem Report: Non-ERP Oracle databases down

Problem:   Non-ERP Oracle databases down
Cause:     system configuration issue
Affects:   (Many systems--see Notes for detailed list)
Started:   11/11/2009 05:00 PM
Resolved:  11/11/2009 08:28 PM

Notes:

UPDATE: databases were restored to service by 8:20pm. Applications are now being re-started (if needed).

After the hardware failures this morning (and in previous weeks), we moved the Oracle databases to a different machine. This machine was not yet fully configured to handle a production workload. We are now correcting the configuration issue that caused this outage; ETA to restore service is around 8:00pm.

Affected applications include (but are not limited to):

Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter


Created: 11/11/2009 18:43:05 by jan3

Updates: 11/11/2009 20:28:40 by jan3


Problem Report: DB7 is down

Problem:   DB7 is down
Cause:     Hardware Problems
Affects:   Multiple Applications
Started:   11/11/2009 02:32 AM
Resolved:  11/11/2009 12:45 PM

Notes:

Resolved: All Databases have been successfully failed over to DB8.


Update at 10:00 am: While attempting to restore all databases on DB7 the server crashed again. Server Engineering is now in the process of moving all of the databases to DB8. There is no ETA.


DB7 crashed at 2:30 am due to hardware problems. Server Engineering is investigating. ETA is 9:30 am.

This affects the following Databases and applications:
Advp, Apexp, appworxp, astrap, aurorap, blackboard, Degree Audit Reporting System, dashboard, dental school, IP Self Registration for computers, isisp, KSL Oracle Database, Mail Group, New My Case Portal, Oracle Internet Directory, onbasep (Financial Aid Application), oncorep, onregp, personp, Pinnacle Telephone Application, prosamp, prtlp, sccp, Serena Change Man, t2Parking, Unified Messaging, and WebEvent.

Because the Oracle Internet directory is on DB7 any database connections using LDAP may also be affected. These include:
   Password reset/change
   Account activation
   Mail List Manager Sympa
   Any system registration, static IP, hostname and other IP changes
   Case WARN sign-up (actual notifications not affected)
   ITS/Help Desk internal tools (lookup, changing user, etc.).


Created: 11/11/2009 07:42:42 by rxg263

Updates: 11/11/2009 09:57:33 by rxg263, 11/11/2009 12:44:40 by rxg263


Problem Report: KSL CRAC DC-3, CRAC DC-4, CRAC DC-5 are not working; temperature is at 89 degrees.

Problem:   KSL CRAC DC-3, CRAC DC-4, CRAC DC-5 are not working; temperature is at 89 degrees.
Cause:     Main pump is inoperable.
Affects:   All servers in the server room.
Started:   11/11/2009 03:15 AM
Resolved:  11/11/2009 09:52 AM

Notes:

KSL CRAC DC-3, CRAC DC-4, CRAC DC-5 are not working; temperature is at 89 degrees. Facilities has been on site since 4:30 a.m. trying to get to the root of the problem. The technician is still here and will bring a portable A/C unit, if available, into the Data Center to help cool the area. Liebert has been called. The CRAC units are up and running again as of 9:45 a.m. There was a low coolant problem; the main pump is working again.


Created: 11/11/2009 06:41:09 by lab17

Updates: 11/11/2009 06:50:10 by lab17, 11/11/2009 08:28:06 by lab17


Problem Report: Novell server Pulitzer down

Problem:   Novell server Pulitzer down
Cause:     unknown
Affects:   all users of Pulitzer
Started:   11/07/2009 02:00 PM
Resolved:  11/08/2009 01:45 AM

Notes:

hard crash; power-cycled system


Created: 11/08/2009 01:56:32 by jan3

Updates:


Problem Report: backup system down

Problem:   backup system down
Cause:     primary server crashed (root cause unknown)
Affects:   users of the Legato Networker system
Started:   11/06/2009 11:45 PM
Resolved:  11/07/2009 02:05 PM

Notes:

system rebooted.


Created: 11/07/2009 14:17:58 by jan3

Updates:


Problem Report: Database error on LDAP replica (ldap-replica7)

Problem:   Database error on LDAP replica (ldap-replica7)
Cause:     Unknown
Affects:   Direct users of LDAP and downstream applications such as Single Sign On
Started:   11/07/2009 12:10 AM
Resolved:  11/07/2009 10:15 AM

Notes:

The LDAP replica was complaining about problems with its internal user record database and was not serving information to any application querying it. One of the affected applications was the Single Sign On (SSO) service which would hang waiting for information that would never come. Stopping and restarting the replica caused it to rebuild its internal database which appears to have corrected the problem, in addition to clearing up the downstream issues (SSO).

We will continue to monitor the system throughout the weekend to make sure the issue does not reoccur.


Created: 11/07/2009 10:51:58 by dak

Updates:


Problem Report: Google Mail Issue

Problem:   Google Mail Issue
Cause:     Google Minor Service Outage
Affects:   May not have affected anyone on Case campus
Started:   11/01/2009 01:15 PM
Resolved:  11/01/2009 08:30 PM

Notes:

Google experienced a service disruption that affected less than 0.001% of the GMail users. While the disruption was occurring, affected users were unable to access their mail.

While we have received no reports of outages for the Case campus, we are posting this report to notify our clients that a VERY minor disruption was experienced by Google.


Created: 11/02/2009 07:01:21 by dak

Updates:


Problem Report: Campus Unable to Login

Problem:   Campus Unable to Login
Cause:     unknown
Affects:   users unable to access any systems
Started:   11/01/2009 02:11 AM
Resolved:  11/01/2009 02:40 AM

Notes:

[11/01/2009 9:30 AM] - By the time we had a chance to look into the issue it had resolved itself. We are guessing that it had something to do with the time change and computers not quite in sync as far as their clocks are concerned. Verified that the issue was resolved with the Help Desk (they were able to reach the Case-provided tools again).

Helpdesk called to notify me that users across campus were receiving "Your session has expired." errors on all systems. Notified Dave Kovacic - currently looking into issue.


Created: 11/01/2009 02:13:11 by jxo63

Updates: 11/01/2009 09:36:21 by dak


Problem Report: Database Server (DB7) is down

Problem:   Database Server (DB7) is down
Cause:     It looks like another bad memory module
Affects:   Several databases and the services that use them, see below
Started:   10/30/2009 11:55 AM
Resolved:  10/30/2009 04:30 PM

Notes:

The databases are back up and most services are back as well.

The system has been rebooted and is on its way back up; however, it may take some time to bring all the databases back up.

Affected applications include:

Advp, Apexp, appworxp, astrap, aurorap, blackboard, Degree Audit Reporting System, dashboard, dental school, IP Self Registration for computers, isisp, KSL Oracle Database, Mail Group, New My Case Portal, Oracle Internet Directory, onbasep (Financial Aid Application), oncorep, onregp, personp, Pinnacle Telephone Application, prosamp, prtlp, sccp, Serena Change Man, t2Parking, Unified Messaging, and WebEvent.

Because the Oracle Internet directory is on DB7 any database connections using LDAP may also be down.
This includes:
   Password change/reset
   Account activation
   System registration tools
   Mail list manager (Sympa)


Created: 10/30/2009 12:23:53 by dak

Updates: 10/30/2009 16:24:48 by dak


Problem Report: Intermittent access to some services

Problem:   Intermittent access to some services
Cause:     Unknown - possible data center or network problem?
Affects:   Users of services including ERP - all services in Crawford data center seem intermittently affected
Started:   10/23/2009 04:30 PM
Resolved:  10/23/2009 06:10 PM

Notes:

[10/23/09 7:22PM] - The problems we were seeing seemed to clear up around 6:10 PM - Network engineers are continuing to dig into the issue to see if they can pinpoint and prevent recurrences.

The issues started about 4:30PM as far as we can tell - the outages are so short that we may have missed them earlier in the day. Things seem to become unresponsive for about 30-60 seconds and then come back. Only certain parts of each service seem affected, for instance you cannot connect to calendar.case.edu, but the server is still reachable to log into....

A network engineer is looking into the problem. We will post updates as they become available.


Created: 10/23/2009 18:49:08 by dak

Updates: 10/23/2009 19:21:54 by dak


Problem Report: MyCase Down

Problem:   MyCase Down
Cause:     DB7 Maintenance
Affects:   All users of the MyCase portal
Started:   10/22/2009 03:00 AM
Resolved:  10/22/2009 07:15 AM

Notes:

Following the db7 maintenance the MyCase portal needed a gentle nudge to come back on line.


Created: 10/22/2009 07:22:00 by jms20

Updates:


Problem Report: CDC CRAC 2 ALARM

Problem:   CDC CRAC 2 ALARM
Cause:     unknown
Affects:   CDC
Started:   10/19/2009 10:00 AM
Resolved:  

Notes:

Facility Maintenance was contacted this morning.


Created: 10/19/2009 13:28:02 by euw

Updates:


Problem Report: Degraded Internet connectivity

Problem:   Degraded Internet connectivity 
Cause:     Internet routing problem in the Global Crossing network
Affects:   all Internet services (web browsing, email)
Started:   10/01/2009 02:24 PM
Resolved:  

Notes:

UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.

UPDATE: 10/1 18:41 ISP is continuing to work on the problem. End users may have problems reaching the Internet. Off-campus users may also have problems reaching the Case network. No ETA at this point.

The problem is beyond our Internet provider and appears to be on the Global Crossing network.

Our ISP has a ticket open with Global Crossing for this issue.

Local Case engineers are monitoring the issue.
Degraded access to cnn and msnbc has been noted.

We will advise as more information becomes available.


Created: 10/01/2009 14:30:11 by lxc152

Updates: 10/01/2009 18:45:30 by wxc16, 10/01/2009 18:48:46 by wxc16, 10/01/2009 19:20:35 by wxc16


Problem Report: Degraded Internet Connectivity

Problem:   Degraded Internet Connectivity
Cause:     Unexpected ISP maintenance issue
Affects:   Case Network Connectivity to the Internet
Started:   09/30/2009 04:16 PM
Resolved:  

Notes:

UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.

10/1 14:25 - The Case network is experiencing another intermittent Internet outage due to a problem with our ISP. The ISP is aware of the issue and is working to resolve it. Engineers will continue to monitor.

9/30 17:01 - Our ISP informed us that the Internet connectivity issue has been partially resolved. The problem was caused by ISP maintenance which resulted in improper routing of Case network traffic. Engineers will continue to monitor this issue.


9/30 16:37 - The problem appears to be beyond our ISP. Engineers continue to monitor the issue.

Engineers are aware of an intermittent Internet connectivity issue. No problem has been found with network connections inside campus. End users may experience problems connecting to the Internet. The problem appears to be at the ISP's end. Engineers are contacting ISP support.


Created: 09/30/2009 16:18:55 by wxc16

Updates: 09/30/2009 16:41:29 by wxc16, 09/30/2009 17:05:38 by wxc16, 10/01/2009 14:20:46 by lxc152, 10/01/2009 14:30:16 by wxc16, 10/01/2009 19:20:43 by wxc16


Problem Report: VPN Unavailable

Problem:   VPN Unavailable
Cause:     unknown
Affects:   any VPN User
Started:   09/25/2009 07:37 AM
Resolved:  

Notes:

Engineers are investigating


Created: 09/25/2009 07:38:37 by man27

Updates:


Problem Report: Wireless Network Outage

Problem:   Wireless Network Outage
Cause:     Unknown
Affects:   Campus Wireless Network
Started:   08/31/2009 09:00 AM
Resolved:  

Notes:

An engineer received reports of a wireless outage throughout campus and is investigating.


Created: 08/31/2009 13:48:57 by wxc16

Updates:


Problem Report: Fiji House

Problem:   Fiji House
Cause:     Power event
Affects:   data, telephony
Started:   08/11/2009 03:20 AM
Resolved:  

Notes:

VG248 did not come back after the power cycle.
Switch is up, but its config is lost. No one is in the house yet.
Engineering got the analog phones working. Data is still down.


Created: 08/11/2009 10:21:57 by jhm

Updates: 08/11/2009 11:00:54 by jhm


Problem Report: Network packet loss

Problem:   Network packet loss
Cause:     Unknown
Affects:   All Internet Traffic
Started:   07/30/2009 09:23 AM
Resolved:  

Notes:

Engineers are investigating the problem
Users are experiencing intermittent degradation in internet connection speed.


Created: 07/30/2009 09:31:06 by lxc152

Updates: 07/30/2009 09:37:35 by man27


Problem Report: Case VPN Services - zero network connectivity after VPN session is established.

Problem:   Case VPN Services - zero network connectivity after VPN session is established.
Cause:     Unknown
Affects:   Case VPN Services
Started:   07/20/2009 09:00 AM
Resolved:  

Notes:

Users may experience zero network connectivity after establishing a Case VPN session. We suspect the VPN server's threat detection erroneously dropped return traffic to the user. The VPN server's threat detection has been restarted. Engineers continue to monitor.

Workaround:

Disconnect VPN session and reconnect.


Created: 07/24/2009 18:11:27 by wxc16

Updates:


Problem Report: Case Voicemail Performance Degraded

Problem:   Case Voicemail Performance Degraded
Cause:     Unknown
Affects:   Degraded voicemail services
Started:   07/20/2009 09:00 AM
Resolved:  08/03/2009 05:00 PM

Notes:

The voicemail system's hardware has been replaced and its software has been upgraded to version 2.4. The delayed LDAP response is no longer seen in the new version of the software.

The voicemail system has returned to normal performance. Engineers will continue to monitor the system.

Users may experience a delay when trying to retrieve voicemail messages via the telephone: up to 30 seconds may pass after entering the passcode before the system responds and plays the user's voicemail messages.

Workaround:
1) Hang up and retry
2) Retrieve voicemail via your email.

Sorry for the inconvenience. Engineer is working on resolving this issue.


Created: 07/24/2009 17:33:39 by wxc16

Updates: 07/30/2009 17:14:39 by wxc16, 08/07/2009 22:19:50 by wxc16


Problem Report: Sympa Mailing List Server is down

Problem:   Sympa Mailing List Server is down
Cause:     Database server is down
Affects:   All mailing lists and admin aliases
Started:   07/15/2009 10:50 AM
Resolved:  07/15/2009 03:30 PM

Notes:

The mailing list server is down because the database server crashed and Sympa cannot run without a database.

The database server group is working on restoring the database server.

Update: The DB was restored and mail was flowing again at 2:15pm. All queued messages were delivered by 3:30pm. We are now running normally.


Created: 07/15/2009 12:06:49 by emr

Updates: 07/15/2009 15:47:12 by emr


Problem Report: Unix Server DB7 is having problems

Problem:   Unix Server DB7 is having problems
Cause:     Unknown Hardware Problem
Affects:   All Non PeopleSoft Production Oracle Databases
Started:   07/15/2009 10:45 AM
Resolved:  07/15/2009 02:15 PM

Notes:

The Unix Server DB7 is currently experiencing problems and rebooted at approx. 10:45 am. This currently affects 40 Oracle databases including Blackboard, Advanced Contributions, All APEX Systems, APPWORX, Degree Audit, Dental School, KSL, ISIS, Pinnacle, Portal, MyCaseP, ONBASE, ONCORE, TeamTrack, T2Park, Serena. Server Engineering is currently working on the issue.

The server was rebooted and all disk file systems had to be manually mounted. All of the databases were up at 2:15 pm. Diagnostic files have been sent to the vendor.

The server vendor has determined that a catastrophic memory failure in the system's main memory caused the CPU panic and subsequent crash. We will schedule an emergency maintenance outage during the maintenance window as soon as we are in contact with the vendor's server engineer.
   


Created: 07/15/2009 11:41:13 by rxg263

Updates: 07/15/2009 12:04:11 by rxg263, 07/15/2009 12:10:27 by rxg263, 07/15/2009 15:10:23 by dxw134


Problem Report: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%

Problem:   KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Cause:     unknown
Affects:   KSL Data Center
Started:   07/07/2009 05:30 AM
Resolved:  

Notes:

Facility Maintenance has been notified and apprised of the situation.


Created: 07/07/2009 11:55:30 by euw

Updates:


Problem Report: Pathology-p3-e1 cooling issue

Problem:   Pathology-p3-e1 cooling issue
Cause:     HVAC issues
Affects:   network equipment
Started:   06/22/2009 02:58 PM
Resolved:  07/21/2009 09:27 AM

Notes:

Resolved.

According to Facility, HVAC to the SER had to be shutdown to fix flooding in the building.
Impact is minimal at the moment, but prolonged high temperature in the SER will affect the network equipment, potentially causing outages to wired, wireless, phone and security panels in Pathology.


Created: 06/23/2009 09:04:48 by roo

Updates: 07/21/2009 09:27:12 by roo


Problem Report: Bingham Hub

Problem:   Bingham Hub
Cause:     Cooling problem in Hub
Affects:   Wired, wireless, phones, security panels for several buildings on south side
Started:   06/20/2009 02:48 AM
Resolved:  07/21/2009 10:09 AM

Notes:

Resolved and closed

June 24, 2009, 08:15 AM: We restored the back-up link between the Bingham Hub and the KSL Data Center.

June 24, 2009, 06:30 AM: We restored the main link between the Bingham Hub and the Crawford Data Center, which restored all network connectivity for the South Campus through the Bingham Hub. The back-up link between the Bingham Hub and the KSL Data Center is still down and needs further investigation.

June 24, 04:45 AM Lost several building networks because the AC failed again. Facility on the way.

June 22, 03:00 PM Hub experiencing cooling issues again. Plant services have been called. The SER AC keeps shutting down.

11:52PM: The line cards in the hubs have started experiencing temperature failure again. Called Plant Services to look into it. It looks like the AC keeps tripping off.

03:30 am Update: Plant Services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.
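
Alarm lines like the one quoted above can be split into fields for filtering or alerting. The following is a minimal sketch in Python; the regular expression and the function name `parse_cisco_syslog` are illustrative assumptions inferred only from the single message in this report, not a complete Cisco syslog grammar.

```python
import re

# Assumed layout, inferred from the one message quoted in this report:
#   <timestamp>: %<FACILITY>-<severity digit>-<MNEMONIC>: <text>
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>.+?): "
    r"%(?P<facility>[A-Z0-9_-]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)$"
)

def parse_cisco_syslog(line):
    """Return a dict of fields for a matching line, or None if it does not match."""
    match = SYSLOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # 0 is most severe, 7 least
    return fields

if __name__ == "__main__":
    sample = (
        "Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: "
        "module 9 outlet temperature crossed threshold #1(=60C). "
        "It has returned to normal operating temperature range."
    )
    print(parse_cisco_syslog(sample))
```

Lines that do not follow the assumed layout, such as the free-text status notes in this report, return None rather than raising an error.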

Services are coming back online.
Still monitoring.

Investigating.
Called Plant services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 0009.11f7.e830 to 0009.11f7.e83f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  2 000c.ceb5.a900 to 000c.ceb5.a90f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  3 000c.ceb5.aa40 to 000c.ceb5.aa4f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  4 0003.feac.7772 to 0003.feac.7779   2.0    7.2(1)       3.5(1)       Ok
  5 000c.ce63.e864 to 000c.ce63.e867   2.1    7.7(1)       12.2(18)SXF1 Ok
  9 000d.6550.b866 to 000d.6550.b869   1.1    12.2(14r)S5  12.2(18)SXF1 MinFail

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   1 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004CL 1.0     MinFail
   2 Distributed Forwarding Card WS-F6K-DFC3A       SAD072300XR 1.0     MinFail
   3 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004BU 1.0     MinFail
   5 Policy Feature Card 3       WS-F6K-PFC3A       SAD072100G1 1.1     Ok
   5 MSFC3 Daughterboard         WS-SUP720          SAD072100JS 1.2     Ok
   9 Distributed Forwarding Card WS-F6700-DFC3A     SAD074805CH 1.0     MinFail

bingham-h0-e1>


Created: 06/20/2009 02:52:45 by roo

Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw, 07/21/2009 10:09:21 by roo


Problem Report: ERP Financials server outage

Problem:   ERP Financials server outage 
Cause:     Hardware error
Affects:   Provided Financial ERP services
Started:   06/15/2009 08:00 AM
Resolved:  

Notes:

Received a report that the financials process server was unavailable. Server Engineering staff responded onsite and found the server hung and reporting a hard disk error.

Power-cycled the server and the services became available after the reboot completed. Services ran in a degraded state as the rebuild of the disk was running.

The rebuild of the hard disk failed. The hard drive was replaced and the rebuild was attempted again, but it also failed.

Continuing to troubleshoot the issue.


Created: 06/17/2009 11:57:04 by rak7

Updates:


Problem Report: Scholars house segmented from CCN

Problem:   Scholars house segmented from CCN
Cause:     Suspecting hardware failure.  
Affects:   No users are in this building for the summer.  
Started:   06/08/2009 04:39 PM
Resolved:  

Notes:

Suspecting supervisor module failure.

Investigating.


Created: 06/08/2009 16:57:59 by jhm

Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw


Problem Report: LDAP Replica (ldap-replica7) is down

Problem:   LDAP Replica (ldap-replica7) is down
Cause:     Seems to be a disk problem
Affects:   Only people pointed directly at ldap-replica7.cwru.edu
Started:   05/22/2009 11:35 AM
Resolved:  

Notes:

[5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back so the LDAP replica is back in operation. We have called the vendor to have a look at the system however and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when we have a definitive closure to the problem.

The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.

The system is one of several redundant LDAP replicas. The only applications affected will be those who are pointed directly at this particular LDAP replica.


Created: 05/22/2009 12:24:57 by dak

Updates: 05/22/2009 13:27:28 by dak


Scheduled Maintenance

Scheduled Maintenance: blackboard.case.edu downtime

Problem:   blackboard.case.edu downtime
Cause:     system maintenance
Affects:   all Blackboard users
Started:   11/25/2009 03:00 AM
Resolved:  11/25/2009 04:30 AM

Notes:

We will be clearing out some old files to free up disk space.


Created: 11/20/2009 12:34:55 by prj

Updates:


Scheduled Maintenance: Cisco Catalyst C-3750 Router, Diner-M1-E1, to be power-cycled.

Problem:   Cisco Catalyst C-3750 Router, Diner-M1-E1, to be power-cycled.  
Cause:     in order to hopefully restore its full operation
Affects:   all of the end-users in the diner
Started:   11/13/2009 11:00 AM
Resolved:  11/13/2009 12:00 PM

Notes:

Several of the router's interfaces have failed recently. We will power-cycle the entire router in the hope of restoring full operation. If that does not work, the Cisco Catalyst C-3750 Router, Diner-M1-E1, may need to be replaced with a comparable or better unit. The network outage should last only a few minutes; wired and wireless, data and voice will all be affected.


Created: 11/12/2009 18:18:30 by euw

Updates:


Scheduled Maintenance: Mail list archives going offline

Problem:   Mail list archives going offline
Cause:     Disk Expansion
Affects:   Mailing list archives
Started:   11/13/2009 05:00 AM
Resolved:  11/13/2009 06:00 AM

Notes:

The file system that holds the mailing list archives is almost full. We will be expanding it on Friday morning. During the maintenance we will need to turn off archiving and access to the archives.

Mailing list processing should not be affected.

If all goes well and the expansion manages to preserve the original files then we should be back online within the 1 hour maintenance. If, however, the expansion does wipe out the files then we will need to restore the files before we resume archiving. That could take up to 8 hours.

Archive requests will be queued up during the down time and will be applied once the system is back online.


Created: 11/09/2009 14:32:44 by emr

Updates:


Scheduled Maintenance: Veale-M1-E1; Module 3, WS-X6548-GE-TX; Bus Asic #0 transient Pb error. Recovered. (0x0002, 0x0000): Module needs troubleshooting or TAC

Problem:   Veale-M1-E1; Module 3, WS-X6548-GE-TX; Bus Asic #0 transient Pb error.  Recovered. (0x0002, 0x0000): Module needs troubleshooting or TAC
Cause:     unknown
Affects:   up to forty-eight direct LAN data connections
Started:   11/11/2009 08:00 AM
Resolved:  11/11/2009 09:00 AM

Notes:

1. Install a WS-X6148V-GE-TX module in the empty Slot 3.
2. Move all patches to Module 3 from Module 2.
3. The forty-eight directly connected end users may experience a brief interruption in service, possibly only a few minutes, while the patches are being moved this morning.
4. We apologize for any inconvenience this may cause during this brief network outage.
5. Of the forty-eight direct network connections, forty-three are for general use and five serve vending machines.


Created: 11/06/2009 14:44:05 by euw

Updates: 11/09/2009 10:18:14 by euw, 11/09/2009 13:45:13 by euw

