
Emergency Maintenance

Emergency Maintenance: Shut down of iPlanet Delegated Administrator (ims-web.case.edu)

Problem:   Shut down of iPlanet Delegated Administrator (ims-web.case.edu)
Cause:     Unnecessary now that migration to Google Mail is complete
Affects:   No one
Started:   02/05/2010 05:00 AM
Resolved:  02/05/2010 05:30 AM

Notes:

The iPlanet Delegated Administrator program allows people to change their iPlanet forwarding, vacation messages, and filters. Since all user mail accounts have been migrated to Google Apps, which has its own tools for these functions, we are permanently shutting down the Delegated Administrator portion of the iPlanet system. Access to the mailbox and any mail remaining in your iPlanet mailbox will continue for the short term.


Created: 02/04/2010 06:34:52 by dak

Updates:


Emergency Maintenance: PeopleSoft Student and Data Warehouse database server maintenance

Problem:   PeopleSoft Student and Data Warehouse database server maintenance 
Cause:     restoring HA cluster service
Affects:   should have no impact, but a cluster issue may cause a database outage
Started:   02/04/2010 05:00 AM
Resolved:  02/04/2010 06:00 AM

Notes:

A machine that failed last week has been repaired and returned to service. It now needs to be re-joined to a cluster in order to move applications back.

This should not cause any disruption, but there is a risk of a cluster fault that may cause a short database outage.


Created: 02/03/2010 14:57:07 by jan3

Updates:


Emergency Maintenance: Development ERP Windows Server needs to be rebooted.

Problem:   Development ERP Windows Server needs to be rebooted.
Cause:     Equipment maintenance
Affects:   Unknown
Started:   10/12/2009 12:00 PM
Resolved:  10/12/2009 12:30 PM

Notes:

The development Peoplesoft server Akita was having issues with the Symantec


Problem Report

Problem Report: The Crac-2 alarm is going off in the Crawford Data Center.

Problem:   The Crac-2 alarm is going off in the Crawford Data Center.  
Cause:     unknown
Affects:   the machines in the data center
Started:   02/08/2010 05:00 AM
Resolved:  02/08/2010 06:07 PM

Notes:

Facilities was called.
An HVAC Technician is on-site investigating the problem now.
humid now.


Created: 02/08/2010 06:24:58 by euw

Updates: 02/08/2010 06:26:46 by euw, 02/08/2010 18:07:18 by jhm


Problem Report: Crac-1 and Crac-3 alarm in Crawford Data Center

Problem:   Crac-1 and Crac-3 alarm in Crawford Data Center 
Cause:     Contact closed
Affects:   Machines in Data Center
Started:   02/08/2010 06:02 AM
Resolved:  02/08/2010 06:08 PM

Notes:

Facilities was called. Technician is on-site investigating the problem. There is a problem with Crac-2 also.
fixed by plant.


Created: 02/08/2010 06:04:47 by lab17

Updates: 02/08/2010 06:19:25 by lab17, 02/08/2010 18:08:24 by jhm


Problem Report: PeopleSoft SARPT clone from Production Failed

Problem:   PeopleSoft SARPT clone from Production Failed 
Cause:     Insufficient Disk Space
Affects:   PeopleSoft SIS  Reporting Database
Started:   02/05/2010 07:00 AM
Resolved:  02/05/2010 09:30 AM

Notes:

SARPT is back up. The root cause of the problem was insufficient disk space. We have removed unneeded files and rerun the clone scripts, and everything is back up now. We will be investigating a more permanent solution to ensure this doesn't happen again.

The PeopleSoft SIS Clone from production to SARPT failed this morning due to insufficient disk space. Old files were deleted to reclaim disk space and the clone is being rerun. ETA is 10:00 am.


Created: 02/05/2010 08:45:45 by rxg263

Updates: 02/05/2010 09:31:38 by rxg263


Problem Report: Extremely slow performance on PeopleSoft Student and Warehouse Server

Problem:   Extremely slow performance on PeopleSoft Student and Warehouse Server
Cause:     Extremely high disk activity
Affects:   PeopleSoft SIS and Data Warehouse Production
Started:   02/04/2010 06:41 AM
Resolved:  02/04/2010 07:52 AM

Notes:

There was extremely slow performance on all the databases that reside on DB5. This includes PeopleSoft SIS production and reporting, PeopleSoft Data Warehouse, and the Budgeting databases. The problem appeared to be caused by high disk activity and cleared up by 7:52 am. Server Engineering and the Database Team are looking into the root cause.


Created: 02/04/2010 09:02:44 by rxg263

Updates:


Problem Report: PeopleSoft SARPT is unavailable

Problem:   PeopleSoft SARPT is unavailable
Cause:     Unknown
Affects:   PeopleSoft Student Reporting System
Started:   02/04/2010 07:30 AM
Resolved:  02/04/2010 11:30 AM

Notes:

PeopleSoft Reporting Database and Application are back on line.

There was a problem encountered with the PeopleSoft Student clone of production data to the reporting database. The Database Team is currently investigating. There is no ETA as of yet.


Created: 02/04/2010 08:46:59 by rxg263

Updates: 02/04/2010 11:48:36 by rxg263


Problem Report: Nottingham Spirk, formerly known as the First Church of Christ Scientist

Problem:   Nottingham Spirk, formerly known as the First Church of Christ Scientist
Cause:     unknown
Affects:   all of the End-Users, in the Nottingham Spirk
Started:   02/03/2010 12:00 AM
Resolved:  02/03/2010 09:00 AM

Notes:

An electrical power outage occurred last night, sometime in the middle of the night. The Cleveland Electrical Power Company was finally able to restore electrical power this morning, sometime in the middle of the morning. Everything was then returned to normal.


Created: 02/03/2010 16:23:56 by euw

Updates:


Problem Report: glennan-p2-e1 system was unreachable

Problem:   glennan-p2-e1 system was unreachable
Cause:     hardware failure
Affects:   Supervisor Module failed
Started:   02/02/2010 12:20 AM
Resolved:  02/02/2010 09:00 AM

Notes:

Unable to access the switch across the network. Engineering determined that the Supervisor Module was unresponsive, replaced the Supervisor Module, and applied the config.


Created: 02/02/2010 13:41:09 by dmw132

Updates:


Problem Report: Internet Connectivity to many off-campus sites is very slow

Problem:   Internet Connectivity to many off-campus sites is very slow
Cause:     Unknown - it appears to be some sort of off-campus network issue
Affects:   Off-campus connections (mail, web, etc) to many sites
Started:   02/02/2010 12:53 PM
Resolved:  02/02/2010 01:30 PM

Notes:

Off campus mail connections restored at 4PM. Mail backlogs are now cleared.

This problem was resolved by 13:30. The cause was high CPU on several of the backbone routers.

We are noticing buildups of mail to several off-campus sites, as well as very slow response times on web page loads to off-campus sites like cnn.com.

Network Engineering has been informed of the issue and is looking into the problem.


Created: 02/02/2010 13:17:59 by dak

Updates: 02/02/2010 15:09:31 by cpr, 02/02/2010 16:27:52 by emr


Problem Report: Silver Spartan Diner, Cisco Catalyst Switch 3550

Problem:   Silver Spartan Diner, Cisco Catalyst Switch 3550
Cause:     unknown
Affects:   all of the End-Users, in the Silver Spartan Diner
Started:   02/01/2010 12:00 PM
Resolved:  02/02/2010 11:00 AM

Notes:

This morning, the full functionality of the switch was restored.


Created: 02/02/2010 10:57:00 by euw

Updates:


Problem Report: Glennan Hall, Fourth and Fifth Floors

Problem:   Glennan Hall, Fourth and Fifth Floors
Cause:     unknown
Affects:   all of the End-Users, in the Glennan Hall, on the Fourth and Fifth Floors
Started:   02/01/2010 12:00 PM
Resolved:  02/02/2010 09:00 AM

Notes:

The Supervisor Module (Sup2) of Glennan-P2-E1, a Cisco Catalyst C6509 switch, failed. It was replaced with another module of the same kind, which was then reprogrammed.


Created: 02/02/2010 09:48:54 by euw

Updates:


Problem Report: PBL Classrooms 003 and 202 Cisco Catalyst Switches 3750s

Problem:   PBL Classrooms 003 and 202 Cisco Catalyst Switches 3750s
Cause:     unknown
Affects:   all of the End-Users, in the PBL Classrooms 003 and 202
Started:   02/01/2010 12:00 PM
Resolved:  02/02/2010 08:00 AM

Notes:

This morning, the full functionality of the four switches was restored.


Created: 02/02/2010 08:17:33 by euw

Updates:


Problem Report: Extremely slow response from some Oracle databases

Problem:   Extremely slow response from some Oracle databases
Cause:     runaway process, possibly triggered by earlier network outage
Affects:   Many applications--see Notes below
Started:   02/01/2010 03:15 PM
Resolved:  02/01/2010 04:02 PM

Notes:

The non-ERP production Oracle database server suffered severely degraded performance this afternoon. The root cause has not been determined, but one database instance had spawned an excessive number of processes and had to be stopped & re-started in order to restore the system to normal operation.

Affected applications include (but are not limited to):
Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter

The PeopleSoft Student ERP system was also affected indirectly (due to a database link to one of the affected DBs).


Created: 02/01/2010 16:33:57 by jan3

Updates:


Problem Report: High cpu use on Network Core

Problem:   High cpu use on Network Core
Cause:     Hardware Failure
Affects:   all on and off campus connectivity
Started:   02/01/2010 01:09 PM
Resolved:  02/01/2010 03:05 PM

Notes:

Engineers are aware of this issue and are working on it.

16:01 Engineers found a malfunctioning switch in UCRC2. Malfunctioning switch caused high CPU in all distribution routers, which resulted in high network latency. Problem switch has been isolated and removed from the network.


Created: 02/01/2010 13:10:36 by lxc152

Updates: 02/01/2010 16:01:19 by tpr20


Problem Report: Some web applications may not authenticate to CAS correctly

Problem:   Some web applications may not authenticate to CAS correctly
Cause:     Updated SSL certificate in one of the CAS servers
Affects:   Access to some CAS-protected web applications
Started:   01/28/2010 12:00 PM
Resolved:  01/28/2010 04:04 PM

Notes:

The SSL certificate for one of the Single Sign-On (CAS) servers was renewed today to a new version signed with a newer Root Certificate Authority certificate. Because of this, CAS clients connecting to that server will likely experience a certificate handshake/validation error and intermittent errors in authentication.

Web servers running a CAS client should update their CAS CA certificate file from the old version (which contains one CA certificate) to the new one (which contains three certificates) to correct this.

The new "certificate bundle" can be found at this URL:
http://wiki.case.edu/Central_Authentication_Service
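
As a rough aid for checking which CA file a web server currently has installed, the following sketch counts the certificates in a PEM file. It assumes the CA file is PEM-encoded text; the function names are hypothetical and are not part of any CAS client.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_certificates(pem_text: str) -> int:
    """Return the number of PEM certificate blocks found in pem_text."""
    return len(PEM_CERT.findall(pem_text))


def looks_like_new_bundle(pem_text: str, expected: int = 3) -> bool:
    """True if the text holds the expected number of certificates.

    The default of 3 comes from the note above (the new bundle contains
    three certificates, the old file contains one).
    """
    return count_certificates(pem_text) == expected
```

Reading the installed CA file and passing its contents to count_certificates should give 1 for the old file and 3 for the new bundle, according to the description above. This only counts blocks; it does not validate the certificates themselves.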


Created: 01/28/2010 16:04:46 by sdh7

Updates:


Problem Report: PeopleSoft Student system, Data Warehouse down

Problem:   PeopleSoft Student system, Data Warehouse down
Cause:     database server down
Affects:   all users of the Student system and Data Warehouse
Started:   01/25/2010 03:28 PM
Resolved:  01/25/2010 04:30 PM

Notes:

[resolved time anticipated--databases are back up, application/web servers being restarted as of 4:10pm]


Created: 01/25/2010 16:10:43 by jan3

Updates:


Problem Report: server PULITZER down

Problem:   server PULITZER down
Cause:     crashed
Affects:   all users of PULITZER
Started:   01/22/2010 03:50 PM
Resolved:  01/22/2010 06:00 PM

Notes:

rebooted system


Created: 01/22/2010 19:08:36 by jan3

Updates:


Problem Report: PeopleSoft SIS is unresponsive

Problem:   PeopleSoft SIS is unresponsive
Cause:     Unknown
Affects:   PeopleSoft SIS 
Started:   01/22/2010 12:45 PM
Resolved:  01/22/2010 03:30 PM

Notes:

Update: SAPRD had to be failed over to its failover server. As of 3:30 PM the PeopleSoft Student system was up and running.


The PeopleSoft SIS application has been unresponsive since 12:45 pm. The server is in the process of being rebooted to clear up these issues. The ETA for getting things back is 3:00 pm.


Created: 01/22/2010 14:12:27 by rxg263

Updates: 01/22/2010 15:24:33 by rxg263


Problem Report: Bingham-H0-E1's PD-002 failed.

Problem:   Bingham-H0-E1's PD-002 failed.  
Cause:     unknown
Affects:   no one
Started:   01/20/2010 06:38 PM
Resolved:  01/21/2010 01:38 PM

Notes:

A technician is on-site working on the problem now.

The Bingham Hub has two Uninterruptible Power Systems (UPSs), each with a
Power Distribution Unit (PDU). Because of an electrical power failure
last night, the first UPS's PDU lost its electrical power.

Today, once the main cause of the failure was found, the 208-volt AC
supply was restored. However, this UPS's PDU was not recoverable, so we
replaced it with one of our spare PDUs.


Created: 01/21/2010 07:34:58 by euw

Updates: 01/21/2010 13:28:25 by euw, 01/21/2010 18:41:49 by euw


Problem Report: Bingham-H0-U1's P/S1 failed.

Problem:   Bingham-H0-U1's P/S1 failed.  
Cause:     unknown
Affects:   no one
Started:   01/20/2010 06:38 PM
Resolved:  01/21/2010 01:38 PM

Notes:

A technician is on-site working on the problem now.

The Bingham Hub has two distribution routers, each with two 2500-watt
power supplies. Because of an electrical power failure last night, the
first distribution router's first power supply and the second
distribution router's second power supply both lost their electrical power.

Today, once the main cause of the two failures was found, the 208-volt AC
supply was restored. However, one of the two non-working power supplies
was not recoverable, so we replaced it with one of our spare power supplies.


Created: 01/21/2010 07:33:11 by euw

Updates: 01/21/2010 13:26:00 by euw, 01/21/2010 18:29:31 by euw


Problem Report: ucrc2-m1-e1 is down

Problem:   ucrc2-m1-e1 is down
Cause:     unknown
Affects:   unknown
Started:   01/20/2010 06:38 PM
Resolved:  01/21/2010 07:18 AM

Notes:

Technician is on-site working on the problem.
Problem started at 06:38 P.M.
Problem solved at 07:18 A.M.

1. In the Bingham Hub,
the first one of two Cisco Catalyst C6509
Distribution-Level Routers, Bingham-H0-E1,
its first one of two 2500-Watts Power-Supplies Failed.
2. In the Bingham Hub,
the first one of two Emerson/Liebert UPSs,
Bingham-H0-U1, its Power Distribution Unit, PDU Failed.
3. In the Bingham Hub, a power-strip, which runs off of
this particular PDU, it lost its electrical power,
at around six-thirty, this Wednesday Evening, or
very early last night.
4. In the Bingham Hub, an Allied-Telesyn network switch,
which runs off of this particular power-strip,
it lost its electrical power, at about the same time.
5. This particular network switch, it also does
the FO/UTP media conversion for the two up-links,
for the Cisco Catalyst Switch, UCRC2-M1-E1, which
provides the Local Area Network, for the UCRC2 Building.
6. So therefore, the UCRC2 Building, it lost its two
network connections to the Local Area Network, for
the CWRU, at around the same time.
7. However, we did not discover the root cause or
the source of the problem, until early this morning.
8. So therefore, first
we moved this particular power-strip,
to the Bingham-H0-U2's PDU,
from the Bingham-H0-U1's PDU.
9. And then the network connectivity was fully-restored,
for the UCRC2 Building, for the CWRU's local area network.
10. None of this work affected their network connectivity,
for the UH's local area network, in the UCRC2 Building.

11. An electrical circuit breaker, 208-Volts AC, was
tripped, early last night.
12. An electrical technician, he reset this electrical
circuit breaker this morning.
13. We replaced the failed power-supply with a spare.
14. We replaced the failed PDU also with a spare.
15. In this way, we brought the UPS back to life this
morning, and everything is back to normal now.


Created: 01/21/2010 06:08:07 by lab17

Updates: 01/21/2010 07:28:58 by euw, 01/21/2010 14:14:18 by euw


Problem Report: Case VPN Service is Down

Problem:   Case VPN Service is Down
Cause:     Cannot authenticate User
Affects:   Case VPN Services
Started:   01/20/2010 02:45 PM
Resolved:  01/20/2010 03:08 PM

Notes:

Update: Engineer re-established communication between ASA Server and LDAP server for authentication. VPN services restored.

VPN servers are having issues authenticating users against LDAP. An engineer is investigating the issue.


Created: 01/20/2010 14:46:12 by wxc16

Updates: 01/20/2010 15:23:36 by wxc16, 01/20/2010 15:25:18 by wxc16


Problem Report: Mail List manager is down

Problem:   Mail List manager is down
Cause:     Bulk mailing to several thousand people choked system and ran it out of disk
Affects:   Users of the mail list manager (Sympa)
Started:   01/19/2010 05:00 PM
Resolved:  01/19/2010 06:15 PM

Notes:

[01/19/2010 6:15PM] - We've managed to get the logjam of messages cleared and have brought the mail list manager back up.


We are working to unjam the system and get Sympa running again. We will post an update as soon as possible.


Created: 01/19/2010 17:45:18 by dak

Updates: 01/19/2010 18:15:39 by dak


Problem Report: blackboard.case.edu outage

Problem:   blackboard.case.edu outage
Cause:     database error
Affects:   all Blackboard users
Started:   01/18/2010 12:57 AM
Resolved:  01/18/2010 12:37 PM

Notes:

One of the database tablespaces had filled up. It has now been enlarged and Blackboard is working again.


Created: 01/18/2010 13:19:40 by prj

Updates:


Problem Report: Oracle Databases on DB7 are down

Problem:   Oracle Databases on DB7 are down
Cause:     DB7 server problems after planned maintenance
Affects:   Non PeopleSoft Oracle Production Databases 
Started:   01/18/2010 09:00 AM
Resolved:  01/18/2010 11:30 AM

Notes:

All Databases were up at 10:48 am with the exception of kslp which was up by 11:30 am.


After the planned maintenance to replace a DIMM memory module on DB7 this morning, the server would not boot up and server-engineering is failing over all of the databases on DB7 to DB8. This affects all Non PeopleSoft Oracle Production Databases.

The ETA for coming back on line is 11:30 am.


Created: 01/18/2010 10:19:50 by rxg263

Updates: 01/18/2010 11:30:31 by rxg263


Problem Report: One LDAP replica (ldap-replica8) had a database get corrupted

Problem:   One LDAP replica (ldap-replica8) had a database get corrupted
Cause:     Unknown why the database got corrupted
Affects:   Single Sign On users and some other systems
Started:   01/15/2010 05:30 PM
Resolved:  01/15/2010 06:05 PM

Notes:

The internal database that stores user records became corrupted on one of the LDAP replicas causing every login attempt to that replica to fail. While the Single Sign On (SSO) system itself was still running, many people failed to log in due to the failed LDAP replica.

Restarting the failed system and letting it recover its database resolved the issue.


Created: 01/15/2010 18:16:43 by dak

Updates:


Problem Report: Spam filter machine (mpspam3) for local mailboxes has died

Problem:   Spam filter machine (mpspam3) for local mailboxes has died
Cause:     We suspect a failed disk
Affects:   About 1/4 of those people still not migrated to Google Apps
Started:   01/13/2010 07:15 AM
Resolved:  01/13/2010 10:00 AM

Notes:

[01/13/2010 10:30AM] - We have done the necessary work to remove the failed hardware from the mail delivery path. At this point mail that was queued on other systems waiting for the failed hardware has been delivered.

Mail is currently being delivered into the local (iPlanet) mail system without the benefit of any spam filtering other than the most basic for those people on the failed server (about 1700 people), who had spam filtering turned on at spamcontrol.case.edu. Connection to spamcontrol.case.edu is no longer possible for those individuals as well. Individuals who did not have spam filtering turned on will not notice any difference in the amount of spam received.

The failed equipment is no longer supported by the vendor so it is unlikely that we will be able to replace the failed hardware. For that reason, people are encouraged to migrate fully to their Google Apps mailbox immediately. Note that mail migration is required for everyone before February 2010 in any event.

One of the four machines that filter spam for the local (iPlanet) mailboxes has died. This machine only serves 1/4 of the people who have not yet migrated to Google Apps (about 1700 people).

We suspect a failed disk as the culprit. A mail administrator is on the way to see if the machine can be revived now.

While the system is down, no mail will be lost as it will just queue on other systems.


Created: 01/13/2010 07:29:18 by dak

Updates: 01/13/2010 10:34:39 by dak


Problem Report: Case private network connectivity to MetroHealth Medical Center

Problem:   Case private network connectivity to MetroHealth Medical Center
Cause:     Unknown
Affects:   Case private network connectivity to MHMC Rammelkamp Center
Started:   01/10/2010 01:02 PM
Resolved:  01/10/2010 03:07 PM

Notes:

Engineer received notification that the Case private link to the MHMC Rammelkamp Center was lost. Engineer is investigating the issue. No ETA at this point.

13:30 Engineer able to restore connectivity to MHMC. Connection restored.


Created: 01/10/2010 13:04:59 by wxc16

Updates: 01/10/2010 15:07:46 by wxc16


Problem Report: Millis Science Center network is down.

Problem:   Millis Science Center network is down.
Cause:     Unknown
Affects:   Network connectivity in Millis Science Center. Data and voice.
Started:   01/10/2010 12:59 AM
Resolved:  

Notes:

Engineer is investigating the issue. No ETA at this point

Engineer discovered that the network switches in the equipment room had hung. Rebooted the switches and restored network connectivity.


Created: 01/10/2010 13:01:35 by wxc16

Updates: 01/10/2010 15:07:50 by wxc16


Problem Report: In the Wickenden Hall, SER 1, both Cisco Switches lost power, at about ten o'clock this morning.

Problem:   In the Wickenden Hall, SER 1, both Cisco Switches lost power, at about ten o'clock this morning.  
Cause:     Both Emerson Liebert UPSs Normal Mode Failed, at about ten o'clock this morning.  
Affects:   Data and Voice, Wired and Wireless
Started:   12/16/2009 10:00 AM
Resolved:  

Notes:

At about eleven o'clock this morning,
both Cisco Switches are running off of
both Emerson Liebert UPSs, which are now running,
in the Bypass Mode, and both Internal Batteries are now
not actively part of the present electrical power circuits.


Created: 12/16/2009 11:06:18 by euw

Updates:


Problem Report: WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT5000R-208.
Serial No. 030-700-300-6BW-502.


Created: 12/15/2009 13:53:56 by euw

Updates:


Problem Report: WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT5000R-208.
Serial No. 030-700-302-0BW-502.


Created: 12/15/2009 13:43:16 by euw

Updates:


Problem Report: WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT5000R-208.
Serial No. 030-700-302-2BW-502.


Created: 12/15/2009 13:35:47 by euw

Updates:


Problem Report: WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT5000R-208.
Serial No. 030-700-301-9BW-502.


Created: 12/15/2009 13:28:06 by euw

Updates:


Problem Report: WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT2-6000RT208.
Serial No. 082-90R-004-1BW-571.


Created: 12/15/2009 13:08:17 by euw

Updates:


Problem Report: WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1, its Web-Page hasn't been loading up now.

Problem:   WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1, its Web-Page hasn't been loading up now.  
Cause:     Its Web-Card may need to be reprogrammed and reseated now.  
Affects:   You aren't able to access its SNMP card remotely now.
Started:   12/15/2009 10:33 AM
Resolved:  

Notes:

WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.

Model No. GXT2-6000RT208.
Serial No. 051-310-007-6BW-572.


Created: 12/15/2009 12:34:04 by euw

Updates:


Problem Report: Reported BOTNET infected system on the wireless network

Problem:   Reported BOTNET infected system on the wireless network
Cause:     Infected computer system on the wireless network
Affects:   All wireless users that DO NOT VPN into the university from the wireless network
Started:   12/14/2009 10:21 AM
Resolved:  

Notes:



Greetings,

The host(s) listed at the bottom of this message have been identified as likely bot infected. The specific type of bot infection may or may not be known.

If a source port is identified below, this is the source port used by the infected machine to contact a miscreant server.

Please examine this machine for signs of break-in. Should you feel you've received this report in error, please let us know.
Wireless users should VPN back into the university if they find websites that are not responding or responding slowly


Engineers are investigating now

we currently do not have an ETA for repair
All times are -0000 (UTC)

IP Address Timestamp
----------------------------------------
192.5.109.49 2009-12-13.02:35:28-0000 SrcPort:TCP/61700 MalwareType:Torpig
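
For anyone triaging these reports, a report line like the one above can be split into fields with a short script. This is only a sketch: it assumes each line has the layout shown (IP address, timestamp, then Key:Value fields), and the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_report_line(line: str) -> dict:
    """Split one report line into a dict of its fields.

    Assumed layout, taken from the single example above:
    <ip> <YYYY-MM-DD.HH:MM:SS-0000> [Key:Value ...]
    """
    ip, stamp, *fields = line.split()
    # The report states that all times are UTC, so the trailing
    # "-0000" offset is dropped and the zone is set explicitly.
    when = datetime.strptime(stamp[:19], "%Y-%m-%d.%H:%M:%S")
    record = {"ip": ip, "timestamp": when.replace(tzinfo=timezone.utc)}
    for field in fields:
        key, _, value = field.partition(":")
        record[key] = value
    if "SrcPort" in record:
        # SrcPort looks like "TCP/61700": protocol, slash, port number.
        proto, _, port = record["SrcPort"].partition("/")
        record["protocol"] = proto
        record["port"] = int(port)
    return record
```

Applied to the line above, this yields the IP address, a UTC timestamp, the source protocol and port, and the malware type as separate values.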


Created: 12/14/2009 10:26:12 by lxc152

Updates:


Problem Report: Backup DHCP server for VoIP offline

Problem:   Backup DHCP server for VoIP offline
Cause:     Disk Failure
Affects:   No one
Started:   11/20/2009 03:48 PM
Resolved:  

Notes:

The Backup Server Roo (VoIP) suffered a boot disk failure.
Server Engineers are aware of the problem and looking at the server.


Created: 11/20/2009 16:09:42 by dnd

Updates: 11/20/2009 16:12:28 by dnd


Problem Report: CDC CRAC 2 ALARM

Problem:   CDC CRAC 2 ALARM
Cause:     unknown
Affects:   CDC
Started:   10/19/2009 10:00 AM
Resolved:  

Notes:

This morning, the Facility Maintenance has been contacted.


Created: 10/19/2009 13:28:02 by euw

Updates:


Problem Report: Degraded Internet connectivity

Problem:   Degraded Internet connectivity 
Cause:     Internet routing problem in the Global Crossing network
Affects:   all Internet services (web browsing, email)
Started:   10/01/2009 02:24 PM
Resolved:  

Notes:

UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.

UPDATE: 10/1 18:41 ISP is continuing to work on the problem. End users may have problems reaching the Internet. Off-campus users may also have problems reaching the Case network. No ETA at this point.

The problem is beyond our Internet provider.
The problem appears to be on the Global Crossing network.

Our ISP has a ticket open with Global Crossing on this issue.

Local Case engineers are monitoring this issue.
Degraded access to cnn and msnbc has been noted.

We will advise as more information becomes available.


Created: 10/01/2009 14:30:11 by lxc152

Updates: 10/01/2009 18:45:30 by wxc16, 10/01/2009 18:48:46 by wxc16, 10/01/2009 19:20:35 by wxc16


Problem Report: Degraded Internet Connectivity

Problem:   Degraded Internet Connectivity
Cause:     Unexpected ISP maintenance issue
Affects:   Case Network Connectivity to the Internet
Started:   09/30/2009 04:16 PM
Resolved:  

Notes:

UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.

10/1 14:25 - The Case network is currently experiencing another intermittent Internet outage due to a problem with our ISP. The ISP is aware of this issue and is working on resolving it. Engineers will continue to monitor.

9/30 17:01 - Our ISP informed us that the Internet connectivity issue has been partially resolved. The problem was caused by ISP maintenance which resulted in improper routing of Case network traffic. Engineers will continue to monitor this issue.


9/30 16:37 - The problem appears to be beyond our ISP. Engineers continue to monitor the issue.

Engineers are aware of some intermittent Internet connectivity issues. No problem was found with network connections inside campus. End users may experience problems connecting to the Internet. It appears to be a problem at the ISP's end. Engineers are contacting ISP support.


Created: 09/30/2009 16:18:55 by wxc16

Updates: 09/30/2009 16:41:29 by wxc16, 09/30/2009 17:05:38 by wxc16, 10/01/2009 14:20:46 by lxc152, 10/01/2009 14:30:16 by wxc16, 10/01/2009 19:20:43 by wxc16


Problem Report: VPN Unavailable

Problem:   VPN Unavailable
Cause:     unknown
Affects:   any VPN User
Started:   09/25/2009 07:37 AM
Resolved:  

Notes:

Engineers are investigating


Created: 09/25/2009 07:38:37 by man27

Updates:


Problem Report: Wireless Network Outage

Problem:   Wireless Network Outage
Cause:     Unknown
Affects:   Campus Wireless Network
Started:   08/31/2009 09:00 AM
Resolved:  

Notes:

An engineer received reports of a wireless outage throughout campus and is investigating.


Created: 08/31/2009 13:48:57 by wxc16

Updates:


Problem Report: Fiji House

Problem:   Fiji House
Cause:     Power event
Affects:   data, telephony
Started:   08/11/2009 03:20 AM
Resolved:  

Notes:

The VG248 did not come back after the power cycle.
The switch is up, but its configuration is lost. No one is in the house yet.
Engineering got the analog phones working. Data is still down.


Created: 08/11/2009 10:21:57 by jhm

Updates: 08/11/2009 11:00:54 by jhm


Problem Report: Network packet loss

Problem:   Network packet loss
Cause:     Unknown
Affects:   All Internet Traffic
Started:   07/30/2009 09:23 AM
Resolved:  

Notes:

Engineers are investigating the problem.
Users are experiencing intermittent degradation in internet connection speed.


Created: 07/30/2009 09:31:06 by lxc152

Updates: 07/30/2009 09:37:35 by man27


Problem Report: Case VPN Services - zero network connectivity after VPN session is established.

Problem:   Case VPN Services - zero network connectivity after VPN session is established.
Cause:     Unknown
Affects:   Case VPN Services
Started:   07/20/2009 09:00 AM
Resolved:  

Notes:

Users may experience zero network connectivity after establishing a Case VPN session. We suspect that the VPN server's threat detection erroneously dropped the return traffic to the user. The VPN server's threat detection has been restarted. Engineers continue to monitor.

Workaround:

Disconnect VPN session and reconnect.


Created: 07/24/2009 18:11:27 by wxc16

Updates:


Problem Report: Case Voicemail Performance Degraded

Problem:   Case Voicemail Performance Degraded
Cause:     Unknown
Affects:   Degraded Voicemail services
Started:   07/20/2009 09:00 AM
Resolved:  08/03/2009 05:00 PM

Notes:

The voicemail system's hardware has been replaced and its software has been upgraded to version 2.4. The delayed LDAP responses are no longer seen in the new version of the software.

The voicemail system has returned to normal performance. Engineers will continue to monitor the system.

Users may experience a delay when retrieving voicemail messages by telephone. After the passcode is entered, it may take up to 30 seconds before the system responds and plays the user's voicemail messages.

Workaround:
1) Hang up and retry
2) Retrieve voicemail via email.

Sorry for the inconvenience. Engineers are working on resolving this issue.


Created: 07/24/2009 17:33:39 by wxc16

Updates: 07/30/2009 17:14:39 by wxc16, 08/07/2009 22:19:50 by wxc16


Problem Report: Sympa Mailing List Server is down

Problem:   Sympa Mailing List Server is down
Cause:     Database server is down
Affects:   All mailing lists and admin aliases
Started:   07/15/2009 10:50 AM
Resolved:  07/15/2009 03:30 PM

Notes:

The mailing list server is down because the database server crashed and Sympa cannot run without a database.

The database server group is working on restoring the database server.

Update: The DB was restored and mail was flowing again at 2:15 pm. All queued messages were delivered by 3:30 pm. We are now running normally.


Created: 07/15/2009 12:06:49 by emr

Updates: 07/15/2009 15:47:12 by emr


Problem Report: Unix Server DB7 is having problems

Problem:   Unix Server DB7 is having problems
Cause:     Unknown Hardware Problem
Affects:   All Non PeopleSoft Production Oracle Databases
Started:   07/15/2009 10:45 AM
Resolved:  07/15/2009 02:15 PM

Notes:

The Unix Server DB7 is currently experiencing problems and rebooted at approx. 10:45 am. This currently affects 40 Oracle databases including Blackboard, Advanced Contributions, All APEX Systems, APPWORX, Degree Audit, Dental School, KSL, ISIS, Pinnacle, Portal, MyCaseP, ONBASE, ONCORE, TeamTrack, T2Park, Serena. Server Engineering is currently working on the issue.

The server was rebooted and all disk file systems had to be manually mounted. All of the databases were up at 2:15 pm. Diagnostic files have been sent to the vendor.

The server vendor has determined that a catastrophic memory failure in the system's main memory caused the CPU panic and subsequent crash. We will schedule an emergency maintenance outage during the maintenance window as soon as we are in contact with the vendor's server engineer.
   


Created: 07/15/2009 11:41:13 by rxg263

Updates: 07/15/2009 12:04:11 by rxg263, 07/15/2009 12:10:27 by rxg263, 07/15/2009 15:10:23 by dxw134


Problem Report: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%

Problem:   KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Cause:     unknown
Affects:   KSL Data Center
Started:   07/07/2009 05:30 AM
Resolved:  

Notes:

Facility Maintenance has been notified and apprised of the situation.


Created: 07/07/2009 11:55:30 by euw

Updates:


Problem Report: Pathology-p3-e1 cooling issue

Problem:   Pathology-p3-e1 cooling issue
Cause:     HVAC issues
Affects:   network equipment
Started:   06/22/2009 02:58 PM
Resolved:  07/21/2009 09:27 AM

Notes:

Resolved.

According to Facility, HVAC to the SER had to be shut down to fix flooding in the building.
There is minimal impact at the moment, but if the high temperature in the SER is prolonged, it will affect the network equipment and could cause outages to wired, wireless, phone, and security panels in Pathology.


Created: 06/23/2009 09:04:48 by roo

Updates: 07/21/2009 09:27:12 by roo


Problem Report: Bingham Hub

Problem:   Bingham Hub
Cause:     Cooling problem in Hub
Affects:   Wired, wireless, phones, security panels for several buildings on south side
Started:   06/20/2009 02:48 AM
Resolved:  07/21/2009 10:09 AM

Notes:

Resolved and closed

June 24, 2009, 08:15 AM: We restored the backup link
between the Bingham Hub and the KSL Data Center.

June 24, 2009, 06:30 AM: We restored the main link
between the Bingham Hub and the Crawford Data Center,
which restored all network connectivity for the South
Campus buildings served by the Bingham Hub. However,
the backup link between the Bingham Hub and the KSL
Data Center is still down and needs further investigation.

June 24, 04:45 AM Lost several building networks because the AC failed again. Facility on the way.

June 22, 03:00 PM Hub experiencing cooling issues again. Plant services have been called. The SER AC keeps shutting down.

11:52PM: The line cards in the hubs have started experiencing temperature failures again. Called Plant Services to look into it. It looks like the AC keeps tripping off.

03:30 am Update: Plant Services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.

Services are coming back online.
Still monitoring.

Investigating.
Called Plant services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 0009.11f7.e830 to 0009.11f7.e83f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  2 000c.ceb5.a900 to 000c.ceb5.a90f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  3 000c.ceb5.aa40 to 000c.ceb5.aa4f   1.0    7.2(1)       8.5(0.46)RFW MinFail
  4 0003.feac.7772 to 0003.feac.7779   2.0    7.2(1)       3.5(1)       Ok
  5 000c.ce63.e864 to 000c.ce63.e867   2.1    7.7(1)       12.2(18)SXF1 Ok
  9 000d.6550.b866 to 000d.6550.b869   1.1    12.2(14r)S5  12.2(18)SXF1 MinFail

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   1 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004CL 1.0     MinFail
   2 Distributed Forwarding Card WS-F6K-DFC3A       SAD072300XR 1.0     MinFail
   3 Distributed Forwarding Card WS-F6K-DFC3A       SAD072004BU 1.0     MinFail
   5 Policy Feature Card 3       WS-F6K-PFC3A       SAD072100G1 1.1     Ok
   5 MSFC3 Daughterboard         WS-SUP720          SAD072100JS 1.2     Ok
   9 Distributed Forwarding Card WS-F6700-DFC3A     SAD074805CH 1.0     MinFail

bingham-h0-e1>


Created: 06/20/2009 02:52:45 by roo

Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw, 07/21/2009 10:09:21 by roo


Problem Report: ERP Financials server outage

Problem:   ERP Financials server outage 
Cause:     Hardware error
Affects:   Provided Financial ERP services
Started:   06/15/2009 08:00 AM
Resolved:  

Notes:

Received a report that the financials process server was unavailable. Server Engineering staff responded onsite and found the server hung and reporting a hard disk error.

Power-cycled the server, and the services became available after the reboot completed. Services ran in a degraded state while the disk rebuild was running.

The rebuild of the hard disk failed. The hard drive was replaced and the rebuild was attempted again, but it also failed.

Continuing to troubleshoot the issue.


Created: 06/17/2009 11:57:04 by rak7

Updates:


Problem Report: Scholars house segmented from CCN

Problem:   Scholars house segmented from CCN
Cause:     Suspecting hardware failure.  
Affects:   No users are in this building for the summer.  
Started:   06/08/2009 04:39 PM
Resolved:  

Notes:

Suspecting supervisor module failure.

Investigating.


Created: 06/08/2009 16:57:59 by jhm

Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw


Problem Report: LDAP Replica (ldap-replica7) is down

Problem:   LDAP Replica (ldap-replica7) is down
Cause:     Seems to be a disk problem
Affects:   Only people pointed directly at ldap-replica7.cwru.edu
Started:   05/22/2009 11:35 AM
Resolved:  

Notes:

[5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back, so the LDAP replica is back in operation. However, we have called the vendor to look at the system and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when the problem is definitively resolved.

The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.

The system is one of several redundant LDAP replicas. The only applications affected will be those that are pointed directly at this particular LDAP replica.


Created: 05/22/2009 12:24:57 by dak

Updates: 05/22/2009 13:27:28 by dak



Scheduled Maintenance

Scheduled Maintenance: A. W. Smith Hall, a 96-Port Voice and Wireless Module needs to be refreshed by being reseated.

Problem:   A. W. Smith Hall, a 96-Port Voice and Wireless Module needs to be refreshed by being reseated.  
Cause:     unknown
Affects:   Voice and wireless network connections for all end users in A. W. Smith Hall
Started:   02/03/2010 05:00 AM
Resolved:  02/03/2010 06:00 AM

Notes:

This Wednesday morning, between 5:00 AM and 6:00 AM, during the Standard Maintenance Window, this module will be reseated.

We apologize for any inconvenience this brief network outage may cause early this Wednesday morning.


Created: 02/06/2010 10:14:23 by euw

Updates:


Scheduled Maintenance: Wood Hall, Room WB-14, SER 1

Problem:   Wood Hall, Room WB-14, SER 1
Cause:     A 16-Port Gigabit Ethernet Module has failed repeatedly lately.  
Affects:   Up to eight fiber-optic data ports on the ground floor of Wood Hall
Started:   02/05/2010 05:00 AM
Resolved:  02/05/2010 06:00 AM

Notes:

This Friday morning, between 5:00 AM and 6:00 AM, during the Standard Maintenance Window, this module will be replaced with one of the spares.

We apologize for any inconvenience this brief network outage may cause early tomorrow morning.


Created: 02/04/2010 18:44:16 by euw

Updates:

