Emergency Maintenance
Emergency Maintenance: Shut down of iPlanet Delegated Administrator (ims-web.case.edu)
Problem: Shut down of iPlanet Delegated Administrator (ims-web.case.edu) Cause: Unnecessary now that migration to Google Mail is complete Affects: No one Started: 02/05/2010 05:00 AM Resolved: 02/05/2010 05:30 AM
Notes:
The iPlanet Delegated Administrator program allows people to change their iPlanet forwarding, vacation messages, and filters. Since all user mail accounts have been migrated to Google Apps, which has its own tools for these functions, we are permanently shutting down the Delegated Administrator portion of the iPlanet system. Access to the mailbox and any mail remaining in your iPlanet mailbox will continue for the short term.
Created: 02/04/2010 06:34:52 by dak
Updates:
Emergency Maintenance: PeopleSoft Student and Data Warehouse database server maintenance
Problem: PeopleSoft Student and Data Warehouse database server maintenance Cause: restoring HA cluster service Affects: should have no impact, but a cluster issue may cause a database outage Started: 02/04/2010 05:00 AM Resolved: 02/04/2010 06:00 AM
Notes:
A machine that failed last week has been repaired and returned to service. It now needs to be re-joined to a cluster in order to move applications back.
This should not cause any disruption, but there is a risk of a cluster fault that may cause a short database outage.
Created: 02/03/2010 14:57:07 by jan3
Updates:
Emergency Maintenance: Development ERP Windows Server needs to be rebooted.
Problem: Development ERP Windows Server needs to be rebooted. Cause: Equipment maintenance Affects: Unknown Started: 10/12/2009 12:00 PM Resolved: 10/12/2009 12:30 PM
Notes:
The development Peoplesoft server Akita was having issues with the Symantec
Problem Report
Problem Report: The Crac-2 alarm is going off in the Crawford Data Center.
Problem: The Crac-2 alarm is going off in the Crawford Data Center. Cause: unknown Affects: the machines in the data center Started: 02/08/2010 05:00 AM Resolved: 02/08/2010 06:07 PM
Notes:
Facilities was called.
An HVAC Technician is on-site investigating the problem now.
humid now.
Created: 02/08/2010 06:24:58 by euw
Updates: 02/08/2010 06:26:46 by euw, 02/08/2010 18:07:18 by jhm
Problem Report: Crac-1 and Crac-3 alarm in Crawford Data Center
Problem: Crac-1 and Crac-3 alarm in Crawford Data Center Cause: Contact closed Affects: Machines in Data Center Started: 02/08/2010 06:02 AM Resolved: 02/08/2010 06:08 PM
Notes:
Facilities was called. Technician is on-site investigating the problem. There is a problem with Crac-2 also.
fixed by plant.
Created: 02/08/2010 06:04:47 by lab17
Updates: 02/08/2010 06:19:25 by lab17, 02/08/2010 18:08:24 by jhm
Problem Report: PeopleSoft SARPT clone from Production Failed
Problem: PeopleSoft SARPT clone from Production Failed Cause: Insufficient Disk Space Affects: PeopleSoft SIS Reporting Database Started: 02/05/2010 07:00 AM Resolved: 02/05/2010 09:30 AM
Notes:
SARPT is back up. The root cause of the problem was insufficient disk space. We have removed unneeded files and rerun the clone scripts, and everything is back up now. We will be investigating a more permanent solution to ensure this doesn't happen again.
The PeopleSoft SIS Clone from production to SARPT failed this morning due to insufficient disk space. Old files were deleted to reclaim disk space and the clone is being rerun. ETA is 10:00 am.
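One possible safeguard against a repeat of this failure is a free-space check that runs before the clone starts. The following is only a minimal sketch, not the procedure used by the Database Team: the function name, the 10% headroom, and the idea of passing in the expected clone size are all assumptions made for illustration.

```python
import shutil


def has_room_for_clone(target_dir: str, required_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if target_dir has required_bytes free plus a safety margin.

    target_dir, required_bytes, and headroom are hypothetical parameters;
    the real clone scripts and their space requirements are not described
    in this report.
    """
    usage = shutil.disk_usage(target_dir)  # named tuple: total, used, free (bytes)
    needed = int(required_bytes * (1 + headroom))
    return usage.free >= needed
```

A wrapper around the clone job could call this first and stop with a clear message when it returns False, rather than letting the clone fail partway through.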
Created: 02/05/2010 08:45:45 by rxg263
Updates: 02/05/2010 09:31:38 by rxg263
Problem Report: Extremely slow performance on PeopleSoft Student and Warehouse Server
Problem: Extremely slow performance on PeopleSoft Student and Warehouse Server Cause: Extremely high disk activity Affects: PeopleSoft SIS and Data Warehouse Production Started: 02/04/2010 06:41 AM Resolved: 02/04/2010 07:52 AM
Notes:
There was extremely slow performance on all the databases that reside on DB5. This includes PeopleSoft SIS production and reporting, PeopleSoft Data Warehouse, and the Budgeting databases. The problem appeared to be caused by high disk activity and cleared up by 7:52 am. Server Engineering and the Database Team are looking into the root cause.
Created: 02/04/2010 09:02:44 by rxg263
Updates:
Problem Report: PeopleSoft SARPT is unavailable
Problem: PeopleSoft SARPT is unavailable Cause: Unknown Affects: PeopleSoft Student Reporting System Started: 02/04/2010 07:30 AM Resolved: 02/04/2010 11:30 AM
Notes:
PeopleSoft Reporting Database and Application are back on line.
There was a problem encountered with the PeopleSoft Student clone of production data to the reporting database. The Database Team is currently investigating. There is no ETA as of yet.
Created: 02/04/2010 08:46:59 by rxg263
Updates: 02/04/2010 11:48:36 by rxg263
Problem Report: Nottingham Spirk, formerly known as the First Church of Christ Scientist
Problem: Nottingham Spirk, formerly known as the First Church of Christ Scientist Cause: unknown Affects: all of the End-Users, in the Nottingham Spirk Started: 02/03/2010 12:00 AM Resolved: 02/03/2010 09:00 AM
Notes:
An electrical power outage occurred sometime in the middle of the night. The Cleveland Electrical Power Company restored power sometime in the middle of the morning, and everything was then returned to normal operation.
Created: 02/03/2010 16:23:56 by euw
Updates:
Problem Report: glennan-p2-e1 system was unreachable
Problem: glennan-p2-e1 system was unreachable Cause: hardware failure Affects: Supervisor Module failed Started: 02/02/2010 12:20 AM Resolved: 02/02/2010 09:00 AM
Notes:
Unable to access the switch across the network. Engineering determined that the Supervisor Module was unresponsive, replaced it, and applied the configuration.
Created: 02/02/2010 13:41:09 by dmw132
Updates:
Problem Report: Internet Connectivity to many off-campus sites is very slow
Problem: Internet Connectivity to many off-campus sites is very slow Cause: Unknown - it appears to be some sort of off-campus network issue Affects: Off-campus connections (mail, web, etc) to many sites Started: 02/02/2010 12:53 PM Resolved: 02/02/2010 01:30 PM
Notes:
Off campus mail connections restored at 4PM. Mail backlogs are now cleared.
This problem was resolved by 13:30. The cause was high CPU on several of the backbone routers.
We are noticing buildups of mail to several off-campus sites, as well as very slow response times on web page loads to off-campus sites like cnn.com.
Network Engineering has been informed of the issue and is looking into the problem.
Created: 02/02/2010 13:17:59 by dak
Updates: 02/02/2010 15:09:31 by cpr, 02/02/2010 16:27:52 by emr
Problem Report: Silver Spartan Diner, Cisco Catalyst Switch 3550
Problem: Silver Spartan Diner, Cisco Catalyst Switch 3550 Cause: unknown Affects: all of the End-Users, in the Silver Spartan Diner Started: 02/01/2010 12:00 PM Resolved: 02/02/2010 11:00 AM
Notes:
This morning, the full functionality of the switch was restored.
Created: 02/02/2010 10:57:00 by euw
Updates:
Problem Report: Glennan Hall, Fourth and Fifth Floors
Problem: Glennan Hall, Fourth and Fifth Floors Cause: unknown Affects: all of the End-Users, in the Glennan Hall, on the Fourth and Fifth Floors Started: 02/01/2010 12:00 PM Resolved: 02/02/2010 09:00 AM
Notes:
The Supervisor Module (Sup2) in Glennan-P2-E1, a Cisco Catalyst C6509 switch, failed. It was replaced with another module of the same kind, which was then reprogrammed.
Created: 02/02/2010 09:48:54 by euw
Updates:
Problem Report: PBL Classrooms 003 and 202 Cisco Catalyst Switches 3750s
Problem: PBL Classrooms 003 and 202 Cisco Catalyst Switches 3750s Cause: unknown Affects: all of the End-Users, in the PBL Classrooms 003 and 202 Started: 02/01/2010 12:00 PM Resolved: 02/02/2010 08:00 AM
Notes:
This morning, the full functionality of the four switches was restored.
Created: 02/02/2010 08:17:33 by euw
Updates:
Problem Report: Extremely slow response from some Oracle databases
Problem: Extremely slow response from some Oracle databases Cause: runaway process, possibly triggered by earlier network outage Affects: Many applications--see Notes below Started: 02/01/2010 03:15 PM Resolved: 02/01/2010 04:02 PM
Notes:
The non-ERP production Oracle database server suffered severely degraded performance this afternoon. The root cause has not been determined, but one database instance had spawned an excessive number of processes and had to be stopped & re-started in order to restore the system to normal operation.
Affected applications include (but are not limited to):
Ad Astra
BlackBoard (Course mgmt. system)
Dental School Clinic
Identity Management (user account/password changes)
IP address/host name management
E-mail lists
MyCase portal
Pinnacle (phone billing)
University Library web sites (not including EuclidPLUS/on-line catalog system)
DARS
ProSam (Financial Aid)
Encyclopedia of Cleveland History
Internal IT applications: Appworx, ChangeMan, Mashups, VirtualCenter
The PeopleSoft Student ERP system was also affected indirectly (due to a database link to one of the affected DBs).
Created: 02/01/2010 16:33:57 by jan3
Updates:
Problem Report: High CPU use on Network Core
Problem: High CPU use on Network Core Cause: Hardware Failure Affects: all on and off campus connectivity Started: 02/01/2010 01:09 PM Resolved: 02/01/2010 03:05 PM
Notes:
Engineers are aware of this issue and are working on it.
16:01 Engineers found a malfunctioning switch in UCRC2. Malfunctioning switch caused high CPU in all distribution routers, which resulted in high network latency. Problem switch has been isolated and removed from the network.
Created: 02/01/2010 13:10:36 by lxc152
Updates: 02/01/2010 16:01:19 by tpr20
Problem Report: Some web applications may not authenticate to CAS correctly
Problem: Some web applications may not authenticate to CAS correctly Cause: Updated SSL certificate in one of the CAS servers Affects: Access to some CAS-protected web applications Started: 01/28/2010 12:00 PM Resolved: 01/28/2010 04:04 PM
Notes:
The SSL certificate for one of the Single Sign-On (CAS) servers was renewed today to a new version signed with a newer Root Certificate Authority certificate. Because of this, CAS clients connecting to that server will likely experience a certificate handshake/validation error and intermittent errors in authentication.
Web servers running a CAS client should update their CAS CA certificate file from the old version (which contains one CA certificate) to the new one (which contains three certificates) to correct this.
The new "certificate bundle" can be found at this URL:
http://wiki.case.edu/Central_Authentication_Service
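Administrators who want to confirm that a web server has picked up the new bundle can count the certificates in the CA file: one certificate suggests the old file, three suggests the new one. The sketch below assumes the bundle is a PEM-encoded text file, which this report does not state; the function names are made up for illustration.

```python
import re

# Matches one PEM-encoded certificate block.
# Assumption: the CAS CA bundle is distributed as PEM text.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certificates(pem_text: str) -> int:
    """Return the number of certificate blocks found in PEM text."""
    return len(PEM_CERT.findall(pem_text))


def looks_like_new_bundle(pem_text: str, expected: int = 3) -> bool:
    """True when the text holds the expected number of CA certificates."""
    return count_pem_certificates(pem_text) == expected
```

To use it, read the CA file configured for the CAS client into a string and pass it to `looks_like_new_bundle`. A count of one means the file still needs to be replaced.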
Created: 01/28/2010 16:04:46 by sdh7
Updates:
Problem Report: PeopleSoft Student system, Data Warehouse down
Problem: PeopleSoft Student system, Data Warehouse down Cause: database server down Affects: all users of the Student system and Data Warehouse Started: 01/25/2010 03:28 PM Resolved: 01/25/2010 04:30 PM
Notes:
[resolved time anticipated--databases are back up, application/web servers being restarted as of 4:10pm]
Created: 01/25/2010 16:10:43 by jan3
Updates:
Problem Report: server PULITZER down
Problem: server PULITZER down Cause: crashed Affects: all users of PULITZER Started: 01/22/2010 03:50 PM Resolved: 01/22/2010 06:00 PM
Notes:
rebooted system
Created: 01/22/2010 19:08:36 by jan3
Updates:
Problem Report: PeopleSoft SIS is unresponsive
Problem: PeopleSoft SIS is unresponsive Cause: Unknown Affects: PeopleSoft SIS Started: 01/22/2010 12:45 PM Resolved: 01/22/2010 03:30 PM
Notes:
Update: SAPRD had to be failed over to its failover server. As of 3:30 PM the PeopleSoft Student system was up and running.
The PeopleSoft SIS application has been unresponsive since 12:45 pm. The server is in the process of being rebooted to clear up these issues. The ETA for getting things back is 3:00 pm.
Created: 01/22/2010 14:12:27 by rxg263
Updates: 01/22/2010 15:24:33 by rxg263
Problem Report: Bingham-H0-E1's PD-002 failed.
Problem: Bingham-H0-E1's PD-002 failed. Cause: unknown Affects: no one Started: 01/20/2010 06:38 PM Resolved: 01/21/2010 01:38 PM
Notes:
A technician is on-site working on the problem now.
The Bingham Hub has two Uninterruptible Power Systems (UPSs), each with its own Power Distribution Unit (PDU). Because of an electrical power failure last night, the first UPS's PDU lost power.
Once the cause of the failure was identified today, the 208-volt AC supply was restored; however, the PDU was not recoverable, so we replaced it with one of our spare PDUs.
Created: 01/21/2010 07:34:58 by euw
Updates: 01/21/2010 13:28:25 by euw, 01/21/2010 18:41:49 by euw
Problem Report: Bingham-H0-U1's P/S1 failed.
Problem: Bingham-H0-U1's P/S1 failed. Cause: unknown Affects: no one Started: 01/20/2010 06:38 PM Resolved: 01/21/2010 01:38 PM
Notes:
A technician is on-site working on the problem now.
The Bingham Hub has two distribution routers, each with two 2500-watt power supplies. Because of an electrical power failure last night, the first distribution router's first power supply and the second distribution router's second power supply both lost power.
Once the cause of the two failures was identified today, the 208-volt AC supply was restored; however, one of the two affected power supplies was not recoverable, so we replaced it with one of our spare power supplies.
Created: 01/21/2010 07:33:11 by euw
Updates: 01/21/2010 13:26:00 by euw, 01/21/2010 18:29:31 by euw
Problem Report: ucrc2-m1-e1 is down
Problem: ucrc2-m1-e1 is down Cause: unknown Affects: unknown Started: 01/20/2010 06:38 PM Resolved: 01/21/2010 07:18 AM
Notes:
Technician is on-site working on the problem.
Problem started at 06:38 P.M.
Problem solved at 07:18 A.M.
1. In the Bingham Hub, the first of two 2500-watt power supplies in Bingham-H0-E1, the first of two Cisco Catalyst C6509 distribution-level routers, failed.
2. In the Bingham Hub, the Power Distribution Unit (PDU) of Bingham-H0-U1, the first of two Emerson/Liebert UPSs, failed.
3. A power strip in the Bingham Hub that runs off this PDU lost power at around 6:30 PM on Wednesday evening.
4. An Allied-Telesyn network switch that runs off this power strip lost power at about the same time.
5. This network switch also performs the FO/UTP media conversion for the two uplinks of the Cisco Catalyst switch UCRC2-M1-E1, which provides the local area network for the UCRC2 Building.
6. As a result, the UCRC2 Building lost both of its connections to the CWRU local area network at around the same time.
7. We did not discover the root cause of the problem until early this morning.
8. We first moved the power strip from the Bingham-H0-U1 PDU to the Bingham-H0-U2 PDU.
9. Network connectivity to the CWRU local area network was then fully restored for the UCRC2 Building.
10. None of this work affected connectivity to the UH local area network in the UCRC2 Building.
11. A 208-volt AC circuit breaker tripped early last night.
12. An electrical technician reset the circuit breaker this morning.
13. We replaced the failed power supply with a spare.
14. We replaced the failed PDU with a spare as well.
15. The UPS was brought back into service this morning, and everything is now back to normal.
Created: 01/21/2010 06:08:07 by lab17
Updates: 01/21/2010 07:28:58 by euw, 01/21/2010 14:14:18 by euw
Problem Report: Case VPN Service is Down
Problem: Case VPN Service is Down Cause: Cannot authenticate User Affects: Case VPN Services Started: 01/20/2010 02:45 PM Resolved: 01/20/2010 03:08 PM
Notes:
Update: An engineer re-established communication between the ASA server and the LDAP server for authentication. VPN services restored.
VPN servers are having issues authenticating users against LDAP. An engineer is investigating the issue.
Created: 01/20/2010 14:46:12 by wxc16
Updates: 01/20/2010 15:23:36 by wxc16, 01/20/2010 15:25:18 by wxc16
Problem Report: Mail List manager is down
Problem: Mail List manager is down Cause: Bulk mailing to several thousand people choked system and ran it out of disk Affects: Users of the mail list manager (Sympa) Started: 01/19/2010 05:00 PM Resolved: 01/19/2010 06:15 PM
Notes:
[01/19/2010 6:15PM] - We've managed to get the logjam of messages cleared and have brought the mail list manager back up.
We are working to unjam the system and get Sympa running again. We will post an update as soon as possible.
Created: 01/19/2010 17:45:18 by dak
Updates: 01/19/2010 18:15:39 by dak
Problem Report: blackboard.case.edu outage
Problem: blackboard.case.edu outage Cause: database error Affects: all Blackboard users Started: 01/18/2010 12:57 AM Resolved: 01/18/2010 12:37 PM
Notes:
One of the database tablespaces had filled up. It has now been enlarged and Blackboard is working again.
Created: 01/18/2010 13:19:40 by prj
Updates:
Problem Report: Oracle Databases on DB7 are down
Problem: Oracle Databases on DB7 are down Cause: DB7 server problems after planned maintenance Affects: Non PeopleSoft Oracle Production Databases Started: 01/18/2010 09:00 AM Resolved: 01/18/2010 11:30 AM
Notes:
All Databases were up at 10:48 am with the exception of kslp which was up by 11:30 am.
After the planned maintenance to replace a DIMM memory module on DB7 this morning, the server would not boot up and server-engineering is failing over all of the databases on DB7 to DB8. This affects all Non PeopleSoft Oracle Production Databases.
The ETA for coming back on line is 11:30 am.
Created: 01/18/2010 10:19:50 by rxg263
Updates: 01/18/2010 11:30:31 by rxg263
Problem Report: One LDAP replica (ldap-replica8) had a database get corrupted
Problem: One LDAP replica (ldap-replica8) had a database get corrupted Cause: Unknown why the database got corrupted Affects: Single Sign-On users and some other systems Started: 01/15/2010 05:30 PM Resolved: 01/15/2010 06:05 PM
Notes:
The internal database that stores user records became corrupted on one of the LDAP replicas causing every login attempt to that replica to fail. While the Single Sign On (SSO) system itself was still running, many people failed to log in due to the failed LDAP replica.
Restarting the failed system and letting it recover its database resolved the issue.
Created: 01/15/2010 18:16:43 by dak
Updates:
Problem Report: Spam filter machine (mpspam3) for local mailboxes has died
Problem: Spam filter machine (mpspam3) for local mailboxes has died Cause: We suspect a failed disk Affects: About 1/4 of those people still not migrated to Google Apps Started: 01/13/2010 07:15 AM Resolved: 01/13/2010 10:00 AM
Notes:
[01/13/2010 10:30AM] - We have done the necessary work to remove the failed hardware from the mail delivery path. At this point mail that was queued on other systems waiting for the failed hardware has been delivered.
Mail is currently being delivered into the local (iPlanet) mail system without the benefit of any spam filtering other than the most basic for those people on the failed server (about 1700 people), who had spam filtering turned on at spamcontrol.case.edu. Connection to spamcontrol.case.edu is no longer possible for those individuals as well. Individuals who did not have spam filtering turned on will not notice any difference in the amount of spam received.
The failed equipment is no longer supported by the vendor so it is unlikely that we will be able to replace the failed hardware. For that reason, people are encouraged to migrate fully to their Google Apps mailbox immediately. Note that mail migration is required for everyone before February 2010 in any event.
One of the four machines that filter spam for the local (iPlanet) mailboxes has died. This machine only serves 1/4 of the people who have not yet migrated to Google Apps (about 1700 people).
We suspect a failed disk as the culprit. A mail administrator is on the way to see if the machine can be revived now.
While the system is down, no mail will be lost as it will just queue on other systems.
Created: 01/13/2010 07:29:18 by dak
Updates: 01/13/2010 10:34:39 by dak
Problem Report: Case private network connectivity to Metrohealth medical center
Problem: Case private network connectivity to Metrohealth medical center Cause: Unknown Affects: Case private network connectivity to mhmc rammelkemp center Started: 01/10/2010 01:02 PM Resolved: 01/10/2010 03:07 PM
Notes:
An engineer received notification that the Case private link to the mhmc rammelkemp center was lost. The engineer is investigating the issue. No ETA at this point.
13:30 The engineer was able to restore connectivity to MHMC. Connection restored.
Created: 01/10/2010 13:04:59 by wxc16
Updates: 01/10/2010 15:07:46 by wxc16
Problem Report: Millis Science Center network is down.
Problem: Millis Science Center network is down. Cause: Unknown Affects: Network connectivity in Millis Science Center. Data and voice. Started: 01/10/2010 12:59 AM Resolved:
Notes:
An engineer is investigating the issue. No ETA at this point.
The engineer discovered that the network switches in the equipment room had hung. The switches were rebooted and network connectivity was restored.
Created: 01/10/2010 13:01:35 by wxc16
Updates: 01/10/2010 15:07:50 by wxc16
Problem Report: In the Wickenden Hall, SER 1, both Cisco Switches lost power, at about ten o'clock this morning.
Problem: In the Wickenden Hall, SER 1, both Cisco Switches lost power, at about ten o'clock this morning. Cause: Both Emerson Liebert UPSs Normal Mode Failed, at about ten o'clock this morning. Affects: Data and Voice, Wired and Wireless Started: 12/16/2009 10:00 AM Resolved:
Notes:
As of about eleven o'clock this morning, both Cisco switches are running off the two Emerson Liebert UPSs, which are now operating in Bypass Mode. The internal batteries are currently not part of the power circuit.
Created: 12/16/2009 11:06:18 by euw
Updates:
Problem Report: WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 1-336, SER 3, UPS 1, WRB-P3-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT5000R-208.
Serial No. 030-700-300-6BW-502.
Created: 12/15/2009 13:53:56 by euw
Updates:
Problem Report: WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 2-406, SER 5, UPS 1, WRB-P5-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT5000R-208.
Serial No. 030-700-302-0BW-502.
Created: 12/15/2009 13:43:16 by euw
Updates:
Problem Report: WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 3-411, SER 6, UPS 1, WRB-P6-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT5000R-208.
Serial No. 030-700-302-2BW-502.
Created: 12/15/2009 13:35:47 by euw
Updates:
Problem Report: WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 3-406, SER 7, UPS 1, WRB-P7-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT5000R-208.
Serial No. 030-700-301-9BW-502.
Created: 12/15/2009 13:28:06 by euw
Updates:
Problem Report: WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 6-411, SER 12, UPS 1, WRB-P12-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT2-6000RT208.
Serial No. 082-90R-004-1BW-571.
Created: 12/15/2009 13:08:17 by euw
Updates:
Problem Report: WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1, its Web-Page hasn't been loading up now.
Problem: WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1, its Web-Page hasn't been loading up now. Cause: Its Web-Card may need to be reprogrammed and reseated now. Affects: You aren't able to access its SNMP card remotely now. Started: 12/15/2009 10:33 AM Resolved:
Notes:
WRB Hall, Room 6-406, SER 13, UPS 1, WRB-P13-U1,
its Web-Page hasn't been loading up now.
Its Web-Card may need to be reprogrammed and reseated now.
You aren't able to access its SNMP card remotely now.
Model No. GXT2-6000RT208.
Serial No. 051-310-007-6BW-572.
Created: 12/15/2009 12:34:04 by euw
Updates:
Problem Report: Reported BOTNET infected system on the wireless network
Problem: Reported BOTNET infected system on the wireless network Cause: Infected computer system on the wireless network Affects: All wireless users that DO NOT VPN into the university from the wireless network Started: 12/14/2009 10:21 AM Resolved:
Notes:
Greetings,
The host(s) listed at the bottom of this message have been identified as likely bot infected. The specific type of bot infection may or may not be known.
If a source port is identified below, this is the source port used by the infected machine to contact a miscreant server.
Please examine this machine for signs of break-in. Should you feel you've received this report in error, please let us know.
Wireless users should VPN back into the university if they find websites that are not responding or are responding slowly.
Engineers are investigating now.
We currently do not have an ETA for repair.
All times are -0000 (UTC)
IP Address Timestamp
----------------------------------------
192.5.109.49 2009-12-13.02:35:28-0000 SrcPort:TCP/61700 MalwareType:Torpig
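For anyone who needs to cross-reference entries like the one above against DHCP or wireless logs, the line can be split into its fields. This is a minimal sketch that assumes every entry follows the same four-column, whitespace-separated layout as the single line shown here. The `BotReport` type and `parse_bot_report` function are hypothetical names and are not part of any reporting tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BotReport:
    ip: str
    seen_at: datetime  # timezone-aware; the report states all times are UTC
    protocol: str
    src_port: int
    malware: str


def parse_bot_report(line: str) -> BotReport:
    """Parse one 'IP Timestamp SrcPort:PROTO/PORT MalwareType:NAME' line."""
    ip, stamp, src, malware = line.split()
    # Timestamp layout as printed in the report: 2009-12-13.02:35:28-0000
    seen_at = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S%z")
    protocol, port = src.split(":", 1)[1].split("/")
    return BotReport(ip, seen_at, protocol, int(port), malware.split(":", 1)[1])
```

The timestamp is parsed as timezone-aware, so it can be converted to local time before being compared with campus logs.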
Created: 12/14/2009 10:26:12 by lxc152
Updates:
Problem Report: Backup DHCP server for VoIP offline
Problem: Backup DHCP server for VoIP offline Cause: Disk Failure Affects: No one Started: 11/20/2009 03:48 PM Resolved:
Notes:
The Backup Server Roo (VoIP) suffered a boot disk failure.
Server Engineers are aware of the problem and looking at the server.
Created: 11/20/2009 16:09:42 by dnd
Updates: 11/20/2009 16:12:28 by dnd
Problem Report: CDC CRAC 2 ALARM
Problem: CDC CRAC 2 ALARM Cause: unknown Affects: CDC Started: 10/19/2009 10:00 AM Resolved:
Notes:
This morning, the Facility Maintenance has been contacted.
Created: 10/19/2009 13:28:02 by euw
Updates:
Problem Report: Degraded Internet connectivity
Problem: Degraded Internet connectivity Cause: Internet routing problem in the Global Crossing network Affects: all Internet services (web browsing, email) Started: 10/01/2009 02:24 PM Resolved:
Notes:
UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.
UPDATE: 10/1 18:41 ISP is continuing to work on the problem. End users may have problems reaching the Internet. Off-campus users may also have problems reaching the Case network. No ETA at this point.
The problem is beyond our Internet provider.
The problem appears to be on the Global Crossing network.
Our ISP has a ticket open with Global Crossing for this issue.
Local Case engineers are monitoring this issue.
Degraded access to cnn and msnbc has been noted.
We will advise as more information becomes available.
Created: 10/01/2009 14:30:11 by lxc152
Updates: 10/01/2009 18:45:30 by wxc16, 10/01/2009 18:48:46 by wxc16, 10/01/2009 19:20:35 by wxc16
Problem Report: Degraded Internet Connectivity
Problem: Degraded Internet Connectivity Cause: Unexpected ISP maintenance issue Affects: Case Network Connectivity to the Internet Started: 09/30/2009 04:16 PM Resolved:
Notes:
UPDATE: 10/1 19:16 ISP able to resolve routing issue at their end. Internet Connectivity restored. Engineers will continue to monitor the Case network.
10/1 14:25 9/30 17:01 9/30 16:37 Engineers are aware of some intermittent Internet connectivity issues. No problem was found with network connections inside campus. End users may experience problems connecting to the Internet. It appears to be a problem at the ISP's end. Engineers are contacting ISP support.
Created: 09/30/2009 16:18:55 by wxc16
Updates: 09/30/2009 16:41:29 by wxc16, 09/30/2009 17:05:38 by wxc16, 10/01/2009 14:20:46 by lxc152, 10/01/2009 14:30:16 by wxc16, 10/01/2009 19:20:43 by wxc16
Engineers are investigating.
Created: 09/25/2009 07:38:37 by man27
Updates:
An engineer received reports of a wireless outage throughout campus. The engineer is investigating.
Created: 08/31/2009 13:48:57 by wxc16
Updates:
VG248 did not come back after a power cycle.
Created: 08/11/2009 10:21:57 by jhm
Updates: 08/11/2009 11:00:54 by jhm
Engineers are investigating the problem.
Created: 07/30/2009 09:31:06 by lxc152
Updates: 07/30/2009 09:37:35 by man27
Users may experience no network connectivity after establishing a Case VPN session. We suspect the VPN server's threat detection erroneously dropped the returning traffic to the user. The VPN server's threat detection has been restarted. Engineers continue to monitor.
Created: 07/24/2009 18:11:27 by wxc16
Updates:
The voicemail system's hardware has been replaced and its software has been upgraded to version 2.4. The delayed LDAP response is no longer seen in the new version of the software.
The voicemail system has returned to normal performance. Engineers will continue to monitor the system.
Users may experience a delay when retrieving voicemail messages by telephone. Up to 30 seconds may pass after the passcode is entered before the system responds and plays the user's voicemail messages.
Created: 07/24/2009 17:33:39 by wxc16
Updates: 07/30/2009 17:14:39 by wxc16, 08/07/2009 22:19:50 by wxc16
The mailing list server is down because the database server crashed and Sympa cannot run without a database.
Created: 07/15/2009 12:06:49 by emr Updates: 07/15/2009 15:47:12 by emr The Unix Server DB7 is currently experiencing problems and rebooted at approx. 10:45 am. This currently affects 40 Oracle databases including Blackboard, Advanced Contributions, All APEX Systems, APPWORX, Degree Audit, Dental School, KSL, ISIS, Pinnacle, Portal, MyCaseP, ONBASE, ONCORE, TeamTrack, T2Park, Serena. Server Engineering is currently working on the issue.
Created: 07/15/2009 11:41:13 by rxg263 Updates: 07/15/2009 12:04:11 by rxg263, 07/15/2009 12:10:27 by rxg263, 07/15/2009 15:10:23 by dxw134 Facility Maintenance has been notified, and
Created: 07/07/2009 11:55:30 by euw Updates: Resolved.
According to Facility, HVAC to the SER had to be shut down to fix flooding in the building.
Created: 06/23/2009 09:04:48 by roo Updates: 07/21/2009 09:27:12 by roo Resolved and closed
2009, June 24, 08:15 AM, we restored the back-up link
2009, June 24, 06:30 AM, we restored the main link
June 24, 04:45 AM Lost several building networks because the AC failed again. Facility on the way.
June 22, 03:00 PM Hub experiencing cooling issues again. Plant services have been called. The SER AC keeps shutting down.
11:52PM: The line cards in the hubs have started experiencing temperature failure again. Called plant services to look into it. It looks like the AC keeps tripping off.
03:30 am Update: Plant services is currently working on the SER cooling. The switch line cards are recovering from temperature failure.
Investigating.
Created: 06/20/2009 02:52:45 by roo Updates: 06/20/2009 03:37:16 by roo, 06/20/2009 23:51:59 by roo, 06/22/2009 15:38:35 by roo, 06/24/2009 06:09:17 by roo, 06/24/2009 06:54:11 by euw, 06/24/2009 08:19:44 by euw, 07/21/2009 10:09:21 by roo Received a report that the financials process server was unavailable. Server Engineering staff responded onsite and found that the server was reporting a hard disk error and had hung.
Created: 06/17/2009 11:57:04 by rak7 Updates: Suspecting supervisor module failure.
Created: 06/08/2009 16:57:59 by jhm Updates: 06/08/2009 18:31:38 by roo, 06/09/2009 08:47:52 by euw [5/22/09 1:30 PM] - A reboot seems to have brought the confused disk back so the LDAP replica is back in operation. We have called the vendor to have a look at the system however and are leaving it out of any access paths for the moment in case the system dies permanently. We will close this problem report when we have a definitive closure to the problem.
Created: 05/22/2009 12:24:57 by dak Updates: 05/22/2009 13:27:28 by dak This Wednesday Morning, between the hours of five o'clock and six o'clock, during the Standard Maintenance Window, this module will be refreshed by being reseated.
Created: 02/06/2010 10:14:23 by euw Updates: This Friday Morning, between the hours of five o'clock and six o'clock, during the Standard Maintenance Window, this module will be replaced with one of the spares.
Created: 02/04/2010 18:44:16 by euw Updates:
Problem Report: VPN Unavailable
Problem: VPN Unavailable
Cause: unknown
Affects: any VPN User
Started: 09/25/2009 07:37 AM
Resolved:
Notes:
Problem Report: Wireless Network Outage
Problem: Wireless Network Outage
Cause: Unknown
Affects: Campus Wireless Network
Started: 08/31/2009 09:00 AM
Resolved:
Notes:
Problem Report: Fiji House
Problem: Fiji House
Cause: Power event
Affects: data, telephony
Started: 08/11/2009 03:20 AM
Resolved:
Notes:
Switch up. Switch config is lost. No one in the house yet,
Engineering got the analog phones working. Data is still down.
Problem Report: Network packet loss
Problem: Network packet loss
Cause: Unknown
Affects: All Internet Traffic
Started: 07/30/2009 09:23 AM
Resolved:
Notes:
Users are experiencing intermittent degradation in internet connection speed.
Problem Report: Case VPN Services - zero network connectivity after VPN session is established.
Problem: Case VPN Services - zero network connectivity after VPN session is established.
Cause: Unknown
Affects: Case VPN Services
Started: 07/20/2009 09:00 AM
Resolved:
Notes:
Workaround:
Disconnect VPN session and reconnect.
Problem Report: Case Voicemail Performance Degraded
Problem: Case Voicemail Performance Degraded
Cause: Unknown
Affects: Voicemail services (degraded)
Started: 07/20/2009 09:00 AM
Resolved: 08/03/2009 05:00 PM
Notes:
Workaround:
1) Hang up and retry
2) Retrieve voicemail via your email.
Sorry for the inconvenience. Engineers are working to resolve this issue.
Problem Report: Sympa Mailing List Server is down
Problem: Sympa Mailing List Server is down
Cause: Database server is down
Affects: All mailing lists and admin aliases
Started: 07/15/2009 10:50 AM
Resolved: 07/15/2009 03:30 PM
Notes:
The database server group is working on restoring the database server.
Update: The DB was restored and mail was flowing again at 2:15pm. All queued messages were delivered by 3:30pm. We are now running normally.
Problem Report: Unix Server DB7 is having problems
Problem: Unix Server DB7 is having problems
Cause: Unknown Hardware Problem
Affects: All Non PeopleSoft Production Oracle Databases
Started: 07/15/2009 10:45 AM
Resolved: 07/15/2009 02:15 PM
Notes:
The server was rebooted and all disk file systems had to be manually mounted. All of the databases were up at 2:15 pm. Diagnostic files have been sent to the vendor.
The server vendor has determined that a catastrophic memory failure in the system's main memory caused the CPU panic and subsequent crash. We will schedule an emergency maintenance outage during the maintenance window as soon as we are in contact with the vendor's server engineer.
Problem Report: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Problem: KSL Data Center's CRAC DC-5 Humidity reading = 48%, setting = 44%
Cause: unknown
Affects: KSL Data Center
Started: 07/07/2009 05:30 AM
Resolved:
Notes:
apprised of the situation.
Problem Report: Pathology-p3-e1 cooling issue
Problem: Pathology-p3-e1 cooling issue
Cause: HVAC issues
Affects: network equipment
Started: 06/22/2009 02:58 PM
Resolved: 07/21/2009 09:27 AM
Notes:
There is minimal impact at the moment, but if the high temperature in the SER is prolonged, it will affect the network equipment and could cause an outage of wired, wireless, phone, and security panel service in Pathology.
Problem Report: Bingham Hub
Problem: Bingham Hub
Cause: Cooling problem in Hub
Affects: Wired, wireless, phones, security panels for several buildings on south side
Started: 06/20/2009 02:48 AM
Resolved: 07/21/2009 10:09 AM
Notes:
between the Bingham Hub and the KSL Data Center.
between the Bingham Hub and the Crawford Data Center, which restored all network connectivity for the South Campus with regard to the Bingham Hub; however, the back-up link between the Bingham Hub and the KSL Data Center is still down and will need to be further investigated.
Jun 20 03:33:45 EDT: %C6KENV-SP-4-MINORTEMPALARMRECOVER: module 9 outlet temperature crossed threshold #1(=60C). It has returned to normal operating temperature range.
Services are coming back online.
Still monitoring.
Called Plant services to check cooling.
Suspect failed cooling in SER.
bingham-h0-e1>show mod
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
1   0009.11f7.e830 to 0009.11f7.e83f   1.0    7.2(1)       8.5(0.46)RFW MinFail
2   000c.ceb5.a900 to 000c.ceb5.a90f   1.0    7.2(1)       8.5(0.46)RFW MinFail
3   000c.ceb5.aa40 to 000c.ceb5.aa4f   1.0    7.2(1)       8.5(0.46)RFW MinFail
4   0003.feac.7772 to 0003.feac.7779   2.0    7.2(1)       3.5(1)       Ok
5   000c.ce63.e864 to 000c.ce63.e867   2.1    7.7(1)       12.2(18)SXF1 Ok
9   000d.6550.b866 to 000d.6550.b869   1.1    12.2(14r)S5  12.2(18)SXF1 MinFail

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
1    Distributed Forwarding Card WS-F6K-DFC3A       SAD072004CL 1.0     MinFail
2    Distributed Forwarding Card WS-F6K-DFC3A       SAD072300XR 1.0     MinFail
3    Distributed Forwarding Card WS-F6K-DFC3A       SAD072004BU 1.0     MinFail
5    Policy Feature Card 3       WS-F6K-PFC3A       SAD072100G1 1.1     Ok
5    MSFC3 Daughterboard         WS-SUP720          SAD072100JS 1.2     Ok
9    Distributed Forwarding Card WS-F6700-DFC3A     SAD074805CH 1.0     MinFail
bingham-h0-e1>
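For anyone who needs to pull the affected modules out of "show mod" output like the one above, the following Python is a minimal sketch and not a tool that was used during this incident. It assumes the row layout of the first table shown here (module number, MAC range, Hw, Fw, Sw, Status) and treats any status other than "Ok" as needing attention. The names SAMPLE, ROW, and failed_modules are hypothetical, chosen only for illustration.

```python
import re

# Rows copied from the first "show mod" table in this report.
SAMPLE = """\
1 0009.11f7.e830 to 0009.11f7.e83f 1.0 7.2(1) 8.5(0.46)RFW MinFail
2 000c.ceb5.a900 to 000c.ceb5.a90f 1.0 7.2(1) 8.5(0.46)RFW MinFail
3 000c.ceb5.aa40 to 000c.ceb5.aa4f 1.0 7.2(1) 8.5(0.46)RFW MinFail
4 0003.feac.7772 to 0003.feac.7779 2.0 7.2(1) 3.5(1) Ok
5 000c.ce63.e864 to 000c.ce63.e867 2.1 7.7(1) 12.2(18)SXF1 Ok
9 000d.6550.b866 to 000d.6550.b869 1.1 12.2(14r)S5 12.2(18)SXF1 MinFail
"""

# Assumed row shape: Mod, "<first MAC> to <last MAC>", Hw, Fw, Sw, Status.
# Header and separator lines do not match and are skipped.
ROW = re.compile(
    r"^\s*(?P<mod>\d+)\s+"
    r"(?P<mac_first>[0-9a-f.]+)\s+to\s+(?P<mac_last>[0-9a-f.]+)\s+"
    r"(?P<hw>\S+)\s+(?P<fw>\S+)\s+(?P<sw>\S+)\s+(?P<status>\S+)\s*$"
)


def failed_modules(text, ok_status="Ok"):
    """Return (module number, status) for every row whose status is not ok_status."""
    failed = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group("status") != ok_status:
            failed.append((int(match.group("mod")), match.group("status")))
    return failed


if __name__ == "__main__":
    for mod, status in failed_modules(SAMPLE):
        print(f"module {mod}: {status}")
```

Run against the rows above, the sketch reports modules 1, 2, 3 and 9. The sub-module table uses different columns and would need its own pattern.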
Problem Report: ERP Financials server outage
Problem: ERP Financials server outage
Cause: Hardware error
Affects: Provided Financial ERP services
Started: 06/15/2009 08:00 AM
Resolved:
Notes:
Power-cycled the server and the services became available after the reboot completed. Services ran in a degraded state as the rebuild of the disk was running.
The rebuild of the hard disk failed. The hard drive was replaced and the rebuild was attempted again, but it also failed.
Continuing to troubleshoot the issue.
Problem Report: Scholars house segmented from CCN
Problem: Scholars house segmented from CCN
Cause: Suspecting hardware failure.
Affects: No users are in this building for the summer.
Started: 06/08/2009 04:39 PM
Resolved:
Notes:
Investigating.
Problem Report: LDAP Replica (ldap-replica7) is down
Problem: LDAP Replica (ldap-replica7) is down
Cause: Seems to be a disk problem
Affects: Only people pointed directly at ldap-replica7.cwru.edu
Started: 05/22/2009 11:35 AM
Resolved:
Notes:
The disk on which the LDAP server binaries are stored appears to no longer be mounted on the system (this is a local disk). Server Engineering is looking into the issue.
The system is one of several redundant LDAP replicas. The only applications affected will be those that are pointed directly at this particular LDAP replica.
Scheduled Maintenance
Scheduled Maintenance: A. W. Smith Hall, a 96-Port Voice and Wireless Module needs to be refreshed by being reseated.
Problem: A. W. Smith Hall, a 96-Port Voice and Wireless Module needs to be refreshed by being reseated.
Cause: unknown
Affects: all of the End-Users, in the A. W. Smith Hall, for their voice and wireless network connections
Started: 02/03/2010 05:00 AM
Resolved: 02/03/2010 06:00 AM
Notes:
We apologize for any inconvenience this brief network outage may cause very early this Wednesday morning.
Scheduled Maintenance: Wood Hall, Room WB-14, SER 1
Problem: Wood Hall, Room WB-14, SER 1
Cause: A 16-Port Gigabit Ethernet Module has failed repeatedly lately.
Affects: up to eight fiber-optic data ports on the ground floor of Wood Hall
Started: 02/05/2010 05:00 AM
Resolved: 02/05/2010 06:00 AM
Notes:
We apologize for any inconvenience this brief network outage may cause very early tomorrow morning.
