
[Featured] Help! A question about a failed hard disk


Source: chinaunix.net, compiled by 酷勤网 (Kuqin)

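Gurus, one disk in our AS400 has Failed, and the two disks next to it are now Unprotected. What I want to do first is take the Failed disk out so the RAID is protected again, not replace it. The disks were set up as RAID5. How do I do this? To be safe I need detailed steps and precautions, e.g. what to do after entering STT or DST, whether an IPL is needed, and so on. I've never done this before, so I'm asking you experts for advice. Thanks.
  Background: today the AS400 suddenly went down. All users were kicked off, neither the system console nor 5250 sessions could get in, and the IP didn't even answer ping. Checking the log and STT afterwards, one disk had reported an error, and then two others reported errors right after it. Unbelievable. There was no warning of any kind beforehand, so I never had a chance to investigate. Who knew an AS400 could be this scary.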

ASP  Unit   Number      Type Model  Name        Status              
  1                                             Unprotected         
        1   21-4C06D    4327  070   DD001       DPY/Unprotected     
        2   21-593A7    4327  078   DD003       DPY/Failed          
        3   21-583D4    4327  078   DD002       DPY/Unprotected     
        4   21-07780    4327  078   DD005       DPY/Active          
        5   21-18F93    4327  070   DD006       DPY/Active          
        6   21-0D1F7    4327  078   DD004       DPY/Active          
        7   68-0CEC613  4327  072   DD007       DPY/Active          
        8   68-0CF7A32  4327  074   DD010       DPY/Active          
        9   68-0CF6FCB  4327  072   DD017       DPY/Active          
       10   68-0CF7974  4327  072   DD009       DPY/Active
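A minimal Python sketch for reading a screen like the one above, assuming it has been copied out as plain text with the column order Unit, Serial number, Type, Model, Resource name, Status (the ASP header row is skipped). It lists every unit whose status is not DPY/Active, which for this screen is the one Failed disk plus the two Unprotected members of its parity set. The names `parse_units` and `needs_attention` are hypothetical, made up for this illustration; this is not an IBM tool.

```python
import re
from typing import List, NamedTuple


class DiskUnit(NamedTuple):
    unit: int
    serial: str
    dev_type: str
    model: str
    name: str
    status: str


# Assumed row layout, copied from the screen above:
# Unit  Serial-number  Type  Model  Resource-name  Status
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d{4})\s+(\d{3})\s+(\S+)\s+(\S+)\s*$")

SAMPLE = """\
ASP  Unit   Number      Type Model  Name        Status
  1                                             Unprotected
        1   21-4C06D    4327  070   DD001       DPY/Unprotected
        2   21-593A7    4327  078   DD003       DPY/Failed
        3   21-583D4    4327  078   DD002       DPY/Unprotected
        4   21-07780    4327  078   DD005       DPY/Active
"""


def parse_units(screen: str) -> List[DiskUnit]:
    """Turn each unit row into a DiskUnit; header and ASP rows do not match ROW."""
    units = []
    for line in screen.splitlines():
        m = ROW.match(line)
        if m:
            unit, serial, dev_type, model, name, status = m.groups()
            units.append(DiskUnit(int(unit), serial, dev_type, model, name, status))
    return units


def needs_attention(units: List[DiskUnit]) -> List[DiskUnit]:
    """Units whose status is anything other than DPY/Active."""
    return [u for u in units if u.status != "DPY/Active"]


if __name__ == "__main__":
    for u in needs_attention(parse_units(SAMPLE)):
        print(u.name, u.status)
```

On the sample rows this prints DD001 DPY/Unprotected, DD003 DPY/Failed and DD002 DPY/Unprotected, one per line.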



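 zhaoming1214 replied at 2006-06-28 11:16:39

Doesn't your company have an MA (maintenance agreement)? I'm not sure about this one myself, bumping it for you.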


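 qingzhou replied at 2006-06-28 11:17:39

Wow~~~

WRKPRB would definitely have shown the alert, and so would SST. You haven't been checking the system messages regularly, have you?!

Don't mess around with ASPs. Better find a reliable MA company to sort it out for you~~~

The point: it keeps the risk to your data to a minimum, and your personal liability to a minimum too.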


 zhaoming1214 replied at 2006-06-28 11:23:05

Agree with qingzhou (轻舟). This is a production machine, right?


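 just a kid replied at 2006-06-28 11:40:18

Replace the FAILED one right away, then replace the ones reporting errors as well; that way the data stays intact. Hurry, because if another disk fails, nothing will save it.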


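 心城 replied at 2006-06-28 12:13:52

I checked STT this morning and there were no alerts. Besides, one disk raising an alert shouldn't bring the whole system down. (Under the company's previous system administrator, one disk had been broken for half a year before I took over and noticed it, and the system never went down, nor did two disks become unprotected.) I did set up RAID5. The real problem is that I have no spare disk on hand. I can't replace it, I can only remove it. How do I remove it?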


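 心城 replied at 2006-06-28 12:18:51

Thanks again, everyone. I'm just afraid another disk will fail, and the new disk won't arrive for 3 days. I want to remove the failed disk, or add the unprotected disks back into protection. Is there any way to do that? Please advise.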


 心城 replied at 2006-06-28 12:43:05

-------- >-------- --------< ----  ---------------------------- ---------- --------- ---------- --------  ----------------
 B6005120  27/06/06 13:00:30  Info  2/  1/0/ 16- /  /  /  /  /   CMB01      284E 001  09-3141055 80000257  System LIC detec
 B6000255  27/06/06 13:01:58  Reco  2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025A  Contact was lost
 B6005123  27/06/06 13:02:57  LIC   2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025B  System LIC progr
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 8000025C  System LIC detec
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 8000025D  System LIC detec
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 8000025E  System LIC detec
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 8000025F  System LIC detec
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 80000260  System LIC detec
 B6005120  27/06/06 13:08:12  Info  2/ 25/0/ 32- /  /  /  /  /   CMB03      2844 001  10-5294165 80000261  System LIC detec
 B6005120  27/06/06 13:12:58  Info  2/  1/0/ 16-2/ 2/  /  /  /   DC01       5703 001  0C-3338421 80000262  System LIC detec
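To pick out every entry for one resource from a listing like this, here is a hedged Python sketch. It assumes each line has the layout inferred from the lines above (reference code, date, time, class, location fields, resource name, type, model, serial number, log ID, truncated description); that layout is read off this paste, not taken from IBM documentation, and `parse_log` / `entries_for` are hypothetical names.

```python
import re
from typing import Dict, List

# Assumed layout: ref code, date, time, class, location fields (skipped),
# resource name, type, model, serial, log ID, description (truncated on screen).
ENTRY = re.compile(
    r"^\s*(?P<src>[0-9A-F]{8})\s+(?P<date>\d\d/\d\d/\d\d)\s+"
    r"(?P<time>\d\d:\d\d:\d\d)\s+(?P<cls>\S+)\s+.*?\s"
    r"(?P<res>[A-Z]+\d+)\s+(?P<type>\w{4})\s+(?P<model>\d{3})\s+"
    r"(?P<serial>\S+)\s+(?P<logid>[0-9A-F]{8})\s+(?P<desc>.+?)\s*$"
)

SAMPLE = """\
 B6005120  27/06/06 13:00:30  Info  2/  1/0/ 16- /  /  /  /  /   CMB01      284E 001  09-3141055 80000257  System LIC detec
 B6000255  27/06/06 13:01:58  Reco  2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025A  Contact was lost
 B6005123  27/06/06 13:02:57  LIC   2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025B  System LIC progr
 B6005120  27/06/06 13:12:58  Info  2/  1/0/ 16-2/ 2/  /  /  /   DC01       5703 001  0C-3338421 80000262  System LIC detec
"""


def parse_log(text: str) -> List[Dict[str, str]]:
    """One dict per line that matches ENTRY; other lines are ignored."""
    return [m.groupdict() for m in map(ENTRY.match, text.splitlines()) if m]


def entries_for(resource: str, entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """All parsed entries logged against one resource name, e.g. DD003."""
    return [e for e in entries if e["res"] == resource]


if __name__ == "__main__":
    for e in entries_for("DD003", parse_log(SAMPLE)):
        print(e["date"], e["time"], e["cls"], e["serial"], e["desc"])
```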


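 just a kid replied at 2006-06-28 13:00:39

B6000255  27/06/06 13:01:58  Reco  2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025A  Contact was lost

This disk is already dead. And you're still complaining the 400 isn't playing fair? It even tells you which disk has the error, haha. From what you posted, you have 2 RAID sets, at least 2. When the new disk arrives, pull the bad one and rebuild onto the new one.
Also replace the other two that reported errors when you get the chance, to avoid trouble later.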


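 心城 replied at 2006-06-28 13:17:53

Thanks, haha, it just died out of nowhere.
B6000255  27/06/06 13:01:58  Reco  2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025A  Contact was lost
The moment it happened, all users were kicked off and I couldn't even get into the console.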


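 qingzhou replied at 2006-06-28 13:34:07

Quote: originally posted by 心城 on 2006-6-28 13:17
Thanks, haha, it just died out of nowhere.
B6000255  27/06/06 13:01:58  Reco  2/  1/0/ 16-2/ 2/ 0/ 3/ 0/   DD003      4327 078  21-593A7   8000025A  Contact was lost
The moment it happened, all users were kicked off and I couldn't even get into the console.


Disks in an ASP usually fail gradually: SST first logs a Temp (temporary) error, and if nothing is done in time, it logs a Perm (permanent) error some time later. At that point the disk really is dead~~~

Take a full system backup right away: GO SAVE, option 21.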
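qingzhou's rule of thumb (Temp errors first, a Perm error once the disk is gone) can be turned into a simple check. This hedged Python sketch assumes you have already pulled (resource name, class) pairs out of the Log Analysis Report, and flags resources that have Temp errors but no Perm error yet as candidates for an early swap. The function name and the resource names in the example are hypothetical, and the rule itself is only the heuristic stated in the post, not an IBM guarantee.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def classify(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split resources into those with a Perm error and those with only Temp errors.

    entries: (resource_name, error_class) pairs, e.g. ("DD003", "Temp").
    Other classes (Info, Statistic, ...) are ignored.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for resource, cls in entries:
        seen[resource].add(cls)

    result: Dict[str, List[str]] = {"perm": [], "temp_only": []}
    for resource in sorted(seen):
        if "Perm" in seen[resource]:
            result["perm"].append(resource)       # already failed: replace now
        elif "Temp" in seen[resource]:
            result["temp_only"].append(resource)  # early warning: plan a swap
    return result


if __name__ == "__main__":
    # Hypothetical extract, not taken from the OP's system.
    sample = [("DD003", "Temp"), ("DD003", "Perm"), ("DD005", "Temp"), ("CMN03", "Info")]
    print(classify(sample))
```

With the hypothetical sample this prints `{'perm': ['DD003'], 'temp_only': ['DD005']}`; CMN03 only has an Info entry, so it is left out.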


 心城 replied at 2006-06-28 13:48:09

Thanks qingzhou (轻舟), but I didn't see any temporary errors in either WRKPRB or STT. How can I prevent this? Please advise.


 心城 replied at 2006-06-28 13:49:47

I meant SST.


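 qingzhou replied at 2006-06-28 14:11:42

Quote: originally posted by 心城 on 2006-6-28 13:48
Thanks qingzhou (轻舟), but I didn't see any temporary errors in either WRKPRB or STT. How can I prevent this? Please advise.


<The AS400 LCD panel suddenly showed SRC A6010255 while the system was running. Bad luck!>
http://bbs.chinaunix.net/viewthread.php?tid=570494&highlight=qingzhou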


 心城 replied at 2006-06-28 14:21:47

qingzhou (轻舟), I didn't find any temporary errors. Am I checking the wrong way? How should I check? Thanks.


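 qingzhou replied at 2006-06-28 14:34:16

Quote: originally posted by 心城 on 2006-6-28 14:21
qingzhou (轻舟), I didn't find any temporary errors. Am I checking the wrong way? How should I check? Thanks.


Here is how to look up error information in SST:

1. Sign on to OS/400 as QSECOFR;
2. Run STRSST to enter SST;
3. Select option 1, Start a service tool: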

                       System Service Tools (SST)                               
                                                                                
 Select one of the following:                                                   
                                                                                
      1. Start a service tool                                                   
      2. Work with active service tools                                         
      3. Work with disk units                                                   
      4. Work with diskette data recovery                                       
      5. Work with system partitions                                            
      6. Work with system capacity                                              
      7. Work with system security                                              
      8. Work with service tools user IDs                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
 Selection                                                                      
      1                                                                         
                                                                                
 F3=Exit         F10=Command entry         F12=Cancel      

4. Then select option 1, Product activity log:
                          Start a Service Tool                                  
                                                                                
 Warning: Incorrect use of this service tool can cause damage                   
 to data in this system.  Contact your service representative                   
 for assistance.                                                                
                                                                                
 Select one of the following:                                                   
                                                                                
      1. Product activity log                                                   
      2. Trace Licensed Internal Code                                           
      3. Work with communications trace                                         
      4. Display/Alter/Dump                                                     
      5. Licensed Internal Code log                                             
      6. Main storage dump manager                                              
      7. Hardware service manager                                               
                                                                                
                                                                                
                                                                                
                                                                                
 Selection                                                                      
      1                                                                         
                                                                                
 F3=Exit         F12=Cancel         F16=SST menu                                
                                                                                
5. Then select option 1, Analyze log:

                             Product Activity Log                               
                                                                                
 Select one of the following:                                                   
                                                                                
      1. Analyze log                                                            
      2. Display or print by log ID                                             
      3. Change log sizes                                                       
      4. Work with removable media lifetime statistics                          
      5. Display or print removable media session statistics                    
      6. Reference code description                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
 Selection                                                                      
      1                                                                         
                                                                                
 F3=Exit       F12=Cancel                                                       
                                                                                
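6. For Log, select 1=All logs, and change the From date to the date of your last check (e.g. three months ago today). Use a long range so that no errors are missed.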

                             Select Subsystem Data                              
                                                                                
 Type choices, press Enter.                                                     
                                                                                
   Log . . . . . . . . . .  1          1=All logs                               
                                       2=Processor                              
                                       3=Magnetic media                         
                                       4=Local work station                     
                                       5=Communications                         
                                       6=Power                                  
                                       7=Cryptography                           
                                       8=Licensed program                       
                                       9=Licensed Internal Code                 
                                                                                
   From:                                                                        
     Date . . . . . . . .   27/03/06   DD/MM/YY                                 
     Time . . . . . . . .   14:28:38   HH:MM:SS                                 
                                                                                
   To:                                                                          
     Date . . . . . . . .   28/06/06   DD/MM/YY                                 
     Time . . . . . . . .   14:28:38   HH:MM:SS                                 
                                                                                
 F3=Exit              F5=Refresh           F12=Cancel                           
                                                                                
7. Press Enter to accept the default parameters:

                         Select Analysis Report Options                         
                                                                                
 Type choices, press Enter.                                                     
                                                                                
   Report type . . . . . . . . .   1   1=Display analysis, 2=Display summary,   
                                       3=Print options                          
   Optional entries to include:                                                 
     Informational . . . . . . .   Y   Y=Yes, N=No                              
     Statistic . . . . . . . . .   N   Y=Yes, N=No                              
                                                                                
   Reference code selection:                                                    
     Option  . . . . . . . . . .   1   1=Include, 2=Omit                        
     Reference codes                                                            
     *ALL                                                *ALL...                
                                                                                
   Device selection:                                                            
     Option  . . . . . . . . . .   1   1=Types, 2=Resource names                
     Device types or Resource names                                             
     *ALL                                                *ALL...                
                                                                                
                                                                                
                                                                                
 F3=Exit       F5=Refresh       F9=Sort by ...        F12=Cancel                
                                                                                

8. Go through the errors one by one:

                              Log Analysis Report                               
                                                                                
 From  . . :   27/03/06  14:28:38      To . . :   28/06/06  14:28:38            
                                                                                
 Type options, press Enter.                                                     
   5=Display report   6=Print report                                            
                                                                                
      System                                  Resource      Resource            
 Opt  Ref Code     Date      Time      Class  Name          Type                
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793                
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793                
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793                
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793                
      B0036890     27/06/06  10:29:55  Perm   CMN03         2793                
      B0035A54     27/06/06  10:29:56  Perm   CMN03         2793                
      B0036890     27/06/06  10:29:57  Perm   CMN03         2793                
      B0035A20     27/06/06  10:30:29  Perm   CMN03         2793                
      B0036890     27/06/06  10:32:36  Perm   CMN03         2793                
      B0036890     27/06/06  10:32:42  Perm   CMN03         2793                
                                                                                
                                                                         Bottom 
  F3=Exit                                                                       
  F11=View Description                        F12=Cancel
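When the analysis range covers several months, the report can run to many screens. As a hedged sketch, assuming the report has been copied to plain text (for example from the 6=Print report output) with the column order shown above, this Python snippet tallies entries by resource name, class and reference code so that repeated errors on one resource stand out. `summarize` is a hypothetical name, and the sample rows are the first few from the screen above.

```python
import re
from collections import Counter

# Assumed row layout of the Log Analysis Report screen above:
# Ref code, Date, Time, Class, Resource name, Resource type.
ROW = re.compile(
    r"^\s*(?P<ref>[0-9A-F]{8})\s+(?P<date>\d\d/\d\d/\d\d)\s+"
    r"(?P<time>\d\d:\d\d:\d\d)\s+(?P<cls>\w+)\s+(?P<res>\S+)\s+(?P<rtype>\w+)\s*$"
)

SAMPLE = """\
 Opt  Ref Code     Date      Time      Class  Name          Type
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793
      B0035410     27/06/06  10:29:54  Perm   CMN03         2793
      B0036890     27/06/06  10:29:55  Perm   CMN03         2793
      B0035A54     27/06/06  10:29:56  Perm   CMN03         2793
      B0036890     27/06/06  10:29:57  Perm   CMN03         2793
"""


def summarize(report: str) -> Counter:
    """Count report rows per (resource name, class, reference code)."""
    counts: Counter = Counter()
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # the column-header line does not match and is skipped
            counts[(m["res"], m["cls"], m["ref"])] += 1
    return counts


if __name__ == "__main__":
    for (res, cls, ref), n in summarize(SAMPLE).most_common():
        print(f"{res:<8} {cls:<5} {ref}  x{n}")
```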


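 心城 replied at 2006-06-28 14:44:49

Thanks a lot, that's exactly how I checked. There were no error messages at all before this problem, and IBM MA even did an inspection in February. Right now SST shows nothing apart from this error and the previous one. Sigh, it just failed out of the blue, I guess.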


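 chts replied at 2006-06-28 16:12:04

Why don't you have an MA anymore? Since you're using 4327 disks, your machine should still be fairly new!
Better have a professional replace the disk. Doing it yourself as a one-off is quite risky, even though it isn't hard.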


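 居士 replied at 2006-06-28 20:27:41

Normally, once RAID protection is in place, the system keeps running when a single disk fails (users are not kicked off); it just runs slower. If the failed disk is the load source disk and the load source is not mirrored, the system will not be able to start after a restart.

From the information you provided, you have 3 disks in one RAID set. One disk failed, and the other 2 automatically dropped out of RAID protection and are now unprotected.

After replacing the failed disk, a rebuild is all you need.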
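A simplified Python sketch of the idea behind a RAID5 rebuild, assuming one stripe spread over a 3-disk parity set with two data blocks and one XOR parity block. Real RAID5 rotates the parity block across members and the IBM disk controller does all of this in hardware, so this is illustration only; the disk names and data are hypothetical. It also shows why the two survivors count as unprotected: with one member gone every block is still recoverable, but a second loss leaves nothing to XOR against.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks: List[bytes]) -> bytes:
    """Parity block held by the third member of a 3-disk stripe."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks: List[bytes]) -> bytes:
    """Recover the single missing block from every other block in the stripe."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    disk_a = b"AS400"                  # hypothetical data on one member
    disk_b = b"RAID5"                  # hypothetical data on another member
    disk_p = parity([disk_a, disk_b])  # parity on the third member
    # Suppose disk_a fails: XOR of the two survivors gives its contents back.
    print(rebuild([disk_b, disk_p]))
```

Running it prints `b'AS400'`, because XOR-ing a block with itself cancels out.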


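 mhdc replied at 2006-06-28 21:28:38

Some IBM services are charged per call; worth considering. I suggest replacing it quickly. It can be replaced while the system is online, and then you add RAID protection back.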


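 hahawang replied at 2006-06-29 09:56:23

All you can do now is wait for the disk. Your other two disks aren't broken; they report errors because they've lost their protection.
RAID5 needs at least 3 disks, and this RAID set has exactly 3, so removing the bad disk now won't restore protection either.
Be patient and wait for the disk!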
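hahawang's arithmetic as a minimal Python sketch, assuming the 3-disk minimum stated in the post (the actual limits depend on the disk controller, which the thread does not name). It shows that pulling the failed disk from a 3-disk set leaves 2, below the minimum, so protection cannot come back until a replacement arrives. The (n-1)/n capacity figure is the general RAID5 rule; the function names are hypothetical.

```python
RAID5_MIN_MEMBERS = 3  # minimum parity-set size, as stated in the post above


def can_protect(members: int) -> bool:
    """True if a set with this many working disks can carry RAID5 parity."""
    return members >= RAID5_MIN_MEMBERS


def usable_fraction(members: int) -> float:
    """Share of raw capacity left for data: one disk's worth goes to parity."""
    if not can_protect(members):
        raise ValueError(f"RAID5 needs at least {RAID5_MIN_MEMBERS} disks, got {members}")
    return (members - 1) / members


if __name__ == "__main__":
    set_size = 3                  # the OP's parity set
    after_removal = set_size - 1  # pull DD003 without a replacement
    print(can_protect(set_size), can_protect(after_removal))
    print(f"{usable_fraction(set_size):.0%} of raw capacity usable")
```

This prints `True False` and then `67% of raw capacity usable`.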


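 ixiao_116 replied at 2006-06-29 15:26:43

From the information you posted, DD003 is the disk that failed. Have a replacement disk ready.
Step 1: Replace the disk
1. Type STRSST and press Enter
   Sign on with the DST user and password
2. Select 1, Start a service tool
3. Select 7, Hardware service manager
4. Select 8, Device concurrent...

Device resource name: DD003
Action to be performed: 1
Time delay needed in minutes: 01
Leave everything else at the defaults and press Enter.
After 1 minute, one disk's indicator light will start flashing rapidly. That is the failed disk: pull it out and replace it.

Step 2: Rebuild
Select 3, Work with disk units
Select 3, Work with disk unit recovery
Select 6, Rebuild disk unit data
Confirm.
Press F5 to refresh until it reaches 100%. Done.

[ Last edited by ixiao_116 on 2006-6-29 15:32 ]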


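 qingzhou replied at 2006-06-29 16:37:32

Quote: originally posted by ixiao_116 on 2006-6-29 15:26
From the information you posted, DD003 is the disk that failed. Have a replacement disk ready.
Step 1: Replace the disk
1. Type STRSST and press Enter
Sign on with the DST user and password
2. Select 1, Start a service tool
3. Select 7, Hardware service manager
4. Select 8, Device concurrent. ...


Nice work, short and clear~~~

Added to the digest (featured posts).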


 hahawang replied at 2006-06-29 16:46:54

Hold on, buddy!
Pulling the disk out is a hot-swap, but you've left out a step when inserting the new one. That's dangerous!!


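 qingzhou replied at 2006-06-29 16:54:59

Quote: originally posted by hahawang on 2006-6-29 16:46
Hold on, buddy!
Pulling the disk out is a hot-swap, but you've left out a step when inserting the new one. That's dangerous!!



To find out what happens next, tune in to the next installment~~~

<(Original) Replacing a hard disk on the AS400>
http://bbs.chinaunix.net/viewthread.php?tid=362131&extra=page%3D5%26filter%3Ddigest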


 心城 replied at 2006-06-29 17:18:32

Thanks to everyone for the answers.


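 blogliou replied at 2006-06-29 17:32:18

Haha, once our company moved offices. After the move we restarted the 400 and found a disk FAILED. Our hardware engineer pulled the disk and reseated it, restarted, and everything checked out OK. Go figure!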


 suckstar replied at 2006-07-02 15:16:15

Damn, even that gets censored? Guess I'll wash it off with water, then!


 semiwinter replied at 2006-07-02 19:49:18

Just here to watch the fun and get my face known.


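 Anabble replied at 2006-07-03 08:57:22

Quote: originally posted by blogliou on 2006-6-29 17:32
Haha, once our company moved offices. After the move we restarted the 400 and found a disk FAILED. Our hardware engineer pulled the disk and reseated it, restarted, and everything checked out OK. Go figure!



Yikes~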


 luck_jogger replied at 2006-07-03 15:37:35

With STRSST you can remove the disk and redo the rebuild, and it's quite fast too.




Original link: http://bbs.chinaunix.net/viewthread.php?tid=781649
Please credit the author and the original source when reposting.


