Dropped Disk due to errors

    Hi everyone,
    Here is an interesting dropped-disk scenario I ran into last week; I'm not sure if this is the right place for it.

    Production database: 11.2.0.3
    Firmware version: 2.8.0.0.0
    Platform: bare metal ODA (BM ODA)

    We had 7-8 disks reporting the error
    Code:
    ORA-27061: waiting for async I/Os failed
    This had been an ongoing issue for months; it is a known bug caused by running the old firmware version 2.8.0.0.0.
    On Tuesday morning the ASM alert log started reporting that disk 13 would be dropped.


    Code:
    WARNING: Disk 13 (HDD_E0_S13_718559432P1) in group 1 will be dropped in: (12960) secs on ASM inst 1 
    Tue Mar 28 09:11:58 2017 
    WARNING: Disk 13 (HDD_E0_S13_718559432P1) in group 1 will be dropped in: (12777) secs on ASM inst 1 
    Tue Mar 28 09:15:01 2017 
    WARNING: Disk 13 (HDD_E0_S13_718559432P1) in group 1 will be dropped in: (12594) secs on ASM inst 1 
    Tue Mar 28 09:18:04 2017 
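    As a side note, the countdown in those messages is controlled by the diskgroup's disk_repair_time attribute; the 12960 seconds above is the 3.6 hour default. If you need more time before ASM drops an offlined disk, the attribute can be raised (this needs compatible.asm 11.1 or higher), something like:
    Code:
    ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '24h';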
    Once the disk was dropped, an automatic rebalance started with the default power of 1, which I increased to 6.
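    For reference, bumping the rebalance power is a one-liner (6 was just the value I picked):
    Code:
    ALTER DISKGROUP DATA REBALANCE POWER 6;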
    Once it completed, it reported an error:


    Code:
    ORA-15041: diskgroup "DATA" space exhausted 
    Tue Mar 28 14:58:53 2017 
    NOTE: stopping process ARB0 
    NOTE: rebalance interrupted for group 1/0x5ce88489 (DATA) 
    This was due to disk 12 having only 116 MB free.
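    A quick way to see how unevenly the space was spread is a query along these lines against the standard v$asm_disk view:
    Code:
    SELECT disk_number, name, total_mb, free_mb
      FROM v$asm_disk
     WHERE group_number = 1
     ORDER BY disk_number;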


    At this point I was collaborating with Oracle Support, and they didn't want to dispatch a replacement disk because the issue was with the firmware version, not the disk itself, so they offered two possible solutions.
    The first was to drop disk 12 with the NOFORCE option, add disk 13_p1 with a FORCE add command, and then run an UNDROP command for disk 12.
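    Roughly, that first option would have looked like this (the disk 12 name below is a placeholder, since its full device name never appears in my logs):
    Code:
    -- disk 12's name is a placeholder; substitute the real one
    ALTER DISKGROUP DATA DROP DISK HDD_E0_S12_XXXXXXXXP1 NOFORCE;
    ALTER DISKGROUP DATA ADD DISK '/dev/mapper/HDD_E0_S13_718559432P1' NAME HDD_E0_S13_718559432P1 FORCE;
    -- UNDROP cancels any drops still pending in the diskgroup
    ALTER DISKGROUP DATA UNDROP DISKS;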
    The second solution was to set the hidden _ASM_IMBALANCE_TOLERANCE parameter to 0:

    Code:
    ALTER SYSTEM SET "_ASM_IMBALANCE_TOLERANCE"=0 SID='*' SCOPE=BOTH; 
    ALTER DISKGROUP DATA CHECK ALL REPAIR; 
    ALTER DISKGROUP DATA rebalance power 11; 
    alter diskgroup /*+ _OAK_AsmCookie */ DATA add disk '/dev/mapper/HDD_E0_S13_718559432P1' name HDD_E0_S13_718559432P1 force; 
    We went with the second solution, but once the rebalance completed it threw ORA-15041 again, and shortly after
    we hit another bug: ORA-1683 and ORA-1692 started appearing in the alert log, causing transactions to fail on the clinical tablespace.

    Code:
    ORA-1683: unable to extend index SHR_DEP2_CLINICAL.PK_PTA_CLINICAL_LINK partition PW93614 by 128 in tablespace SHR_DEP2_CLINICAL 
    ORA-1683: unable to extend index SHR_DEP2_CLINICAL.PK_PTA_CLINICAL_LINK partition PW93614 by 1024 in tablespace SHR_DEP2_CLINICAL 
    ORA-1683: unable to extend index SHR_DEP2_CLINICAL.PK_PTA_CLINICAL_LINK partition PW93614 by 128 in tablespace SHR_DEP2_CLINICAL 
    ORA-1683: unable to extend index SHR_DEP2_CLINICAL.PK_PTA_CLINICAL_LINK partition PW93614 by 1024 in tablespace SHR_DEP2_CLINICAL 
    We had plenty of physical space free (300-400 GB), and the clinical tablespace had 30 GB free and could autoextend up to 30 TB, so at the time we had no idea why it was acting like this. The likely explanation: ASM stripes extents evenly across every disk in the group, so one nearly-full disk can make allocations fail even when the diskgroup as a whole shows plenty of free space.
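    For anyone who wants to double-check the same numbers, tablespace-level free space can be pulled with a standard query like:
    Code:
    SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024) AS free_mb
      FROM dba_free_space
     WHERE tablespace_name = 'SHR_DEP2_CLINICAL'
     GROUP BY tablespace_name;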
    A decision was made to bring the system offline and prepare for a switchover to the standby.
    Before the switchover, we decided to simply run the add command and see what would happen:
    Code:
    alter diskgroup /*+ _OAK_AsmCookie */ DATA add disk '/dev/mapper/HDD_E0_S13_718559432P1' name HDD_E0_S13_718559432P1 force; 
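    The resulting rebalance can be monitored from the standard v$asm_operation view, for example:
    Code:
    SELECT group_number, operation, state, power, est_minutes
      FROM v$asm_operation;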
    The result was that the disk was added to ASM successfully, the rebalance completed without errors, and disk 12 ended up with 80 GB free like all the other disks.
    The system was brought back online to check whether we would still get the ORA-1683 and ORA-1692 unable-to-extend errors, but there was no trace of the issue.

    Not sure if this will be useful to anyone, but it was a fun situation to deal with, and the solution turned out to be rather simple. Don't over-complicate a situation more than you need to.

    Thanks,
    Kris