Database Validation in Data Guard 12c

August 20, 2013, 6:13 am

DGMGRL was never difficult to use, but it did not provide enough information before a switchover or failover to confirm that the role change would go through without issues. Even when “show configuration verbose” reported the SUCCESS status for everything, a switchover could still run into problems during the role change.

However, Data Guard Broker in #DB12c introduces a new command, VALIDATE DATABASE, which greatly simplifies gathering the information needed to make sure role change operations succeed. It is especially helpful when a RAC database is part of the Data Guard configuration.

Let’s try the command on a test Data Guard configuration with a two-instance RAC primary database and a standalone standby database.

DGMGRL> show configuration

Configuration - dg_d

  Protection Mode: MaxPerformance
  Databases:
  d  - Primary database
    d1 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database d

Database - d

  Role:              PRIMARY
  Intended State:    TRANSPORT-ON
  Instance(s):
    d1
    d2

Database Status:
SUCCESS

DGMGRL> show database d1

Database - d1

  Role:              PHYSICAL STANDBY
  Intended State:    APPLY-ON
  Transport Lag:     0 seconds (computed 0 seconds ago)
  Apply Lag:         0 seconds (computed 0 seconds ago)
  Apply Rate:        205.00 KByte/s
  Real Time Query:   OFF
  Instance(s):
    d1

Database Status:
SUCCESS

The DG configuration has the SUCCESS status. Let’s validate the standby database to see if it is ready for switchover:

DGMGRL> validate database verbose d1

  Database Role:     Physical standby database
  Primary Database:  d

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    d         2                2
    d1        1                2
    Warning: the target standby has fewer instances than the
    primary database, this may impact application performance

  Temporary Tablespace File Information:
    d TEMP Files:   2
    d1 TEMP Files:  1

  Flashback Database Status:
    d:   Off
    d1:  Off

  Data file Online Move in Progress:
    d:   No
    d1:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds
    Transport Status:  Success

  Log Files Cleared:
    d Standby Redo Log Files:  Cleared
    d1 Online Redo Log Files:  Cleared

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups   Standby Redo Log Groups
              (d)                      (d1)
    1         2                        1
    2         2                        1

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups   Standby Redo Log Groups
              (d1)                     (d)
    1         2                        1
    2         2                        1

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (d)                       (d1)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (d1)                      (d)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        d Value                  d1 Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        d Value                  d1 Value
    LogXptMode                      ASYNC                    ASYNC
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       d        d1
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

As you can see, the database is ready for switchover. There is no lag, and the DG broker has gathered other important details, such as whether the standby and online redo logs are cleared, the number of temp files (there seems to be a bug that reports 2 temp files for the two-instance RAC database – does it select data from gv$tempfile?), and the status of online data file moves, a new #db12c feature.

Let’s see what happens if redo transport to the standby is turned off:

DGMGRL> edit database d set state='TRANSPORT-OFF';
Succeeded.
DGMGRL> validate database d1

Database Role:     Physical standby database
Primary Database:  d

Ready for Switchover:  No
Ready for Failover:    Yes (Primary Running)

...

The standby database is not ready for switchover. But what is the reason? The DG broker log has more information:

08/20/2013 13:45:28
Primary completed health check
EDIT DATABASE d SET STATE = TRANSPORT-OFF
08/20/2013 13:45:41
Switchover to d1 not possible, verification returned error ORA-16466
See database alert log for more details

The alert log doesn’t contain much information; it shows only the command that disabled the standby destination. The error description gives the following:

$ oerr ora 16466
16466, 0000, "invalid switchover target"
// *Cause:  The switchover target was not a valid, enabled, or active
//          standby database.
// *Action: Fix the problem in the switchover target and reissue the command

The issue is not clearly stated, but the Data Guard configuration should certainly be checked and fixed before a switchover operation.
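
Because the two readiness lines are the first thing to look at before a role change, a script can gate on them. The following is a minimal Python sketch, assuming the text printed by “validate database” has already been captured into a string (for example, from a scripted DGMGRL session). The function name parse_readiness is made up for illustration and is not part of any Oracle tool.

```python
import re

# Matches lines such as "Ready for Switchover:  No" or
# "Ready for Failover:    Yes (Primary Running)".
_READY = re.compile(r"^\s*Ready for (Switchover|Failover):\s*(Yes|No)\b")


def parse_readiness(validate_output: str) -> dict:
    """Return {'switchover': bool, 'failover': bool} for the lines found.

    Hypothetical helper: it only reads text in the format shown in this
    article and does not talk to the broker itself.
    """
    readiness = {}
    for line in validate_output.splitlines():
        match = _READY.match(line)
        if match:
            readiness[match.group(1).lower()] = match.group(2) == "Yes"
    return readiness


# Sample text copied from the TRANSPORT-OFF run above.
sample = """
Database Role:     Physical standby database
Primary Database:  d

Ready for Switchover:  No
Ready for Failover:    Yes (Primary Running)
"""

status = parse_readiness(sample)
if not status.get("switchover", False):
    print("Standby is not ready for switchover; check the broker log first.")
```

A missing line is treated as “not ready” here, which seems the safer default for a pre-switchover check.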

I tested several different scenarios. One of them was to shut down the standby database with the abort clause (I could not even get results from “validate database d1” while it was unavailable) and start it back up with transport from the primary database disabled. The Data Guard command “show database d1” indicated that the lags were unknown (which is related to NULL values in v$dataguard_stats). However, “validate database d1” showed a constant Apply Lag of “286 days 11 hours 35 minutes 12 seconds”. It did not grow and stayed the same until I enabled transport from the primary. Where did it come from? Who knows…

DGMGRL> show database d1

Database - d1

  Role:              PHYSICAL STANDBY
  Intended State:    APPLY-ON
  Transport Lag:     (unknown)
  Apply Lag:         (unknown)
  Apply Rate:        0 Byte/s
  Real Time Query:   OFF
  Instance(s):
    d1

Database Status:
SUCCESS

DGMGRL> validate database d1

  Database Role:     Physical standby database
  Primary Database:  d

  Ready for Switchover:  No
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    d         2                2
    d1        1                2
    Warning: the target standby has fewer instances than the
    primary database, this may impact application performance

  Temporary Tablespace File Information:
    d TEMP Files:   2
    d1 TEMP Files:  1

  Flashback Database Status:
    d:   Off
    d1:  Off

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        286 days 11 hours 35 minutes 12 seconds
    Apply Delay:      0 minutes

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups   Standby Redo Log Groups
              (d)                      (d1)
    1         2                        1
    2         2                        1

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups   Standby Redo Log Groups
              (d1)                     (d)
    1         2                        1
    2         2                        1

Lag came back to 0 seconds when transport was enabled again.
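
A suspicious value like this is easier to catch automatically once the lag text is converted to a number. The sketch below is one possible way to do that in Python, assuming the lag is written as in the output above (“286 days 11 hours 35 minutes 12 seconds” or “0 seconds”). The name parse_lag and the one-day threshold are illustrative choices, not anything defined by the broker.

```python
import re
from datetime import timedelta

# Map the unit words used in the broker output to timedelta keyword names.
_UNITS = {"day": "days", "hour": "hours", "minute": "minutes", "second": "seconds"}


def parse_lag(text: str) -> timedelta:
    """Convert a lag string such as '11 hours 35 minutes' to a timedelta.

    Raises ValueError when no unit is found, e.g. for '(unknown)'.
    """
    parts = {}
    for value, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)s?", text):
        parts[_UNITS[unit]] = int(value)
    if not parts:
        raise ValueError(f"unrecognized lag value: {text!r}")
    return timedelta(**parts)


lag = parse_lag("286 days 11 hours 35 minutes 12 seconds")
if lag > timedelta(days=1):  # arbitrary threshold chosen for this example
    print(f"Apply lag looks wrong or very large: {lag}")
```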

There are definitely areas that should be improved to make the VALIDATE DATABASE output and the messages in the drc (and alert…) logs easier to understand, but this new feature improves and simplifies information gathering for a Data Guard configuration. It will save us time during role change operations.

Happy Data Guard’ing!

Database 12c: What’s New with Data Pump? Lots.

August 20, 2013, 6:14 am

This article will discuss some of the new stuff on board with Oracle Database 12c and one of our favorite tools: data pump. When Oracle Data Pump hit the streets, there was a veritable gold mine of opportunities to play with the new toy. Seasoned presenters such as yours truly embraced the new product. It was a nice marriage for attendees at tech presentations on this topic. They were hungry for new stuff, and these sessions provided fast-tracked learning. Sounds to me like a dream come true for all.

We will look at the following new parameters:

LOGTIME
DISABLE_ARCHIVE_LOGGING (part of the TRANSFORM parameter)
ENCRYPTION_PWD_PROMPT
COMPRESSION_ALGORITHM

Let’s get started…

LOGTIME

DBAs and other technical personnel thirst for answers to nagging questions:

How long is this going to take?
Does the time to complete a job grow at the same rate as the data volume?
Can we predict how long work will take based on past experiences?

With Oracle Database 12c, some of these questions can be addressed by a new parameter introduced in Oracle Data Pump – LOGTIME. This command-line parameter can have four values:

NONE: No timestamps are displayed on status or log file messages. (This is the default.)
STATUS: Timestamps are displayed on status messages only.
LOGFILE: Timestamps are displayed on log file messages only.
ALL: Timestamps are displayed on both status and log file messages.

Recently, we needed to copy a schema from development to production for a client, and one of the approaches we considered was data pump exp/imp. While the jobs were running, we leveraged one of the DBA’s best friends, V$SESSION_LONGOPS. Coupled with the timestamps displayed according to the setting of the LOGTIME command-line parameter, we have more information at our fingertips. Is this a big deal on its own? Some may say it is, and to those who don’t: Remember that every enhancement rolled together with others becomes a big deal. Next is a quick look at a data pump export job with LOGTIME set to ALL:

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> expdp full=y dumpfile=pythian_logging logtime=all

Export: Release 12.1.0.1.0 - Production on Fri Aug 16 18:11:24 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Directory Object has defaulted to: "DPDUMP".
16-AUG-13 18:11:35.115: Starting "SYS"."SYS_EXPORT_FULL_01":  /******** AS SYSDBA full=y dumpfile=pythian_logging logtime=all 
16-AUG-13 18:11:36.703: Estimate in progress using BLOCKS method...
16-AUG-13 18:11:40.411: Processing object type DATABASE_EXPORT/EARLY_OPTIONS/VIEWS_AS_TABLES/TABLE_DATA
16-AUG-13 18:11:41.966: Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/TABLE_DATA
16-AUG-13 18:11:43.494: Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/VIEWS_AS_TABLES/TABLE_DATA
16-AUG-13 18:11:48.396: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
16-AUG-13 18:11:48.594: Total estimation using BLOCKS method: 2.890 MB
16-AUG-13 18:11:49.255: Processing object type DATABASE_EXPORT/PRE_SYSTEM_IMPCALLOUT/MARKER
16-AUG-13 18:11:49.269: Processing object type DATABASE_EXPORT/PRE_INSTANCE_IMPCALLOUT/MARKER
16-AUG-13 18:11:49.489: Processing object type DATABASE_EXPORT/TABLESPACE
16-AUG-13 18:11:49.812: Processing object type DATABASE_EXPORT/PROFILE
16-AUG-13 18:11:49.855: Processing object type DATABASE_EXPORT/SYS_USER/USER
16-AUG-13 18:11:49.888: Processing object type DATABASE_EXPORT/SCHEMA/USER
16-AUG-13 18:11:49.957: Processing object type DATABASE_EXPORT/ROLE
16-AUG-13 18:11:49.991: Processing object type DATABASE_EXPORT/RADM_FPTM
16-AUG-13 18:11:50.471: Processing object type DATABASE_EXPORT/GRANT/SYSTEM_GRANT/PROC_SYSTEM_GRANT
16-AUG-13 18:11:50.775: Processing object type DATABASE_EXPORT/SCHEMA/GRANT/SYSTEM_GRANT
16-AUG-13 18:11:50.786: Processing object type DATABASE_EXPORT/SCHEMA/ROLE_GRANT
16-AUG-13 18:11:50.790: Processing object type DATABASE_EXPORT/SCHEMA/DEFAULT_ROLE
16-AUG-13 18:11:50.796: Processing object type DATABASE_EXPORT/SCHEMA/ON_USER_GRANT
16-AUG-13 18:11:50.858: Processing object type DATABASE_EXPORT/SCHEMA/TABLESPACE_QUOTA
16-AUG-13 18:11:50.887: Processing object type DATABASE_EXPORT/RESOURCE_COST
16-AUG-13 18:11:50.962: Processing object type DATABASE_EXPORT/TRUSTED_DB_LINK
16-AUG-13 18:11:51.030: Processing object type DATABASE_EXPORT/DIRECTORY/DIRECTORY
16-AUG-13 18:12:11.437: Processing object type DATABASE_EXPORT/SYSTEM_PROCOBJACT/PRE_SYSTEM_ACTIONS/PROCACT_SYSTEM
16-AUG-13 18:12:20.503: Processing object type DATABASE_EXPORT/SYSTEM_PROCOBJACT/PROCOBJ
16-AUG-13 18:12:21.512: Processing object type DATABASE_EXPORT/SYSTEM_PROCOBJACT/POST_SYSTEM_ACTIONS/PROCACT_SYSTEM
16-AUG-13 18:12:23.077: Processing object type DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
16-AUG-13 18:12:43.930: Processing object type DATABASE_EXPORT/EARLY_OPTIONS/VIEWS_AS_TABLES/TABLE
16-AUG-13 18:12:48.049: Processing object type DATABASE_EXPORT/EARLY_POST_INSTANCE_IMPCALLOUT/MARKER
16-AUG-13 18:12:53.759: Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/TABLE
16-AUG-13 18:13:26.664: Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/VIEWS_AS_TABLES/TABLE
16-AUG-13 18:13:50.085: Processing object type DATABASE_EXPORT/NORMAL_POST_INSTANCE_IMPCALLOU/MARKER
16-AUG-13 18:13:55.129: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
16-AUG-13 18:14:08.783: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
16-AUG-13 18:14:12.618: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/CONSTRAINT
16-AUG-13 18:14:12.682: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
16-AUG-13 18:14:13.987: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/REF_CONSTRAINT
16-AUG-13 18:14:17.118: Processing object type DATABASE_EXPORT/SCHEMA/TABLE/STATISTICS/TABLE_STATISTICS
16-AUG-13 18:14:17.141: Processing object type DATABASE_EXPORT/STATISTICS/MARKER
16-AUG-13 18:14:32.021: Processing object type DATABASE_EXPORT/FINAL_POST_INSTANCE_IMPCALLOUT/MARKER
16-AUG-13 18:14:33.141: Processing object type DATABASE_EXPORT/SCHEMA/POST_SCHEMA/PROCOBJ
16-AUG-13 18:14:38.490: Processing object type DATABASE_EXPORT/SCHEMA/POST_SCHEMA/PROCACT_SCHEMA
16-AUG-13 18:14:38.911: Processing object type DATABASE_EXPORT/AUDIT_UNIFIED/AUDIT_POLICY_ENABLE
16-AUG-13 18:14:38.977: Processing object type DATABASE_EXPORT/AUDIT
16-AUG-13 18:14:39.201: Processing object type DATABASE_EXPORT/POST_SYSTEM_IMPCALLOUT/MARKER
16-AUG-13 18:14:41.683: . . exported "SYS"."KU$_USER_MAPPING_VIEW"               6.054 KB      36 rows
16-AUG-13 18:14:42.952: . . exported "ORDDATA"."ORDDCM_DOCS"                     252.9 KB       9 rows
16-AUG-13 18:14:43.235: . . exported "LBACSYS"."OLS$AUDIT_ACTIONS"               5.734 KB       8 rows
16-AUG-13 18:14:43.382: . . exported "LBACSYS"."OLS$DIP_EVENTS"                  5.515 KB       2 rows
16-AUG-13 18:14:43.400: . . exported "LBACSYS"."OLS$INSTALLATIONS"               6.937 KB       2 rows
16-AUG-13 18:14:43.444: . . exported "LBACSYS"."OLS$PROPS"                       6.210 KB       5 rows
16-AUG-13 18:14:43.487: . . exported "SYS"."DAM_CONFIG_PARAM$"                   6.507 KB      14 rows
16-AUG-13 18:14:43.529: . . exported "SYS"."TSDP_PARAMETER$"                     5.929 KB       1 rows
16-AUG-13 18:14:43.570: . . exported "SYS"."TSDP_POLICY$"                        5.898 KB       1 rows
16-AUG-13 18:14:43.618: . . exported "SYS"."TSDP_SUBPOL$"                        6.304 KB       1 rows
16-AUG-13 18:14:43.692: . . exported "SYSTEM"."REDO_DB"                          23.42 KB       1 rows
16-AUG-13 18:14:43.962: . . exported "WMSYS"."WM$ENV_VARS$"                      6.054 KB       5 rows
16-AUG-13 18:14:44.042: . . exported "WMSYS"."WM$EVENTS_INFO$"                   5.789 KB      12 rows
16-AUG-13 18:14:44.077: . . exported "WMSYS"."WM$HINT_TABLE$"                    9.429 KB      75 rows
16-AUG-13 18:14:44.124: . . exported "WMSYS"."WM$NEXTVER_TABLE$"                 6.351 KB       1 rows
16-AUG-13 18:14:44.170: . . exported "WMSYS"."WM$VERSION_HIERARCHY_TABLE$"       5.960 KB       1 rows
16-AUG-13 18:14:44.225: . . exported "WMSYS"."WM$WORKSPACES_TABLE$"              12.08 KB       1 rows
16-AUG-13 18:14:44.274: . . exported "WMSYS"."WM$WORKSPACE_PRIV_TABLE$"          6.539 KB       8 rows
16-AUG-13 18:14:44.281: . . exported "LBACSYS"."OLS$AUDIT"                           0 KB       0 rows
16-AUG-13 18:14:44.290: . . exported "LBACSYS"."OLS$COMPARTMENTS"                    0 KB       0 rows
16-AUG-13 18:14:44.296: . . exported "LBACSYS"."OLS$DIP_DEBUG"                       0 KB       0 rows
16-AUG-13 18:14:44.302: . . exported "LBACSYS"."OLS$GROUPS"                          0 KB       0 rows
16-AUG-13 18:14:44.308: . . exported "LBACSYS"."OLS$LAB"                             0 KB       0 rows
16-AUG-13 18:14:44.341: . . exported "LBACSYS"."OLS$LEVELS"                          0 KB       0 rows
16-AUG-13 18:14:44.351: . . exported "LBACSYS"."OLS$POL"                             0 KB       0 rows
16-AUG-13 18:14:44.359: . . exported "LBACSYS"."OLS$POLICY_ADMIN"                    0 KB       0 rows
16-AUG-13 18:14:44.366: . . exported "LBACSYS"."OLS$POLS"                            0 KB       0 rows
16-AUG-13 18:14:44.370: . . exported "LBACSYS"."OLS$POLT"                            0 KB       0 rows
16-AUG-13 18:14:44.410: . . exported "LBACSYS"."OLS$PROFILE"                         0 KB       0 rows
16-AUG-13 18:14:44.417: . . exported "LBACSYS"."OLS$PROFILES"                        0 KB       0 rows
16-AUG-13 18:14:44.424: . . exported "LBACSYS"."OLS$PROG"                            0 KB       0 rows
16-AUG-13 18:14:44.431: . . exported "LBACSYS"."OLS$SESSINFO"                        0 KB       0 rows
16-AUG-13 18:14:44.438: . . exported "LBACSYS"."OLS$USER"                            0 KB       0 rows
16-AUG-13 18:14:44.445: . . exported "LBACSYS"."OLS$USER_COMPARTMENTS"               0 KB       0 rows
16-AUG-13 18:14:44.452: . . exported "LBACSYS"."OLS$USER_GROUPS"                     0 KB       0 rows
16-AUG-13 18:14:44.458: . . exported "LBACSYS"."OLS$USER_LEVELS"                     0 KB       0 rows
16-AUG-13 18:14:44.466: . . exported "SYS"."AUD$"                                    0 KB       0 rows
16-AUG-13 18:14:44.472: . . exported "SYS"."DAM_CLEANUP_EVENTS$"                     0 KB       0 rows
16-AUG-13 18:14:44.480: . . exported "SYS"."DAM_CLEANUP_JOBS$"                       0 KB       0 rows
16-AUG-13 18:14:44.486: . . exported "SYS"."TSDP_ASSOCIATION$"                       0 KB       0 rows
16-AUG-13 18:14:44.494: . . exported "SYS"."TSDP_CONDITION$"                         0 KB       0 rows
16-AUG-13 18:14:44.500: . . exported "SYS"."TSDP_FEATURE_POLICY$"                    0 KB       0 rows
16-AUG-13 18:14:44.507: . . exported "SYS"."TSDP_PROTECTION$"                        0 KB       0 rows
16-AUG-13 18:14:44.513: . . exported "SYS"."TSDP_SENSITIVE_DATA$"                    0 KB       0 rows
16-AUG-13 18:14:44.518: . . exported "SYS"."TSDP_SENSITIVE_TYPE$"                    0 KB       0 rows
16-AUG-13 18:14:44.522: . . exported "SYS"."TSDP_SOURCE$"                            0 KB       0 rows
16-AUG-13 18:14:44.529: . . exported "SYSTEM"."REDO_LOG"                             0 KB       0 rows
16-AUG-13 18:14:44.534: . . exported "WMSYS"."WM$BATCH_COMPRESSIBLE_TABLES$"         0 KB       0 rows
16-AUG-13 18:14:44.540: . . exported "WMSYS"."WM$CONSTRAINTS_TABLE$"                 0 KB       0 rows
16-AUG-13 18:14:44.544: . . exported "WMSYS"."WM$CONS_COLUMNS$"                      0 KB       0 rows
16-AUG-13 18:14:44.549: . . exported "WMSYS"."WM$LOCKROWS_INFO$"                     0 KB       0 rows
16-AUG-13 18:14:44.554: . . exported "WMSYS"."WM$MODIFIED_TABLES$"                   0 KB       0 rows
16-AUG-13 18:14:44.559: . . exported "WMSYS"."WM$MP_GRAPH_WORKSPACES_TABLE$"         0 KB       0 rows
16-AUG-13 18:14:44.566: . . exported "WMSYS"."WM$MP_PARENT_WORKSPACES_TABLE$"        0 KB       0 rows
16-AUG-13 18:14:44.571: . . exported "WMSYS"."WM$NESTED_COLUMNS_TABLE$"              0 KB       0 rows
16-AUG-13 18:14:44.579: . . exported "WMSYS"."WM$REMOVED_WORKSPACES_TABLE$"          0 KB       0 rows
16-AUG-13 18:14:44.590: . . exported "WMSYS"."WM$RESOLVE_WORKSPACES_TABLE$"          0 KB       0 rows
16-AUG-13 18:14:44.597: . . exported "WMSYS"."WM$RIC_LOCKING_TABLE$"                 0 KB       0 rows
16-AUG-13 18:14:44.604: . . exported "WMSYS"."WM$RIC_TABLE$"                         0 KB       0 rows
16-AUG-13 18:14:44.611: . . exported "WMSYS"."WM$RIC_TRIGGERS_TABLE$"                0 KB       0 rows
16-AUG-13 18:14:44.618: . . exported "WMSYS"."WM$UDTRIG_DISPATCH_PROCS$"             0 KB       0 rows
16-AUG-13 18:14:44.624: . . exported "WMSYS"."WM$UDTRIG_INFO$"                       0 KB       0 rows
16-AUG-13 18:14:44.631: . . exported "WMSYS"."WM$VERSION_TABLE$"                     0 KB       0 rows
16-AUG-13 18:14:44.639: . . exported "WMSYS"."WM$VT_ERRORS_TABLE$"                   0 KB       0 rows
16-AUG-13 18:14:44.645: . . exported "WMSYS"."WM$WORKSPACE_SAVEPOINTS_TABLE$"        0 KB       0 rows
16-AUG-13 18:14:47.053: . . exported "SYSTEM"."SCHEDULER_JOB_ARGS"               8.640 KB       4 rows
16-AUG-13 18:14:48.309: . . exported "SYSTEM"."SCHEDULER_PROGRAM_ARGS"           10.18 KB      22 rows
16-AUG-13 18:14:49.731: . . exported "SYS"."AUDTAB$TBS$FOR_EXPORT"               5.929 KB       2 rows
16-AUG-13 18:14:51.556: . . exported "SYS"."NACL$_ACE_EXP"                       9.906 KB       1 rows
16-AUG-13 18:14:52.456: . . exported "SYS"."NACL$_HOST_EXP"                      6.890 KB       1 rows
16-AUG-13 18:14:53.929: . . exported "WMSYS"."WM$EXP_MAP"                        7.695 KB       3 rows
16-AUG-13 18:14:54.029: . . exported "SYS"."DBA_SENSITIVE_DATA"                      0 KB       0 rows
16-AUG-13 18:14:54.036: . . exported "SYS"."DBA_TSDP_POLICY_PROTECTION"              0 KB       0 rows
16-AUG-13 18:14:54.042: . . exported "SYS"."FGA_LOG$FOR_EXPORT"                      0 KB       0 rows
16-AUG-13 18:14:54.048: . . exported "SYS"."NACL$_WALLET_EXP"                        0 KB       0 rows
16-AUG-13 18:14:54.177: . . exported "SCOTT"."DEPT"                                  6 KB       4 rows
16-AUG-13 18:14:54.225: . . exported "SCOTT"."EMP"                                8.75 KB      14 rows
16-AUG-13 18:14:54.267: . . exported "SCOTT"."SALGRADE"                          5.929 KB       5 rows
16-AUG-13 18:14:54.274: . . exported "SCOTT"."BONUS"                                 0 KB       0 rows
16-AUG-13 18:14:56.693: Master table "SYS"."SYS_EXPORT_FULL_01" successfully loaded/unloaded
16-AUG-13 18:14:56.709: ******************************************************************************
16-AUG-13 18:14:56.710: Dump file set for SYS.SYS_EXPORT_FULL_01 is:
16-AUG-13 18:14:56.715:   /u01/app/oracle/dpdump/pythian/pythian_logging.dmp
16-AUG-13 18:14:56.749: Job "SYS"."SYS_EXPORT_FULL_01" successfully completed at Fri Aug 16 18:14:56 2013 elapsed 0 00:03:25

The next snippet from a PYTHIAN user export offers an idea of where the power of this parameter may lie – discovering exactly where time is being spent for large and small schema objects:

16-AUG-13 18:28:12.983: Processing object type SCHEMA_EXPORT/STATISTICS/MARKER
16-AUG-13 18:30:15.652: . . exported "PYTHIAN"."PMAST"                           2.384 GB 23252224 rows
16-AUG-13 18:30:16.388: . . exported "PYTHIAN"."LOC"                             21.01 MB  199999 rows
16-AUG-13 18:30:18.194: Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
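
To turn these timestamps into “where did the time go” numbers, the gap between consecutive lines can be computed. Here is a minimal Python sketch, assuming log lines in the format shown above and an English or C locale (so that month abbreviations like AUG parse). The function name step_durations is hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%d-%b-%y %H:%M:%S.%f"  # e.g. 16-AUG-13 18:28:12.983


def step_durations(log_lines):
    """Return (seconds_since_previous_stamped_line, message) pairs.

    Lines without a leading timestamp are skipped. The duration attached
    to a message is the time elapsed since the previous stamped line.
    """
    results = []
    previous = None
    for line in log_lines:
        stamp, sep, message = line.partition(": ")
        if not sep:
            continue
        try:
            current = datetime.strptime(stamp.strip(), STAMP_FORMAT)
        except ValueError:
            continue  # not a timestamped line
        if previous is not None:
            results.append(((current - previous).total_seconds(), message))
        previous = current
    return results


snippet = [
    '16-AUG-13 18:28:12.983: Processing object type SCHEMA_EXPORT/STATISTICS/MARKER',
    '16-AUG-13 18:30:15.652: . . exported "PYTHIAN"."PMAST"                           2.384 GB 23252224 rows',
    '16-AUG-13 18:30:16.388: . . exported "PYTHIAN"."LOC"                             21.01 MB  199999 rows',
    '16-AUG-13 18:30:18.194: Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded',
]

# Print the slowest steps first.
for seconds, message in sorted(step_durations(snippet), reverse=True):
    print(f"{seconds:10.3f}s  {message}")
```

For the snippet above, the PMAST line accounts for roughly 122.7 seconds, while LOC takes well under a second.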

DISABLE_ARCHIVE_LOGGING

This is one of the many options available with the TRANSFORM parameter to data pump import. This parameter may indeed be a dream come true for those very large datasets. Sometimes, the archived redo generated by import detracts from its speed and leaves many wondering why archived redo needs to be generated. Some of the more familiar TRANSFORM options that have been around since the dawn of the product are:

OID is used to force the assignment of new IDs for objects in the export file and not to attempt to reuse IDs during the import phase. The OID in the export file may clash with an ID in an existing object in the target schema, causing the object to be skipped.
SEGMENT_ATTRIBUTES is used to permit the placement of objects in a different tablespace from where they were exported. The physical, storage, and logging attributes of objects are ignored, and they inherit the characteristics as set for the target schema(s).
PCTSPACE is specified as a multiplier to be used for object extent requests and datafile sizes.

DISABLE_ARCHIVE_LOGGING can be set globally or for indexes and/or tables. If set to Y, the logging attributes of the specified target are altered before it is imported. Then, they are reset to their original characteristics when the work completes. The parameter passed on the command-line can take three forms:

transform=disable_archive_logging:Y
transform=disable_archive_logging:Y:table
transform=disable_archive_logging:Y:index

All import activities are logged unless one of the three options listed above is coded in the call to data pump import. The following listing illustrates the usage and output when the generation of archived redo is disabled completely.

 
oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> impdp dumpfile=pythian.dmp table_exists_action=append schemas=pythian transform=disable_archive_logging:Y

Import: Release 12.1.0.1.0 - Production on Sun Aug 18 05:47:14 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Directory Object has defaulted to: "DPDUMP".
Master table "SYS"."SYS_IMPORT_SCHEMA_03" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_SCHEMA_03":  /******** AS SYSDBA dumpfile=pythian.dmp table_exists_action=append schemas=pythian transform=disable_archive_logging:Y 
Processing object type SCHEMA_EXPORT/USER
ORA-31684: Object type USER:"PYTHIAN" already exists
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Table "PYTHIAN"."LOC" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Table "PYTHIAN"."PMAST" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
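
When impdp calls are generated by a script, the three forms of the transform clause listed earlier can be composed by a small helper. This is only a sketch in Python; the function name disable_archive_logging_clause is made up for illustration, and it does nothing more than build the same strings shown above.

```python
def disable_archive_logging_clause(scope=None):
    """Build the transform clause for impdp.

    scope=None     -> applies to tables and indexes alike
    scope="table"  -> tables only
    scope="index"  -> indexes only
    """
    if scope is None:
        return "transform=disable_archive_logging:Y"
    scope = scope.lower()
    if scope not in ("table", "index"):
        raise ValueError("scope must be None, 'table' or 'index'")
    return f"transform=disable_archive_logging:Y:{scope}"


# Argument list matching the import shown above.
args = [
    "impdp",
    "dumpfile=pythian.dmp",
    "table_exists_action=append",
    "schemas=pythian",
    disable_archive_logging_clause(),
]
print(" ".join(args))
```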

ENCRYPTION_PWD_PROMPT

As we have heard so many times but for some reason seem to experience difficulty putting into practice: “Thou shalt not enter an encryption password on the command-line.” This new-fangled parameter forces the password to be entered manually at a prompt instead. A data pump export of PYTHIAN.LOC with this parameter set to YES would proceed as follows:

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> expdp dumpfile=pythian_loc.dmp tables=pythian.loc encryption_pwd_prompt=yes

Export: Release 12.1.0.1.0 - Production on Sun Aug 18 05:55:04 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Encryption Password:
Database Directory Object has defaulted to: "DPDUMP".
Starting "SYS"."SYS_EXPORT_TABLE_01":  /******** AS SYSDBA dumpfile=pythian_loc.dmp tables=pythian.loc encryption_pwd_prompt=yes
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 50 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
. . exported "PYTHIAN"."LOC"                             42.00 MB  399998 rows
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_01 is:
/u01/app/oracle/dpdump/pythian/pythian_loc.dmp
Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Sun Aug 18 05:55:51 2013 elapsed 0 00:00:41

The ensuing import of that very same export would resemble the following (mistyped password entry deliberate):

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> impdp dumpfile=pythian_loc.dmp table_exists_action=replace tables=pythian.loc encryption_pwd_prompt=yes

Import: Release 12.1.0.1.0 - Production on Sun Aug 18 05:59:33 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Encryption Password: 
Database Directory Object has defaulted to: "DPDUMP".
ORA-39002: invalid operation
ORA-39176: Encryption password is incorrect.

It would have been nice if it had asked for the encryption password twice on export. Then it may not have been so “easy” for me to forget. Let’s try that again…

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> impdp dumpfile=pythian_loc.dmp table_exists_action=replace tables=pythian.loc encryption_pwd_prompt=yes

Import: Release 12.1.0.1.0 - Production on Sun Aug 18 06:06:19 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Encryption Password: 
Database Directory Object has defaulted to: "DPDUMP".
Master table "SYS"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TABLE_01":  /******** AS SYSDBA dumpfile=pythian_loc.dmp table_exists_action=replace tables=pythian.loc encryption_pwd_prompt=yes 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "PYTHIAN"."LOC"                             42.00 MB  399998 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Job "SYS"."SYS_IMPORT_TABLE_01" successfully completed at Sun Aug 18 06:06:55 2013 elapsed 0 00:00:30

One of the first questions I asked myself was how this could possibly work for unattended jobs when the parameter value is passed as YES:

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> nohup impdp parfile=locimp.parfile &
[1] 24311

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> nohup: ignoring input and appending output to `nohup.out'

[1]+  Exit 1                  nohup impdp parfile=locimp.parfile

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/dpdump/pythian> cat n*out

Import: Release 12.1.0.1.0 - Production on Sun Aug 18 06:09:33 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Encryption Password: 
ORA-39001: invalid argument value
ORA-39207: Value NULL is invalid for parameter ENCRYPTION_PASSWORD.

It does not: with nohup ignoring input, there is nothing to answer the prompt, and the job fails. Maybe I’ll give Chuck a call and ask him to submit an enhancement request.
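
Until such an enhancement arrives, a wrapper script can at least refuse to launch an unattended job whose parfile asks for a prompt. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the parfile sets the parameter on a line of its own.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers for a Data Pump wrapper script.

# Succeeds when the parfile contains encryption_pwd_prompt=yes (any case),
# assuming the parameter sits on a line of its own.
parfile_wants_prompt() {
  grep -Eiq '^[[:space:]]*encryption_pwd_prompt[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$1"
}

# Succeeds when the parfile can be used by a job with nobody at the keyboard.
check_unattended_parfile() {
  if parfile_wants_prompt "$1"; then
    echo "$1: encryption_pwd_prompt=yes needs an interactive session" >&2
    return 1
  fi
  return 0
}
```

A wrapper could then run something like `check_unattended_parfile locimp.parfile && nohup impdp parfile=locimp.parfile &`, so the job is never submitted when it is bound to stop at the password prompt.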

COMPRESSION_ALGORITHM

This enhancement is all about trade-offs, measuring resource consumption against compression ratio. As with many things in life, you cannot have both. In other words, the higher the compression ratio, the more CPU is required to pull it off. The values for this parameter are as follows:

BASIC offers a good balance of CPU usage and compression ratio; it is deemed to be applicable to most sessions.
LOW favors CPU over size: it has the least impact on CPU and yields a larger file with a lower compression ratio.
MEDIUM is similar to BASIC, but it uses a different algorithm to perform the work at hand.
HIGH is a good choice when the size of the export file is the determining factor; it yields the smallest file, but it could be the most CPU-intensive on the source site.

The Advanced Compression option must be installed to use this data pump parameter. The familiar dialogue with data pump export goes as follows:

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> expdp dumpfile=pythian_pmast.dmp tables=pythian.pmast compression_algorithm=high

Export: Release 12.1.0.1.0 - Production on Mon Aug 19 13:58:43 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TABLE_01":  /******** AS SYSDBA dumpfile=pythian_pmast.dmp tables=pythian.pmast compression_algorithm=high 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 11.03 GB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
. . exported "PYTHIAN"."PMAST"                           9.536 GB 93008896 rows
Master table "SYS"."SYS_EXPORT_TABLE_04" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_04 is:
  /u01/app/oracle/dpdump/pythian/pythian_pmastlc.dmp
Job "SYS"."SYS_EXPORT_TABLE_04" successfully completed at Mon Aug 19 14:25:45 2013 elapsed 0 00:08:20

real    8m31.600s
user    0m0.017s
sys     0m0.026s

Export: Release 12.1.0.1.0 - Production on Mon Aug 19 14:25:49 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Directory Object has defaulted to: "DPDUMP".
Starting "SYS"."SYS_EXPORT_TABLE_04":  /******** AS SYSDBA dumpfile=pythian_pmasthc.dmp tables=pythian.pmast compression_algorithm=high 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 11.03 GB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
. . exported "PYTHIAN"."PMAST"                           9.536 GB 93008896 rows
Master table "SYS"."SYS_EXPORT_TABLE_04" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_04 is:
  /u01/app/oracle/dpdump/pythian/pythian_pmasthc.dmp
Job "SYS"."SYS_EXPORT_TABLE_04" successfully completed at Mon Aug 19 14:36:31 2013 elapsed 0 00:07:52

real    10m45.764s
user    0m0.018s
sys     0m0.028s

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> ll /u01/app/oracle/dpdump/pythian
total 20019600
-rw-r-----. 1 oracle oinstall 10240008192 Aug 19 14:36 pythian_pmasthc.dmp
-rw-r-----. 1 oracle oinstall 10240008192 Aug 19 14:25 pythian_pmastlc.dmp

Not surprisingly, since a relatively small amount of data was exported (a mere 11.03 GB), the difference in the export file sizes is not dramatic. The fact that the export with high compression took close to 25% longer in wall-clock time (10m46s versus 8m32s) and consumed close to 8% more “sys” time is not dramatic either, but it offers a flavor of what this parameter can do for you.
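
For anyone who wants to redo the arithmetic, here is a small sketch. The helper name `pct_increase` is made up for illustration; the inputs are the “real” and “sys” figures from the two runs above, with the “real” times converted to seconds by hand.

```shell
#!/bin/bash
# Percentage increase from a baseline (b) to a new value (n), one decimal place.
pct_increase() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) / b * 100 }'
}

# "real" times converted to seconds:
#   8m31.600s  -> 8*60 + 31.600  = 511.600
#   10m45.764s -> 10*60 + 45.764 = 645.764
pct_increase 511.600 645.764   # wall-clock time, prints 26.2
pct_increase 0.026 0.028       # "sys" time, prints 7.7
```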

Wrap-up

Many fondly remember the arrival of data pump with release 10gR1. We tingled with this new-fangled tool that allowed us to perform the ever-popular logical backup of an Oracle database. We discovered the foundation of one of the underlying PL/SQL objects called DBMS_DATAPUMP and had an absolute field day as we rappelled into the depths of the product. Bear in mind: The trip is not over.

↧

RAC Attack at Oracle OpenWorld 2013 (Operation Ninja)

August 27, 2013, 7:22 am

I am happy to announce that, thanks to all the volunteers’ hard work and Oracle Technology Network’s support, RAC Attack is going to be part of the Oracle OpenWorld 2013 conference in less than a month’s time. Traditionally, RAC Attack is a meeting place for many technology enthusiasts and big names in the world of Oracle technology. I am sure that this year will not be an exception. This year, RAC Attack will have a prime location - Moscone South in the OTN Lounge - and will take place Tuesday, September 24th and Wednesday, September 25th from 10 am – 2 pm.

We will have a friendly team of RAC Attack Ninjas on-site ready to break down any RAC-related technical issue. You will recognize RAC Attack’s Ninjas easily. They will wear bandannas like the one in the picture. If you see a person with such a bandanna, do not hesitate to stop them and have a conversation.

Besides the great networking opportunity, RAC Attack will once again offer you a unique opportunity to install Oracle RAC 12c on your laptop in the least possible time with the help of a great team of volunteers. At the end of the activity, you will have a ready-to-go playground to get hands-on experience with the latest Oracle database version. This year, we have great news for the Oracle community: RAC Attack is now based on a complete Oracle technology stack. It includes Oracle VM VirtualBox 4.2, Oracle Linux 6.4, and the latest and greatest Oracle database 12c. Participants are advised to have the following:

A laptop with
- 64-bit OS
- 8 GB of RAM
- 50 GB of free space

We suggest you pre-download Oracle 12c for 64-bit Linux (V38500-01, V38501-01) from the e-Delivery web site to your laptop for the following reasons:
- The WiFi network’s speed at large conferences like OpenWorld isn’t great.
- We may not be able to give you a copy of Oracle 12c on-site because of legal licensing implications. (We will discuss possible options with Oracle’s legal team.)

Even if you don’t have a powerful laptop with you at OpenWorld or don’t have much time to get the installation completed, come along and have a RAC-related technology discussion with our great team of Ninjas. There are many different activities that you can get involved in at RAC Attack, including the following:

Oracle 12c RAC Install Hands On
- Bring your laptop and in 3-4 hours, you will have your sandbox environment up and running.
Oracle 12c RAC Install Information Support
- In 15-30 minutes, talk through the Oracle 12c RAC installation procedure with us and get it installed at home after the conference.
Oracle 12c Single Node Install
- If you don’t have enough time or hardware resources, why not implement an Oracle 12c database in a single node configuration?
RAC Advice
- If you have RAC-related questions, we would be happy to share our thoughts on them or pass them to the experts who can answer them in no time.
RAC Advanced
- If you are keen to learn as many RAC details as possible or are an advanced RAC administrator already, then you could try to “break” it by running different testing scenarios or testing and discussing new features.

Apart from joining the RAC Attack activities, don’t forget to grab your RAC Attack ribbon and become a RAC Attack Ninja yourself! :)

Last but not least, we need your help to ensure that you have a great networking experience at RAC Attack this year. Please spread the word about the event among your networks and peers. Feel free to use any images from this post in your social media posts. On Twitter, we use the #RACAttack hashtag. (Follow it to get updates!) On Facebook, please feel free to reference Oracle Technology Network. Let’s make Operation Ninja a success together.

See you folks at Oracle OpenWorld 2013 in less than a month.

↧

Proactive FRA Monitoring Using OEM Metrics

August 27, 2013, 7:32 am

As you know, thresholds for flash/fast recovery area (FRA) usage are internally set in a database to 85 and 97 per cent, and there is no way to change them, at least none that is supported by Oracle. These settings may work fine in most cases, but being aware of changes in FRA usage earlier can sometimes be helpful. You can then contact your DBA and suggest verifying your database’s settings.

This is simple to do if your database is monitored by OEM: the agent gathers all the information and saves it in the OEM repository. The only thing you need to do is either schedule a report or create a user-defined metric with a specific threshold that pages you before usage reaches the threshold pre-set by Oracle.

The metrics are from the Flash Recovery group and are available in mgmt$metric_current and in hourly and daily metrics history tables. For user-defined metrics, the following SQL can be used:

with s1 as (
  select target_type, target_name, metric_label, column_label, value, key_value
  from mgmt$metric_current
  where metric_label = 'Flash Recovery'
  and column_label = 'Usable Flash Recovery Area (%)'
  and value is not null
),
s2 as (
select target_type, target_name, max(db_ver) db_ver, max(host_name) host_name, max(dg_stat) dg_stat from (
  select target_type, target_name,
  (case when property_name = 'DBVersion' then property_value end) db_ver,
  (case when property_name = 'MachineName' then property_value end) host_name,
  (case when property_name = 'DataGuardStatus' then property_value end) dg_stat
  from MGMT$TARGET_PROPERTIES
  where target_name in (select target_name from s1)
  and property_name in ('DBVersion', 'MachineName', 'DataGuardStatus')
  )
  group by target_type, target_name
)
select s1.target_name, s2.host_name, s2.db_ver, s2.dg_stat, s1.column_label, s1.value from s1, s2
where s1.target_name = s2.target_name
and value < 30 --threshold per cent to page on
order by cast(value as number) desc

The query uses a threshold of 30% for the Usable Flash Recovery Area (%) metric, so a target is listed once less than 30% of its FRA remains usable, and it brings additional information on the database’s host, its version, and its Data Guard status. For metrics that return a list of databases with breached thresholds, I personally wrap the query in count(*) to page on, and then execute the SQL saved as a report in SQL Developer to get all the information on the targets.
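
To make the relationship between the custom threshold and the built-in ones concrete, here is a minimal sketch. The function name `fra_alert_level` is hypothetical, and the sketch assumes that Oracle’s internal 85% and 97% usage thresholds correspond to 15% and 3% usable space, respectively.

```shell
#!/bin/bash
# Classify a "Usable Flash Recovery Area (%)" value.
# Arguments: usable percent, optional custom paging threshold (default 30).
# Assumption: the internal 85%/97% usage thresholds are treated here as
# 15% and 3% usable, respectively.
fra_alert_level() {
  awk -v usable="$1" -v page="${2:-30}" 'BEGIN {
    if (usable < 3)         print "critical"   # past the internal 97% mark
    else if (usable < 15)   print "warning"    # past the internal 85% mark
    else if (usable < page) print "page"       # custom early warning
    else                    print "ok"
  }'
}
```

With the default of 30, a value of 25 yields "page" while the database itself has not raised anything yet, which is exactly the head start the user-defined metric is meant to provide.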

Happy OEM’ing!

↧

Oracle OpenWorld 2013 – Bloggers Meetup

September 4, 2013, 8:58 am

Oracle OpenWorld 2013 is just a few weeks away, and of course, we are organizing the Annual Oracle Bloggers Meetup, one of your top favorite events of OpenWorld.

What: Oracle Bloggers Meetup 2013

When: Wed, 25-Sep-2013, 5:30pm

Where: Main Dining Room, Jillian’s Billiards @ Metreon, 101 Fourth Street, San Francisco, CA 94103 (street view). Please comment with “COUNT ME IN” if coming — we need to know the attendance numbers.

Traditionally, Oracle Technology Network and Pythian sponsor the venue and drinks. We will also have some cool things and a few prizes.

As usual, vintage t-shirts from previous meetups will make you look cool — feel free to wear them. We won’t make it too busy with activities this year as we have so many folks coming now and we barely have enough time to network and say hello.

However, we couldn’t resist another competition, so this year you will compete for the best video composition from the Oracle Bloggers Meetup. Edit it, process it, and do whatever you want with it, but there are a few conditions:

Must be under 3 minutes total
All scenes must be shot at the meetup
Must be shared on YouTube by 15-Oct

Recommendations — make it funny, get as many bloggers into it as you can, make it a story… be yourself — it’s your film! Get your creative film director juices flowing!

For those of you who don’t know the history… The Bloggers Meetup during Oracle OpenWorld was started by Mark Rittman and continued by Eddie Awad, and then I picked up the flag in 2009 (gosh… it’s 5 years I’ve been doing it?). The meetups have been a great success so let’s keep them this way! To give you an idea, here are the photos from the OOW08 Bloggers Meetup (courtesy of Eddie Awad) and OOW09 meetup blog post update from myself.

While the initial meetings were mostly around Oracle database folks, the latest meetups are joined by guys and gals from lots of Oracle technologies – Oracle database, MySQL, Applications, Sun technologies, Java, and more. All bloggers are welcome. We expect to gather around 150 bloggers.

If you are planning to attend, please comment here with the phrase “COUNT ME IN”. This will help us get the attendance numbers right so that we have enough room, food, and (most importantly) drinks. Make sure you provide your blog URL with your comment; it’s a Bloggers Meetup after all! Last year we barely fit (again!).

Of course, do not under any circumstances forget to blog and tweet about this year’s bloggers meetup.

↧

Linux hugepages for Oracle on Amazon EC2: Possible, but not convenient, easy or fully supported

October 4, 2013, 10:51 am

One of the optimizations available to us when running Oracle on Linux is huge page support. This feature of the Linux kernel enables processes to allocate memory pages of size 2M (instead of 4k). In addition, memory allocated using hugepages is pinned in physical memory. It cannot be swapped out.

It is now common practice to enable huge page support for Oracle databases with large SGAs (one rule of thumb is 8G). Without this feature, the SGA can be, and often is, paged out. Paging out portions of the SGA can result in disastrous consequences from a performance standpoint. There are a variety of load patterns that perform particularly poorly without hugepages. Running with large numbers of processes, sudden increases in processes (connection storms), and highly concurrent access of diverse sets of SGA pages all can bring an Oracle system without hugepages to its knees.
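
To get a feel for the numbers, here is a small sketch that counts how many pages are needed to back an SGA of a given size. The helper name `pages_needed` is made up for illustration, and the 8G figure is simply the rule of thumb mentioned above.

```shell
#!/bin/bash
# Number of pages of a given size needed to back a memory area.
# Arguments: area size in MB, page size in kB. Rounds up to whole pages.
pages_needed() {
  local area_kb=$(( $1 * 1024 ))
  echo $(( (area_kb + $2 - 1) / $2 ))
}

pages_needed 8192 2048   # 8G SGA with 2M huge pages, prints 4096
pages_needed 8192 4      # same SGA with regular 4k pages, prints 2097152
```

The first figure is the kind of value that would go into /proc/sys/vm/nr_hugepages for such an SGA; the second shows how many regular pages the kernel has to track without hugepages.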

Considering how commonplace use of hugepages is with Oracle, it is surprising that on Amazon EC2, huge page support is not generally available, and that the systems designed expressly for running Oracle cannot use hugepages.

EC2 Virtualization Types

Under the covers, Amazon EC2 uses a hypervisor to (potentially) run many virtual machines on a given physical server. EC2 instances essentially come in two flavours of virtualization: paravirtualization (PVM) and hardware virtualization (HVM). The vast majority of EC2 AMIs use PVM, but for a variety of reasons, only EC2 instances using HVM can allocate hugepages.

Initially, I tried to enable hugepages with PVM. I was able to load a kernel with huge page support, and even enable hugepages and allocate a shared memory segment using hugepages. However, upon trying to attach to the shared memory segment, I received a kernel panic. For a simple test case, I used a C program that is included in the Linux kernel distribution, which creates a small shared memory segment using hugepages and attaches to it: tools/testing/selftests/vm/hugepage-shm.c.

[root@ip-10-28-12-158 ~]# echo 512 >/proc/sys/vm/nr_hugepages
[root@ip-10-28-12-158 ~]# grep Huge /proc/meminfo
HugePages_Total: 512
HugePages_Free: 512
...
Hugepagesize: 2048 kB

[root@ip-10-28-12-158 ~]# strace ./hugepage-shm
execve("./hugepage-shm", ["./hugepage-shm"], [/* 22 vars */]) = 0
…
shmget(0x2, 268435456, IPC_CREAT|SHM_HUGETLB|0600) = 1146881
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), …}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469afd1000
write(1, "shmid: 0x118001\n", 16shmid: 0x118001) = 16
shmat(1146881, 0, 0)………

At this point, the program hangs, and the kernel panics.

However, when I try the same experiment on an identical EC2 instance running with HVM, the test completes successfully. After trying to use hugepages on a variety of operating systems and configurations on EC2, I conclude that for EC2, huge page support is only available on HVM instances.

HVM is available on the following instance classes:

Class	ECUs*	Mem(G)	Price**	Oracle Licenses Required	Notes
m3.xlarge	13.0	15.0	0.50	1	2nd Gen Standard
m3.2xlarge	26.0	30.0	1.00	2	2nd Gen Standard
cc2.8xlarge	88.0	60.5	2.40	8	Cluster Compute
cr1.8xlarge	88.0	244.0	3.50	8	Memory-optimized cluster
hi1.4xlarge	35.0	60.5	3.10	4	I/O optimized
hs1.8xlarge	35.0	117.0	4.60	4	Storage optimized
cg1.4xlarge	33.5	22.5	2.10	4	Cluster GPU

* Elastic Compute Units: “The equivalent CPU capacity of one 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor”
** US$ per hour on-demand in us-east-1 region

Fine, I’ll use EC2 instances with HVM. What’s the big deal?

Well, at OpenWorld in 2010, Oracle and Amazon announced that EC2 would allow customers to select Oracle VM as the underlying hypervisor for their Amazon EC2 instances. This made EC2 a platform completely supported by Oracle.

However, when Oracle and Amazon released the OVM-backed instances, they could not be run on any of the above HVM-capable instance classes. The EC2 API responds with:

Client.InvalidParameterCombination: Non-Windows instances with a virtualization type of 'hvm' are currently not supported for this instance type.

This means that if you want to use hugepages on EC2, you cannot use any of the Oracle VM-backed instances. What??? AWS and Oracle went to the trouble of making OVM available to support running Oracle software on EC2, but didn’t bother to enable it on any of the instance classes capable of hugepages, which is recommended by Oracle?

We have all been over and over the question of whether to run Oracle on non-Oracle hypervisors like VMware and Xen. The policy from Oracle is something like “You can run Oracle on non-Oracle hypervisors, but if you call for support and we suspect the problem is with the hypervisor, we reserve the right to ask you to reproduce the problem on bare metal.”

Well, for Amazon Xen, the same policy applies. Essentially we are left with a choice when running Oracle on Amazon EC2: either run efficiently with hugepages on a not-totally-supported hypervisor, or run inefficiently without hugepages on a fully supported Oracle VM-backed instance.

AWS and Oracle take note: As it stands, the OVM offering is not set up in a way that enables real enterprise use of EC2 by Oracle customers. This is a major oversight that needs to be addressed.

↧

UDM is watching you, UDMs

October 11, 2013, 8:08 am

Over time, monitoring several thousand targets of different versions on different operating systems accumulates additional checks and user-defined metrics (UDMs) for specific requirements. With a dozen super administrator accounts in play, the Oracle Enterprise Manager environment itself needs monitoring so that checks that were customized and configured for certain targets are not lost.

How could this monitoring function be better organized? One of the approaches I find useful is to create user-defined metrics to monitor other user-defined metrics. Knowing the list of UDMs that should be configured for targets, I created UDMs that gather information about metrics and send alerts if there are any discrepancies.

select count(*) from (
with sql_metrics as (
            select 'OEMP' target_name, 'prod_targets_no_metrics' metric_name, '' note from dual
  union all select 'OEMP', 'target_removed', '' from dual
  union all select 'OEMP', 'test_targets_no_metrics', '' from dual
  union all select 'OEMP', 'UDM_count_autofiles', '' from dual
  union all select 'OEMP', 'UDM_asm_dg', '' from dual
  union all select 'PROD.WORLD', 'UDM_apply_status', '' from dual
  union all select 'TEST.WORLD', 'UDM_apply_status', '' from dual
  union all select 'STDBY.WORLD', 'UDM_standby_lag', '' from dual
  union all select 'PROD.WORLD', 'changes_lag', '' from dual
  union all select 'REP.WORLD', 'changes_lag', '' from dual
  union all select 'TEST.WORLD', 'changes_lag', '' from dual
  union all select 'TNP', 'standby_lag', '' from dual
  union all select 'E.WORLD', 'UDM_alertlog', '' from dual
),
sql_current as (select c.target_type, c.target_name, c.metric_label, c.column_label,
  max(c.collection_timestamp) last_date, count(*) cnt
  from mgmt$metric_current c
  where c.metric_label like 'User-Defined%Metric%'
  group by c.target_type, c.target_name, c.metric_label, c.column_label
)
select target_name, metric_name, decode(nvl(cnt, 0), 0,
'Error: Metric does not exist', 'Error: Metric exists but coll time older than 3 days') msg,
note from (
select m.target_name, m.metric_name, c.cnt, c.last_date, m.note
from sql_metrics m, sql_current c
where m.target_name = c.target_name(+)
and m.metric_name = c.column_label(+)
)
where last_date < sysdate - 3.4
or last_date is null
)

But what about the UDM check itself? What if it is removed as well? For that purpose, I created a script that is scheduled on the OMS host and runs emcli collect_metric periodically. If the collection does not complete successfully, the DBA on call is alerted.

/home/oracle/working/ag/report_udm_not_there.sh OEMP UDM_existence >/dev/null 2>&1

report_udm_not_there.sh:
#!/bin/bash
# Force a collection of the given UDM and mail the log if it fails.
CMD_PATH=/home/oracle/working/ag
export JAVA_HOME=/e00/oracle/middleware/oms11g/jdk
export PATH=$JAVA_HOME/bin:$PATH
DB=$1
UDM=$2
LOG=$CMD_PATH/$(basename "$0" .sh)_${DB}_${UDM}.log
# Send stdout and stderr to the same log (2>&1) so neither overwrites the other
"$CMD_PATH/emcli" collect_metric -target_type=oracle_database -target_name="$DB" -collection="$UDM" >"$LOG" 2>&1
# Alert unless emcli reported exactly one successful collection
if [ "$(grep -c "was collected at repository successfully" "$LOG")" -ne 1 ]
then
  mailx -s "issues with $UDM at $DB" admin@site.com < "$LOG"
fi

Another way to monitor disassociation of UDMs and targets is to use the FLASHBACK feature which allows you to compare current and past data (let’s say 15 minutes ago) of mgmt$metric_current. However, it can be too general.

Happy OEM monitoring!

↧

Mining the AWR to Identify Performance Trends

October 31, 2013, 5:53 am

Sometimes it’s useful to check how the performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it is not always easy to retrieve, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:

How does the performance of a particular SQL change over time?
How do wait times of a particular wait event change over time?
How does a particular statistic change over time?

Please note, the scripts provided here require diagnostic pack licenses and it’s your task to make sure you have them before running the scripts.

SQL performance

I use script awr_sqlid_perf_trend.sql to check how performance of the SQL changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:

SQL ID
Days to report. The script summarizes all AWR snapshots starting with “trunc(sysdate)-{days to report}+1”, so if you pass “1”, it summarizes all snapshots from today; if you pass “2”, yesterday and today are included.
Interval in hours. “24” provides one row for each day; “6” gives 4 rows a day.
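
The arithmetic behind the last two parameters can be sketched as follows. The helper names `report_start` and `max_rows` are made up for illustration and are not part of the script; the run date of 29.10.2013 is an assumption chosen to match the example that follows, and the `-d` option used here requires GNU date.

```shell
#!/bin/bash
# First day covered by the report: trunc(sysdate) - days + 1.
# Arguments: today's date (YYYY-MM-DD), days to report. Needs GNU date.
report_start() {
  local today_epoch
  today_epoch=$(date -u -d "$1" +%s)
  date -u -d "@$(( today_epoch - ($2 - 1) * 86400 ))" +%d.%m.%Y
}

# Maximum number of rows: one per interval over the whole reporting window.
# Arguments: days to report, interval in hours.
max_rows() {
  echo $(( $1 * 24 / $2 ))
}

report_start 2013-10-29 14   # prints 16.10.2013
max_rows 14 48               # prints 7
report_start 2013-10-29 4    # prints 26.10.2013
max_rows 4 6                 # prints 16
```

The actual report can return fewer rows than `max_rows` suggests when the last intervals of the current day have not happened yet or no snapshots exist for an interval.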

Nothing shows it better than an example. Below you see how I’m checking execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in the last 2 weeks and reporting the average values for 2-day intervals. Then I’m taking a closer look at the last 4 days for the same SQL by summarizing data over 6-hour intervals. Note that the time column shows the beginning of the interval.


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00         351              195.571           74.995           .097           .000           .000           .000           134417.570      21319182.291        293731.556          304434.305
18.10.2013 00:00:00         364               91.225           47.474          1.687           .000           .000           .002           141140.228      20364053.544        270107.745          273343.709
20.10.2013 00:00:00         542               20.686            9.378           .004           .000           .000           .000           146436.875       4597922.220             3.168                .000
22.10.2013 00:00:00         531               25.060           12.086           .161           .000           .000           .000           146476.605       6026729.224         23999.684           23998.859
24.10.2013 00:00:00         542               51.611           40.675          1.880           .000           .000           .000           146814.220      21620264.039        287994.862          287994.701
26.10.2013 00:00:00         534               39.949           26.688          1.050           .000           .000           .000           147099.275      14081016.607        159704.463          159704.418
28.10.2013 00:00:00         245               37.837           29.384          1.150           .000           .000           .000           147135.216      15505533.959        179244.437          179244.367

7 rows selected.


TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00          72               19.209            9.439           .000           .000           .000           .000           147076.000       4623816.597              .111                .000
26.10.2013 06:00:00          72               15.391            9.401           .000           .000           .000           .000           147086.403       4624153.819              .000                .000
26.10.2013 12:00:00          72               14.022            9.351           .000           .000           .000           .000           147099.000       4624579.639              .000                .000
26.10.2013 18:00:00          55               48.174           35.723          1.575           .000           .000           .000           147099.000      19192781.582        243584.055          243584.055
27.10.2013 00:00:00          72               76.723           43.350          2.116           .000           .000           .000           147099.000      23258326.875        314445.111          314445.111
27.10.2013 06:00:00          72               64.921           43.914          2.084           .000           .000           .000           147107.542      23258506.028        315673.000          315673.000
27.10.2013 12:00:00          72               52.567           43.383          2.041           .000           .000           .000           147116.000      23258739.403        315673.000          315673.000
27.10.2013 18:00:00          47               25.522           18.095           .523           .000           .000           .000           147117.532       9382873.851         80597.702           80597.362
28.10.2013 00:00:00          65               17.645            9.384           .000           .000           .000           .000           147120.000       4625354.262              .000                .000
28.10.2013 06:00:00          19               17.571            9.451           .000           .000           .000           .000           147122.421       4625411.263              .000                .000
28.10.2013 12:00:00           6               14.083            9.645           .000           .000           .000           .000           147208.167       4629315.167              .000                .000
28.10.2013 18:00:00          48               42.173           35.208          1.509           .000           .000           .000           147236.375      18606643.833        229433.750          229433.750
29.10.2013 00:00:00          72               53.015           43.517          2.022           .000           .000           .000           147245.125      23265547.847        314507.319          314507.083
29.10.2013 06:00:00          30               52.181           43.638          1.932           .000           .000           .000           147250.300      23265839.767        303949.000          303949.000
29.10.2013 12:00:00           5               59.576           43.836          1.177           .000           .000           .000           144049.800      23267109.200        227814.000          227814.000

15 rows selected.

I checked this SQL because users reported inconsistent performance, and the output above confirms it. The number of rows processed per execution hardly changes (it’s always around 147K), but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump to 300K, and when they do, the buffer gets increase too (from about 4.6M to 23M) and the CPU time goes up from 9 seconds to 43. Based on this it looks like there could be two different execution plans involved, and bind variable peeking could be causing one or the other plan to become the active one.
Additionally, you can use the same script to check how the execution statistics for the same SQL change over time. Does the elapsed time increase? Does the number of processed rows or buffer gets per execution change?

Wait event performance

Script awr_wait_trend.sql can be used to show the changes in wait counts and wait durations for a particular event over time. Similarly to the previous script it also requires 3 parameters, only instead of SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT.

I typically use this script in two situations:

To check if a particular wait event performs worse when an overall performance problem is reported (usually I’m looking at IO events).
To illustrate how an implemented change improved the situation.

The example below shows how the performance of the log file parallel write event changed over 3 weeks. On October 19th we moved the redo logs to dedicated high-performance LUNs. Before that, the 2 members of each redo log group were located on a saturated LUN together with all the data files.


TIME                EVENT_NAME                       TOTAL_WAITS   TOTAL_TIME_S    AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write              4006177      31667.591          7.905
10.10.2013 00:00:00 log file parallel write              3625342      28296.640          7.805
11.10.2013 00:00:00 log file parallel write              3483249      31032.324          8.909
12.10.2013 00:00:00 log file parallel write              3293462      33351.490         10.127
13.10.2013 00:00:00 log file parallel write              2871091      36413.925         12.683
14.10.2013 00:00:00 log file parallel write              3763916      30262.718          8.040
15.10.2013 00:00:00 log file parallel write              3018760      28262.172          9.362
16.10.2013 00:00:00 log file parallel write              3303205      31062.276          9.404
17.10.2013 00:00:00 log file parallel write              3012105      31831.491         10.568
18.10.2013 00:00:00 log file parallel write              2692697      26981.966         10.020
19.10.2013 00:00:00 log file parallel write              1038399        512.950           .494
20.10.2013 00:00:00 log file parallel write               959443        427.554           .446
21.10.2013 00:00:00 log file parallel write              1520444        606.580           .399
22.10.2013 00:00:00 log file parallel write              1618490        655.873           .405
23.10.2013 00:00:00 log file parallel write              1889845        751.216           .398
24.10.2013 00:00:00 log file parallel write              1957384        760.656           .389
25.10.2013 00:00:00 log file parallel write              2204260        853.691           .387
26.10.2013 00:00:00 log file parallel write              2205783        856.731           .388
27.10.2013 00:00:00 log file parallel write              2033199        785.785           .386
28.10.2013 00:00:00 log file parallel write              2439092        923.368           .379
29.10.2013 00:00:00 log file parallel write              2233614        840.628           .376

21 rows selected.

Visualizing the data from output like that is easy too!
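The AVG_TIME_MS column is simply TOTAL_TIME_S * 1000 / TOTAL_WAITS, so it can be recomputed from saved output before charting it. The sketch below assumes the report was spooled to a text file in the layout shown above; the function name recompute_avg_ms is hypothetical and is not part of the published scripts.

```shell
# Recompute AVG_TIME_MS from the TOTAL_WAITS and TOTAL_TIME_S columns of saved
# awr_wait_trend.sql output. The event name contains spaces, so the numeric
# columns are addressed from the end of each line.
recompute_avg_ms() {
    awk '
        # data rows start with a DD.MM.YYYY date; header and footer lines are skipped
        /^[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9] / {
            waits  = $(NF - 2)
            time_s = $(NF - 1)
            if (waits > 0)
                printf "%s %.3f\n", $1, time_s * 1000 / waits
        }
    ' "$1"
}
```

Calling recompute_avg_ms on the spooled file prints one "date value" pair per row, which is easy to paste into a spreadsheet or feed to a plotting tool.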

System Statistics

The last script from this set is awr_stat_trend.sql. It does the same thing with the system statistics collected in DBA_HIST_SYSSTAT as the previous scripts did with the performance of SQLs and wait events. The parameters are similar again: the name of the system statistic, the number of days to report and the interval. I usually use the query to check how the redo size or the number of physical reads changes over time, but there’s a huge number of statistics available (638 different statistics in 11.2.0.3), and that’s why I’m sure you’ll find your own reasons to use this script.


TIME                STAT_NAME                             VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size                        1739466208
27.10.2013 04:00:00 redo size                        2809857936
27.10.2013 08:00:00 redo size                         648511376
27.10.2013 12:00:00 redo size                         533287888
27.10.2013 16:00:00 redo size                         704832684
27.10.2013 20:00:00 redo size                         819854908
28.10.2013 00:00:00 redo size                        2226799060
28.10.2013 04:00:00 redo size                        3875182764
28.10.2013 08:00:00 redo size                        1968024072
28.10.2013 12:00:00 redo size                        1125339352
28.10.2013 16:00:00 redo size                        1067175300
28.10.2013 20:00:00 redo size                         936404908
29.10.2013 00:00:00 redo size                        1758952428
29.10.2013 04:00:00 redo size                        3949193948
29.10.2013 08:00:00 redo size                        1715444632
29.10.2013 12:00:00 redo size                        1008385144
29.10.2013 16:00:00 redo size                         544946804

17 rows selected.

AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too!
P.S. You might have noticed the scripts are published on GitHub. Let me know if you find any issues using them, and perhaps one day you’ll find new versions of the scripts there.

Update (4-Nov-2013)

I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:


 INST TIME                 EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
    1 28.10.2013 00:00:00         840                 .611             .014           .595           .007           .000           .000                1.000          1085.583           128.724                .000
      30.10.2013 00:00:00        1466                 .491             .011           .479           .005           .000           .000                1.000           976.001            88.744                .000
      01.11.2013 00:00:00         542                 .798             .023           .760           .025           .000           .000                1.000           896.978           114.196                .000
      03.11.2013 00:00:00         544                 .750             .021           .719           .017           .000           .000                1.000          1098.213           134.941                .000

    2 28.10.2013 00:00:00        1638                 .498             .017           .474           .013           .000           .000                1.001           953.514            96.287                .000
      30.10.2013 00:00:00        1014                 .745             .022           .712           .019           .000           .000                1.000          1034.249           131.057                .000
      01.11.2013 00:00:00        1904                 .633             .011           .624           .002           .000           .000                1.000          1045.668           104.568                .000
      03.11.2013 00:00:00         810                 .602             .017           .581           .010           .000           .000                1.000           929.778           108.998                .000


8 rows selected.

↧

Meaning of “Disk Reads” Values in DBA_HIST_SQLSTAT

November 6, 2013, 7:11 am


This post relates to my previous writing on mining the AWR. I noticed that it’s very easy to misinterpret the DISK_READS_TOTAL and DISK_READS_DELTA columns in DBA_HIST_SQLSTAT. Let’s see what the documentation says:

DISK_READS_TOTAL – Cumulative number of disk reads for this child cursor
DISK_READS_DELTA – Delta number of disk reads for this child cursor

You might think it’s clear enough and that’s exactly what I thought too. The number of disk reads is the number of IO requests to the storage. But is it really true?
I started suspecting something was not right after using my own awr_sqlid_perf_trend.sql script (see more details on this script here). I noticed the DISK_READS_DELTA values were too close to BUFFER_GETS_DELTA values for queries that use full table scans, which are normally executed using multi-block IO requests to the storage. I was expecting disk reads to be at least two times lower than the buffer gets, but it was something closer to 90% in a few cases. So was I looking at the number of IO requests or the number of blocks read from disks? The best way to find it out was a test case.
The following testing was done in an 11.2.0.3 database:

I created a new AWR snapshot and enabled tracing for my session. I made sure the db_file_multiblock_read_count parameter was set to a high value and then executed a SQL that was forced to use a full table scan (FTS) to read the data from disks. Another AWR snapshot was taken after that.

SQL> alter session set tracefile_identifier='TEST1';
Session altered.

SQL> show parameter multiblock
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     26

SQL> alter system set db_file_multiblock_read_count=128;
System altered.

SQL> show parameter multiblock
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     128

SQL> exec dbms_workload_repository.create_snapshot();
PL/SQL procedure successfully completed.

SQL> alter session set max_dump_file_size=unlimited;
Session altered.

SQL> alter system set events '10046 trace name context forever, level 12';
System altered.

SQL> select /*+ full(a) */ count(I_DATA) from tpcc.item;
COUNT(I_DATA)
-------------
       100000

SQL> exec dbms_workload_repository.create_snapshot();
PL/SQL procedure successfully completed.

I found the sql_id in the trace file (it was 036c3dmx2u3x9) and executed the awr_sqlid_perf_trend.sql to find out how many disk reads were made (I removed a few columns that are not important here).
```
SQL> @awr_sqlid_perf_trend.sql 036c3dmx2u3x9 20 0.001

 INST TIME                BUFFER_GETS_1EXEC  DISK_READS_1EXEC DIRECT_WRITES_1EXEC  EXECUTIONS ROWS_PROCESSED_1EXEC
----- ------------------- ----------------- ----------------- ------------------- ----------- --------------------
    1 24.10.2013 03:53:06          1092.000          1073.000                .000           1                1.000
```
It was a single execution, and look at the numbers: 1073 disk reads and 1092 buffer gets. Could it be that DISK_READS_DELTA is actually the number of blocks read from disks? I needed to check the raw trace file to find out.

I found the following lines in the trace file. The lines that report waits on physical IO are the WAIT lines for 'db file sequential read' and 'direct path read'. Notice the first query (sqlid='96g93hntrzjtr') is a recursive SQL (dep=1) for the query I executed (sqlid='036c3dmx2u3x9'), and it was executed during the PARSE phase (“PARSE #7904600”) of my query. There were a few other recursive statements, but they didn’t do any disk IO (you’ll have to trust me here). It’s good to know that lines are written to the trace file after the corresponding event completes; this is why the recursive statements of the parse phase are reported before the line describing the whole parse operation.

PARSING IN CURSOR #25733316 len=210 dep=1 uid=0 oct=3 lid=0 tim=1382576068532471 hv=864012087 ad='3ecd4b88' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #25733316:c=0,e=240,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=1382576068532470
EXEC #25733316:c=0,e=404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=2239883476,tim=1382576068532919
WAIT #25733316: nam='db file sequential read' ela= 885 file#=1 block#=64857 blocks=1 obj#=427 tim=1382576068533833
WAIT #25733316: nam='db file sequential read' ela= 996 file#=1 block#=58629 blocks=1 obj#=425 tim=1382576068534935
FETCH #25733316:c=0,e=2092,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=1382576068535022
STAT #25733316 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=2079 us)'
STAT #25733316 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=989 us)'
CLOSE #25733316:c=0,e=58,dep=1,type=3,tim=1382576068535146
=====================
PARSING IN CURSOR #7904600 len=50 dep=0 uid=0 oct=3 lid=0 tim=1382576068535413 hv=4197257129 ad='3618a2a4' sqlid='036c3dmx2u3x9'
select /*+ full(a) */ count(I_DATA) from tpcc.item
END OF STMT
PARSE #7904600:c=8001,e=12985,p=2,cr=19,cu=0,mis=1,r=0,dep=0,og=1,plh=1537583476,tim=1382576068535411
EXEC #7904600:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1537583476,tim=1382576068535500
WAIT #7904600: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1382576068535530
WAIT #7904600: nam='db file sequential read' ela= 1960 file#=4 block#=113050 blocks=1 obj#=65019 tim=1382576068537566
WAIT #7904600: nam='direct path read' ela= 1203 file number=4 first dba=113051 block cnt=5 obj#=65019 tim=1382576068539309
WAIT #7904600: nam='direct path read' ela= 1531 file number=4 first dba=123392 block cnt=8 obj#=65019 tim=1382576068541567
WAIT #7904600: nam='direct path read' ela= 1047 file number=4 first dba=123401 block cnt=15 obj#=65019 tim=1382576068542719
WAIT #7904600: nam='direct path read' ela= 1081 file number=4 first dba=123417 block cnt=15 obj#=65019 tim=1382576068543895
WAIT #7904600: nam='direct path read' ela= 956 file number=4 first dba=123433 block cnt=15 obj#=65019 tim=1382576068544997
WAIT #7904600: nam='direct path read' ela= 950 file number=4 first dba=123449 block cnt=15 obj#=65019 tim=1382576068546096
WAIT #7904600: nam='direct path read' ela= 1168 file number=4 first dba=123465 block cnt=15 obj#=65019 tim=1382576068547425
WAIT #7904600: nam='direct path read' ela= 1151 file number=4 first dba=123481 block cnt=15 obj#=65019 tim=1382576068548784
WAIT #7904600: nam='direct path read' ela= 1279 file number=4 first dba=123497 block cnt=15 obj#=65019 tim=1382576068550229
WAIT #7904600: nam='direct path read' ela= 9481 file number=4 first dba=123522 block cnt=126 obj#=65019 tim=1382576068559912
WAIT #7904600: nam='direct path read' ela= 6872 file number=4 first dba=123650 block cnt=126 obj#=65019 tim=1382576068566997
WAIT #7904600: nam='direct path read' ela= 5562 file number=4 first dba=123778 block cnt=126 obj#=65019 tim=1382576068573516
WAIT #7904600: nam='direct path read' ela= 7524 file number=4 first dba=123906 block cnt=126 obj#=65019 tim=1382576068582195
WAIT #7904600: nam='direct path read' ela= 5858 file number=4 first dba=124034 block cnt=126 obj#=65019 tim=1382576068589263
WAIT #7904600: nam='direct path read' ela= 5326 file number=4 first dba=124162 block cnt=126 obj#=65019 tim=1382576068595750
WAIT #7904600: nam='direct path read' ela= 5788 file number=4 first dba=124290 block cnt=126 obj#=65019 tim=1382576068602627
WAIT #7904600: nam='direct path read' ela= 2446 file number=4 first dba=124418 block cnt=70 obj#=65019 tim=1382576068607337
FETCH #7904600:c=4000,e=73444,p=1071,cr=1073,cu=0,mis=0,r=1,dep=0,og=1,plh=1537583476,tim=1382576068608996
STAT #7904600 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1073 pr=1071 pw=0 time=73444 us)'
STAT #7904600 id=2 cnt=100000 pid=1 pos=1 obj=65019 op='TABLE ACCESS FULL ITEM (cr=1073 pr=1071 pw=0 time=49672 us cost=198 size=3900000 card=100000)'
WAIT #7904600: nam='SQL*Net message from client' ela= 148 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068609235
FETCH #7904600:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=1537583476,tim=1382576068609261
WAIT #7904600: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068609276
WAIT #7904600: nam='SQL*Net message from client' ela= 940 driver id=1650815232 #bytes=1 p3=0 obj#=65019 tim=1382576068610226
CLOSE #7904600:c=0,e=17,dep=0,type=0,tim=1382576087713173

The next task was to count the “blocks” for db file sequential reads and “block cnt” for direct path reads. The recursive SQL (96g93hntrzjtr) read 2 data blocks from disks and the main SQL (036c3dmx2u3x9) read 1071 data blocks from disks. The total number is 1073! Hey, this is exactly what DISK_READS_DELTA (DISK_READS_1EXEC in the script outputs above) reported – so it’s the number of data blocks, and not the number of IO requests!
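The counting does not have to be done by hand. As a minimal sketch, assuming the trace lines have the same layout as the excerpt above, the following function adds up “blocks” for db file sequential read waits and “block cnt” for direct path read waits. The function name sum_trace_blocks is hypothetical.

```shell
# Sum the blocks reported by physical read waits in a 10046 trace file.
# In the patterns, "." stands in for the single quote around the event name,
# and the trailing " ela=" keeps other events (e.g. direct path read temp) out.
sum_trace_blocks() {
    awk '
        /^WAIT / && /nam=.db file sequential read. ela=/ {
            if (match($0, /blocks=[0-9]+/))
                total += substr($0, RSTART + 7, RLENGTH - 7)
        }
        /^WAIT / && /nam=.direct path read. ela=/ {
            if (match($0, /block cnt=[0-9]+/))
                total += substr($0, RSTART + 10, RLENGTH - 10)
        }
        END { print total + 0 }
    ' "$1"
}
```

Run against a file holding only the trace excerpt above, it should report the same 1073 blocks (2 from the recursive SQL plus 1071 from the main query).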

The investigation resulted in two obvious conclusions:

DISK_READS_TOTAL and DISK_READS_DELTA in DBA_HIST_SQLSTAT report the number of blocks read from disks.
The query statistics in DBA_HIST_SQLSTAT also include the data from execution of the recursive statements.

P.S. Later I found another column, DBA_HIST_SQLSTAT.PHYSICAL_READ_REQUESTS_DELTA, that was introduced in 11.2 along with a large number of additional columns. PHYSICAL_READ_REQUESTS_DELTA and PHYSICAL_READ_REQUESTS_TOTAL represent the number of IO requests that were executed. You can compare the number of physical IO WAIT lines in the trace above (db file sequential read and direct path read) with the value I found in DBA_HIST_SQLSTAT below.

SQL> select DISK_READS_DELTA, PHYSICAL_READ_REQUESTS_DELTA from dba_hist_sqlstat where sql_id='036c3dmx2u3x9';

DISK_READS_DELTA PHYSICAL_READ_REQUESTS_DELTA
---------------- ----------------------------
            1073                           20
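The request count can be cross-checked against the trace in the same way: each physical read WAIT line is one IO request. This is a sketch under the assumption that the trace file has the layout shown earlier; count_trace_io_requests is a hypothetical name.

```shell
# Count physical read requests in a 10046 trace file: one WAIT line per request.
# "." stands in for the single quote around the event name.
count_trace_io_requests() {
    awk '
        /^WAIT / && /nam=.(db file sequential read|direct path read). ela=/ { n++ }
        END { print n + 0 }
    ' "$1"
}
```

The trace excerpt in this post contains 20 such lines (3 sequential reads and 17 direct path reads), which matches PHYSICAL_READ_REQUESTS_DELTA.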

↧

getMOSPatch.sh – Downloading Patches From My Oracle Support

November 11, 2013, 7:31 am


How do you download patches from My Oracle Support (MOS) directly to the server? This has bothered me since FTP access was closed a few years ago. Of course, Oracle has given me some options: I could access MOS from the server using a browser (probably from a VNC desktop, thank you very much), or I could look up the patches on my workstation, download the WGET script from MOS, upload it to the server, adjust it with the username and password of my MOS account, and then start the downloads. Not too convenient, is it?
Then, back in 2009 my teammate John published a blog post on Retrieving Oracle patches with wget. This eliminated the need to upload the wget script from MOS to the server and I only had to get the URLs of the patches and pass them to a shell function. While this was so much easier, I still needed to open the browser to find those URLs.
I think it’s time to get rid of browser dependency. So I’ve written a shell script getMOSPatch.sh that can be used to download patches directly to the server using only the patch number.
I’ve tested the tool on Linux, and there is a good chance it won’t work on some other platforms as it utilizes tools like awk, sed, grep, egrep and wget with options that probably only work on Linux, but if there’s much interest in this tool and I get many comments on this blog post I promise to change that :)
You can use wget to download the script to the server directly:

[oracle@mel1 Patches]$ wget --no-check-certificate -nv https://raw.github.com/MarisElsins/TOOLS/master/Shell/getMOSPatch.sh
WARNING: cannot verify raw.github.com's certificate, issued by `/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance CA-3':
  Unable to locally verify the issuer's authority.
2013-11-10 17:42:17 URL:https://raw.github.com/MarisElsins/TOOLS/master/Shell/getMOSPatch.sh [4021/4021] -> "getMOSPatch.sh" [1]
[oracle@mel1 Patches]$ chmod u+x getMOSPatch.sh

The first time you run the script (or when you run it with the parameter reset=yes), it lets you choose which platforms and languages the patches need to be downloaded for, and the choices are saved in a configuration file. The available platforms and languages are fetched from MOS.

[oracle@mel1 Patches]$ ./getMOSPatch.sh reset=yes
Oracle Support Userid: elsins@pythian.com
Oracle Support Password:

Getting the Platform/Language list
Available Platforms and Languages:
527P - Acme Packet OS
293P - Apple Mac OS X (Intel) (32-bit)
522P - Apple Mac OS X (Intel) (64-bit)
...
226P - Linux x86-64
912P - Microsoft Windows (32-bit)
...
7L - Finnish (SF)
2L - French (F)
4L - German (D)
104L - Greek (EL)
107L - Hebrew (IW)
...
39L - Ukrainian (UK)
43L - Vietnamese (VN)
999L - Worldwide Spanish (ESW)
Comma-delimited list of required platform and language codes: 226P,4L
[oracle@mel1 Patches]$

After this you simply have to run the script with parameter patch=patchnr1,patchnr2,… and the listed patches will be downloaded. This is how it happens:

the script looks up each of the patches for each platform and language and:
- if one patch is found – it is automatically downloaded
- if multiple patches are found (this can happen if the same patch is available for multiple releases) – the tool will ask you to choose which patches to download.
you can also specify parameter download=all to download all found patches without being asked to choose ones from the list.
you can also specify the parameter regexp to apply filters to the filenames of the looked-up patches. This is especially useful for Apps DBAs, as the filter regexp=".*A_R12.*" would be helpful for e-Business Suite Release 12.0 and regexp=".*B_R12.*" for R12.1.
if you set environment variables mosUser and mosPass before running the script you won’t be asked to enter the user credentials.

Take a look at the following examples:

Downloading the latest CPU patch (patch 16902043, OCT2013) for 11gR2:

[oracle@mel1 Patches]$ ./getMOSPatch.sh patch=16902043
Oracle Support Userid: elsins@pythian.com
Oracle Support Password:

Getting patch 16902043 for "Linux x86-64"
p16902043_112030_Linux-x86-64.zip completed with status: 0

Getting patch 16902043 for "German (D)"
no patch available

Downloading the latest patch for OPatch (there are multiple patches available on the same platform):

[oracle@mel1 Patches]$ export mosUser=elsins@pythian.com
[oracle@mel1 Patches]$ ./getMOSPatch.sh patch=6880880
Oracle Support Password:

Getting patch 6880880 for "Linux x86-64"
1 - p6880880_112000_Linux-x86-64.zip
2 - p6880880_111000_Linux-x86-64.zip
3 - p6880880_121010_Linux-x86-64.zip
4 - p6880880_131000_Generic.zip
5 - p6880880_101000_Linux-x86-64.zip
6 - p6880880_102000_Linux-x86-64.zip
Comma-delimited list of files to download: 3
p6880880_121010_Linux-x86-64.zip completed with status: 0

Getting patch 6880880 for "German (D)"
no patch available
[oracle@mel1 Patches]$

Downloading multiple patches at the same time without prompting the user to specify which files to download if multiple files are found. (Don’t be confused that files with “LINUX” and not “Linux-x86-64” in the filename are downloaded here. These are e-Business Suite patches, and both the 32-bit and 64-bit platforms have the same patch.):

[oracle@mel1 Patches]$ ./getMOSPatch.sh patch=10020251,10141333 download=all
Oracle Support Userid: elsins@pythian.com
Oracle Support Password:

Getting patch 10020251 for "Linux x86-64"
p10020251_R12.AR.B_R12_LINUX.zip completed with status: 0
p10020251_R12.AR.A_R12_LINUX.zip completed with status: 0

Getting patch 10020251 for "German (D)"
p10020251_R12.AR.A_R12_d.zip completed with status: 0

Getting patch 10141333 for "Linux x86-64"
p10141333_R12.AR.A_R12_LINUX.zip completed with status: 0
p10141333_R12.AR.B_R12_LINUX.zip completed with status: 0

Getting patch 10141333 for "German (D)"
p10141333_R12.AR.B_R12_d.zip completed with status: 0
p10141333_R12.AR.A_R12_d.zip completed with status: 0

Downloading the same patches as in the previous example with an additional filter for e-Business Suite 12.1 patches only:

[oracle@mel1 Patches]$ ./getMOSPatch.sh regexp=".*B_R12.*" patch=10020251,10141333 download=all
Oracle Support Userid: elsins@pythian.com
Oracle Support Password:

Getting patch 10020251 for "Linux x86-64"
p10020251_R12.AR.B_R12_LINUX.zip completed with status: 0

Getting patch 10020251 for "German (D)"
no patch available

Getting patch 10141333 for "Linux x86-64"
p10141333_R12.AR.B_R12_LINUX.zip completed with status: 0

Getting patch 10141333 for "German (D)"
p10141333_R12.AR.B_R12_d.zip completed with status: 0
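
To preview what a filter will keep before starting a download, the pattern can be tried locally against a list of file names. This sketch only mimics the filtering with grep -E; how getMOSPatch.sh applies the regexp internally is an assumption here, and the function name filter_patch_files is hypothetical.

```shell
# Keep only the file names (read from stdin) that match an extended regexp.
filter_patch_files() {
    grep -E -- "$1"
}

# File names taken from the download=all example above.
printf '%s\n' \
    p10020251_R12.AR.B_R12_LINUX.zip \
    p10020251_R12.AR.A_R12_LINUX.zip \
    p10020251_R12.AR.A_R12_d.zip |
    filter_patch_files '.*B_R12.*'
```

Only the B_R12 file name passes the filter, which is consistent with the “no patch available” result for German in the example above, where only an A_R12 language patch existed for 10020251.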

↧

How to Download Oracle Software Using WGET or CURL

November 11, 2013, 7:42 am


This is yet another blog post with tips and tricks to help you (and me) download Oracle software without the help of a Web browser. If you Google “how to download Oracle with wget” you’ll find heaps of posts with useful tips.

I decided to write this post to compile the methods I’m aware of in a single page for future easy reference.

If you have anything to complement this information, please let me know and I’ll update the post with it!

Please note that the methods described below may work for some of the Oracle sites but not others. For each method, I list the sites that are known to work with it. I’ve tested the methods with the following sites:

Oracle Technology Network (OTN)
Oracle eDelivery
My Oracle Support (MOS)

Method 1: Use the download URL

Works with: OTN, eDelivery and MOS

I came across this method recently while googling for a blog post like this one. Google pointed me in the direction of a video by David Ghedini demonstrating a very simple way to download files from Oracle.

This method is very easy to use and the simplest way to download a single file since it doesn’t require exporting and copying cookies to the server. For multiple files, though, method 2 below may be a better option.

Please watch David Ghedini’s video for details on using this method.

It consists of initiating the download using your computer’s browser, pausing the download and copying the download URL, which contains the authentication token in it.
With this URL you can download the file from the remote server using one of the following commands:

wget "download_url" -O file_name

OR

curl "download_url" -o file_name

Method 2: Export cookies

Works with: OTN, eDelivery and MOS

This method requires exporting the cookies from your browser to a file and copying that file to the remote server, so that we can use it for the download. The cookies file contains your session’s authentication token as well as the confirmation of the EULA acceptance.

This is a handy method when you have to download multiple files at once.

To use this method it’s necessary to have a tool to export the cookies from the Web browser to a text file. If you don’t already have one, I’d suggest one of the browser extensions below:

For Firefox: Export Cookies
For Chrome: cookies.txt export

After installing the extension(s) above on the browser of your choice, follow the steps below:

Initiate the download of the file you want (if downloading multiple files, you just need to do this for the first one)
Once the download is initiated, cancel it.
Export the cookies to a file (call it cookies.txt)
If you’re using one of the extensions suggested above, this is how you do it:
- On Firefox: click on Tools -> “Export cookies…” and save the file
- On Chrome: click on the “cookies.txt export” icon in the toolbar (the icon is a blue “C” with an arrow inside), select the entire contents of the cookies and paste it into a text file.
Copy the cookies.txt file to your remote server.
Download the files you want with one of the following commands:

wget --load-cookies=./cookies.txt --no-check-certificate "file_url" -O file_name

OR

curl --location --cookie ./cookies.txt --insecure "file_url" -o file_name

Multiple files can be downloaded using the same cookies.txt file. The cookies are valid for 30 minutes and the download must be initiated during that period. After that you’ll have to repeat the process to re-export the cookies.
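
For several files, the wget command above can be wrapped in a small loop. This is a sketch, not a tested recipe for every Oracle site: it assumes the file name can be taken from the last path component of each URL (with any query string stripped), and the function name download_with_cookies and the WGET override variable are hypothetical.

```shell
# Download every URL given after the cookies file, reusing the same cookies.
# Set WGET to another command (e.g. echo) for a dry run.
download_with_cookies() {
    cookies=$1
    shift
    for url in "$@"; do
        # assumption: the target file name is the last path component of the URL
        file=$(basename "${url%%\?*}")
        "${WGET:-wget}" --load-cookies="$cookies" --no-check-certificate "$url" -O "$file" || return 1
    done
}
```

Usage would look like download_with_cookies ./cookies.txt "file_url_1" "file_url_2", started within the validity period of the cookies.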

Method 3: Basic Authentication

Works with: MOS

My Oracle Support (MOS) accepts Basic Authentication when downloading files. If you have a valid MOS account you can download files simply by executing the following:

wget --http-user=araujo@pythian.com --ask-password "file_url" -O file_name

OR

curl --user araujo@pythian.com --cookie-jar cookie-jar.txt --location-trusted "file_url" -o file_name

Last but not least: Verify your downloads!

Regardless of the method you use, it’s good practice to verify the digest of the downloaded files to ensure they are indeed the original files and haven’t been tampered with.

The Oracle download sites always provide digests for the available files in the form of checksums, SHA-1 or MD5 hashes. To verify that the downloaded files are ok, simply execute the corresponding command, as shown in the examples below, and compare the output string with the value shown on the download site:

For checksum:

[araujo@client test]$ cksum p17027533_121010_Linux-x86-64.zip
4109851411 3710976 p17027533_121010_Linux-x86-64.zip

For MD5:

[araujo@client test]$ md5sum p17027533_121010_Linux-x86-64.zip
48a4a957e2d401b324eb89b3f613b8bb  p17027533_121010_Linux-x86-64.zip

For SHA-1:

[araujo@client test]$ sha1sum p17027533_121010_Linux-x86-64.zip
43a70298c09dcd9d59f00498e6659063535fee52  p17027533_121010_Linux-x86-64.zip
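
Comparing long hex strings by eye is error-prone, so the comparison can be scripted. A minimal sketch, assuming sha1sum from GNU coreutils is available and that the expected digest was copied by hand from the download site; verify_sha1 is a hypothetical helper name.

```shell
# Return success only if the file's SHA-1 digest equals the expected value.
verify_sha1() {
    # $1 = file, $2 = expected SHA-1 hex digest (upper or lower case)
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha1sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}
```

For example: verify_sha1 p17027533_121010_Linux-x86-64.zip 43a70298c09dcd9d59f00498e6659063535fee52 && echo OK. The same pattern works for MD5 by swapping in md5sum.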

↧

Oracle RAC on the Cloud, Part 1

November 12, 2013, 7:40 am


I’ve been working on moving a lot of the testing and R&D work I do away from local virtual machines and onto cloud environments, for a few reasons:

I can avoid carrying around a laptop all the time, and rather log onto the cloud wherever I happen to be
It’s easy to scale down and scale up capacity as required
You pay for what you use
Bandwidth is fast and cheap

One gap so far has been anything involving Oracle RAC. A number of attempts have been made to make it run under Amazon Web Services, notably by Jeremy Schneider and Jeremiah Wilton. My experimentation with Amazon cloud environments has hit two major roadblocks. The first is the lack of shared storage: the block-level (EBS) storage product can only be mounted by one machine at a time, while RAC is built on the concept of shared disk. The second issue is networking: Oracle RAC expects to manage its own IP addresses, creating a gaggle of VIPs and SCAN IPs, but Amazon and similar cloud providers require all IP addresses to be managed through their own API.

Enter Gandi, stage left

So it was with interest that I read about hosting provider Gandi’s new private VLAN service. They claim to offer layer-2 network services, much like the vSwitches in VMWare, network bridges in Oracle VM, or plain old physical layer-2 switches. By looking like a real switch, they would allow Oracle’s grid infrastructure to manage IP addresses like it expects to.

The issue of shared storage remains; I have yet to find any public cloud provider that offers true shared block storage. For a testing environment, though, we can simulate shared storage by setting up a NFS server that shares its own local disk with the RAC nodes. Highly available it is not, but at least it should let us set up and run the grid infrastructure.

Gandi is a provider I’ve used for domain name registration in the past, but I guess they’ve started IaaS hosting much along the lines of Amazon, including flexible per-hour charges. And unlike Amazon, they offer very flexible resizing of servers: RAM, disk space, networks can be changed dynamically, often without even a reboot.

To try this out, the first step is naturally to sign up with the service. Gandi actually offers a “free trial”: 30,000 free credits to try out the service. (Sounds like a lot, but a credit is actually worth a fraction of a cent.) To take advantage, go to the trial page and create an account, and you’ll be asked to describe how you’re planning to use the service. In my case, I was approved within an hour, and actually got 60,000 credits (2 servers?).

The setup

So here’s the setup I’m thinking of:

Name	Description	RAM	Data disk size	Data disk name
server01	NFS and gateway	256MB	40GB	datadisk01
rac01	RAC node 1	1GB	20GB	rac01data
rac02	RAC node 2	1GB	20GB	rac02data

A few notes about the config: server01 will act as shared storage as well as the only access point to the Internet, so all inbound access will be via server01. I’ll use either SSH or SSH port forwarding for access, though a more permanent solution would probably involve a VPN like OpenVPN. (I can put together a VPN walkthrough if there’s enough interest, though there are likely already good ones online). server01 will also host internal DNS and DHCP services for our little network, saving some tedious /etc/hosts configuration. Network-wise, in addition to the built-in globally-routable IP addresses, I’ll be adding two private VLAN networks: the RAC public and RAC private networks. Ideally we could remove the internet-routable IP addresses from rac01 and rac02 completely. And although the management interface permits this, the Gandi boot-up scripts didn’t like the config at all and resulted in a non-bootable VM, so the globally routable IPs will stay for now.

You may also notice that the RAC servers only have 1GB of RAM when RAC officially requires 4GB. This is purely a cost saving during the initial install, taking advantage of the capability to change sizes later. Each server has two mountpoints: a default 3GB system “/” partition, plus a data disk to store shared data (for server01) and Oracle binaries (for the rest).

On the VLANs, I’ll use 10.100.0.x addresses in RAC-public, and 10.100.1.x addresses in RAC-private.

Firing up the VM

Creating the servers from the GUI, as per the table above.

On the server side, I’m using CentOS 6.4 64-bit as the operating system, the default system disk, and a data disk for the ORACLE_HOME and eventual data. Oracle Linux would be preferable as it’s a certified OS with Oracle 12c, but Gandi does not supply an install image. CentOS is very similar anyway, except for a few critical differences we’ll get to later. Selecting SSH key security, and pasting in an SSH key I already have. At first I got the error message “This is not a valid public SSH key” before realizing that the key must be in OpenSSH rather than ssh.com format. Fortunately ssh-keygen can do the key format conversion:

ssh-keygen -i -f id_rsa.pem
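
If you’re not sure which format a public key file is in, the first line gives it away. A minimal sketch, assuming the ssh.com-style file uses the standard “---- BEGIN SSH2 PUBLIC KEY ----” header; key_format is a hypothetical helper name, not part of OpenSSH:

```shell
# key_format FILE
# Hypothetical helper: prints "rfc4716" for ssh.com-style public keys,
# "openssh" for one-line OpenSSH public keys, and "unknown" otherwise.
key_format() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        "---- BEGIN SSH2 PUBLIC KEY ----"*) echo rfc4716 ;;
        ssh-rsa\ *|ssh-dss\ *|ssh-ed25519\ *|ecdsa-sha2-*) echo openssh ;;
        *) echo unknown ;;
    esac
}
```

A file reported as rfc4716 is the kind that needs the ssh-keygen -i conversion shown above before pasting it into the Gandi form.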

The first server created fine, but the VLAN creation errored out with an internal error. Creating a support ticket, which, I heard some hours later, is being sent to the development team for investigation. Good thing this isn’t anything critical! Trying to create a second server, I got an error message that I have run out of “disk quota”. And re-reading the e-mail about the free credits, it looks like they did put a restriction on disk space. So one way or another, you do need to give them some money. I bought their entry-level package of 150k credits for $16.58. After the order went through, not only the server creation worked, but VLANs work too.

Using the “interfaces” tab to create the new RAC-Public and RAC-Private VLANs, and attaching each to all three servers.

Configuring the NFS server

Logging onto the servers via SSH as the root user using the SSH key added during the install, using the public IP listed in the Gandi console. And doing some basic OS-level config.

Setting up a simple static network config on server01 for the local net:

cd /etc/sysconfig/network-scripts
cat > ifcfg-eth1 <<-EOF
DEVICE=eth1
IPADDR=10.100.0.1
NETMASK=255.255.255.0
ONBOOT=yes
NAME=rac-public
EOF
cat > ifcfg-eth2 <<-EOF
DEVICE=eth2
IPADDR=10.100.1.1
NETMASK=255.255.255.0
ONBOOT=yes
NAME=rac-private
EOF
service network restart

Since this server is accessible over the Internet, we need a basic firewall:

iptables -F INPUT
# Allow existing connections
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# Loopback traffic
iptables -A INPUT -i lo -j ACCEPT
# RAC public network
iptables -A INPUT -i eth1 -j ACCEPT
# RAC private network
iptables -A INPUT -i eth2 -j ACCEPT
# SSH incoming
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Log and reject everything else
iptables -A INPUT -m limit --limit 2/minute -j LOG --log-prefix "iptables-input: " --log-level info
iptables -A INPUT -j REJECT
service iptables save

If you want to be even more secure, you could restrict the source IPs for SSH access. If you always come from static IP 1.2.3.4, you could replace the SSH incoming line with:

iptables -A INPUT -s 1.2.3.4 -p tcp --dport 22 -j ACCEPT

Getting ready for server01 to be a NFS server. Gandi’s scripts automatically mount the data disk at /srv/datadisk01 (NFS parameters taken from the great guide on oracle-base.com)

mkdir /srv/datadisk01/oradata /srv/datadisk01/dl
cat > /etc/exports <<-EOF
/srv/datadisk01/oradata 10.100.0.0/24(rw,sync,no_wdelay,insecure_locks,no_root_squash)
/srv/datadisk01/dl 10.100.0.0/24(rw,no_root_squash)
EOF

yum -y install nfs-utils
service rpcbind start
service nfs start
chkconfig rpcbind on
chkconfig nfs on

DNS and DHCP

We’ll now set up the DHCP and DNS using the wonderfully easy-to-configure dnsmasq. You may not have heard of dnsmasq before, but if you have a home wireless router, you’re likely using it already for DNS forwarding. It also has the capability to do authoritative DNS, which we’ll use here.

First, we need to create the /etc/hosts file that will both do local hostname resolution and be the source for dnsmasq’s entries:

cat >> /etc/hosts <<-EOF
# RAC public
10.100.0.1      server01-pub
10.100.0.2      rac01-pub
10.100.0.3      rac02-pub
10.100.0.12     rac01-pub-vip
10.100.0.13     rac02-pub-vip
# SCAN IPs
10.100.0.100    rac-cluster
10.100.0.101    rac-cluster
10.100.0.102    rac-cluster
# RAC private
10.100.1.1      server01-priv
10.100.1.2      rac01-priv
10.100.1.3      rac02-priv
EOF
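
Since the SCAN name is the one entry that deliberately maps to several addresses, it’s worth a quick sanity check after editing. A minimal sketch, assuming a standard hosts-format file; count_addresses is a hypothetical helper name:

```shell
# count_addresses HOSTS_FILE NAME
# Hypothetical helper: prints how many distinct addresses a hosts-format
# file assigns to NAME, ignoring comments. Names must match exactly, so
# rac01-pub does not count rac01-pub-vip.
count_addresses() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                     # strip comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) seen[$1] = 1   # remember each address once
        }
        END { n = 0; for (ip in seen) n++; print n }
    ' "$1"
}

# Example invocation (not run here):
#   count_addresses /etc/hosts rac-cluster
```

With the entries above, rac-cluster should come back with three addresses and every other name with one.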

And now we can install dnsmasq itself. It requires a small change to the default configuration to get DHCP running: creating a dummy dynamic DHCP range, and configuring it to assign IPs for rac01/02 public and private networks.

yum -y install dnsmasq
cat >> /etc/dnsmasq.conf <<-EOF
# Dummy DHCP range to enable the DHCP server
dhcp-range=10.99.99.99,10.99.99.102,12h
# Static DHCP entries for the RAC servers; addresses come from /etc/hosts
dhcp-host=rac01-priv
dhcp-host=rac01-pub
dhcp-host=rac02-priv
dhcp-host=rac02-pub
EOF
chkconfig dnsmasq on
service dnsmasq start
netstat -anp | grep dnsmasq
# Make sure it's running: you should see lines for 0.0.0.0:53 and 0.0.0.0:67 here

Downloading Oracle 12c

And we might as well kick off an Oracle software download to run while the rest of the config is done. And why not use the latest and greatest, Oracle 12c? It’s possible to simply download the archives to your local machine and transfer to the cloud server using scp, but those are big files and my local Internet isn’t _that_ fast, so it’s definitely preferable to download from the server directly. The recent blog post by Andre Araujo pointed me to an easy way to do the download on YouTube, at least for Firefox users. Paraphrasing the video, you need to go to the download page, start the download locally, open the FF download manager, right-click “Copy Download Link”, and feed that literal link to wget. No messing with cookies required. The local download can then be cancelled.

The wget command will end up looking something like this:

cd /srv/datadisk01
mkdir -p dl oradata
cd dl
wget "http://download.oracle.com/otn/linux/oracle12c/121010/linuxamd64_12c_database_1of2.zip?AuthParam=(from Firefox copy)" &
wget "http://download.oracle.com/otn/linux/oracle12c/121010/linuxamd64_12c_database_2of2.zip?AuthParam=(from Firefox copy)" &
wget "http://download.oracle.com/otn/linux/oracle12c/121010/linuxamd64_12c_grid_1of2.zip?AuthParam=(from Firefox copy)" &
wget "http://download.oracle.com/otn/linux/oracle12c/121010/linuxamd64_12c_grid_2of2.zip?AuthParam=(from Firefox copy)" &

We’ll need parts 1 and 2 of the database download, and parts 1 and 2 of the grid infrastructure download, using the Linux x86-64 platform.

Even on a reasonably fast network like Gandi’s, it still took over an hour to run. So while downloading, we can move onto part 2 (coming soon), where we configure the RAC hosts themselves.

Lessons learned

Don’t count on the “free trial” at Gandi to actually get usable infrastructure, but $18 won’t break the bank either for this type of infrastructure.
Once you pony up the money, though, Gandi’s VLAN service does do what it advertises
Even with a small 2-node cluster, DHCP and DNS make configuration easier and less error-prone
At one point while setting up networking, I managed to make the network unreachable. And while Gandi provides an emergency console tool, I wasn’t able to get it to work: it showed console messages all right, but no login prompt. So be very careful about any network or bootup configs that could potentially lock you out.

↧

Oracle RAC on the Cloud, Part 2

November 13, 2013, 7:16 am

In part 1 of this series, we talked about some of the challenges of setting up Oracle RAC on a public cloud provider, ordered some VMs from provider Gandi, and finally configured an NFS server for shared storage. In this post, we move on to configuring the RAC servers themselves, rac01 and rac02.

Network config

After starting up the RAC nodes in the GUI, we can log in via the SSH key we created. The first order of business is to set up networking. The eth1 and eth2 VLAN interfaces by default have no network configuration at all. We’ll configure them to use DHCP and to send a hostname to the DHCP server so it knows which IP address to give us. This ties into the dnsmasq configuration we set up in part 1, which automatically assigns IP addresses to the eth1 and eth2 private-VLAN network interfaces.

cd /etc/sysconfig/network-scripts
cat > ifcfg-eth1 <<EOF
DEVICE=eth1
BOOTPROTO=dhcp
DHCP_HOSTNAME=rac01-pub
ONBOOT=yes
NAME=rac-public
EOF
cat > ifcfg-eth2 <<EOF
DEVICE=eth2
BOOTPROTO=dhcp
DHCP_HOSTNAME=rac01-priv
ONBOOT=yes
NAME=rac-private
EOF
service network restart
ip address list

And if all is working properly, we should see eth1 and eth2 each assigned an IP address.

Since this server is accessible over the Internet, we need a basic firewall, just like server01:

iptables -F INPUT
# Allow existing connections
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# Loopback traffic
iptables -A INPUT -i lo -j ACCEPT
# RAC public network
iptables -A INPUT -i eth1 -j ACCEPT
# RAC private network
iptables -A INPUT -i eth2 -j ACCEPT
# SSH incoming
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Log and reject everything else
iptables -A INPUT -m limit --limit 2/minute -j LOG --log-prefix "iptables-input: " --log-level info
iptables -A INPUT -j REJECT
service iptables save

Fun with hostnames

For DNS resolution, we simply point ourselves to the DNS server on server01, 10.100.0.1:

cat > /etc/resolv.conf <<-EOF
nameserver 10.100.0.1
EOF

One issue I ran into with hostnames: the grid infrastructure install expects its public network to be associated with the hostname of the machine. But in the Gandi setup, the hostname is associated with the Internet-facing IP. Modifying /etc/sysconfig/network to change the hostname to rac01-pub:

perl -pi -e 's/HOSTNAME=rac01\n/HOSTNAME=rac01-pub\n/' /etc/sysconfig/network
hostname rac01-pub

But even after rebooting the hostname is still rac01. More digging showed it to be part of Gandi’s auto-configuration scripts. But conveniently they provide a file, /etc/sysconfig/gandi, where specific configurations can be turned off. There are a few that won’t play well with RAC: the hostname as mentioned above, but also the mountpoint name: the grid infrastructure expects the same mountpoint names for both RAC nodes, so the default /srv/rac01data mountpoints won’t work. And lastly, we have our DNS server, so we don’t want the Gandi configuration to mangle our resolv.conf.

The relevant section of the /etc/sysconfig/gandi looks like this:

# set to 0 to avoid hostname automatic reconfigure
CONFIG_HOSTNAME=1

# set to 0 to avoid nameserver automatic reconfigure
CONFIG_NAMESERVER=1

# allow mounting the data disk to the mount point using the disk label
CONFIG_ALLOW_MOUNT=1

Changing:

perl -pi.orig -e 's/CONFIG_HOSTNAME=1/CONFIG_HOSTNAME=0/; s/CONFIG_NAMESERVER=1/CONFIG_NAMESERVER=0/; s/CONFIG_ALLOW_MOUNT=1/CONFIG_ALLOW_MOUNT=0/' /etc/sysconfig/gandi

Now our config won’t be clobbered on next reboot.
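
A substitution like this only works if the current value is the one you expect. A minimal sketch of an alternative that sets a flag whatever its current value, assuming the file uses plain KEY=VALUE lines as in the excerpt above; set_flag is a hypothetical helper name, not something Gandi ships:

```shell
# set_flag FILE KEY VALUE
# Hypothetical helper: rewrites "KEY=<anything>" to "KEY=VALUE" in place.
# A one-time backup is kept in FILE.orig; comment lines are left alone
# because the pattern is anchored at the start of the line.
set_flag() {
    [ -e "$1.orig" ] || cp "$1" "$1.orig"
    sed "s/^$2=.*/$2=$3/" "$1" > "$1.tmp" &&
        cat "$1.tmp" > "$1" &&
        rm -f "$1.tmp"
}

# Example invocation (not run here):
#   for key in CONFIG_HOSTNAME CONFIG_NAMESERVER CONFIG_ALLOW_MOUNT; do
#       set_flag /etc/sysconfig/gandi "$key" 0
#   done
```

Writing through cat rather than mv keeps the original file’s ownership and permissions.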

And speaking of the mountpoint, we need to undo what Gandi did and create our own mountpoint on /u01:

umount /srv/rac01data
mkdir /u01
cat >> /etc/fstab <<-EOF
LABEL=rac01data /u01    ext3    rw,nosuid,nodev,noatime 0 0
EOF
mount /u01

A bit about the /etc/fstab line: Gandi created our filesystem with a filesystem label, so we can use this to locate the disk even if the actual device node /dev/xvdb changes. The mount options are taken from Gandi’s defaults. Notably, “noatime” avoids doing a disk write every time a file is accessed.

Filesystems and NFS

NFS client config:

mkdir -p /srv/datadisk01/oradata /srv/datadisk01/dl
cat >> /etc/fstab <<-EOF
10.100.0.1:/srv/datadisk01/oradata /srv/datadisk01/oradata nfs rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 0 0
10.100.0.1:/srv/datadisk01/dl /srv/datadisk01/dl nfs rw,bg,tcp 0 0
EOF

yum -y install nfs-utils
service rpcbind start
chkconfig rpcbind on

mount /srv/datadisk01/dl
mount /srv/datadisk01/oradata

And although Gandi gives us a minimal amount of swap, we’ll need 4GB to make the Oracle installer happy. Since the system disk is only 3GB it has to come from the data disk.

dd if=/dev/zero of=/u01/swapfile bs=1048576 count=4096
mkswap /u01/swapfile
swapon /u01/swapfile
# Make it persistent
echo "/u01/swapfile swap swap defaults 0 0" >> /etc/fstab

And Oracle checks for /dev/shm, just in case you want to use memory_target. Adding a config here to make it owned by oracle to avoid ugly error messages from dbca later on. If you get an “ORA-00600: internal error code, arguments: [SKGMHASH]…” error message, it may very well be a permission issue on /dev/shm. The “54321” userid is the numeric ID of the oracle user that the preinstall RPM will create later on.

echo "tmpfs /dev/shm tmpfs uid=54321,gid=54321,mode=755 0 0" >> /etc/fstab
mount /dev/shm

Oracle prerequisites

Now that storage is set up, we can start looking at Oracle stuff.

Going from the official Oracle Grid Infrastructure install guide

Browsing public-yum.oracle.com’s Oracle Linux 6 “latest” repository to find the latest database preinstall RPM. As of this writing, it’s 1.0-8.

cd
wget http://public-yum.oracle.com/repo/OracleLinux/OL6/latest/x86_64/getPackage/oracle-rdbms-server-12cR1-preinstall-1.0-8.el6.x86_64.rpm
yum -y localinstall oracle-rdbms-server-12cR1-preinstall-1.0-8.el6.x86_64.rpm

It complains about a UEK dependency. On this kind of cloud environment we can’t install custom kernels like UEK, so we do need to stay with default kernels. And we’re running CentOS anyway.

flashdbba has an excellent blog post on this subject; he listed a two-line workaround of downloading the full UEK kernel package, and installing it with the --justdb and --nodeps options. I can confirm that it works, but it requires a big UEK package download, and results in warnings about missing dependencies every time yum is run.

So instead, we’re going to create a dummy RPM package to satisfy the dependency. It won’t have any files or scripts, but will match the name that the preinstall RPM is looking for.

To actually turn this specification into an RPM, we need the rpm-build package installed as well:

yum -y install rpm-build
mkdir -p ~/rpmbuild/SPECS
cat > ~/rpmbuild/SPECS/dummy-kernel-uek.spec <<-EOF
Summary: Dummy UEK kernel RPM
Name: kernel-uek
Version: 2.6.39
License: GPLv2
Release: 1.dummy

%description
Dummy RPM to satisfy oracle preinstall RPM dependencies

%build
echo OK

%files
EOF
rpmbuild -bb ~/rpmbuild/SPECS/dummy-kernel-uek.spec

And if all went well, we should see a kernel-uek package in ~/rpmbuild/RPMS/x86_64

cd ~/rpmbuild/RPMS/x86_64/
yum -y localinstall kernel-uek*.rpm

Back at the original dependency package:

cd
yum -y localinstall oracle-rdbms-server-12cR1-preinstall-*

50m of dependencies. Downloading them all. Setting up the NFS server was so much less work :-)

And we might as well use sudo from the Oracle account rather than using root:

yum -y install sudo
visudo

And uncommenting the line

%wheel  ALL=(ALL)       NOPASSWD: ALL

Adding oracle to the “wheel” group so it can run sudo:

usermod -a -G wheel oracle

Passwordless SSH

Now that we have an Oracle user, the installer will require passwordless SSH to be set up. For now, it’s just from the Oracle user to itself:

su - oracle
mkdir -p .ssh
cd .ssh
ssh-keygen -t rsa -b 2048 -N "" -f id_rsa
cat id_rsa.pub >> authorized_keys
chmod 0600 authorized_keys

Testing out the SSH, which lets us add the host key.

ssh rac01-pub
exit

And setting a password for Oracle, which requires the passwd tool itself:

sudo yum -y install passwd
sudo passwd oracle

Creating directories for the Oracle install:

sudo mkdir -p /u01/app
sudo chown oracle:oinstall /u01/app
sudo chown oracle:oinstall /srv/datadisk01/oradata

And now it’s time to repeat all the steps from this post, this time on rac02, replacing rac01 with rac02 where appropriate.

After that, it’s time for part 3

Lessons learned

If you’re using Gandi hosting for anything out of the ordinary, familiarize yourself with /etc/sysconfig/gandi
Dummy RPMs work very well to circumvent Oracle’s OS compatibility checks

↧

Oracle RAC on the Cloud, Part 3

November 13, 2013, 4:47 pm

This is part 3 of a multipart series of getting Oracle RAC running on a cloud environment. In part 1, we set up a NFS server for shared storage. In part 2, we set up OS components for each RAC server. Now we finish up the OS configuration and move to the Oracle grid infrastructure.

Passwordless SSH, take two

Now that we have Oracle users on both rac01 and rac02, we need to configure passwordless SSH between them. (It’s also possible to do it from the installer, but I prefer to do it myself)

On rac01-pub as Oracle:

cd ~/.ssh
scp rac02-pub:$PWD/id_rsa.pub rac02.pub
(enter oracle user password, and confirm the hostkey addition)
cat rac02.pub >> authorized_keys

And on rac02-pub, again as oracle:

cd ~/.ssh
scp rac01-pub:$PWD/id_rsa.pub rac01.pub
(shouldn't have a password prompt, but you can confirm hostkey addition)
cat rac01.pub >> authorized_keys
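
Appending with cat works, but running it twice leaves duplicate entries. A minimal sketch of a repeatable version, assuming each public key file holds a single line; add_key is a hypothetical helper name:

```shell
# add_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: appends the key only if that exact line is not
# already present, then tightens permissions on the target file.
add_key() {
    touch "$2"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    if ! grep -qxF -f "$1" "$2"; then
        cat "$1" >> "$2"
    fi
    chmod 0600 "$2"
}

# Example invocation (not run here):
#   add_key rac02.pub ~/.ssh/authorized_keys
```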

Getting RAM for the install

Before we run the Oracle installer, we should expand the physical RAM for each machine. This can be done from the Gandi control panel for each server. When I first tried this I got a quota error, and had to raise a support ticket (and wait for a response) to get the quota raised. A second issue with the RAM is that the VM doesn’t see the full amount of RAM allocated: when I tried firing up a 4GB instance, Linux only saw 3667716k available, and the Oracle installer promptly complained about insufficient memory.

So instead of 4096MB of memory, we’re going to adjust rac01 and rac02 to have 4800MB. After adjusting in the control panel, you may see that the operation is complete within a minute or so, but the server didn’t consistently get more memory. So while logged onto rac01 as oracle, have a look:

for host in rac01-pub rac02-pub; do echo $host; ssh $host free; done

If each node shows 4388612 total memory, you’re golden. Otherwise, reboot the nodes.
(And yes, roughly 514MB, the gap between the 4915200k allocated and the 4388612k visible, seems like an awful lot of memory to simply not be available to the OS; I’m wondering what’s using the space?)
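
The size of the gap can be worked out from the figures quoted above. A small sketch, assuming the control panel’s “MB” means 1024k and that free reports in k; missing_kb is a hypothetical helper name:

```shell
# missing_kb ALLOCATED_MB VISIBLE_KB
# Hypothetical helper: prints how many k of the allocation the OS cannot see.
missing_kb() {
    echo $(( $1 * 1024 - $2 ))
}

missing_kb 4096 3667716   # the 4GB attempt
missing_kb 4800 4388612   # the 4800MB setting
```

Both cases come out to 526588k, or about 514MB, which hints at a fixed overhead rather than a percentage, though two data points are hardly conclusive.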

Getting ready for the installer

By now the Oracle software download should be complete, and we need to give the downloaded files .zip extensions and install an unzipper to use it. (Note to Oracle packagers: unzip isn’t _all_ that common in the Linux world, and gzip provides better compression rates anyways. Why not do tarballs?)

Back on rac01:

cd /srv/datadisk01/dl
for file in *zip?AuthParam*[0-9a-f]; do sudo mv "$file" "$file".zip; done
sudo yum -y install unzip
for file in *.zip; do sudo unzip "$file"; done
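
An alternative to tacking on a second .zip extension is to strip the query string from the file name altogether. A minimal sketch, assuming wget saved each file as name.zip?AuthParam=...; strip_query is a hypothetical helper name:

```shell
# strip_query DIR
# Hypothetical helper: renames every "name.zip?..." file in DIR to plain
# "name.zip", skipping any file whose target name already exists.
strip_query() {
    for file in "$1"/*.zip\?*; do
        [ -e "$file" ] || continue          # the glob matched nothing
        target=${file%%\?*}                 # drop everything from the first "?"
        if [ -e "$target" ]; then
            echo "skipping $file: $target exists" >&2
        else
            mv "$file" "$target"
        fi
    done
}

# Example invocation (not run here; add sudo if the files are owned by root):
#   strip_query /srv/datadisk01/dl
```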

Now setting up a remote VNC so that we can actually run the Oracle installer, as well as a firewall rule to let us connect:

sudo yum -y install tigervnc-server xterm twm
sudo iptables -I INPUT 4 -p tcp --dport 5901 -j ACCEPT

Starting the server; you’ll need to supply a password the first time around. (still logged in as oracle)

vncpasswd
vncserver :1

Now we start a VNC viewer locally. If you don’t have one already, you can download one from www.realvnc.com

Grid Infrastructure Install

Connecting to display 1 of the server external IP, you should get an xterm window if all went well. Running the installer:

cd /srv/datadisk01/dl/grid
./runInstaller

Skipping software updates, and doing a cluster install, using a standard cluster. Doing an advanced install. Using the default language. Under “grid plug and play” we need to set up the node naming. Using cluster name “rac-cluster”, and SCAN name “rac-cluster” as defined in /etc/hosts earlier. On the cluster node screen, we should see that rac01-pub has been detected. Adding rac02-pub too, with rac02-pub-vip as its VIP address.

Now comes the validation, where we learn if SSH, naming etc were properly set up. If all goes well, you’ll make it to the network interface usage screen. Here we need to make changes: eth0 shouldn’t be used, eth1 is public, and eth2 is private. The management repository is a choice: it takes up memory and install time, but it does allow us to use such things as QoS management, and it can only be created at install time. I chose to skip.

For storage, we’re using a shared file system: the NFS we created. Using external redundancy since it’s a single disk anyway. Doing the same for the voting disk.

Not using IPMI. We’ll also leave the ASM oper group blank, and accept the warning.

Using the default “/u01/app/oracle” and “/u01/app/12.1.0/grid” directories for ORACLE_BASE and grid home. Using /u01/app/oraInventory for oraInventory. You can either run sudo yourself or let the installer do it. I like to run it myself to have more control over re-running and deconfigs if required.

Now it’s time for the prerequisite checks. If all previous steps have succeeded, you shouldn’t see any warnings at all.

Saving the response file and kicking off the install itself.

Running the orainstRoot.sh from the oraInventory, plus root.sh from the grid home. Running on rac01 first.

Just got errors starting ASM:

PRCR-1079 : Failed to start resource ora.asm
CRS-2672: Attempting to start 'ora.asm' on 'rac01-pub'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00600: internal error code, arguments: [SKGMHASH], [1], [18446744073549507196], [0], [0], [], [], [], [], [], [], []
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1.0/grid/log/rac01-pub/agent/ohasd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.asm' on 'rac01-pub' failed
CRS-2679: Attempting to clean 'ora.asm' on 'rac01-pub'
CRS-2681: Clean of 'ora.asm' on 'rac01-pub' succeeded

CRS-2674: Start of 'ora.asm' on 'rac01-pub' failed
2013/11/05 21:28:33 CLSRSC-113: Start of ASM instance failed

Preparing packages for installation...
cvuqdisk-1.0.9-1
2013/11/05 21:28:54 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

But there is no ASM here, so ignoring the error.

On rac02, it didn’t even try starting ASM with root.sh.

Running orainstRoot.sh and root.sh since we have sudo running. It does take some time to run as the grid infrastructure is shut down and started up a few times.

Database install

Now that the grid infrastructure is in place, we can move onto the actual database install. We can re-use the same VNC window to install:

Skipping software updates and skipping the DB creation too (software only). Picking a RAC install. At this point we should see both nodes detected. Using the default language.

Installing enterprise edition with default home locations. In the group selection, it won’t let me select dba, even though the group was installed by the preinstall RPM. For now I’ll select oinstall.

The rest are default.

Running root.sh, which this time is very short.

Database creation assistant

With a database home we can run the creation assistant. But first, working on a hugepage configuration. /proc/meminfo is missing the HugePages lines entirely, and it does look like, regrettably, the supplied kernel does not support hugepages:

[oracle@rac01-pub ~]$ zgrep HUGETLB /proc/config.gz
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

And a quick web search seems to show that, while custom kernel support has been a long-standing user request at Gandi, it’s still not available.
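
The /proc/meminfo check is easy to script for other hosts. A minimal sketch, assuming that a kernel with hugepage support reports a HugePages_Total line; hugepages_supported is a hypothetical helper name:

```shell
# hugepages_supported [MEMINFO_FILE]
# Hypothetical helper: succeeds if the meminfo file (default /proc/meminfo)
# contains a HugePages_Total line.
hugepages_supported() {
    grep -q '^HugePages_Total:' "${1:-/proc/meminfo}"
}

# Example invocation (not run here):
#   hugepages_supported && echo "hugepages available" || echo "no hugepage support"
```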

So onto the install. From the same VNC session:

/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbca

Creating a new database. Using the advanced install with:

RAC database (default)
Admin-managed
General purpose/transaction processing
DB name: racdb
non-PDB
Selecting both nodes to run on
Configuring EM express
Running CVU periodically
Picking a password
File system storage
/srv/datadisk01/oradata/oradata – default
Default FRA, using the default size of 5G
Archiving disabled
Skipping sample schemas and Database Vault
Unselecting automatic memory management
Leaving the remaining parameters default

And we’re installed and have a database. It can be tested via SQL*Plus:

export ORACLE_SID=racdb
export ORAENV_ASK=NO
. oraenv
export ORACLE_SID=racdb1
sqlplus "/ as sysdba"

If all went well, you should see a SQL prompt:

[oracle@rac01-pub ~]$ sqlplus "/ as sysdba"

SQL*Plus: Release 12.1.0.1.0 Production on Wed Nov 6 23:38:51 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options

SQL>

And that’s it for the series. I made it through with 50,000 credits remaining in my Gandi account to play with.

Feel free to ping me in case of issues getting running. Some of these steps are a combination of several iterations as bugs were worked out, so it’s likely that there are some gremlins still lurking, and I’ll try and incorporate fixes as issues are discovered.

Lessons Learned

Yes, Oracle RAC can be installed cleanly on a cloud environment, and at $17, the price is right
True shared storage from a cloud provider is still hard to come by, limiting the high-availability potential
There are quite a few extra steps required to satisfy the RAC installer and its prerequisite checks
In the Gandi environment, you need to overallocate RAM as not all of it is visible to the OS
The lack of hugepage support in the Gandi kernel (and complete lack of custom kernel support) further increases memory requirements
A dummy kernel-uek RPM is all we need to keep the preinstall RPM’s dependency checks happy

↧

Step-by-Step: Move Oracle DB to New Oracle Home on Same Windows Server

November 25, 2013, 7:40 am

I have been thinking about writing a Pythian blog for a long time, and today I finally took the opportunity.

In the life of a DBA, it’s common to get a request to move a database across servers due to an RDBMS upgrade plan or the arrival of new hardware. It’s not common, however, to receive a request to move the RDBMS Oracle home within the same server. Such a request may arise from improper planning, with the Oracle home created on the system mount point or drive: the root mount point on Unix platforms, or the C:\ drive on Windows platforms.

The cloning feature that Oracle introduced in version 10gR2 comes in handy for this request. Using the clone.pl script on Unix platforms is quite straightforward, as we have full control over Unix processes. The thread architecture on Windows makes this a bit different, but not complex.

Let’s assume the current Oracle home is located in the “C:\oracle\product\11.1.0” directory, and the new directory we plan to move to is “D:\oracle\product\11.1.0”. As usual, the database name is TEST. The steps below describe the activities required.

Step 1. Log into the Windows server as a local user that is a member of the local administrators and ora_dba groups. Leave the existing Oracle database running and copy the entire contents of the “C:\oracle\product\11.1.0” directory to the “D:\oracle\product\11.1.0” directory. Ensure the copy completes without any issues.

Step 2. Take existing Oracle home inventory details for reference. Open a command prompt window (Window I) and execute these commands.

C:\>set ORACLE_HOME=C:\oracle\product\11.1.0
C:\>set PATH=C:\oracle\product\11.1.0\OPatch;C:\oracle\product\11.1.0\bin;%PATH%
C:\>opatch version
C:\>opatch lsinventory

Step 3. Open a new command prompt (Window II) and set the environment variables appropriately.

C:\>set PERL5LIB=D:\oracle\product\11.1.0\perl\5.8.3\lib ==> This depends on the Perl version shipped under the Oracle home, which may differ from release to release.
C:\>set PATH=D:\oracle\product\11.1.0\perl\5.8.3\bin\MSWin32-x86-multi-thread;%PATH%

Step 4. Run the clone.pl script from Window II.
C:\>perl D:\oracle\product\11.1.0\clone\bin\clone.pl ORACLE_HOME="D:\oracle\product\11.1.0" ORACLE_HOME_NAME="OraDB11gR1_home" ORACLE_BASE="D:\oracle"

Execution of this command should complete without any issues. Review the log file C:\Program Files\Oracle\Inventory\logs\cloneActions<DATE>.log for verification.

Excerpts from the log file:
————————————
INFO: ca page to be shown: false
INFO: exitonly tools to be excuted passed: 0
INFO:
*** End of Installation Page***
The cloning of OraDB11gR1_home was successful. ==> Should get this message.

Step 5. Execute the following commands from Window II to verify the configuration of the newly cloned home.
C:\>set ORACLE_HOME=D:\oracle\product\11.1.0
C:\>set PATH=D:\oracle\product\11.1.0\OPatch;D:\oracle\product\11.1.0\bin;%PATH%
C:\>opatch version ==> Output should match the output obtained in step 2.
C:\>opatch lsinventory ==> Output should match the output obtained in step 2.

Step 6. Arrange a downtime window of at most 15 minutes and bring down the TEST database. Open the services utility and stop the “OracleOraDB11g_homeTNSListener” and “OracleServiceTEST” services.

Step 7. In Window I, run the Oracle Net Configuration Assistant from the old home and use it to delete the existing listener service from the server.
C:\>C:\oracle\product\11.1.0\bin\netca.bat

Step 8. Execute this command from Window I to delete the existing database service from the server.
C:\>ORADIM -DELETE -SID TEST

Step 9. Open the services utility and confirm that all services belonging to the old Oracle home, including “Oracle TEST VSS Writer Service” and “OracleJobSchedulerTEST”, have been deleted.

Step 10. Invoke the Oracle Net Configuration Assistant from Window II to configure the new listener service.
C:\>D:\oracle\product\11.1.0\bin\netca.bat

Step 11. Create the new database service from the new Oracle home in Window II.
C:\>ORADIM -NEW -SID TEST -SYSPWD *** -STARTMODE auto -SPFILE

Note: This command also starts the database instance.

Step 12. Open the services utility and confirm that the following services were created from the new Oracle home. Modify the “Startup Type” of these services as needed.
OracleOraDB11gR1_homeTNSListener
OracleServiceTEST
Oracle TEST VSS Writer Service
OracleJobSchedulerTEST

Now work with your application administrator, and confirm that everything works fine :)

↧

Log Buffer #348, A Carnival of the Vanities for DBAs

November 27, 2013, 1:20 pm

With the holiday season fast approaching (or is it slow?), data bloggers have already adopted a festive mood, and this Log Buffer edition jubilantly captures and reflects that, and much more.

Oracle:

On December 4, 2013, Oracle will host a customer webcast to acquaint customers with the Oracle SuperCluster M6-32, Oracle’s most powerful engineered system for in-memory Oracle Database performance, Database-as-a-Service and application consolidation.

The ETL logic in BI Apps uses parameters in packages, interfaces, load plans, and knowledge modules (KM) to control the ETL behaviors.

Bobby says that the default degree may cause performance problems.

Need an excuse to extend your trip to Brazil after Oracle CloudWorld São Paulo?

Are you ready to take the next step to modernize your manufacturing plants but not sure what tools are available?

SQL Server:

Bob Pusateri is very happy to be making the trek up to Appleton, Wisconsin to speak at FoxPASS next week!

What To Do If Your Database Catches Fire?

SQL Server Data Tools (SSDT) – October 2013 update

As Thanksgiving Approaches, Remember What SQL Server Pros Can Get By Giving

There are a lot of SQL Server management tools for monitoring your environment, but the ones that keep DBAs like you ahead of the game, proactively finding problems before your user base does, have greater value.

MySQL:

You may think that you already know what the opposite of “DISABLED” is, but with MySQL Event Scheduler you’ll be wrong.

MySQL has an exceptional track record of introducing minor fixes that cause major breakages.

You should take care when changing schema in a MySQL/InnoDB database cluster.

Connector/Python v1.1.3 has been available for testing since last week. It is a “beta” release.

Query optimization versus caching

↧

Batched Table Access

November 29, 2013, 11:33 am

When I first saw the suffix BATCHED in an Oracle 12c execution plan, I was curious to see what was hidden behind it and how it works. I had some spare time for testing and wanted to share my findings with you here.

Intro

Here is what I’m talking about:

-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |  1000 |   990K|   148   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1        |  1000 |   990K|   148   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_Y_INDX |  1000 |       |     5   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("Y"=TO_NUMBER(:1))

Line 1 of this very simple execution plan shows how Oracle 12c added a suffix BATCHED to the table access by a B*Tree index rowsource. I was thinking about the reasons behind this change and how it could be implemented before starting my tests.

Why Oracle would want to “batch” table access
Usually, large index range/full/skip scans with subsequent table access, when run serially, cause lots of single block reads of the table. Depending on the clustering of the data, the number of table block reads could be as high as the number of ROWIDs fetched from index leaf blocks and passed up to the next rowsource. For a serial execution plan this means that query performance depends on how fast a single random table read is. Say you need to read 1000 random table blocks located far away from each other and the average read of 1 block takes 5ms: you then need about 5 seconds to execute the query. But if the storage subsystem handles concurrent IO requests well, and you could ask it for those 1000 blocks concurrently or in parallel, transparently to the end user session, the query could take less wall clock time for the user while putting more pressure on the OS, the storage and the connectivity to the storage.
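
To put numbers on the argument above, here is a rough back-of-the-envelope model in Python. It assumes an idealised storage layer that serves up to `concurrency` reads fully in parallel with no queueing; the function name and its parameters are made up for illustration and are not anything Oracle exposes.

```python
import math

def estimated_read_seconds(blocks, latency_ms, concurrency=1):
    """Wall-clock estimate for reading `blocks` random blocks.

    Idealised assumption: the storage serves `concurrency` requests at
    once, each taking `latency_ms`, with no queueing or contention.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    rounds = math.ceil(blocks / concurrency)  # batches issued back to back
    return rounds * latency_ms / 1000.0

serial = estimated_read_seconds(1000, 5)        # one read at a time
batched = estimated_read_seconds(1000, 5, 40)   # 40 reads in flight (arbitrary)
print(f"serial: {serial:.2f}s, batched: {batched:.3f}s")
```

Real storage rarely scales this neatly, so treat the batched figure as a best case rather than a prediction.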

How Oracle optimizes IO already
As far as I know, Oracle can do, and does, a few cunning things with IO even in pre-12.1 releases. Here is a (most likely incomplete) list of example optimizations you may see:

Within NESTED LOOP joins Oracle uses a couple of strategies: NL-join batching and moving TABLE ACCESS out of the join (I have no idea what this is officially called).
“Prefetching” with ‘db file parallel read’, as described by Tanel Poder here (it gives you a very nice idea of what ‘db file parallel read’s are).
With a “cold” buffer cache Oracle may choose to read ahead: instead of reading just a single block when you think that is enough, it may opt to read multiple physically adjacent blocks from disk into the cache (aka ‘db file scattered read’). Sometimes this hurts application performance (a lot), sometimes it doesn’t matter, but the point is that it is “normal” to see multi-block buffered reads where you would expect single block reads.

Test Case

Based on my understanding of what Oracle can possibly do I have created a test scenario which could be used to find out more things behind BATCHED table access. Here is the setup:

drop table t1 cascade constraints purge;

create table t1
(
  id              integer,
  x               integer,
  y               integer,
  pad             varchar2(4000)
);

insert /*+ append */ into t1
select
  rownum,
  mod(rownum, 1000),
  floor((rownum-1)/1000),
  lpad('x', 1000, 'x')
from
  all_source a1, all_source a2
where rownum <= 1e6;

create index t1_x_indx on t1(x);
create index t1_y_indx on t1(y);

exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1', cascade=>true, no_invalidate=>false);

Very easy. I have created a sufficiently wide table holding 1 million rows, with two integer columns whose data is clustered very badly (T1.X) and very well (T1.Y). Usually it also matters where you create the table. Initially I created it in a standard USERS tablespace (i.e., ASSM, non-uniform extent size), but then switched to an MSSM tablespace with uniform extents of 1MB. Looking ahead, this did not make a difference to the test results (at least none that I could identify).
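
To see why the two columns end up clustered so differently, here is a small Python simulation of the same data layout. It is a simplified model, not Oracle's actual clustering factor calculation: it assumes rows are stored in insertion order at 7 rows per block (a guess for roughly 1000-byte rows in an 8K block) and uses 100,000 rows instead of 1 million to keep it quick.

```python
ROWS = 100_000          # scaled down from the 1e6 rows in the test table
ROWS_PER_BLOCK = 7      # assumption: ~1000-byte rows in an 8K block

def block_of(rownum):
    """Block number a row lands in, assuming rows are stored in insert order."""
    return (rownum - 1) // ROWS_PER_BLOCK

def clustering_factor(key_of):
    """Walk the rows in index order (key, then row position) and count
    how often the table block changes from one entry to the next."""
    entries = sorted((key_of(r), r) for r in range(1, ROWS + 1))
    factor = 0
    previous_block = None
    for _, rownum in entries:
        block = block_of(rownum)
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor

cf_x = clustering_factor(lambda r: r % 1000)         # like T1.X
cf_y = clustering_factor(lambda r: (r - 1) // 1000)  # like T1.Y
print("X:", cf_x, "Y:", cf_y)
```

In this model every index entry on X points to a different block than the previous one, so the count equals the number of rows, while for Y it equals the number of blocks. On a real database the CLUSTERING_FACTOR column of USER_INDEXES shows the figures Oracle computed.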

The test itself:

set linesize 180 pagesize 100 define ^ arraysize 100
col plan_table_output format a180

explain plan for select /*+ index(t1(x)) */ * from t1 where x = :1;
select * from table(dbms_xplan.display);

explain plan for select /*+ index(t1(y)) */ * from t1 where y = :1;
select * from table(dbms_xplan.display);

col spid new_value spid
col curr_date new_value curr_date
select p.spid,to_char(sysdate, 'YYYYMMDDHH24MI') curr_date  from v$session s, v$process p where s.paddr = p.addr and s.sid = userenv('sid');
col tracefile new_value tracefile
select value tracefile from v$diag_info where name='Default Trace File';

alter system flush buffer_cache;
!sleep 1
alter system flush buffer_cache;

select object_id, data_object_id, object_name from dba_objects where owner = user and object_name like 'T1%';

set termout off

exec dbms_session.session_trace_enable(waits=>true, binds=>false)
!strace -tt -p ^spid -o trc_^spid..txt &

spool batched_^curr_date..txt
select /*+ index(t1(x)) */ * from t1 where x = 1;
select /*+ index(t1(y)) */ * from t1 where y = 2;
spool off
set termout on

!orasrp -t --sort=fchela --sys=no ^tracefile orasrp_^spid..txt
!cat orasrp_^spid..txt | grep -A 165 fvkg1sp2b73x

prompt trace: orasrp_^spid..txt
prompt strace: trc_^spid..txt
prompt tracefile: ^tracefile

exit

So the test is also really easy, apart from some diagnostic and preparation steps. Basically I’m tracing two statements, which access T1 through the two indexes respectively, at both the OS and Oracle levels, and then parsing the Oracle trace file with OraSRP. You may want to use tkprof instead. I also used it initially, but OraSRP has a feature that shows the waits broken down by object, like this:

                                                                                 --------- Time Per Call --------
Object/Event                                    % Time       Seconds      Calls         Avg        Min        Max
--------------------------------------------  --------  ------------  ---------  ---------- ---------- ----------
TABLE T1 [88650]
    db file parallel read                        68.9%       4.8404s         26     0.1862s    0.0883s    0.2614s
    db file sequential read                      29.9%       2.1023s        147     0.0143s    0.0014s    0.0636s
INDEX T1_X_INDX [88651]
    db file sequential read                       1.1%       0.0746s          5     0.0149s    0.0025s    0.0278s
    Disk file operations I/O                      0.0%       0.0034s          1     0.0034s    0.0034s    0.0034s

Testing

I was using VirtualBox with 64-bit OEL 6.4 and Oracle 11.2.0.4 & 12.1.0.1. I also did (partial) tests on 11.2.0.3 running in OVM on faster storage, and the results were similar to what I observed with 11.2.0.4.
Both instances were running with a pfile, with the following parameters specified:

-- 11.2.0.4
*._db_cache_pre_warm=FALSE
*.compatible='11.2.0.4.0'
*.control_files='/u01/app/oracle/oradata/ora11204/control01.ctl','/u01/app/oracle/fast_recovery_area/ora11204/control02.ctl'
*.db_block_size=8192
*.db_cache_size=300000000
*.db_domain=''
*.db_name='ora11204'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ora11204XDB)'
*.filesystemio_options=setall
*.pga_aggregate_target=100000000
*.open_cursors=300
*.processes=100
*.remote_login_passwordfile='EXCLUSIVE'
*.shared_pool_size=420430400
*.undo_tablespace='UNDOTBS1'

-- 12.1.0.1
*._db_cache_pre_warm=FALSE
*.compatible='12.1.0.0.0'
*.control_files='/u01/app/oracle/oradata/ora121/control01.ctl','/u01/app/oracle/oradata/ora121/control02.ctl'
*.db_block_size=8192
*.db_cache_size=300000000
*.db_domain=''
*.db_name='ora121'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ora121XDB)'
*.enable_pluggable_database=true
*.filesystemio_options='SETALL'
*.pga_aggregate_target=100000000
*.open_cursors=300
*.processes=100
*.remote_login_passwordfile='EXCLUSIVE'
*.shared_pool_size=420430400
*.undo_tablespace='UNDOTBS1'

Test Process & Observations

Initially I started testing with a default database config and filesystemio_options set to DIRECTIO. After some random tests, I realized that the cache warm-up behavior was not what I was interested in right now and turned it off with a hidden parameter. Overall I think the test results can be summarized as follows:

Test results are inconsistent. This is the most irritating thing. However, after running the test multiple times in a row I get a pretty stable outcome, so I only consider the results obtained after multiple consecutive runs of the same test. Usually 2 runs are enough, but sometimes more are needed, especially after an instance restart. I don’t understand why this happens or what is behind these decisions. Maybe it has something to do with CKPT, as Tanel mentions in his post on oracle-l, but I did not check (and honestly don’t want to :))
Both 11g and 12c show that for table access of scattered data (via the T1_X_INDX index) Oracle may batch table access IO using db file parallel reads. At the OS level it uses io_submit/io_getevents calls to run the IO through the async API, if async IO is turned on of course; with just DIRECTIO in place it issues a series of single block reads using pread.
Both 11g and 12c can use multi-block access of clustered data (via the T1_Y_INDX index) for index range scans (and most likely full/skip scans too). This is one of the most amusing things: even though I turned off cache warm-up, Oracle can still identify that the data I am accessing is placed close together, and it decides to read multiple adjacent table blocks at once. 12c, however, behaves differently and by default does not use buffered multi-block table reads.
The size of multi-block IO (db file parallel read) differs between 11g and 12c: in 11g it is usually 39 requests, sometimes 19. In 12c, by default, the number of requests depends on the client’s fetch size: it is equal to the minimum of the fetch size and 127.
It looks like the parameter _db_file_noncontig_mblock_read_count does not control the actual number of blocks read with db file parallel read; any value greater than 1 turns the feature on and the size of the read requests stays the same (I have only tested the values 1, 2, 3 and 5).
The word BATCHED that appears in 12c execution plans is controlled by a new hidden parameter, _optimizer_batch_table_access_by_rowid. By default the parameter is set to TRUE, so plans tend to include BATCHED in the table access rowsource. At run time this setting behaves very much like 11g, reading scattered table data with db file parallel reads, except that the number of IO requests is min(fetch_size, 127). If _optimizer_batch_table_access_by_rowid is set to FALSE (at session level, for example), the plans generated by Oracle do not include the BATCHED suffix in the table access rowsource, but at run time Oracle still uses multi-block IO in the same way as 11g does, i.e. 39 or 19 IO requests per call, and scattered reads of clustered table data are there as well!
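
The 12c sizing rule observed above can be written down as a tiny Python sketch. It only restates the min(fetch_size, 127) observation from these tests; the function names are hypothetical, and the call count assumes every block needs a physical read and every call is full, which real traces (with their mix of db file sequential reads) will not match exactly.

```python
import math

MAX_BATCH_12C = 127  # upper bound observed in the 12.1.0.1 tests above

def batch_size_12c(fetch_size):
    """IO requests per 'db file parallel read' as observed in 12c by default."""
    return min(fetch_size, MAX_BATCH_12C)

def calls_needed(blocks, batch_size):
    """Idealised number of batched calls needed to read `blocks` table blocks."""
    return math.ceil(blocks / batch_size)

# 15 is the SQL*Plus default arraysize; 100 is the value used in the test script.
for fetch_size in (15, 100, 500):
    size = batch_size_12c(fetch_size)
    print(fetch_size, size, calls_needed(1000, size))
```

The point of the sketch is that a client with a small fetch size gets many small batches, which is the upgrade risk discussed in the summary.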

Summary

In 12c Oracle changed some internal code paths that deal with batched table access. The important thing, though, is that batched table access is not new: even if you disable it, either explicitly with _optimizer_batch_table_access_by_rowid or implicitly with optimizer_features_enable, Oracle can still use an approach similar to the one in 11g.

One important change, of course, is that by default the size of vector IO now depends on the client fetch size, and I can imagine situations in which this affects application performance after an upgrade.
I have uploaded the test script and trace files from 11.2.0.4 and 12.1.0.1 here, so if you would like to repeat my tests and compare results, feel free to do so.

↧

Log Buffer #346, A Carnival of the Vanities for DBAs

November 15, 2013, 6:39 am

The Economist says that physics suggests storms will get worse as the planet warms. Typhoon Haiyan in the Philippines, bushfires in Australia, floods in China, and extreme, unpredictable weather across the planet are a sober reminder. The good news is that technology and awareness are on the rise, and so is the data. Database technologies are playing their part by storing that data intelligently and enabling stakeholders to analyze it and get meaningful results to predict and counter extreme conditions. This Log Buffer Edition appreciates these efforts.

Big Data:

Big Data Tools that You Need to Know About – Hadoop & NoSQL.

Dave Stokes is copying MySQL data to Hadoop with minimal loss of blood.

Mark Rittman is creating a Multi-Node Hadoop/Impala Cluster as a Datasource for OBIEE 11.1.1.7.

In what appears to be a short space of time, there has been a revolution in the data discovery and analytics space.

Is NoSQL less disruptive than we thought and just, well, useful?

Oracle:

Sayan Malakshinov is sharing a little example of index creation on extended varchars.

As almost everyone is interested in data science, take the boot camp to get ahead of the curve.

Guardian was introduced in Oracle Coherence 3.5 as a uniform and reliable means to detect and report various stalls and hangs on data grid members.

Session profile using V$ tables

Lucas Jellema is on the integrity of data in Java applications.

SQL Server:

I got 99 Problems but my Backup ain’t one.

Let’s talk about the case where you want to compare tables to see what’s missing.

How Do You Prevent a SAN Failure? Mike Walsh tells us.

Tracy is going to start introducing some of the PowerShell elements that tie the audit process together.

Power Pivot: unable to convert a value to the data type requested.

MySQL:

Before there was information_schema and performance_schema, there were InnoDB Monitor tables.

Database schema changes are usually not popular among DBAs or sysadmins, especially when you are operating a cluster and cannot afford to switch off the service during a maintenance window.

Will AWS plans for PostgreSQL RDS help it finally pick up?

SHOW EXPLAIN in MariaDB 10.0 vs EXPLAIN FOR CONNECTION in MySQL 5.7

Justin Swanhart is announcing a new MySQL fork: TroySQL.

↧

Running JRE7 alongside JRE6 for Oracle Forms

November 19, 2013, 7:22 am

DBAs working in a managed services environment often run into situations where some customers are still running JRE6 while others have upgraded to the latest JRE7. When JRE7 is installed on a Windows desktop, it becomes the default JRE and is invoked when users access Oracle Forms. This creates all kinds of issues for users who try to access Oracle Forms at customers who are still on JRE6.

This post describes a way to access Oracle E-Business Suite (EBS) forms running on JRE7 and, at the same time, other EBS forms running on JRE6. The method has been tested on 64-bit Windows 7 running on laptops and desktops. It only works on the 64-bit flavor of Windows because that flavor comes with both 32-bit and 64-bit versions of Internet Explorer. We will configure the 32-bit version of Internet Explorer to use JRE6 and the 64-bit version to use JRE7. This is done by installing the 64-bit version of JRE7, which you can download from the Java download page. Just look for Windows x64.

After you download and install 64-bit JRE7, launch 64-bit Internet Explorer by typing “internet explorer” into the run prompt and clicking “Internet Explorer (64-bit)”.

You can quickly verify which version of JRE is running by accessing this java.com URL. You will notice that the link shows JRE7 when you access it via 64-bit Internet Explorer, while the default Internet Explorer shows JRE6.

You may be concerned that this setup is not supported, but don’t worry. 64-bit Internet Explorer and 64-bit JRE are certified with Oracle E-Business Suite. You can refer to My Oracle Support Note 285218.1.

I would like to hear about any issues you may have experienced with JRE. Please post them in the comments section!

↧

Why was SSHD refusing my key?

November 22, 2013, 12:39 pm

I was contacted by a colleague about a problem he was having. “I’m trying to set up something simple which I’ve done millions of times, but it’s not working. I might be missing something obvious.”

The issue was that SSH public key authentication didn’t work. The environment was running a virtualized Oracle Enterprise Linux 6.4 operating system (similar to Red Hat Enterprise Linux (RHEL) or CentOS 6). We’ll call this box Badboy for the purpose of this post.

I logged onto Badboy and attempted to do it myself, following the basic steps to set up public key authentication on Linux:

[knecht@random-client ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/knecht/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/knecht/.ssh/id_rsa.
Your public key has been saved in /home/knecht/.ssh/id_rsa.pub.
The key fingerprint is:
0c:32:d6:9e:4f:82:7d:f6:ff:c7:f3:0b:c6:a4:88:29 knecht@random-client
The key's randomart image is:
+--[ RSA 2048]----+
| |
| . |
| + o |
| . * + . . |
| . = S . ..+ |
| *o...+. o |
| E o.... + .|
| . . . .|
| . |
+-----------------+

I then copied that public key file to Badboy and added the key to the ~/.ssh/authorized_keys file.

Let’s verify that the public key authentication has not been disabled in sshd_config:
[root@client ~]# grep Pubkey /etc/ssh/sshd_config
#PubkeyAuthentication yes

The value isn’t set explicitly, so the default setting, which is yes, comes into play.

I then switched back to my random-client and attempted to log in using SSH with the public key.

[knecht@random-client ~]$ ssh knecht@badboy
knecht@badboy's password:

Hmm, it’s asking for a password. That’s not what you would expect. Let’s disable password authentication entirely in the client:

[knecht@random-client ~]$ ssh -o "PasswordAuthentication no" knecht@badboy
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[knecht@random-client ~]$

Interesting.

This box was really not behaving nicely. Badboy it is then.

I checked the logs to see if anything shows up by doing a tail -f /var/log/secure while logging on from a second session, trying again with password authentication disabled.

The only entry that showed up was:

Nov 21 20:14:21 badboy sshd[4299]: Connection closed by xx.xx.xxx.xxx
That’s helpful. Move on.

Hoping to get a little bit more insight, I increased the logging verbosity of sshd by changing LogLevel in sshd_config:
LogLevel VERBOSE

Retrying the previous exercise, we now have a wee bit more information:

Nov 21 20:21:59 badboy sshd[4980]: Failed publickey for knecht from xx.xx.xxx.xxx port 36189 ssh2
Nov 21 20:21:59 badboy sshd[4981]: Connection closed by xx.xx.xxx.xxx

Well, you’re not very talkative, Mr. Badboy, are you? I went back and triple-checked all of the possible options, read the man pages, and couldn’t determine what was wrong with Badboy’s configuration. As my colleague mentioned, it’s a simple exercise. We’ve all done it countless times… So what was the problem now?

It dawned on me that I had a similar experience in the past, when things just wouldn’t work without any clear reason. The new suspect was now Mr. Badboy’s big brother, SELinux.

[root@badboy ~]# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 26
Policy from config file: targeted

Well, there you have it. A quick peek in /var/log/audit/audit.log showed several actions being denied during an attempted connection.
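
If you want to make this check part of a troubleshooting script, the sestatus output above is easy to parse. The following Python sketch is only an illustration: the helper names are made up, and it assumes the "field: value" layout shown in this post, which may differ on other releases.

```python
def parse_sestatus(output):
    """Turn `sestatus` output into a dict of {field: value}."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "field: value" shape
            fields[key.strip()] = value.strip()
    return fields

def selinux_is_enforcing(output):
    """True when SELinux is enabled and currently in enforcing mode."""
    fields = parse_sestatus(output)
    return (fields.get("SELinux status") == "enabled"
            and fields.get("Current mode") == "enforcing")

# Sample taken from the output shown in this post.
SAMPLE = """\
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 26
Policy from config file: targeted
"""

print(selinux_is_enforcing(SAMPLE))
```

On the server itself the text could come from subprocess.run(["sestatus"], capture_output=True, text=True).stdout instead of the sample string.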

How to configure SELinux to get this to work is beyond the scope of this document. In our environment, we are not using SELinux, so we disabled it by setting SELINUX=disabled in /etc/selinux/config and rebooted the system.

As soon as it was back up:

[knecht@random-client ~]# ssh knecht@badboy
[knecht@badboy ~]#

Good boy.

↧