Quantcast
Channel: Oracle – Official Pythian® Blog
Viewing all 301 articles
Browse latest View live

Backup Oracle to Amazon AWS S3 – Part 1

$
0
0

Backup Oracle to S3 – Part 1

Oracle has the ability to backup directly to Amazon S3 buckets.

This is a very attractive option for many reasons:

  • offsite backup
  • inexpensive storage
  • backups can automatically be migrated to archive storage (Amazon Glacier)

The ‘Cloud Backup’ capability of Oracle is found with the ‘Oracle Secure Backup’ package, which is an extra-cost option.

Circumstances may not allow you to purchase this software. Perhaps you are running a departmental server and there is no budget for this.

There could be many reasons.

That does not mean that your Oracle backups cannot be made directly to S3.

If your databases are running on a reasonably recent version of Linux, you may be able to use s3fs which uses the FUSE (Filesystem in Userspace) filesystem.

Requirements:

  • Linux kernel of 2.6.18-164+
  • The Linux Kernel must support FUSE

The system used for the test:

  • Oracle Linux 7.7
  • Oracle Database 19.3

Due to the nature of how s3 works (no random access) this method is only appropriate for static backup files.

The use of s3 would not be a good choice for updatable images copies for instance.

Another limitation is file size – 5T is the maximum file size allowed. More on that later.

Another limitation was discovered, as well as a workaround. This will also be discussed.

Let’s get started.

Install Amazon Utilities

This is not a tutorial on installing Amazon command line utilities, so only the necessary commands are shown for those components.

I first installed awscli and s3cmd

# yum install -y awscli s3cmd

AWS credentials were then set up for both oracle and root.

Now create an s3 bucket:

# s3cmd mb s3://pythian-oracle-backups

Now install s3fs-fuse from the yum repo:

$ yum install -y s3fs-fuse

Create a mount point, and give it to oracle:

# mkdir /mnt/s3-oracle-backup
# chown oracle:oinstall /mnt/s3-oracle-backup

Allow oracle to run fusermount

# chmod +x /usr/bin/fusermount

As oracle tries to mount an s3 ‘filesytem’:

$ s3fs pythian-oracle-backups /mnt/s3-oracle-backup

Checking for a mount at /mnt/s3-oracle-backup, there was not one:

# ls /mnt | grep s3
#

Next I tried the debug options:

$ s3fs pythian-oracle-backups /mnt/s3-oracle-backup -o dbglevel=info -f -o curldbg

A couple of things appeared here:

  • libcurl and libcurl-dev were out of date – newer versions were recommended for performance reasons
  • access was denied due to invalid credentials.

 

[WAN] curl.cpp:ResetHandle(1855): The CURLOPT_SSL_ENABLE_ALPN option could not be unset. S3 server does not support ALPN, then this option should be disabled to maximize performance. you need to use libcurl 7.36.0 or later.
[WAN] curl.cpp:ResetHandle(1858): The S3FS_CURLOPT_KEEP_SENDING_ON_ERROR option could not be set. For maximize performance you need to enable this option and you should use libcurl 7.51.0 or later.
...
[CRT] s3fs.cpp:s3fs_check_service(3864): invalid credentials(host=https://s3.amazonaws.com) - result of checking service.
[ERR] s3fs.cpp:s3fs_exit_fuseloop(3440): Exiting FUSE event loop due to errors

The credentials were correct, as s3cmd was working properly:

$ s3cmd ls
2015-12-28 21:29  s3://pythian-apps
2016-01-09 21:33  s3://pythian-aws-billing
2010-08-08 00:07  s3://pythian-data
2017-06-12 23:55  s3://pythian-dropbox-backup
2010-08-08 00:08  s3://pythian-fcedit
2010-08-07 19:32  s3://pythian-keepass
2011-09-19 17:28  s3://pythian-keepass2
2011-09-19 18:04  s3://pythian-keepass2-bkup
2010-08-08 00:08  s3://pythian-manuals
2010-11-06 23:25  s3://pythian-maps
2014-08-28 21:14  s3://pythian-music
2012-01-07 18:56  s3://pythian-oracle
2020-03-04 16:18  s3://pythian-oracle-backups
2018-01-15 00:46  s3://pythian-oracle-sw
2010-08-07 18:21  s3://pythian-photos
2016-03-05 16:42  s3://pythian-rx7
2018-12-14 15:49  s3://pythian-stc
2015-03-02 23:22  s3://linux-backups

So I removed s3fs-fuse.

# yum erase s3fs-fuse

Create s3fs-fuse from source

Building software from source is something I have found necessary a number of times on RedHat Linux and its variants.

While the software I need may be available in the repos, it is sometimes too old to be of use.

Such is the case with s3fs-fuse.

Before continuing, I obtained a more recent version of libcurl and libcurl-dev

Get Recent libcurl – instructions

rpm -Uvh http://www.city-fan.org/ftp/contrib/yum-repo/rhel7/x86_64/city-fan.org-release-2-1.rhel7.noarch.rpm
yum --enablerepo=city-fan.org install libcurl libcurl-devel

Installation instructions are all in the s3fs-fuse github repo, and can be easily referenced in a browser:

Compile s3fs-fuse

Platform Specific notes – install s3fs-fuse

Install other required libs:

# yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel libxml2-devel mailcap git automake make

Now get the source

# git clone https://github.com/s3fs-fuse/s3fs-fuse.git

At this point the software can be compiled:

# cd s3fs-fuse/

# ./autogen.sh

# ./configure --prefix=/usr --with-openssl

# make install

Again, make it executable for others

# chmod +x /usr/bin/fusermount

As root, modify the file /etc/fuse.conf by uncommenting the user_allow_other line:

# mount_max = 1000
user_allow_other

As oracle, mount the filesystem and test it:

$ s3fs pythian-oracle-backups /mnt/s3-oracle-backup -o default_permissions -o allow_other -o 'umask=0007'


$ touch  /mnt/s3-oracle-backup/test-file-oracle

$ ls -l /mnt/s3-oracle-backup/
total 1
-rw-r--r-- 1 oracle oinstall 0 Mar  4 11:18 test-file-oracle

$ s3cmd ls s3://pythian-oracle-backups
2020-03-04 19:18         0   s3://pythian-oracle-backups/test-file-oracle

The extra options allow the grid user to also write to the mounted filesystem. In my case that was necessary, as the database under tests is using ASM, running as ‘grid’.

See FUSE Manpage for more on FUSE options.

Now for some backups.

RMAN Backups

At first, I attempted to use RMAN to create backups.

This failed. The following errors were seen:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch1 channel at 03/04/2020 16:49:58
ORA-19506: failed to create sequential file, name=\"CDB_T20200304_db_s15_p1_t1034182197
ORA-27028: skgfqcre: sbtbackup returned error\n", 46) = 46
ORA-19511: non RMAN, but media manager or vendor specific failure, error text

After running strace on the job I could see that errors were occurring on read() and lseek() calls.

It appeared that RMAN was trying to do block-level IO on the files, which is not going to work on S3 (object) storage.

So, it was time to go old school and use user-managed backups.

That was also not without some issues, but they can be handled.

User-Managed Backups

Before there was RMAN, there were User Managed Backups.

While not commonly used, this method of Oracle backup still works.

The documentation just referenced discusses all the considerations in creating backups.

Essentially, these are the steps

  • put the database in backup mode
  • backup data files
  • take the database out of the backup mode
  • switch logfile
  • backup archive logs
  • backup controlfiles (both binary and to trace)
  • backup spfile to pfile

Errors in User-Managed Backup

The first attempts to backup data files and archive logs failed, but in a different manner than the RMAN backups.

ASMCMD-8016: copy source '+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undotbs1.276.1020807859' and target '/mnt/s3-oracle-backup/pdb1/2020-03-05_12-44-54/datafiles/undotbs1.276.1020807859' failed
ORA-19505: failed to identify file "/mnt/s3-oracle-backup/pdb1/2020-03-05_12-44-54/datafiles/undotbs1.276.1020807859"
ORA-27044: unable to write the header block of file
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
ORA-15120: ASM file name '/mnt/s3-oracle-backup/pdb1/2020-03-05_12-44-54/datafiles/undotbs1.276.1020807859' does not begin with the ASM prefix character
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 617
ORA-06512: at line 3 (DBD ERROR: OCIStmtExecute)

My hypothesis at this point was the asmcmd cp command has made the call to create the file, and then when it tried to write the file immediately afterwards, it was not available. This could be due to the nature of s3 – the filesystem hasn’t finished creating the file, and ASM is trying to write to it.

If that was the case, I speculated that pre-creating the files with touch would solve that problem; and it did.

Following is the output from a partial backup – the datafiles and archivelogs were skipped, and only controlfile and pfile backups created.

[grid@ora192rac01 RMAN]$ ./backup-db-s3.sh
s3fs on /mnt/s3-oracle-backup type fuse.s3fs (rw,nosuid,nodev,relatime,user_id=54321,group_id=54321,default_permissions,allow_other)
SID: cdb1
ORACLE_SID = [+ASM1] ? The Oracle base has been changed from /u01/app/19.0.0/grid_base to /u01/app/oracle

undotbs1.276.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undotbs1.276.1020807859
sysaux.275.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/sysaux.275.1020807859
undo_2.279.1020807925:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undo_2.279.1020807925
system.277.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/system.277.1020807859
users.280.1020807941:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/users.280.1020807941
ORACLE_SID = [cdb1] ? The Oracle base has been changed from /u01/app/oracle to /u01/app/19.0.0/grid_base
Datafile: undotbs1.276.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undotbs1.276.1020807859
Skipping Datafile: +DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undotbs1.276.1020807859
Touch: touch /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/undotbs1.276.1020807859
Datafile: sysaux.275.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/sysaux.275.1020807859
Skipping Datafile: +DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/sysaux.275.1020807859
Touch: touch /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/sysaux.275.1020807859
Datafile: undo_2.279.1020807925:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undo_2.279.1020807925
Skipping Datafile: +DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/undo_2.279.1020807925
Touch: touch /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/undo_2.279.1020807925
Datafile: system.277.1020807859:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/system.277.1020807859
Skipping Datafile: +DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/system.277.1020807859
Touch: touch /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/system.277.1020807859
Datafile: users.280.1020807941:+DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/users.280.1020807941
Skipping Datafile: +DATA/CDB/9423786B7CAA4898E053DD01A8C04EF1/DATAFILE/users.280.1020807941
Touch: touch /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/users.280.1020807941
ORACLE_SID = [+ASM1] ? The Oracle base has been changed from /u01/app/19.0.0/grid_base to /u01/app/oracle

System altered.

min scn: 62876556
max scn: 62877828
ORACLE_SID = [cdb1] ? The Oracle base remains unchanged with value /u01/app/oracle

thread_1_seq_436.284.1034250403:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
thread_1_seq_437.283.1034251313:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
thread_2_seq_345.281.1034251317:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
thread_1_seq_438.280.1034251757:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757
thread_1_seq_436.284.1034250403:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
thread_1_seq_437.283.1034251313:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
thread_2_seq_345.281.1034251317:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
thread_1_seq_438.280.1034251757:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757
ORACLE_SID = [cdb1] ? The Oracle base has been changed from /u01/app/oracle to /u01/app/19.0.0/grid_base
archive log: thread_1_seq_436.284.1034250403:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
archive log: thread_1_seq_437.283.1034251313:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
archive log: thread_2_seq_345.281.1034251317:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
archive log: thread_1_seq_438.280.1034251757:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757
archive log: thread_1_seq_436.284.1034250403:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_436.284.1034250403
archive log: thread_1_seq_437.283.1034251313:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_437.283.1034251313
archive log: thread_2_seq_345.281.1034251317:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_2_seq_345.281.1034251317
archive log: thread_1_seq_438.280.1034251757:+FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757
Skipping Archivelog: +FRA/CDB/ARCHIVELOG/2020_03_05/thread_1_seq_438.280.1034251757

And here are the files created:

[grid@ora192rac01 RMAN]$ find /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/ -type f | xargs ls -ldtar
-rwxrwx--- 1 grid   oinstall      483 Mar  5 12:09 /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/datafiles/datafiles.txt
-rwxrwx--- 1 grid   oinstall      761 Mar  5 12:09 /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/archivelogs/archlogs.txt
-rwxrwx--- 1 grid   oinstall 19841024 Mar  5 12:09 /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/controlfile/controlfile.bkup
-rwxrwx--- 1 grid   oinstall    12646 Mar  5 12:09 /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/controlfile/controlfile.txt
-rwxrwx--- 1 oracle asmadmin     1950 Mar  5 12:09 /mnt/s3-oracle-backup/pdb1/2020-03-05_12-09-08/pfile/pfile.txt

As you may guess, even a small database takes some time to complete a backup to a remote destination when the uplink (10M in my case) is rather limited.

Archive logs have been successfully backed up. A full backup is currently running and will take some time to complete.

The backup-db-s3.sh script may be found at the end of this blog.

Un-mounting the s3 Filesystem

There is a bug in s3fs as regards unmounting a filesystem:

$ s3fs unmount  /mnt/s3-oracle-backup/ -o nonempty
fusermount: failed to access mountpoint /mnt/s3-oracle-backup: Permission denied

Running this fusermount command was successful, however:

fusermount -u -o  rw,nosuid,nodev,nonempty,subtype=s3fs /mnt/s3-oracle-backup

Wrap Up

There are other s3 limitations that may come into play.

In particular, remember that s3fs is simulating a file system on s3.

Large numbers of files in an s3 bucket can cause a directory listing to be quite slow.

Using sub-directories will help alleviate that issue.

And finally, the big question: Can this backup be restored?

Stay tuned, that will be addressed in a future installment.

backup-db-s3.sh

#!/usr/bin/env bash

: <<COMMENT

backup-db-s3.sh
Oracle User Managed Backup
https://docs.oracle.com/database/121/BRADV/osbackup.htm

Jared Still - Pythian 2020
still@pythian.com
jkstill@gmail.com

This script is S3 specific
It was written as part of a blog on backing up directly to Amazon AWS S3

COMMENT

: <<NOTES

This script makes the following assumptions:

- the database being backed up is a PDB (pluggable database) in a container database (CDB)
- ASM is used
- backup is made by grid user, as oracle does not have ASM privileges


The /mnt/s3-oracle-backup directory was initially created by the oracle user.
The dba group was added to the grid user so tha grid could also write to the directory

As both Oracle and Grid must write here, it is necessary for the mount point to have a common group
for the oracle and grid users

Note: spfile and controlfile backup come from the oracle users, even if run as grid, as it is the
oracle db instance creating the files

There is no warranty expressed or implied by the use of this script - use it at your own risk.

As per usual, be sure to test any backups made to verify they are complete and recoverable.

NOTES

declare -A sid
sid[ora]='cdb1'
sid[asm]='+ASM1'

declare dbname=pdb1
declare -A db
db[db]=dbserver/pdb1
db[cdb]=dbserver/cdb

declare -A username
username[db]=sys
username[cdb]=sys

declare -A password
password[db]=PASSWORD
password[cdb]=PASSWORD

# any value other than 0 will activate these flags
# skip datafile backup - for debugging
declare skipDatafileBackup=0

# skip archivelog backup - for debugging
declare skipArchlogBackup=0

#########################################
## user configurations above this line ##
#########################################

declare rc
declare touchSleep=1

declare timestamp=$(date '+%Y-%m-%d_%H-%M-%S')
declare s3Mount=/mnt/s3-oracle-backup
declare backupHome=${s3Mount}/${dbname}/${timestamp}

declare -A targetDirs
targetDirs[datafiles]=${backupHome}/datafiles
targetDirs[archivelogs]=${backupHome}/archivelogs
targetDirs[controlfile]=${backupHome}/controlfile
targetDirs[pfile]=${backupHome}/pfile

declare -A inventory
inventory[datafiles]=${targetDirs[datafiles]}/datafiles.txt
inventory[archivelogs]=${targetDirs[archivelogs]}/archlogs.txt

declare -A sql
declare -A scn
declare -A filenames
filenames[controlfile_backup]=${targetDirs[controlfile]}/controlfile.bkup
filenames[controlfile_backup_tmp]=/tmp/controlfile.tmp
filenames[controlfile_trace]=${targetDirs[controlfile]}/controlfile.txt
filenames[pfile_backup]=${targetDirs[pfile]}/pfile.txt

sql[minscn]='select min(checkpoint_change#) retval from gv\$datafile'
sql[maxscn]='select max(current_scn) retval from gv\$database'

sql[backup_controlfile]="alter database backup controlfile to '${filenames[controlfile_backup]}' reuse"
sql[backup_controlfile_tmp]="alter database backup controlfile to '${filenames[controlfile_backup_tmp]}' reuse"
sql[backup_controlfile_trace]="alter database backup controlfile to trace as '${filenames[controlfile_trace]}' reuse"

sql[backup_spfile]="create pfile='${filenames[pfile_backup]}' from spfile"
sql[begin_backup]="alter database begin backup"
sql[end_backup]="alter database end backup"

newdir () {
   local dirName="$1"

   [[ -z $dirName ]] && {
      echo
      echo "newdir(): dirName cannot be empty"
      echo
      exit 3
   }

   mkdir -p $dirName

   [[ $? -ne 0 ]] && {
      echo
      echo could not create $dirName
      echo
      exit 1
   }

   [[ -d $newDir ]] && {
      echo
      echo $dirName not available
      echo
      exit 2
   }

   # ownership is grid:oinstall
   # oracle must also be able to write here:
   chmod g+rwx $dirName
}


# pass in the key to username[], etc, not the actual db name
# pass in a SQL that returns a single value
# always runs as sysdba

getSqlVal () {
   local sqldb="$1"; shift
   local sql="$@"

eval "sqlplus  -L -s ${username[$sqldb]}/${password[$sqldb]}@${db[$sqldb]} as sysdba <<-EOF

   set pause off
   set echo off
   set timing off
   set trimspool on
   set verify off
   set  head off

   clear col
   clear break
   clear computes

   btitle ''
   ttitle ''

   btitle off
   ttitle off

   set newpage 1
   set pages 0 lines 200 term on feed off

   $sql;

   exit

EOF
"

}

# execute some sql - no return value expected
# currently the same as getSqlVal, but subject to change
# always runs as sysdba
exeSql () {
   local sqldb="$1"; shift
   local sql="$@"

eval "sqlplus  -L -s ${username[$sqldb]}/${password[$sqldb]}@${db[$sqldb]} as sysdba <<-EOF

   set pause off
   set echo off
   set timing off
   set trimspool on
   set verify off
   set  head off

   clear col
   clear break
   clear computes

   btitle ''
   ttitle ''

   btitle off
   ttitle off

   set newpage 1
   set pages 0 lines 200 term on feed off

   $sql;

   exit

EOF
"

}

cleanup () {
   echo
   echo "!!! $0 Interrupted - cleaning up !!"
   echo
   exeSql db ${sql[end_backup]}
   exit 4
}

exitIfError () {
   local errCode=$?

   if [[ $errCode -ne 0 ]]; then
      echo
      echo Error Encountered - exiting
      echo
      cleanup
   fi
}

# precreate file names on s3
# then sleep a bit to ensure it is available
s3Touch () {
   local file2touch="$@"

   touch $file2touch

   declare rc=$?

   [[ $rc -ne 0 ]] && {
      echo
      echo "touch failed with $rc (s3Touch)"
      echo
   }

   sleep $touchSleep

}

trap "cleanup" INT
trap "cleanup" TERM
trap "cleanup" ABRT
trap "cleanup" QUIT

# ensure that s3fs is mounted
/bin/mount | grep $s3Mount
[[ $? -ne 0 ]] && {

   echo
   echo "! $s3Mount is not Mounted !"
   echo
   exit 5

}


for dir2create in ${targetDirs[@]}
do
   newdir $dir2create
done


unset ORAENV_ASK

echo SID: ${sid[ora]}

. oraenv <<< ${sid[ora]}

scn[min]=$(getSqlVal db ${sql[minscn]})
exitIfError $?

# begin backup mode - get a list of datafiles

exeSql db ${sql[begin_backup]}

exitIfError $?

sqlplus -L -s ${username[db]}/${password[db]}@${db[db]} as sysdba <<-EOF

   set echo off term  on pause off verify off
   set feed off head off
   set linesize 1000 trimspool on
   col name_data format a1000

   spool ${inventory[datafiles]}
   select substr(file_name,instr(file_name,'/',-1)+1) ||':'|| file_name name_data from dba_data_files;
   spool off

   exit;

EOF

. oraenv <<< ${sid[asm]}

# backup the datafiles

for df in $(grep -v '^$' ${inventory[datafiles]} )
do
   echo Datafile: $df

   declare shortName=$(echo $df | cut -f1 -d:)
   declare fullName=$(echo $df | cut -f2 -d:)

   if [[ $skipDatafileBackup -ne 0 ]]; then
      echo "Skipping Datafile: $fullName"
   else
      # asmcmd cp expects the file to appear immediately after creating
      # and that is not occuring with s3
      # use 'touch' to precreate the file
      s3Touch "${targetDirs[datafiles]}/${shortName}"

      asmcmd cp "$fullName" ${targetDirs[datafiles]}
   fi
done

. oraenv <<< ${sid[ora]}


# end backup mode
exeSql db ${sql[end_backup]}

exitIfError $?

# switch logfile
exeSql cdb 'alter system switch logfile'

scn[max]=$(getSqlVal db ${sql[maxscn]})
exitIfError $?

echo min scn: ${scn[min]}
echo max scn: ${scn[max]}

. oraenv <<< ${sid[ora]}

# get list of archivelogs

sqlplus -L -s ${username[db]}/${password[db]}@${db[db]} as sysdba <<-EOF

   set echo off term  on pause off verify off
   set feed off head off
   col name format a200
   set linesize 1000 trimspool on
   col name_data format a1000

   spool ${inventory[archivelogs]}
   select  distinct substr(name,instr(name,'/',-1)+1) ||':'|| name name_date
   from gv\$archived_log
   where first_change# >= (select min(first_change#) min_change from gv\$archived_log where ${scn[min]} between first_change# and next_change#);
   spool off

   exit;

EOF

echo
echo "finished archivelog inventory"
echo

. oraenv <<< ${sid[asm]}

# backup the archivelogs

for archlog in $(grep -v '^$' ${inventory[archivelogs]})
do
   echo archive log: $archlog

   declare shortName=$(echo $archlog | cut -f1 -d:)
   declare fullName=$(echo $archlog | cut -f2 -d:)

   if [[ $skipArchlogBackup -ne 0 ]]; then
      echo "Skipping Archivelog: $fullName"
   else
      s3Touch "${targetDirs[archivelogs]}/${shortName}"
      asmcmd cp "$fullName" ${targetDirs[archivelogs]}
   fi
done

# backup the controlfile
touch ${filenames[controlfile_backup]}
touch ${filenames[controlfile_trace]}

exeSql cdb "${sql[backup_controlfile_tmp]}"
rc=$?
#exitIfError $rc

: <<'ORA-00245'

This error may occur on RAC when making a contfolfile backup:

  ORA-00245: control file backup failed; in Oracle RAC, target might not be on shared storage

As we are not making the backup to shared storage, there is little to be done about it.

A trace controlfile backup is also made

ORA-00245

if [[ $rc -eq 0 ]]; then
   cp ${filenames[controlfile_backup_tmp]} ${filenames[controlfile_backup]}
else
   echo
   echo Failed to created temp backup controlfile
   echo This is probably due to ORA-00245 on RAC
   echo
   echo This can happen if the backup is on non-shared storage
   echo and the controlfile is currently being written by another instance
   echo
   echo A backup controfile to trace will still be created
   echo
fi

# this may work on S3, but often fails, much as datafiles do if not first 'touched'
# it also fails due to 'write past end of file'
# do not backup directly to s3, just copy the tmp backupcontrol file to s3 as seen previously

#exeSql cdb "${sql[backup_controlfile]}"
#exitIfError $?

exeSql cdb "${sql[backup_controlfile_trace]}"
exitIfError $?

# it is assumed that an spfile is in use
# modify accordingly if that is not the case
#touch $Pfilenames[pfile_backup]}
exeSql cdb "${sql[backup_spfile]}"
exitIfError $?

 


World Backup Day – Protect Your Oracle Databases

$
0
0

In recognition of World Backup Day, I will discuss backups as they pertain to Oracle databases.

There are a number of options for backing up, testing backups, and restoring them. In fact, the number of possibilities can be a little overwhelming. Backing up does not necessarily need to be too complex. It is best to start off simple, and then modify the backup procedures as needed.

This article is not a tutorial by any means, but just an overview of some of the methods for backing up your database, and testing those backups. For some readers, this may be a refresher, and for those new to Oracle, it is a brief introduction to some features of the Oracle RMAN utility.

Making a backup can be as simple as the RMAN command backup database. Most backups, however, are more complex than that. Rather than spend much time on backups though, I actually want to talk about testing backups.

Backups that do not work when a restore and recovery attempt is made are the things that nightmares are made of. Making sure the backups are actually useful is something that cannot be overstated.

My interest in backups

First a little background on why I tend to be rather particular about making backups and being certain that a backup can be used to successfully restore a database. The two incidents below are from long ago; pre-Pythian and pre-Oracle (at least for me). The technologies used will make this obvious.

Inventory System

Long ago I was responsible for an inventory system. The fact that this inventory software ran on 8-inch floppy disks will provide clues to a couple of things:

  • it was long ago
  • it could not have been a large inventory (thankfully, as you will see)

One day something happened to the inventory disk. I do not recall just what happened, but I do remember that the entire inventory had to be re-entered.

By hand.

By me.

You would think I would have learned my lesson and start making regular backups. But no. If you have never made backups on 8-inch floppy disks, you do not really understand the meaning of the word ‘tedium’.

But of course, there was another crash, the disk was trash, and I had to re-enter all of the inventory.

Again.

By hand.

That was powerful lesson. But wait, there’s more!

The Orders System

At a somewhat later stage, I was at a different job, working onsite at a client site where they ran our custom software. My job that day was to upgrade the version of Xenix (Google it!) on the client’s machine. And of course, before doing that I made a backup. This backup was to tape, and so fairly simple to make.

The backup was made via the Unix (Linux not yet invented) tar command, directly to the tape device. Pretty standard stuff for the time. When the backup was complete, I ran ‘tar tvf /dev/tape’ to get a directory listing of the backup as a means of verification. The command was successful, all the correct files were there.

Next steps:

  • format the system drive
  • install later version of Xenix
  • reload backup software and data files

The first two steps were fine. That final, and crucial step, however, was not so good. It seems a colleague from the employer had upgraded the RAM the previous day. One step in that process was missed. Everything appeared to be OK after that. The problem was the memory used by the tape driver was not handled correctly.

Every file on that backup was nothing but garbage, except for the header. The headers were fine, but the contents were not.

This was not a good day, week, or month.

Should I ever seem rather persnickety and nitpicky about backups and restores, there are good reasons for that.

Backups are not that important

I say that only in jest; of course backups are important. But are there some things more important than backups?

Restores: restores are very important. Even more important is the recovery, as the database cannot be opened until the recovery step completes.

How is recovery different than restore? Recovery is the process that brings the restored files to a point where the database can be opened. Often this point is a particular time or SCN (System Change Number) that is specified at the time the database is recovered. This is referred to as an ‘incomplete’ recovery.

A ‘complete recovery’ is one where ALL redo from the archive logs is applied to the datafiles, applying all changes to the database.

Sure, you need backups to do a restore. But if you never test your backups, you may as well just not make them. A backup that cannot be used when a restore is necessary can make life very difficult indeed. You want to do everything you can to protect your data.

There is much more to know about Oracle backups, but for the purpose of discussing the validation of backups, the backup will be kept simple.

Testing Oracle Backups

The best way to test a backup is to restore it.

While this is the ideal, it is not always practical, and sometimes not even possible due to the size of the database. Very few companies are willing to spend the money and time needed to have a server available just for testing a restore of a 152-terabyte database.

The next best thing is to validate the restores. Here, Oracle has you covered with RMAN methods to validate that backups can indeed be restored.

Backup Validate

First on the list is backup validate. I know, I said this would be about restoring databases. However, the backup validate command can help you determine the condition of your database, spotting physical corruption in the database, and if you choose, it can also locate and record logical discrepancies.

The command can be as simple as backup validate check logical database

This will take some time, depending on the size of the database, as Oracle will read all required database files. This operation can be performed online and is non-destructive, as no backup files are created, and no files are altered.

Read more here: Backup Validate

Make a Backup

The following is the RMAN script used to create a backup that is used for some following commands.

run {
  allocate channel ch1 device type DISK;
  allocate channel ch2 device type DISK;
  backup incremental level=0 tag "RMAN-TEST" format '/mnt/oracle-backups/rac19c01/%d_T%T_db_s%s_p%p_t%t' database filesperset 4 plus archivelog tag "RMAN-TEST";
  alter system switch logfile;
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_arch_s%s_p%p_t%t' archivelog all filesperset 4 tag "RMAN-TEST";
  backup format '/mnt/oracle-backups/rac19c01/%d_T%T_cf%s_p%p_t%t' current controlfile tag "RMAN-TEST";
}

Restore Preview

Restore preview is fast, very fast. It is fast because it really doesn’t do much. The preview simply reads the metadata from the backup catalogue. This is useful just to ensure that the catalogue has captured the files necessary for a restore. This command does not read any files, just the metadata.

RMAN> restore preview database;

Starting restore at 2020-03-02 17:30:50
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=300 instance=cdb1 device type=DISK


List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
24      Incr 0  840.32M    DISK        00:00:20     2020-03-02 17:14:57
        BP Key: 24   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s33_p1_t1034010877
  List of Datafiles in backup set 24
  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ------------------- ----------- ------ ----
  1    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/system.259.1021126503
  4    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs1.261.1021126575
  9    0  Incr 80321492   2020-03-02 17:14:37              NO    +DATA/CDB/DATAFILE/undotbs2.274.1021127203

...


List of Archived Log Copies for database with db_unique_name CDB
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - -------------------
810     1    457     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983

812     1    458     A 2020-03-02 17:16:23
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991

813     1    459     A 2020-03-02 17:16:31
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995

811     2    359     A 2020-03-02 17:14:19
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985

814     2    360     A 2020-03-02 17:16:25
        Name: +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997

recovery will be done up to SCN 80321491
Media recovery start SCN is 80321491
Recovery must be done beyond SCN 80321782 to clear datafile fuzziness
Finished restore at 2020-03-02 17:30:52

At the end of the listing, you will find that media recovery must be done beyond SCN 80321782 for a successful recovery.

However, the listing does not include that SCN information. It is there, we just need to look at the archive logs explicitly:

RMAN> list backup of archivelog all;

List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

  List of Archived Logs in backup set 3
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

...

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37

...

We see 80321860, which is well past 80321782 and so we are confident that at least the required files have been recorded in the backup catalogue.

Note: By default, RMAN uses a date formate of ‘YYYY-MON-DD’.

This behavior can be altered by setting the NLS_DATE_FORMAT shell variable:

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

$ rman target /
...

Restore Validate

Does the knowledge that your backup files are recorded in the backup catalogue give you a warm, fuzzy feeling of contentment?

It shouldn’t. A database cannot be restored with metadata alone. Not knowing that backups have been tested is the kind of thing that should keep a DBA awake at night.

Knowing that you need proper rest, let’s see another method to test backups, restore validate.

RMAN can validate the files used for a restore. This is not just a test to see if the files are available, but a read of all blocks in the backup files to ensure they are valid.

The following is an example, using the backup that was made previously:

RMAN> restore validate database;

Starting restore at 2020-03-02 17:52:37
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s32_p1_t1034010877 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01

...

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975
channel ORA_DISK_1: piece handle=/mnt/oracle-backups/rac19c01/CDB_T20200302_db_s43_p1_t1034010975 tag=RMAN-TEST
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:07
Finished restore at 2020-03-02 17:54:11

This is a small test database; the validation required only about two minutes.

It’s okay to get some warm and fuzzies now, but don’t get too cozy. There are other types of files that must be restored for a recovery to take place.

Let’s check on the archive log files. We know that all the required archive logs were created after 13:00 today, as the backup was made at about 17:14, and the alert log shows several log switches between 17:00 and 17:14

2020-03-02T17:07:24.248403-08:00
Thread 1 advanced to log sequence 455 (LGWR switch)
  Current log# 1 seq# 455 mem# 0: +DATA/CDB/ONLINELOG/group_1.265.1021126651
  Current log# 1 seq# 455 mem# 1: +DATA/CDB/ONLINELOG/group_1.268.1021126653
2020-03-02T17:07:24.320618-08:00
ARC2 (PID:30093): Archived Log entry 805 added for T-1.S-454 ID 0x7f6be175 LAD:1

So now we can validate that archive logs can be read:

RMAN>  restore validate archivelog  from time "to_date('2020-03-02 17:00:00','yyyy-mm-dd hh24:mi:ss')";

Starting restore at 2020-03-02 18:09:58
using channel ORA_DISK_1

channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_452.341.1034010319
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_453.349.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_454.352.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_455.351.1034010445
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_456.347.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_457.346.1034010983
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_458.342.1034010991
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_1_seq_459.340.1034010995
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_355.353.1034010317
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_356.319.1034010441
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_357.350.1034010447
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_358.348.1034010859
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_359.344.1034010985
channel ORA_DISK_1: scanning archived log +DATA/CDB/ARCHIVELOG/2020_03_02/thread_2_seq_360.339.1034010997
Finished restore at 2020-03-02 18:10:12

Let’s also see if the controlfile is there:

RMAN>  restore validate controlfile;

Starting restore at 2020-03-02 18:13:09
using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile backup set
channel ORA_DISK_1: reading from backup piece +DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019
channel ORA_DISK_1: piece handle=+DATA/CDB/AUTOBACKUP/2020_03_02/s_1034011019.338.1034011019 tag=TAG20200302T171659
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
Finished restore at 2020-03-02 18:13:11

Finally, a list of the backup by tag (you do use tags, don’t you?) can be used to show that all files are present:

RMAN> list backupset tag "RMAN-TEST";


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
3       325.40M    DISK        00:00:06     2020-03-02 17:05:26
        BP Key: 3   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_db_s11_p1_t1034010320

...

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
45      Full    19.08M     DISK        00:00:01     2020-03-02 17:16:58
        BP Key: 45   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_cf54_p1_t1034011017
  Control File Included: Ckp SCN: 80322102     Ckp time: 2020-03-02 17:16:57

Now you can rest easy, as the data files, archivelogs and controlfile are all present in the backup made earlier.

The only way to be more sure of the backup is to use it to restore and recover the database.

Docs here: RESTORE

Recover Validate Header

While recover validate may sound like restore validate on steroids, it really isn’t.

According to the documentation, this command is nearly the same as restore preview. The difference is that when recover validate is used, the file headers are read to ensure they match what is found in the backup catalogue.

This would provide a little more assurance than the restore preview, but not as much as the restore validate.

The problem here is that in this 19.3 database, recover database validate header is examining only the archive log backups:

MAN> recover database validate header;

Starting recover at 2020-03-02 18:43:11
using channel ORA_DISK_1
using channel ORA_DISK_1


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
40      173.87M    DISK        00:00:04     2020-03-02 17:16:49
        BP Key: 40   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s49_p1_t1034011005

  List of Archived Logs in backup set 40
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    449     80240390   2020-03-02 12:18:51 80261546   2020-03-02 13:39:51

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
38      199.81M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 38   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s47_p1_t1034010998

  List of Archived Logs in backup set 38
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    354     80240394   2020-03-02 12:18:51 80302690   2020-03-02 16:22:36
  1    450     80261546   2020-03-02 13:39:51 80282070   2020-03-02 15:00:03

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
39      174.02M    DISK        00:00:05     2020-03-02 17:16:43
        BP Key: 39   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s48_p1_t1034010998

  List of Archived Logs in backup set 39
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    451     80282070   2020-03-02 15:00:03 80302686   2020-03-02 16:22:34

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
41      151.39M    DISK        00:00:05     2020-03-02 17:16:50
        BP Key: 41   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s50_p1_t1034011005

  List of Archived Logs in backup set 41
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    452     80302686   2020-03-02 16:22:34 80318226   2020-03-02 17:05:18
  2    355     80302690   2020-03-02 16:22:36 80318223   2020-03-02 17:05:17

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
44      4.32M      DISK        00:00:00     2020-03-02 17:16:54
        BP Key: 44   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s53_p1_t1034011014

  List of Archived Logs in backup set 44
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  2    356     80318223   2020-03-02 17:05:17 80318660   2020-03-02 17:07:21
  1    453     80318226   2020-03-02 17:05:18 80318656   2020-03-02 17:07:20
  1    454     80318656   2020-03-02 17:07:20 80318683   2020-03-02 17:07:24
  2    357     80318660   2020-03-02 17:07:21 80318695   2020-03-02 17:07:27

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
42      15.46M     DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 42   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s51_p1_t1034011013

  List of Archived Logs in backup set 42
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    455     80318683   2020-03-02 17:07:24 80318691   2020-03-02 17:07:24
  1    456     80318691   2020-03-02 17:07:24 80321365   2020-03-02 17:14:19
  2    358     80318695   2020-03-02 17:07:27 80321362   2020-03-02 17:14:19
  2    359     80321362   2020-03-02 17:14:19 80321860   2020-03-02 17:16:25

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
43      5.46M      DISK        00:00:00     2020-03-02 17:16:53
        BP Key: 43   Status: AVAILABLE  Compressed: NO  Tag: RMAN-TEST
        Piece Name: /mnt/oracle-backups/rac19c01/CDB_T20200302_arch_s52_p1_t1034011013

  List of Archived Logs in backup set 43
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    457     80321365   2020-03-02 17:14:19 80321856   2020-03-02 17:16:23
  1    458     80321856   2020-03-02 17:16:23 80321897   2020-03-02 17:16:31
  2    360     80321860   2020-03-02 17:16:25 80321989   2020-03-02 17:16:37
  1    459     80321897   2020-03-02 17:16:31 80321912   2020-03-02 17:16:34
validation succeeded for backup piece
recovery will be done up to SCN 2163919
Media recovery start SCN is 2163919
Recovery must be done beyond SCN 80321989 to clear datafile fuzziness
Finished recover at 2020-03-02 18:43:13

At this time I do not know if this is due to a bug in the code, or a bug in the documentation.

Either way, restore validate gives me the kind of assurance I need that my backups can be restored.

Learn more about Pythian’s Oracle Services.

101 Series of Oracle in Google Cloud – Part I : Building ASM and Database

$
0
0

About a year ago, I worked on a project with some amazing teammates (Simon Pane and Karun Dutt) to collaborate on a POC to move their Oracle environment into Google Cloud. This series reflects some of the things I learned while working on this project, and provides a guide for testing Oracle in Google Cloud.

Note: that I am not a licensing expert, so before you do this, make sure you consult with your Google/Oracle Sales representative to be clear about what’s allowed and the costs involved.

Because this is a “101” series, let’s start by logging into Google Cloud. By now you should already have an account in cloud.google.com and installed the Google Cloud SDK. Because I use a Mac, I downloaded it from this location. Then, I logged into my Google Cloud account:

Renes-MacBook-Pro-2:~ rene$ gcloud auth login **********@*****.com

Because I’m using Google Free Tier, there are limitations on disk sizes, instance type, quotas, etc. These are the specifications I used:

  • Google Cloud instance name: rene-ace-test-inst1
  • n1-standard-8 VM which has 8 vCPU and 30GB of memory
  • Centos 7
  • 1 150GB SSD disk for ASM, disk group name: DATA
  • 1 100GB HD disk for the boot disk
  • Default subnet network
  • No quotas
  • Oracle 19.3 for OHAS and RDBMS
  • Oracle DB instance name: DBTEST

First, create the disk to use for ASM. In this case, I used a 150GB SSD disk:

Renes-MacBook-Pro-2:~ rene$ gcloud compute disks create rene-ace-disk-asm1 \
>    --project=oracle-migration \
>    --type=pd-ssd \
>    --size=150GB \
>    --labels=item=rene-ace \
>    --zone=us-central1-c
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/disks/rene-ace-disk-asm1].
NAME                ZONE           SIZE_GB  TYPE    STATUS
rene-ace-disk-asm1  us-central1-c  150      pd-ssd  READY

Create a 100GB HD disk for the boot disk with Centos 7:

Renes-MacBook-Pro-2:~ rene$ gcloud compute disks create rene-ace-inst1-boot-disk \
> --project=oracle-migration \
> --type=pd-standard \
> --size=100GB \
> --zone=us-central1-c \
> --image=centos-7-v20200309 \
> --image-project=centos-cloud 
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/disks/rene-ace-inst1-boot-disk].
WARNING: Some requests generated warnings:
 - Disk size: '100 GB' is larger than image size: '10 GB'. You might need to resize the root repartition manually if the operating system does not support automatic resizing. See https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd for details.

NAME                      ZONE           SIZE_GB  TYPE         STATUS
rene-ace-inst1-boot-disk  us-central1-c  100      pd-standard  READY

After creating the ASM disk, create the VM with the boot disk that you just created:

Renes-MacBook-Pro-2:~ rene$ gcloud compute instances create rene-ace-test-inst1 \
>    --project=oracle-migration \
>    --zone=us-central1-c \
>    --machine-type=n1-standard-8 \
>    --subnet=default \
>    --network-tier=PREMIUM \
>    --no-restart-on-failure \
>    --maintenance-policy=TERMINATE \
>    --no-service-account \
>    --no-scopes \
>    --disk=name=rene-ace-inst1-boot-disk,device-name=rene-ace-inst1-boot-disk,mode=rw,boot=yes,auto-delete=yes \
>    --tags=allow-ssh,egress-nat-gce \
>    --labels=item=rene-ace
Created [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/instances/rene-ace-test-inst1].
NAME                 ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
rene-ace-test-inst1  us-central1-c  n1-standard-8               10.***.***.***   34.***.***.***  RUNNING

Attach the disk to use for ASM:

Renes-MacBook-Pro-2:~ rene$ gcloud compute instances attach-disk rene-ace-test-inst1 \
>    --disk=rene-ace-disk-asm1 \
>    --device-name=rene-ace-disk-asm1 \
>    --mode=rw \
>    --zone=us-central1-c
Updated [https://www.googleapis.com/compute/v1/projects/oracle-migration/zones/us-central1-c/instances/rene-ace-test-inst1].
Renes-MacBook-Pro-2:~ rene$ gcloud compute ssh rene-ace-test-inst1 --zone=us-central1-c
Updating project ssh metadata...?Updated [https://www.googleapis.com/compute/v1/projects/oracle-migration].                                                                            
Updating project ssh metadata...done.                                                                                                                                                  
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5519028484974783249' (ECDSA) to the list of known hosts.
/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[rene@rene-ace-test-inst1 ~]$ 

Unless otherwise mentioned, perform all steps as the root user. Now, install the 19c pre-install RPM:

[rene@rene-ace-test-inst1 ~]$ sudo su -
[root@rene-ace-test-inst1 ~]# curl -o ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm \
>    https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/getPackage/oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18204  100 18204    0     0  55201      0 --:--:-- --:--:-- --:--:-- 55331
[root@rene-ace-test-inst1 ~]# yum -y localinstall ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
Loaded plugins: fastestmirror
Examining ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm: oracle-database-preinstall-19c-1.0-1.el7.x86_64
Marking ./oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package oracle-database-preinstall-19c.x86_64 0:1.0-1.el7 will be installed
--> Processing Dependency: bc for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
Determining fastest mirrors
epel/x86_64/metalink                                                                                                                                             |  18 kB  00:00:00     
 * base: us.mirror.nsec.pt
 * epel: mirror.steadfastnet.com
 * extras: mirrors.gigenet.com
 * updates: bay.uchicago.edu
base                                                                                                                                                             | 3.6 kB  00:00:00     
epel                                                                                                                                                             | 4.7 kB  00:00:00     
extras                                                                                                                                                           | 2.9 kB  00:00:00     
google-cloud-sdk/signature                                                                                                                                       |  454 B  00:00:00     
google-cloud-sdk/signature                                                                                                                                       | 1.4 kB  00:00:00 !!! 
google-compute-engine/signature                                                                                                                                  |  454 B  00:00:00     
google-compute-engine/signature                                                                                                                                  | 1.4 kB  00:00:00 !!! 
updates                                                                                                                                                          | 2.9 kB  00:00:00     
(1/9): epel/x86_64/group_gz                                                                                                                                      |  95 kB  00:00:00     
(2/9): epel/x86_64/updateinfo                                                                                                                                    | 1.0 MB  00:00:00     
(3/9): base/7/x86_64/group_gz                                                                                                                                    | 165 kB  00:00:00     
(4/9): extras/7/x86_64/primary_db                                                                                                                                | 164 kB  00:00:00     
(5/9): epel/x86_64/primary_db                                                                                                                                    | 6.7 MB  00:00:00     
(6/9): base/7/x86_64/primary_db                                                                                                                                  | 6.0 MB  00:00:00     
(7/9): google-cloud-sdk/primary                                                                                                                                  | 168 kB  00:00:00     
(8/9): google-compute-engine/primary                                                                                                                             | 3.9 kB  00:00:00     
(9/9): updates/7/x86_64/primary_db                                                                                                                               | 7.6 MB  00:00:00     
google-cloud-sdk                                                                                                                                                              1224/1224
google-compute-engine                                                                                                                                                             11/11
--> Processing Dependency: bind-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: compat-libcap1 for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: compat-libstdc++-33 for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: glibc-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: ksh for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libaio for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libaio-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: libstdc++-devel for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: nfs-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: psmisc for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: smartmontools for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: sysstat for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: unzip for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: xorg-x11-utils for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Processing Dependency: xorg-x11-xauth for package: oracle-database-preinstall-19c-1.0-1.el7.x86_64
--> Running transaction check
---> Package bc.x86_64 0:1.06.95-13.el7 will be installed
---> Package bind-utils.x86_64 32:9.11.4-9.P2.el7 will be installed
--> Processing Dependency: bind-libs-lite(x86-64) = 32:9.11.4-9.P2.el7 for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: bind-libs(x86-64) = 32:9.11.4-9.P2.el7 for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: liblwres.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libisccfg.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libisc.so.169()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libirs.so.160()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
--> Processing Dependency: libdns.so.1102()(64bit) for package: 32:bind-utils-9.11.4-9.P2.el7.x86_64
...
Installed:
  oracle-database-preinstall-19c.x86_64 0:1.0-1.el7                                                                                                                                     

Dependency Installed:
  GeoIP.x86_64 0:1.5.0-14.el7                  bc.x86_64 0:1.06.95-13.el7           bind-libs.x86_64 32:9.11.4-9.P2.el7                     bind-libs-lite.x86_64 32:9.11.4-9.P2.el7 
  bind-license.noarch 32:9.11.4-9.P2.el7       bind-utils.x86_64 32:9.11.4-9.P2.el7 compat-libcap1.x86_64 0:1.10-7.el7                      compat-libstdc++-33.x86_64 0:3.2.3-72.el7
  geoipupdate.x86_64 0:2.5.0-1.el7             glibc-devel.x86_64 0:2.17-292.el7    glibc-headers.x86_64 0:2.17-292.el7                     gssproxy.x86_64 0:0.7.0-26.el7           
  kernel-headers.x86_64 0:3.10.0-1062.18.1.el7 keyutils.x86_64 0:1.5.8-3.el7        ksh.x86_64 0:20120801-140.el7_7                         libICE.x86_64 0:1.0.9-9.el7              
  libSM.x86_64 0:1.2.2-2.el7                   libX11.x86_64 0:1.6.7-2.el7          libX11-common.noarch 0:1.6.7-2.el7                      libXau.x86_64 0:1.0.8-2.1.el7            
  libXext.x86_64 0:1.3.3-3.el7                 libXi.x86_64 0:1.7.9-1.el7           libXinerama.x86_64 0:1.1.3-2.1.el7                      libXmu.x86_64 0:1.1.2-2.el7              
  libXrandr.x86_64 0:1.5.1-2.el7               libXrender.x86_64 0:0.9.10-1.el7     libXt.x86_64 0:1.1.5-3.el7                              libXtst.x86_64 0:1.2.3-1.el7             
  libXv.x86_64 0:1.0.11-1.el7                  libXxf86dga.x86_64 0:1.1.4-2.1.el7   libXxf86misc.x86_64 0:1.0.3-7.1.el7                     libXxf86vm.x86_64 0:1.1.4-1.el7          
  libaio.x86_64 0:0.3.109-13.el7               libaio-devel.x86_64 0:0.3.109-13.el7 libbasicobjects.x86_64 0:0.1.1-32.el7                   libcollection.x86_64 0:0.7.0-32.el7      
  libdmx.x86_64 0:1.1.3-3.el7                  libevent.x86_64 0:2.0.21-4.el7       libini_config.x86_64 0:1.3.1-32.el7                     libnfsidmap.x86_64 0:0.25-19.el7         
  libpath_utils.x86_64 0:0.2.1-32.el7          libref_array.x86_64 0:0.1.5-32.el7   libstdc++-devel.x86_64 0:4.8.5-39.el7                   libtirpc.x86_64 0:0.2.4-0.16.el7         
  libverto-libevent.x86_64 0:0.2.5-4.el7       libxcb.x86_64 0:1.13-1.el7           lm_sensors-libs.x86_64 0:3.4.0-8.20160601gitf9185e5.el7 mailx.x86_64 0:12.5-19.el7               
  nfs-utils.x86_64 1:1.3.0-0.65.el7            psmisc.x86_64 0:22.20-16.el7         quota.x86_64 1:4.01-19.el7                              quota-nls.noarch 1:4.01-19.el7           
  rpcbind.x86_64 0:0.2.0-48.el7                smartmontools.x86_64 1:7.0-1.el7_7.1 sysstat.x86_64 0:10.1.5-18.el7_7.1                      tcp_wrappers.x86_64 0:7.6-77.el7         
  unzip.x86_64 0:6.0-20.el7                    xorg-x11-utils.x86_64 0:7.5-23.el7   xorg-x11-xauth.x86_64 1:1.0.9-1.el7                    

Complete!

You must install some additional packages and update what you have already installed:

[root@rene-ace-test-inst1 ~]# yum -y install wget java-11-openjdk gcc gcc-c++
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: us.mirror.nsec.pt
...
Installed:
  gcc.x86_64 0:4.8.5-39.el7              gcc-c++.x86_64 0:4.8.5-39.el7              java-11-openjdk.x86_64 1:11.0.6.10-1.el7_7              wget.x86_64 0:1.14-18.el7_6.1             

Dependency Installed:
  avahi-libs.x86_64 0:0.6.31-19.el7        copy-jdk-configs.noarch 0:3.3-10.el7_5               cpp.x86_64 0:4.8.5-39.el7                  cups-libs.x86_64 1:1.6.3-40.el7             
  dejavu-fonts-common.noarch 0:2.33-6.el7  dejavu-sans-fonts.noarch 0:2.33-6.el7                fontconfig.x86_64 0:2.13.0-4.3.el7         fontpackages-filesystem.noarch 0:1.44-8.el7 
  giflib.x86_64 0:4.1.6-9.el7              java-11-openjdk-headless.x86_64 1:11.0.6.10-1.el7_7  javapackages-tools.noarch 0:3.4.1-11.el7   libfontenc.x86_64 0:1.1.3-3.el7             
  libjpeg-turbo.x86_64 0:1.2.90-8.el7      libmpc.x86_64 0:1.0.1-3.el7                          libxslt.x86_64 0:1.1.28-5.el7              lksctp-tools.x86_64 0:1.0.17-2.el7          
  mpfr.x86_64 0:3.1.1-4.el7                pcsc-lite-libs.x86_64 0:1.8.8-8.el7                  python-javapackages.noarch 0:3.4.1-11.el7  python-lxml.x86_64 0:3.2.1-4.el7            
  ttmkfdir.x86_64 0:3.0.9-42.el7           tzdata-java.noarch 0:2019c-1.el7                     xorg-x11-font-utils.x86_64 1:7.5-21.el7    xorg-x11-fonts-Type1.noarch 0:7.5-9.el7     

Complete!
[root@rene-ace-test-inst1 ~]# yum -y update
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: us.mirror.nsec.pt
 * epel: mirror.steadfastnet.com
 * extras: mirrors.gigenet.com
 * updates: bay.uchicago.edu
Resolving Dependencies
...
Installed:
  kernel.x86_64 0:3.10.0-1062.18.1.el7                                                                                                                                                  

Updated:
  binutils.x86_64 0:2.27-41.base.el7_7.3      firewalld.noarch 0:0.6.3-2.el7_7.4               firewalld-filesystem.noarch 0:0.6.3-2.el7_7.4  google-cloud-sdk.noarch 0:286.0.0-1    
  kernel-tools.x86_64 0:3.10.0-1062.18.1.el7  kernel-tools-libs.x86_64 0:3.10.0-1062.18.1.el7  kmod.x86_64 0:20-25.el7_7.1                    kmod-libs.x86_64 0:20-25.el7_7.1       
  libicu.x86_64 0:50.2-4.el7_7                python-firewall.noarch 0:0.6.3-2.el7_7.4         python-perf.x86_64 0:3.10.0-1062.18.1.el7      python-requests.noarch 0:2.6.0-9.el7_7 
  rsyslog.x86_64 0:8.24.0-41.el7_7.4          systemd.x86_64 0:219-67.el7_7.4                  systemd-libs.x86_64 0:219-67.el7_7.4           systemd-sysv.x86_64 0:219-67.el7_7.4   

Complete!

[root@rene-ace-test-inst1 ~]# yum clean all
Loaded plugins: fastestmirror
Cleaning repos: base epel extras google-cloud-sdk google-compute-engine updates
Cleaning up list of fastest mirrors

Make sure that there is at least 16GB of swap space:

[root@rene-ace-test-inst1 ~]# dd if=/dev/zero of=/swapfile bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 80.5556 s, 213 MB/s
[root@rene-ace-test-inst1 ~]# mkswap /swapfile
Setting up swapspace version 1, size = 16777212 KiB
no label, UUID=50136cfb-9ab7-4d5a-8ed7-726da89a7ede
[root@rene-ace-test-inst1 ~]# chmod 0600 /swapfile
[root@rene-ace-test-inst1 ~]# swapon /swapfile
[root@rene-ace-test-inst1 ~]# echo "/swapfile          swap            swap    defaults        0 0" >> /etc/fstab
[root@rene-ace-test-inst1 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@rene-ace-test-inst1 ~]# echo "transparent_hugepage=never" >> /etc/default/grub
[root@rene-ace-test-inst1 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-1062.18.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.18.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1062.12.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-1062.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-1062.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-2a464dc72a01028f0cd989f1d687b484
Found initrd image: /boot/initramfs-0-rescue-2a464dc72a01028f0cd989f1d687b484.img
done

Disable the firewall and change SELINUX to permissive:

[root@rene-ace-test-inst1 ~]# systemctl stop firewalld
[root@rene-ace-test-inst1 ~]# systemctl disable firewalld
[root@rene-ace-test-inst1 ~]# sed -i.bak '/^SELINUX=/ s/SELINUX=.*/SELINUX=permissive/' /etc/selinux/config^C
[root@rene-ace-test-inst1 ~]# cat /etc/selinux/config | grep SELINUX
# SELINUX= can take one of these three values:
SELINUX=permissive
# SELINUXTYPE= can take one of three values:
SELINUXTYPE=targeted 

Next, enable hugepages by changing the value in /etc/sysctl.conf and restarting the VM:

[root@rene-ace-test-inst1 ~]# echo "vm.nr_hugepages=4096" >> /etc/sysctl.conf
[root@rene-ace-test-inst1 ~]# shutdown -r now
Connection to 34.66.112.87 closed by remote host.
Connection to 34.66.112.87 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Renes-MacBook-Pro-2:~ rene$ gcloud compute ssh rene-ace-test-inst1 --zone=us-central1-c
Last login: Tue Mar 31 01:31:10 2020 from 189.216.207.2
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Tue Mar 31 01:31:21 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Create the directories for the binaries, the additional OS groups, and update the bash profile:

[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/19.0.0.0/grid
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle/product/19.0.0.0/dbhome_1
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oraInventory
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle/fast_recovery_area
[root@rene-ace-test-inst1 ~]# mkdir -p /u01/app/oracle_software
[root@rene-ace-test-inst1 ~]# mkdir -p /home/oracle/working
[root@rene-ace-test-inst1 ~]# chown -R oracle:oinstall /u01
[root@rene-ace-test-inst1 ~]# chown -R oracle:oinstall /home/oracle/working
[root@rene-ace-test-inst1 ~]# chmod -R 775 /u01

[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# cat <<EOT >> /home/oracle/.bash_profile
>     
> #  Oracle RDBMS Settings:
> export ORACLE_SID=ORCL
> export ORACLE_BASE=/u01/app/oracle
> export ORACLE_HOME=\${ORACLE_BASE}/product/19.0.0.0/dbhome_1
> export GRID_HOME=/u01/app/19.0.0.0/grid
> export PATH=\${ORACLE_HOME}/bin:\${PATH}
> export SWLIB=/u01/app/oracle_software
> EOT
[root@rene-ace-test-inst1 ~]# groupadd -g 54327 asmdba
[root@rene-ace-test-inst1 ~]# groupadd -g 54328 asmoper
[root@rene-ace-test-inst1 ~]# groupadd -g 54329 asmadmin
[root@rene-ace-test-inst1 ~]# usermod -u 54321 -g oinstall -G dba,asmadmin,asmdba,asmoper oracle
[root@rene-ace-test-inst1 ~]# 
[root@rene-ace-test-inst1 ~]# echo "umask 022" >> /home/oracle/.bashrc

Log in as the oracle use and download the latest OPatch using Maris Elsins getMOSpatch:

[rene@rene-ace-test-inst1 ~]$ sudo su - oracle
Last login: Tue Mar 31 01:09:04 UTC 2020 on pts/0
[oracle@rene-ace-test-inst1 ~]$ cd ${SWLIB}
[oracle@rene-ace-test-inst1 oracle_software]$ wget https://github.com/MarisElsins/getMOSPatch/raw/master/getMOSPatch.jar

...
(raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8932 (8.7K) [application/octet-stream]
Saving to: ‘getMOSPatch.jar’

100%[==============================================================================================================================================>] 8,932       --.-K/s   in 0s      

2020-03-31 01:46:46 (80.6 MB/s) - ‘getMOSPatch.jar’ saved [8932/8932]

[oracle@rene-ace-test-inst1 oracle_software]$ java -jar getMOSPatch.jar patch=6880880 platform=226P regexp=.*190.* download=all
Enter your MOS username: ********@*********.com
Enter your MOS password: 
|
Enter Comma separated platforms to list: 226P

We're going to download patches for the following Platforms/Languages:
 226P - Linux x86-64

Processing patch 6880880 for Linux x86-64 and applying regexp .*190.* to the filenames:
 1 - p6880880_190000_Linux-x86-64.zip
 Enter Comma separated files to download: all
 All files will be downloadad because download=all was specified.

Downloading all selected files:
 Downloading p6880880_190000_Linux-x86-64.zip: 110MB at average speed of 16744KB/s - DONE!  

Download Oracle base software using wget from edelivery.oracle.com. Download files V982063-01.zip (ORACLE RDBMS) and V982068-01.zip (ORACLE GI). You should have the following files:

[oracle@rene-ace-test-inst1 oracle_software]$ ls -ltr
total 5931508
-rw-r--r--. 1 oracle oinstall 3059705302 Apr 23  2019 V982063-01.zip
-rw-r--r--. 1 oracle oinstall 2889184573 Apr 23  2019 V982068-01.zip
-rw-r--r--. 1 oracle oinstall  115653541 Mar 31 01:49 p6880880_190000_Linux-x86-64.zip

Unzip the files to the corresponding location and update the OPatch:

[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/V982063-01.zip -d ${ORACLE_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/V982068-01.zip -d ${GRID_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ rm -rf ${ORACLE_HOME}/OPatch
[oracle@rene-ace-test-inst1 oracle_software]$ rm -rf ${GRID_HOME}/OPatch
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/p6880880_190000_Linux-x86-64.zip -d ${ORACLE_HOME}
[oracle@rene-ace-test-inst1 oracle_software]$ unzip -q ${SWLIB}/p6880880_190000_Linux-x86-64.zip -d ${GRID_HOME}

As the root user, install the cvuqdisk:

[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Tue Mar 31 01:35:45 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# cd /u01/app/19.0.0.0/grid/cv/rpm
[root@rene-ace-test-inst1 rpm]# rpm -qi cvuqdisk
package cvuqdisk is not installed
[root@rene-ace-test-inst1 rpm]# CVUQDISK_GRP=oinstall; export CVUQDISK_GRP
[root@rene-ace-test-inst1 rpm]# rpm -iv cvuqdisk-*.rpm
Preparing packages...
cvuqdisk-1.0.10-1.x86_64

As the oracle user, run cluvfy pre crsinst to make sure there is nothing missing so that you can successfully install OHAS and ASM. Because this is not a RAC environment, you only need to use ASM.

[oracle@rene-ace-test-inst1 ~]$ cd ${GRID_HOME}
[oracle@rene-ace-test-inst1 grid]$ ./runcluvfy.sh stage -pre crsinst -n `hostname -s`

ERROR: 
PRVG-10467 : The default Oracle Inventory group could not be determined.

Verifying Physical Memory ...PASSED
Verifying Available Physical Memory ...PASSED
Verifying Swap Size ...PASSED
Verifying Free Space: rene-ace-test-inst1:/usr,rene-ace-test-inst1:/var,rene-ace-test-inst1:/etc,rene-ace-test-inst1:/sbin,rene-ace-test-inst1:/tmp ...PASSED
Verifying User Existence: oracle ...
  Verifying Users With Same UID: 54321 ...PASSED
Verifying User Existence: oracle ...PASSED
...
CVU operation performed:      stage -pre crsinst
Date:                         Apr 4, 2020 2:09:38 PM
CVU home:                     /u01/app/19.0.0.0/grid/
User:                         oracle

Modify the response file so that you can use the GI software as a CRS_SWONLY installation:

[oracle@rene-ace-test-inst1 grid]$ cp ${GRID_HOME}/inventory/response/grid_install.rsp ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.option/ s~oracle.install.option=$~oracle.install.option=CRS_SWONLY~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOSTNAME/ s~ORACLE_HOSTNAME=$~ORACLE_HOSTNAME=`hostname -A`~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^INVENTORY_LOCATION/ s~INVENTORY_LOCATION=$~INVENTORY_LOCATION=/u01/app/oraInventory~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_BASE/ s~ORACLE_BASE=$~ORACLE_BASE='${ORACLE_BASE}'~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSDBA/ s~oracle.install.asm.OSDBA=$~oracle.install.asm.OSDBA=asmdba~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSOPER/ s~oracle.install.asm.OSOPER=$~oracle.install.asm.OSOPER=asmoper~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.asm.OSASM/ s~oracle.install.asm.OSASM=$~oracle.install.asm.OSASM=asmadmin~' ${SWLIB}/grid_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ diff ${GRID_HOME}/inventory/response/grid_install.rsp ${SWLIB}/grid_install.rsp
60c60
< INVENTORY_LOCATION=
---
> INVENTORY_LOCATION=/u01/app/oraInventory
75c75
< oracle.install.option=
---
> oracle.install.option=CRS_SWONLY
80c80
< ORACLE_BASE=
---
> ORACLE_BASE=/u01/app/oracle
98c98
< oracle.install.asm.OSDBA=
---
> oracle.install.asm.OSDBA=asmdba
105c105
< oracle.install.asm.OSOPER=
---
> oracle.install.asm.OSOPER=asmoper
111c111
< oracle.install.asm.OSASM=
---
> oracle.install.asm.OSASM=asmadmin

Launch the software installation:

[oracle@rene-ace-test-inst1 grid]$ ${GRID_HOME}/gridSetup.sh -silent -responseFile ${SWLIB}/grid_install.rsp
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/19.0.0.0/grid/install/response/grid_2020-04-04_02-12-20PM.rsp

You can find the log of this install session at:
 /tmp/GridSetupActions2020-04-04_02-12-20PM/gridSetupActions2020-04-04_02-12-20PM.log

As a root user, execute the following script(s):
	1. /u01/app/oraInventory/orainstRoot.sh
	2. /u01/app/19.0.0.0/grid/root.sh

Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes: 
[rene-ace-test-inst1]
Execute /u01/app/19.0.0.0/grid/root.sh on the following nodes: 
[rene-ace-test-inst1]

Successfully Setup Software.
Moved the install session logs to:
 /u01/app/oraInventory/logs/GridSetupActions2020-04-04_02-12-20PM

When the software installation is complete, run orainstRoot.sh and root.sh as root user. Also, run roothas.pl to setup the HAS stack:

[rene@rene-ace-test-inst1 ~]$ sudo su -
Last login: Sat Apr  4 14:06:06 UTC 2020 on pts/0
[root@rene-ace-test-inst1 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@rene-ace-test-inst1 ~]# /u01/app/19.0.0.0/grid/root.sh
Check /u01/app/19.0.0.0/grid/install/root_rene-ace-test-inst1_2020-04-04_14-15-37-955415144.log for the output of root script

[root@rene-ace-test-inst1 ~]# export GRID_HOME=/u01/app/19.0.0.0/grid
[root@rene-ace-test-inst1 ~]# ${GRID_HOME}/perl/bin/perl -I ${GRID_HOME}/perl/lib -I ${GRID_HOME}/crs/install ${GRID_HOME}/crs/install/roothas.pl
Using configuration parameter file: /u01/app/19.0.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/rene-ace-test-inst1/crsconfig/roothas_2020-04-04_02-16-37PM.log
LOCAL ADD MODE 
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE 
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node rene-ace-test-inst1 successfully pinned.
2020/04/04 14:16:47 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

rene-ace-test-inst1     2020/04/04 14:17:24     /u01/app/oracle/crsdata/rene-ace-test-inst1/olr/backup_20200404_141724.olr     724960844     
2020/04/04 14:17:24 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
[root@rene-ace-test-inst1 ~]# sleep 10

[root@rene-ace-test-inst1 ~]# ${GRID_HOME}/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
               OFFLINE OFFLINE      rene-ace-test-inst1      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        OFFLINE OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       rene-ace-test-inst1      STABLE
--------------------------------------------------------------------------------

As the root user, prepare the disks for ASM using UDEV rules:

[root@rene-ace-test-inst1 ~]# if [ ! -e /dev/sdb1 ]; then
>   echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdb
> fi
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xae81dcbe.

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): Partition number (1-4, default 1): First sector (2048-314572799, default 2048): Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-314572799, default 314572799): Using default value 314572799
Partition 1 of type Linux and of size 150 GiB is set

Command (m for help): The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@rene-ace-test-inst1 ~]#  
[root@rene-ace-test-inst1 ~]# ASM_DISK1=`/usr/lib/udev/scsi_id -g -u -d /dev/sdb`
[root@rene-ace-test-inst1 ~]#  
[root@rene-ace-test-inst1 ~]# cat > /etc/udev/rules.d/99-oracle-asmdevices.rules <<EOF
> KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/\$parent", RESULT=="${ASM_DISK1}", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
> EOF
[root@rene-ace-test-inst1 ~]# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0Google_PersistentDisk_rene-ace-disk-asm1", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
[root@rene-ace-test-inst1 ~]# /sbin/partprobe /dev/sdb1
[root@rene-ace-test-inst1 ~]# sleep 10

[root@rene-ace-test-inst1 ~]# /sbin/udevadm control --reload-rules
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# /sbin/partprobe /dev/sdb1
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# /sbin/udevadm control --reload-rules
[root@rene-ace-test-inst1 ~]# sleep 10
[root@rene-ace-test-inst1 ~]# ls -al /dev/oracleasm/*
lrwxrwxrwx. 1 root root 7 Apr  4 14:20 /dev/oracleasm/asm-disk1 -> ../sdb1

As the oracle user, prepare the RDBMS install software response file:

[oracle@rene-ace-test-inst1 grid]$ cp ${ORACLE_HOME}/install/response/db_install.rsp ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.option/ s~oracle.install.option=$~oracle.install.option=INSTALL_DB_SWONLY~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.InstallEdition/ s~oracle.install.db.InstallEdition=$~oracle.install.db.InstallEdition=EE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOSTNAME/ s~ORACLE_HOSTNAME=$~ORACLE_HOSTNAME=`hostname -A`~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^UNIX_GROUP_NAME/ s~UNIX_GROUP_NAME=$~UNIX_GROUP_NAME=oinstall~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^INVENTORY_LOCATION/ s~INVENTORY_LOCATION=$~INVENTORY_LOCATION=/u01/app/oraInventory~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_HOME/ s~ORACLE_HOME=$~ORACLE_HOME='${ORACLE_HOME}'~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^ORACLE_BASE/ s~ORACLE_BASE=$~ORACLE_BASE='${ORACLE_BASE}'~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.DBA_GROUP/ s~oracle.install.db.DBA_GROUP=$~oracle.install.db.DBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OPER_GROUP/ s~oracle.install.db.OPER_GROUP=$~oracle.install.db.OPER_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.BACKUPDBA_GROUP/ s~oracle.install.db.BACKUPDBA_GROUP=$~oracle.install.db.BACKUPDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.DGDBA_GROUP/ s~oracle.install.db.DGDBA_GROUP=$~oracle.install.db.DGDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.KMDBA_GROUP/ s~oracle.install.db.KMDBA_GROUP=$~oracle.install.db.KMDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSDBA_GROUP/ s~oracle.install.db.OSDBA_GROUP=$~oracle.install.db.OSDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSOPER_GROUP/ s~oracle.install.db.OSOPER_GROUP=$~oracle.install.db.OSOPER_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSBACKUPDBA_GROUP/ s~oracle.install.db.OSBACKUPDBA_GROUP=$~oracle.install.db.OSBACKUPDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSDGDBA_GROUP/ s~oracle.install.db.OSDGDBA_GROUP=$~oracle.install.db.OSDGDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSKMDBA_GROUP/ s~oracle.install.db.OSKMDBA_GROUP=$~oracle.install.db.OSKMDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^oracle.install.db.OSRACDBA_GROUP/ s~oracle.install.db.OSRACDBA_GROUP=$~oracle.install.db.OSRACDBA_GROUP=dba~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^SECURITY_UPDATES_VIA_MYORACLESUPPORT/ s~SECURITY_UPDATES_VIA_MYORACLESUPPORT$=~SECURITY_UPDATES_VIA_MYORACLESUPPORT=FALSE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$ sed -i '/^DECLINE_SECURITY_UPDATES/ s~DECLINE_SECURITY_UPDATES=$~DECLINE_SECURITY_UPDATES=TRUE~' ${SWLIB}/db_install.rsp
[oracle@rene-ace-test-inst1 grid]$   
[oracle@rene-ace-test-inst1 grid]$ diff ${ORACLE_HOME}/install/response/db_install.rsp ${SWLIB}/db_install.rsp
29c29
< oracle.install.option=
---
> oracle.install.option=INSTALL_DB_SWONLY
34c34
< UNIX_GROUP_NAME=
---
> UNIX_GROUP_NAME=oinstall
41c41
< INVENTORY_LOCATION=
---
> INVENTORY_LOCATION=/u01/app/oraInventory
45c45
< ORACLE_HOME=
---
> ORACLE_HOME=/u01/app/oracle/product/19.0.0.0/dbhome_1
50c50
< ORACLE_BASE=
---
> ORACLE_BASE=/u01/app/oracle
62c62
< oracle.install.db.InstallEdition=
---
> oracle.install.db.InstallEdition=EE
79c79
< oracle.install.db.OSDBA_GROUP=
---
> oracle.install.db.OSDBA_GROUP=dba
85c85
< oracle.install.db.OSOPER_GROUP=
---
> oracle.install.db.OSOPER_GROUP=dba
90c90
< oracle.install.db.OSBACKUPDBA_GROUP=
---
> oracle.install.db.OSBACKUPDBA_GROUP=dba
95c95
< oracle.install.db.OSDGDBA_GROUP=
---
> oracle.install.db.OSDGDBA_GROUP=dba
100c100
< oracle.install.db.OSKMDBA_GROUP=
---
> oracle.install.db.OSKMDBA_GROUP=dba
105c105
< oracle.install.db.OSRACDBA_GROUP=
---
> oracle.install.db.OSRACDBA_GROUP=dba

Next, install the RDBMS software:

[oracle@rene-ace-test-inst1 ~]$ ${ORACLE_HOME}/runInstaller -silent -waitforcompletion -responseFile ${SWLIB}/db_install.rsp
Launching Oracle Database Setup Wizard...

The response file for this session can be found at:
 /u01/app/oracle/product/19.0.0.0/dbhome_1/install/response/db_2020-04-04_02-22-21PM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/InstallActions2020-04-04_02-22-21PM/installActions2020-04-04_02-22-21PM.log

As a root user, execute the following script(s):
	1. /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh

Execute /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh on the following nodes: 
[rene-ace-test-inst1]

Successfully Setup Software.

As the root user, run RDBMS root.sh:

[root@rene-ace-test-inst1 ~]#  /u01/app/oracle/product/19.0.0.0/dbhome_1/root.sh
Check /u01/app/oracle/product/19.0.0.0/dbhome_1/install/root_rene-ace-test-inst1_2020-04-04_20-07-58-636555961.log for the output of root script

After the RDBMS software installation completes, setup the listener as the oracle user. Note that from this point forward, all tasks are performed as the oracle user:

[oracle@rene-ace-test-inst1 ~]$ lsnrctl start

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 04-APR-2020 14:39:21

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Starting /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Log messages written to /u01/app/oracle/diag/tnslsnr/rene-ace-test-inst1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rene-ace-test-inst1.us-central1-c.c.oracle-migration.internal)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                04-APR-2020 14:39:22
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /u01/app/oracle/diag/tnslsnr/rene-ace-test-inst1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rene-ace-test-inst1.us-central1-c.c.oracle-migration.internal)(PORT=1521)))
The listener supports no services
The command completed successfully
[oracle@rene-ace-test-inst1 ~]$ lsnrctl stop

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 04-APR-2020 14:39:29

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ srvctl add listener
[oracle@rene-ace-test-inst1 ~]$ srvctl start listener

Create and start the ASM Instance:

[oracle@rene-ace-test-inst1 ~]$ export ORACLE_SID=+ASM
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ cat <<EOT > ${GRID_HOME}/dbs/init+ASM.ora
> instance_type=ASM
> asm_diskstring='/dev/oracleasm/*'
> large_pool_size=12M
> remote_login_passwordfile='EXCLUSIVE'
> memory_target=0
> sga_target=3G
> pga_aggregate_target=400M
> processes=1024
> EOT
[oracle@rene-ace-test-inst1 ~]$   
[oracle@rene-ace-test-inst1 ~]$ srvctl add asm -d '/dev/oracleasm/*'
[oracle@rene-ace-test-inst1 ~]$ srvctl start asm
[oracle@rene-ace-test-inst1 ~]$  
[oracle@rene-ace-test-inst1 ~]$ export ORAENV_ASK=NO
[oracle@rene-ace-test-inst1 ~]$ . oraenv
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@rene-ace-test-inst1 ~]$   
[oracle@rene-ace-test-inst1 ~]$ echo "CREATE spfile FROM pfile;" | sqlplus -s / as sysasm

File created.

[oracle@rene-ace-test-inst1 ~]$ srvctl stop asm
[oracle@rene-ace-test-inst1 ~]$ srvctl start asm

Create the ASM diskgroup (DATA):

[oracle@rene-ace-test-inst1 ~]$ echo "
> SELECT header_status,path FROM v\$asm_disk;
>   
> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
>    DISK '/dev/oracleasm/asm-disk1'
>    ATTRIBUTE
>       'compatible.asm'   = '19.0.0.0.0',
>       'compatible.rdbms' = '19.0.0.0.0';
>   
> SELECT header_status,path FROM v\$asm_disk;
> " | sqlplus -s / as sysasm

HEADER_STATU
------------
PATH
--------------------------------------------------------------------------------
CANDIDATE
/dev/oracleasm/asm-disk1

Diskgroup created.

HEADER_STATU
------------
PATH
--------------------------------------------------------------------------------
MEMBER
/dev/oracleasm/asm-disk1

Create the database (DBTEST):

[oracle@rene-ace-test-inst1 ~]$ mkdir -p /u01/app/oracle/fast_recovery_area/DBTEST
[oracle@rene-ace-test-inst1 ~]$ 
[oracle@rene-ace-test-inst1 ~]$ export ORACLE_SID=DBTEST
[oracle@rene-ace-test-inst1 ~]$ dbca -silent \
>    -createDatabase \
>    -templateName General_Purpose.dbc \
>    -gdbName ${ORACLE_SID} \
>    -sid ${ORACLE_SID} \
>    -createAsContainerDatabase FALSE \
>    -sysPassword welcome1 \
>    -systemPassword welcome1 \
>    -emConfiguration NONE \
>    -datafileDestination '+DATA' \
>    -recoveryAreaDestination '/u01/app/oracle/fast_recovery_area' \
>    -redoLogFileSize 128 \
>    -storageType ASM \
>    -diskGroupName DATA \
>    -characterSet AL32UTF8 \
>    -nationalCharacterSet AL16UTF16 \
>    -registerWithDirService FALSE \
>    -automaticMemoryManagement FALSE \
>    -initparams \
>    db_create_file_dest='+DATA', \
>    db_create_online_log_dest_1='+DATA', \
>    filesystemio_options='SETALL', \
>    log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST', \
>    log_archive_format='DBTEST_%s%t%r.arc', \
>    pga_aggregate_target=200, \
>    processes=150, \
>    sga_target=1200, \
>    streams_pool_size=64M, \
>    undo_retention=7200
[WARNING] [DBT-06208] The 'SYS' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'SYSTEM' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
Prepare for db operation
10% complete
Registering database with Oracle Restart
14% complete
Copying database files
43% complete
Creating and starting Oracle instance
45% complete
49% complete
53% complete
56% complete
62% complete
Completing Database Creation
68% complete
70% complete
71% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/DBTEST.
Database Information:
Global Database Name:DBTEST
System Identifier(SID):DBTEST
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/DBTEST/DBTEST.log" for further details.

Adjust redo log settings to 128MB:

[oracle@rene-ace-test-inst1 ~]$ sqlplus

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Apr 4 15:04:31 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter user-name: /as sysdba

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select name from v$database;

NAME
---------
DBTEST

SQL> SET serverout on
   
DECLARE
   file_bytes           NUMBER;
   max_group            INT;
   num_groups           INT;
   num_threads          INT;
   created              INT;
   str                  VARCHAR2(128);
BEGIN
   --SELECT MAX(bytes) INTO file_bytes FROM v$log;
   SELECT 128*1024*1024 INTO file_bytes FROM dual;
   SELECT COUNT(DISTINCT(thread#)) INTO num_threads FROM v$log;
   SELECT MAX(group#), COUNT(DISTINCT(group#)) INTO max_group, num_groups FROM v$logfile WHERE type = 'ONLINE';
   SELECT COUNT(*) INTO created FROM v$standby_log;
   IF (created = 0) THEN
      FOR i IN 1..num_threads LOOP
         --FOR j IN i..max_group+i LOOP
         FOR j IN 1..5 LOOP
            str := 'ALTER DATABASE ADD LOGFILE THREAD '||i||' GROUP '||TO_CHAR(j+num_groups*i)||' SIZE '||file_bytes;
            DBMS_OUTPUT.put_line(str);
            EXECUTE IMMEDIATE str;
         END LOOP;
      END LOOP;
   END IF;
END;
/SQL> SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25  
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 4 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 7 SIZE 134217728
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 8 SIZE 134217728

PL/SQL procedure successfully completed.

Adjust the database to use large pages only, minimize SGA allocations, and adjust file system IO options:

[oracle@rene-ace-test-inst1 ~]$ echo "
>    alter system set USE_LARGE_PAGES=ONLY scope=spfile;
>    alter system set FILESYSTEMIO_OPTIONS=SETALL scope=spfile;
>    alter system set SGA_TARGET=650M scope=spfile;
>    alter system set SGA_MAX_SIZE=650M scope=spfile;
>    alter system set SHARED_POOL_SIZE=256M scope=spfile;
>    alter system set PGA_AGGREGATE_TARGET=50M scope=spfile;
>    alter system set DB_CACHE_SIZE=64M scope=spfile;
>    alter system set PARALLEL_MAX_SERVERS=0 scope=spfile;
>    alter system set RECYCLEBIN=OFF scope=spfile;
> " | sqlplus -s / as sysdba

System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.


System altered.

Stop and restart the database so that the new parameters take effect. Use crs_status.sh to verify the status of the OHAS environment:

[oracle@rene-ace-test-inst1 ~]$ srvctl stop database -d ${ORACLE_SID}
[oracle@rene-ace-test-inst1 ~]$ srvctl start database -d ${ORACLE_SID}

[oracle@rene-ace-test-inst1 ~]$ wget https://raw.githubusercontent.com/rene-ace/oracle_bash_scripts/master/crs_status.sh --output-document=${HOME}/working/crs_status.sh
...
(raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1320 (1.3K) [text/plain]
Saving to: ‘/home/oracle/working/crs_status.sh’

100%[==============================================================================================================================================>] 1,320       --.-K/s   in 0s      

2020-04-04 20:44:09 (224 MB/s) - ‘/home/oracle/working/crs_status.sh’ saved [1320/1320]

[oracle@rene-ace-test-inst1 working]$ chmod 750 ${HOME}/working/crs_status.sh
[oracle@rene-ace-test-inst1 working]$ sed -i '/^CRS_HOME/ s~CRS_HOME=$~CRS_HOME='${GRID_HOME}'~' ${HOME}/working/crs_status.sh
[oracle@rene-ace-test-inst1 working]$ ${HOME}/working/crs_status.sh
Name                                          Target     State           Server                    State details
------------------------------------------------------------------------------------------------------------------------
ora.DATA.dg                                   ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.LISTENER.lsnr                             ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.asm                                       ONLINE     ONLINE          rene-ace-test-inst1       Started,STABLE 
ora.ons                                       OFFLINE    OFFLINE         rene-ace-test-inst1       STABLE 
ora.cssd                                      ONLINE     ONLINE          rene-ace-test-inst1       STABLE 
ora.dbtest.db                                 ONLINE     ONLINE          rene-ace-test-inst1       Open,HOME=/u01/app/o 
ora.diskmon                                   OFFLINE    OFFLINE         STABLE                     
ora.evmd                                      ONLINE     ONLINE          rene-ace-test-inst1       STABLE 

Now that you have created the Google VM, installed OHAS, and created the Oracle DB, we’re set to perform several tasks that I’ll be describing in future series posts. I hope this helps expand your understanding of Google Cloud and how to install a database in it.

Note: This was originally published at rene-ace.

Datascape Podcast Episode 41 – Choosing a Public Cloud for Oracle Database

$
0
0

You can run Oracle anywhere. It’s very flexible, which means anywhere you can get a VM, you can run it, including a virtual box on your laptop. There’s even have a free edition (XE) if you don’t have large data needs. In this episode, we’ll discuss why you would choose to host your Oracle Database anywhere other than its native cloud. We look at the feasibility of doing so, along with some of the benefits and drawbacks that you’ll likely encounter moving to a public cloud.

Simon Pane, a friend of the show and Principal Consultant at Pythian, is here to walk us through this topic. As usual, he provides excellent insights. We start by discussing Oracle’s cloud (OCI) and the reasons to stay with it, which include its flexibility and very good support. However, the platform is in its relative infancy which means that there will likely some problems.

We then move on to the big three: Azure, AWS, and Google Cloud. Moving to a public cloud provider might be part of an organizational “bigger picture” and other non-functional reasons. With Azure and AWS, clients are likely hoping to take advantage of the other features, such as Azure’s Office 365 and AWS’s RDS. Despite being licensed for Oracle, Azure is not optimized for it because there are no special shapes or options. AWS does have managed services, but still has drawbacks, which Simon explores in-depth.

We then move on to Google Cloud, the newest kid on the block. There are certainly great features (like BigQuery) that come along with it, but there is a major discrepancy: Licensing. The Oracle licensing policy does not include GC, so Simon offers some potential workarounds for this problem.

Ultimately, there is no slam-dunk solution for where to host Oracle databases. If you put your blinders on and focus on one area, it might seem that one cloud resolves all of your needs. However, it really comes down to making trade-offs and deciding which need is most important. Be sure to tune in today!

Key Points from this Episode

  • Discover reasons someone might move an Oracle on-premises database to a public cloud.
  • How Oracle-supported versions work and why this might be a reason for using public clouds.
  • Insights into Oracle cloud (OCI), its bad name, and the two reasons people don’t use it.
  • The motivators for utilizing OCI: It’s best at home, supportability, and unique needs.
  • Learn why you would use Azure as a home for your Oracle Database.
  • The pain points you might encounter hosting an Oracle Database on Azure.
  • Developments between Microsoft and Oracle. Two former enemies now building bridges.
  • Reasons to go with AWS: Brand recognition, first to market, and managed service.
  • Oracle licenses are incredibly expensive and drive the cost up in cloud and on-premises.
  • Discover some of the shortcomings of hosting an Oracle Database on AWS.
  • An overview of the special Oracle licensing considerations to remember.
  • The benefits and drawbacks of using Google Cloud for Oracle Databases.
  • Cloud SQL: Another example of Oracle’s terrible product names!
  • A universal cloud challenge: None of them are great with upgrades.
  • Why having cloud as backup or DR can be a good way to navigate the cloud decision matrix.
  • What public clouds have in store for Oracle going into the future.

Links Mentioned in Today’s Episode

OATUG Forum Online (or, the 2020 Version of Collaborate Las Vegas)

$
0
0

Hi everyone!

It’s been a very busy year for me, so I haven’t had much time to blog. But, here we go!

This isn’t a technical post, but more of an opinion piece. I’ve split this post into two parts. Part one was written before I spoke at the OATUG (Oracle Applications and Technology Users Group) Forum online. Part two was written after I spoke.

Part 1: Before…

The global pandemic has disrupted virtually everything in our world. The same goes for the world of IT, including one of the best things about our industry: Conferences and live events.

I was supposed to attend two major conferences in the first quarter of 2020. One was cancelled and the other became an online event. Aborting major events like these is no trivial undertaking. Not only for the organizers themselves but also for all the sponsors and attendees (customers and vendors included).

Huge events like Oracle Open World, Google Next, or the one that will be the focus of most of this post, Collaborate, have a long-lasting impact that extends throughout the year. At these events, a casual meeting or a chat during a coffee break can kickstart a relationship. It might develop into a large services agreement, a partnership, or a whole new successful startup.

You have to agree that these events are important, if not crucial, for our industry.

Collaborate

As for Collaborate, well, I was very excited to not only attend but also speak at the event. After attending DOAG, BIWA, and UKOUG the previous years, Collaborate is the next big Oracle-related event in which I’d be participating. Then, of course, came COVID-19.

However, Collaborate isn’t cancelled. Instead, it’s now a fully virtual event. Here I’d like to give a massive KUDOS to the event organizers for their commitment to the event and the attendees. It certainly couldn’t have been easy to make the initial decision to proceed with Collaborate. Nor could it have been easy to completely undo and transform months and months of planning and logistics into a fully virtual conference in a matter of weeks.

I was very pleased to still be selected to speak at the (now) online conference. And, a big thanks to Pythian for their speaker program. It enables me to attend these gatherings, virtual or otherwise.

So, here I am, a few hours prior to my first big online presentation to an unknown number of people, and crossing my fingers that everything works as expected. This is basically how I feel before any “typical” presentation. The main difference? Well, the presentation won’t be quite the same. Interaction with the listening public is close to zero, and there is little practical opportunity for people to reach out to you with questions afterwards.

My Thoughts…

I know, I know… there’s always social media, and so on. However, it’s not the same, and you know it. There’s no immediate feedback from the attendees, and you can’t really tell if they find the content interesting or boring. Nor is it easy to tell whether you should slow down, or speed up. It definitely feels more impersonal.

This is my opinion from a “presenter” standpoint. As an “attendee,” there are two major issues for me: One is that it’s no longer a live in-person event, it’s awkward to ask your manager for time to attend it. The second big issue for me is the time zone. I’m in the Eastern time zone, and the event is based in Las Vegas, so sessions can go all the way from 4:30 PM to 10:30 PM EDT. This, on top of my regular eight-hour workday, is simply too much.

So, I might be able to attend a session or two and watch saved sessions at a later time. This takes me back to my previous point about lost face-to-face interactions with people.

Anyway, I’m really interested to see how this plays out. I really hope the event is a big success because, unfortunately, our current “situation” might last a long time. We need to prepare for that possibility.

Part 2: After…

And… the event is over. Unfortunately, it wasn’t very satisfactory, which makes me a little sad. You see, Collaborate is one of the biggest Oracle events that I know of with a great many attendees, but only four showed up to my live presentation. Yes, the feedback was very good, and surely those people who attended were very interested in the topic I was speaking about. I thank each and every one of them.

A few days later, I checked my presentation statistics and there were a few views and downloads. But overall, the number is far below the 20 to 50 people that attended this same presentation at other on-site conferences.

Of course, I’m not a renowned speaker and the subject matter might not be of broader interest. However, the online experience is just “not the same,” and that takes a toll on the number of attendees.

Regardless, it was a good overall experience. I give my sincere congratulations to the organizers for the huge effort they clearly put into the entire event.

Hopefully, we can all meet in person soon and enjoy a truly magnificent Collaborate.

AWS RDS: 5 Must-Know Actions for Oracle DBAs

$
0
0

Managing Oracle on AWS has some twists. Here are five daily DBA activities that have changed on AWS:

Kill Sessions:

begin
rdsadmin.rdsadmin_util.kill(
sid => &sid,
serial => &serial,
method => 'IMMEDIATE');
end;
/

 

Flush shared_pool or buffer_cache:

exec rdsadmin.rdsadmin_util.flush_shared_pool;
exec rdsadmin.rdsadmin_util.flush_buffer_cache;

 

Perform RMAN Operations:

BEGIN
 rdsadmin.rdsadmin_rman_util.validate_database(
 p_validation_type => 'PHYSICAL+LOGICAL',
 p_parallel => 4,
 p_section_size_mb => 10,
 p_rman_to_dbms_output => FALSE);
END;
/

 

Grant Privileges to SYS Objects

# Grant

begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name  => 'V_$SESSION',
        p_grantee   => 'PYTHIAN',
        p_privilege => 'SELECT');
end;
/

# Grant with Grant Option

begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name     => 'V_$SESSION',
        p_grantee      => 'PYTHIAN',
        p_privilege    => 'SELECT',
        p_grant_option => true);
end;
/

# Revoke

begin
    rdsadmin.rdsadmin_util.revoke_sys_object(
        p_obj_name  => 'V_$SESSION',
        p_revokee   => 'PYTHIAN',
        p_privilege => 'SELECT');
end;
/

 

Create Custom Functions to Verify Passwords:

begin
    rdsadmin.rdsadmin_password_verify.create_verify_function(
        p_verify_function_name => 'CUSTOM_PASSWORD_FUNCTION', 
        p_min_length           => 12, 
        p_min_uppercase        => 2, 
        p_min_digits           => 1, 
        p_min_special          => 1,
        p_disallow_at_sign     => true);
end;
/

If you want to double-check the generated code, here’s simple trick: Check on DBA_SOURCE:

col text format a150
select TEXT  from DBA_SOURCE 
where OWNER = 'SYS' and NAME = 'CUSTOM_PASSWORD_FUNCTION' order by LINE;

I hope this helps!

Oracle Database and Two-Factor Authentication (2FA)

$
0
0

Background

A common question from Oracle customers is whether they can configure the Oracle Database to use:

  • Two-Factor Authentication (2FA)
  • Multi-Factor Authentication (MFA)
  • A “Time-based One-time Password” (TOTP) code, which is usually a six-digit code generated from a hardware or software application.

The short answer is, yes!(though, not natively). You can implement it through other directory service technologies, namely RADIUS.

Previous blog posts discussed the setup and testing of the new Oracle 18c+ feature of Oracle “Centrally Managed Users” (CMU). It allows you to manage Oracle database users through Microsoft Active Directory (AD), and effectively offloads user management to AD. Oracle RADIUS authentication isn’t part of CMU. It’s a similar but slightly different implementation, and each has its own benefits and limitations.

However, by leveraging Oracle’s compatibility with the RADIUS protocol and an external directory service, you can achieve true 2FA capabilities for the Oracle database (and on a per-user basis, meaning that application and/or service accounts remain unaffected).

This post describes how to implement Oracle database 2FA using FreeRADIUS. The subsequent post extends the setup to use the commercially available Cisco Duo platform instead.

RADIUS Quick Summary

The RADIUS (Remote Authentication Dial-in Service) protocol is based on “AAA:” Authentication, Authorization, and Accounting. This post is based mainly on the Authentication part. Furthermore, RADIUS can operate in two modes: synchronous mode and challenge-response (asynchronous) mode. Oracle Database is actually capable of both (and has been since at least Oracle 8i). However, the most practical and applicable form is “synchronous,” and will therefore be the focus of this post.

RADIUS can authenticate against numerous sources including its own “users” flat file, LDAP directory services, Microsoft Active Directory, and others. RADIUS refers to “clients” and “NASs” (Network Access Servers, which broker the credentials). When authenticating against a RADIUS server, the Oracle Database acts as the “RADIUS client” and “NAS.”

From the Oracle Database “Database Licensing Information User Manual” (part number E94254-18 dated April 2020):

Network encryption (native network encryption, network data integrity, and SSL/TLS) and strong authentication services (Kerberos, PKI, and RADIUS) are no longer part of Oracle Advanced Security and are available in all licensed editions of all supported releases of Oracle Database.

Source: https://docs.oracle.com/en/database/oracle/oracle-database/19/dblic/database-licensing-information-user-manual.pdf

However, to use RADIUS authentication, the Oracle client software must include the Advanced Security option. Therefore, a full client installation is required and the InstantClient is not sufficient. Verify Oracle client software compatibility using the adapters command.

Authentication Process

The process flow is relatively simple, yet important to understand:

  1. The Oracle client attempts to connect to the database and provides credentials (username and password) along with a 2FA TOTP code.
  2. Oracle Database receives this information, finds the user in the local catalog, and determines that the authentication type is EXTERNAL.
  3. Oracle Database then scans the SQLNET.ORA file to determine where to authenticate EXTERNAL users.
  4. Using the RADIUS connectivity information from the SQLNET.ORA file, the Oracle Database passes the credential details onto the RADIUS server.
  5. The RADIUS server first authenticates the username/password with a directory service which could be a local file, Active Directory, an LDAP service, etc. This is “Primary Authentication.”
  6. If validated, the RADIUS server then authenticates the TOTP with the multi-factor authentication service (for example, the Google Authenticator PAM module, Cisco Duo, etc). This is “Secondary Authentication.”
  7. If also validated, the RADIUS server passes back the “Access-Accept” response to the Oracle Database which then accepts and completes the connection.

Here’s an illustration of the process:

Prerequisite Setup

To use 2FA with a TOTP, a “RADIUS server” is required. Some common options include:

This post focuses on setting up and configuring Oracle Database 2FA through FreeRADIUS. FreeRADIUS is a pretty simple software suite for DBAs to set up for testing and experimentation purposes without worrying about licensing costs or third-party vendors.

The first requirement is to find a location to run FreeRADIUS. This can be a local server or a Virtual Machine (for example, created within VirtualBox or a cloud VM). For examples on how to create a cloud VM in OCI for DBA testing and experimentation, go to https://blog.pythian.com/part-1-creating-an-oracle-18c-centrally-managed-users-testbed-using-oracle-cloud-infrastructure/.

VM requirements are minimal for testing/experimentation purposes. One or two vCPUs with 2 GB of memory should be sufficient.

For a base OS, many versions of Linux are supported including CentOS 7 and Oracle Linux 7.

Oracle Database and RADIUS

For an explanation about how Oracle Database interacts with RADIUS (and specifically FreeRADIUS), see Jared Still’s blog post: https://blog.pythian.com/using-freeradius-to-authorize-oracle-connections/

A key takeaway from that post: FreeRADIUS users need to be in UPPERCASE for Oracle Database compatibility.

Installing FreeRADIUS

This section summarizes the installation steps for FreeRADIUS on CentOS 7 or Oracle Linux 7. Perform all steps as “root” user, unless otherwise indicated:

If you are installing on Oracle Linux 7, add the necessary YUM repo (not required for CentOS 7):

cat << EOF > /etc/yum.repos.d/networkradius.repo
[networkradius]
name=NetworkRADIUS-7
baseurl=http://packages.networkradius.com/centos/7/repo/
gpgcheck=0
EOF

Perform other Linux prerequisites:

sed -i 's/=enforcing/=permissive/g' /etc/selinux/config
setenforce Permissive

yum -y update

Install the core FreeRADIUS package and the utilities package. The later adds testing tools:

yum -y install freeradius freeradius-utils

Adjust the Linux firewall (if required):

firewall-cmd --permanent --zone=public --add-port=1812/udp
firewall-cmd --permanent --zone=public --add-port=1813/udp
firewall-cmd --reload

Create a service for automatic-start (but don’t start it yet):

systemctl enable radiusd.service

Initial RADIUS Testing

To perform a basic test of the RADIUS functionality, a test user is required. FreeRADIUS is configured based on several files, including an “authorize” file. This used to be a “users” file. “users” is now just a symbolic link to “authorize“.

IMPORTANT: Directives in the “authorize” file are processed sequentially. Therefore, when testing, it’s recommended you put the entries near or at the top to ensure they aren’t skipped due to some preceding and/or superseding directive.

Add a test user called “testuser” with a password of “Passw0rd” to the file:

sed -i '1itestuser Cleartext-Password := \"Passw0rd\"' /etc/raddb/mods-config/files/authorize
head -4 /etc/raddb/mods-config/files/authorize

To test, it’s recommended to manually start the FreeRADIUS software (instead of through the Linux service) in debug mode:

radiusd -X

From a different window/session, test the connection using the command:

radtest testuser Passw0rd localhost 0 testing123

IMPORTANT: The “testing123” argument is a “RADIUS secret” configured in the default “RADIUS client.” This secret is used for secure communication between RADIUS clients and the server. See Jared Still’s blog post for more information.

The result should be “Received Access-Accept …

FreeRADIUS Linux-PAM Setup

The next step is to set up FreeRADIUS to use Linux-PAM (Pluggable Authentication Modules).

IMPORTANT: For initial proof of concept experimentation and testing, local Linux users are authenticated using RADIUS and 2FA. Therefore, in this test scenario, the RADIUS server runs with root permissions so it can read all 2FA configuration files. In a production or hardened environment (including those using external directory services) you should configure FreeRADIUS to run in a “lower-permission” environment.

Change both user and group to root (in the test environment):

grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#
sed -i '/user =/  s/\(\#\)\{0,1\}[[:space:]]user = radius.*/\t user = root/'   /etc/raddb/radiusd.conf
sed -i '/group =/ s/\(\#\)\{0,1\}[[:space:]]group = radius.*/\t group = root/' /etc/raddb/radiusd.conf
grep 'user\|group' /etc/raddb/radiusd.conf | grep = | grep -v ^[[:space:]]\#

Then, enable Pluggable Authentication Modules (PAM) including adding the required symbolic links:

grep pam /etc/raddb/sites-enabled/default
sed -i '/^\#[[:space:]]pam/ s/^\#//' /etc/raddb/sites-enabled/default
grep pam /etc/raddb/sites-enabled/default

ln -s /etc/raddb/mods-available/pam /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-enabled/pam
ls -l /etc/raddb/mods-available/pam

Change auth-type to PAM in the authorize file. Remember that file directives are processed sequentially, so place it near the top:

sed -i '2iDEFAULT Auth-Type := PAM' /etc/raddb/mods-config/files/authorize
head -5 /etc/raddb/mods-config/files/authorize

Finally, add a new Linux user on the same system for testing (remember that this user must be in uppercase for Oracle Database usage):

useradd RADUSER
passwd RADUSER
# Assume the set password is Passw0rd

Test FreeRADIUS and Linux-PAM authorization by running the FreeRADIUS server in debugging mode again:

radiusd -X

From another session/window, test a connection:

radtest RADUSER Passw0rd localhost 0 testing123

The result should again be “Received Access-Accept …

Google Authenticator PAM Module Setup

Google provides an example PAM module for 2FA through https://github.com/google/google-authenticator-libpam (Cisco Duo or Otka provide alternative options).

Install the Google Authenticator PAM module:

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install google-authenticator

Perform a basic configuration for the RADUSER local Linux user:

sudo -u RADUSER google-authenticator \
   --time-based \
   --allow-reuse \
   --no-rate-limit \
   --window-size=3 \
   --force

IMPORTANT: The command above uses configuration options optimized for testing. In an actual production environment, I recommend reviewing all options and deploying in a more more secure manner. For example, using “–disallow-reuse” and similar.

Sample output:

You can generate the TOTP codes using a mobile phone application such as “Google Authenticator” or “Authy.” Scan the generated QR code with the phone app or manually enter the “secret key” into the app.

However, when testing, constantly generating codes on a phone can become cumbersome. Generating the codes using a Linux command might be more efficient.

To generate codes from the Linux bash shell, install the oathtool package:

yum -y install oathtool

Then, you can generate codes using the “secret key” from the user’s ~/.google-authenticator file. For example:

oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`"

Finally, you must adjust the Linux-PAM configuration to reference the new Google Authenticator PAM Module:

cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite    pam_google_authenticator.so forward_pass
auth       required     pam_sepermit.so
auth       substack     password-auth
auth       include      postlogin
account    required     pam_nologin.so
account    include      password-auth
session    include      password-auth
EOF

Testing Using the Local User and One-time Password

With the FreeRADIUS and Google Authenticator PAM Module implementation of 2FA, the password must be a single string that is comprised of the actual user password and the passcode. So, the format is: <password><TOTP>.

Other implementations (such as Cisco Duo) might allow for a comma between the two, or push notification to a mobile device. However, for FreeRADIUS and the Google Authenticator PAM Module, the format is one continuous string with no additional characters.

So, to test:

MY_PASSWD=Passw0rd
MY_OTP=$(oathtool --base32 --totp "`head -1 /home/RADUSER/.google_authenticator`")
radtest RADUSER ${MY_PASSWD}${MY_OTP} localhost 0 testing123

Example successful output:

 

 

 

 

 

Testing from the Oracle Database

The previously referenced blog post by Jared Still covers Oracle setup for FreeRADIUS. Review this post for additional information about the Oracle setup steps in this section. Repeating the steps in a simplified format:

As “root” on the FreeRADIUS server, add a FreeRADIUS client that the Oracle Databases will use to connect:

cat << EOF >> /etc/raddb/clients.conf

client oracle-db {
    ipaddr = 192.168.1.0/24
    secret = secretoracle
    shortname = oracle
    nastype = other
}
EOF

Restart the FreeRADIUS service to ensure all changes take effect:

systemctl restart radiusd.service
systemctl status radiusd.service

As the “oracle” user (or Oracle software owner) on the Oracle Database server, save the RADIUS secret to a file:

mkdir -p ${ORACLE_HOME}/network/security
echo "secretoracle" > ${ORACLE_HOME}/network/security/radius.key
chmod 600 ${ORACLE_HOME}/network/security/radius.key

Add the required entries to the SQLNET.ORA file and verify the file (add the appropriate IP address for the FreeRADIUS server):

cat <<EOF >> ${ORACLE_HOME}/network/admin/sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(BEQ,TCPS,RADIUS)
SQLNET.RADIUS_AUTHENTICATION=192.168.1.212
SQLNET.RADIUS_AUTHENTICATION_PORT=1812
SQLNET.RADIUS_SECRET=${ORACLE_HOME}/network/security/radius.key
EOF

If you want, you can include other parameters such as the RADIUS timeout and retries. Alternatively, for higher availability and redundancy, use the SQLNET.RADIUS_ALTERNATE parameter to specify a backup RADIUS server to use if the primary becomes unavailable.

IMPORTANT: The SQLNET.ORA configuration above is for the database server only (RDBMS home). For Oracle clients, all that’s required is that the SQLNET.AUTHENTICATION_SERVICES parameter includes RADIUS. The other SQLNET.ORA parameters are not required/used in Oracle client homes.

Make sure that the database initialization parameters are set properly:

sqlplus -s / as sysdba << EOF
alter system set os_authent_prefix='' scope=spfile sid='*';
alter system set remote_os_authent=false scope = spfile sid='*';
shutdown immediate
startup
alter pluggable database all open;
EOF

Add the test user to the database:

sqlplus -s / as sysdba << EOF
alter session set container=PDB1;
create user raduser identified externally;
grant create session to raduser;
grant select on v_\$database to raduser;
EOF

Testing is as simple as providing the TOTP in the password field. Because the user was added to the database using “ IDENTIFIED EXTERNALLY,” this is the Linux user’s password.

Using a connection string such as <username>/<password><passcode>, here’s an example of a full connection using SQLPlus:

Success! An Oracle Database connection to a database user using an external credential and 2FA!

If the password or the TOTP code is incorrect, then the standard “ORA-01017: invalid username/password; logon denied” error occurs.

Next Steps: Microsoft Active Directory

For many implementations, the next logical step is implemented using Microsoft Active Directory (AD) users instead of local Linux users. Switching this test setup to use AD instead of local users for authentication is relatively easy and involves the following:

cat << EOF > /etc/pam.d/radiusd
#%PAM-1.0
auth       requisite    pam_google_authenticator.so forward_pass
auth       required     pam_sss.so use_first_pass
account    required     pam_nologin.so
account    include      password-auth
session    include      password-auth
EOF
  • Logging to the FreeRADIUS server using the AD domain user and generating the ~/.google-authenticator configuration file using the previously described process.
  • Creating the domain user in the database and testing.

Conclusion

Configuring the Oracle Database to use 2FA is something that many inquire about but few actually do. However, doing so isn’t overly complex and allows you to leverage open source software such as FreeRADIUS. All that’s required is a small VM to act as a RADIUS server.

In a real production environment, this server could become a single point of failure and therefore might require you to set it up with reliable infrastructure and/or in a redundant configuration (by including the SQLNET.RADIUS_ALTERNATE parameter in the SQLNET.ORA file).

This environment is simple and flexible for DBA testing. However, most enterprise users will instead want to leverage a more robust and flexible solution such as the commercial offerings from Cisco Duo or Otka. Configuring Oracle Database authentication using Cisco Duo is covered in the next post in this series.

The Brand New Exadata X8M Deployment Process Revealed

$
0
0

Here we will see how the deployment process of the new Exadata X8M works.

 

RoCE issues from the factory

Exadata X8M servers are coming from the factory with the RoCE private network disabled. In case the Field Engineer assigned to work on the physical setup of the Exadata did not enable the RoCE network it is your job to do so.

RoCE network must be enabled on all Compute Nodes and also on all Storage Servers.

In Exadata X8M the private network is not on InfiniBand switches anymore, but on RoCE (RDMA over Converged Ethernet) Fabric switches. The interface cards we see in the operating system are re0 and re1.

When checking the active interface cards we cannot see re0 and re1:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<up,broadcast,running,master,multicast>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54309  bytes 3744342 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<up,broadcast,running,multicast>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279171  bytes 18019054 (17.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9553  bytes 1693920 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<up,broadcast,running,slave,multicast>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 31847  bytes 2396622 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<up,broadcast,running,slave,multicast>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<up,loopback,running>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136405  bytes 6139347 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136405  bytes 6139347 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Most of the InfiniBand related commands/tools do not work anymore, but ibstat still does, so we can use that tool to check the state of the private network state:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Down
                Physical state: Disabled
                Rate: 100
                State: Down
                Physical state: Disabled
                Rate: 100

Checking the config of RoCE interface cards:

[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re0
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no
 
[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re1
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re1
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no

Bringing RoCE interface cards up:

[root@ex03db01 ~]# ifup re0
/sbin/ifup-local: /sbin/ifup-local re0:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re0.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re0...
 
NETDEV=re0; IBDEV=mlx5_0; PORT=1
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re0" ã½(â¢â¿â¢)ã
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re0.
 
 
[root@ex03db01 ~]# ifup re1
/sbin/ifup-local: /sbin/ifup-local re1:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re1.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re1...
 
NETDEV=re1; IBDEV=mlx5_0; PORT=2
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re1" ã½(â¢â¿â¢)ã
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re1.

Now we can see that the interfaces re0 and re1 are up, but with no IPs assigned:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<up,broadcast,running,master,multicast>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54533  bytes 3767354 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<up,broadcast,running,multicast>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279584  bytes 18051211 (17.2 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9727  bytes 1720009 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<up,broadcast,running,slave,multicast>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 32071  bytes 2419634 (2.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<up,broadcast,running,slave,multicast>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<up,loopback,running>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136804  bytes 6157123 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136804  bytes 6157123 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re0: flags=4163<up,broadcast,running,multicast>  mtu 1500
        ether 0c:42:a1:3b:45:12  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re1: flags=4163<up,broadcast,running,multicast>  mtu 1500
        ether 0c:42:a1:3b:45:13  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

We can use ibstat again to confirm the interfaces are enabled:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Active
                Physical state: LinkUp
                Rate: 100
                State: Active
                Physical state: LinkUp
                Rate: 100

OEDA specifics

To start any Exadata deployment you need the OEDA configuration files. They are a set of files generated by the OEDA (Oracle Exadata Deployment Assistant) tool. OEDA tool is currently a web-based tool that will allow the client to fill up all the IP addresses and hostnames that the new Exadata will be assigned. Normally this step is taken by the client with the support of their network team.

Configuration files needed:

  • Clientname-clustername.xml
  • Clientname-clustername-InstallationTemplate.html
  • Clientname-clustername-preconf.csv

The OEDA tool for Linux is also needed and can be downloaded from the Patch ID 30640393. It is recommended to go with the latest version available, but if the configuration files were generated with a different/older version go with that version to avoid warnings during the execution of the onecommand.

Stage the OEDA for Linux in /u01/onecommand/ and unzip it:

[root@ex03db01 ~]# mkdir -p /u01/onecommand/
[root@ex03db01 ~]# unzip -q p30640393_193800_Linux-x86-64.zip -d /u01/onecommand/
[root@ex03db01 ~]# cd /u01/onecommand/linux-x64

Once in the correct directory run onecommand to list the steps just to make sure it is working:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -l
 Initializing
 
1. Validate Configuration File
2. Setup Required Files
3. Create Users
4. Setup Cell Connectivity
5. Verify Infiniband
6. Calibrate Cells
7. Create Cell Disks
8. Create Grid Disks
9. Install Cluster Software
10. Initialize Cluster Software
11. Install Database Software
12. Relink Database with RDS
13. Create ASM Diskgroups
14. Create Databases
15. Apply Security Fixes
16. Install Autonomous Health Framework
17. Create Installation Summary
18. Resecure Machine

applyElasticConfig.sh preparation and execution

Technical background

applyElasticConfig.sh is a script, provided by Oracle within the OEDA, which performs the initial setup of the compute nodes and storage servers. That script works with the factory IP range and hostnames by default, but we found a way to trick it and make it work even when the client had already changed the IP addresses and hostnames. The initial setup is basically defining the network configuration, IP addresses, hostnames, DNS and NTP configuration and the script will look for nodes in the IP range of the 172.x.x.x network, so if the client had already changed the IPs and hostnames the script will not find anything. It is worth to mention that there is no documentation about this anywhere in the docs.oracle.com. You can find something here:

  • Configuring Oracle Exadata Database Machine
  • ApplyElasticConfig failed during the execution of elasticConfig.sh (Doc ID 2175587.1)
  • Bug 23064772 OEDA: applyelasticconfig.sh fails with error unable to locate rack item with ulocation

Even though these documents briefly mention the applyElasticConfig.sh script they do not mention how to overcome the issue when the IPs and hostnames were already changed.

Preparation

In order to make the script look for the servers when their hostnames and IPs were changed, you have to edit the es.properties file which is located under /u01/onecommand/linux-x64/properties. Consider changing only the parameters related to the IPs, Subnets, and Hostnames. The variables we care about are: ROCEELASTICNODEIPRANGE, ROCEELASTICILOMIPRANGE, ELASTICSUBNETS and SKIPHOSTNAMECHECK. Change those to the range of IPs found in the Clientname-clustername-InstallationTemplate.html for each network:

  • ROCEELASTICNODEIPRANGE expects the range of IPs in the management network.
  • ROCEELASTICILOMIPRANGE expects the range of IPs of the ILOM of the servers.
  • ELASTICSUBNETS expects the subnet of the management network.
  • SKIPHOSTNAMECHECK defaults to false, so if the hostnames were also changed you want to set this to true.

Find some examples below:

[root@ex03db01 linux-x64]# cat properties/es.properties|grep ELASTIC
#ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99
ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206
ROCEELASTICILOMIPRANGE=10.201.84.196:10.201.84.201
ELASTICCONFIGMARKERFILE=/.elasticConfig
ELASTICRACKNAMES=x5,x6,sl6,x7,x8
QINQELASTICCONFIGMINVERION=20.1.0.0.0.200323
#ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7
ELASTICSUBNETS=10.201.84
 
[root@ex03db01 linux-x64]# grep SKIPHOST properties/es.properties
#SKIPHOSTNAMECHECK=false
SKIPHOSTNAMECHECK=true

Execution

Now that you have the es.properties ELASTIC* parameters matching your infrastructure configuration you are ready to execute the applyElasticConfig.sh script. To execute it you just need to call the script passing the Clientname-clustername.xml configuration file to it:

[root@ex03db01 linux-x64]# ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml
 Applying Elastic Config...
 Discovering pingable nodes in IP Range of 10.201.84.190 - 10.201.84.206.....
 Found 6 pingable hosts..[10.201.84.193, 10.201.84.194, 10.201.84.195, 10.201.84.190, 10.201.84.191, 10.201.84.192]
 Validating Hostnames..
 Discovering ILOM IP Addresses..
 Getting uLocations...
 Getting Mac Addressess..
 Getting uLocations...
 Mapping Machines with local hostnames..
 Mapping Machines with uLocations..
 Checking if Marker file exists..
 Updating machines with Mac Address for 6 valid machines.
 Creating preconf..
 Writing host-specific preconf files..
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel02_preconf.csv for ex03cel02 ....
 Preconf file copied to ex03cel02 as /var/log/exadatatmp/firstconf/ex03cel02_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db01_preconf.csv for ex03db01 ....
 Preconf file copied to ex03db01 as /var/log/exadatatmp/firstconf/ex03db01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db03_preconf.csv for ex03db03 ....
 Preconf file copied to ex03db03 as /var/log/exadatatmp/firstconf/ex03db03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel03_preconf.csv for ex03cel03 ....
 Preconf file copied to ex03cel03 as /var/log/exadatatmp/firstconf/ex03cel03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel01_preconf.csv for ex03cel01 ....
 Preconf file copied to ex03cel01 as /var/log/exadatatmp/firstconf/ex03cel01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db02_preconf.csv for ex03db02 ....
 Preconf file copied to ex03db02 as /var/log/exadatatmp/firstconf/ex03db02_preconf.csv
 Running Elastic Configuration on ex03cel02.client.com
 Running Elastic Configuration on ex03db01.client.com
 Running Elastic Configuration on ex03db03.client.com
 Running Elastic Configuration on ex03cel03.client.com
 Running Elastic Configuration on ex03cel01.client.com
 Running Elastic Configuration on ex03db02.client.com
 /////

OEDA onecommand preparation and execution

Technical background

OEDA is a set of scripts, files, and a form we use to plan and deploy an Exadata. Sometimes we refer to it as the onecommand utility. It is called onecommand because with just one command we can deploy everything. This onecommand is the install.sh script.

Preparation

To be able to run the install.sh script we have to prepare some things first in the environment. Some prerequisites:

  • The switches must have been already set up by the Field Engineer responsible for the physical installation of the hardware.
  • The applyElasticConfig.sh script must have been run and completed successfully.
  • The files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html must be staged to /u01/onecommand/linux-x64/WorkDir.

Stage the files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html to /u01/onecommand/linux-x64/WorkDir:

[root@ex03db01 ~]# ls -lh /u01/onecommand/linux-x64/WorkDir
total X.9G
-rwxr-xr-x 1 root root 355M Jun  9 12:34 ahf_setup
-rw-r--r-- 1 root root 2.9G Jun  9 12:54 V982063-01.zip
-rw-r--r-- 1 root root 2.7G Jun  9 12:57 V982068-01.zip
-rw-r--r-- 1 root root 2.4G Jun  9 12:57 p30805684_190000_Linux-x86-64.zip
-rw-r--r-- 1 root root 600M Jun  9 12:57 p6880880_180000_Linux-x86-64.zip
-rw-r--r-- 1 root root 1.3G Jun  9 12:57 p30899722_190000_Linux-x86-64.zip

After all of this is done you can run the step 1 to validate the configuration files with the environment:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 1
 Initializing
 Executing Validate Configuration File
 Validating cluster: ex03-clu1
  Locating machines...
 Validating platinum...
 Checking Disk Tests Status....
 Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 Validating nodes for database readiness...
 Completed validation...
 
 SUCCESS: Ip address: 10.201.84.190 is configured correctly
 SUCCESS: Ip address: 10.201.80.54 is configured correctly
 SUCCESS: Ip address: 10.201.84.191 is configured correctly
 SUCCESS: Ip address: 10.201.80.55 is configured correctly
 SUCCESS: Ip address: 10.201.84.192 is configured correctly
 SUCCESS: Ip address: 10.201.80.56 is configured correctly
 SUCCESS: Ip address: 10.201.80.60 is configured correctly
 SUCCESS: Ip address: 10.201.80.62 is configured correctly
 SUCCESS: Ip address: 10.201.80.61 is configured correctly
 SUCCESS: Ip address: 10.201.80.58 is configured correctly
 SUCCESS: Ip address: 10.201.80.59 is configured correctly
 SUCCESS: Ip address: 10.201.80.57 is configured correctly
 SUCCESS: Validated NTP server 10.248.1.1
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/ahf_setup exists...
 SUCCESS: Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db01
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db02
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db01
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db02
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: DatabaseHome Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: Disk size 14000GB on cell ex03cel01.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel02.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel03.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel01.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel02.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel03.client.com matches the value specified in OEDA configuration file
 Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 85395 mS [1.0 minutes] Tue Jun 09 22:51:44 PDT 2020]]

If it finishes successfully you are good to move forward.

Execution

Now we just need to execute the remaining steps. You can execute one-by-one or all in a row. I normally do the step 1 and step 2 separate from the others just because they tend to fail easier than others. Running all of them in a row would not cause any harm since once any step fails the execution will immediately stop. So it is up to you how you would like to execute it.

In case you need to undo any of the steps you can use the -u and the step you would like to undo. You can use the install.sh -h to help you on that:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -h
 Warning: Invalid input(s) for {-h=null}
 **********************************
 
  install.sh -cf <config.xml> -l [options]
  install.sh -cf <config.xml> -s <step #=''> | -r <num-num>
  install.sh
  ARGUMENTS:
   -l                 List all the steps that exist
   -cf                Use to specify the full path for the config file
   -s <step #=''>        Run only the specified step
   -r <num-num>       Run the steps one after the other as long as no errors
                      are encountered
   -u <num-num> | <step#> Undo a range of steps or a particular step
                      For a range of steps, specify the steps in reverse order
   -h                 Print usage information
   -override          Force to run undo steps related to celldisk and grid disk
   -force             Delete binaries under grid home and database home when
                      uninstalling clusterware and database software
   -delete            Delete staging area/directories
   -nocalibratecell   Create the installation summary file without running the calibrate cell command
   -noinfinicheck     Create the installation summary file without running InfiniBand verification
   -p                 Prompts for root password for each or all the nodes. This option allows
                      deployments in Exadata environments with non-default and/or different
                       root passwords on each of the nodes in the rack
   -usesu             Use SU with root account to run commands for grid/oracle users
   -sshkeys           Run deployment with root SSH Keys that are setup by setuprootssh.sh or oedacli. Must be used with "-usesu"
   -customstep        Run custom actions. Actions can be:
                           updatecellroute:  generate cellroute.ora in domUs
   -clustername       Specify the cluster name, or All. Only used with -customstep to specify
                       the cluster on which to run the custom action
   -upgradeNetworkFirmware  X7 Broadcom network card Firmware upgrade
  Version : 200519

To undo a step simply execute this one to undo step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2

Or to undo from step 2 to step 4:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2-4

Here is the execution of step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 2
 Initializing
 Executing Setup Required Files
 Copying and extracting required files...
 Required files are:
 /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/V982068-01.zip
 /u01/onecommand/linux-x64/WorkDir/V982063-01.zip
 Copying required files...
 Checking status of remote files...
 Checking status of existing files on remote nodes...
 Getting status of local files...
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip at /u01/app/oracle/Oeda/Software/V982063-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip at /u01/app/oracle/Oeda/Software/V982068-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30805684_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30899722_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/Patches/p6880880_180000_Linux-x86-64.zip
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db03.client.com
 Completed copying files...
 Extracting required files...
 Copying resourcecontrol and other required files
 No config Keys in the configuration file..
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 325110 mS [5.0 minutes] Wed Jun 10 12:16:46 CDT 2020]]

Here is the execution of steps from 3 to 8:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-8
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Users
 Creating users...
 Creating users in cluster ex03-clu1
 Validating existing users and groups...
 Creating required directories on nodes in cluster ex03-clu1
 Updating /etc/hosts on nodes in cluster ex03-clu1
 Setting up ssh for users in cluster ex03-clu1
 Creating cell diag collection user CELLDIAG on cell servers..
 Completed creating all users...
 Successfully completed execution of step Create Users [elapsed Time [Elapsed = 77818 mS [1.0 minutes] Wed Jun 10 12:20:31 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Setup Cell Connectivity
 Creating cellip.ora and cellinit.ora  ...
 Creating cellip.ora for cluster ex03-clu1
 Creating cellinit.ora for cluster ex03-clu1
 Done creating cellip.ora and cellinit.ora...
 Successfully completed execution of step Setup Cell Connectivity [elapsed Time [Elapsed = 14675 mS [0.0 minutes] Wed Jun 10 12:20:52 CDT 2020]]
 Executing Verify Infiniband
 Validating infiniband network with rds-ping...
 Check Admin network connectivity...
 Running infinicheck to verify infiniband fabric for cluster ex03-clu1...
 Running verify topology to verify infiniband network...
 No Infiniband link errors found...
 SUCCESS: Verify topology does not report any errors on node ex03db01.client.com...
 ****************ex03db01*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db02.client.com...
 ****************ex03db02*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db03.client.com...
 ****************ex03db03*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 Successfully completed execution of step Verify Infiniband [elapsed Time [Elapsed = 280227 mS [4.0 minutes] Wed Jun 10 12:25:37 CDT 2020]]
 Executing Calibrate Cells
 Calibrating cells...
 Successfully completed execution of step Calibrate Cells [elapsed Time [Elapsed = 461064 mS [7.0 minutes] Wed Jun 10 12:33:18 CDT 2020]]
 Executing Create Cell Disks
 Validating Self-Signed Certificates on cell servers...
 Fixing Cell Certificates on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Reconfiguring WLS...
 Cell name attribute does not match hostnames
 Cell ex03cel03 has cell name ru06, cell name attribute will be reset to ex03cel03
 Cell ex03cel01 has cell name ru02, cell name attribute will be reset to ex03cel01
 Cell ex03cel02 has cell name ru04, cell name attribute will be reset to ex03cel02
 Checking physical disks for errors before creating celldisks
 Creating cell disks...
 Dropping Flash Cache before enabling WriteBack on cells [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Enable FlashCache mode to WriteBack in [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Creating flashcache on cells...
 Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 218067 mS [3.0 minutes] Wed Jun 10 12:36:56 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Grid Disks
 Creating grid disks for cluster ex03-clu1
 Checking Cell Disk status...
 Successfully completed execution of step Create Grid Disks [elapsed Time [Elapsed = 123858 mS [2.0 minutes] Wed Jun 10 12:39:04 CDT 2020]]
[root@ex03db01 linux-x64]#

Here is the execution of steps from 9 to 16:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 9-16
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Cluster Software
 Installing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running clusterware installer...
 Setting up Opatch for cluster ex03-clu1
 Patching cluster ex03-clu1...
 Successfully completed execution of step Install Cluster Software [elapsed Time [Elapsed = 667497 mS [11.0 minutes] Wed Jun 10 12:51:15 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Initialize Cluster Software
 Initializing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running root.sh on node ex03db01.client.com
 Checking file root_ex03db01.client.com_2020-06-10_12-54-03-631071286.log on node ex03db01.client.com
 Running root.sh on node ex03db02.client.com
 Checking file root_ex03db02.client.com_2020-06-10_13-02-42-916817198.log on node ex03db02.client.com
 Running root.sh on node ex03db03.client.com
 Checking file root_ex03db03.client.com_2020-06-10_13-05-42-659221162.log on node ex03db03.client.com
 Generating response file for Configuration Tools...
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running Configuration Assistants on ex03db01.client.com
 Checking status of cluster...
 Cluster Verification completed successfully
 Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Database Software
 Installing database software ...
 Validating nodes for database readiness...
 Installing database software with database home name DbHome1
 Installing database software ...
 Extracting Database Software file /u01/app/oracle/Oeda/Software/V982063-01.zip into /u01/app/oracle/product/19.0.0.0/dbhome_1
 Running database installer on node ex03db01.client.com ... Please wait...
 After running database installer...
 Patching Database Home /u01/app/oracle/product/19.0.0.0/dbhome_1
 Successfully completed execution of step Install Database Software [elapsed Time [Elapsed = 717961 mS [11.0 minutes] Wed Jun 10 13:23:11 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Relink Database with RDS
 Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create ASM Diskgroups
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Validating ASM Diskgroups..
 Successfully completed execution of step Create ASM Diskgroups [elapsed Time [Elapsed = 138147 mS [2.0 minutes] Wed Jun 10 13:26:20 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Databases
 Setting up Huge Pages for Database..[test]
 Creating database [test]...
 Patch 30805684 requires specific post-installation steps. Databases will be restarted ...
 Running datapatch on database [test]
 Recompiling Invalid Objects (if any) on database [test]
 Successfully completed execution of step Create Databases [elapsed Time [Elapsed = 1252604 mS [20.0 minutes] Wed Jun 10 13:47:19 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Apply Security Fixes
 Setting up Huge Pages for ASM Instance..
 Bouncing clusterware to set required parameters...
 Checking and enabling turbo mode if required...
 ex03db03.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db03.client.com
 ex03db02.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db02.client.com
 ex03db01.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db01.client.com
 Copying over /root/config/client-ex03.xml to all nodes under /etc/exadata/config
 Successfully completed execution of step Apply Security Fixes [elapsed Time [Elapsed = 436720 mS [7.0 minutes] Wed Jun 10 13:54:43 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Autonomous Health Framework
 Copying over AHF to all nodes in the Cluster..[ex03db01, ex03db02, ex03db03]
 Configuring Autonomous Health Framework(AHF) on all computes nodes..
 AHF has been installed on all compute nodes at: /opt/oracle.ahf . EXAchk can be run by invoking ./exachk
 Generating an EXAchk report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_13567.zip
 Generating the EXAchk Infrastructure Report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_141143_infrastructure.zip
 Successfully completed execution of step Install Autonomous Health Framework [elapsed Time [Elapsed = 2234216 mS [37.0 minutes] Wed Jun 10 14:32:04 CDT 2020]]
[root@ex03db01 linux-x64]#

Here is the execution of step 17:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 17
 Initializing
 Executing Create Installation Summary
 Getting system details...
 Generating Installation Summary report: /u01/onecommand2/linux-x64/ExadataConfigurations/client-Development-InstallationReport.xml...
 Creating Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html...
 Created Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html
 All deployment reports are stored in /u01/onecommand2/linux-x64/ExadataConfigurations/client-AK00625423-deploymentfiles.zip
 Generating Platinum CSV file and copying it over to /opt/oracle.SupportTools on all compute nodes
 Writing platinum file  : /u01/onecommand2/linux-x64/WorkDir/client_null-platinum.csv
 Successfully completed execution of step Create Installation Summary [elapsed Time [Elapsed = 53311 mS [0.0 minutes] Wed Jun 10 14:36:07 CDT 2020]]

Just a rac-status.sh run to check how the cluster was setup (learn more about rac-status.sh here):

[root@ex03db01 ~]# ./pythian/rac-status.sh -a
 
                Cluster ex03-clu1 is a X8M-2 Elastic Rack HC 14TB
 
        Type      |      Name      |   db01   |   db02   |   db03   |
  ------------------------------------------------------------------
   asm            | asm            |  Online  |  Online  |  Online  |
   asmnetwork     | asmnet1        |  Online  |  Online  |  Online  |
   chad           | chad           |  Online  |  Online  |  Online  |
   cvu            | cvu            |  Online  |     -    |     -    |
   dg             | DATA           |  Online  |  Online  |  Online  |
   dg             | RECO           |  Online  |  Online  |  Online  |
   dg             | SPARSE         |  Online  |  Online  |  Online  |
   network        | net1           |  Online  |  Online  |  Online  |
   ons            | ons            |  Online  |  Online  |  Online  |
   proxy_advm     | proxy_advm     | Offline x| Offline x| Offline x|
   qosmserver     | qosmserver     |  Online  |     -    |     -    |
   vip            | db01           |  Online  |     -    |     -    |
   vip            | db02           |     -    |  Online  |     -    |
   vip            | db03           |     -    |     -    |  Online  |
   vip            | scan1          |     -    |  Online  |     -    |
   vip            | scan2          |     -    |     -    |  Online  |
   vip            | scan3          |  Online  |     -    |     -    |
  ------------------------------------------------------------------
    x  : Resource is disabled
       : Has been restarted less than 24 hours ago
 
      Listener    |      Port      |   db01   |   db02   |   db03   |     Type     |
  ---------------------------------------------------------------------------------
   ASMNET1LSNR_ASM| TCP:1525       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER       | TCP:1521       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER_SCAN1 | TCP:1864       |     -    |  Online  |     -    |     SCAN     |
   LISTENER_SCAN2 | TCP:1864       |     -    |     -    |  Online  |     SCAN     |
   LISTENER_SCAN3 | TCP:1864       |  Online  |     -    |     -    |     SCAN     |
  ---------------------------------------------------------------------------------
       : Has been restarted less than 24 hours ago
 
         DB       |     Version    |   db01   |   db02   |   db03   |    DB Type   |
  ---------------------------------------------------------------------------------
   test           | 19.0.0.0   (1) |   Open   |   Open   |   Open   |    RAC (P)   |
  ---------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column
 
         1 : /u01/app/oracle/product/19.0.0.0/dbhome_1  oracle oinstall
 
       : Has been restarted less than 24 hours ago
 
 
[root@ex03db01 ~]# ps -ef|grep pmon
root     362094  50259  0 14:40 pts/1    00:00:00 grep --color=auto pmon
oracle   364290      1  0 13:52 ?        00:00:00 asm_pmon_+ASM1
oracle   367756      1  0 13:53 ?        00:00:00 ora_pmon_test1
[root@ex03db01 ~]#

That’s it. The deployment is finished. Now you just need to patch the compute nodes, storage servers, RoCE switches, GI, and DBs to whatever version you would like to go up to.

You might be thinking “what about step 18”. Well, step 18 “Resecure the machine” means you will harden the servers by dropping SSH keys, enhancing password complexity, expire current passwords, and implement password expiration time, etc. Sometimes those changes make the administration a bit harder and also you might want to implement your own security policies. So we normally skip this step, but again, it is up to you.

See you next time, sincerely,

Franky Faust

 


Mining Statspack and AWR Repositories for Historical Trends

$
0
0

There are various ways that we can approach the analysis of a past database performance problem. The initial steps usually differ based on their scope. Is it limited to a certain SQL/process/task, or is it perceived as a database-wide slowdown? Additionally, is it occurring at the moment, or is it an event that occurred in the past?

In case the starting scope of analysis is database-wide, mining the Automatic Workload Repository (AWR) is one of the methods we can use to obtain historical performance trends. However, not all customers have access to it, either because it requires the Diagnostics Pack license on Enterprise Edition, or because they are running Oracle Standard Edition, where it’s not present.

In such cases, we can still use Statspack as a free alternative to AWR, even though it’s not as sophisticated. One of Statspack’s shortcomings is it doesn’t store Active Session History data, which we can use to drill-down into the activity of particular sessions over time. With Statspack we’re missing this session-level granularity.

In this post, I’m going to present a script I use to get an overview of the workload dynamics of a database querying the Statspack repository. There’s also an AWR counterpart, as I’ll mention later in the post.

statspack_load_trends.sql

1. Script’s Properties

Let’s summarize the script’s main properties:

  1. It queries the Statspack repository directly and doesn’t create any (temporary) objects, nor relies on any pre-created Statspack reports.
  2. We can use it to analyze a Statspack repository imported from another DB, containing data with a different DBID as the DB hosting the repository.
  3. We can analyze a period spanning instance restart(s). The script considers only adjacent snapshot pairs having the same startup_time value. In this case, “adjacent” denotes two snapshots which belong to the same [DBID, instance number] pair, and which SNAP_IDs are closest to each other when ordered. That’s just to emphasize that the difference between two consecutive SNAP_IDs is not always one (think RAC with cached sequence values, an instance restart, or pressure on the shared pool).

2. Purpose

The script provides a quick high-level overview of the DB workload dynamics. It reports a combination of historical OS utilization statistics (stats$osstat), system time model statistics (stats$sys_time_model), and aggregated instance-level statistics (stats$sysstat) for a given period. Currently, it doesn’t query stats$system_event for wait event details. Several time-based metrics are presented in a form of Average Active Sessions (AAS), calculated by simply dividing the observed time-based metric by the elapsed time in the observed period.

You can download the script here, and its AWR counterpart is available here. Before running the AWR version, make sure the database has the Diagnostics Pack license. The following explanation applies to both scripts.

Note: I got the idea for this script from various AWR mining scripts in Chapter 5 (Sizing Exadata) from the “Oracle Exadata Recipes: A Problem-Solution Approach” book. Additionally, the idea to calculate and display CPU core-normalized values for some of the CPU usage statistics originates from John Beresniewicz’s AWR1page project.

3. Calculating CPU Capacity

The script considers the number of CPU cores, and not threads (in case hyperthreading is enabled) to calculate the number of CPU seconds between two Statspack snapshots. Various publications explain the reasons for this approach, but to summarize: Depending on the workload type, hyperthreading can provide up to approx. 30% higher CPU throughput compared to non-hyperthread mode.

When the number of processes running on CPU approach the number of CPU cores, the system might become CPU saturated/over-subscribed. At that point, its response time becomes unpredictable, and additional workload decreases the amount of useful work performed by the system.

4. Usage

The script produces a very wide output, so I recommend spooling it out to a file for easier viewing. Because Statspack tables have public synonyms, any user that has permission to select from the repository tables can run it.

Note: I’ve seen the script fail with errors like “ORA-00936: missing expression,” or “ORA-01722: invalid number” when used on databases running with cursor_sharing=FORCE. To avoid the error, I included the /*+ cursor_sharing_exact*/ hint in the script’s SELECT statement. Setting cursor_sharing=EXACT at the session-level is also a valid alternative.

SQL> spool load_trends.txt
SQL> @statspack_load_trends.sql

First, we provide the DBID and instance number we want to analyze. If we don’t provide an instance number, all of the instances for the provided DBID are considered:

Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   DB Id   |Inst Num|DB Name     |Instance    |Host
-----------|--------|------------|------------|-------------
 1558102526|       1|ORCL        |orcl1       |ol7-122-rac1
 1558102526|       2|ORCL        |orcl2       |ol7-122-rac2


Enter DBID to analyze - default "1558102526" :   /* enter DBID to analyze */
Enter instance number or "all" to analyze all instancs for DBID = 1558102526 - default "all" : /* report data for a specific RAC instance or all of them */

Next, we provide a time range to analyze.

5. Sample Output

Let’s check what the output looks like. Due to its width, and to fit the blog format, it’s segmented in several sections. Also, due to some (temporary) blog formatting limitations, I recommend viewing wide code sections by clicking “Open code in new window.”

  • “CPU #Cores;#Threads”: The number of CPU cores and threads (in case of enabled hyperthreading) reported by the OS.
  • “Tot CPU Time Avail [Cores] (s)”: The number of CPU seconds available between the two analyzed snapshots based on the number of CPU cores.
Instance|Snap ID        |Begin Snap     |End Snap       |Elapsed|            CPU|     Tot CPU Time|
  Number|start-end      |Time           |Time           |   Mins|#Cores;#Threads|Avail [Cores] (s)|
--------|---------------|---------------|---------------|-------|---------------|-----------------|
       1|19195-19196    |16-APR-20 17:00|16-APR-20 18:00|  59.98|         24; 48|        86,376.00|
        |19196-19197    |16-APR-20 18:00|16-APR-20 19:00|  60.00|         24; 48|        86,400.00|
        |19197-19198    |16-APR-20 19:00|16-APR-20 20:00|  60.00|         24; 48|        86,400.00|
        |19198-19199    |16-APR-20 20:00|16-APR-20 21:00|  60.02|         24; 48|        86,424.00|
        |19199-19200    |16-APR-20 21:00|16-APR-20 22:00|  59.98|         24; 48|        86,376.00|

Note: One hour between snapshots is probably excessive, but that’s beyond the scope of this post.

Time Model Statistics: stats$sys_time_model

The next section reports the time spent in the database in the form of Average Active Sessions (AAS). For completeness, and to better understand what the figures represent, I’m including how various statistics from stats$sys_time_model are related:

                DB time = CPU + Wait time spent by foreground sessions
background elapsed time = CPU + Wait time spent by background sessions
                 DB CPU = CPU time spent by foreground sessions
    background cpu time = CPU time spent by background sessions

Considering the above, we can calculate the AAS figures as follows:

       Total AAS = ("DB time" + "background elapsed time")/elapsed_time
Total AAS on CPU = ("DB CPU" + "background cpu time")/elapsed_time

FG AAS = "DB time" / elapsed_time
BG AAS = "background elapsed time" / elapsed_time

FG AAS on CPU = "DB CPU" / elapsed_time
BG AAS on CPU = "background cpu time" / elapsed_time

Total AAS in wait = ("Total AAS" - "Total AAS on CPU") / elapsed_time

FG AAS in wait = ("DB time" - "DB CPU") / elapsed_time
BG AAS in wait = ("background elapsed time" - "background cpu time") / elapsed_time

Columns reporting CPU-related figures display two values: The “usual” AAS value, and the “core-normalized Average Active Sessions” value, using the acronym “NPC”. If the core-normalized value approaches (or even crosses) the value of “1,” the system could potentially be CPU saturated:

  • “AAS [FG+BG]”: The number of AAS considering foreground and background sessions.
  • “AAS [FG]”: The number of AAS considering foreground sessions only.
  • “AAS [BG]”: The number of AAS considering background sessions only.

Continuing with the CPU component of the above three mentioned metrics:

  • “AAS on CPU [FG+BG]”: The number of AAS on CPU considering foreground and background sessions, followed by core-normalized AAS on CPU (NPC).
  • “AAS on CPU [FG]”: Same as above, though only considering foreground sessions.
  • “AAS on CPU [BG]”: Same as above, though only considering background sessions.

The “AAS wait” figures follow the same principle as the above-mentioned columns, and “AAS RMAN CPU” represents AAS performing “RMAN CPU time (backup/restore)” activities.

Returning to our sample output:

      AAS|      AAS|      AAS|AAS on CPU| AAS on CPU|AAS on  |AAS on CPU|AAS on  |AAS on CPU| AAS wait| AAS wait| AAS wait|             |
  [FG+BG]|     [FG]|     [BG]|[FG+BG]   |[FG+BG] NPC|CPU [FG]|  [FG] NPC|CPU [BG]|[BG] NPC  |  [FG+BG]|     [FG]|     [BG]|AAS RMAN CPU |
---------|---------|---------|----------|-----------|--------|----------|--------|----------|---------|---------|---------|-------------|
     96.4|     94.3|      2.0|       8.5|        0.4|     8.3|       0.3|     0.2|       0.0|     87.9|     86.0|      1.9|          0.0|
     32.9|     31.6|      1.3|      10.3|        0.4|    10.1|       0.4|     0.2|       0.0|     22.5|     21.4|      1.1|          0.0|
     59.4|     58.9|      0.6|      23.3|        1.0|    23.2|       1.0|     0.1|       0.0|     36.2|     35.7|      0.5|          0.0|
     13.3|     12.9|      0.5|       5.8|        0.2|     5.7|       0.2|     0.1|       0.0|      7.5|      7.1|      0.4|          0.0|
     23.0|     22.2|      0.8|       6.0|        0.3|     5.9|       0.2|     0.1|       0.0|     17.0|     16.3|      0.7|          0.0|

The first line reports 94.3 foreground AAS, out of which only 8.3 were on CPU, and 86 were in various waits. Looking at the third line, the situation changes, as out of the 58.9 AAS, 23.3 were on CPU, and 35.7 in waits. Checking the per-CPU-core-normalized value, we see it reports 1, which means the machine might be approaching or has already crossed CPU saturation. We also see that there was no RMAN activity occurring during that time. Background processes also spent most of their time in waits, rather than on CPU.

For convenience, we have displayed the number of seconds consumed by foreground sessions, breaking them further down into CPU and wait components, and reporting the relative percentages. This is basically the same information we saw in the previous section, just expressed as time instead of AAS:

  DB Time (s)            DB CPU (s)                      |
[FG CPU+WAIT]  =           [FG CPU] +          [FG WAIT] |
---------------------------------------------------------|
    339,555.62 =      29,924.51  9% +     309,631.11 91% |
    113,683.70 =      36,469.52 32% +      77,214.18 68% |
    211,880.46 =      83,404.47 39% +     128,475.99 61% |
     46,325.13 =      20,692.78 45% +      25,632.35 55% |
     79,966.07 =      21,274.38 27% +      58,691.69 73% |

OS Statistics – stats$osstat

Next, OS statistics from stats$osstat are displayed. “Tot OS Load@end_snap” is the recorded OS load at the time of the end snapshot creation. The other four columns represent Average Active Processes (AAP), which is simply the measured time of each named statistic divided by elapsed time in the observed period. Similarly, as above, the normalized value per core is also reported here for the BUSY statistic (sum of USER+SYS). The meaning is the same; If the value approaches 1, the system might be CPU saturated.

In our sample report, the third line reports 23.9 processes on CPU, or 1 per CPU core (that’s considering all the OS processes, not only Oracle’s). That also correlates with the “AAS on CPU [FG+BG]” figure in the third line we saw in the above snippet. Because in this particular case the machine is dedicated to one Oracle instance, it used all of the available CPU:

       Tot OS|           |AAP OS  |   AAP OS|   AAP OS|   AAP OS|
Load@end_snap|AAP OS BUSY|BUSY NPC|     USER|      SYS|   IOWAIT|
-------------|-----------|--------|---------|---------|---------|
         37.0|        9.2|     0.4|      8.2|      0.9|     12.1|
         76.8|       10.8|     0.4|      9.8|      0.9|      5.3|
          9.4|       23.9|     1.0|     22.9|      0.9|      2.3|
          4.3|        6.2|     0.3|      5.7|      0.5|      1.4|
          4.8|        6.4|     0.3|      5.6|      0.7|      4.7|

System Statistics: stats$sysstat

Finally, stats$sysstat reports various system statistics. I won’t describe their meaning because that’s beyond the scope of this post. It’s worth noting that apart from “Logons Current,” almost all other statistics are expressed in units of work per second. The only exceptions are statistics related to parallel operations. Because their usage usually pertains to “heavy-duty” DDL/DML tasks, we don’t expect to see many such operations per second. Thus, the whole snapshot interval seems a more appropriate time-frame to report the number of occurrences of such events.

    Logons|        |      User|            |SQL*Net roundtrips|SQL*Net roundtrips|   Bytes received via|     Bytes sent via|   Bytes received via|
   Current|Logons/s|   calls/s|  Executes/s|  to/from client/s|  to/from dblink/s|SQL*Net from client/s|SQL*Net to client/s|SQL*Net from dblink/s|
----------|--------|----------|------------|------------------|------------------|---------------------|-------------------|---------------------|
     556.0|     9.5|     872.9|       692.2|             723.0|           4,575.6|          1,846,238.6|        6,305,967.2|          1,177,004.3|
     527.0|    16.2|   1,008.0|       639.4|             828.5|           5,773.2|          2,462,067.1|        7,760,807.5|          1,453,024.0|
     607.0|    18.5|     738.8|       588.3|             556.1|           5,618.1|          1,986,647.1|        3,644,026.9|          1,448,627.4|
     427.0|     9.2|     873.3|       910.0|             716.4|           5,972.3|          2,691,244.6|        4,067,039.1|          1,532,389.7|
     418.0|     7.4|     719.9|       627.8|             588.5|           7,471.6|          2,564,916.7|        3,773,344.1|          1,852,806.9|

   
     Bytes sent via|Cluster wait|Session logical|      DB block|    Consistent| Consistent reads |  Physical|  Physical|Physical read|Physical write|
SQL*Net to dblink/s|      time/s|        reads/s|     changes/s|   changes/sec|undo rec applied/s|   reads/s|  writes/s|IO requests/s| IO requests/s|
-------------------|------------|---------------|--------------|--------------|------------------|----------|----------|-------------|--------------|
          576,510.8|         0.0|      339,009.8|      31,353.8|       3,062.0|           4,002.4|  47,349.4|   1,879.1|      2,621.5|         448.5|
          726,935.7|         0.0|      487,469.9|      48,874.4|         487.3|             563.9|  31,277.7|   2,127.6|      5,021.6|         526.8|
          707,648.6|         0.0|      343,665.8|      38,862.0|         379.4|             362.9|  37,057.7|     777.7|      1,949.2|         265.2|
          751,698.7|         0.0|      288,724.2|      26,163.7|         618.0|             435.8|  14,001.6|     823.3|        828.5|         274.1|
          940,096.3|         0.0|      335,631.4|      24,500.0|         198.5|             211.9|  53,625.8|     638.2|      2,451.6|         227.5|
     


    Parses|      Hard|     Parse|     Parse|      User|       User|     Redo size|          Redo|      Rollback changes|
   total/s|  parses/s|describe/s|failures/s| commits/s|rollbacks/s|       bytes/s|      writes/s|undo records applied/s|
----------|----------|----------|----------|----------|-----------|--------------|--------------|----------------------|
     142.0|     103.9|       0.0|       0.0|     158.2|        0.8|  46,951,095.1|         137.8|                   0.1|
     143.4|     100.1|       0.0|       0.3|     155.0|        0.8|  49,017,168.1|         170.3|                   1.2|
     135.9|      89.2|       0.0|       0.1|     143.3|        0.8|  11,513,858.2|         149.7|                   0.1|
     141.1|     109.8|       0.0|       0.0|     284.4|        0.8|   9,513,089.1|         226.2|                   0.1|
     123.0|      93.4|       0.0|       0.0|     175.5|        0.9|   7,462,206.6|         169.3|                   0.3|
     

     
           Queries|    DML statements|     PX oper not|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded|PX oper downgraded	   
parallelized/Total|parallelized/Total|downgraded/Total|   to serial/Total|75 to 99 pct/Total|50 to 75 pct/Total|25 to 50 pct/Total| 1 to 25 pct/Total	   
------------------|------------------|----------------|------------------|------------------|------------------|------------------|------------------	   
           1,912.0|               3.0|         1,989.0|              38.0|               0.0|               1.0|               4.0|               1.0	   
           2,450.0|               6.0|         2,551.0|              10.0|               0.0|               0.0|               0.0|               0.0	   
           2,477.0|              13.0|         2,584.0|               9.0|               0.0|               0.0|               0.0|               0.0	   
           1,553.0|               3.0|         1,646.0|               9.0|               0.0|               0.0|               0.0|               0.0	   
           1,390.0|               2.0|         1,487.0|               8.0|               0.0|               0.0|               0.0|               0.0	   

6. Visualizing Results

When comparing a period where there was an issue to one where the database was running fine, or also when just checking for trends, it’s more convenient to plot the results. That’s an easy way to get an overview of how certain metrics changed over time, or how do they compare across nodes on a RAC database.

To ease that task, the two scripts contain two ways of formatting columns: One for plotting/charting purposes, and one displaying column headings on two lines for a more user-friendly format (the format used in the above descriptions). Based on the needs, the appropriate block of column formatting commands has to be uncommented in the script. You can plot results using a third-party utility, such as Tableau, which is used to produce graphs in the following sections.

CPU Usage Distribution and Limits Across Nodes

The graph below represents the average number of OS processes on CPU, normalized to the CPU core count for each of the three nodes on a RAC system. As noted above, when the normalized value per core crosses the value of “1,” the host might be oversubscribed on CPU. Nodes 2 and 3 are usually below the value of 1. However, spikes in usage on node 1 might require further investigation. Also, there seems to be an imbalance in CPU usage across nodes:Number of OS processes on CPU, normalized to the CPU core count

Breakdown of Foreground and Background Sessions on CPU and Wait Components, Expressed as AAS

When investigating a problematic time period, we can quickly get a high-level overview of the relation between CPU and waits experienced by foreground and background sessions:

Average Active Sessions breakdown to foreground and background CPU and wait components

Observing Waits After Applying a Fix to Reduce Them

After applying “a fix” on April 16th, the time spent waiting by foreground sessions decreased substantially. CPU demand also decreased. Foreground CPU vs waits

 

Comparing the Number of AAS on CPU Normalized to CPU Core Count at the OS and DB Level on a Three-Node RAC DB

The observed DB isn’t using all/most of the CPU available on the nodes, and there’s “something else” using it at the OS level. That’s visible on the graph for node 1, between May 6th at 7-9 PM, where CPU usage at the OS level increased, but that was not the case for DB sessions. Additionally, because we have the normalized per CPU core value displayed, we can see that node 1 crosses the value of 1 quite often.Comparison of Average Active Sessions between OS and DB level

Logons per Second and Number of Logged-in Sessions per Node

RAC nodes 1 and 2 already have a relatively high number of Logons per second at ~5, whereas node 3 has this even higher at ~23. Additionally, there’s a large imbalance in the number of established sessions on node 3 compared to nodes 1 and 2. Because each node has 8 physical cores (not visible from the below graphs), the ~2500 established sessions represent a potential risk should too many of them try to become active at the same time. Overall it seems a connection pooling review is in place for this system.Session trends

A final example from a different server, but still related to connection management. Detecting a large drop in the number of established sessions and correlating with the number of logins per second:Session trends

7. Other Mining Tools

Some time ago, Maris Elsins published a post describing a set of handy scripts to mine the AWR repository. Make sure to check it out!

To conclude, here’s a list of some free mining utilities. These are mostly created for AWR, but some are for Statspack. Some of them parse a pre-created set of AWR/Statspack reports. Others connect directly to the database and extract/analyze data from there. The script presented in this post might not offer the same functionality as those utilities. However, for some of my use-cases, it complemented them by providing a customized set of pre-calculated workload related figures.

CreateGoldImage Followup – Is CreateGoldImage Required for Cloning Oracle Home?

$
0
0

On October 31, 2019, I blogged about How To Use CreateGoldImage For Cloning In 19c.

CreateGoldImage seems like a nice new feature; however, it’s also very buggy.

Here are a few bugs:

2641469.1 27432355.8 27432338.8

I received a suggestion to use tar to clone Grid Home; however, I was uncertain if it would work.

Why would Oracle create a new process for CreateGoldImage when traditional tar works?

Out of curiosity, I decided to test this by creating tarball for GRID_HOME from 19.8 RAC.

Use the tarball to upgrade Oracle Restart 12.2 to 19.8

==============================
Source: 19c Oracle RAC is located at /u01/app/19.0.0.0/grid
==============================

### Create tarball for 19.8 GRID_HOME from RAC instance:

[root@ol7-19-lax1 grid]# pwd
/u01/app/19.0.0/grid
[root@ol7-19-lax1 grid]# time tar -cvzpf /tmp/grid19.8_clone.tar.gz . > /tmp/clone_grid19.8.log ; echo $?

### Transfer tarball to target.

==============================
TARGET: 19c Oracle Restart is located at /u01/app/oracle/19.8.0.0/grid
==============================

I know what you are thinking. Why is GRID_HOME at two different locations?

The two vagrant boxes I have used to create test cases were created by two different individuals.

### Review current patch for 12.2 Oracle Restart:

[grid@db-asm-1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
28163235;ACFS JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163235)
28163190;OCW JUL 2018 RELEASE UPDATE 12.2.0.1.180717 (28163190)
28163133;Database Jul 2018 Release Update : 12.2.0.1.180717 (28163133)
OPatch succeeded.

### Copy tarball to /u01/app/oracle/19.8.0.0.

[grid@db-asm-1 19.8.0.0]$ time cp -fv /media/swrepo/grid19.8_clone.tar.gz . ; echo $?
‘/media/swrepo/grid19.8_clone.tar.gz’ -> ‘./grid19.8_clone.tar.gz’
[grid@db-asm-1 19.8.0.0]$ ls -l
total 7015760
-rwxr-xr-x 1 grid oinstall 7184131622 Aug 15 22:30 grid19.8_clone.tar.gz

### Unzip tarball to /u01/app/oracle/19.8.0.0

[grid@db-asm-1 19.8.0.0]$ time gunzip grid19.8_clone.tar.gz
[grid@db-asm-1 19.8.0.0]$ ls -l
total 13542216
-rwxr-xr-x 1 grid oinstall 13867223040 Aug 15 22:30 grid19.8_clone.tar

### Create directory for GRID_HOME – /u01/app/oracle/19.8.0.0/grid

[grid@db-asm-1 19.8.0.0]$ mkdir grid

### Untar to GRID_HOME – /u01/app/oracle/19.8.0.0/grid

[grid@db-asm-1 19.8.0.0]$ cd grid
[grid@db-asm-1 grid]$ time tar xf ../grid19.8_clone.tar . ; echo $?

### Upgrade 12.2 Oracle HAS to 19.8 using 19.8 tarball (GUI Method):

[grid@db-asm-1 grid]$ ./gridSetup.sh

### Verify Oracle Restart is 19.8:

[grid@db-asm-1 grid]$ $ORACLE_HOME/OPatch/opatch lspatches
31335188;TOMCAT RELEASE UPDATE 19.0.0.0.0 (31335188)
31305087;OCW RELEASE UPDATE 19.8.0.0.0 (31305087)
31304218;ACFS RELEASE UPDATE 19.8.0.0.0 (31304218)
31281355;Database Release Update : 19.8.0.0.200714 (31281355)
OPatch succeeded.
[grid@db-asm-1 grid]$

In conclusion, there is an option to use tarball if CreateGoldImage does not work; thereby avoiding creating a ticket with Oracle support.

Please note: I am unaware whether using tarball is supported — it would be best to check with Oracle support.

How To Get The Most Out Of Your Next Virtual Learning Event

$
0
0

COVID-19 pandemic has disrupted every aspect of our lives this year. But in July, in spite of all the upheaval, the All India Oracle Users Group (AIOUG) still managed to hold its annual Oracle Groundbreakers Yatra conference. By moving the event online, the organizers were able to present more than 100 sessions in 14 days, offering over 125 hours of learning, plus networking with 100+ speakers from more than two dozen countries.

COVID-19 has made virtual learning the new normal. I know not everyone is happy about this but, to be honest, I love virtual webinars. They offer benefits you just can’t get from live events, including:

  • The very best subject matter experts. Many people who deserve the title of “expert” can’t afford the time it takes to travel to a live event. But when organizers are running a virtual conference, they can invite experts from anywhere in the world, and make them available to the audience at almost any time.
  • Savings in time and money. Online conferences give me all the learning of a live event without the cost of travel, food and accommodations. Plus, attending virtually means my time commitment is measured in hours instead of days.
  • Learning without intimidation. Not everyone is comfortable asking questions in front of a large crowd. Online learning is a great equalizer, and anyone can raise a virtual hand to ask a question without feeling shy. These events have made it easy for me to connect with speakers and grow my network through LinkedIn and Twitter.

But we should also talk about the things that make virtual events less than ideal. During the OG Yatra conference, Oracle’s Jim Grisanzio listed some major drawbacks of the online experience:

  • There’s no human contact. When your speaker is present only virtually, it’s easy to get distracted and end up learning nothing at all.
  • It’s boring. Virtual presentations lack eye contact, real-time exchanges, and audience reactions that help to keep us engaged with the material.
  • It’s stressful. As countless teachers have discovered during the pandemic, the attention span for online learning is much shorter than it is for the live classroom experience.

There’s no doubt that virtual learning presents real challenges. But as someone who has grown to actually prefer online events, I’m happy to offer some practical tips to help you make the most of virtual webinars and learning:

  • Don’t be greedy. Don’t sign up for every session. You’ll end up exhausted and unable to learn much of anything. Instead, be selective. Choose only those sessions that really interest you and that are worthy of your complete focus.
  • Do your homework. Before you join a webinar, do a bit of research on the topic. The more basic knowledge you have, the more value you’ll get from the session.
  • Learn about the speakers. Review their LinkedIn profiles so that you know something about their background and career path. This way, you’ll be better able to form realistic expectations for each session.
  • Focus completely on learning. Don’t try to do your job at the same time you’re attending a webinar. You’ll learn nothing, and your work will suffer. If you’re holding a pager or you have an on-call shift, flip the pager and swap shifts with a colleague.
  • Take notes. Before you log on, make sure you have a notebook and pen at the ready. Research shows that old-school note-taking leads to better retention of learning. That’s because writing notes by hand forces our brains to process what we’re putting down on paper. Plus, writing notes by hand does a better job of triggering that brain activity than typing notes on a computer. People who take screenshots of slides typically never look at them again. When it comes to learning, screenshots are pretty much useless.

I hope this post will help you make the most of virtual webinars and learning. Sure, online events have their drawbacks, but they are undeniably the way of the future — especially in light of COVID-19. Today, virtual learning offers us a great way to acquire knowledge safely and affordably from just about anywhere. In the future, I believe technology will create solutions to some of the difficulties of the online experience. But for now, organizations like the All India Oracle Users Group deserve our thanks for creating our industry’s newest and best opportunities to learn.

Using SQLCODE Function

$
0
0

The idea of this post is to show how to handle errors using SQLCODE function. That function shows the code error for the latest exception when running a PL/SQL code.

Here is a small example of code that raises an exception when two or more consecutive ORA-02292 errors occur.

--Create a Table to capture how many times ORA-02292 errors occurred.
CREATE TABLE tcount (x int); -- table to count the number of 2292 errors

--Parent Table
create table orders (order_id int, total_sales int);
CREATE UNIQUE INDEX orders_pk on orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id) ENABLE;

--Child Table
CREATE TABLE products
( product_id numeric(10) not null,
supplier_id numeric(10) not null,
order_id int,
CONSTRAINT fk_orders
FOREIGN KEY (order_id)
REFERENCES orders(order_id)
);

--Add data in Parent and Child Tables
insert into orders values (1,1);
insert into products values (1,1,1);
commit;

--Verify the data on those Tables
select * from orders;
select * from products;

--Create the procedure to Delete data from Parent table and force the error ORA-02292
CREATE OR REPLACE PROCEDURE delete_order
(order_id_in IN NUMBER)
IS
vtest2292 int := 0;
BEGIN
  delete from orders where order_id = order_id_in;
  commit;

  -- if the delete instruction runs fine the tcount table is truncated
  EXECUTE IMMEDIATE ('truncate table tcount');

EXCEPTION
  WHEN OTHERS THEN
  DECLARE
  error_code NUMBER := SQLCODE;
  BEGIN
    IF error_code = -2292 THEN
      DBMS_OUTPUT.PUT_LINE('ERROR 2292!!!!!!!!!!!!!'); -- error found
      insert into tcount values (1);
      commit;

      select count(1) into vtest2292 from tcount;

      IF vtest2292 >= 2 then
        DBMS_OUTPUT.PUT_LINE('ERROR 2292 >x2!!!!!!!!!!!!!!'); -- two or more consecutive errors found
        raise_application_error (-20001,'Two or more ORA-2292 were occurred deleting an order.');
      END IF;
    ELSE
      raise_application_error (-20002,'An ERROR has occurred deleting an order.');
    END IF;
  END;
END;
/

--Job to run the procedure delete_order and confirm that just after the second consecutive execution the error will be written in alert log
BEGIN
  DBMS_SCHEDULER.create_job (
  job_name => 'job_delete_order',
  job_type => 'PLSQL_BLOCK',
  job_action => 'begin delete_order(1); end;',
  start_date => SYSTIMESTAMP,
  repeat_interval => 'FREQ=HOURLY;BYMINUTE=0; interval=1;',
  enabled => TRUE);
END;
/

--Running the job job_delete_order to force the error, we have to monitor the database alert log to confirm that just after two executions the error will be visible there
set serveroutput on
select count(1) from tcount;
exec dbms_scheduler.run_job('job_delete_order',FALSE);
select count(1) from tcount;

tail -f alert_<database_name>.log

The text associated with the ORA-02292 error is “integrity constraint (%s.%s) violated – child record found.” This issue can be fixed by deleting the child rows first, and afterward deleting the parent row.

Another option is to modify the foreign key constraint to delete child rows when a parent row is deleted by setting up the “Foreign Key with Delete Cascade” option, but with SQLCODE function we can capture the error and handle it as needed.

Overcoming ORA-01722 Error While Doing an Upgrade With Different DST Versions

$
0
0

This will be a short entry, but hopefully, it will help if you face the error “ORA-01722: invalid number” while doing an upgrade.

Today I was doing an Oracle RDBMS upgrade from 12.2 with DST 34, to 19.7 with DST 35. The upgrade failed with the error below:

Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed

-------------------------------------------------
Errors in database [TEST]
Stage     [DBUPGRADE]
Operation [STOPPED]
Status    [ERROR]
Info    [
Error: UPG-1400
UPGRADE FAILED [TEST]
Cause: This indicates that the database upgrade failed with errors.
For further details, see the log file located at /u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]

-------------------------------------------------
Logs: [/u01_diag/working/pythian/upgrade/upg_logs/TEST/102/autoupgrade_20200921_user.log]
-------------------------------------------------Job 102 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [0]
Jobs failed                    [1]
Jobs pending                   [0]
-------------------- JOBS FAILED ---------------------
Job 102 for TEST

Exiting


-- From the Upgrade Log


17:12:53 SQL> Rem Check if time zone file version used by the database exists in new home
17:12:53 SQL> SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
17:12:53   2     FROM sys.props$
17:12:53   3     WHERE
17:12:53   4       (
17:12:53   5        (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
17:12:53   6        AND
17:12:53   7        (0 = (select count(*) from v$timezone_file))
17:12:53   8       );
old   5:       (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > &C_LTZ_CONTENT_VER)
new   5:       (name = 'DST_PRIMARY_TT_VERSION' AND TO_NUMBER(value$) > 32)
SELECT TO_NUMBER('MUST_PATCH_TIMEZONE_FILE_VERSION_ON_NEW_ORACLE_HOME')
                 *
ERROR at line 1:
ORA-01722: invalid number

While searching for the error, the first thing that came into my mind is that the 19.7 was patched incorrectly, but that wasn’t the case. As you can see, the 19.7 was patched with a higher version of the DST patch than the 12.2 OH

([SID:TEST][oracle@hostname:/u01/app/oracle/product/12.2.0.1/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
29997937;RDBMS - DSTV34 UPDATE - TZDATA2019B
30133625;OJVM RELEASE UPDATE: 12.2.0.1.191015 (30133625)
30138470;Database Oct 2019 Release Update : 12.2.0.1.191015 (30138470)
30122814;OCW OCT 2019 RELEASE UPDATE 12.2.0.1.191015 (30122814)

OPatch succeeded.

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/OPatch ] )
oracle $ ./opatch lspatches
30565805;RDBMS 19C  REGRESSION  ORA-01843  NOT A VALID MONTH ERROR WHEN EXECUTING TO_DATE(20191120,RRMMDD)
31335037;RDBMS - DSTV35 UPDATE - TZDATA2020A
30894985;OCW RELEASE UPDATE 19.7.0.0.0 (30894985)
30869156;Database Release Update : 19.7.0.0.200414 (30869156)

OPatch succeeded.

My first move was to search for the following files in the 19.7 OH:

  • timezlrg_34.dat
  • timezone_34.dat

They weren’t there, and because this is a cumulative patch, DST 34 files shouldn’t be missing from the 19.7 OH patched with DST 35. Accordingly, I would classify this as a bug in the DST 35 patch.

([SID:TEST][oracle@hostname:home/oracle/working/antunez ] )
oracle $ cd $ORACLE_HOME/oracore/zoneinfo

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
0

I fixed this by copying the files from the 12.2 $ORACLE_HOME/oracore/zoneinfo to the 19.7 $ORACLE_HOME/oracore/zoneinfo

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezone_34.dat ./

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/oracore/zoneinfo/timezlrg_34.dat ./

([SID:TEST][oracle@hostname:/u01/app/oracle/product/19.7.0.0/dbhome_1/oracore/zoneinfo ] )
oracle $ ls -ltr | egrep "timezlrg_34.dat|timezone_34.dat" | wc -l
2

Once I did this, I relaunched my upgrade. This time, it finished successfully

([SID:TEST][oracle@hostname:/home/oracle/working/antunez/TEST ] )
oracle $ $ORACLE_HOME19c/jdk/bin/java -jar $ORACLE_HOME19c/rdbms/admin/autoupgrade.jar -config TEST_config.cfg -mode DEPLOY -noconsole
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed

Job 102 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 for TEST

---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from TEST: drop restore point AUTOUPGRADE_221145114461854_TEST

Hopefully, this small post will help you if you are faced with this error, as it seems the DST 35 patch is not including the necessary DST 34 files.

Note: This was originally posted on rene-ace.com.

Resolving Error ORA-28353: Failed to Open Wallet

$
0
0

Full disclosure: this is a post I’ve had in draft mode for almost one and a half years.

I noticed the original error after applying the October 2018 bundle patch (BP) for 11.2.0.4. While I realize most clients are no longer in 11.2.0.4, this information remains valid for anyone upgrading from 11.2 to 12, 18 or 19c.

I had been doing several tests on my Spanish RAC (Real Application Cluster) Attack for 12.2. The goal was to patch my client to October 2018 PSU; obtaining enough security leverage to avoid patching their database and do their DB (database) upgrade to 18c. I created RAC VMs to enable testing. I also set up my environment to match the client’s, which had TDE with FIPS 140 enabled (I will provide more details on this later in the post).

While the patching was successful, the problem arose after applying the patch. I was unable to open the database despite having the correct password for the encryption key.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-28365: wallet is not open

SQL> alter system set encryption key identified by “xxx”;
alter system set encryption key identified by “xxxx”
*
ERROR at line 1:
ORA-28353: failed to open wallet

SQL> select * from v$encryption_wallet;

INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
CLOSED

SQL> ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****;
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY *****
*
ERROR at line 1:
ORA-28353: failed to open wallet

When I tried to open the database, this is what appeared in the alert.log:

Tue Feb 16 11:21:56 2018
QMNC started with pid=81, OS id=86184
kcbztek_get_tbskey: decrypting encrypted key for tablespace 16 without opening the wallet

I did a rollback of the patch, and as soon as I rolled back the patch, the database opened:

SQL> select name,open_mode ,database_role from v$database;

NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY

After many days of looking for information to address the error, I noticed that FIPS 140-2 was enabled. FIPS (Federal Information Processing Standard), 140-2, is a US government standard defining cryptographic module security requirements.

You can see it’s enabled for SSL in the following file:

[oracle@collabn1 /u01/app/oracle/product/11.2.0.4/dbhome_1/ldap/admin ]$ cat fips.ora
#
# file to be reside in $ORACLE_HOME/ldap/admin
#
# sec-220
#
sslfips_140=true

I was able to find a document called After Applying October 2018 CPU/PSU, Auto-Login Wallet Stops Working For TDE With FIPS Mode Enabled (Doc ID 2474806.1).

This helped me discover the solution is to patch the DB with October 2018 PSU and, after patching the binaries, recreate the auto login file cwallet.sso with a compatibility of version 12. After you have done this, you will be able to open your DB normally.

[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv cwallet.sso cwallet.sso.original

[oracle@collabn1 /etc/oracle/wallets/test]$ orapki wallet create -wallet /etc/oracle/wallets/test -auto_login -compat_v12
Oracle PKI Tool : Version 11.2.0.4.0 - Production
Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
Enter wallet password: ******

[oracle@collabn1 ~]$ srvctl status database -d test
Instance test1 is running on node collabn1
Instance test2 is running on node collabn2

SQL> select name,open_mode ,database_role from v$database;

NAME OPEN_MODE DATABASE_ROLE
--------- -------------------- ----------------
TEST READ WRITE PRIMARY

SQL> select * from v$encryption_wallet;

INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
------------------------
STATUS
------------------
1 file
/etc/oracle/wallets/$ORACLE_UNQNAME
OPEN

IMPORTANT: DO NOT recreate the ewallet.p12 file! This will likely cause data loss, as you will lose the master key required to decrypt your encrypted data.

Below is an example of what you DO NOT WANT TO DO:

[oracle@collabn1 ~]$ cd /etc/oracle/wallets/test
[oracle@collabn1 /etc/oracle/wallets/test]$ mv ewallet.p12 ewallet.p12.old

[oracle@collabn1 /etc/oracle/wallets/test]$ sqlplus / as sysdba
SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY Welcome1 WITH BACKUP;

keystore altered

It’s important to note that the above also applies to Jan 2019 Database BP, or to any upgrade from 11.2.0.4 to 12, 18 or 19c. This means you will face this issue for anything after October 2018 if you are using TDE and SSL with FIPS.

Note: This was originally posted in rene-ace.com.

How to Enable RAM Cache in Oracle Exadata

$
0
0

A brief background on RAM cache or in-memory OLTP acceleration

In this post I will present a feature — to simplify let’s call it RAM cache — introduced in Oracle Exadata Storage Server image 18.1. You’ve probably heard a lot about the new Oracle Exadata X8M and its Intel Optane DC Persistent Memory (PMem). This new feature + architecture allows the database processes running on the database servers to remotely read and write through a protocol called Remote Direct Memory Access (RDMA) from / to the PMem cards in the storage servers. You can find the detailed Oracle Exadata X8M deployment process here.

It’s true that RDMA has existed in the Oracle Exadata architecture from the beginning, as Oracle points out in their blog post titled, Introducing Exadata X8M: In-Memory Performance with All the Benefits of Shared Storage for both OLTP and Analytics:

RDMA was introduced to Exadata with InfiniBand and is a foundational part of Exadata’s high-performance architecture.

What you may not know, is there’s a feature called in-memory OLTP acceleration (or simply RAM cache) which was introduced in the Oracle Exadata Storage Server image 18.1.0.0.0 when Oracle Exadata X7 was released. This feature allows read access to the storage server RAM on any Oracle Exadata system (X6 or higher) running that version or above. Although, this is not the same as PMem, since RAM is not persistent, it is still very cool since it allows you to take advantage of the RAM available in the storage servers.

In-memory OLTP acceleration (or simply RAM cache).

Modern generations of Exadata storage servers come with a lot of RAM available. By comparison, X8 and X7 come with 192GB of RAM by default, as opposed to the 128GB of RAM that came with X6.

Unfortunately, the RAM cache feature is only available on storage servers X6 or higher and these are the requirements:

  • Oracle Exadata System Software 18c (18.1.0).
  • Oracle Exadata Storage Server X6, X7 or X8.
  • Oracle Database version 12.2.0.1 April 2018 DBRU, or 18.1 or higher.

That large amount of RAM is rarely fully utilized by the Oracle Exadata storage servers. This RAM cache feature allows you to use all or part of the available RAM in the storage servers. Doing this extends your database buffer cache to the storage server’s RAM for read operations.

In the new Oracle Exadata X8M the I/O latency is under 19µs for read operations. This is due to the PMem cache combined with the RoCE (RDMA over converged ethernet) network. In the Oracle Exadata X7/X8 the I/O latency for reads with RAM cache using RDMA over InfiniBand is around 100µs. Without RAM cache the number goes up to 250µs reading directly from the flash cache. The following information is from the Oracle Exadata Database Machine X8-2 data sheet:

For OLTP workloads Exadata uniquely implements In-Memory OLTP Acceleration. This feature utilizes the memory installed in Exadata Storage Servers as an extension of the memory cache (buffer cache) on database servers. Specialized algorithms transfer data between the cache on database servers and in-memory cache on storage servers. This reduces the IO latency to 100 us for all IOs served from in-memory cache. Exadata’s (sic) uniquely keeps only one in-memory copy of data across database and storage servers, avoiding memory wastage from caching the same block multiple times. This greatly improves both efficiency and capacity and is only possible because of Exadata’s unique end-to-end integration.

How I set up RAM cache in the Exadata storage servers

As I mentioned previously, the recent generation of Oracle Exadata storage servers come with a lot of RAM. This RAM is normally not used at its fullest by the cellsrv services and features. Having said that, I normally take into consideration the amount of free memory (RAM) in the storage servers. First, I pick the storage server using the most RAM and do the math: freemem*0.7=RAM cache value. Next, I set the RAM cache to 70 percent of the free memory of the storage server using more RAM than the others. Note: I avoid using all the free memory for the RAM cache in case the storage server requires more memory for storage indexes or other needs in the future.

Let’s say my busiest storage server has 73GB of free memory. Applying the formula we get to: 73*0.7=51.1GB.

Oracle Exadata architecture was built to spread the workload evenly across the entire storage grid, so you’ll notice that the storage servers use pretty much the same amount of memory (RAM).

Here comes the action and fun. We must first check how much memory is available in our storage servers by running this from dcli (make sure your cell_group file is up-to-date):

[root@exadbadm01 ~]# dcli -l root -g cell_group free -g

In my case the cel01 is the storage server using more memory than others. Let’s check some details of this storage server:

[root@exaceladm01 ~]# cellcli
CellCLI: Release 19.2.7.0.0 - Production on Thu Aug 06 07:44:59 CDT 2020
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
CellCLI> LIST CELL DETAIL
name: exaceladm01
accessLevelPerm: remoteLoginEnabled
bbuStatus: normal
cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
cpuCount: 24/24
diagHistoryDays: 7
fanCount: 8/8
fanStatus: normal
flashCacheMode: WriteBack
flashCacheCompress: FALSE
httpsAccess: ALL
id: 1446NM508U
interconnectCount: 2
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.10.13/22
kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
locatorLEDStatus: off
makeModel: Oracle Corporation SUN SERVER X7-2L High Capacity
memoryGB: 94
metricHistoryDays: 7
notificationMethod: mail,snmp
notificationPolicy: critical,warning,clear
offloadGroupEvents:
powerCount: 2/2
powerStatus: normal
ramCacheMaxSize: 0
ramCacheMode: Auto
ramCacheSize: 0
releaseImageStatus: success
releaseVersion: 19.2.7.0.0.191012
rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
releaseTrackingBug: 30393131
rollbackVersion: 19.2.2.0.0.190513.2
smtpFrom: "exadb Exadata"
smtpFromAddr: exaadmin@loredata.com.br
smtpPort: 25
smtpServer: mail.loredata.com.br
smtpToAddr: support@loredata.com.br
smtpUseSSL: FALSE
snmpSubscriber: host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
status: online
temperatureReading: 23.0
temperatureStatus: normal
upTime: 264 days, 8:48
usbStatus: normal
cellsrvStatus: running
msStatus: running
rsStatus: running

From the output above we can see that the parameter ramCacheMode is set to auto while ramCacheMaxSize and ramCacheSize are 0. These are the default values and mean the RAM cache feature is not enabled.

This storage server has ~73GB of free / available memory (RAM):

[root@exaceladm01 ~]# free -m
total used free shared buff/cache available
Mem: 96177 15521 72027 4796 8628 75326
Swap: 2047 0 2047

Now we can enable the RAM cache feature by changing the parameter ramCacheMode to “On”:

CellCLI> ALTER CELL ramCacheMode=on
Cell exaceladm01 successfully altered

Immediately after the change we check the free / available memory (RAM) in the storage server operation system:

[root@exaceladm01 ~]# free -m
total used free shared buff/cache available
Mem: 96177 15525 72059 4796 8592 75322
Swap: 2047 0 2047

Not much has changed, because the memory remains available for the storage server to use for RAM cache. However, when we enable the RAM cache feature, the storage server will not automatically allocate / use this memory.

We can see that only 10GB was defined in the ramCacheMaxSize and ramCacheSize parameters:

CellCLI> LIST CELL DETAIL
name: exaceladm01
accessLevelPerm: remoteLoginEnabled
bbuStatus: normal
cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
cpuCount: 24/24
diagHistoryDays: 7
fanCount: 8/8
fanStatus: normal
flashCacheMode: WriteBack
flashCacheCompress: FALSE
httpsAccess: ALL
id: 1446NM508U
interconnectCount: 2
interconnect1: bondib0
iormBoost: 0.0
ipaddress1: 192.168.10.13/22
kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
locatorLEDStatus: off
makeModel: Oracle Corporation SUN SERVER X7-2L High Capacity
memoryGB: 94
metricHistoryDays: 7
notificationMethod: mail,snmp
notificationPolicy: critical,warning,clear
offloadGroupEvents:
powerCount: 2/2
powerStatus: normal
ramCacheMaxSize: 10.1015625G
ramCacheMode: On
ramCacheSize: 10.09375G
releaseImageStatus: success
releaseVersion: 19.2.7.0.0.191012
rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
releaseTrackingBug: 30393131
rollbackVersion: 19.2.2.0.0.190513.2
smtpFrom: "exadb Exadata"
smtpFromAddr: exaadmin@loredata.com.br
smtpPort: 25
smtpServer: mail.loredata.com.br
smtpToAddr: support@loredata.com.br
smtpUseSSL: FALSE
snmpSubscriber: host=10.200.55.182,port=162,community=public,type=asr,asrmPort=16161
status: online
temperatureReading: 23.0
temperatureStatus: normal
upTime: 264 days, 8:49
usbStatus: normal
cellsrvStatus: running
msStatus: running
rsStatus: running

To confirm we can run the following query from cellcli:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
10.1015625G On 10.09375G

To reduce the memory used by the RAM cache feature we can simply change the ramCacheMaxSize parameter:

CellCLI> ALTER CELL ramCacheMaxSize=5G;
Cell exaceladm01 successfully altered

If we check the values of the RAM cache parameters we will see this:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
5G On 0

As soon as the database blocks start being copied to the RAM cache we will see the ramCacheSize value increasing:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
5G On 3.9250G

Increasing a bit more:

CellCLI> ALTER CELL ramCacheMaxSize=15G;
Cell exaceladm01 successfully altered

When checking, you’ll notice it takes a while for the cellsrv to populate the RAM cache with blocks copied from the flash cache:

CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 0
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 11.8125G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
15G On 15G

Re-setting to auto makes everything clear again:

CellCLI> ALTER CELL ramCacheMode=Auto
Cell exaceladm01 successfully altered
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
0 Auto 0

Now we adjust to the value we got from our calculation of 70 percent of the free memory:

CellCLI> ALTER CELL ramCacheMode=On
Cell exaceladm01 successfully altered
CellCLI> ALTER CELL ramCacheMaxSize=51G
Cell exaceladm01 successfully altered
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 32.8125G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 35.2500G
CellCLI> LIST CELL ATTRIBUTES ramCacheMaxSize,ramCacheMode, ramCacheSize
51G On 51G

With that configuration in place, if we want to be notified if the storage server is running out memory we can quickly create a threshold based on the cell memory utilization (CL_MEMUT) metric to notify us when the memory utilization goes beyond 95 percent:

CellCLI> CREATE THRESHOLD CL_MEMUT.interactive comparison=">", critical=95

Conclusion

To sum up, RAM cache (aka, in-memory OLTP acceleration) is a feature available only on Oracle Exadata Database Machine X6 or higher with at least the 18.1 image. In addition, it’s available for the Oracle Database 12.2.0.1 with April 2018 DBRU or higher. This feature helps extend the database buffer cache to the free RAM in the storage servers, but only for read operations, since RAM is not persistent. For persistent memory, Oracle introduced the Persistent Memory Cache with Oracle Exadata Database Machine X8M.

It’s worth mentioning that a database will only leverage RAM cache when there is pressure on the database buffer cache. The data blocks present in the RAM cache are persistently stored in the storage server’s flash cache. When a server process on the database side requests a block that is no longer stored in the database buffer cache, but is in the RAM cache, the cellsrv will send this block from the RAM cache to the buffer cache for the server process to read it. It is faster to read from the RAM cache instead of reading it from the flash cache or disk.

While the in-memory OLTP acceleration feature is not a magic solution, it is a plus for our Exadata system. Since we almost always see free memory in the storage server, this is a way of optimizing the resources we’ve already paid for. This feature is already in the Exadata licenses, so there is no extra cost option, and it is not related to the database in-memory option. Having Exadata is all you need.

Happy caching! See you next time!

Franky

References:


OPatchAuto Bug on 19.3 GI/RDBMS When Patching to 19.6

$
0
0

The other day I was doing a fresh install of Oracle 19.3. I was using the binaries from edelivery, doing some testing in that version, then patching to 19.6. The installation and the tests went fine. Next, I proceeded to analyze patch 30501910 and saw that everything was OK. 

## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? DB193H12
The Oracle base has been set to /u01/app/oracle
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -analyze
OPatchauto session is initiated at Tue Oct  6 13:23:11 2020
...
Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0

==Following patches were SUCCESSFULLY analyzed to be applied:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_11-41-25AM_1.log

OPatchauto session completed at Tue Oct  6 11:42:46 2020

Time taken to complete the session 2 minutes, 41 seconds


I proceeded with the patch with the command below:

opatchauto apply /u01/patches/30501910

Everything went well in node 1, with the patching completed successfully in GI and RDBMS, but the patch on node 2 failed on the RDBMS with the error below:

## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed:  /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto  apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output: 
==Following patches FAILED in apply:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/oracle/product/19.3.0.0/dbhome_1/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-05_17-25-22PM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: ApplySession failed in system modification phase... 'ApplySession::apply failed: java.io.IOException: oracle.sysman.oui.patch.PatchException: java.io.FileNotFoundException: /u01/app/oraInventory/ContentsXML/oui-patch.xml (Permission denied)' 

After fixing the cause of failure Run opatchauto resume
...

After looking for information, I found MOS (My Oracle Support) note 2582139.1, which directed me to copy the oui-patch.xml from node 1 to node 2. 

[grid@node1 ~]$ cd /u01/app/oraInventory/ContentsXML/
[grid@node1 ContentsXML]$ scp oui-patch.xml node2:/u01/app/oraInventory/ContentsXML/
oui-patch.xml   
 
[root@node2 ContentsXML]# ls -ltr
total 20
-rw-rw----. 1 grid   oinstall 565 Oct  6 11:13 inventory.xml
-rw-r--r--. 1 oracle oinstall 174 Oct  6 12:43 oui-patch.xml.back
-rw-rw----. 1 grid   oinstall 300 Oct  6 12:43 comps.xml
-rw-rw----. 1 grid   oinstall 292 Oct  6 12:43 libs.xml
-rw-rw----. 1 grid   oinstall 174 Oct  6 14:02 oui-patch.xml                                                                        

After doing this, I reran the OPatchAuto. This time I faced the error “Unable to create patchObject.”

## Note.- I trimmed the log for readability
[root@node2 ~]$ opatchauto apply /u01/patches/30501910
...
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/oracle/product/19.3.0.0/dbhome_1, host: node2.
Command failed:  /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/opatchauto  apply /u01/patches/30501910 -oh /u01/app/oracle/product/19.3.0.0/dbhome_1 -target_type rac_database -binary -invPtrLoc /u01/app/oracle/product/19.3.0.0/dbhome_1/oraInst.loc -jre /u01/app/oracle/product/19.3.0.0/dbhome_1/OPatch/jre -persistresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_node2_rac_2.ser -analyzedresult /u01/app/oracle/product/19.3.0.0/dbhome_1/opatchautocfg/db/sessioninfo/sessionresult_analyze_node2_rac_2.ser
Command failure output: 
==Following patches FAILED in apply:

Patch: /u01/patches/30501910/30489227
Log: 
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Unable to create patchObject
Possible causes are:
   ORACLE_HOME/inventory/oneoffs/30489227 is corrupted. PatchObject constructor: Input file "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/actions" or "/u01/app/oracle/product/19.3.0.0/dbhome_1/inventory/oneoffs/30489227/etc/config/inventory" does not exist. 

After fixing the cause of failure Run opatchauto resume
...

To address this error, I found another note — 2286025.1 — which advises taking any of the following actions:

  1. OPatch lsinventory
  2. OPatch – analyze
  3. Restore database home backup for Hostname2.

After trying options 1 and 2, I kept getting the same error, so I proceeded to patch the GI on node 2.

## Note.- I trimmed the log for readability
[root@node2 ~]$ . oraenv
ORACLE_SID = [root] ? +ASM2
The Oracle base has been set to /u01/app/grid
[root@node2 ~]$ export PATH=$ORACLE_HOME/OPatch:$PATH
[root@node2 ~]$ opatchauto apply /u01/patches/30501910 -oh $ORACLE_HOME
OPatchauto session is initiated at Tue Oct  6 12:28:25 2020

...

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:node2
CRS Home:/u01/app/19.3.0.0/grid
Version:19.0.0.0.0
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /u01/patches/30501910/30489227
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30489632
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30557433
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

Patch: /u01/patches/30501910/30655595
Log: /u01/app/19.3.0.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2020-10-06_12-33-02PM_1.log

OPatchauto session completed at Tue Oct  6 12:49:02 2020
Time taken to complete the session 20 minutes, 38 seconds

For the RDBMS I followed my own blog post (How to Clone an Oracle Home in 19c) and cloned the Oracle home from node 1, as this one was already patched correctly with 30501910.

[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_1
[oracle@node1 ~]$ $ORACLE_HOME/runInstaller -silent  -createGoldImage -destinationLocation /u01/patches
Launching Oracle Database Setup Wizard...

Successfully Setup Software.
Gold Image location: /u01/patches/db_home_2020-10-06_02-11-59PM.zip

[oracle@node1 ~]$ unset ORACLE_HOME
[oracle@node1 ~]$ unset ORACLE_SID
[oracle@node1 ~]$ unset ORACLE_RSID
[oracle@node1 ~]$ unset ORACLE_UNQNAME
[oracle@node1 ~]$ unset ORACLE_BASE

[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ export ORACLE_HOSTNAME=node1
[oracle@node1 ~]$ export ORA_INVENTORY=/u01/app/oraInventory
[oracle@node1 ~]$ export NODE1_HOSTNAME=node1
[oracle@node1 ~]$ export NODE2_HOSTNAME=node2
[oracle@node1 ~]$ export ORACLE_BASE=/u01/app/oracle

[oracle@node1 ~]$ unzip -q /u01/patches/db_home_2020-10-06_02-11-59PM.zip -d $ORACLE_HOME
 
[oracle@node1 ~]$ env | egrep "ORA|NODE"
ORACLE_BASE=/u01/app/oracle
NODE2_HOSTNAME=node2
ORACLE_HOSTNAME=node1
NODE1_HOSTNAME=node1
ORA_INVENTORY=/u01/app/oraInventory
ORACLE_HOME=/u01/app/oracle/product/19.3.0.0/dbhome_2

[oracle@node1 ~]$ cd $ORACLE_HOME
[oracle@node1 dbhome_2]$ ${ORACLE_HOME}/runInstaller -ignorePrereq -waitforcompletion -silent \
> -responseFile ${ORACLE_HOME}/install/response/db_install.rsp \
> oracle.install.option=INSTALL_DB_SWONLY \
> ORACLE_HOSTNAME=${ORACLE_HOSTNAME} \
> UNIX_GROUP_NAME=oinstall \
> INVENTORY_LOCATION=${ORA_INVENTORY} \
> SELECTED_LANGUAGES=en \
> ORACLE_HOME=${ORACLE_HOME} \
> ORACLE_BASE=${ORACLE_BASE} \
> oracle.install.db.InstallEdition=EE \
> oracle.install.db.OSDBA_GROUP=dba \
> oracle.install.db.OSOPER_GROUP=dba \
> oracle.install.db.OSBACKUPDBA_GROUP=dba \
> oracle.install.db.OSDGDBA_GROUP=dba \
> oracle.install.db.OSKMDBA_GROUP=dba \
> oracle.install.db.OSRACDBA_GROUP=dba \
> oracle.install.db.CLUSTER_NODES=${NODE1_HOSTNAME},${NODE2_HOSTNAME} \
> oracle.install.db.isRACOneInstall=false \
> oracle.install.db.rac.serverpoolCardinality=0 \
> oracle.install.db.config.starterdb.type=GENERAL_PURPOSE \
> oracle.install.db.ConfigureAsContainerDB=false \
> SECURITY_UPDATES_VIA_MYORACLESUPPORT=false \
> DECLINE_SECURITY_UPDATES=true
Launching Oracle Database Setup Wizard...
...
The response file for this session can be found at:
 /u01/app/oracle/product/19.3.0.0/dbhome_2/install/response/db_2020-10-06_03-06-10PM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/InstallActions2020-10-06_03-06-10PM/installActions2020-10-06_03-06-10PM.log

As a root user, execute the following script(s):
	1. /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh

Execute /u01/app/oracle/product/19.3.0.0/dbhome_2/root.sh on the following nodes: 
[node1, node2]

After doing this, the only task remaining was to verify that the Oracle home was patched correctly, and switch the database to the new Oracle home.

[oracle@node1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)

OPatch succeeded.

[oracle@node2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)

OPatch succeeded.

[oracle@node1 ~]$ srvctl modify database -db test -oraclehome /u01/app/oracle/product/19.3.0.0/dbhome_2
[oracle@node1 ~]$ srvctl config database -db test -a
Database unique name: test
Database name: test
Oracle home: /u01/app/oracle/product/19.3.0.0/dbhome_2
Oracle user: oracle
Spfile: +DATA/test/PARAMETERFILE/spfile.270.1053084685
Password file: +DATA/test/PASSWORD/pwdtest.258.1053083711
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: RECO,DATA
Mount point paths: 
Services: 
Type: RAC
Start concurrency: 
Stop concurrency: 
OSDBA group: dba
OSOPER group: oper
Database instances: test1,test2
Configured nodes: node1,node2
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

I plan to re-try this exercise by copying the oui-patch.xml before patching and see if that fixes this error. In the meantime, I hope this blog post helps you in case you face the above-mentioned bugs.

Reducing Contention on Hot Cursor Objects (Cursor: Pin S)

$
0
0

First, let me offer a little explanation about the wait event “cursor: pin S.”

Oracle states: “A session waits on this event when it wants to update a shared mutex pin and another session is currently in the process of updating a shared mutex pin for the same cursor object.” In other words, two or more sessions are trying to concurrently run the same statement (the same cursor in library cache), which forces them to compete to update a shared mutex pin for the same cursor object.

This wait event provides very useful information to identify why sessions are competing to update a shared mutex pin:

Information identifying why sessions are competing to update a shared mutex pin - cursor: pin S.

Here’s how a mutex works:

If a session wants to use a cursor, it must not disappear from the library cache while in use. The session uses a mutex to ensure the cursor cannot be changed or deleted so, to this end, it logs that there is an interested session by incrementing the mutex usage by one. This is called taking a shared lock.

The process for taking a shared lock:

  1. A session wants to run a cursor and so checks the owning cursor pin mutex to see if there is a session waiting to change the mutex (e.g. performing a hard-parse). It does this by checking the high-order bits to see if they are zero or have a session ID.
  2. If the high-order bits are zero, then it locks and increments by one (this is an atomic action). Waiting to lock and increment causes the “cursor: pin S” wait event. This increment is done on the low-order bits of the mutex.
  3. If the lock and increment fails, then some other session must be updating the mutex, so it’s necessary to sleep and try again, i.e. lock and increment. The “cursor: pin S” wait event will be longer. This can cause extra CPU load on the server as it spins attempting to update the mutex.
  4. If the high-order bits are not zero then there is a session waiting to change the mutex. The current interested session waits on the event “cursor: pin S wait on X.” If this is the case then it sleeps and tries again.
  5. Once the cursor is closed and finished, the shared lock on the mutex must be released by performing a lock and decrementing by one. Once again, if there is a failure to lock and decrement the next step is to sleep and try again.

If a session wants to perform a hard parse on a cursor already existing in the library cache it must acquire the mutex in exclusive mode.

The process for taking an exclusive lock:

  1. A session wants to perform a hard parse on a statement so it checks the cursor pin mutex to see if it’s in use.
  2. It checks the high-order bits and, if zero, updates the high-order bits to the current session ID (this compare-and-swap routine is a CPU atomic action).
  3. If the high-order bits are already set, the process has to wait on the event “cursor: pin X.” The session then sleeps and tries again.
  4. Once the high-order bits are set to the current session ID, it checks the low-order bits to see if the cursor is currently in use.
  5. If the low-order bits are not zero, it must wait for the counter to decrement to zero (Note: the counter cannot be incremented once the high-order bits are set to the session ID).
  6. Once the low-order bits are set to zero then the hard parse can proceed.
  7. The session removes the exclusive mutex lock by resetting the high-order bits to zero.

Real Production Environment

Oracle states that the wait event should be very rare, since updating the shared mutex pin is extremely fast.

This is true in most cases, but we recently ran into a scenario where a vast number of sessions  were waiting on “cursor: pin S” while running different statements (v$session showed different sql_id). As Oracle states, the process to take a shared lock was very fast, but it still generated a big impact on performance. As the load from the application grew (in this case load grew exponentially, as this was an online retailer with increased traffic due to COVID-19), this impact became more and more evident.

In this scenario, we found a number of statements that shared the same characteristic; they all invoked the same function which, among other tasks, ran a simple select statement. As they were all trying to run the select statement in the function, they were all competing for access to the shared mutex pin for that specific select statement.

The best approach to reduce the contention on the shared mutex for the statement is to mark it as hot. This allows Oracle to split the executions of this statement among several “SQL-copies.” Once the statement is marked as hot, Oracle will stop running the original statement and will, instead, split the executions among the different “SQL-copies” of the cursor object. Since each “SQL-copy” has its own shared mutex, load on the single mutex will be split among the copies generated by this mechanism.

Even though it doesn’t eliminate the waits on “cursor: pin S,” this technique proved to be very effective by dramatically reducing waits and improving performance on a real productive environment with around 9000 executions of the same SQL statement per second.

Below is a step-by-step guide to create a scenario very similar to the issue we encountered, along with the solution implemented to overcome its impact on performance. I also cover some important topics to keep in mind when you reboot the database, and when dealing with RAC environments.

Creating a test environment to reproduce waits on “cursor: pin S”

In order to create the test environment, I created a dummy table, a function, two shell scripts and four SQL scripts.

1. Create a dummy table (code_table) and populate it with some simple values using a for loop.

SQL> create table code_table (code_name char(1), low_value number, high_value number);

Table created.

SQL> declare
  2    letters char(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  3    v_num number := 1;
  4  begin
  5    for i in 1..26 LOOP
  6      insert into code_table values (substr(letters,i,1), i*v_num, i*(v_num+1000));
  7      v_num := v_num + 1001;
  8    end loop;
  9    commit;
 10  end; 
 11  /

PL/SQL procedure successfully completed.

2. Create a function (fx_num) that contains the select statement that will force the sessions to wait on “cursor: pin S’.

SQL> create or replace function fx_num (v_name varchar) return number is
  2    v_low number;
  3    v_high number;
  4  begin
  5    select low_value, high_value into v_low, v_high from code_table where code_name=v_name;
  6    return(DBMS_RANDOM.value(low => v_low, high => v_high));
  7 end;
  8 /

Function created.

3. Create a shell script (launch_test.sh) that will spam multiple concurrent sessions to mimic high concurrency on the object cursor.

::::::::::::::::
launch_test.sh
::::::::::::::::

#!/bin/bash

export ORACLE_SID=proddb2
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/db
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH

for i in {1..50}
do
nohup sqlplus -s user/pass @launch_sql1.sql >> launch_sql1.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql2.sql >> launch_sql2.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql3.sql >> launch_sql3.log 2>&1 &
nohup sqlplus -s user/pass @launch_sql4.sql >> launch_sql4.log 2>&1 &
done

exit 0

4. Create four SQL scripts (launch_sql[1-4].sql), containing a simple query that calls the function created previously. Each of these scripts will be executed 50 times; thus simulating 200 concurrent sessions trying to call the function (fx_num).

::::::::::::::
sts_tst1.sql
::::::::::::::

select f_random(substr(to_char(sysdate,'MON'),1,1)) from dual;
exit
::::::::::::::
sts_tst2.sql
::::::::::::::

select object_name from user_objects where object_id < fx_num(substr(object_id,1,1));
exit
::::::::::::::
sts_tst3.sql
::::::::::::::

select name, count(*) from user_source where line < fx_num(substr(name,1,1)) group by name;
exit
::::::::::::::
sts_tst4.sql
::::::::::::::

select trunc(last_analyzed), sample_size, count(*) from user_tab_cols where column_id < fx_num(substr(column_name,1,1)) group by trunc(last_analyzed), sample_size;
exit

5. Finally, create a shell script (check_waits.sh) to list the sessions waiting on “cursor: pin S.” As stated before, the column P1 of gv$session view shows the hash value of the statement protected by the shared mutex. Showing this value will help identify the root cause of the waits on “cursor: pin S.”

:::::::::::::::::
check_waits.sql
:::::::::::::::::

set lines 200 pages 200
prompt - Waits on cursor pin S:
select inst_id, substr(event,1,30) event, p1, sql_id, count(*)
from gv$session
where event = 'cursor: pin S'
group by inst_id, substr(event,1,30), p1, sql_id;
exit

Running the test scenario

First, let’s generate the waits on “cursor: pin S”:

1. In order to check for session waiting on the wait event “cursor: pin S,” I run the SQL script check_waits.sql on an endless while loop as follows:

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits.sql; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

2. On a second terminal, I execute the shell script launch_test.sh to generate 200 sessions by executing the four SQL scripts 50 times each. This step forces the database to receive 200 executions of different select statements that invoke the function called fx_num.

[oracle@oradb02 ddml]$ ./launch_test.sh

3. On the first terminal, we can now see a lot of sessions waiting on “cursor: pin S.” As you can see in the following output, instance two registers up to 157 sessions waiting on this event at one point in time.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @check_waits; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*)
------- ------------- --------- ------------- -------- 
      2 cursor: pin S 356306711 2ndpau148y2by      14
      2 cursor: pin S 356306711 7tr4jwnamtmsr      55
      2 cursor: pin S 356306711                    23
      2 cursor: pin S 356306711 a3xkbsayc47kq      13

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*) 
------- ------------- --------- ------------- -------- 
      2 cursor: pin S 356306711 5n3qfbb42gfdr        1
      2 cursor: pin S 356306711 2ndpau148y2by       11
      2 cursor: pin S 356306711 7tr4jwnamtmsr       84
      2 cursor: pin S 356306711                     42
      2 cursor: pin S 356306711 a3xkbsayc47kq       19

- Waits on cursor pin S:

INST_ID EVENT         P1        SQL_ID        COUNT(*)
------- ------------- --------- ------------- --------
      2 cursor: pin S 356306711 2ndpau148y2by        1
      2 cursor: pin S 356306711 7tr4jwnamtmsr        4
      2 cursor: pin S 356306711 a3xkbsayc47kq        6

4. The view gv$sql shows the sessions waiting on “cursor: pin S” when executing the statements in the SQL scripts (launch_sqlX.sql). You can see both the actual problematic SQL statement (SQL_ID 7tr4jwnamtmsr) and its calling statements waiting for the event “cursor: pin S.”

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where hash_value=356306711
  3     or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr')
  4  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT
------------- ---------- ---------- -----------------------------------------------------------------------------------------------------
2ndpau148y2by 1217333630        800 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name
5n3qfbb42gfdr 3358046647        800 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
a3xkbsayc47kq 3166838358        800 select trunc(timestamp), completion_status, count(*) from job_masters where job_id <  
                                    fx_num(substr(job_name,1,1)) group by trunc(timestamp), completion_status

5. If you take a look into column P1, you’ll notice even though the sessions are running different statements (column SQL_ID) they are still trying to access the same shared mutex. All the sessions point to the same hash_value (356306711): the select statement in the function fx_num.

6. In addition, we can see the number of executions for the select statements in the function (sql_id 7tr4jwnamtmsr) is much higher than those from the select statements in the SQL scripts (launch_sqlX.sql). This makes total sense since the statements in the SQL scripts call the function fx_num for each row. This extremely high number of concurrent executions of the function is the root cause of the high waits on “cursor: pin S.” These sessions are struggling to access the shared mutex that protects the select statement in the function.

Implementing the recommended fix

The best approach to fix this performance issue is to mark the select statement in the function as hot. As stated earlier, this tells Oracle to split the executions on this statement among a number of “SQL-copies.” This will also split the load among different cursor objects protected by different shared mutexes.

1. The first step is to set the hidden parameter _kgl_hot_object_copies.

This parameter governs the number of “SQL-copies” that Oracle will create for each statement marked as hot. The recommendation is to set this parameter to half the amount of CPUs in the server. Keep in mind, you will need to reboot for this change to take effect, so plan accordingly.

SQL> alter system set '_kgl_hot_object_copies'=8 scope=spfile sid='*';

System altered

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.
(...)
Database mounted.
Database opened.
SQL> select ksppinm, ksppstvl from x$ksppi a, x$ksppsv b, v$instance
  2  where a.indx=b.indx AND substr(ksppinm,1,1) = '_' AND ksppinm = '_kgl_hot_object_copies';

KSPPINM                    KSPPSTVL
-------------------------- ----------
_kgl_hot_object_copies     8

2. Once you set the parameter _kgl_hot_object_copies and restart the database, you have to set the SQL suffering the waits on “cursor: pin S” to hot. Oracle provides the procedure dbms_shared_pool.markhot for this purpose.

3. You need to confirm that the cursor you’re planning to mark as hot is already in the library cache. You can use the following query to confirm this, and also to get the full_hash_value which you must provide to set the statement as hot.

SQL> select kglnahsh, kglnahsv from v$sql, x$kglob where kglhdadr=address and sql_id = '7tr4jwnamtmsr';

KGLNAHSH   KGLNAHSV
---------- --------------------------------
356306711  eb4cdceda1c495cd7cdc91e5153ccf17

4. The following statement marks the SQL as hot.

SQL> begin
  2    dbms_shared_pool.markhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.

5. You can query the view gv$db_object_cache to check on which instance the statement has been marked as hot by checking the PROPERTY column.

SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value = 356306711
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

6. Keep in mind the procedure will only mark the statement as hot if it finds it in the library cache. Because of this, you can see that gv$db_object_cache shows the statement marked as hot only in instance two even though I executed the procedure dbms_shared_pool.markhot with the global argument set to true.

Running the test scenario one more time

1. With the statement set to hot, let’s try the same test one more time. Again, I run the check_waits.sql script on a while loop.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

no rows selected

2. Same as before, let’s launch the test on a separate terminal.

[oracle@oradb02 ddml]$ ./launch_test.sh

3. Instance two now shows just a few sessions waiting on “cursor: pin S.” The total number of concurrent waits dropped to just eight sessions.

[oracle@oradb02 ddml]$ while true; do sqlplus -s '/ as sysdba' @chk; sleep 1; done

- Waits on cursor pin S:

no rows selected

- Waits on cursor pin S:

INST_ID EVENT         P1         SQL_ID        COUNT(*)
------- ------------- ---------- ------------- --------
      2 cursor: pin S 1835995903 ddttvdtqqy4rz        1
      2 cursor: pin S 939937426  2ndpau148y2by        2
      2 cursor: pin S 4244290431 2ndpau148y2by        1
      2 cursor: pin S 939937426                       1
      2 cursor: pin S 939937426 dg9n6z0w0cmnk         2
      2 cursor: pin S 4244290431 61m38g7ygpfvz        1

6 rows selected.

- Waits on cursor pin S: 

no rows selected

- Waits on cursor pin S: 

no rows selected

4. While the view gv$sql shows an increase on the number of executions of the select statements in the SQL scripts (launch_sqlX.sql), the number of executions for the SQL in the function fx_num (sqlid 7tr4jwnamtmsr) remains unchanged (still 3854606).

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where hash_value=356306711
  3     or sql_id in ('2ndpau148y2by','7tr4jwnamtmsr','a3xkbsayc47kq','5n3qfbb42gfdr')
  4  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT 
------------- ---------- ---------- ------------------------------------------------------------------------------------------------------
2ndpau148y2by 1217333630       1600 select job_name, count(*) from job_masters where job_id > fx_num(substr(owner,1,1)) group by job_name
5n3qfbb42gfdr 3358046647       1600 select fx_num(substr(to_char(sysdate,'MON'),1,1)) from dual
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
a3xkbsayc47kq 3166838358       1600 select trunc(timestamp), completion_status, count(*) from job_masters where job_id < 
                                    fx_num(substr(job_name, 1,1)) group by trunc(timestamp), completion_status

5. If we filter by the SQL signature from the statement in the function fx_num, you can see there are now a group of new statements with a high number of executions. It seems these statements were executed as part of the second test. As a side note, the SQL signature for all these sql_id match because they all share the same SQL text.

SQL> select sql_id, hash_value, sum(executions) executions, sql_text from gv$sql
  2  where force_matching_signature = (select force_matching_signature from gv$sql where hash_value=356306711)
  3  group by sql_id, hash_value, sql_text order by 1;

SQL_ID        HASH_VALUE EXECUTIONS SQL_TEXT
------------- ---------- ---------- ------------------------------------------------------------------------------------------------------
0u9hxt3azayv5 3589634917     125688 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
1far8p5csfrmx 1502043773     142500 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
5bacfy8mwthrj 667730673       90651 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
61m38g7ygpfvz 4244290431     140775 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
7tr4jwnamtmsr 356306711     3854606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
9pyrc45h3tgg9 1614593513     148272 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
arg5wx08suqm0 294476384      117839 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
ddttvdtqqy4rz 1835995903     104122 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
dg9n6z0w0cmnk 939937426      102875 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

9 rows selected.

6. The view gv$db_object_cache shows these statements have been identified as HOTCOPY of the statement in the function fx_num. We have exactly eight hot copies, as defined by the hidden parameter _kgl_hot_object_copies.

SQL> select inst_id, hash_value, namespace, child_latch, property, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH PROPERTY STATUS EXECUTIONS NAME
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      117082 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      117159 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID       90188 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID       90218 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      102067 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      101647 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      141347 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      141440 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      147205 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      147329 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      103453 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      103484 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID      124746 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID      125095 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      139691 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      139803 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

18 rows selected.

7. If we keep running this same test over and over again, we will confirm that the number of executions of the original sql_id (marked as hot) remains unchanged. At the same time the executions on the “SQL-copies” continue to grow. In other words, the “SQL-copies” now perform all executions of this statement.

[oracle@oradb02 ddml]$ ./launch_test.sh

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------------- 
      2 294476384  SQL AREA             0 HOTCOPY4 VALID 240474     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA         88672 HOTCOPY4 VALID 240582     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA             0 HOT 0    VALID 3657733    SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA         53015 HOT 0    VALID 3576660    SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA             0 HOTCOPY2 VALID 206588     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA         49905 HOTCOPY2 VALID 206662     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA             0 HOTCOPY8 VALID 228203     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA         20114 HOTCOPY8 VALID 228118     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA             0 HOTCOPY5 VALID 275321     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA         89725 HOTCOPY5 VALID 275482     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA             0 HOTCOPY6 VALID 280358     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA         48617 HOTCOPY6 VALID 279606     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA             0 HOTCOPY3 VALID 201829     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA         70399 HOTCOPY3 VALID 201773     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA             0 HOTCOPY1 VALID 244295     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA         97125 HOTCOPY1 VALID 244809     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA             0 HOTCOPY7 VALID 253317     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA         47999 HOTCOPY7 VALID 253521     SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

18 rows selected.

Keeping statements as hot on a RAC database

1. As mentioned earlier, statements cannot be marked as hot if they are not first loaded into the library cache. For this reason, the statement in our example was not set as hot in instance one.

2. So now let’s run the load test on instance one to see what happens.

[oracle@oradb01 ddml]$ ./launch_test.sh

3. Below is an example of what you’d see if the statement is used on both instances but only marked as hot on one of the instances (inst_id=2).

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1 356306711  SQL AREA            0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      240474 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      240582 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     3657733 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     3576660 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID      206588 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID      206662 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      228203 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      228118 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      275321 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      275482 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      280358 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      279606 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      201829 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      201773 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID      244295 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID      244809 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      253317 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      253521 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

20 rows selected.

4. After running the procedure dbms_shared_pool.markhot one more time, we can now see that the fix is correctly working in both instances of the RAC.

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache 
  3  where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  4  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ----------------------------------------------------------------------
      1 294476384  SQL AREA            0 HOTCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 294476384  SQL AREA        88672 HOTCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA            0 HOT      VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015 HOT      VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA            0 HOTCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA        49905 HOTCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA            0 HOTCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA        20114 HOTCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA            0 HOTCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA        89725 HOTCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA            0 HOTCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA        48617 HOTCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA            0 HOTCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA        70399 HOTCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA            0 HOTCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA        97125 HOTCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA            0 HOTCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA        47999 HOTCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 HOTCOPY4 VALID      129780 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 HOTCOPY4 VALID      129861 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0 HOT      VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015 HOT      VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 HOTCOPY2 VALID      120900 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 HOTCOPY2 VALID      121150 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 HOTCOPY8 VALID      120642 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 HOTCOPY8 VALID      120730 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 HOTCOPY5 VALID      117835 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 HOTCOPY5 VALID      117268 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 HOTCOPY6 VALID      140124 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 HOTCOPY6 VALID      140374 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 HOTCOPY3 VALID      118871 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 HOTCOPY3 VALID      118520 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 HOTCOPY1 VALID       98116 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 HOTCOPY1 VALID       98091 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 HOTCOPY7 VALID      119193 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 HOTCOPY7 VALID      119279 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.

Restarting the database

1. It is extremely important to note that the statement’s hot status will not survive a database / instance reboot. This means you need to mark it as hot again each time you restart it.

2. Here are the steps to follow to make sure this setup remains available after an instance / database reboot:

a. Make sure you still have the hidden parameter _kgl_hot_object_copies set to number of CPUs.

b. Place the cursor on the library cache in each instance. This means you need to run the statement with the exact same code on each instance, to ensure it uses the right full_hash_value.

c. Mark the instance as hot using the procedure dbms_shared_pool.markhot. As a side note, you can set the global argument to true if you want to mark the statement as hot on all instances but, as mentioned before, the statement will only be marked as hot on those instances were the statement is present on the library cache.

Unmarking hot statements

1. Finally, you can unmark the hot statement using the procedure dbms_shared_pool.unmarkhot.

SQL> begin
  2    dbms_shared_pool.unmarkhot(hash => 'eb4cdceda1c495cd7cdc91e5153ccf17', namespace => 0, global => true);
  3  end;
  4  /

PL/SQL procedure successfully completed.

2. You can confirm that you’ve unmarked the statement by querying the view gv$db_object_cache.

SQL> select inst_id, hash_value, namespace, child_latch, property hot_flag, status, sum(executions) executions, name
  2  from gv$db_object_cache where hash_value in (select distinct hash_value from gv$sql where force_matching_signature = 5054107626512424923 )
  3  group by inst_id, hash_value, name, namespace, child_latch, property, status order by inst_id, hash_value;

INST_ID HASH_VALUE NAMESPACE CHILD_LATCH HOT_FLAG STATUS EXECUTIONS NAME 
------- ---------- --------- ----------- -------- ------ ---------- ---------------------------------------------------------------------- 
      1 294476384  SQL AREA            0 OBSCOPY4 VALID      235617 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 294476384  SQL AREA        88672 OBSCOPY4 VALID      235190 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA            0          VALID      899301 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 356306711  SQL AREA        53015          VALID      832325 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA            0 OBSCOPY2 VALID      243111 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 667730673  SQL AREA        49905 OBSCOPY2 VALID      243114 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA            0 OBSCOPY8 VALID      272199 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 939937426  SQL AREA        20114 OBSCOPY8 VALID      272379 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA            0 OBSCOPY5 VALID      223078 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1502043773 SQL AREA        89725 OBSCOPY5 VALID      223230 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA            0 OBSCOPY6 VALID      254701 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1614593513 SQL AREA        48617 OBSCOPY6 VALID      254807 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA            0 OBSCOPY3 VALID      238938 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 1835995903 SQL AREA        70399 OBSCOPY3 VALID      238361 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA            0 OBSCOPY1 VALID      231836 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 3589634917 SQL AREA        97125 OBSCOPY1 VALID      230782 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA            0 OBSCOPY7 VALID      225743 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      1 4244290431 SQL AREA        47999 OBSCOPY7 VALID      225810 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA            0 OBSCOPY4 VALID      385506 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 294476384  SQL AREA        88672 OBSCOPY4 VALID      385414 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA            0          VALID     4565713 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 356306711  SQL AREA        53015          VALID     4469646 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA            0 OBSCOPY2 VALID      359549 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 667730673  SQL AREA        49905 OBSCOPY2 VALID      359951 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA            0 OBSCOPY8 VALID      376019 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 939937426  SQL AREA        20114 OBSCOPY8 VALID      376026 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA            0 OBSCOPY5 VALID      420406 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1502043773 SQL AREA        89725 OBSCOPY5 VALID      420031 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA            0 OBSCOPY6 VALID      455769 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1614593513 SQL AREA        48617 OBSCOPY6 VALID      455326 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA            0 OBSCOPY3 VALID      340070 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 1835995903 SQL AREA        70399 OBSCOPY3 VALID      339658 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA            0 OBSCOPY1 VALID      364612 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 3589634917 SQL AREA        97125 OBSCOPY1 VALID      365127 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA            0 OBSCOPY7 VALID      395629 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1
      2 4244290431 SQL AREA        47999 OBSCOPY7 VALID      395981 SELECT LOW_VALUE, HIGH_VALUE FROM CODE_TABLE WHERE CODE_NAME=:B1

36 rows selected.

3. You will notice that all hot copies will be marked as obsolete, meaning they are no longer valid since the original statement is no longer hot.

I hope this post gives you an idea of how to work around the bottlenecks that can be caused by “cursor: pin s.” Please let me know if you have any questions, or tips of your own.

Issues With Database Service Names Starting With Pluggable Database (PDB) Name

$
0
0

In this post I’ll describe a problem which occurs when creating and using a database service name which starts with the pluggable database’s (PDB) name, and has a dot (“.”) after it: “<pdb_name>.<optional_suffix>.” I noticed this issue when I was trying to reset an expired user’s password by connecting to that service.

I’ll also describe other anomalies, for example, being able to stop the service, but not start it, as well as the service remaining active even if the PDB gets dropped. I also acknowledge there may be other side effects in addition to those I describe here. The workaround is relatively simple — don’t create services with the format “<pdb_name>.<optional_suffix>.” Or, said differently, don’t start the service name with “<pdb_name>.”

It’s worth noting that the problem reproduces on 18c and 19c, but not on 12.1 and 12.2.

As of now, the development team has created a predefect — “Bug 32009641: ORA-01017 WHEN CONNECTING TO A PDB USING A SERVICE NAME WHICH STARTS WITH THE PDB” — though at the time of writing this post, it’s not published yet, as the team has to confirm whether it is a bug, and whether it can be published.

While it’s unlikely that many users will run into the issue described in this post, it’s good to be aware of it, and to avoid creating service names starting with “<pdb_name>.” until this issue is fixed. Mine is only a simple use case, and there could be other side effects not discussed here.

Setting up the test environment to reproduce the problem

I performed the steps below on a 19.8.0 DB; no ASM or GI managed resources are present. I start with a new PDB named PDB1, and open it:

SQL> create pluggable database pdb1 admin user pdb_adm identified by oracle create_file_dest='/u02/oradata/CDB1';

Pluggable database created.

The default CDB services are started, and PDB1’s default service “pdb1,” associated with CON_ID=3. This identifies the initial current container for the service:

CDB$ROOT:

SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- ---------------
         1 CDB2
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----------------
         1 CDB2
         1 CDB2.localdomain
         1 CDB2XDB
         1 SYS$BACKGROUND
         1 SYS$USERS
         3 PDB1

6 rows selected.

PDB1:

SQL> select con_id, name from v$services order by con_id, name;

    CON_ID NAME
---------- -----
         3 pdb1

SQL> select con_id, name from cdb_services order by con_id, name;

    CON_ID NAME
---------- -----
         3 PDB1

Next, I create and start services in PDB1. I chose service names to include various combinations of names using “pdb1” and “pdb1.”:

SQL> alter session set container=pdb1;

exec dbms_service.create_service(service_name=>'pdb1.test1',network_name=>'pdb1.test1');
exec dbms_service.create_service(service_name=>'test2.pdb1',network_name=>'test2.pdb1');
exec dbms_service.create_service(service_name=>'test3.pdb1.app',network_name=>'test3.pdb1.app');
exec dbms_service.create_service(service_name=>'pdb1_A',network_name=>'pdb1_A');
exec dbms_service.create_service(service_name=>'pdb1_B.app',network_name=>'pdb1_B.app');
exec dbms_service.create_service(service_name=>'custom_service.app',network_name=>'custom_service.app');

exec dbms_service.start_service(service_name=>'pdb1.test1');
exec dbms_service.start_service(service_name=>'test2.pdb1');
exec dbms_service.start_service(service_name=>'test3.pdb1.app');
exec dbms_service.start_service(service_name=>'pdb1_A');
exec dbms_service.start_service(service_name=>'pdb1_B.app');
exec dbms_service.start_service(service_name=>'custom_service.app');

I’ll be using the below script to display the services’ status:

SQL> !cat s.sql
set lin 500 pagesize 9999
col name for a50

show con_name
prompt
prompt ** Output from v$active_services: **
select con_id, name from v$active_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;

prompt ** Output from cdb_services: **
select con_id, name from cdb_services where lower(name) like '%pdb%' or lower(name) like '%app%' order by con_id, name;

Let’s check the status of the above-created services:

CDB$ROOT:

** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         1 pdb1.test1        <<--
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1        <<--
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

PDB1:

** Output from v$active_services: **

    CON_ID NAME
---------- ------------------
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

6 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- ------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

As you can see, the “pdb1.test1” service is apparently associated to the root container (v$active_services.con_id=1), and not with PDB1 as configured (cdb_services.con_id=3). All the other created services are being reported as running in PDB1 (v$active_services.con_id=3).

Why is this a problem?

Let’s create two local users in PDB1 — u1 and u2 — and set u1’s password to expired:

SQL> grant create session to u1 identified by oracle;
SQL> grant create session to u2 identified by oracle;
SQL> alter user u1 password expire;

If I try to connect as u2, everything works without problems:

$ sqlplus u2/oracle@localhost/pdb1.test1

SQL> select sys_context('userenv','cdb_name') cdb_name, sys_context('userenv','con_name') con_name, sys_context('userenv','service_name') service_name from dual;

CDB_NAME             CON_NAME             SERVICE_NAME
-------------------- -------------------- --------------------
CDB2                 PDB1                 pdb1.test1

However, if I try to connect to u1 and change its password, this is what happens:

$ sqlplus u1/oracle@localhost/pdb1.test1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 11:57:56 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-28001: the password has expired


Changing password for u1
New password:
Retype new password:
ERROR:
ORA-01017: invalid username/password; logon denied


Password unchanged
Enter user-name:

Analysis

Tracing the session with oradebug before entering the new password (with events 10046 and 10079), the session gets established via the pdb1.test1 service, and it’s traced under “CDB$ROOT(1)”:

*** 2020-10-26T12:01:31.706897+01:00 (CDB$ROOT(1))                    <<-- traced at "CDB$ROOT(1)" level
*** SESSION ID:(269.54910) 2020-10-26T12:01:31.706934+01:00
*** SERVICE NAME:(pdb1.test1) 2020-10-26T12:01:31.706937+01:00        <<-- service pdb1.test1 was used when connecting
*** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:01:31.706940+01:00
*** ACTION NAME:() 2020-10-26T12:01:31.706943+01:00
*** CONTAINER ID:(1) 2020-10-26T12:01:31.706945+01:00

Not going into the details of the trace, we might get a clue at what’s happening by checking the audit log.

From PDB1:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
  and returncode <> 0
order by ntimestamp# desc
) where rownum = 1;

NTIMESTAMP#                    USERID     USERHOST             TERMINAL          ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ ---------- -------------------- --------------- --------- ---------- --------------- ----------------------------------------------------------------------
26-OCT-20 10.57.56.687258 AM   U1         hol.localdomain      pts/6                 100      28001 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client address:
                                                                                                                     (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))

From CDB$ROOT:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
  and returncode <> 0
order by ntimestamp# desc
) where rownum = 1;  2    3    4    5    6    7

NTIMESTAMP#                    USERID   USERHOST         TERMINAL   ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ -------- ---------------- ---------- ------- ---------- --------------- -----------------------------------------------------------------
26-OCT-20 11.01.32.725820 AM   U1       hol.localdomain  pts/6          100       1017 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=58922))

As you can see, when entering the password for the u1 user, we get “ORA-28001: the password has expired,” which is recorded in the PDB’s audit log. Then, after changing the password, an attempt is made to connect to the root container as u1. Since u1 is not a common user, the login fails, and we get the “ORA-01017: invalid username/password; logon denied” error.

What happens if I try to repeat the same steps using the “test2.pdb1” service name, which according to v$active_services maps to CON_ID=3, so PDB1?

This time the password change succeeds:

$ sqlplus u1/oracle@localhost/test2.pdb1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 12:38:04 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-28001: the password has expired


Changing password for u1
New password:
Retype new password:
Password changed

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.8.0.0.0

SQL>

Enabling tracing on the session before the initial password is provided, this time the trace gets generated in the context of “PDB1(3).” Notice the difference with the previous trace header, where “CDB$ROOT(1)” was reported:

*** 2020-10-26T12:39:18.045405+01:00 (PDB1(3))                     <<-- traced at "CDB$ROOT(1)" level
*** SESSION ID:(269.12918) 2020-10-26T12:39:18.045547+01:00
*** SERVICE NAME:(test2.pdb1) 2020-10-26T12:39:18.045557+01:00     <<-- service test2.pdb1 was used when connecting
*** MODULE NAME:(sqlplus@hol.localdomain (TNS V1-V3)) 2020-10-26T12:39:18.045564+01:00
*** ACTION NAME:() 2020-10-26T12:39:18.045570+01:00
*** CONTAINER ID:(3) 2020-10-26T12:39:18.045574+01:00

Checking PDB1’s audit log contents, there’s an initial ORA-28001, followed by RETURNCODE=0, thus a successful login:

SQL> select * from (
select ntimestamp#, userid, userhost, terminal, action#, returncode, current_user, comment$text
from sys.aud$
where userid = 'U1'
order by ntimestamp# desc
) where rownum <=2;  2    3    4    5    6

NTIMESTAMP#                    USERID   USERHOST         TERMINAL   ACTION# RETURNCODE CURRENT_USER    COMMENT$TEXT
------------------------------ -------- ---------------- ---------- ------- ---------- --------------- -----------------------------------------------------------------
26-OCT-20 11.39.18.069634 AM   U1       hol.localdomain  pts/6          100          0 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244))

26-OCT-20 11.38.04.992001 AM   U1       hol.localdomain  pts/6          100      28001 U1              Authenticated by: DATABASE;AUTHENTICATED IDENTITY: U1; Client add
                                                                                                       ress: (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59244))

There are no new entries for the u1 user in CDB$ROOT’s audit log.

Other anomalies

Start / stop service

There are also other anomalies, e.g. the ‘pdb1.test1’ service can be started and stopped from PDB1, but then not started back:

From PDB1:

SQL> exec dbms_service.stop_service(service_name=>'pdb1.test1');

PL/SQL procedure successfully completed.

Querying v$active_services from PDB1 or CDB$ROOT, the service is not displayed. However, if I try to start it:

SQL> exec dbms_service.start_service(service_name=>'pdb1.test1');
BEGIN dbms_service.start_service(service_name=>'pdb1.test1'); END;

*
ERROR at line 1:
ORA-44773: Cannot perform requested service operation.
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 63
ORA-06512: at "SYS.DBMS_SERVICE", line 486
ORA-06512: at line 1


SQL>

Closing and / or dropping PDB1

Going to the initial setup, where all of the services I defined, except “pdb1.test1”, are mapped to PDB1:

SQL> @s

CON_NAME
------------------------------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -------------------
         1 pdb1.test1
         3 custom_service.app
         3 pdb1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

** Output from cdb_services: **

    CON_ID NAME
---------- -------------------
         3 PDB1
         3 custom_service.app
         3 pdb1.test1
         3 pdb1_A
         3 pdb1_B.app
         3 test2.pdb1
         3 test3.pdb1.app

7 rows selected.

Let’s close PDB1, and check the services status:

SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT

SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> @s

CON_NAME
---------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -----------
         1 pdb1.test1
         3 pdb1

** Output from cdb_services: **

no rows selected

SQL>

All of the user created services were closed, except for pdb1.test1. If I try to connect to PDB1 as user u2, I get “ORA-01109: database not open.”

Let’s drop the PDB:

SQL> drop pluggable database pdb1 including datafiles;

Pluggable database dropped.

SQL> @s

CON_NAME
---------
CDB$ROOT

** Output from v$active_services: **

    CON_ID NAME
---------- -----------
         1 pdb1.test1

** Output from cdb_services: **

no rows selected

SQL>

Connecting to PDB1 as user u2 now returns “ORA-01017: invalid username/password; logon denied” which indicates that an authentication attempt is performed. Since only CDB$ROOT is open, let’s see if I can connect as system:

$ sqlplus system/oracle@localhost/pdb1.test1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Oct 26 13:05:59 2020
Version 19.8.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Last Successful login time: Mon Oct 26 2020 11:38:01 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.8.0.0.0

SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT
SQL>

Conclusion

The above are relatively simple use cases, and until this issue is fixed, it’s advisable to avoid creating service names starting with “<pdb_name>.” as there could be other implications not discussed here. You may also want to keep an eye on the status of the predefect tracked as Bug 32009641.

Weighing the Pros and Cons of Oracle Autonomous Database

$
0
0

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 20, 2019.

The next generation of managed database services is upon us, offering organizations of all sizes the ability to scale database workloads quickly without forking out huge investments in either hardware or specialized expertise. One of these services is Oracle Autonomous Database (ADB), marketed by the company as the “world’s first self-driving database” that uses machine learning to replace previously manual tasks.

Sounds pretty cool – but is ABD a good choice for you and your organization?

What exactly is the Oracle Autonomous Database?

If you’re in any way used to working in an on-premises Oracle system, you’ll be at home in ADB, as it’s built on the same underlying (and well-known) technology; Oracle Exadata hardware combined with software like Pluggable Database (PDB), which isolates each client to their own data neighborhood by restricting their ability to see outside their own PDB. These all run on Real Application Clusters (RAC) in a container database in one of Oracle’s cloud data centers with Oracle managing the whole thing, either on shared or dedicated underlying hardware.

It’s also worth pointing out that ADB is a relational database. That’s a clear differentiator between it and, say, Google BigQuery, a fantastic big data solution with the ability to scale to petabytes. ADB only scales to 128TB, but comes with relational database management system (RDBMS) functionality such as referential integrity (primary key / foreign-key relationships), server-side code objects such as procedures and functions and triggers.

One important thing to keep in mind, however, is that ADB isn’t a standalone product or service. It’s actually an umbrella term for the combination of two underlying Oracle services: Autonomous Data Warehouse (largely tuned for data warehouse workloads with data stored in columnar format) and the Autonomous Transaction Processing database (tuned for Online Transaction Processing-type workloads using the more traditional row storage format). 

Makes sense. But what else is so great about it?

Think of it as essentially a fully-fledged Oracle database that’s far easier and less costly to procure and scale up and down than an on-premises counterpart. And instead of your IT team managing various services, ADB automatically deals with things such as:

  • Monitoring
  • Addressing underlying component errors
  • Fine-tuning and query stabilization
  • Backups and recoveries

It also provides many features users have come to expect from one of the most powerful RDBMS systems in the world. These include everything from simple tables to the more advanced features such as server-side code (procedures, functions, triggers, types) and complex table and index types. That’s along with the ability to monitor query performance using tools to which users have grown accustomed.

Sounds impressive. But what are the drawbacks?

We all know the saying “you’ve got to give to get” and unfortunately, even though new features are constantly being added, it’s just as applicable to ADB as many other things in life. Any managed database is great for scaling up and improving efficiency, but in exchange ADB sacrifices certain benefits and features typically found in an Oracle on-premises system, for example:

  • Database Vault, OLAP, Text, Multimedia, Workspace Manager and other features are not available.
  • There’s no control over database initialization parameters or tablespaces.
  • There’s no ability to control database instance memory sizes or configurations.
  • You can’t monitor via Oracle Enterprise Manager.

PDB lockdown profiles also restrict certain commands such as ALTER TABLESPACE.

Interested in learning more about Oracle Autonomous Database? This post only scratches the surface of the deep dive that is our new white paper on ADB and whether it might be a good fit at your organization. 

Download the Oracle Autonomous Database White Paper to continue your evaluation of one of the most powerful cloud databases in the world.

Save Time With This Quick OCI CLI Installation

$
0
0

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. We originally published today’s post on April 15, 2019.

This is more of a personal reminder than anything else, but hopefully it helps save some time for others trying to install the OCI CLI on their local Linux machines.

The idea is to have a very basic set of instructions to install the oci CLI on a Linux box and test it. This includes installing a newer version of Python and setting the system to use it by default. This is easy enough, but introduces an issue that has to be worked around.

The idea

I would usually keep these quite simple things to myself in a simple text file for my own reference. However, lately I’ve discovered that what I consider to be very basic Linux knowledge is either not-so-basic or not as widely spread among Oracle DBAs as I thought. So, this is not only to benefit my own memory but to try and help others out there that may struggle a bit with Linux-related stuff.

Installation

Both the automated and the manual installation procedures are well-documented but there is an interesting caveat I discovered while installing in my Oracle Linux 7.6 laboratory VM. I tried the automated procedure because I’m one for automating stuff and I didn’t want to deep dive into this particular procedure.

So, simply following the instructions, I proceeded to execute the curl command to download and install the CLI.

Please, note the highlighted line (34) in the code below:

[oracle@orapex ~]$ bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6283  100  6283    0     0  14095      0 --:--:-- --:--:-- --:--:-- 14087
Downloading Oracle Cloud Infrastructure CLI install script from https://raw.githubusercontent.com/oracle/oci-cli/6dc61e3b5fd2781c5afff2decb532c24969fa6bf/scripts/install/install.py to /tmp/oci_cli_install_tmp_LA0m.
######################################################################## 100.0%
Python3 not found on system PATH
Running install script.
python /tmp/oci_cli_install_tmp_LA0m  < /dev/tty -- Verifying Python version. -- Python version 2.7.5 okay. -- Verifying native dependencies. -- Unable to verify native dependencies. dist=oracle linux server, version=7.5. Continuing... ===> In what directory would you like to place the install? (leave blank to use '/home/oracle/lib/oracle-cli'): 
-- Creating directory '/u01/userhome/oracle/lib/oracle-cli'.
-- We will install at '/u01/userhome/oracle/lib/oracle-cli'.
===> In what directory would you like to place the 'oci' executable? (leave blank to use '/home/oracle/bin'): 
-- The executable will be in '/u01/userhome/oracle/bin'.
===> In what directory would you like to place the OCI scripts? (leave blank to use '/home/oracle/bin/oci-cli-scripts'): 
-- Creating directory '/u01/userhome/oracle/bin/oci-cli-scripts'.
-- The scripts will be in '/u01/userhome/oracle/bin/oci-cli-scripts'.
-- Downloading virtualenv package from https://github.com/pypa/virtualenv/archive/15.0.0.tar.gz.
-- Downloaded virtualenv package to /tmp/tmpRjNdhT/15.0.0.tar.gz.
-- Checksum of /tmp/tmpRjNdhT/15.0.0.tar.gz OK.
-- Extracting '/tmp/tmpRjNdhT/15.0.0.tar.gz' to '/tmp/tmpRjNdhT'.
-- Executing: ['/usr/bin/python', 'virtualenv.py', '--python', '/usr/bin/python', '/u01/userhome/oracle/lib/oracle-cli']
Already using interpreter /usr/bin/python
New python executable in /u01/userhome/oracle/lib/oracle-cli/bin/python
Installing setuptools, pip, wheel...done.
-- Executing: ['/u01/userhome/oracle/lib/oracle-cli/bin/pip', 'install', '--cache-dir', '/tmp/tmpRjNdhT', 'oci_cli', '--upgrade']
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Collecting oci_cli
  Downloading https://files.pythonhosted.org/packages/83/aa/922fc63808ffc79d114474bef9af8b4a066e4653898b6df427cfcf98f166/oci_cli-2.5.5-py2.py3-none-any.whl (3.8MB)
    100% |------------------------------| 3.8MB 6.4MB/s 
(:..)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |------------------------------| 163kB 25.2MB/s 
Building wheels for collected packages: arrow, configparser, terminaltables, retrying, PyYAML, pycparser
  Building wheel for arrow (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/ce/4f/95/64541c7466fd88ffe72fda5164f8323c91d695c9a77072c574
  Building wheel for configparser (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/a3/61/79/424ef897a2f3b14684a7de5d89e8600b460b89663e6ce9d17c
  Building wheel for terminaltables (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/30/6b/50/6c75775b681fb36cdfac7f19799888ef9d8813aff9e379663e
  Building wheel for retrying (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
  Building wheel for PyYAML (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/ad/da/0c/74eb680767247273e2cf2723482cb9c924fe70af57c334513f
  Building wheel for pycparser (setup.py) ... done
  Stored in directory: /tmp/tmpRjNdhT/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built arrow configparser terminaltables retrying PyYAML pycparser
Installing collected packages: idna, six, python-dateutil, arrow, jmespath, configparser, enum34, pycparser, cffi, asn1crypto, ipaddress, cryptography, pyOpenSSL, httpsig-cffi, terminaltables, click, cx-Oracle, retrying, pytz, PyYAML, certifi, oci, oci-cli
Successfully installed PyYAML-3.13 arrow-0.10.0 asn1crypto-0.24.0 certifi-2019.3.9 cffi-1.12.2 click-6.7 configparser-3.5.0 cryptography-2.4.2 cx-Oracle-7.0.0 enum34-1.1.6 httpsig-cffi-15.0.0 idna-2.6 ipaddress-1.0.22 jmespath-0.9.3 oci-2.2.4 oci-cli-2.5.5 pyOpenSSL-18.0.0 pycparser-2.19 python-dateutil-2.7.3 pytz-2016.10 retrying-1.3.3 six-1.11.0 terminaltables-3.1.0
===> Modify profile to update your $PATH and enable shell/tab completion now? (Y/n): Y
===> Enter a path to an rc file to update (leave blank to use '/home/oracle/.bashrc'): 
-- Backed up '/u01/userhome/oracle/.bashrc' to '/u01/userhome/oracle/.bashrc.backup'
-- Tab completion set up complete.
-- If tab completion is not activated, verify that '/u01/userhome/oracle/.bashrc' is sourced by your shell.
-- 
-- ** WARNING: Other 'oci' or 'bmcs' executables are on your $PATH. **
-- Conflicting paths: /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs, /home/oracle/bin/oci, /home/oracle/bin/bmcs
-- You can run this installation of the CLI with '/u01/userhome/oracle/bin/oci'.
-- 
-- ** Run `exec -l $SHELL` to restart your shell. **
-- 
-- Installation successful.
-- Run the CLI with /u01/userhome/oracle/bin/oci --help

The highlighted line offers a deprecation warning for the currently available Python version, so I decided to have Python 3 installed for the CLI to use.

Python installation

This is a simple call to yum to install the package python36.

[root@orapex ~]# yum install -y python36
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package python36.x86_64 0:3.6.6-2.el7 will be installed
--> Processing Dependency: python36-libs(x86-64) = 3.6.6-2.el7 for package: python36-3.6.6-2.el7.x86_64
--> Processing Dependency: libpython3.6m.so.1.0()(64bit) for package: python36-3.6.6-2.el7.x86_64
--> Running transaction check
---> Package python36-libs.x86_64 0:3.6.6-2.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===================================================================================================================================================================================================================
 Package                                             Arch                                         Version                                           Repository                                                Size
===================================================================================================================================================================================================================
Installing:
 python36                                            x86_64                                       3.6.6-2.el7                                       ol7_developer_EPEL                                        65 k
Installing for dependencies:
 python36-libs                                       x86_64                                       3.6.6-2.el7                                       ol7_developer_EPEL                                       8.5 M
Transaction Summary
===================================================================================================================================================================================================================
Install  1 Package (+1 Dependent package)
Total download size: 8.6 M
Installed size: 36 M
Downloading packages:
(1/2): python36-libs-3.6.6-2.el7.x86_64.rpm                                                                                                                                                 | 8.5 MB  00:00:01     
(2/2): python36-3.6.6-2.el7.x86_64.rpm                                                                                                                                                      |  65 kB  00:00:03     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                              2.6 MB/s | 8.6 MB  00:00:03     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : python36-libs-3.6.6-2.el7.x86_64                                                                                                                                                                1/2 
  Installing : python36-3.6.6-2.el7.x86_64                                                                                                                                                                     2/2 
  Verifying  : python36-3.6.6-2.el7.x86_64                                                                                                                                                                     1/2 
  Verifying  : python36-libs-3.6.6-2.el7.x86_64                                                                                                                                                                2/2 
Installed:
  python36.x86_64 0:3.6.6-2.el7                                                                                                                                                                                    
Dependency Installed:
  python36-libs.x86_64 0:3.6.6-2.el7                                                                                                                                                                               
Complete!
[root@orapex ~]# python -V
Python 2.7.5
[root@orapex ~]# python36 -V
Python 3.6.6

As you may have noted, in order to use the new version of Python, we would use the python36 command, while python will call version 2.7. This is a little problem when the oci CLI will call python and we want it to be the latest version.

In order to solve this little problem, I decided to go for Linux alternatives. This is a command to help maintain symbolic links to determine default commands system-wide. I’ve highlighted “system-wide” for a good reason we’ll see below.

Using alternatives is pretty straightforward. It requires a simple call defining the new symbolic link and the command it points to, plus a priority value. In my case, I opted for creating the same link for Python 2 and Python 3, giving a higher priority to Python 3. This means the system will first try Python 3, then Python 2 if the first one is not available.

[root@orapex ~]# alternatives --install /usr/bin/python python /bin/python36 2
[root@orapex ~]# alternatives --install /usr/bin/python python /bin/python 1
[root@orapex ~]# alternatives --display python
python - status is auto.
 link currently points to /bin/python36
/bin/python36 - priority 2
/bin/python - priority 1
Current 'best' version is /bin/python36.
[oracle@orapex ~]$ python -V
Python 3.6.6

System-wide change means trouble

At this point we have the OCI CLI installed and using Python 3 for its calls, so we are fine here. Fortunately, I was following the steps found in this blog post to configure alternatives properly and it includes an appendix. In this appendix, the post mentions a problem that appears in yum once the system-wide switch to Python 3 has completed. Also, fortunately, the post provides the solution / workaround for it, which I immediately applied. The solution is pretty simple; just modify the yum Python scripts to use the older Python version. There are two files to modify:

[root@orapex ~]# yum search python
  File "/bin/yum", line 30
    except KeyboardInterrupt, e:
                            ^
SyntaxError: invalid syntax
[root@orapex ~]# head -1 /usr/bin/yum
#!/usr/bin/python
[root@orapex ~]# vi /usr/bin/yum
[root@orapex ~]# head -1 /usr/bin/yum
#!/usr/bin/python2.7
[root@orapex ~]# yum search python |more
Loaded plugins: langpacks, ulninfo
============================= N/S matched: python ==============================
GitPython.noarch : Python Git Library
(...)
[root@orapex ~]# head -1 /usr/libexec/urlgrabber-ext-down
#! /usr/bin/python
[root@orapex ~]# vim /usr/libexec/urlgrabber-ext-down
[root@orapex ~]# head -1 /usr/libexec/urlgrabber-ext-down
#! /usr/bin/python2.7

With these simple steps, we have the OCI CLI installed and working with the latest Python version, until a Python 4 version comes in, but now we know how to use alternatives to fix it.

The next step is to use the CLI itself to create a configuration file to access our OCI tenancy resources and start working, but that will be in another blog post.

Thank you for reading.

Viewing all 301 articles
Browse latest View live