
ODA re-imaging could take anything between 20 and 120 mins


20 mins vs 2 hours

Recently I noticed that the re-imaging process on the second Oracle Database Appliance node took significantly less time than on the first node. The difference was so big that I started to suspect either something was wrong with that particular set of hardware or some of the re-imaging steps had failed on the second node. On the first node the process completed in 120 minutes, but on the second it took just around 20 minutes.

I spent quite a bit of time working out what exactly was happening. But before I tell you, what theoretical explanations would you come up with, given the behaviour I just described? Please share them with me in the comment section below :)

Any mystery can be solved

The question is whether we are ready to pay for it. Sometimes it takes quite a bit of effort to get to the truth, and very often we don't have the time, interest, or budget to find it. In this particular case I was so curious that I spent a good part of my weekend looking for a clue. Along the way I had to learn a bit about the Anaconda installer, the SquashFS file system, how to rebuild an ISO image, and the way the ODA re-imaging process works. The purpose of this paragraph is to encourage you to be curious and not leave mysteries unresolved. Invest some time and you will learn a lot along the way :)

NOTE: I will try to share the way I troubleshot this problem in future blog posts.

Bug in the “post-install” script

It appears that the problem is in the way the ISO:/Extras/setupodaovm.sh post-install script checks whether the software RAID has completed re-synchronization of the 4 internal HDD partitions (md devices) between the 2 physical disks. There are the following checks at the very end of the script:
mdadm --wait /dev/md1
mdadm --wait /dev/md2
mdadm --wait /dev/md3

Each of these lines is designed to check whether the software RAID has completed synchronizing an md device (partition). The following is part of the man page for the mdadm utility:

       -W, --wait
              For  each  md  device  given, wait for any resync, recovery, or reshape activity to finish before returning.  mdadm will return with success if it actually waited for
              every device listed, otherwise it will return failure.

During the re-imaging process all 4 volumes get rebuilt and need to be synchronized by the software RAID. It is worth mentioning that the software RAID on the ODA is configured to re-synchronize one device at a time; the other devices just sit and wait their turn in the DELAYED status. The problem is that if a device is in the resync=DELAYED state, the "mdadm --wait" check does not stop and wait for it. Therefore only one of the mdadm checks actually waits until a re-synchronization finishes; the others pass successfully even if their devices aren't synchronized yet (resync=DELAYED). Now let's have a look at the devices' sizes and the associated synchronization times:

Name   Size   Function   Sync time
md0    60M    /boot      a few seconds
md1    17G    /          ~10 mins
md2    217G   /OVS       ~90 mins
md3    4G     swap       ~2 mins

Just to make life a bit more interesting, the software RAID picks the next device to re-synchronize at random, so it is just a matter of luck which device gets processed while the checks run. If the software RAID is synchronizing the md1 device (17GB) during the execution of the mdadm checks, the whole re-imaging process takes about 20 minutes. However, if it is synchronizing the md2 device (217GB), the re-imaging process takes around 120 minutes.
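
For illustration, this is roughly what /proc/mdstat reports while the software RAID is in this state. The device names, sizes, and percentages below are invented for the example, not captured from a real ODA:

cat /proc/mdstat
#  md2 : active raid1 sdb3[1] sda3[0]
#        ... [===>........]  resync = 18.3% (...) finish=85.6min speed=36120K/sec
#  md1 : active raid1 sdb2[1] sda2[0]
#        ... resync=DELAYED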

A way to fix the problem

I am not a great expert in the Linux system administration area (I am an Oracle DBA, after all) and would rather let the Oracle folks make the final call, but it seems to me that, in order to make sure all 4 devices are re-synchronized before the re-imaging process finishes, the check should look like the following:

mdadm --wait /dev/md0 /dev/md1 /dev/md2 /dev/md3

Conclusion

To conclude, until the issue is fixed, keep in mind that:

  1. re-imaging times may differ between ODA nodes
  2. to be on the safe side, check that the md devices' re-synchronization has finished by running the "cat /proc/mdstat" command (see the sketch below) before running any business-critical processes on your ODA.
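
A minimal sketch of that safety check: loop until /proc/mdstat no longer reports any resync or recovery activity. The grep pattern is an assumption about the wording used by the Linux md driver on the ODA image; adjust it if your output differs.

# wait until the software RAID has finished re-synchronizing all md devices
while grep -Eq 'resync|recovery' /proc/mdstat; do
    echo "software RAID still resynchronizing, waiting..."
    sleep 60
done
cat /proc/mdstat    # every md device should now show a clean [UU] state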

Yury

PS “Stay Hungry Stay Foolish” - Steve Jobs


EBS Forms compilation errors in large terminal windows. Size does matter!


During a recent customer environment cloning activity I got to the point where CUSTOM.plx had to be recompiled. Nothing difficult, you may say, right? I thought the same, but that activity ended up costing me many hours of troubleshooting.

frmcmp_batch.sh call was just failing with “Terminal map initialization failed.”

[oracle@appslab01 ~]$ frmcmp_batch.sh module=CUSTOM.pll userid=apps/apps output_file=CUSTOM.plx module_type=LIBRARY compile_all=YES
Terminal map initialization failed.
API: could not initialize character-mode driver.
FRM-91500: Unable to start/complete the build.
[oracle@appslab01 ~]$

After some short troubleshooting I thought that just setting the DISPLAY variable and running a manual compilation should be OK. And it actually worked.

[oracle@appslab01 ~]$ export DISPLAY=:1
[oracle@appslab01 ~]$ frmcmp.sh module=CUSTOM.pll userid=apps/apps output_file=CUSTOM.plx module_type=LIBRARY compile_all=YES
Forms 10.1 (Form Compiler) Version 10.1.2.3.0 (Production)

Forms 10.1 (Form Compiler): Release – Production

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
PL/SQL Version 10.1.0.5.0 (Production)
Oracle Procedure Builder V10.1.2.3.0 – Production
Oracle Virtual Graphics System Version 10.1.2.0.0 (Production)
Oracle Multimedia Version 10.1.2.0.2 (Production)
Oracle Tools Integration Version 10.1.2.0.2 (Production)
Oracle Tools Common Area Version 10.1.2.0.2
Oracle CORE 10.1.0.5.0 Production
Compiling library CUSTOM…





Done.
[oracle@appslab01 ~]$

My victory didn’t last too long. During one of the later steps, I was recompiling Form objects for several products using the ADADMIN, and all of these jobs were failing too. When I started to look into worker logs, I found that frmcmp_batch.sh is being executed, of course, and the logs were full of “Terminal map initialization failed” messages.
Lots of hours went into troubleshooting this. I didn't find any clue or known issue in My Oracle Support, and Google/Bing searches also didn't turn up anything that could guide me to a solution. So I started "digging" myself.

Referring to Oracle Support Note [ID 1085526.1] for a generic FRM-91500 troubleshooting gave me good hints on possible issues with the fmrcvt220.res terminal mapping resource file and interaction with TERM/ORACLE_TERM environment variables. Getting no results here, I got an idea to try another terminal connection using the Mac default Terminal.app (I was using SecureCRT prior to that).
And it worked!!! I saw no issues with frmcmp_batch.sh, and initiated ADADMIN Forms object compilation, which also proceeded successfully.

With the small Terminal.app window that opens by default on a 1920×1200 screen, visibility wasn't too good, so I maximized the window by clicking on the plus icon.
As soon as my window was maximized, all running ADADMIN jobs started to fail. And what do you think I found in worker logs? Exactly! The same “Terminal map initialization failed” error.

So the reason for all these failures was simply my "too large" terminal window. I remembered the "Terminal too wide" VIM text editor issue caused by the same thing.

This can be easily reproduced. I resized my terminal to half-size, ran ADADMIN, and initiated Forms compilation for all products. While the workers processed the compilation jobs, I started to resize the window using the lower-right corner.
I could clearly see how all the workers started to fail, and then started to compile successfully again when I resized the terminal window back to half-size.

I have just reproduced it on my lab instance while writing this blog post. And it doesn't only happen on less common platforms like HP-UX or AIX; it also occurs on generic Linux, which is the platform most commonly used for E-Business Suite.

– Maximized Terminal window

[oracle@appslab01 ~]$ frmcmp_batch.sh module=CUSTOM.pll userid=apps/apps output_file=CUSTOM.plx module_type=LIBRARY compile_all=YES
Terminal map initialization failed.
API: could not initialize character-mode driver.
FRM-91500: Unable to start/complete the build.
[oracle@appslab01 ~]$

– Resized it a bit and ran the same command.

[oracle@appslab01 ~]$ frmcmp_batch.sh module=CUSTOM.pll userid=apps/apps output_file=CUSTOM.plx module_type=LIBRARY compile_all=YES
Forms 10.1 (Form Compiler) Version 10.1.2.3.0 (Production)

Forms 10.1 (Form Compiler): Release – Production

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
PL/SQL Version 10.1.0.5.0 (Production)
Oracle Procedure Builder V10.1.2.3.0 – Production
Oracle Virtual Graphics System Version 10.1.2.0.0 (Production)
Oracle Multimedia Version 10.1.2.0.2 (Production)
Oracle Tools Integration Version 10.1.2.0.2 (Production)
Oracle Tools Common Area Version 10.1.2.0.2
Oracle CORE 10.1.0.5.0 Production
Compiling library CUSTOM…



Done.
[oracle@appslab01 ~]$

Conclusion

This is definitely not a common issue that many of my colleagues around the world will face, but this post is a good starting point, and I hope it will save someone a lot of troubleshooting time, or a Severity 1 SR, as soon as the search engines index it.
High-resolution displays are slowly coming into wider use, and the good old 1024×768 isn't always appropriate anymore. Who knows what else, besides "Terminal too wide" and this one, might await us.

Update

Referring to one of my comment replies: the actual problem is not the resolution itself, but the row/column count of your terminal. The problem starts to show up as soon as you cross a 255-column width. I couldn't get the row count anywhere near that number even at the highest resolution on my Retina display; with the minimal font size I got only 123 rows. It's just my personal guess that with more than 255 rows you might get the same error message. So there is a workaround: if you would still like to use your terminal window at full scale and not worry about such issues, configure the font size in your terminal session so that neither the column nor the row count crosses 255 (a quick check is sketched below).
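
A small, hedged helper for that workaround: check the terminal geometry with tput before running frmcmp_batch.sh or adadmin, and narrow it with stty when the column count exceeds 255. The 200-column value is an arbitrary safe choice, not an Oracle-documented limit.

echo "columns=$(tput cols) rows=$(tput lines)"
if [ "$(tput cols)" -gt 255 ]; then
    stty columns 200    # or simply resize the terminal window before compiling
fi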

Installing Oracle VM Manager 3.2.x under Dom0 host


Some of you know that I have been publishing instructions on how to install Oracle VM Manager (OVMM) on a Dom0 host ever since Oracle released Oracle VM 3. I described why you might want to do this in my very first post. I just want to mention here that it should be used for sandbox configurations only. You can find the previous post, on how to install the 3.1.1 OVMM version under Dom0, here. This time I cover the 3.2.2 version.

NOTE: At the time of writing, ORACLE VM 3.2.3 SERVER (Patch 16410428) and ORACLE VM 3.2.3 MANAGER (Patch 16410417) became available. I haven't had time to install the latest versions yet. However, I do not expect the installation to be significantly different from the versions I used for this blog post. I will update this post if I find there is something else you need to know in order to install the latest versions. Please feel free to ask questions or share your hints in the comments section below.

MySQL repository

There is one significant difference in the OVMM 3.2.x technology stack compared to the 3.1.x versions. Starting with version 3.2, Oracle by default uses a MySQL database instead of an Oracle Express Edition database for the OVMM repository. This makes it easier to install OVMM on a Dom0 host than before. However, there are still a few things you need to know in order to get it working. This is how you do it.

Pre install steps

  • Install Oracle VM server release 3.2.2 (or latest OVS version available)
    • Most answers to the installation questions are obvious and simple
    • This time we don’t need to customize swap size as before
    • Note that it will make your life much easier if you configure OVS network with an Internet access right from the beginning
  • After you get OVS up and running, change the /etc/redhat-release file
    • This will make the OVMM installation (the MySQL part of it) work
cp -rp /etc/redhat-release ~/redhat-release.orig
echo "Red Hat Enterprise Linux Server release 5.5 (Tikanga)" > /etc/redhat-release
  • Configure the Oracle Public Yum repository (a hedged sketch follows this list)
    • Use the "Oracle Linux 5" version
  • Install additional RPMs
yum install xz-devel
yum install zip
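
A hedged sketch of the yum repository step above. It assumes the Oracle Linux 5 repository definition is still published at the public-yum URL that was current at the time of writing:

cd /etc/yum.repos.d
wget http://public-yum.oracle.com/public-yum-el5.repo
# then set enabled=1 for the el5_latest channel in the downloaded .repo file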

Installation steps

  •  Install OVMM 3.2.1 (or latest release available)
    • Copy the installation ISO file to the OVS file system
    • Mount it as virtual CDROM
    • Use “runInstaller.sh -n” to ignore some other prerequisites
mount -o loop /u01/SOFTWARE/OracleVM-Manager-3.2.1-installer-OracleLinux.iso /mnt/cdrom
/mnt/cdrom/runInstaller.sh -n
  •  Upgrade OVMM to 3.2.2 version (or latest version available)
    • Patch 16410417: ORACLE VM 3.2.2 MANAGER UPGRADE ISO RELEASE
    • Follow the readme instructions
    • Put an ISO file on an OVS filesystem and mount it the same way as in the previous step
    • Install the update (there are 2 steps to be executed, see README.txt for the details)

Post install steps

  • Change /etc/redhat-release back
cp -rp ~/redhat-release.orig /etc/redhat-release

At this stage you are ready to access Oracle VM Manager Web interface.

Enjoy,

Yury

A most simple cloud: Is Amazon RDS for Oracle right for you?


Amazon Web Services has offered Relational Database Service as part of their cloud offering since 2011.  These days, RDS provides easy to deploy, on-demand database-as-a-service for MySQL, Oracle and SQL Server.  When you compare it to essentially any other method of hosting and licensing Oracle, RDS seems to have a variety of really appealing qualities:

With RDS/Oracle, you don’t really need a DBA to take care of your database. With the notable exception of tuning, most of the DBA tasks, such as database creation and configuration, backups, upgrades and disaster recovery are simply features of the service.

Oracle databases on RDS can be created with “license included.” This means that for certain Oracle editions, you can pay as you go for your license. Currently this is limited to Standard Edition One, but rumours abound that higher editions, including Enterprise Edition, will be available with license-included in the near future.

The Oracle versions available on RDS are limited to a few modern, stable releases. This keeps customers from encountering oddball bugs and version quirks.

So far, RDS seems like a clean, simple, elegant solution, and it is. It clearly has a place with certain enterprises that use or want to use Oracle. So the question you might have is, "Is it right for me?" Since no solution is perfect for every deployment, it is helpful to explore the factors that can help you decide if RDS/Oracle will fit your needs.

Availability of technical personnel: If you already run an enterprise that employs DBAs, there may not be as great an upside to deploying a largely DBA-free solution like RDS. On the other hand, if your in-house database expertise is not deep, RDS has the advantage of low technical barriers to entry. With RDS/Oracle, provisioning, backups, monitoring, upgrades and patching are managed and controlled via the web API. The major missing component is tuning. With RDS/Oracle, you still need to have some knowledge of Oracle and SQL tuning to run a successful RDS service.

Tuning: While we are on the topic, let’s discuss Oracle tuning and RDS. As with conventionally-hosted databases, diagnostic pack (and ASH / AWR) is available and supported, as long as you are running Enterprise Edition and you are licensed for those options. AWS even provides Enterprise Manager DB Control as an option. For all other editions however, there is a major hitch. Statspack, the tried and true alternative to AWR, is not supported on RDS. You can still query the v$ views to access current and aggregated wait event information, but the lack of Statspack support is a big stumbling block. SQL tracing and event 10046 (and many other diagnostic events) are available on RDS, and a recent enhancement has provided access to these files via the web API. Previously, access to alert and trace files was via external tables and SQL only.

Privileges: RDS grants you limited management privileges for Oracle, but it stops short of the SYSDBA role, which would have total control over the system. Some applications require SYSDBA, especially during schema installation. If you have an application that absolutely cannot function without SYSDBA privileges, RDS is off the table. On the other hand, most of the application schema deployment scripts that purport to need SYSDBA privilege actually need no such thing. In many of these cases, minor changes to schema build scripts would make them RDS-compatible.

Loading/migration: Most Oracle customers are accustomed to migrating their databases from one hosting solution to another via datafile copy. In the case of very large databases, migration by physical standby switchover is the method of choice. With RDS, there is no access to the underlying filesystem, so datafile-based migration methods are impossible. Since the only access to RDS/Oracle is via SQL Net, data must be loaded using database links. This means using Data Pump, DML over database link, or materialized views. This final option is particularly interesting. If they first pre-create all of the tables and indexes to be migrated in RDS, customers can then build fast-refresh materialized views on the tables, and continually refresh them from the source system. When the customer wants to cut over to RDS, it can be accomplished simply by stopping application access to the source, refreshing all MVs one more time, and converting the MVs in RDS back to tables by dropping the MV objects. While this method is prone to problems stemming from schema design, high rates of change and large transactions, it is likely the best and only solution for a minimal-downtime migration to RDS/Oracle.

Database versions: If you are planning to migrate to RDS from a conventional hosting solution, and you don't already use one of the Oracle versions supported by RDS, your migration to RDS will also amount to a database upgrade. There is nothing fundamentally wrong with this, since you will be moving to a version well tested and certified by Amazon. On the other hand, any third party (or homegrown) software will have to be checked and tested to make sure it runs and is supported on one of the available versions under RDS. Also be aware that database upgrades can come with their share of issues. The most common upgrade issues are small numbers of SQL statements that perform worse after upgrade, because of optimizer regressions.

Backup and recovery: RDS/Oracle backs up the database using storage snapshots, and boasts point-in-time recovery. There are some clear advantages to this method. Backups complete quickly, and you can execute them as often as you want. On the other hand, because Recovery Manager is not supported, some of the nice things you can do with that tool are missing from RDS. For instance, simple small repairs such as single block, single datafile, or single tablespace recovery are impossible with RDS. Recovery using storage snapshots is an all-or-nothing proposition.

High availability and disaster recovery: On the plus side, RDS/Oracle provides a very easy way to set up redundancy across Amazon availability zones (which you can think of as separate datacenters in the same region). This configuration, called multi-zone, provides synchronous storage replication between your production RDS database and a replica in one of the other zones within the same region. For the MySQL version of RDS, the replica is readable. However, this is not so for Oracle or SQL Server. So multi-zone RDS provides redundancy for Oracle, but not a read replica. Significantly, because nearly all viable replication options are unsupported, including Data Guard (standby database) and Streams, RDS does not provide customers with a cross-region DR solution.

Limitations on features, parameters and character sets: RDS/Oracle has enabled and supports a broad range of Oracle features, parameter settings and character sets. However, a subset of each of these categories is not supported, either because of how RDS is architected, or because Amazon has not seen the demand for those things to be great enough to merit the engineering effort involved in supporting them. Depending on the needs of the application, any limitations arising from the following lists may or may not affect you.

Features supported / not supported on RDS/Oracle (note that RAC is not supported on EC2 either)
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Oracle.html#Oracle.Concepts.FeatureSupport

Character sets supported (note that this list does not include UTF8 or WE8ISO8859P1, AKA Latin-1, both very common)
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.OracleCharacterSets.html

The available parameters and allowed settings are dictated by the edition, options and version of Oracle you have deployed. The complete list can be obtained via the RDS web API.

To summarize, Amazon RDS for Oracle is a really exciting option. The right application with limited requirements might find RDS to be a totally suitable platform for hosting a database. However, as the enterprise requirements and the resulting degree of complexity of the database solution increase, RDS is gradually ruled out as an option. For larger and more complex deployments, the issues around tuning, migration and HA/DR completely eliminate RDS as a solution. For these more complex cases, Amazon's Elastic Compute Cloud (EC2) can provide a much broader range of possible solutions. I would never be one to discount anything Amazon Web Services offers; any deficiencies I call out in a blog posting like this one will probably be made obsolete as Amazon announces new RDS capabilities.

Would you deploy your databases on Amazon RDS for Oracle?  Why or why not?

Hard drive Inner or Outer


To be precise, I wonder if the OUTERmost tracks of a spinning HDD are faster than the INNERmost tracks. Should we put physical-IO-performance-sensitive data on the OUTERmost parts of the disk and less critical data on the INNERmost parts, as several vendors suggest? Well, I didn't find a better solution than to grab all the HDDs I had and start testing :) Yes! It is a work-in-progress project ….

My HDDs

Disclaimer

Before you start criticizing my testing results, I would like to make a few points clear:

  • I am not an expert in the hardware space
    • I am just a curious Oracle administrator, open to any suggestions on how to improve the testing and get closer to clarifying things
  • Some of you will say that it is a useless exercise, as nowadays we don't have any control over which areas of a single HDD data is placed on
    • Well, in some rare configurations like Exadata or Oracle Database Appliance we actually do have this power and can possibly impact the IO performance
    • In other cases it is sometimes good to understand how things work, to explain why there is a certain performance impact
  • As I am an Oracle DBA and Oracle databases are most often random-IO bound, I have focused my attention on random (8k) IO testing
  • I do believe there are better ways to test HDDs. Unfortunately I don't have enough knowledge about the other options, so I am open to your suggestions on how to do it in a better way.
    • Just keep in mind that I have a Dell Latitude E6410 running Windows 7 (64-bit) for this testing
  • At the moment I am waiting for a SATA adapter to arrive. I will re-run some of the tests I did to confirm or adjust the results.
  • This is a work-in-progress project and I am not ready to draw final conclusions (if I will be ready to draw those at all ;)

My expectations

Based on previous experience, I expect that:

  1. the OUTERmost tracks will not be much faster than the INNERmost tracks
  2. the worst performance will occur when data is accessed from both the OUTERmost and INNERmost tracks at the same time

I have focused my attention on 3 tests:

  • Data on OUTERmost tracks
  • Data on INNERmost tracks
  • Data distributed equally through full HDD surface

Note: I also ran a fourth test, accessing data randomly from the OUTERmost and INNERmost tracks at the same time. The results were very close to the full-surface tests, therefore I do not include them in this blog post.

How and what did I test

Hardware

To start, I took the 7 HDDs that I happened to have and 2 SATA-to-USB adapters.

SATA to USB
Software

I have used 2 options to test and confirm the IOPS results (a Linux-only alternative is also sketched after this list):

  • Windows 7 has a nice little (; silly ;) utility called winsat. It didn't take me too long to figure out how to make it do random 8k reads
winsat disk -ran -ransize 8192 -read -drive E
  • Oracle 11gR2 comes with Oracle's native orion (ORacle IO Numbers) utility. I just installed it and used the command below to test random IOs
orion -testname e_hdd -duration 20 -cache_size 0 -write 0 -num_disks 1 -run advanced -size_small 8 -matrix row -num_large 0
# e_hdd.lun
\\.\e:
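
For readers testing on Linux rather than Windows, here is a hedged, fio-based alternative that generates the same 8k random-read pattern. It assumes fio is installed; the device name and the 20-second runtime are illustrative, and the run is read-only:

fio --name=rand8k-read --filename=/dev/sdX --rw=randread --bs=8k --direct=1 \
    --ioengine=libaio --iodepth=1 --numjobs=1 --runtime=20 --time_based --group_reporting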

Assumption

To make things simple I have assumed that the HDD controller (or Windows, or whoever, … remember I am not an expert) allocates space starting from the OUTERmost (fastest) tracks of the disk. Therefore I first allocated 1GB of unformatted space to the E: drive, then filled all the space but the last 1GB with an empty partition, and created a 1GB G: drive. My assumption is that partition/drive E: is located on the OUTERmost tracks and partition/drive G: on the INNERmost (slowest) tracks. As you will see from the results below, this assumption isn't correct for all HDDs.

To test the full surface I have deleted all partitions and created one big partition.

Please let me know (using the comment section below) if there is a better way to ensure that a partition is created on the OUTERmost tracks of an HDD, or at least how to check which tracks a partition is created on (if that is possible).
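
One hedged way to sanity-check placement on Linux: lower starting sectors (LBAs) are conventionally mapped to the outer tracks, so comparing partition start sectors gives a hint of physical position. /dev/sdX is illustrative, and the LBA-to-track convention is not guaranteed by every drive:

parted /dev/sdX unit s print
# or, equivalently:
fdisk -lu /dev/sdX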

Results

“Good” results

The following table shows the IOPS results for the 3 HDDs that confirmed my expectations. It is clear that there is no more than a 10% gain between the OUTERmost and INNERmost tracks. However, there is a significant performance impact if the HDD's head has to move across the whole surface to access data.

HDD Name                                        Outer   Inner   Full   Outer vs Inner   Outer vs Full
WD 2.5″ 1TB 5400RPM / WDBBEP0010BRD             127     121     64     4.96%            98.44%
Hitachi 3.5″ 320GB 7200RPM / HDT725032VLA380    133     124     58     7.26%            129.31%
WD 2.5″ 160GB 5400RPM / WD1600BEVT              112     103     61     8.74%            83.61%
“Other” results

A careful reader will notice that I didn't provide all the test results so far, and you are right: the remaining results don't confirm my theory :). Have a look at the other 3 HDDs' test results below. (Note: one of the 7 HDDs had data on it, therefore I excluded it from the OUTERmost and INNERmost track testing.)

HDD Name                                        Outer   Inner   Full   Outer vs Inner   Outer vs Full
HGST 2.5″ 1TB 7200RPM / AT-0J22423              107     132     69     -18.94%          55.07%
Seagate 2.5″ 250GB 7200RPM / ST9250410AS        88      71      63     23.94%           39.68%
Seagate 3.5″ 1TB 7200RPM / ST31000333AS         141     92      69     53.26%           104.35%

The 3 HDDs above either showed results that I can't explain as of now (AT-0J22423), or showed a more significant performance difference between the OUTERmost and INNERmost tracks than the first set of HDDs. However, for both sets it is fairly clear that if an HDD has to move its head across the whole surface there is a significant performance penalty. Those are the expected results, aren't they?

Intermediate conclusions

For some HDD models there is no significant performance difference between accessing data located on the OUTERmost or the INNERmost tracks. However, in some cases IO can be around 130% slower if the HDD head has to travel across the whole surface to return data.

This has a few possible practical implications:

  • If someone states that a partition is created on the OUTERmost tracks of an HDD, it doesn't necessarily mean IO from that partition is significantly faster than from any other region of the HDD
  • IO operations could slow down significantly if data has to be accessed from both the OUTERmost and INNERmost tracks at the same time (e.g. if DATA is located on the OUTERmost tracks but the FRA on the INNERmost tracks)
  • You may find that your storage performance degrades the more "active" data you put on your hard drives (i.e. as the drives fill up)

Keep in mind that there are possible exceptions: based on my initial tests, there are some HDD models where the difference between the OUTERmost and INNERmost tracks is significant.

Please help to improve the test results

As I stated at the beginning of this post: a) I am not an expert in this space; b) this is a work-in-progress project and I am looking for better ways to test random IOs; c) I need your help to understand why there are exceptions, and I welcome suggestions on how to adjust the HDD testing process to improve the results and get closer to solid conclusions.

Yury

Oracle Database Load Testing Tools – Overview


Someone just asked the following question on Pythian's internal forum: "The team here is evaluating DB load testing tools (Hammerdb, Orion, SLOB, and Swingbench) and was wondering about our experience in using different tools and what our opinion is." I have experience using some of the tools mentioned, so I decided to answer with this public blog post, as it could be useful for you too and easier for me to reference in the future :)

Right tool for the right job

Most load testing tools can be grouped into the following categories:

  • IO subsystem testing tools
  • RDBMS Level testing tools
  • Application Level testing tools (DB Side)
  • Application Level testing tools (Apps Side)

Each subsequent category addresses a wider range of business testing problems/tasks. However, each subsequent category also requires slightly more resources and involves a few more challenges. You should clearly define your testing goals before starting to consider the right tool for the job. I would suggest that you consider tools from the first category first, and move on to the next category only if the tools in the first category don't satisfy your business requirements. I hope this overview gives you enough input to start putting your testing plans together.

IO subsystem testing tools

These tools are easy to configure and run. However, they do nothing but send "simple/meaningless/no real data" IO requests to a storage device and measure response times. Most tools have the ability to run read-only tests, read-write tests, random IO tests, sequential IO tests, to increase the reader/writer count, to warm up the storage cache, and to run a mixture of the tests mentioned.

Pros:

  • Easy to run
  • Short  learning curve ( add 2-4 hours to first few runs for an initial learning)
  • Low budget (4-16 hours should be enough for a testing project)
  • Easy to get and reproduce results
  • Easy to compare results with other platforms

Cons:

  • May not represent your RDBMS or Application IO patterns
  • Doesn’t test anything else but Physical IO (fair enough)

Representatives:

  • orion (ORacle IO Numbers) – This tool is developed and maintained by Oracle and is part of the latest Oracle DB distributions (e.g. 11.2.0.3). Orion was created to test Oracle DB IO workloads: it uses some of the typical system calls that the Oracle database kernel uses to read and write data from/to data files. (A minimal invocation is sketched after this list.)
  • winsat – Modern Microsoft Windows distributions have the winsat utility, which is capable of doing some basic IO testing.
    • Experience: I have used this utility to confirm orion results on Windows platform
    • References:
  • iometer – My friend (Artur Gaffanov) suggested iometer as an alternative tool from the same category. This utility was originally developed by Intel and is currently maintained by an open source community.
    • Experience: I do not have personal experience with the tool. However, a quick search through the Pythian Knowledge Base (Support Track) retrieved several references, so I would say there is some experience with it at Pythian.
    • References: http://www.iometer.org/
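
As mentioned above, here is a minimal orion invocation. The flags simply mirror the ones used in the HDD testing post earlier in this feed, and the device name in the .lun file is illustrative:

echo '/dev/sdX' > mytest.lun        # one device or file per line; the .lun file name must match -testname
orion -testname mytest -duration 20 -cache_size 0 -write 0 -num_disks 1 \
      -run advanced -size_small 8 -matrix row -num_large 0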

RDBMS Level testing tools

This group of tools works from within an Oracle instance. Therefore some people (I bet if they read this post they know exactly who I am referring to ;) say this group of tools can reproduce RDBMS IO calls much more closely than the first group. I may not 100% agree that the first group doesn't imitate RDBMS calls closely enough, but there are other benefits that this group of tools provides: some of them can be used to test other system resources' performance, such as CPU and memory.

Pros:

  • Use RDBMS calls to test  system resources (uses database the same way as your application does)
  • Can be used to test all main system’s resources (CPU, Memory, IO)
  • Relatively easy to run
  • Short  learning curve ( add 4-8 hours to first few runs for an initial learning)
  • Relatively low budget (1-3 days should be enough for a testing project)

Cons:

  • May not represent your Application IO patterns
  • Need a good basic understanding of RDBMS concepts
  • You may need to spend a bit more time verifying and adjusting an instance/test's configuration to ensure that it tests exactly what you want to test
  • It will take you more time to run the same number of tests than with a tool from the first group.
  • You may need to spend a bit more time to configure your Oracle Instance to test performance the way you want
  • You need an Oracle Database to be up and running to use those testing tools

Representatives:

  • SLOB – The Silly Little Oracle Benchmark from Kevin Closson. The framework uses a simple and typical database operation to put some load on a system. Depending on the amount of memory you allocate to the data cache, it will test either CPU and memory (logical IO) or storage (physical IO). The tool uses index range scans and table block look-ups.
    • Experience: I have spent quite a bit of time running tests using the SLOB utility, and there are quite a few folks around the world using SLOB to test Oracle systems' performance. From my experience, it takes a bit of time to understand the things to be careful with during testing. However, as soon as you know what you are doing, it takes less time to test and compare different systems' performance.
    • References:
  • Oracle Database I/O calibration feature - Some people know it as DBMS_RESOURCE_MANAGER.CALIBRATE_IO. Strictly speaking, the procedure tests physical IO only. However, if you add DBMS_STATS.GATHER_SYSTEM_STATS, which calibrates CPU speed, then with a certain skepticism we can add these tools to this category, since both procedures work within an Oracle instance. The biggest disadvantage here is the lack of detail on how the tools run the tests. They are also limited from a configuration perspective; for example, you don't have good control over which data files the IOs are issued against.

Application Level testing tools (DB based)

Before going any further, I must mention that this category includes tools that require more time to set up, test, and get comparable results from. I have been part of several projects that aimed to mimic application behavior. Depending on the tool, the application complexity, and the results you want to achieve, a project from this category may take anything from ~20 hours to several months.

In the interest of keeping this blog post reasonably short, I mention some of the tools from this category below with a few comments. If you are considering one of the tools from this category, I would encourage you to run a pre-study, which itself can take from several hours to several days.

  • Hammerora – this is a free multi-database testing tool. Originally it was created to run TPC-C and TPC-H application workloads; today it also supports application activity replay functionality (for Oracle databases). I have used this tool to run TPC-C tests in the past. A typical use case would be to run a set of tests on two different platforms to get a general idea of the difference in performance characteristics. You may not get results as precise as with the previous group of tools, but you can get a reasonably good general performance comparison.
  • Swingbench – from Dominic Giles (Oracle UK). This is a free tool, similar to Hammerora, that has a set of supplied benchmarks you may use to test the performance of your Oracle-based system. It also allows you to build your own basic testing scripts. Originally the tool was developed to test Oracle RAC based systems. Dominic's presentation gives a good overview of the tool. Several folks at Pythian have been using it; my team members recently configured and ran tests to compare a source (old Solaris) and a target (VM-based Linux) system.
  • Simora – from James Morle and Scale Abilities. This tool mines Oracle SQL trace files and generates SQL to be executed to reproduce the load. Obviously, you can then replay the application load on a copy of the system where the trace files were generated. A good use case to consider is testing an application across database version upgrades. As with Hammerora, such testing projects need careful planning (how and when to recover the database to get the right data, how to synchronize data updates to make sure no application constraints are violated, etc.). I would estimate such a testing project at anything from one week to 8 weeks, depending on the complexity and other parameters.
  • Oracle Real Application Testing – This is an Enterprise Edition database option from Oracle that allows you to record a load on a source system and replay it on a destination environment. I have tried this functionality several times in test environments and it works well. However, RAT-based projects share the common challenges of the other two tools. It is worth mentioning that the product isn't free and needs additional licenses. That said, I have heard of a few clients who successfully used the product in their testing/migration projects.

Application Level testing tools (Apps based)

Testing tools in this category mimic end users' behavior as if the users were working with the system from their workstations. That means that tools like HP LoadRunner interact with the application servers using a variety of protocols (e.g. HTTP, HTTPS, Oracle Forms) to test all components of the system at the same time. Needless to say, in order to do this, test scenarios need to be scripted and test data prepared, tested, and maintained as the application changes. I personally was part of such a project; it lasted several months. However, we achieved good results and spotted quite a few critical inefficiencies in the custom application code.

Conclusions

As I mentioned at the beginning of this post, you need to find the right testing tool for the task you have been given. I hope this overview helps you get an idea of the options available and the resources you may need to line up for your testing activities. I would appreciate it if you shared your experience with any of the tools mentioned or, even better, mentioned some good tools that I didn't list in this blog post, in the comments section below.

Yury


Managing Oracle on Windows: Where’s my oratab?

$
0
0

If you manage Oracle on Windows, you have probably wondered why it is so difficult to work out which Oracle instances are running and which ORACLE_HOMEs they use. On Unix or Linux, this is a very simple task. Oracle databases and their ORACLE_HOMEs are listed in the oratab file, located in /etc/ on most platforms, and in /var/opt/oracle/ on Solaris. To find what is running, we would usually use the 'ps' command and pipe it through grep to find any running PMON processes.

On Windows, it just isn't this easy. Each Oracle instance runs in a single monolithic oracle.exe process, and nothing about the process indicates the name of the instance. When we want to find all of the configured Oracle services, we can use the 'sc' command and pipe the results through find (I have added emphasis to the ASM and database instances):

C:\> sc query state= all | find "SERVICE_NAME" | find "Oracle"
SERVICE_NAME: Oracle Object Service
SERVICE_NAME: OracleASMService+ASM1
SERVICE_NAME: OracleClusterVolumeService
SERVICE_NAME: OracleCRService
SERVICE_NAME: OracleCSService
SERVICE_NAME: OracleDBConsoleorcl1
SERVICE_NAME: OracleEVMService
SERVICE_NAME: OracleJobSchedulerORCL1
SERVICE_NAME: OracleOraAsm11g_homeTNSListener
SERVICE_NAME: OracleProcessManager
SERVICE_NAME: OracleServiceORCL1
SERVICE_NAME: OracleVssWriterORCL1

For any one of these services, you can get the current state with ‘sc query’, and the path of the ORACLE_HOME it is using with ‘sc qc’.

C:\> sc query OracleServiceORCL1

SERVICE_NAME: OracleServiceORCL1
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4   RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

C:\> sc qc OracleServiceORCL1

SERVICE_NAME: OracleServiceORCL1
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 3   DEMAND_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : c:\oracle\product\11.2.0\db\bin\ORACLE.EXE ORCL1
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : OracleServiceORCL1
        DEPENDENCIES       :
        SERVICE_START_NAME : LocalSystem

As you can see, the ORACLE_HOME and SID are visible on the line labeled 'BINARY_PATH_NAME'. Once you have this information, you can set your environment accordingly. It might even be worth your time to write a simple script to do this for you. Maybe you could call it 'oraenv'!

C:\> set ORACLE_SID=ORCL1
C:\> set ORACLE_HOME=c:\oracle\product\11.2.0\db
C:\> set PATH=%ORACLE_HOME%\bin;%PATH%

Why upgrade to Oracle 12c


Oracle 12c: Making the Impossible, Possible?

Pythian's Oracle team has had the opportunity over this last year to explore, test, and assess every aspect of Oracle's latest database release, Oracle 12c, and we have to say, it's very impressive.

Oracle 12c is by far the most important Oracle release in the last 10 years. Its advanced capabilities promote better performance, increased scalability, and easier data management. For the enterprise, this translates into significant cost savings, reduced risk, and increased flexibility.

We put together a list of what we think are the Top 5 Reasons to Upgrade to Oracle 12c.

Do you agree? What would be on your list of reasons to upgrade?


Oracle 12c RAC On your laptop, Step by Step Guide


OK, back to school folks, back to school :) As with any new product, our long learning journey starts with installation in a sandbox environment. To get you up to speed with Oracle RAC 12c and let you focus on the important stuff (e.g. feature research), I have put together a detailed step-by-step installation instruction.

Oracle 12c Grid Infrastructure

You can find it here => Oracle 12c RAC On your laptop Step by Step Implementation Guide 1.0

You will need a laptop or a computer with 8GB of RAM and 50GB of HDD space. The instructions are written for 64-bit Windows and Oracle VirtualBox.

I know for a fact that several Oracle RAC 12c installation guides from other great community contributors are on the way; several friends of mine are working on 12c RAC posts at the moment. I would like to mention just a few of them and how I expect their guides to differ from mine:

  1. Tim Hall aka Oracle Base may be working on a 12c RAC guide for Mac OS. If you are a Mac OS user I would advise you to check his web site; if it isn't there yet, I am sure it will arrive in the near future. I would like to mention that I have used Tim's 11g RAC implementation guide a lot in my work and adapted it to Windows and the Oracle 12c version.
  2. Seth Miller is busy putting things together for the famous RAC Attack. This is going to be a very good study guide. Based on the previous version, it is going to focus on teaching you how to implement, patch, operate, and customize cluster environments. I strongly suggest you check the web site and start using the RAC Attack guide as soon as it is available.

Alright folks. Enough talking/writing/reading. Back to school! I hope my guide helps you get through the installation task quickly and lets you enjoy some great learning.

Yury

Before you go, let me ask you to come back to this blog post and share your experience as soon as you complete the installation. I am always looking for your feedback :)

Please spend 2 minutes to complete the 3-question survey.

Twitter: @yvelik

FaceBook: yury.velikanov

LinkedIn: yuryvelikanov

SlideShare: yvelikanov

G+: Yury Velikanov

Blog: Pythian

Amazon Elastic Load Balancer With Oracle E-Business Suite


This blog post outlines a couple of issues we faced in our Oracle E-Business consulting group when configuring the Amazon Elastic Load Balancer with Oracle E-Business Suite 11i. The current IT industry trend is to virtualize all servers, and Oracle E-Business Suite servers are no exception. Among the current cloud operators, Amazon's cloud infrastructure is a favorite destination, especially for small and medium enterprises.

The main advantage of the Amazon cloud for Oracle E-Business Suite is that Oracle licenses are compatible with Amazon virtual servers. Just last week there was an announcement that Oracle will certify the Microsoft Azure cloud as well. This should bring new options to customers who want to move servers running Oracle to the cloud. We cannot just virtualize servers running Oracle as we wish, because of how Oracle licenses its software; Oracle licensing on virtual servers is a big, controversial topic. If you want a piece of that pie, just google "oracle licensing vmware". Let's get back to the technical stuff.

As part of a high availability configuration, most enterprise customers use a load balancer from either Cisco or F5 Networks in front of a multi-middle-tier Oracle E-Business Suite. Amazon's load balancer offering is called the Elastic Load Balancer (ELB). The feature set of ELB is pretty much comparable with the enterprise-grade load balancers from Cisco and F5.

Oracle E-Business Suite 11i has 2 types of traffic between end users and the middle-tier servers. First is the web HTTP traffic, which serves all the Java-based self-service applications. Second is the Oracle Forms traffic, which is used by the Oracle Forms Java applet when the forms server is running in socket mode. The Amazon load balancer is pretty good at handling the HTTP traffic: it has all the features required for Oracle E-Business Suite, like cookie-based sticky sessions with a timeout of 10 hours. The catch is that this is only available for HTTP/HTTPS traffic. For Oracle Forms traffic, we have to use the basic TCP load-balancing option in Amazon ELB, which doesn't have any of the sticky-session features of the HTTP load balancing. This causes strange issues with Oracle Forms, like:

FRM-40010: Cannot read from /path/to/appltop/fnd/11.5.0/forms/US/FNDSCSON

This issue stems from the TCP load balancing not having sticky features, and it seems to cause problems for a lot of other software deployments on Amazon servers that require a TCP connection to stay open all the time. Amazon should add this feature to ELB to make it a complete load balancer offering.

On the other hand, the fix for the issue is either to convert the forms server to servlet mode, which makes the forms traffic piggyback on the web HTTP traffic, or to make the Oracle Forms clients connect directly to the servers instead of going through the load balancer.

Another issue to look out for is the timeout at the load balancer level. All connections to the Amazon ELB time out after 60 seconds. If you have a web page or form that takes a while to load, then you need to increase this timeout. It cannot be increased via the self-service Amazon management console; you have to raise a ticket with Amazon to increase it for your load balancer. Some discussion on this issue is in this Amazon forum thread. The maximum value Amazon can increase the timeout to is 17 minutes. I recommend setting it to 6 minutes, as Apache automatically times out HTTP connections at around 5 minutes.

These are my lessons learned from this rendezvous with the Amazon Elastic Load Balancer. I will be glad to hear your experiences as well!

DUPLICATE DATABASE … FROM ACTIVE DATABASE just works


My friend Øyvind Isene suggested in this tweet that I should store a DUPLICATE RMAN script that works in a safe place.


I didn't find a safer place for the script than the Pythian blog :). Here is the DUPLICATE DATABASE … FROM ACTIVE DATABASE script that works beautifully for me.

I like the fact that we don't need to worry about any time-consuming tasks anymore. For example, you don't need an init.ora parameter file on the destination side; Oracle creates it all for us. If you want to change any parameters (e.g. reduce the memory footprint), you just specify it within the DUPLICATE command (e.g. set sga_target=4G).

Details

version = 11.2.0.3 on both sides
source db = prod
destination db = test
on the source (prod)
– tns aliases to be created to point to prod and test databases
on the destination (test)
– the same version of Oracle SW installed
– directory structure created
– copy $ORACLE_HOME/dbs/orapwprod (from prod) to $ORACLE_HOME/dbs/orapwtest
– configure static listener configuration (allow to connect as sysdba from prod)
– start an empty test instance: “export ORACLE_HOME=….; export ORACLE_SID=….; sqlplus => startup nomount;” (a hedged sketch of these destination steps follows the notes below)
notes:
- no SPFILE is needed. It will be taken care of while running DUPLICATE
- prod db files located under /u01/oradata/prod
- test db files to be located under /u02/oradata/test
- it doesn't matter where you execute the command from (prod or test)
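
A hedged sketch of the destination-side steps above. The host name, paths, and file names are illustrative, and the listener/tnsnames entries still need to be edited by hand as the notes describe:

scp prod-host:$ORACLE_HOME/dbs/orapwprod $ORACLE_HOME/dbs/orapwtest   # copy the prod password file
mkdir -p /u02/oradata/test                                            # destination datafile directory
export ORACLE_SID=test                                                # ORACLE_HOME assumed to be set already
sqlplus / as sysdba <<SQL
startup nomount
SQL
# if startup complains about a missing parameter file, a one-line pfile containing db_name=test is enough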

Script

cat run_active_duplicate_prod_test_01.sh
. prod.env
echo $ORACLE_HOME $ORACLE_SID $TNS_ADMIN
export NLS_DATE_FORMAT="YYYY/MM/DD HH24:MI:SS"
rman TARGET sys/password@prod AUXILIARY sys/password@test <<EOF
DUPLICATE DATABASE TO test
FROM ACTIVE DATABASE
SPFILE parameter_value_convert 'prod','test'
set db_file_name_convert='u01/oradata/prod','u02/oradata/test'
set log_file_name_convert='u01/oradata/prod','u02/oradata/test'
set control_files='/u02/oradata/test/cntrl01.dbf';
EOF

Kick off

nohup ./run_active_duplicate_prod_test_01.sh \
> ./run_active_duplicate_prod_test_01.`date +%Y%m%d_%H%M%S`.log 2>&1 &
ls -lptr ./run_active_duplicate_prod_test_01.*.log

Verification


tail -f ./run_active_duplicate_prod_test_01.*.log
...
contents of Memory Script:
{
Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 2013/07/01 05:42:30

RMAN

Recovery Manager complete.

SQL for Pattern Matching in Oracle 12c


Oracle 12c is out!
And with it, a myriad of new features that we’ll be learning and playing with in the months and years to come. Paraphrasing Iggy Fernandez’s blog: “So many Oracle manuals, so little time…” The new features abound and we need to cherry pick some interesting ones to delve into.

There are the basic ones, like installing the new database software, which Yury Velikanov promptly taught us how to do just one day after the 12c release. And there are others, buried in the manuals, that may go unnoticed by the majority for some time. They may not be essential for a DBA or developer, but sometimes they can save us a great deal of work and time.

One of these “buried” features of Oracle 12c that caught my interest was SQL for Pattern Matching. It's an extension to the syntax of the SELECT statement, using the MATCH_RECOGNIZE clause, that allows us to identify patterns across sequences of rows.

It’s just another way of doing the same things

In SQL, there are always many different ways to solve a single problem. MATCH_RECOGNIZE just adds to the set of possible solutions. However, when the problem at hand is to detect patterns in sequences of rows, it may simplify the job immensely and save us a lot of time and lines of code. That doesn’t mean, though, that it wasn’t possible to do pattern matching before; it probably just required more work and convoluted queries.

So, let’s look at one example. I’ll look for patterns in the archived log history of a database rather than venturing through stock market tickers, or other fields’ data that I’m less familiar with. The idea, though, can be applied to a variety of scenarios.

Let’s say that we want to find out periods of high archived log generation for a database. We want to know in which periods of 24 hours the database generated more than 30gb, for example. An easy way to do this is to use the query below, which all DBAs have certainly written before:

select trunc(completion_time) day, sum(blocks*block_size/1024/1024) mbytes
from v$archived_log
group by trunc(completion_time)
having sum(blocks*block_size/1024/1024) > 30000;

Which returns:

DAY         MBYTES
--------- --------
03-JUL-13   31,666
04-JUL-13   30,078

The query above shows all the days when the database generated more than 30gb of archived logs. There’s a problem, though. What if, for example, in a particular occasion the database generated 40gb of archived logs between noon the previous day and noon the current day, being quiet the rest of the time?

This query could miss this fact since there would be only 20gb of archived logs each day.

Using MATCH_RECOGNIZE

To write a query that considers any period of 24 hours, rather than calendar days, is more complicated. Using MATCH_RECOGNIZE, we can solve that with the query below:

SELECT *
FROM v$archived_log MATCH_RECOGNIZE (
     ORDER BY completion_time
     MEASURES to_char(FIRST (A.completion_time),'yyyy-mm-dd hh24:mi:ss') AS start_time,
              to_char(LAST (A.completion_time),'yyyy-mm-dd hh24:mi:ss') AS end_time,
              sum(A.blocks*A.block_size/1024/1024) as mbytes
     ONE ROW PER MATCH
     AFTER MATCH SKIP PAST LAST ROW
     PATTERN (Y+ Z)
       SUBSET A = (Y, Z)
     DEFINE
        Y AS (Y.completion_time - FIRST(Y.completion_time)) <= 1,
        Z AS (Z.completion_time - FIRST(Y.completion_time)) <= 1 and sum(A.blocks*A.block_size)/1024/1024 >= 30000
);

With the query above, we now can see that there were more periods of high activity than we thought before:

START_TIME          END_TIME              MBYTES
------------------- ------------------- --------
2013-06-19 05:00:06 2013-06-20 05:00:01   30,424
2013-06-20 05:12:25 2013-06-21 05:08:07   30,163
2013-06-25 05:04:12 2013-06-26 05:04:04   30,529
2013-07-02 18:51:42 2013-07-03 18:45:55   30,221
2013-07-03 18:52:42 2013-07-04 18:43:14   31,352

No way! It’s too complicated

It does look scary at first sight! But, if we break it down into smaller pieces, it gets easier to understand.

MATCH_RECOGNIZE allows us to search for a pattern in a sequence of rows. It’s important that the sequence of rows be ordered, otherwise the results wouldn’t be deterministic. The order of the rows is specified by the ORDER BY clause within MATCH_RECOGNIZE.

In this example, the pattern that we want to find is “a sequence of rows spanning a period of not more than 24 hours for which the sum of the archived log sizes is greater than 30gb“. It is specified as a regular-expression in the PATTERN clause as: “Y+ Z“, which means “one or more rows that match the Y condition, followed by one row that matches the Z condition”, where Y and Z are the conditions specified in the DEFINE clause.

The condition Y specifies that the timestamps of any rows labelled as Y must not be more than 24 hours apart. There’s no restriction on the maximum number of Y rows, just that they all be within a 24-hour period.

The only row that matches the Z condition must also be within 24 hours from the first occurrence of Y and must be such that the sum of the archived log volumes of all Y‘s and Z‘s are greater than 30gb. The SUBSET clause defines the set A as the union of all Y‘s and Z‘s, to simplify the calculation of the total volume.

For each match found in the sequence of rows, MATCH_RECOGNIZE can return either all the rows within the match, i.e. all rows that matched the Y and Z conditions, or a single row summarizing it. In this case, since we just want the sum of all the archived log volumes, I specified that I wanted only ONE ROW PER MATCH. I also told MATCH_RECOGNIZE to resume the search after each match by starting from the first row after the match (AFTER MATCH SKIP clause, with the PAST LAST ROW option). This clause also permits resuming from the next row or from a specific occurrence of a pattern variable (the conditions defined above).
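If, instead, you want to inspect the individual archived logs that make up each match, a small variation switches to ALL ROWS PER MATCH. This is only a sketch built on the same query as above (under ALL ROWS PER MATCH the measures are running values, so the final MBYTES appears on the last row of each match):

SELECT start_time, end_time, completion_time, mbytes
FROM v$archived_log MATCH_RECOGNIZE (
     ORDER BY completion_time
     MEASURES to_char(FIRST (A.completion_time),'yyyy-mm-dd hh24:mi:ss') AS start_time,
              to_char(LAST (A.completion_time),'yyyy-mm-dd hh24:mi:ss') AS end_time,
              sum(A.blocks*A.block_size/1024/1024) AS mbytes
     ALL ROWS PER MATCH               -- one output row per archived log in the match
     AFTER MATCH SKIP PAST LAST ROW
     PATTERN (Y+ Z)
       SUBSET A = (Y, Z)
     DEFINE
        Y AS (Y.completion_time - FIRST(Y.completion_time)) <= 1,
        Z AS (Z.completion_time - FIRST(Y.completion_time)) <= 1
             and sum(A.blocks*A.block_size)/1024/1024 >= 30000
);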

Finally, in the MEASURES clause we define the value that will be calculated for each match found. Here we use the FIRST and LAST navigation functions to retrieve the start and end dates for each match and calculate the sum of the archived logs volume.

The uses for MATCH_RECOGNIZE are many, from finding patterns in audit trails and logs to analyzing stock market variations. This is one more nice tool for the utility belt of DBAs and developers, and the fact that it uses the power of regular expressions for specifying the patterns is a great touch and gives it great potential.

Do you see potential interesting uses of this new feature? I’d love to hear them!

Read more…

If you’re interested, I’d suggest reading the manual to learn all the available options. Learning and mastering regular-expressions is also paramount and it’s a skill that has many other applications:

SQL for Pattern Matching (Database Data Warehousing Guide 12c Release 1)
Regular Expressions

EM CLI with scripting option in EM12cR3


The new release of #EM12c comes with an advanced EM CLI that provides interactive and script modes to enhance the standard command-line functionality. EM CLI now includes a Jython interpreter, with all verbs presented as functions, allowing verb arguments to be passed as parameters.

Managing several thousand targets in one of Pythian's client environments, with different scripts and user-defined metrics to trace specific conditions, I was excited when I saw that functionality: it would allow me not only to join all the checks together, but also to add more functionality and make it flexible to change. It is definitely one more weighty argument for migrating from EM 11g to 12c.

I installed the new release of OEM and, using examples adjusted to my environment, installed the advanced kit:


[oracle@em12 ~]$ echo $JAVA_HOME

/u02/app/oracle/product/em12c3/jdk16/jdk
 [oracle@em12 ~]$ java -version
 java version "1.6.0_43"
 Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
 Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
 [oracle@em12 emcli]$ pwd
 /u02/app/oracle/product/em12c3/oms/emcli
 [oracle@em12 emcli]$ java -jar /u02/app/oracle/product/em12c3/oms/sysman/jlib/emcliadvancedkit.jar client -install_dir=/u02/app/oracle/product/em12c3/oms/emcli
 Oracle Enterprise Manager 12c Release 3.
 Copyright (c) 2012, 2013 Oracle Corporation.  All rights reserved.

EM CLI Advanced install completed successfully.

Executing the first commands in the Jython language in EM CLI interactive mode went well:


emcli>print 'Hello EMCLI'
Hello EMCLI
emcli>version()
Oracle Enterprise Manager 12c EMCLI Version 12.1.0.3.0

However, surprisingly enough, I was not able to get any output when I executed the file emcli_json_processing.py that I created by pasting in its contents. Finally, after looking through several books about Jython (thank you, Pythian, for providing a subscription to Safari Books Online), I got an idea of the language, learned that formatting is very important in Python, and prepared the file with the proper formatting. Going further with Python, I was able to do simple file manipulation, send e-mail, and, using the EM CLI functions, retrieve information from the OEM repository database.

I put all those pieces together and created a script that, based on a text file defining the relationships between targets and groups, checks target membership and sends an e-mail if a target is not in its group. The functionality is simple, but it is a starting point for further, more complex and much-needed checks:

[oracle@em12 emcli]$ cat vals.txt
d1:PROD
d12:TEST
[oracle@em12 emcli]$ cat f.py
import smtplib
from email.mime.text import MIMEText
me = 'oracle@em12.home'
you = 'user@site.com'

from emcli import *
def format(str):
    if str is None:
        return ""
    return str

def get_targets_in_group(target_name, group_name):
    l_sql = "select count(*) cnt from MGMT$TARGET_MEMBERS where AGGREGATE_TARGET_NAME = '" + group_name + "' and MEMBER_TARGET_NAME = '" + target_name + "'"
    obj = list(sql=l_sql)
    return obj

def check_targets(file_name):
    for l in open(file_name, 'r').readlines():
        s = l.split('\n')
        s = s[0].split(':')
        r = get_targets_in_group(s[0], s[1])
        for o in r.out()['data']:
            cnt = o['CNT']
            if cnt == '0':
                msg = MIMEText('Please check target ' + s[0] + ' in group ' + s[1])
                msg['Subject'] = 'Target ' + s[0] + ' not in a group ' + s[1]
                msg['From'] = me
                msg['To'] = you
                s = smtplib.SMTP('localhost')
                s.sendmail(me, [you], msg.as_string())

set_client_property('EMCLI_OMS_URL','https://em12.home:7799/em')
set_client_property('EMCLI_TRUSTALL','true')
login(username='sysman',password='sysman_pwd')
check_targets('vals.txt')

Have a good day and Happy EM CLI and Jython scripting!

Connection resets when importing from Oracle with Sqoop


I’ve been using Sqoop to load data into HDFS from Oracle. I’m using version 1.4.3 of Sqoop, running on a Linux machine and using the Oracle JDBC driver with JDK 1.6.

I was getting intermittent connection resets when trying to import data. After much troubleshooting, I eventually found the problem to be related to a known issue with the JDBC driver and found a way to work around it, which is described in this post.

The problem

I noticed that when I was importing data at times when the machine I was running the Sqoop client on was mostly idle, everything would run just fine. However, at times when others started to work on the same machine and it became a bit busier, I would start to get the errors below intermittently:

[araujo@client ~]$ sqoop import --connect jdbc:oracle:thin:user/pwd@host/orcl -m 1 --query 'select 1 from dual where $CONDITIONS' --target-dir test
13/07/12 09:35:39 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/12 09:35:39 INFO tool.CodeGenTool: Beginning code generation
13/07/12 09:37:53 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Connection reset
	at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:467)
	at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
        ...
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	... 24 more

After some troubleshooting and googling, I found that the problem seemed to be related to the issue described in the following articles:

http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connection-issue/
https://forums.oracle.com/message/3701989/

Confirming the problem

To ensure the problem was the same as the one described in the articles, and not something else intrinsic to Sqoop, I created a small Java program that simply connected to the database. I was able to reproduce the issue using it:

[araujo@client TestConn]$ time java TestConn
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Connection reset
...
Caused by: java.net.SocketException: Connection reset
...
	... 8 more

real    1m20.481s
user	0m0.491s
sys	0m0.051s

The workaround suggested in the articles also worked:

[araujo@client TestConn]$ time java -Djava.security.egd=file:/dev/../dev/urandom TestConn
Connection successful!

real    0m0.419s
user	0m0.498s
sys	0m0.036s

Applying the fix to Sqoop

It took me a while to figure out how to use the workaround above with Sqoop. Many attempts to specify the parameter on the Sqoop command line, in many different forms, didn’t work as expected.

The articles mention that the java.security.egd parameter can be centrally set in the $JAVA_HOME/jre/lib/security/java.security file. Unfortunately, this didn’t work for me. Using strace, I confirmed that Sqoop was actually reading the java.security file but the setting just didn’t take effect. I couldn’t figure out why not and eventually gave up that alternative.

After a bit more stracing and troubleshooting, though, I finally figured out a way.

Sqoop seems to use the JDBC driver in two different ways:

  • First, it connects to the Oracle database directly. It does that to gather more information about the tables (or query) from where the data is extracted and generate the map reduce job that it will run.
  • Second, the map reduce job generated by Sqoop uses the JDBC driver to connect to the database and perform the actual data import.

I was hitting the problem in the first case above, but I believe in both cases there’s a potential for the problem to occur. So, ideally, we should apply the workaround to both cases.

The Sqoop documentation clearly gives us an option to address the second case: using the following parameter to Sqoop allows us to pass Java command line options to the map reduce job:

sqoop import -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" ...

Even though I couldn’t fully prove the above, since I couldn’t consistently reproduce the problem for the map reduce tasks, I believe (and hope) it should work well.

The Sqoop direct connection to Oracle

The problem with the direct connection from Sqoop to Oracle, though, wasn’t resolved by that option. Trying to pass the “-Djava.security.egd=file:/dev/../dev/urandom” option directly to Sqoop didn’t work either.

After digging up a bit I found that the sqoop command eventually calls ${HADOOP_COMMON_HOME}/bin/hadoop to execute the org.apache.sqoop.Sqoop class. Since the hadoop executable is used, it accepts Java command line options through the HADOOP_OPTS environment variable.

A quick test confirmed that the case was closed:

[araujo@client STimport]$ export HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom
[araujo@client STimport]$ sqoop import -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom" --connect jdbc:oracle:thin:user/pwd@host/orcl -m 1 --query 'select 1 from dual where $CONDITIONS' --target-dir test 
13/07/12 10:08:17 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/12 10:08:17 INFO tool.CodeGenTool: Beginning code generation
13/07/12 10:08:18 INFO manager.OracleManager: Time zone has been set to GMT
13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: select 1 from dual where  (1 = 0) 
13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: select 1 from dual where  (1 = 0) 
13/07/12 10:08:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce
13/07/12 10:08:18 INFO orm.CompilationManager: Found hadoop core jar at: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/12 10:08:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.jar
13/07/12 10:08:19 INFO mapreduce.ImportJobBase: Beginning query import.
13/07/12 10:08:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 10:08:21 INFO mapred.JobClient: Running job: job_201306141710_0075
13/07/12 10:08:22 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 10:08:31 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 10:08:33 INFO mapred.JobClient: Job complete: job_201306141710_0075
13/07/12 10:08:33 INFO mapred.JobClient: Counters: 23
13/07/12 10:08:33 INFO mapred.JobClient:   File System Counters
13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of bytes written=179438
13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of bytes read=87
13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of bytes written=2
13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of read operations=1
13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 10:08:33 INFO mapred.JobClient:   Job Counters 
13/07/12 10:08:33 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=7182
13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 10:08:33 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 10:08:33 INFO mapred.JobClient:     Map input records=1
13/07/12 10:08:33 INFO mapred.JobClient:     Map output records=1
13/07/12 10:08:33 INFO mapred.JobClient:     Input split bytes=87
13/07/12 10:08:33 INFO mapred.JobClient:     Spilled Records=0
13/07/12 10:08:33 INFO mapred.JobClient:     CPU time spent (ms)=940
13/07/12 10:08:33 INFO mapred.JobClient:     Physical memory (bytes) snapshot=236580864
13/07/12 10:08:33 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=34998603776
13/07/12 10:08:33 INFO mapred.JobClient:     Total committed heap usage (bytes)=1013252096
13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Transferred 2 bytes in 14.4198 seconds (0.1387 bytes/sec)
13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Retrieved 1 records.

OEM notification on generated trace files


There is a predefined metric, “Dump Area Used (%)”, that can monitor space consumption in an Oracle database's dump destination. However, it triggers only on the percentage occupied, and if several databases on the host use the same file system as the trace file destination, the metric notifies the DBA for all databases at once.
To get more clarity from the notification, I decided to create a user-defined metric to gather information about generated trace files and store it in the OEM repository database for further reporting on the accumulated data. I only took 11g databases, since ADR information can be read through internal views in the database; for earlier versions, additional procedures would be required to access the trace files.
I checked the Oracle 11g documentation, but the v$diag… views were not even described there; however, a search in an 11.2.0.3 database showed 88 of them. Going through them, I found what I was looking for – V$DIAG_DIR_EXT (I already knew about V$DIAG_INFO). Finally, I built a SQL statement to calculate the number of trace files created in the last hour and added it as a user-defined metric in OEM:

select key_value, value from (
with sql_trc as (
/*+ all trace files for last 2 hours*/
select /*+ MATERIALIZE */PHYSICAL_PATH, CREATION_TIMESTAMP, PHYSICAL_FILENAME FROM V$DIAG_DIR_EXT
where PHYSICAL_FILENAME like '%trc'
and CREATION_TIMESTAMP > trunc(sysdate, 'HH') - interval '2' hour
),
/*+ only trace files related to rdbms */
sql_all as (
select distinct CREATION_TIMESTAMP, PHYSICAL_FILENAME from sql_trc
where PHYSICAL_PATH = (
select value from v$diag_info where name = 'Diag Trace')
),
/*+ trace files created within 1 hour ago */
sql_new1 as (
select count(*) cnt1 from sql_all
where CREATION_TIMESTAMP between trunc(sysdate, 'HH') - 1/24 and trunc(sysdate, 'HH') - 1/86400
),
/*+ trace files created within 2 hours ago */
sql_new2 as (
select count(*) cnt2 from sql_all
where CREATION_TIMESTAMP between trunc(sysdate, 'HH') - 2/24 and trunc(sysdate, 'HH') - 1/24 - 1/86400
)
/*+ count of files */
select 'CNT' key_value, cnt1 value from sql_new1
union all
/*+ percentage of growth */
select 'PCT', round(100*(cnt1/decode(cnt2, 0, 1, cnt2)), 2) from sql_new1, sql_new2
)

While I was creating the SQL, I got stuck on one problem – if the value for the physical path was taken from v$diag_info in a subquery, it returned no rows:

SQL> select distinct  PHYSICAL_PATH  from V$DIAG_DIR_EXT
where PHYSICAL_FILENAME like 'TEST_ora_25089.trc'
and PHYSICAL_PATH = '/u01/oracle/diag/rdbms/test/TEST/trace';

PHYSICAL_PATH
-----------------------------------------------------------
/u01/oracle/diag/rdbms/test/TEST/trace

SQL> select value from v$diag_info where name = 'Diag Trace';

VALUE
-----------------------------------------------------------
/u01/oracle/diag/rdbms/test/TEST/trace

SQL> select distinct PHYSICAL_PATH from V$DIAG_DIR_EXT
where PHYSICAL_FILENAME like 'TEST_ora_25089.trc' and PHYSICAL_PATH =
(select value from v$diag_info where name = 'Diag Trace')
/

no rows selected

To avoid the subquery, I used a WITH clause with MATERIALIZE hints to fill internal temporary tables for the final output.

When the query to get the number of generated trace files was ready, I had to apply a template only to 11g databases, so I built a query to generate emcli verbs for the specific databases. The template contained the metric, which was taken from one of the databases:

select './emcli apply_template -name="UDMs" -targets="'||target_name||':oracle_database" -input_file="FILE1:/home/oracle/udms_creds.txt"',
target_name, db_ver, host_name, dg_stat from (
select target_name, max(db_ver) db_ver, max(host_name) host_name, max(dg_stat) dg_stat from (
select target_name,
(case when property_name = 'DBVersion' then property_value end) db_ver,
(case when property_name = 'MachineName' then property_value end) host_name,
(case when property_name = 'DataGuardStatus' then decode(property_value, ' ', 'Primary', property_value) end) dg_stat
from MGMT$TARGET_PROPERTIES
where target_type = 'oracle_database'
and property_name in ('DBVersion', 'MachineName', 'DataGuardStatus')
)
group by target_name)
where db_ver like '%11%' and dg_stat = 'Primary'
order by target_name, host_name

The file udms_creds.txt contained the DBSNMP credentials for the metric. After the metrics had been running for some time, I could get information about trace file growth from the OEM repository:

select target_name, tm, cnt, pct from (
select target_name, tm, max(cnt) cnt, max(pct) pct from (
select target_name, to_char(rollup_timestamp, 'DD-MON-YY HH24:MI') tm,
decode(key_value2, 'CNT', to_number(average)) cnt,
decode(key_value2, 'PCT', to_number(average)) pct
from mgmt$metric_hourly
where target_type = 'oracle_database'
and metric_name = 'SQLUDMNUM'
and column_label = 'UDM_trc_files'
and key_value = 'UDM_trc_files'
and rollup_timestamp > sysdate - 1
) group by target_name, tm
) order by target_name, tm

Have a good day and enjoy adding new metrics to OEM repository!


SLOB2 Kick start


A few of my friends asked me to assist them with getting up to speed with SLOB2 testing. I have decided to publish this quick blog post and give people a few hints.

Additional Scripts

Here are some of the scripts I have used to simplify SLOB test management in my testing efforts:

SLOB2 add-ons from Yury - the SLOB2.zip file provides a set of additional or modified scripts that I used in some of my SLOB tests.
For all the tests I talk about in this blog post, I kept slob.conf => UPDATE_PCT=00 to make them read-only (less complexity; there was enough challenge in interpreting the results without writes). Please note that those scripts are just a small portion of the tests I have been executing. They don’t represent all the sets of scripts/tests I would recommend running (that depends on your setup and testing goals). In fact, as the father of the SLOB utility, Kevin Closson, declares, SLOB is just a framework for your testing. It doesn’t provide a set of tests that you run to measure your system’s performance. Test configuration is your responsibility, and you are the only person who knows most of the details of your setup (hopefully).

Physical IO testing

– “slob_setup_for_PIO_test_01.txt” modifications of the spfile parameters for PIO testing
– “snap_data.sh” script to collect all the results right after a test run. This script allows us to run set of scripts in a batch mode without stopping for collecting results (see the next script)
– “run_t22.sh” example of a test run script. It runs the tests, gradually increasing the number of readers. The idea here is to run the script and collect all the results later on, after the set of tests is completed
– “slob.conf” WARNING! just an example; you need to adjust it for your testing needs (as with any of the scripts I have provided)
– “run_the_lio_test_01.txt” a command to start the test

Logical IO testing

note: the set of scripts below has been adjusted for single/shared-table read-only LIO testing. The size of the table (slob.conf => SCALE= XXXX ) should be close to the maximum buffer cache size you can afford on your system.
– “run_t23.sh” kick off script with instance restart and cache warm-up code
– “runit.1.sh” – modified runit.sh script to divert all readers to 1 schema/table only
– “slob_1.sql” – modified slob.sql script for 1 schema/table only

Quick tests verification

I would strongly suggest verifying the AWR reports generated for your tests to make sure that they represent the tests you expected to run. Just to give you a couple of quick, simple examples:
  • For LIO testing: the starting point could be LIOPS and CPU time in the Top 5 Events section. LIOPS should be ~700-800k per CPU/process, and CPU should be 99% of the top events. If one of the parameters doesn’t reflect the expected values, then something may be wrong.
  • For PIO testing: “db file sequential read” must be in the range ~98-99.999%. If not, you should fix the possible issues and rerun the tests.

Check cr_tab_and_load.out

One last warning. For some of you it may be obvious, but I missed it during the first few runs and lost a bit of testing time because of the issue. Check the cr_tab_and_load.out file each time you run ./setup.sh to create new test data. Kevin put a warning in setup.sh. Do not ignore it!

The ./setup.sh script has the following message:

NOTIFY: ./setup.sh: Loading procedure complete (130 seconds). 

Please check ./cr_tab_and_load.out for any errors

You may not be testing what you think you are.

Yury

Do You Know If Your Database Is Slow?


The time to respond

There was a question at Pythian a while ago on how to monitor Oracle database instance performance and alert if there is significant degradation. That got me thinking: while there are different approaches that different DBAs would take to interactively measure current instance performance, here we would need something simple. It would need to give a decisive answer and be able to say either that “current performance is not acceptable” or that “current performance is within normal (expected) limits”.

Going back to the basics of how database performance can be described, we can simply say that database performance is either the response time of the operations the end-users perform and/or the amount of work the database instance does in a certain time period – the throughput.

We can easily find these metrics in the v$sysmetric dynamic view:

SQL> select to_char(begin_time,'hh24:mi') time, round( value * 10, 2) "Response Time (ms)"
     from v$sysmetric
     where metric_name='SQL Service Response Time'

TIME              Response Time (ms)
---------------   ------------------
07:20             .32

So this is the last-minute response time for user calls (here in ms). We can check the throughput by checking the amount of logical blocks being read (which includes the physical blocks), plus we can add direct reads (the last minute and the last several seconds are shown here, for a database with an 8 KB block size):

SQL> select a.begin_time, a.end_time, round(((a.value + b.value)/131072),2) "GB per sec"
     from v$sysmetric a, v$sysmetric b
     where a.metric_name = 'Logical Reads Per Sec'
     and b.metric_name = 'Physical Reads Direct Per Sec'
     and a.begin_time = b.begin_time
/

BEGIN_TIME           END_TIME             GB per sec
-------------------- -------------------- ----------
16-jun-2013 08:51:36 16-jun-2013 08:52:37 .01
16-jun-2013 08:52:22 16-jun-2013 08:52:37 .01

We can check more historical values through v$sysmetric_summary, v$sysmetric_history and dba_hist_sysmetric_summary.
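For example, here is a minimal sketch that pulls the last hour of one-minute response-time samples from v$sysmetric_history (I'm assuming group_id = 2 is the 60-second metric group, which is worth double-checking on your version):

-- Sketch: last hour of 'SQL Service Response Time' samples, converted to ms
select to_char(begin_time,'hh24:mi') time,
       round(value * 10, 2) "Response Time (ms)"
from   v$sysmetric_history
where  metric_name = 'SQL Service Response Time'
and    group_id = 2                 -- assumption: the 60-second metric group
order  by begin_time;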

So did these queries answer the basic question “Do we have bad performance?”? 100 MB/sec throughput and 0.32 ms for a user call? We have seen better performance, but is it bad enough that we should alert the on-call DBA to investigate in more detail and look for the reason why we are seeing this kind of values? We cannot say. We need something to compare these values to so that we can determine if they are too low or too high. It is somewhat like being in a train that passes next to another moving train, going in same direction but at a different speed. We don’t know the speed of our train, and we don’t know the speed of the other train, so we cannot answer the question “Are we going very fast?”. If we turn to the other side and see a tree passing on the other side of the train, we will be able to estimate the speed of the train (also taking into account our experience of what is very fast for a train…). So we need something that has an absolute value. In the case of the tree, we know that the tree has speed of 0 (Ok, it is not completely absolute, but we had to simplify now :) ).

So we understand that we need an absolute value or base-line, which we know represents having  “bad”, “normal”, or “good” performance. How do we find these values?

Bad, normal and good

One way to establish these absolutes is to just experiment: establish when the database instance provides acceptable performance by going to the applications that use the database and checking their response time, or by running the queries that the application runs directly and determining whether they complete in an acceptable time (defined by the business requirements). When you reach these results, check the database instance response time and current throughput, and carve them in stone as absolutes against which future measurements can be compared.

The approach above may sometimes work, but when you start measuring response time, you will notice that it might go up and down wildly. You will need to define some bounds around the value you think is a “normal”  response time. So a response time above this bound can be called “bad”, and we can alert that we have performance degradation.

To define this point more accurately, I would suggest using another strategy. We can make an “educated” guess on these values by analyzing them historically from the DBA_HIST_SYSMETRIC_SUMMARY view. We just need to have enough history in there.

We can find the average response time and more importantly the standard deviation of the values – this would tell us what is a “normal” response time and everything above that, a “bad” one:

The graph represents an example of a response time value distribution, where the points A and B represent the standard deviation bounds – bounds within which we can say the response time is normal. Here is an example of how we can determine the A and B points, i.e. the “normal” boundaries:

SQL> with epsilon
as
(select avg(average - STANDARD_DEVIATION ) m1,
        avg(average +  STANDARD_DEVIATION ) m2
from dba_hist_sysmetric_summary
where metric_name='User Calls Per Sec')
select avg(a.average -  a.STANDARD_DEVIATION) "A - Good",
       avg(a.average) "Average",
       avg(a.average + a.STANDARD_DEVIATION)  "B - Bad"
from dba_hist_sysmetric_summary a,
dba_hist_sysmetric_summary b,
epsilon e
where a.metric_name='SQL Service Response Time'
and b.metric_name='User Calls Per Sec'
and a.snap_id = b.snap_id
and b.average between e.m1 and e.m2
/

A - Good    Average    B - Bad
----------  ---------- ----------
.026797584  .04644541  .066093237

Please note the subquery called epsilon. I have used it here to limit the history from which we are learning to a subset of AWR snapshots where more meaningful work was done on the database. It does not take into account times of very low activity or times of abnormally high activity, which don't necessarily show a representative load from which we can extract our “normal” response time behavior.

So now when we check the current response time:


SQL> select to_char(begin_time,'hh24:mi') time,  value "Response Time"
from v$sysmetric
where metric_name='SQL Service Response Time'
/

TIME       Response Time
---------- -------------
02:23      .036560192

If it goes above point B (over .066093237), we might have a reason for concern.
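To turn this into something an alerting job can run, the baseline and the current value can be combined in one query. The following is just a sketch built from the same pieces shown above (the group_id = 2 filter for the 60-second metric group is an assumption):

with epsilon as (
  select avg(average - standard_deviation) m1,
         avg(average + standard_deviation) m2
  from   dba_hist_sysmetric_summary
  where  metric_name = 'User Calls Per Sec'),
baseline as (
  select avg(a.average + a.standard_deviation) b_bad      -- the "B - Bad" bound
  from   dba_hist_sysmetric_summary a,
         dba_hist_sysmetric_summary b,
         epsilon e
  where  a.metric_name = 'SQL Service Response Time'
  and    b.metric_name = 'User Calls Per Sec'
  and    a.snap_id = b.snap_id
  and    b.average between e.m1 and e.m2)
select c.value current_response_time,
       bl.b_bad,
       case when c.value > bl.b_bad then 'ALERT' else 'OK' end status
from   v$sysmetric c, baseline bl
where  c.metric_name = 'SQL Service Response Time'
and    c.group_id = 2;                                    -- assumption: 60-second group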

Throughput

But what about determining if we have normal or bad throughput? For some applications this might be a more useful metric for determining current performance. So we can use the same method as above, but just change the metric we are monitoring to Physical Reads Direct Per Sec and Logical Reads Per Sec, as sketched below.
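For example, a sketch of the throughput baseline for an 8 KB block database, built exactly like the response-time baseline above (131072 blocks of 8 KB = 1 GB), could look like this:

-- Sketch: "normal" throughput bounds in GB/sec
with epsilon as (
  select avg(average - standard_deviation) m1,
         avg(average + standard_deviation) m2
  from   dba_hist_sysmetric_summary
  where  metric_name = 'User Calls Per Sec')
select avg((c.average - c.standard_deviation) + (d.average - d.standard_deviation))/131072 "A - Low",
       avg(c.average + d.average)/131072                                                   "Average",
       avg((c.average + c.standard_deviation) + (d.average + d.standard_deviation))/131072 "B - High"
from   dba_hist_sysmetric_summary c,
       dba_hist_sysmetric_summary d,
       dba_hist_sysmetric_summary b,
       epsilon e
where  c.metric_name = 'Logical Reads Per Sec'
and    d.metric_name = 'Physical Reads Direct Per Sec'
and    b.metric_name = 'User Calls Per Sec'
and    c.snap_id = b.snap_id
and    d.snap_id = b.snap_id
and    b.average between e.m1 and e.m2;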

Specific Response Time

When looking into response time and throughput, we see that they are actually dependent on each other. Increased response time will lead to decreased throughput and increased throughput might eventually lead to increased response time due to the system resources (CPUs, I/O subsystems…) becoming saturated and ultimately overloaded.

So I was thinking that we cannot just compare the response time at one point to another without taking both of these metrics into account at the same time. We could use a new, so-called “specific response time” per 1 GB/sec of throughput. I calculated it like this:

sRT = Response Time (in ms) / Throughput (in GB/sec)

So we can calculate baseline points A and B (for an 8 KB block database):

SQL> with epsilon
as
(select avg(average - STANDARD_DEVIATION ) m1,
        avg(average + STANDARD_DEVIATION ) m2
from dba_hist_sysmetric_summary
where metric_name='User Calls Per Sec')
select avg( ((a.average - a.standard_deviation)*10) /
            (((c.average - c.standard_deviation) + (d.average - d.standard_deviation))/131072)) A,
       avg( (a.average*10) /
            ((c.average + d.average)/131072)) "Average",
       avg( ((a.average + a.standard_deviation)*10) /
            (((c.average + c.standard_deviation) + (d.average + d.standard_deviation))/131072)) B
from dba_hist_sysmetric_summary a,
     dba_hist_sysmetric_summary b,
     dba_hist_sysmetric_summary c,
     dba_hist_sysmetric_summary d,
     epsilon e
where a.metric_name='SQL Service Response Time'
and b.metric_name='User Calls Per Sec'
and c.metric_name='Logical Reads Per Sec'
and d.metric_name='Physical Reads Direct Per Sec'
and a.snap_id = b.snap_id
and a.snap_id = c.snap_id
and a.snap_id = d.snap_id
and b.average between e.m1 and e.m2
order by 1
/

A          Average    B
---------- ---------- ----------
.066348184 .095471353 .116012419
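The current specific response time can then be computed from v$sysmetric and compared against these A and B points. A sketch for an 8 KB block database (again assuming group_id = 2 is the 60-second metric group):

-- Sketch: current specific response time = ms of response time per 1 GB/sec of throughput
select round( (a.value * 10) /
              nullif((c.value + d.value) / 131072, 0), 4) "Current sRT"
from   v$sysmetric a, v$sysmetric c, v$sysmetric d
where  a.metric_name = 'SQL Service Response Time'
and    c.metric_name = 'Logical Reads Per Sec'
and    d.metric_name = 'Physical Reads Direct Per Sec'
and    a.group_id = 2 and c.group_id = 2 and d.group_id = 2;  -- assumption: 60-second group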

Trend

Since these are moving-window baselines (meaning they will change as time goes by), it is a good idea to compare them to each other periodically. This process will show the trend in database usage and performance. As I've said before, to account for a possible increase in the demand put on the database, we can use the specific response time to monitor the trend. From the graph below, we can see the trend line in a specific response time vs. time graph (I used Excel to draw the graph and the trendline):

There is one more thing: Database Efficiency

There is one more thing we need to ask when monitoring performance:

“Can we make the database run faster on the same hardware?” Or it can be translated to: “What percentage of the hardware are we using directly towards executing user calls?” If we say that the database server machine is actually just the CPU(s) and the RAM, and we want to use these components as much as possible towards end-user calls to minimize time spent on disk, network, SSD, and most importantly wasted end-user time (such as sleeping while waiting for a latch, lock, or a mutex to be free), we can translate it once more into DBA language as “percentage of DB time spent on CPU”. DB time, as we know, is the sum of all the time end-user sessions (foreground processes) were in an active state. If a process is “on CPU”, it should mean that it is actively using what I would call the “primary hardware”, being the CPU and RAM. In other words, it is getting the most out of the hardware on which we are running the database instance. A latch spin is certainly a CPU operation, but it is reported as wait time in the database and does not count towards the CPU time metric. So, we can say that if more of the DB time is spent on CPU, the DB instance is more efficient. Of course, we also need to consider the CPU load; if it goes too high, it means that we have reached the hardware limits.

There is a metric for this that we can monitor, but as with the (specific) response time, we need to establish what will be called good, normal, and bad efficiency. Inspired by the color-coded energy efficiency ranking graphic, which we can see on different electric appliances and for cars as well, we can rank database instance efficiency in a similar way:

Again, we can establish some values (for example, from the standard efficiency ranking as shown in the image above, and go by those values). Or, we can create moving baselines, as before, from the history of the particular DB instance's usage by using the following query (though with fewer ranks):

with epsilon
as
(select avg(average - STANDARD_DEVIATION ) m1,
avg(average +  STANDARD_DEVIATION ) m2
from dba_hist_sysmetric_summary
where metric_name='User Calls Per Sec')
select avg(round(a.average + a.STANDARD_DEVIATION)) + stddev(round(a.average + a.STANDARD_DEVIATION)) A,
avg(round(a.average + (a.STANDARD_DEVIATION/2))) + stddev(round(a.average + (a.STANDARD_DEVIATION/2))) B,
avg(round(a.average)) C,
avg(round(a.average - (a.STANDARD_DEVIATION/2))) - stddev(round(a.average - (a.STANDARD_DEVIATION/2))) D,
avg(round(a.average - a.STANDARD_DEVIATION)) - stddev(round(a.average - a.STANDARD_DEVIATION)) E
from dba_hist_sysmetric_summary a,
dba_hist_sysmetric_summary b,
epsilon e
where a.metric_name='Database CPU Time Ratio'
and b.metric_name='User Calls Per Sec'
and a.snap_id = b.snap_id
and b.average between e.m1 and e.m2
/

A          B          C          D        E
---------- ---------- ---------- ---------- ----------
73.2758612  68.301602 55.1180124 40.8703584 36.8510341

So the C value is just the average CPU % in the DB time metric we have managed to have until now. We can consider having a “normal” efficiency if the current value is between the points B and D. Here are these points as represented in a distribution graph:

You may notice that I have increased the size of the region of normal DB efficiency (from B to D) by taking the outer bounds of the subset. That is, I take the average of each AWR snapshot (30 minutes here) and add/subtract its standard deviation, but then I average over all AWR snapshots and add/subtract the standard deviation of that range as well.

avg ( avg_per_AWR_snapshot +/- standard_deviation ) +/- stddev( avg_per_AWR_snapshot +/- standard_deviation )

I am looking to get a bigger range of values in which to put the values that I consider to be OK (normal), so that the alert doesn't fire too often on short, transient efficiency degradations.
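To see where the instance currently sits within these bands, the current value of the same metric can be read from v$sysmetric. A minimal sketch (same group_id = 2 assumption as before):

-- Sketch: current "Database CPU Time Ratio" = % of DB time spent on CPU
select to_char(begin_time,'hh24:mi') time,
       round(value, 2) "DB CPU Time Ratio (%)"
from   v$sysmetric
where  metric_name = 'Database CPU Time Ratio'
and    group_id = 2;                -- assumption: 60-second metric group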

Why You Should Combine Your OEM Repository and SQL Developer


Why do we love data? Because it brings us INFORMATION, a central element in today’s computer-driven world. One of the countless sources of data is an OEM repository. Gathered metrics placed together for monitored targets can become priceless in a DBA’s hands. Moreover, that information can be presented to other technical and business groups for a wide analysis.

Thanks to SQL Developer, converting DATA to INFORMATION is easy nowadays. The tool is updated with new features and enhances existing ones like reporting. The recently issued SQL Developer 4.0 Early Adopter 1 has new types of graphs and an improved reporting drill down feature, not to mention the big list of other features described by Jeff Smith and on the download page.

Even though you can do reporting in OEM itself and develop your own application to view repository data with Oracle APEX, developing reports to present OEM repository data is much easier in SQL Developer. Moreover, these reports can be shared between DBAs and used with different OEM repository databases without any installation in a database. SQL Developer can even export reports to dynamic HTML and PDFs so you can easily send them to others.

Furthermore, the command-line tool, a feature included in the latest version, can automate report execution. DBAs can then have many reports ready for analysis in the morning and know how the databases behaved overnight.

To help manage several hundred databases, I have built many reports using binds and drill down features, as well as the new “Combination” charts. Here is a video in which I “drill” down from groups to particular metrics gathered for targets in the OEM repository, and a zipped XML (oem_analysis) for the reports attached to the blog post.

I have also created a report (load_activity) based on the metric ‘Wait Bottlenecks’ and the cpu_count of Oracle Database, which shows the top loaded database in the last day. Reviewing it periodically can help identify issues and see load patterns for supported databases.

load_pic2

And these are only a few things you can do when combining SQL Developer’s functionality and data from an OEM repository.

You will LOVE it!

Have a good day!

There’s Always Another Bug Hiding Just Around the Corner


We were using a 10.2.0.3 database, and it had been running without any issues for several years. What could possibly go wrong? Anything! Suddenly, we started getting “ORA-07445: exception encountered: core dump [qercoStart()+156] [SIGSEGV] [Address not mapped to object] ” a few times a minute in the alert log. A closer investigation revealed that one of the popular SQLs in the application couldn’t complete anymore. It looked like a bug, since only the SQL was failing.

We found a few references for various releases with the same conditions: ORA-07445 + qercoStart(). This list summarizes the possible causes for the error I found on My Oracle Support:

  • Using ROWNUM < x condition in the where clause
  • Using ROWNUM condition and FULL OUTER joins
  • Using ROWNUM condition with UNION ALL set operation

The strange thing was that this started suddenly; no changes to the code were made. Moreover, the SQL didn’t contain FULL OUTER join operations or UNION ALL  set operations:

SELECT CBMD.CBMD_BASE_MDL_NUMBER,
 MFG_GROUP MFG_ID,
 MFG_NAME,
 CBMD_CATALOG_MODEL_NUMBER,
 CBMD_CATALOG_MODEL_SHORT_DESC,
 CBMD_IMAGE_PATH,
 SUM (CSMD_AVAILABLE_QTY) QTY
FROM CATALOG_BASE_MODEL_DATA CBMD,
 CATALOG_BASE_MODEL_CATEGORY CBMC,
 CATALOG_SUB_MODEL_DATA CSMD
WHERE CBMD.CBMD_BASE_MDL_NUMBER = CSMD.CBMD_BASE_MDL_NUMBER
 AND CBMD.CBMD_BASE_MDL_NUMBER = CBMC.CBMD_BASE_MDL_NUMBER
 AND (
 (CSMD.CSMD_AVAILABLE_FOR_SALE_FLAG = 'N'
 AND CBMC.DC_DIVISION_CODE = 1)
 OR (CBMC.DC_DIVISION_CODE = 2)
 )
 AND CBMD_PUT_ON_WEB_FLAG = 'Y'
 AND CSMD_AVAIL_FOR_WEB_DISP_FLAG = 'Y'
 AND CBMC.DC_DIVISION_CODE = :B1
 AND ROWNUM < 5
GROUP BY CBMD.CBMD_BASE_MDL_NUMBER,
 MFG_GROUP,
 MFG_NAME,
 CBMD_CATALOG_MODEL_NUMBER,
 CBMD_CATALOG_MODEL_SHORT_DESC,
 CBMD_IMAGE_PATH,
 CBMD_IMAGE_NAME
ORDER BY QTY DESC

We also tried all the possible workarounds listed in the bug descriptions, but nothing helped:

  • Flushing the shared pool
  • Setting “_complex_view_merging”=false
  • Bouncing the database

As raising an SR for our 10.2.0.3 DB was unlikely to help, I decided to dig deeper. I knew something had changed, and that change was what triggered the bug. I didn't know where to start, so I decided to look more closely at the bug descriptions in My Oracle Support. All the bugs listed examples of SQL statements containing a “ROWNUM < X” condition. The second similarity was harder to notice. Here are some examples – I've highlighted the interesting lines:

  1. from bug 7704557 on 10.2.0.4
    select jsp1.name name , jsp1.value value
    from "SYSJCS".jcs_scheduler_parameters jsp1
    where jsp1.name in ('database_name', 'global_names', 'scheduler_hostname', 'remote_start_port', 'scheduler_connect_string', 'oracle_sid', 'listener_port', 'remote_http_output','remote_http_port')
    and rownum <= 9
    and scheduler_name = nvl (:scheduler, scheduler_name);
    
  2. from bug 7528596 on 10.2.0.3
    SELECT /*+ FIRST_ROWS(200) */ rv.STATUS_NAME H_STATUS_ID
    ,rv.DESCRIPTION H_DESCRIPTION  ,rv.PRIORITY_MEANING H_PRIORITY_CODE
    ,rv.CREATED_BY_NAME H_CREATED_BY  , rv.CREATED_BY_EMAIL H_CREATED_BY_E ,
    H_CREATED_BY_N  , rv.CREATION_DATE H_CREATION_DATE  ,rv.ASSIGNED_TO_NAME
    H_ASSIGNED_TO_USER_ID  , rv.ASSIGNED_TO_EMAIL H_ASSIGNED_TO_USER_ID_E ,
    rv.ASSIGNED_TO_USERNAME H_ASSIGNED_TO_USER_ID_N  ,rv.REQUEST_ID
    H_REQUEST_ID
    FROM itgadm.kcrt_requests_v rv,
         itgadm.kcrt_req_header_details rh WHERE (1=1
    AND (rv.batch_number =1 OR rv.batch_number is null)
    AND rv.REQUEST_TYPE_ID in (30593)
    AND rv.REQUEST_TYPE_ID in (30593)
    AND ( rv.STATUS_CODE NOT LIKE 'CLOSED%'
    AND rv.STATUS_CODE NOT LIKE 'CANCEL%' )
    AND exists(SELECT /*+ NO_UNNEST */
    pcv.REQUEST_ID FROM itgadm.KCRT_PARTICIPANT_CHECK_V pcv WHERE
    pcv.request_id = rv.request_id and pcv.user_id = 30481)
    AND rh.request_id = rv.request_id )
    AND ( rh.PARAMETER22 = 30481 OR rv.ASSIGNED_TO_USER_ID = 30481 )
    AND ROWNUM <= 200
    ORDER BY rv.REQUEST_ID DESC;
  3. from bug 7416171 on 10.2.0.3
    SELECT COUNT(*) FROM (
      SELECT (
        SELECT DECODE(COUNT(*),0,0,1) isStemData
        FROM ccd2.vw_policy_admins q
        WHERE q.system_id = a.system_id
          AND q.policy_no = a.policy_no
          AND ((q.policy_admin_id in (440502405,440502499)))
          AND rownum < 2)isStemData
      FROM ccd2.vw_ordf_partners_cnt a
      WHERE a.PARTNER_ID = 36977489
      /*AND a.PARTNER_ID2 = a.PARTNER_ID */
      AND a.ORDF_ID = 2
      AND 1=1) b
    WHERE isStemData=1
    AND rownum < 300
  4. from bug 3211315 on 9.2.0.4
    select dummy  from
      (SELECT dummy from dual where rownum < 2)  FULL OUTER JOIN
      (SELECT dummy from dual where rownum < 2)
    using (dummy)

The first 3 SQLs contain “IN” or “OR” operators, and the last one contains a FULL OUTER JOIN set operation that was said to have issues. Knowing a bit of theory helped me identify some similarities:

  • Oracle introduced native FULL OUTER JOIN operation in 10.2.0.5. Before that, it was implemented using the UNION ALL operation. (Cristian Antognini explains it here and gives some examples.)
  • “OR” and “IN” predicates can sometimes be optimized by applying the “OR Expansion” transformation, which acquires the result set of each disjunction condition separately and then combines them using a set operation, i.e. UNION ALL – see the small sketch after this list. (Maria Colgan explains it here better than anyone else could.)
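To make the transformation concrete, here is a simplified, hypothetical illustration (table T and its columns are made up, not taken from the failing query) of what OR expansion effectively does behind the scenes:

-- The original query contains a disjunction:
select * from t where col1 = 1 or col2 = 2;

-- OR expansion rewrites it into two disjunction-free branches combined
-- with UNION ALL; the LNNVL() filter prevents rows from being returned twice.
select * from t where col1 = 1
union all
select * from t where col2 = 2 and lnnvl(col1 = 1);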

At that moment, I started suspecting this could be our case too because the SQL had an “OR” predicate. It was easy to check and confirm by looking at the execution plan. The highlighted line contained the CONCATENATION operation, which is the same as UNION ALL:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
Plan hash value: 132832423

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                         |   434 | 69874 |    61   (4)| 00:00:01 |
|   1 |  SORT ORDER BY                     |                         |   434 | 69874 |    61   (4)| 00:00:01 |
|   2 |   HASH GROUP BY                    |                         |   434 | 69874 |            |          |
|*  3 |    COUNT STOPKEY                   |                         |       |       |            |          |
|   4 |     CONCATENATION                  |                         |       |       |            |          |
|*  5 |      FILTER                        |                         |       |       |            |          |
|   6 |       NESTED LOOPS                 |                         |     1 |   161 |    30   (0)| 00:00:01 |
|   7 |        NESTED LOOPS                |                         |     7 |  1008 |    23   (0)| 00:00:01 |
|*  8 |         TABLE ACCESS FULL          | CATALOG_SUB_MODEL_DATA  |    21 |   420 |     2   (0)| 00:00:01 |
|*  9 |         TABLE ACCESS BY INDEX ROWID| CATALOG_BASE_MODEL_DATA |     1 |   124 |     1   (0)| 00:00:01 |
|* 10 |          INDEX UNIQUE SCAN         | CBMD_C1_1_PK            |     1 |       |     0   (0)| 00:00:01 |
|* 11 |        INDEX RANGE SCAN            | CBMC_C1_1_PK            |     1 |    17 |     1   (0)| 00:00:01 |
|* 12 |      FILTER                        |                         |       |       |            |          |
|  13 |       NESTED LOOPS                 |                         |     2 |   322 |    29   (0)| 00:00:01 |
|  14 |        NESTED LOOPS                |                         |     7 |  1008 |    22   (0)| 00:00:01 |
|* 15 |         TABLE ACCESS FULL          | CATALOG_SUB_MODEL_DATA  |    20 |   400 |     2   (0)| 00:00:01 |
|* 16 |         TABLE ACCESS BY INDEX ROWID| CATALOG_BASE_MODEL_DATA |     1 |   124 |     1   (0)| 00:00:01 |
|* 17 |          INDEX UNIQUE SCAN         | CBMD_C1_1_PK            |     1 |       |     0   (0)| 00:00:01 |
|* 18 |        INDEX RANGE SCAN            | CBMC_C1_1_PK            |     1 |    17 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

A quick Google search pointed me to the NO_EXPAND hint, which disables OR expansion. However, I couldn't use it, since it would require a code change. I knew that the behavior of the optimizer is controlled by a large number of hidden parameters, which are also listed in a 10053 trace:

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever,level 1';

Session altered.

SQL> alter session set tracefile_identifier=CR758708_2;

Session altered.

SQL> alter session set max_dump_file_size=unlimited;

Session altered.

SQL> explain plan for
 SELECT CBMD.CBMD_BASE_MDL_NUMBER,
 /*removed some lines for readability*/
Explained.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

$ more test_ora_17805_CR758708_2.trc
/*removed some lines for readability*/
...
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
...
 *************************************
 PARAMETERS WITH DEFAULT VALUES
 ******************************
...
 _fast_full_scan_enabled = true
 _optim_enhance_nnull_detection = true
 _parallel_broadcast_enabled = true
 _px_broadcast_fudge_factor = 100
 _ordered_nested_loop = true
 _no_or_expansion = false
 optimizer_index_cost_adj = 100
 optimizer_index_caching = 0
 _system_index_caching = 0
 _disable_datalayer_sampling = false
...

I disabled the OR expansion by setting the parameter _no_or_expansion = true, checked the execution plan, and confirmed that the query transformation didn’t happen:

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
Plan hash value: 1045847658

-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                         |     4 |   644 |   324   (3)| 00:00:04 |
|   1 |  SORT ORDER BY            |                         |     4 |   644 |   324   (3)| 00:00:04 |
|   2 |   HASH GROUP BY           |                         |     4 |   644 |   324   (3)| 00:00:04 |
|*  3 |    COUNT STOPKEY          |                         |       |       |            |          |
|*  4 |     HASH JOIN             |                         |   434 | 69874 |   322   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL    | CATALOG_SUB_MODEL_DATA  |  3777 | 75540 |    62   (4)| 00:00:01 |
|*  6 |      HASH JOIN            |                         |  3087 |   425K|   260   (2)| 00:00:04 |
|*  7 |       INDEX FAST FULL SCAN| CBMC_C1_1_PK            |  1759 | 29903 |    13   (8)| 00:00:01 |
|*  8 |       TABLE ACCESS FULL   | CATALOG_BASE_MODEL_DATA |  2685 |   325K|   247   (2)| 00:00:03 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(ROWNUM<5)
   4 - access("CBMD"."CBMD_BASE_MDL_NUMBER"="CSMD"."CBMD_BASE_MDL_NUMBER")
       filter("CSMD"."CSMD_AVAILABLE_FOR_SALE_FLAG"='N' AND "CBMC"."DC_DIVISION_CODE"=1 OR
              "CBMC"."DC_DIVISION_CODE"=2)
   5 - filter("CSMD_AVAIL_FOR_WEB_DISP_FLAG"='Y')
   6 - access("CBMD"."CBMD_BASE_MDL_NUMBER"="CBMC"."CBMD_BASE_MDL_NUMBER")
   7 - filter(("CBMC"."DC_DIVISION_CODE"=1 OR "CBMC"."DC_DIVISION_CODE"=2) AND
              "CBMC"."DC_DIVISION_CODE"=TO_NUMBER(:B1))
   8 - filter("CBMD_PUT_ON_WEB_FLAG"='Y')

28 rows selected.

In our case, the optimizer had changed the execution plan after fresh statistics were collected – this was the change that triggered the bug. We set the parameter to disable the OR expansion until we upgrade to 11.2.
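For reference, setting a hidden parameter like this typically looks as follows. This is only a sketch – underscore parameters should be changed under Oracle Support's guidance, and the change should be removed once the underlying bug is patched:

-- Sketch: test the effect at the session level first
alter session set "_no_or_expansion" = true;

-- ... then persist it instance-wide in the spfile until the upgrade
-- (the double quotes are required for underscore parameters)
alter system set "_no_or_expansion" = true scope=spfile;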

I wanted to share this story with you because it’s interesting how different things (IN and OR predicates, UNION ALL and FULL OUTER JOIN set operations, etc.) transform behind the scenes into the same conditions to trigger the same bug. I think this incident has also changed the way I’ll read bug descriptions on My Oracle Support in the future – there is information hidden between the lines.

Oracle Database 12c: Network Recovery in RMAN


Over the past 8 months, I have had the pleasure of working on a soon-to-be-released update for our popular Beginner’s Guide for Oracle Database 12c. The publisher was looking for a “new brand” for this book, which has been published for Oracle 7, Oracle 8, Oracle 8i, Oracle 9, Database 10g, and Database 11g. The works have been translated into 15 languages; I have always wanted to get a translation done back into English and see what it says :).

The new work is entitled Oracle Database 12c: Install, Configure, and Maintain Like a Professional. I have had the pleasure of sharing the author responsibilities with colleagues in and outside of Pythian. Ian Abramson, Michelle Malcher, and Mike Corey are the primary authors alongside yours truly. Michael McKee, Fahd Mirza, and Marc Fielding of Pythian are contributing authors to this latest work.


Many of the new features have fascinated me and my peers; rman has been a friend and good companion since a business case in 2000 provided the opportunity to get “dirty” with this product. (Thanks Steve Jacobs, wherever you are.) I have thirsted in particular for the following handful of rman enhancements that are bundled with Database 12c:

  • Multisection incremental backups where the data files can be broken up into smaller chunks that can be backed up in parallel across multiple channels.
  • Network-enabled restore – copying of one or more database files from a primary to a physical standby or vice-versa over the network. The work can be done using compression and the new multi-section feature.

Not long before intensive work began on the new book, Pythian invested a huge amount of money and time in setting up a Private Cloud environment called Delphic Lab. With its emergence under the auspices of the Office of the CTO and dedicated volunteers, it was bye-bye to local Linux VMs on my two MacBooks for me. I do not miss them in the least.

Prep of VM

The setup to house the databases used to thrash the new features covered in this post is as follows:

O/S

Oracle Linux Server release 6.4 Linux dlabvm46.dlab.pythian.com

Disk space 

Filesystem   1K-blocks    Used Available Use% Mounted on 
/dev/xvda2     9823392 1206832   8118056  13% / 
tmpfs          1005964       0   1005964   0% /dev/shm 
/dev/xvda1      497829   63408    408719  14% /boot 
/dev/xvdb      1113160 7553396  40963364  `6% /u01 

Database creation (dbca.cmd)

/u01/app/oracle/product/12.1.0/db_1/bin/dbca -silent \
-createDatabase \
-templateName General_Purpose.dbc \
-gdbName pythian \
-sid pythian \
-createAsContainerDatabase false \
-SysPassword manager \
-SystemPassword manager \
-emConfiguration NONE \
-datafileDestination /u01/oradata \
-storageType FS \
-characterSet AL32UTF8 \
-memoryPercentage 40 \

When dbca.cmd was run, the output displayed was:

oracle@dlabvm46.dlab.pythian.com --> (pythian)
/home/oracle> ./pythian.cmd
Copying database files
1% complete
2% complete
8% complete
13% complete
19% complete
24% complete
27% complete
Creating and starting Oracle instance
29% complete
32% complete
33% complete
34% complete
38% complete
42% complete
43% complete
45% complete
Completing Database Creation
48% complete
51% complete
53% complete
62% complete
70% complete
72% complete
78% complete
83% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/pythian/pythian.log" for further details.

To perform the testing, I put the database in archivelog mode as follows:

oracle@dlabvm46.dlab.pythian.com --> (pythian)
/home/oracle> sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sat Aug 3 11:44:08 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
 Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
 With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     13
Current log sequence           15
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area  822579200 bytes
Fixed Size                  2293736 bytes
Variable Size             595591192 bytes
Database Buffers          218103808 bytes
Redo Buffers                6590464 bytes
Database mounted.
SQL> alter database archivelog;
Database altered.
SQL> alter database open;
Database altered
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     13
Next log sequence to archive   15
Current log sequence           15

SQL>

The following Oracle*Net files were created:

********************************
* Oracle*Net configuration files
********************************

** on the primary

# listener.ora

LISTENER=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=dlabvm46)(PORT=1521))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))

SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=pythian)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0/db_1)
      (SID_NAME=pythian)
    )
  )

# tnsnames.ora

PYTHIAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =  
        (PROTOCOL = TCP)
        (HOST = dlabvm46.dlab.pythian.com)    
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pythian) 
    )
  )

PYTHIANSB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =  
        (PROTOCOL = TCP)
        (HOST = dlabvm48.dlab.pythian.com)    
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pythian) 
    )
  )

** on the standby

# listener.ora

LISTENER=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=dlabvm48)(PORT=1521))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))

SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=pythian)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0/db_1)
      (SID_NAME=pythian)
    )
  )

# tnsnames.ora

PYTHIAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =  
        (PROTOCOL = TCP)
        (HOST = dlabvm48.dlab.pythian.com)    
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pythian) 
      (UR = A)
    )
  )

PYTHIANPR =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =  
        (PROTOCOL = TCP)
        (HOST = dlabvm46.dlab.pythian.com)    
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pythian) 
    )
  )
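
It is also worth a quick connectivity sanity check before building the standby. The static SID_LIST entry in the standby listener and the (UR = A) clause in the standby's local PYTHIAN alias are there so that connections succeed even while the standby instance is only started NOMOUNT and its service is blocked. A minimal check from the primary host might look like this (commands only; no output reproduced, and the aliases are the ones shown above):

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> tnsping pythiansb
/home/oracle> sqlplus sys/******@pythiansb as sysdba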

Build the physical standby

Next, the standby was prepared to house the duplicated primary.

** Prepare standby for duplication
oracle@dlabvm48.dlab.pythian.com--> (pythian) ** Standby **
/u01/app/oracle/product/12.1.0/db_1/network/admin> sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Wed Aug 7 18:42:31 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> startup nomount
ORACLE instance started.

Total System Global Area  409194496 bytes
Fixed Size                  2288968 bytes
Variable Size             331350712 bytes
Database Buffers           71303168 bytes
Redo Buffers                4251648 bytes
SQL>
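
One detail the transcript does not show: the STARTUP NOMOUNT above can only succeed if a parameter file already exists on the standby host (the password file, by contrast, is copied by the duplicate itself, as the memory script further down shows). A hypothetical sketch of a minimal pfile for this purpose, not taken from the original environment:

# /u01/app/oracle/product/12.1.0/db_1/dbs/initpythian.ora (hypothetical minimal pfile)
db_name=pythian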

Time to create the standby: first, a status reality check.

Status of primary database: OPEN
Status of standby database: NOMOUNT

The duplicate performed from the primary site:

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> rman

Recovery Manager: Release 12.1.0.1.0 - Production on Wed Aug 7 18:47:38 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target sys/******

connected to target database: PYTHIAN (DBID=2086712234)

RMAN> connect auxiliary sys/******@pythiansb

connected to auxiliary database: PYTHIAN (not mounted)

RMAN> duplicate target database for standby from active database nofilenamecheck;

Starting Duplicate Db at 07-AUG-13
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=20 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   targetfile  '/u01/app/oracle/product/12.1.0/db_1/dbs/orapwpythian' auxiliary format 
 '/u01/app/oracle/product/12.1.0/db_1/dbs/orapwpythian'   ;
}
executing Memory Script

Starting backup at 07-AUG-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=51 device type=DISK
Finished backup at 07-AUG-13

contents of Memory Script:
{
   backup as copy current controlfile for standby auxiliary format  '/u01/oradata/pythian/control01.ctl';
   restore clone controlfile to  '/u01/app/oracle/fast_recovery_area/pythian/control02.ctl' from 
 '/u01/oradata/pythian/control01.ctl';
}
executing Memory Script

Starting backup at 07-AUG-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
copying standby control file
output file name=/u01/app/oracle/product/12.1.0/db_1/dbs/snapcf_pythian.f tag=TAG20130807T184803
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 07-AUG-13

Starting restore at 07-AUG-13
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: copied control file copy
Finished restore at 07-AUG-13

contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database

contents of Memory Script:
{
   set newname for tempfile  1 to 
 "/u01/oradata/pythian/temp01.dbf";
   switch clone tempfile all;
   set newname for datafile  1 to 
 "/u01/oradata/pythian/system01.dbf";
   set newname for datafile  3 to 
 "/u01/oradata/pythian/sysaux01.dbf";
   set newname for datafile  4 to 
 "/u01/oradata/pythian/undotbs01.dbf";
   set newname for datafile  6 to 
 "/u01/oradata/pythian/users01.dbf";
   backup as copy reuse
   datafile  1 auxiliary format 
 "/u01/oradata/pythian/system01.dbf"   datafile
 3 auxiliary format 
 "/u01/oradata/pythian/sysaux01.dbf"   datafile 
 4 auxiliary format 
 "/u01/oradata/pythian/undotbs01.dbf"   datafile 
 6 auxiliary format 
 "/u01/oradata/pythian/users01.dbf"   ;
   sql 'alter system archive log current';
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to /u01/oradata/pythian/temp01.dbf in control file

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting backup at 07-AUG-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=/u01/oradata/pythian/system01.dbf
output file name=/u01/oradata/pythian/system01.dbf tag=TAG20130807T184815
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:01:05
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=/u01/oradata/pythian/sysaux01.dbf
output file name=/u01/oradata/pythian/sysaux01.dbf tag=TAG20130807T184815
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:01:05
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=/u01/oradata/pythian/undotbs01.dbf
output file name=/u01/oradata/pythian/undotbs01.dbf tag=TAG20130807T184815
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00006 name=/u01/oradata/pythian/users01.dbf
output file name=/u01/oradata/pythian/users01.dbf tag=TAG20130807T184815
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 07-AUG-13

sql statement: alter system archive log current

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=3 STAMP=822855043 file name=/u01/oradata/pythian/system01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=4 STAMP=822855043 file name=/u01/oradata/pythian/sysaux01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=5 STAMP=822855043 file name=/u01/oradata/pythian/undotbs01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=6 STAMP=822855043 file name=/u01/oradata/pythian/users01.dbf
Finished Duplicate Db at 07-AUG-13

RMAN>
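
One piece of configuration the transcript also omits is redo transport from the primary to the standby; without it, nothing would ever arrive for the tests below. On the primary it would typically look something like the following (a hedged sketch; the attribute values are assumptions, not taken from the original environment):

SQL> alter system set log_archive_dest_2='SERVICE=pythiansb ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)' scope=both;
SQL> alter system set log_archive_dest_state_2=enable scope=both;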

In most cases, the next step would be to start managed recovery on the standby site. However, we will do some manual recovery to ensure log transport services are working as expected. We will test the arrival and manual application of archived redo on the standby site as follows:

  1. Switch logfile a few times on the primary.
  2. Run archive log list on the primary to ascertain:
    • oldest online log sequence
    • next log sequence to archive
    • current log sequence
  3. Recover the standby database to confirm arrival of archived redo (a query-based check is also sketched just after this list).
  4. Allow recovery to abend when it runs out of archived redo.
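
As a complement to step 3, the arrival of redo on the standby can also be confirmed with a simple query against the standby control file. A minimal sketch (note that the APPLIED column is only maintained reliably by managed recovery, not by the manual SQL*Plus recovery used below):

SQL> select sequence#, applied, name from v$archived_log order by sequence#;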

Manual recovery test for proof-of-concept:

** Switch logfiles on primary

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07> !sqlp
sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Wed Aug 7 18:58:16 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> archive log list    
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     19
Next log sequence to archive   21
Current log sequence           22

SQL> alter system switch logfile;

System altered.

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     20
Next log sequence to archive   22
Current log sequence           22

SQL>

If all goes well, we should be able to toddle off to the standby and perform manual recovery expecting successful application of archived log sequences 20 and 21. The envelope, please…

Recovery test on the standby:

** Recover standby manually

oracle@dlabvm48.dlab.pythian.com--> (pythian) ** Standby **
/u01/app/oracle> sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Wed Aug 7 18:58:03 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> recover standby database
ORA-00279: change 1882090 generated at 08/07/2013 18:57:22 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_20_%u_.
arc
ORA-00280: change 1882090 for thread 1 is in sequence #20

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 1882266 generated at 08/07/2013 18:58:54 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_21_%u_.
arc
ORA-00280: change 1882266 for thread 1 is in sequence #21
ORA-00278: log file
'/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_20_905
n6lw5_.arc' no longer needed for this recovery

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 1884132 generated at 08/07/2013 19:09:24 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_22_%u_.
arc
ORA-00280: change 1884132 for thread 1 is in sequence #22
ORA-00278: log file
'/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_21_905
n9gfc_.arc' no longer needed for this recovery

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 1884132 generated at 08/07/2013 19:09:24 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_22_%u_.
arc
ORA-00280: change 1884132 for thread 1 is in sequence #22
ORA-00278: log file
'/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_07/o1_mf_1_21_905
n9gfc_.arc' no longer needed for this recovery

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-16145: archival for thread# 1 sequence# 22 in progress

Network-based recovery

Looks like the stage is set to try recovering the standby from the primary. A few tasks are to be performed beforehand, then away we go. Each task is outlined in the following code:

Task #1: get the service name for the standby database
From standby site

sqlplus / as sysdba
show parameters service

Task #2: verify the open mode and that this is a physical standby
From standby site

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PHYSICAL STANDBY

Task #3: verify managed recovery is not running
From standby site

SQL> alter database recover managed standby database cancel;
alter database recover managed standby database cancel
*
ERROR at line 1:
ORA-16136: Managed Standby Recovery not active
SQL>
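
An alternative, less noisy way to confirm that no managed recovery (MRP) process is running is to query v$managed_standby and check that no MRP row is listed; a sketch:

SQL> select process, status from v$managed_standby;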

Task #4: create a new table on the primary site
From primary site

SQL> create table tester1 as select * from obj_1;

Table created.

Task #5: switch a few logfiles
From primary site

SQL> alter system switch logfile;

System altered.
SQL> alter system switch logfile;

System altered.
SQL>

Task #6: verify TESTER1 table not there
From standby site

SQL> alter database open read only;

Database altered.

SQL> desc pythian.tester1
ERROR:
ORA-04043: object pythian.tester1 does not exist

Task #7: put standby back in MOUNT mode
From standby site

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.

Total System Global Area  409194496 bytes
Fixed Size                  2288968 bytes
Variable Size             331350712 bytes
Database Buffers           71303168 bytes
Redo Buffers                4251648 bytes
SQL> alter database mount standby database;

Database altered.

SQL> select open_mode,database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNT                PHYSICAL STANDBY

Task #8: perform network-based recovery
From primary site

oracle@dlabvm46.dlab.pythian.com--> (pythian) ** Master **
/home/oracle> rman

Recovery Manager: Release 12.1.0.1.0 - Production on Tue Aug 13 16:22:51 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target "sys/******@pythiansb as sysdba"

connected to target database: PYTHIAN (DBID=2086712234, not open)

RMAN> recover database
2> from service pythian
3> section size 120m
4> using compressed backupset;

Starting recover at 13-AUG-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=27 device type=DISK
skipping datafile 1; already restored to SCN 1997353
skipping datafile 3; already restored to SCN 1997353
skipping datafile 4; already restored to SCN 1997353
skipping datafile 6; already restored to SCN 1997353

starting media recovery

archived log for thread 1 with sequence 30 is already on disk as file /u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_30_90o56dp6_.arc
archived log for thread 1 with sequence 31 is already on disk as file /u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_31_90o56dyj_.arc
archived log for thread 1 with sequence 32 is already on disk as file /u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_32_90o56k6p_.arc
archived log file name=/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_30_90o56dp6_.arc thread=1 sequence=30
archived log file name=/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_31_90o56dyj_.arc thread=1 sequence=31
archived log file name=/u01/app/oracle/fast_recovery_area/PYTHIAN/archivelog/2013_08_13/o1_mf_1_32_90o56k6p_.arc thread=1 sequence=32
media recovery complete, elapsed time: 00:00:01
Finished recover at 13-AUG-13

Task #9: open standby read only and verify TESTER1 is there
From standby site

SQL> alter database open read only;

Database altered.

SQL> desc pythian.tester1
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
OBJ#                                      NOT NULL NUMBER
DATAOBJ#                                           NUMBER
OWNER#                                    NOT NULL NUMBER
NAME                                      NOT NULL VARCHAR2(128)
NAMESPACE                                 NOT NULL NUMBER
SUBNAME                                            VARCHAR2(128)
TYPE#                                     NOT NULL NUMBER
CTIME                                     NOT NULL DATE
MTIME                                     NOT NULL DATE
STIME                                     NOT NULL DATE
STATUS                                    NOT NULL NUMBER
REMOTEOWNER                                        VARCHAR2(128)
LINKNAME                                           VARCHAR2(128)
FLAGS                                              NUMBER
OID$                                               RAW(16)
SPARE1                                             NUMBER
SPARE2                                             NUMBER
SPARE3                                             NUMBER
SPARE4                                             VARCHAR2(1000)
SPARE5                                             VARCHAR2(1000)
SPARE6                                             DATE
SIGNATURE                                          RAW(16)
SPARE7                                             NUMBER
SPARE8                                             NUMBER
SPARE9                                             NUMBER

SQL> select count(*) from pythian.tester1;

COUNT(*)
----------
90775

SQL>
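
Not shown in the transcript: once the verification is finished, the standby would typically be bounced back to MOUNT and placed under managed recovery. A hedged sketch, mirroring the mount sequence used earlier (the USING CURRENT LOGFILE clause is omitted because no standby redo logs were created in this build):

SQL> shutdown immediate
SQL> startup nomount
SQL> alter database mount standby database;
SQL> alter database recover managed standby database disconnect from session;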

Closing remarks

Only testing and time will confirm the usability and time savings of this new-fangled feature. This blog post has shown an example of an alternative way to recover a physical standby database. Up until 12c, applying archived redo was the only way to do it short of shutting down the standby and refreshing its database files. Gone are the days of the forever-flashing cursor as a recovery exercise plows through days of archived redo. This example showed how a standby is caught up with its primary. Network-based recovery can also be used to replace missing datafiles, control files, or tablespaces on the primary using the corresponding entity from the physical standby.
