
Artist or Farmer?

This morning I read a Twitter message from Jeff Smith (aka @hillibillitoad):
OH: "Once we get our system running, we don't touch it." Yeah, that generally works pretty good.
I like Jeff's tweets, blogs and comments, as he is a very smart guy and still keeps his mind open to other ideas.
In this particular case I have to disagree, but I was not able to condense my reply into 140 chars.



Most people in the IT business seem to follow the sentence "never touch a running system" like a commandment. For me this often sounds like the "please don't touch" in an art museum.



This brings me to an interesting question: Do these 'most people' see themselves as artists, and their work as art?


Let me give you another picture:


Imagine a farmer who sows his crops in spring, does really everything right, and then sits down and does not touch his running system. You can guess his harvest.



So what's the big difference here?
Artwork is most of the time created for a dedicated purpose. As long as the purpose does not change, it's expected to satisfy this purpose.
The purpose of an artwork changes slowly; I'm quite sure most of the time the change can be measured in decades or centuries.
Plants are also sown for a dedicated purpose. But their purpose is rapid change: to live, to grow, to get harvested after some time. So the farmer has to look after his crops all the time. In the best case, he can even improve and steer the change to his advantage.
For me the big difference between an artist ('don't touch') and a farmer ('care and steer') is the timescale of the changes. If you expect your work to never change (and by one definition of 'life', things which do not change are just dead), then prevent it from any interaction.
I prefer living systems. So I accept the duty of caring for them.


Access to the database

There is an ongoing discussion about who should have access to a database, and under what circumstances.
I have never done either Jeff's or Chet's job, so I cannot write anything reasonable about those. But I am sometimes a kind of DBA. This makes me my best source of knowledge about this job, and about the attitude I developed over the years.

I developed certain expectations about the differences between an operations guy and others who are not responsible for productive systems. I even have my private opinion about sales-related jobs, but that's not my topic today.

From my point of view it's all about control.
For an operationally focused person, my work is about control. If there is uncontrolled behaviour somewhere, it will lead to an incident. And that will cause a call. As I like to sleep at least 8 hours per day, that means one out of 3 calls will disturb this sleep (on average).
Of course every incident potentially costs my company money. Or reputation. This will come down to me again. Writing reports. Doing management presentations. Nonproductive paperwork.
After all this additional work, I will do my best to avoid such situations: analyzing what went out of bounds and led to the uncontrolled behaviour. In the end I even try to change the environment so it will not happen again.
It's all about control. Freak!

So who should get access to my systems? Only those whom I can control. Or at least those I can trust.

It's hard to decide whom to trust. At least to which specific level.
Jeff's driving license seems to be an attempt to formalise this need. But similar to a driving license, it only shows you are allowed to drive. It rarely tells anything about your skills.

Talking about skills brings one more dimension here: people living in an ecosystem where every action is controlled by a Quality Assurance team are used to going to the limits and beyond. That's great; it's what is expected of them. Otherwise they would do their job badly.
It's just not what I want in a productive environment. There, boundaries exist never to be reached.

I take a more pragmatic approach: if someone takes responsibility for the work, it's fine for me. As an example: if the person who added a big bug during a small hotfix at 5pm is called at 2am the next morning to fix this bug, I'm fine.
And just one more dimension: I favour granting access to people who know what they can break with it: just ask me for access to any v$ views in my DB; you will get it, right after you show and explain to me how you could halt the application if you abused it.
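Such a grant itself is trivial; a minimal sketch (the user name is a made-up example). Note that the grant goes against the underlying v_$ view, as v$session is only a public synonym:

GRANT SELECT ON sys.v_$session TO scott;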

Need a short summary?
I like to control access to my database, so I limit it to people whom I trust.
To gain it, show me your responsibility and knowledge.

ORA-02393 on purpose

Today I got an email from a developer about an error on a test database.
(I shortened the conversation, but tried not to change the content):


Dev:
We get an ORA-02393: exceeded call limit on CPU usage almost immediately after executing the following stmt: ...Could you pls adjust the limits?



After some hours of meetings I replied:
Me:

These limits were introduced in test to find exactly those statements which are running too slow for an OLTP application. Can we help you in tuning the statement?

Dev:
I already exchanged details on the issue with [another DBA] and found the cause of the 'longer than usual' execution time (index missing ;-)
Nonetheless 100ms max exec time is a bit too strict for a dev platform imho - but as the ... is going to be replaced soon we won't request any changes to a system/db in the last chapter of its life cycle.


I have not replied yet. I'll just try to explain my world here.

Is a CPU_PER_CALL limit really needed on a test system?


In theory: no.

If the developers implemented perfect code instrumentation, they would know the runtime of every statement. In test, it would be evaluated for proper per-statement performance as well as for overall performance.

But if they did so, I would never have received this email. So the world is not perfect. Neither is the code.

Is 100 ms enough time? It seems really tough!


Yes, it is.

Even on an 8-year-old UltraSPARC III+ you can run a lot of CPU cycles within 100 ms. And only CPU time is counted there; time spent on IO or other WAITs does not add to the limit.

But the biggest argument for this limit: I never got any complaint about it from a good statement. Only from those which needed urgent support anyhow.
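For reference, a minimal sketch of how such a limit can be put in place (profile and user names are assumptions, not my actual setup). CPU_PER_CALL is specified in hundredths of a second, so 10 equals the 100 ms mentioned above:

-- limit each call to 100 ms of CPU
CREATE PROFILE oltp_test_limits LIMIT CPU_PER_CALL 10;
ALTER USER app_user PROFILE oltp_test_limits;
-- kernel resource limits are only enforced while RESOURCE_LIMIT is enabled
ALTER SYSTEM SET resource_limit = TRUE;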

avoid implicit COMMIT at DDL

Sometimes I'd like to keep a transaction open, even when there is a DDL somewhere in between.
Oracle might have its reasons for the implicit COMMIT, but I have mine to avoid it in my transactions.

Here is a small example of how I do it (don't just copy it; it's really minimal, just an example)!

Let's prepare a package and two tables



CREATE OR REPLACE PACKAGE my_ddl_package AS
  FUNCTION do_ddl(p_ddl VARCHAR2) RETURN VARCHAR2;
END my_ddl_package;
/

CREATE OR REPLACE PACKAGE BODY my_ddl_package AS
  FUNCTION do_ddl(p_ddl VARCHAR2) RETURN VARCHAR2
  AS
    -- the autonomous transaction confines the DDL (and its implicit COMMIT)
    -- so the calling transaction stays open
    PRAGMA AUTONOMOUS_TRANSACTION;
  BEGIN
    EXECUTE IMMEDIATE p_ddl;
    RETURN NULL;
  END;
END my_ddl_package;
/

create table t1 as select * from dual;

create table t2 as select * from dual;


And now the testcase


I try to update t1, drop t2 and afterwards roll back the update on t1.
Let's give it a try.


SQL> update t1 set dummy ='U';

1 row updated.

SQL> select my_ddl_package.do_ddl('drop table t2') from dual;

MY_DDL_PACKAGE.DO_DDL('DROPTABLET2')
--------------------------------------------------------------------------------


SQL> select * from t1;

D
-
U

SQL> rollback;

Rollback complete.

SQL> select * from t1;

D
-
X

SQL> select * from t2;
select * from t2
*
ERROR at line 1:
ORA-00942: table or view does not exist


Everything works as expected: update - DDL - rollback

Did I miss something?

If you do not want to read (or think) too much, here is the shortcut:
What saved my day? It's PRAGMA AUTONOMOUS_TRANSACTION!

estimated plan stability

Sometimes I am searching for any method to solve a problem. And after some investigation, mailing lists, and direct contact with much smarter people, I come to the conclusion:
It's just not possible!
(Or at least not within reasonable effort).

One of these problems, or more precisely questions, is:
How likely is the current explain plan for a given SQL statement to change?
I call this

estimated plan stability


Unfortunately there is currently no such feature, but at least I can give some examples of what I would like to have:
  • E-rows vs. A-rows
    If they differ a lot (in any line of the execution plan), it might be a hint that the plan is far from reality, or at risk of changing.
    Of course, for A-rows, gather_plan_statistics or similar is needed (see the sketch after this list).

  • Best so far in 10053 trace

    If you have ever analysed a 10053 trace, you might know the lines starting with Best so far ....
    If the 2nd best is not far from the 1st, I assume small changes in the data might lead to a different execution plan.

  • Binds outside histogram boundaries

    If a bind variable is outside the min/max values of a histogram, the optimiser tries to guess how many rows it will get from this predicate/filter. Of course this can be horribly wrong, and it should also show up in my 1st suggestion.


These are only 3 possibilities. They should show some areas where I'd like Oracle to collect and provide more data than it does at the moment. Probably this would also be valuable for others? Any other suggestions out there?
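A minimal sketch of the E-rows/A-rows comparison from the first bullet (the table and predicate are made up):

SELECT /*+ gather_plan_statistics */ COUNT(*) FROM t WHERE col = 42;

SELECT * FROM table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

The 'ALLSTATS LAST' format prints the E-Rows and A-Rows columns side by side for the last execution.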


incomplete list of SRVCTL commands

As I have to dig into srvctl more than I'd like to, I figured out that the documentation is not complete (at least for my installation of 11.2.0.2):
the documentation for srvctl upgrade claims
The srvctl upgrade database command upgrades the configuration of a database and all of its services to the version of the database home from where this command is run.

But a 2nd option is missing entirely:
Usage: srvctl upgrade model -s <source-version> -d <destination-version> -p {first|last} [-e <name>=<value>[,<name>=<value>, ...]


in more detail:
srvctl upgrade model -h

Upgrade the Oracle Clusterware resource types and resources.

Usage: srvctl upgrade model -s <source-version> -d <destination-version> -p {first|last} [-e <name>=<value>[,<name>=<value>, ...]
-s <source-version> Version from which to upgrade
-d <destination-version> Version to which to upgrade
-p {first|last} Whether the command is called from the first upgraded node or the last upgraded node
-e <name>=<value>[,<name>=<value>, ...] Additional name value pairs for upgrade
-h Print usage

In general this should only be needed during a CRS upgrade, as part of the root.sh script. Nevertheless, as it's there, it should be documented. Especially the -e parameter seems worth more information than the -h output provides.

when does PMON register to remote listeners

I had a complex problem today: I tried to set up a connection manager, but unlike Arup, I did not want to use SOURCE_ROUTE. So I had to make pmon register itself to the cman. As we already have an entry in the spfile for remote_listener=REMOTE, I just enhanced this alias in tnsnames.ora with an additional line for the cman's HOST and PORT.
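The resulting alias looked roughly like this (hostnames and ports are placeholders, not my real setup):

REMOTE =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = listener-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = cmhost1)(PORT = 1950))
    )
  )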
Unfortunately the services did not show up in the cman's show services output. Not even an alter system register; did any good; still no service.
After checking with tcpdump (there really was no communication to the cman) and oradebug event 10246, I still had no clue how to find out why my pmon did not want to contact the cman. After a short ask for help on Twitter, Martin Nash pointed me to the note How to Trace Dynamic Registration from PMON? [ID 787055.1]. There I found the event
alter system set events='immediate trace name listener_registration level 3';
With this (besides a lot of other useful information), I found that pmon just did not know about the new entries.
As a solution I had to tell it about the new entries in tnsnames.ora by
alter system set remote_listener=REMOTE;
This made pmon re-read tnsnames.ora and accept the new values. All my services show up in cman now.
Yong Huang has some more information about the different trace levels here:
Actually, trc_level 4 (user) is enough to show info about load stats. Other levels are:
0,1: off
2,3: err
4,5: user
6-14:admin
15: dev
16-: support

non-Exadata HCC

Oracle's Hybrid Columnar Compression was one of the big new features of Oracle 11.2. Unfortunately someone decided this feature should only be available on Exadata. Even though Oracle tried to explain it with technical limitations, it was more or less obvious that's just wrong. There are some reasons for this:

  • The database is doing the HCC compression all the time.
  • The database must be able to do the HCC decompression in case the storage cell cannot or will not.
  • Beta testers were very sure HCC worked there without any special hardware.
  • Jonathan Lewis showed there are situations where even an ordinary database creates tables with HCC.
But the fact is: Oracle decided to disable HCC for general usage. As there is no different database software on Exadata database servers, the decision whether to process the statement or throw an ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage must be made by some switch within the software.

Here is my collection of information about this switch:

  • Kerry Osborne described in the book Expert Oracle Exadata, on pages 46 to 48, how the ASM DiskGroup attribute cell.smart_scan_capable=TRUE is only possible on Exadata and necessary for any kind of smart scan - so also for HCC.
  • CERN has published a paper about Compression in Oracle - in the appendix (pages 42 to 44) they show how to change this attribute. It is not so easy, and it corrupts the ASM DG.
  • Jonathan Lewis mentioned there might be a switch in DBMS_COMPRESSION.GET_COMPRESSION_RATIO which disables the check for the purpose of the temporary compressed tables. He did not go into details, but I decided to investigate in that direction.
The package DBMS_COMPRESSION uses prvt_compression, and there, in GET_COMPRESSION_RATIO, it calls PRVT_COMPRESSION.CHECK_HLI(1); to disable this check and PRVT_COMPRESSION.CHECK_HLI(0); to enable it again at the end. CHECK_HLI just calls the kernel function KDZCHECKHI with its parameter, nothing more. Unfortunately it cannot be called from outside of PRVT_COMPRESSION. That's where I started to investigate: I moved the line   PROCEDURE CHECK_HLI (HLID    IN NUMBER); from the package body into the package header. (By doing so, I left the path of a supported system - don't do this if you care about your system!) Now I can call CHECK_HLI:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,Data Mining and Real Application Testing options

SQL> exec sys.prvt_compression.CHECK_HLI(1);

PL/SQL procedure successfully completed.

SQL> create table bx_dba_objects Compress For Archive Low as select * from dba_objects;

Table created.

prvt_compression.CHECK_HLI works per session, so a logon trigger comes to my mind (a sketch follows below).
To enable the check again, the parameter is 0 instead of 1.
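A minimal sketch of that logon trigger idea (the schema name is made up; it relies on the unsupported package change above, so again: don't do this on a system you care about):

CREATE OR REPLACE TRIGGER sys.hcc_on_logon
AFTER LOGON ON hcc_test.SCHEMA
BEGIN
  -- 1 turns the Exadata storage check off for this session, as shown above
  sys.prvt_compression.check_hli(1);
END;
/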

Update:
parallel processes do not inherit this setting
(a slightly different testcase, but the same setup in general):
SQL> select /*+ PARALLEL 8 */ count(*) from test_user.DBMS_TABCOMP_TEMP_CMP; 
select /*+ PARALLEL 8 */ count(*) from test_user.DBMS_TABCOMP_TEMP_CMP
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P008, instance av2l904t:VAX1
(1)
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage


bzip2 twice?


To check the performance of RMAN backups I recently started to trace them a little bit. As most of the time was not spent in any reading from disk or writing to media manager library event, it was spent on CPU. It's good to know the CPUs are good for something, but as I still wanted to know what's going on, I tried to dig deeper. CPU cycles are not just a magic black box where we put in a problem and the answer comes out after some time. At an abstraction layer it's a chain of functions where one is called by another, and only the last one does the actual work. There is not much information in that fact per se, but developers are humans too, and they give the functions they code meaningful names.


So I just had to find these names (and where most of the time was spent) to figure out what's going on. To save time I remembered Tanel Poder's Advanced Oracle Troubleshooting Guide, Part 9 – Process stack profiling from sqlplus using OStackProf. There he describes his tool ostackprof. It did all the job for me; I just had to find an RMAN session.

Here's the short stack where most of the time was spent
(this backup was done with COMPRESSION ALGORITHM 'BASIC'):
->__libc_start_main()->main()->ssthrdmain()->opimai_real()->sou2o()->opidrv()->opiodr()->opiino()->opitsk()->ttcpip()->opiodr()->kporpc()->kkxrpc()->prient()->prient2()->pricbr()->pricar()->plsql_run()->pfrrun()->pfrrun_no_tool()->pfrinstr_ICAL()->pevm_icd_call_common()->krbibpc()->krbbpc()->krbb3crw()->krbbcdo()->kgccdo()->kgccbz2pseudodo()->kgccbz2do()->kgccm()->kgccbuf()->kgccgmtf()->__sighandler()->->

The naming convention for functions is not publicly documented by Oracle, but for some reason I'm sure functions starting with krb are related to backup, whereas kgcc is used for compression. Especially the working function kgccgmtf reads like generate Move To Front.

At that point I had a lot more information than before, but still no way to improve the backup speed. As we have licensed Advanced Compression for that particular node, we tested the other compression methods. LOW and MEDIUM were faster, with less compression than our previous BASIC. But HIGH was even slower!
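Switching between the algorithms for these tests is a one-liner in RMAN (LOW, MEDIUM and HIGH require the Advanced Compression option):

RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';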

So again I used ostackprof and that's the topmost stack trace - for HIGH:
->__libc_start_main()->main()->ssthrdmain()->opimai_real()->sou2o()->opidrv()->opiodr()->opiino()->opitsk()->ttcpip()->opiodr()->kporpc()->kkxrpc()->prient()->prient2()->pricbr()->pricar()->plsql_run()->pfrrun()->pfrrun_no_tool()->pfrinstr_ICAL()->pevm_icd_call_common()->krbibpc()->krbbpc()->krbb3crw()->krbbcdo()->kgccdo()->__PGOSF209_kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->handle_compress()->BZ2_compressBlock()->generateMTFValues()->__sighandler()->->


Do you see the difference? Up to kgccdo there is none! And even afterwards, the functions are somewhat similar. One more thing is worth mentioning: the bzip2 implementation for HIGH does not use Oracle's internal naming convention. So it's worth searching for these names on the internet; one of my best hits was a compress.c File Reference.

Did Oracle reinvent the wheel? No. For me it looks as if they tried their best first (by doing their own kgcc implementation) and afterwards preferred simple copy & paste. Maybe they should just drop one of these 2 - they can still use parameters to achieve different compression quality.


If someone is interested in our results:
for a single datafile of 30GB (with 100% usage) we achieved the following on a production system - with all its ongoing tasks:


Type      min      backup-size
BASIC     13:32    5.8
LOW        5:17    8
MEDIUM     8:52    6.14
HIGH      65:29    4.25

We decided to choose MEDIUM.

Who created that process?

[Figure 2-7: Connection to a Dedicated Server Process]
For some reason I was really curious about who created that process. It's not about one particular process in detail, but about a well-known kind of process. At least well known to DBAs.
Which process? 
It's one of these:

oracle   13096     1  0 20:05 ?        00:00:00 oracleTTT071 (LOCAL=NO)

Yes, it's a simple server process, nothing spectacular. Nevertheless, the Concepts guide is not very specific about who creates that process. So I tried to find out in more detail.
On my Linux sandbox, the first column of ps -ef shows the UID, the second the PID, the third the PPID. Unfortunately it's 1 here, and I'm quite sure this process was not created by init. So this process is somewhat orphaned, as its direct parent disappeared. Very sad!
I decided to follow Figure 2-7 from the Concepts guide. I used strace -f -p <PID_of_listener> to see what's going on. -f follows all forks, so their actions are traced as well.
The first 3 lines are
Process 2979 attached with 3 threads - interrupt to quit
[pid  2981] futex(0xae8dee4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2980] restart_syscall(<... resuming interrupted call ...> <unfinished ...>


So the listener has 3 threads - good to know, and probably worth investigating this segregation of duties - but not in this post. There are so many interesting lines, but I'm searching for a process, so let's continue with

[pid  2979] clone(Process 27028 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27028
[pid  2979] wait4(27028, Process 2979 suspended
 <unfinished ...>
[pid 27028] clone(Process 27029 attached (waiting for parent)
Process 27029 resumed (parent 27028 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27029
[pid 27028] exit_group(0)               = ?
Process 2979 resumed
Process 27028 detached
[pid  2979] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 27028
[pid 27029] close(15 <unfinished ...>
[pid  2979] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(14 <unfinished ...>
[pid 27029] close(16 <unfinished ...>
[pid  2979] <... close resumed> )       = 0
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(17)                   = 0


Here the listener ([pid 2979]) creates a new process with the first clone call. This new process has PID 27028, and it has only one purpose: to clone another new process, PID 27029, and use exit_group(0) to terminate directly afterwards. By this trick the listener is not shown as the parent process of PID 27029. Directly after its creation, PID 27029 closes some file handles. As the new process inherited the table of all open file (and network) handles through the sequence of clone calls, it seems it tries to get rid of any it does not need as early as possible. The next part
[pid  2979] fcntl(16, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid( <unfinished ...>
[pid  2979] fcntl(15, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 27029] <... setsid resumed> )      = 27029
[pid  2979] <... fcntl resumed> )       = 0
[pid 27029] geteuid()                   = 5831
[pid  2979] fcntl(13, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid()                    = -1 EPERM (Operation not permitted)
[pid  2979] poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}, {fd=15, events=0}], 5, -1 <unfinished ...>

sets the close-on-exec flag (FD_CLOEXEC) on file descriptors 16, 15 and 13, so they will not survive an execve(2) call.
And here it goes:
[pid 27029] execve("/appl/oracle/product/rdbms_112022_a/bin/oracle", ["oracleTTT051", "(LOCAL=NO)"], [/* 109 vars */]) = 0
From the man page of execve:
execve() executes the program pointed to by filename.
...
execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of  the  program  loaded.   The  program invoked inherits the calling process’s PID, and any open file descriptors that are not set to close-on-exec.  Signals pending on the calling process are cleared.  Any signals set to be caught by the calling process are reset  to  their default behaviour.  The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.
       If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().
       If  the  set-user-ID  bit  is set on the program file pointed to by filename, and the calling process is not being ptraced, then the effective user ID of the calling process is changed to that of the owner of the program file.  Similarly,  when  the  set-group-ID bit of the program file is set the effective group ID of the calling process is set to the group of the program file.
From that point on you can see how the server process comes to life. It's very interesting in some details, but not in the scope of this post. After some conversation between listener and server process using file descriptors 15 and 16 (I assume these are just sockets), both close these file descriptors. The listener also closes file descriptor 13, which seems to be the TCP connection to the client. From that point on, the 2 processes seem to be independent.
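The orphaning trick itself is easy to reproduce in a shell (a quick sketch, nothing Oracle-specific): the subshell plays the role of the intermediate PID 27028 and exits at once, so the background sleep is re-parented to init and shows PPID 1.

( sleep 60 & )                  # the subshell is the intermediate child; it exits immediately
sleep 1                         # give init a moment to adopt the orphan
ps -o pid,ppid,cmd -C sleep     # the surviving sleep now shows PPID 1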

Well, now I know (at least on my test system) the simplest way the listener creates the process - and it uses execve to do so. There are still many open questions, like what's going on with the redirection shown in Figure 2-8.

Setting Up Oracle Connection Manager (without SOURCE_ROUTE)


This post must be seen as a direct follow-up to Arup Nanda's Setting Up Oracle Connection Manager.
As there are many references to his post, please read it first. Problem and solution are quite similar; only the architecture is a little bit different:

The Architecture

 The network diagram of the three machines is slightly different:


There is one new required connection: from the instance on dbhost1 to the connection manager on cmhost1.

After changing the setup, you will need to rewrite the TNSNAMES.ORA in the following way:

TNS_CM =
  (DESCRIPTION =
    (ADDRESS =
      (PROTOCOL = TCP)(HOST = cmhost1)(PORT = 1950)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = srv1)
    )
  )

You see, the (SOURCE_ROUTE = YES) disappeared, as well as the ADDRESS of the listener on dbhost1.

How it Works


Note that all the special parameters and settings in the client's TNSNAMES.ORA disappeared. But the cman must know about the SERVICE_NAME it has to serve. As the cman can be seen as a special kind of listener, there is a common way a listener gets informed about a SERVICE_NAME: the instance has to register its services with the listener. In general this is done by pmon registering with local_listener and remote_listener. In this case, remote_listener is the magic parameter.

Setting Up


You can follow steps (1) to (9) as in Arup's blog.
But before (10), an additional step is required:

(x) on the instance add the cman to remote_listener:

Alter System Set remote_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cmhost1)(PORT=1950))))' scope=both;

If there is already an entry in remote_listener, e.g. in a RAC, you can separate the different connection strings with commas. An example:

Alter System Set remote_listener='SCAN-IP:1521,(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cmhost1)(PORT=1950))))' scope=both;

(For more details about SCAN I'd recommend this PDF)

CMCTL Primer

As we now have the services registered on the cman as well, we can see them there. The SHOW command has a 2nd parameter, services. Here is an example:

Services Summary...
Proxy service "cmgw" has 1 instance(s).
Instance "cman", status READY, has 2 handler(s) for this service...
Handler(s):
"cmgw001" established:1 refused:0 current:0 max:256 state:ready
<machine: 127.0.0.1, pid: 16786 >
(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=44391))
"cmgw000" established:1 refused:0 current:0 max:256 state:ready
<machine: 127.0.0.1,pid: 16784>
(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=44390))
Service "INSTANCE1" has 1 instance(s).
Instance "INSTANCE1", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost1)(PORT=1521))
Service "cmon" has 1 instance(s).
Instance "cman", status READY, has 1 handler(s) for this service...
Handler(s):
"cmon" established:3 refused:0 current:1 max:4 state:ready
<machine: 127.0.0.1, pid: 16759>
(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=44374))
The command completed successfully.

Fine Tuning

I try to create a dedicated service for all (or a well-known set of) connections via the connection manager. This sometimes makes it easier to separate or identify different kinds of sessions.
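A minimal sketch of creating such a dedicated service (the service name is made up; on RAC, srvctl add service is the equivalent):

exec DBMS_SERVICE.CREATE_SERVICE(service_name => 'SRV_VIA_CMAN', network_name => 'SRV_VIA_CMAN');
exec DBMS_SERVICE.START_SERVICE('SRV_VIA_CMAN');

Once started, pmon registers the new service with everything listed in remote_listener - including the cman.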

get your traces - yourself



I'd like to mention a small piece of software. It's called MrTrace and is available in version 2.0.0.43 right now. For me it's a tool to save time. So what does it do?
MrTrace is a plugin for Oracle's SQL Developer to access tracefiles via SQL Developer. Its previous version could only access the tracefile for the statement you just executed. But since version 2, anyone with the right permissions can access any tracefile in the trace directory.

For a DBA it does not sound spectacular to access tracefiles, but it can be quite annoying to fetch and distribute tracefiles for developers. In my current company there is no OS access to database servers for anyone except DBAs and OS admins. This means someone must copy the traces over to make them accessible to others. It's not a complex task, but it's disruptive.
With MrTrace I can grant anyone who knows how to use SQL Developer permission to access trace files. So it saves a lot of time, for me and the developer. At a price of less than US$50 it should pay for itself in no time if you have a diligent developer.

A list of my very private findings:

PROs:

  • it's easy to install (on the client side) 
  • it's not OS-dependent - you can apply it on any client-OS where you can start SQL Developer
  • the installation script for the database user and objects is not wrapped. So you can review and even change this part of the software. (In my case: we have a PASSWORD_VERIFY_FUNCTION enforced in my company, so I needed to edit the installation script to create the user MRTRACE.)
  • The support from Method R is great! I had the joy of participating in their beta program for version 2 of MrTrace. We had some nice conversations.

CONs:

  • You need SQL Developer for the client side. - No big deal for me, but in some companies that might be a problem.
  • MrTrace needs Java to do some tasks. Unfortunately PL/SQL has no built-in method to list the contents of a directory, so Java is needed.
  • OS commands like ls, find and xargs are used. There is nothing bad about these commands, but I don't see anything they do that cannot be done in Java directly. So for me it increases complexity without need.

And no, I am not an employee of Method R; the only relation is the software license I bought myself.

some tracing events in DBMS_SCHEDULER



I currently have the fun of reviewing DBMS_SCHEDULER. As I'm always interested in ways to trace anything, to dig deeper in case of problems, I searched for ways to trace it.
As I did not find a collected list of events anywhere, I start one here. It's by far not a complete list, so feel free to discuss and contribute if you want!

event 10862


resolve default queue owner to current user in enqueue/dequeue
Cause: resolve default queue owner to current user in enqueue/dequeue.
Action: turn on if client wish to resolve the default queue owner to the current user. If not turned on, the default queue owner will be resolved to the login user.
This event is not checked the way you might imagine. Only in the area of REMOVE_JOB_EMAIL_NOTIFICATION: if it's 0, it is set to 1 for a call of DBMS_AQADM.REMOVE_SUBSCRIBER and set back to 0 afterwards.

27401

scheduler tracing event

  • bit 0x10000 - Logging e-mail to table and returning
    bitand( ,65536)
    logs information about sending emails into the table sys.scheduler$_sent_emails
  • bit 0x20000 - start DBMS_DEBUG_JDWP.CONNECT_TCP in file watcher
    bitand( ,131072)
    starts DBMS_DEBUG_JDWP.CONNECT_TCP on localhost, port 4444
    I'm not sure if I like this event. In general I don't want any software opening connections without my knowledge. And I could not find this documented anywhere.
    Is it fair to call this a backdoor?
  • bit 0x40000 - starts tracing in file watcher
    bitand( ,262144)
    logs information about the file watcher into the trace file
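Setting these events follows the usual event syntax; e.g. a sketch for bit 0x10000 of event 27401 (decimal 65536) in the current session:

ALTER SESSION SET EVENTS '27401 trace name context forever, level 65536';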

27402

scheduler tracing event

  • bit 0x40 - starts tracing about emails
    bitand( ,64)
    similar to event 27401 bit 0x10000, but tracefile instead of table
  • bit 0x80 - starts tracing about emails
    bitand( ,128)
    logs information about email jobs into trace file
  • bit 0x100 - starts tracing in chains
    bitand( ,256)
    logs information about chains into trace file

I guess there is at least also a bit 0x200, but I could not prove it right now.

27403

scheduler stop job event


I did not find anything about it yet. Comments are most welcome!

restore DBMS_SCHEDULER.CREATE_CREDENTIAL cleartext password


If you want to use the Oracle file watcher, you need to Create a Credential. As a password needs to be stored in the database, Oracle tries to save it in a secure way. But as the password must be decrypted in order to log in on the file watcher's agent side, it is not safe at all.
The credentials are stored with DBMS_SCHEDULER.CREATE_CREDENTIAL. Here is an example:

exec DBMS_SCHEDULER.CREATE_CREDENTIAL(
  credential_name => 'local_credential',
  username => 'oracle',  password => 'welcome1');
exec DBMS_SCHEDULER.CREATE_CREDENTIAL(
  credential_name => 'local_credential2',
  username => 'oracle2', password => 'welcome1');


It's quite easy to see the values again:

select o.object_name credential_name, username, password
 FROM SYS.SCHEDULER$_CREDENTIAL c, DBA_OBJECTS o
 WHERE c.obj# = o.object_id;

CREDENTIAL_NAME    USERNAME PASSWORD
------------------ -------- ------------------------------------
LOCAL_CREDENTIAL   oracle   BWVYxxK0fiEGAmtiKXULyfXXgjULdvHNLg==
LOCAL_CREDENTIAL2  oracle2  BWyCCRtd8F0zAVYl44IhvVcJ2i8wNUniDQ==


At least the password is somehow encrypted, and even though the password was welcome1 for both credentials, the encrypted strings are not identical.

Nothing to blame here, but as I mentioned, the password can be decrypted. So let's do so:

SELECT u.name CREDENTIAL_OWNER, O.NAME CREDENTIAL_NAME, C.USERNAME,
  DBMS_ISCHED.GET_CREDENTIAL_PASSWORD(O.NAME, u.name) pwd
FROM SYS.SCHEDULER$_CREDENTIAL C, SYS.OBJ$ O, SYS.USER$ U
WHERE U.USER# = O.OWNER#
  AND C.OBJ#  = O.OBJ# ;

CREDENTIAL_OWNER CREDENTIAL_NAME      USERNAME PWD
---------------- -------------------- -------- --------
SYS              LOCAL_CREDENTIAL     oracle   welcome1
SYS              LOCAL_CREDENTIAL2    oracle2  welcome1


Can you see it? It's there. Try it at home!
I don't blame anyone here. It's hard to store anything really safely when you also need to decrypt it.
But don't expect your password to be safe if you store it with DBMS_SCHEDULER.CREATE_CREDENTIAL.
Maybe it's slightly too easy to use DBMS_ISCHED.GET_CREDENTIAL_PASSWORD (OK, only SYS can do so), but even if it becomes slightly more difficult in the future, the basic problem will still exist.

total abuse of technology



I had a (for my environment) unusual request:
After the migration of a repository database from 9i to the latest 10g, I was asked to keep a backup of the old DB for at least 3 years.
This does not sound very unusual, but it's not that simple in our environment. We only keep backups for weeks to some months, worst case. I also cannot just back up the datafiles: the old database ran on Solaris, but we are switching to Linux right now. With just some bad luck I would not have any system left to restore (or open) this database backup on.
This brought me to another solution; from my point of view it was not worth writing a blog about it, but I was asked by Leighton L. Nelson, and so I write:

  1. export the full database
    I ran a simple export of the database. There is no expdp in 9i, so the choice was easy.

  2. compress the files
    The dump (and the logfile!) were tarred together and compressed, just to save space.

  3. prepare a proper store
    As mentioned above, there is no dedicated system for this purpose. So I had to prepare a place where the dump is safe. As a DBA, of course I know a good place to store data: A database!
    First a DBFS came to my mind. But the DB is in Version 10.2 - no DBFS.
    But it's quite simple to do the important steps manually:
    create tablespace old_dump datafile '+<DG>' autoextend on;
    create user old_dump identified by dump_old1 default tablespace old_dump;
    GRANT CONNECT, CREATE SESSION, CREATE TABLE to old_dump;
    alter user old_dump quota unlimited on old_dump;

    connect old_dump/dump_old1

    create table old_dump_store
    (id integer primary key, description VARCHAR(2000), file_store BLOB)
    LOB (file_store) STORE AS SECUREFILE
    (TABLESPACE old_dump DISABLE STORAGE IN ROW NOCACHE LOGGING);

  4. insert the dump (and some metadata)

    There is a nice way in SQL Developer to load a file into a BLOB. It's just so simple (a scripted alternative is sketched after this list).
    At last, some words in the description field are worthwhile - so everyone knows what's inside the BLOB.
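If you prefer a script over the SQL Developer GUI, a minimal sketch (directory path, file name and description are assumptions):

create directory dump_dir as '/u01/export';

DECLARE
  l_blob  BLOB;
  l_bfile BFILE := bfilename('DUMP_DIR', 'old_db_exp.tar.bz2');
BEGIN
  INSERT INTO old_dump_store (id, description, file_store)
  VALUES (1, 'full export of the old 9i repository DB', empty_blob())
  RETURNING file_store INTO l_blob;
  dbms_lob.open(l_bfile, dbms_lob.lob_readonly);
  dbms_lob.loadfromfile(l_blob, l_bfile, dbms_lob.getlength(l_bfile));
  dbms_lob.close(l_bfile);
  COMMIT;
END;
/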
It still might sound strange to save the dump of an old database into its descendant. But in the end: do you know a better place to store data than a database?

creating my mobile toolbox (for windows) I


I am somewhat tired of re-installing the same set of software again and again, every time I (have to) switch to a new PC. Probably it's me, not the PCs, but it takes some time to have the system set up and me productive again.
Somehow it's like a craftsman having to set up a new work space with new tools in every house he visits. But craftsmen are clever: they bring their tools with them - and take them away when no longer needed. In the best case they do not leave any traces (except the work done).
I try to mimic this approach: I'm creating my own toolbox. Mine is not made of leather or plastic; it's made of a USB-stick, portableapps.com and some additional modifications.

First I chose the programs available in the portableapps app directory: Notepad++, Google Chrome, KeePass, PuTTY and WinSCP. I tried to keep the list small, but you can make your own decisions, of course.

Unfortunately I need some more tools: as a DBA, sometimes I like to access not only the database servers but also the database directly. I did not find any proper tool in the app directory, therefore I decided to include Oracle's SQL Developer in my toolbox and followed the PortableApps Format Specification. It's not as complex as it looks at first sight. Here are my steps:

  1. create the proper directory structure:
    SQLDeveloperPortable
    + App
    + AppInfo
    + DefaultData
    + Data
    + Other
    + Help
    + Images
    + Source

  2. download SQL Developer and unzip it into the App folder

  3. in AppInfo create the file appinfo.ini:
    [Format]
    Type=PortableApps.comFormat
    Version=2.0

    [Details]
    Name=SQLDeveloper Portable
    AppID=SQLDeveloperPortable
    Publisher=^/\x
    Homepage=berxblog.blogspot.com/2012/03/creating-my-mobile-toolbox-for-windows.html
    Category=Utilities
    Description=Oracle SQL Developer is a graphical version of SQL*Plus that gives database developers a convenient way to perform basic tasks
    Language=Multilingual
    Trademarks=Oracle
    InstallType=

    [License]
    Shareable=false
    OpenSource=false
    Freeware=false
    CommercialUse=true
    EULAVersion=1

    [Version]
    PackageVersion=3.1.07.42
    DisplayVersion=3.1

    [SpecialPaths]
    Plugins=NONE

    [Dependencies]
    UsesJava=no
    UsesDotNetVersion=

    [Control]
    Icons=1
    Start=sqldeveloper.bat
    ExtractIcon=App\sqldeveloper\icon.png

  4. In SQLDeveloperPortable create the file sqldeveloper.bat:
    REM ^/\x
    SET IDE_USER_DIR=%~d0\PortableApps\SQLDeveloperPortable\Data
    start /b %~d0\PortableApps\SQLDeveloperPortable\App\sqldeveloper\sqldeveloper.exe
    By setting IDE_USER_DIR all configurations will be stored on the USB-stick, not on (changing) PCs.

  5. creating a proper icon for App\sqldeveloper\icon.png

That's it - works like a charm!



Next I prepared Xming for the portable world. X11 is still needed in the world of an Oracle DBA.
The steps were similar to those for SQL Developer, therefore I only describe the differences here:
As I did not want to extract the Xming installer, I just let it install onto my PC into C:\Program Files. Then I copied the full structure C:\Program Files\Xming into XmingPortable\App.
Also in this case a bat file is needed as a wrapper, as Xming needs some parameters to go into the tray without opening a window:
start /b %~d0\PortableApps\XmingPortable\App\Xming\Xming.exe :0 ‑clipboard ‑multiwindow



These 2 examples show it's nice and easy to have your private toolbox at hand all the time.
I do not provide the packages for any of these programs. First, I don't want to deal with any legal implications. Second, I have no interest in this kind of work and no skills in doing so. Period.

looking close at TAF


At the moment I'm trying to collect and sort some information about Oracle's Transparent Application Failover. There is a lot of general information available in the wild, but few deeper details. Here I try to show my findings.

Testcase

For my test database with DB_UNIQUE_NAME TTT06_SITE1 I created the service
srvctl add service -d TTT06_SITE1 -s TTT06_TAF -P BASIC -e SELECT -r TTT061,TTT062 .
The tnsnames.ora entry is
TTT06_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = crs908.my.domain)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = TTT06_TAF)(SERVER = DEDICATED)
    )
  )

tracing

Just
strace -f -t -s 128 -o sqlplus_taf.strace sqlplus "berx/berx123#@TTT06_TAF"
I will look closely at sqlplus_taf.strace soon; first, the testcase can be finished easily:

current instance


SELECT (SELECT instance_number
        FROM   v$instance) inst,
       s.sid,
       s.service_name,
       s.failover_type,
       s.failover_method,
       s.failed_over,
       p.spid
FROM   v$process p,
       v$session s
WHERE  s.paddr = p.addr
       AND addr IN (SELECT paddr
                    FROM   v$session
                    WHERE  audsid = Sys_context('USERENV', 'SESSIONID'));


   INST      SID SERVICE_NAME FAILOVER_TYPE FAILOVER_M FAI SPID
------- -------- ------------ ------------- ---------- --- ------
      1      144 TTT06_TAF    SELECT        BASIC      NO  23440

and after a startup force in a 2nd session in instance 1

new instance


/

   INST      SID SERVICE_NAME FAILOVER_TYPE FAILOVER_M FAI SPID
------- -------- ------------ ------------- ---------- --- ------
      2      146 TTT06_TAF    SELECT        BASIC      YES 14927

what's going on

A short excerpt of sqlplus_taf.strace:
First sqlplus tries to access ~/.tnsnames.ora, fails, and then opens $TNS_ADMIN/tnsnames.ora. Of course there it reads the connection string shown above.
Next it tries to resolve the HOST entry:
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<my_dns>")}, 28) = 0
and gets all IPs for my SCAN-DNS.
sqlplus asks one of the SCAN listeners:
connect(9, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("<any SCAN IP>")}, 16) = -1 EINPROGRESS (Operation now in progress)
for the SERVICE and gets a kind of redirect:
read(9, "\1\10\0\0\6\0\0\0\0@(ADDRESS=(PROTOCOL=TCP)(HOST=<NODE1-vip>)(PORT=1521))\0(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<SCAN IP>)(PORT=1"..., 8208) = 264
The SCAN listener is of no use anymore: close(9). sqlplus looks up the name of <NODE1-vip> in /etc/hosts and tries its next step with the <NODE1-vip> listener:
connect(9, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("<NODE1-vip IP>")}, 16) = -1 EINPROGRESS (Operation now in progress)

The listener creates a server process for sqlplus - and lets them do their private business.


The startup force killed the server process for sqlplus. But sqlplus doesn't know anything about it until it gets the <newline> from the terminal.

Of course filehandle 9 is somewhat dead and gets close(9). Now really the same steps as above happen again (just tnsnames.ora is not re-read!): SCAN IP lookup, redirect to a NODE-vip, etc.

So only tnsnames.ora is cached; all other lookups and connections are run again.
Take this into account if you try to change your setup (IPs, lookups, DNS) while connections are active.

do not touch if you do not know for sure


Oracle provides and documents a huge load of possibilities and functions for nearly every purpose. For me it is impossible to know all of them. Even knowing such an area exists is hard.
But still, sometimes the functions Oracle does not document for customer use seem to be more attractive than those officially available.
One of these attractive packages is DBMS_SYSTEM. You will not find any description of this package in the official Oracle documentation. There are some small traces available, but nothing really useful.
Oracle also has quite clear words about using such unofficial, hidden packages:
In How To Edit, Read, and Query the Alert.Log [ID 1072547.1] you can read:

NOTE about DBMS_SYSTEM:
This package should in fact only be installed when requested by Oracle Support.
It is not documented in the server documentation.
As such the package functionality may change without notice.
It is to be used only as directed by Oracle Support and its use otherwise is not supported.

Per internal Note 153324.1:
Generally, if a package is not in the Oracle documentation at all, it is intentional, as it is not for end user use. Non-documented packages should be avoided by customers unless specifically instructed to use them by either Oracle Support or Oracle Development.

For some reason I'm one of those who like to play with forbidden toys like these. I found a procedure in DBMS_SYSTEM which changed its behavior slightly in 11gR2 (I've tested with the 11.2.0.3 patchset - so other patchsets may behave quite differently!).

I'm talking about DBMS_SYSTEM.READ_EV. This procedure more or less directly calls the internal C routine READ_EV_ICD. The common understanding is that it should return the level of a given event. This is quite true, with just one exception: probably the best-known event in the Oracle world: 10046 - or sql_trace.

My test script:
VARIABLE lev number
SET AUTOPRINT ON
EXECUTE sys.dbms_system.read_ev(10046, :lev)

ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

SELECT sql_trace, sql_trace_waits, sql_trace_binds FROM v$session WHERE sid=userenv('sid');


EXECUTE sys.dbms_system.read_ev(10046,:lev)

oradebug setmypid
oradebug eventdump session

gives the expected result in one of my 10g test DBs:
@test_read_ev.sql

PL/SQL procedure successfully completed.

LEV
----------
0

Session altered.

PL/SQL procedure successfully completed.

LEV
----------
8

Statement processed.
10046 trace name CONTEXT level 8, forever

but an unexpected result in my 11.2.0.3 test DB:

@test_read_ev.sql

PL/SQL procedure successfully completed.

LEV
----------
0

Session altered.

PL/SQL procedure successfully completed.

LEV
----------
0

Statement processed.
sql_trace level=8

I guessed events with an alias might be excluded somehow, but further tests with DEADLOCK==60 or DB_FILES==10222 showed this special behavior only with sql_trace.

My conclusion today is easy:
if it's not there for you, don't assume you can play with it without any consequences.

how to secure CMAN against CVE-2012-1675 - or an easier method than ASO

In the Oracle DBA world, CVE-2012-1675 is a big issue at the moment. Oracle announced some methods for securing existing systems. But these are sometimes not that easy, and there is no backport for older systems.
As I investigated how to secure a connection manager, I was pointed to Note:1455068.1.
The solution is somewhat easy: only allow incoming connections to your systems, e.g.
    (rule=(src=*)(dst=10.220.8.114)(srv=*)(act=accept))
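In a cman.ora this rule sits inside the RULE_LIST section; a sketch (the IP is taken from the example above, the rest is a placeholder; as far as I know, connections not matching any rule are then refused):

(rule_list =
  (rule = (src = *)(dst = 10.220.8.114)(srv = *)(act = accept))
)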

In a well-designed environment, where you can separate your DB servers from others at low network layers, a set of CMANs might be enough to secure your DBs against CVE-2012-1675.
At least it might be a simple and fast solution to shield your systems from untrusted areas until you know how to secure the DB servers themselves. Especially for legacy systems it might be the only available solution.

who cares if a listener is dying


In this post I try to show what's going on when a local listener dies in an 11gR2 RAC environment. My basic question is: when does a SCAN listener know the local listener disappeared?
My testcase (a sandbox):

  • A 2-node RAC - all actions are run on node 1, if not explicitly stated otherwise.
  • My test DB is called TTT04 (Test, you know?)
  • I have 3 SCAN listeners there, but I want to make the testcase easier, so I pin my connection string down to only one SCAN listener (it's SCAN2 in my case):
    TTT04_bx =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 172.24.32.117)(PORT = 1521)) # SCAN2
        (CONNECT_DATA =
          (SERVICE_NAME = TTT04_SITE1)
        )
      )
  • start tracing pmon:
    ps -ef | grep pmon | grep TTT04
    SQL> oradebug setospid <pid_of_pmon>
    Oracle pid: 2, Unix process pid: <pid_of_pmon>, image: oracle@<node1> (PMON)
    SQL> oradebug Event 10257 trace name context forever, level 16
    Statement processed.

  • just to make sure server-side load balancing will lead me to node1:
    on node2, start several
    bzip2 -z -c /dev/urandom > /dev/null &
And now the real test. My 2 test scripts:
/tmp/bx1.sql

connect berx/berx123#@TTT04_bx
spool /tmp/bx1.txt
select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS'), HOST_NAME from v$instance;
exit


/tmp/bx2.sql

connect berx/berx123#@TTT04_bx
spool /tmp/bx2.txt
select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS'), HOST_NAME from v$instance;
exit


My command is
kill -9 `pgrep -f "tnslsnr LISTENER "` ; lsnrctl services LISTENER_SCAN2 > /tmp/lsnr1.txt ; sqlplus /nolog @/tmp/bx1.sql & sleep 5 ; lsnrctl services LISTENER_SCAN2 > /tmp/lsnr2.txt; sqlplus /nolog @/tmp/bx2.sql


and the result on the Terminal:

SQL*Plus: Release 11.2.0.3.0 Production on Sat May 5 23:00:50 2012

Copyright (c) 1982, 2011, Oracle. All rights reserved.

ERROR:
ORA-12541: TNS:no listener


SP2-0640: Not connected
[1]+ Done sqlplus /nolog @/tmp/bx1.sql 2> /tmp/bx1.err

SQL*Plus: Release 11.2.0.3.0 Production on Sat May 5 23:00:55 2012

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected.

TO_CHAR(SYSDATE,'YY HOST_NAME
------------------- ---------
2012-05-05 23:00:55 <node2>





Now the question is: what happened during these 5 seconds of sleep 5?


  • pmon's tracefile TTT041_pmon_4399.trc shows

    *** 2012-05-05 23:00:48.391
    kmmlrl: status: succ=4, wait=0, fail=0

    *** 2012-05-05 23:00:51.398
    kmmlrl: status: succ=3, wait=0, fail=1
    kmmlrl: update retry
    kmmgdnu: TTT04_SITE1
    goodness=0, delta=1,
    flags=0x4:unblocked/not overloaded, update=0x6:G/D/-
    kmmlrl: node load 394
    kmmlrl: (ADDRESS=(PROTOCOL=TCP)(HOST=<node1>-vip)(PORT=1521)) block
    kmmlrl: nsgr update returned 0
    kmmlrl: nsgr register returned 0

    *** 2012-05-05 23:00:54.401
    kmmlrl: status: succ=3, wait=0, fail=1

    *** 2012-05-05 23:00:57.402
    kmmlrl: status: succ=3, wait=0, fail=1

    Just a short explanation of what you can see here: every 3 seconds, pmon distributes the status it knows about to all the listeners it knows. Between 23:00:48 and 23:00:51 pmon found that the local_listener had disappeared - and informed all the other listeners about this fact.
  • what the LISTENER_SCAN2 knows:
    lsnr1.txt

    Service "TTT04_SITE1" has 2 instance(s).
    Instance "TTT041", status READY, has 1 handler(s) for this service...
    Handler(s):
    "DEDICATED" established:0 refused:0 state:ready
    REMOTE SERVER
    (ADDRESS=(PROTOCOL=TCP)(HOST=<host1>-vip)(PORT=1521))
    Instance "TTT042", status READY, has 1 handler(s) for this service...
    Handler(s):
    "DEDICATED" established:0 refused:0 state:ready
    REMOTE SERVER
    (ADDRESS=(PROTOCOL=TCP)(HOST=<host2>-vip)(PORT=1521))


    lsnr2.txt

    Service "TTT04_SITE1" has 2 instance(s).
    Instance "TTT041", status READY, has 1 handler(s) for this service...
    Handler(s):
    "DEDICATED" established:0 refused:0 state:blocked
    REMOTE SERVER
    (ADDRESS=(PROTOCOL=TCP)(HOST=<host1>-vip)(PORT=1521))
    Instance "TTT042", status READY, has 1 handler(s) for this service...
    Handler(s):
    "DEDICATED" established:0 refused:0 state:ready
    REMOTE SERVER
    (ADDRESS=(PROTOCOL=TCP)(HOST=<host2>-vip)(PORT=1521))

    Directly after the kill of the local listener, LISTENER_SCAN2 still believes both local listeners are in state:ready - therefore it directs the connection to the listener with the lower load (on node1), which I just killed.
    But after only 5 seconds it knows that one is in state:blocked, and therefore directs my 2nd connection attempt to node2.

What is this all about?

  • If a listener dies (for any reason), there is a period of about 3 seconds where other processes might still rely on its existence and service.
  • PMON is the process which informs all listeners about the status of the others (one more reason to make sure it never gets stuck).
  • PMON's listener REGISTRATION is something different.


My findings today were supported by Understanding and Troubleshooting Instance Load Balancing [Note:263599.1].