orastar

Pramod Garre ORACLE DBA Blog

Cluster factor Data Analysis query

If  you have a query like select … where account_number = xxx , if you suspect higher gets/physical gets are becuase of poor clustering factor , following query can be used to do data Analysis

 

select max(account_rows), min(account_rows), avg(account_rows), median(account_rows), stddev(account_rows),
max(account_blocks), min(account_blocks), avg(account_blocks), median(account_blocks), stddev(account_blocks),
count(*) account_cnt,
sum(account_rows) sum_rows
from (
select /*+ parallel(a,32) */
account_number, count(distinct dbms_rowid.rowid_block_number(rowid)) account_blocks, count(1) account_rows
from  a
group by account_number
);

November 14, 2014 Posted by | Uncategorized | Leave a comment

Hint to fix Access problem with huge in-list

Some times if we have huge in list ORACLE will consider only first 2-3 of them in access and rest of them will be pushed down to filter

For ex :

select * from tab where id in (1,2,3,4,5,6,7 ) ;

execution plan can be like

access –  ID_INdex : ID =1 , ID =2

Filter  –  ID =3 …ID =7

This can solved by using the below hint

NUM_INDEX_KEYS(ID_Index ,25)

The above basically advice optimizer that up to 25 keys in in list should be considered for access itself

-Pramod

October 7, 2013 Posted by | Uncategorized | Leave a comment

Forcing the Execution plan using a SQL Profile

 

NOTE/DISCLAIMER : This is a test to demonstrate the feature on a test table ,Do not try this on production systems. Also some of the features mentioned here may need separate License , contact ORACLE support and make sure that you comply to Licences requirements,

Hello there,

How many times you know how to fix the execution plan of a SQL but could not make it effective as you may not be able to Change the SQL itself in the Application ??

With SQL Profiles , you can do that seamlessly and this is one of my favourite  methods as you can fix the Execution plan of a SQL with out any Associated Risks and Dependencies

For Instance you have following scenario

TABLE :

EMP

—————–

EMP_ID               NOT NULL                 NUMBER
EMP_NAME                                         VARCHAR2(10)
LOCATION                                           VARCHAR2(100)
SALARY                                              NUMBER

Indexes

UNIQUENES          type             INDEX_NAME          COLUMNS
——— ———- —————————— ——————————
UNIQUE               NORMAL         IND_EMP_PK               EMP_ID
NONUNIQUE       NORMAL          IDX_EMP_SALARY     SALARY

Say for some reason following query takes long time ( i tweaked stats ;) )  as it picks up wrong index on Salary instead of Primary key index

select  * from emp where emp_id = 3 and salary > 100;

So SQL_ID : 2hu2m70n0769t

Bad Plan Hash ( Plan picks up Index on salary ) : 3688361668

So now you know what is the current plan for this – which pick up Parimary key Index

So you can generate that plan in shared pool by forcing Index with a hint (No need to change Application sql , just get the current plan in shared Pool)

say your sql is

select  /*+ index(EMP IND_EMP_PK) */ * from emp where emp_id = 3 and salary > 100;

SQL_ID : a9h28y8d6s0m0

correct plan hash value (with PK Index) : 1688361996

Now you can ask all the queries which are coming from Application with SQL_ID  2hu2m70n0769t to pick up correct plan we just generated in shared pool – 1688361996 (using PK index instead of index on salary)

This is how we are going to do that

we can create a profile as follows

Note/Disclaimer  : You may need Additional Licence to use all these features , Contact ORACLE Support before you use these packages.

declare

ar_profile_hints sys.sqlprof_attr;

cl_sql_text clob;

begin

select

extractvalue(value(d), ‘/hint’) as outline_hints

bulk collect

into ar_profile_hints

from

xmltable(‘/*/outline_data/hint’

passing (

select

xmltype(other_xml) as xmlval

from

v$sql_plan

where

sql_id = ‘a9h28y8d6s0m0′

and plan_hash_value = 1688361996

and other_xml is not null

)

) d;

select sql_fulltext  into cl_sql_text   from  v$sqlarea  where sql_id = ‘ 2hu2m70n0769t';

dbms_sqltune.import_sql_profile(sql_text => cl_sql_text , profile => ar_profile_hints , category => ‘DEFAULT’ , name => ‘PROFILE_ 2hu2m70n0769t’

– use force_match => true

– to use CURSOR_SHARING=SIMILAR

– behaviour, i.e. match even with

– differing literals

, force_match => TRUE

);

end;

/

Now you have sql profile – PROFILE_ 2hu2m70n0769t with correct plan is created .Your application SQLs will use this correct plan becuase of SQL Profile wih out any change to Application queries :)

You will see that PROFILE_ 2hu2m70n0769t  is used in Execution plans of SQL.

For Generating correct plan, you can use various options depending on the query like hints , changing parameters etc…

This way we can Force  Application queries  to use correct plan with out making any change :)

Application teams will Love this (Not making change to code ) ;)

Feel free to leave the comments if you have any

Thanks

Pramod

March 27, 2013 Posted by | Uncategorized | Leave a comment

List Linux processes by CPU and Memory usage

This will be very handy

You know your CPU utilization on the linux system is high,next question is who is consuming most of it

Following command will give you nice breakdown

ps -e -o pid,pcpu,cpu,nice,state,cputime,args –sort pcpu | sed ‘/^ 0.0 /d’

Thanks

Pramod

November 17, 2011 Posted by | Uncategorized | Leave a comment

Find out patches applied from OEM

 

You  may have hundreds of servers in your infrastructure.You want to check patches applied on some of the  oracle_home of server.

Do you want to login to each server and run opatch lsinventory command ?

If i want to compare patch levels of couple of databases , do i have to login into each of those servers,run opatch lsinventory and then compare the results using excel etc.. ?

Well OEM grid control makes life easier now .OEM Grid gathers patch level information in different ORACLE_homes.You can also compare configuration between different homes.You can see the query corresponding query as well

I found this very useful when i wanted to compare patches on Dev Vs QA Vs Prod

This is how you do it, Check it out

Login to grid control and choose

Deployments – > Configuration -> Search -> Search Software Installations

Search Oracle Products Installed in Oracle Homes

Go for product oracle database Look for patches interim patches in home

-Pramod for ORASTAR

November 9, 2011 Posted by | Uncategorized | Leave a comment

ORACLE Interconnect Monitoring

Hello All,

Happy Diwali Folks !!

What is the capacity of your RAC interconnect ?

Most of us have “n Giga bit per second ” Interconnect.So  is this enough for you ?Is it under utilized ?

What is your interconnect usage ?

SA  : ” I see high interconnect usage in my network monitor . How is the utilization from ORACLE side ? “

Well , Answers to all these questions are  numbers.

Following are the ways to find out those answers

The first ,Best and infact the only data source where you can query upon is your AWR and ASH

In AWR Report , you can find out cluster load profile and it looks something like this

Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
—————       —————
Global Cache blocks received:                             153.53                  0.50
Global Cache blocks served:                               324.09                  1.06
GCS/GES messages received:                      37,658.16                122.99
GCS/GES messages sent:                            21,760.92                 71.07
DBWR Fusion writes:                                         4.76                  0.02
 

Estd Interconnect traffic (KB) 15,426.27

Look at “Estd Interconnect traffic (KB)”..Wow,isn’t that useful.So during that period of time My interconnect throughput is 15 MBPS (My interconnect is highly underutilized) :)

Now is that accurate ?? I crosschecked this number with network stats on different platforms and most of the time , it’s accurate with network stats ofcourse shows a bit more traffic as packets,traffic from other OS operations

is also counted there.

So now i have the data and i can now plot a graph and check the trend.Wait Do i have to do it ? We have OEM , isn’t it

Go to OEM,select your target “Cluster ” > Performance tab > Additional links > cluster cache coherency

There you will find the graphical representation os the same stats mentioned above.

You will find sections like

Global cache block access latency , Global cache block transfer rate etc..

GC_Rate

For the above period of time i found that roughly 2000 blocks in  Global cache block transfer rate graph ,

which is roughly 16000 and is equal to Estd interconnect traffic mentioned above  :)

Also , in OEM you will find interconnect Link in cluster tab

ORACLE_Interconnect_post

So , Very easily you can identify whether you pipe ( interconnect ) utilization from above means

So , How ORACLE Calculated these number .Well obviously from AWR.I did some research and sound that these were captured from dba_hist_sysstat.

After more research figured out  following is the forumula(and it makes lot os sense )  used to calculate “Estd Interconnect Traffic “

Estd Interconnect traffic (KB): =((‘gc cr blocks received’

+ ‘gc current blocks received’ + ‘gc cr blocks served’
+ ‘gc current blocks served’) * Block size)
+ ((‘gcs messages sent’ + ‘ges messages sent’ + ‘gcs msgs received’
+ ‘gcs msgs received’)*200)/1024/Elapsed Time

Bottom line is you have the Data in AWR and ASH and it’s up to you how you want to use this powerful data.

Now you can tweak the data the way you want to the level of granularity you desire.

I have developed couple of scripts around this which are quiet useful for me , contact me if you need a copy

Please leave your comments

- Best

Pramod Garre for ORASTAR

 

October 26, 2011 Posted by | Uncategorized | Leave a comment

ORACLE 11G – Direct Path Read

After ORACLE 11G upgrade , one of the change i observed is ” High number of direct path reads”

As per ORACLE Documentation  and couple of sites on internet

 

“There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans.

In 10g, serial table scans for “large” tables used to go through cache (by default) which is not the case anymore.  In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats.
Direct path reads are faster than scattered reads and have less impact on other processes because they avoid latches.

Also as per documentation  it will consider hidden paramter _small_table_threshold in it’s decision making(But during my tests it was proved that it’s not being considered as documented.It depends on number of blocks you are supposed to read”

So,is it good ? is it gonna degrade your system performance ? or what ?

As per definition “DIRECT PATH READ” is supposed to be fast isn’t it ?

Well .. All depends on your configuration.

Although ORACLE makes it’s decision depending on  various system,object and memory stats ,

if your PGA is not adequately sized or not configured properly , you will see degradation in your performance (which happened in my case during 1g upgrade testing ) because Direct path read reads directly into user memory area of PGA.So if your PGA is not configured properly and if there are so many concurrent sessions then you are gonna be in trouble

I also observed significant performance improvement if  we  configure  system properly and use this feature.

For any reason , if you don’t want to use this feature , you can disable it by setting the hidden parameter ” _serial_direct_read”  to False

 

Best

Pramod Garre for ORASTAR

 

 

October 8, 2011 Posted by | Uncategorized | Leave a comment

How to Tell if the IO of the Database is Slow – Numbergame

 

Hi There ,

Often i was asked to look into the performance issues of a test Database.

Well most of the time i find that they are not comparing Apples to Apples.

SO how to let them know that IO on this system is way expensive compared to the system they are drawing comparisons.

Well..Numbers speak for themselves :)

 

The efficiency of IO may be measured in 2 ways :

  • Response Time
    Measured in milliseconds an operation takes to complete. This statistic is gathered by Oracle.
  • Throughput
    Measured as the number of operations per unit of time. This is calculated using OS tools
    for example iostat on Unix.
For now let’s concentrate on Response time

Response time

Hardware does not necessarily respond in a uniform fashion for each IO request; there are always likely to be peaks and troughs. It is therefore common to measure response time using an average.

 

Types of IO

The average response time is directly related to the type of IO:

  • Read or Write

  • Single Block or MultiBlock

    Single block IO, as suggested by its name, reads only one at a time.
    For example, when a session waits for a single-block IO it may typically wait on the event “db file sequential read” to indicate that it is waiting for that block to be delivered.

    Multi-Block operations read more than one block at a time ranging from 2 to 128 Oracle blocks depending on the size of the block and operating system settings. Usually a multi-block request had a limit of 1Mb in size.
    For example when a session waits for a multi-block IO it may typically wait on the event  “db file scattered read” to indicate that it is waiting for those blocks to be delivered.

  • Synchronous or Asynchronous.

    Synchronous (Blocking) operations wait for the hardware to complete the physical I/O, so that they can be informed of and manage appropriately the success or failure of the operation (and to receive the results in the case of a successful read). The execution of the process is blocked while it waits for the results of the system call.

    With Asynchronous (Non-Blocking) operations the system call will return immediately once the I/O request has been passed down to the hardware or queued in the operating system (typically before the physical I/O operation has even begun). The execution of the process is not blocked, because it does not need to wait for the results of the system call. Instead, it can continue executing and then receive the results of the I/O operation later, once they are available.

Expected thresholds for response time

A typical multi-block synchronous read of 64 x 8k blocks (512kB total) should have an average of at most 20 milliseconds before worrying about ‘slow IO’.  Smaller requests should be faster (10-20ms) whereas for larger requests, the elapsed time should be no more than 25ms.

Asynchronous operations should be equally as fast as or faster than synchronous.
Single Block operations should be as fast as or faster than multi-block

‘log file parallel write’, ‘control file write’ and ‘direct path writes’ should be no more than 15ms.

Other wait events and statistics are used to reveal if DBWR (multiple or single, with or without IO slaves) is fast enough to clean the dirty blocks.

As a rule, times higher than those stated above should usually be investigated as should any change for the worse noticed when comparing with previous timings taken.

Note: Just because a system is below these maximum thresholds does not mean that there are no more available tuning opportunities.

Response Times will vary from system to system. As an example, the following could be considered an acceptable average:

10 ms for MultiBlock Synchronous Reads
5 ms for SingleBlock Synchronous Reads
3 ms for ‘log file parallel write’

This is based on the premise that multiblock IO may require more IO subsystem work than a single block IO and that, if recommendations are followed, redo logs are likely to be on the fastest disks with no other concurrent activity.

Identifying  IO Response Time

Oracle records the response time of IO operations as the “Elapsed Time” indicated in  specific wait events and statistics.”Response time” and “elapsed time” are synonymous and interchangeable terms in this context.
Below is a list of some of the more popular wait events and their typical acceptable wait times (not an exhaustive list)
Wait Event R/W Synchronous
/Asynchronous
Singleblock/
Multiblock
Elapsed Time
(with 1000+ waits per hour)
control file parallel write

Write

Asynchronous Multi < 15ms
control file sequential read

Read

Synchronous Single < 20 ms
db file parallel read Read Asynchronous Multi < 20 ms
db file scattered read Read Synchronous Multi < 20 ms
db file sequential read Read Synchronous Single < 20 ms
direct path read Read Asynchronous Multi < 20 ms
direct path read temp Read Asynchronous Multi < 20 ms
direct path write Write Asynchronous Multi < 15 ms
direct path write temp Write Asynchronous Multi < 15 ms
log file parallel write Write Asynchronous Multi < 15 ms

Sources in Oracle identifying Response Time

10046 Trace File

      When level 8 or 12 is specified in the 10046 trace, wait events are included.

 

      The response time is specified in

ela

       field.

 

WAIT #5: nam='cell single block physical read' ela= 672 cellhash#=2520626383 diskhash#=1377492511 bytes=16384 obj#=63 tim=1280416903276618

672 microseconds = 0.672 ms

WAIT #5: nam='db file sequential read' ela= 1018 file#=2 block#=558091 blocks=1 obj#=0 tim=10191852599110

1018 microseconds => 1.018 ms

System State Dump

For each process in a system state, the wait information is included among the other process information. This will either show an active wait:  “waiting for” or a case where waiting is completed and the process is on CPU : “waited for” / “last wait for”.
  • “waiting for”
    This means that the process is currently in a wait state.
    Prior to 11g the field to look at is “seconds since wait started” which shows how long the process has been waiting on this wait event
    Starting 11gR1 the field to look at is “total” which is the total time elapsed on this wait.

    If  a process is indicated to be  “waiting for” an IO related operation and “seconds since wait started” > 0 most likely this means that the IO got “lost” and the session can be considered to be hanging. (since we have mentioned previously that an average acceptable wait would be 20ms , any IO wait of duration > 1 second is a cause for concern)

  • “last wait for” is relevant in versions prior to 11g and indicates a process that is no longer waiting (ie it is on CPU). The last wait is recorded and its wait time indicated in “wait_time” field. (In 11g, “last wait for” is replaced by “not in wait”)
    last wait for 'db file sequential read' blocking sess=0x0 seq=100 wait_time=2264 seconds since wait started=0
    file#=45, block#=17a57, blocks=1

    2264 microseconds => 2.264 ms

  •  “waited for” means the session is no longer waiting. This is used in systemstate trace starting from 11gR1.The field to look at is “total” indicating the total time waited.
    0: waited for 'db file sequential read' file#=9, block#=46526, blocks=1
    wait_id=179 seq_num=180 snap_id=1
    wait times: snap=0.007039 sec, exc=0.007039 sec, total=0.007039 sec
    wait times: max=infinite
    wait counts: calls=0 os=0

    0.007039 sec => 7.039 ms

Statspack and AWR reports

Foreground and Background Wait Events

These reports show sections detailing waits by both foreground and background operations separately. The following is an example of such a section:

                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read       2,354,428     0      8,256       4      2.6   21.2
db file scattered read           23,614     0         48       2      0.0     .1

In this report the average Response Time is shown by the Av Rd (ms) column (Average Read in Milliseconds)

Tablespace IO Stats

The Tablespace section of these reports also gives useful information from the tablespace perspective:

Tablespace                                                                      
------------------------------                                                  
                 Av       Av     Av                       Av     Buffer  Av Buf 
         Reads Reads/s  Rd(ms) Blks/Rd       Writes Writes/s      Waits  Wt(ms) 
-------------- ------- ------- ------- ------------ -------- ---------- ------- 
APPS_DATA                                                                       
     1,606,553     446     2.2     8.3       75,575       21     60,542     0.9 
                                                                                

Again the Read Response Time is indicated by the Av Rd (ms) column (Average Read in Milliseconds). The write IO time is indicated by the amount of time it is having to wait to service buffer writes which is indicated in the report by the Av Buf Wt(ms) column (Average time to write a buffer in Milliseconds)

Wait Event Histogram

      The Wait event histogram section can provide useful information regarding the spread of the write times that makes up the average. It can show if the average is made up of many writes close to the average or if it is being skewed by a few very large or small values:

                                                  % of Waits                  
                                 -----------------------------------------------
                           Total                                                
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
db file parallel read       4139    .2    .5   2.5  26.4  23.5  15.0  31.9    .1
db file parallel write      329K  88.5   4.0   2.1   1.9   2.3   1.1    .3    .0
db file scattered read     14.4K  54.3   8.5   6.1  16.6  11.5   2.6    .4

Each column indicates the percentage of wait events waited up to the time specified between each bucket. For example the waits indicated under “<16ms” are those that are greater than “8ms” but less than “16ms”.

As long as the greatest percentage of waits are in the buckets from <1ms up to 16ms then the IO performance is generally acceptable.

 

October 7, 2011 Posted by | Uncategorized | Leave a comment

ASH – Your performance tuning PAL

Hi There,

Do we have to tune our system proactively ?

If it is production system , answer is yes.Because if you run in to any problem the business loss is unimaginable(Then we always say “Ah!!We should have found it earlier “).

That being said ,  recently i was doing proactive health check of one of the production database

I  found that every morning around 8:15 AM CPU Utilization is been jumping to 100% and it remain at 100%   for 30 Mins.Load on the system is 40 on a 32 CPU Machiene.Boy!!Definitely Alarming and we have to look into it.And this has been happening everyday.

So,to quickly get a clear picture of what’s going on,I called my performance tuning PAL ..ASH!! which is really  helpful in debugging past issues.(It’s amazing how easily we can look into past using ASH)

I opted for Bottom Up Approach ( Server -> DB -> Waits -> SQLs etc..)

Step 1 : what are the wait events during that time

Select * from

(

Select instance_number,name,count(*) from

SYS.wrh$_active_session_history h, v$event_name n

where n.event_id = h.event_id

and SAMPLE_TIME between to_date(&start_time,’YYYYMMDDHH24MISS’) and to_date(&end_time,’YYYYMMDDHH24MISS’)

group by instance_number,name

order by count(*) desc

)

Where rownum < 20;

INSTANCE_NUMBER NAME                                       COUNT(*)

————— —————————————- ———-

1 read by other session                          1023

 1 latch: cache buffers chains                 1016

1 db file sequential read                         529

1 gc buffer busy acquire                          289

1 direct path write temp                          154

1 direct path read temp                           114

2 db file sequential read                          89

1 asynch descriptor resize                         26

1 direct path read                                 25

1 SQL*Net more data from client                    20

Well Clearly , Seems to be a Concurrency and Hot block Isuue ( Later got p1,p2,p3 values for those wait events,which will be useful in my further analysis

2 )  Now..Let me see what are the queries which are waiting on these events

Select * from (

Select instance_number,name,sql_id,count(*) from

SYS.wrh$_active_session_history h, v$event_name n

where n.event_id = h.event_id

and SAMPLE_TIME between to_date(20111005081000,’YYYYMMDDHH24MISS’) and to_date(20111005082500,’YYYYMMDDHH24MISS’)

and name in (‘read by other session’,’latch: cache buffers chains’)

group by instance_number,name,sql_id

order by count(*) desc

) where rownum < 20;

select distinct sql_id,session_id

from

SYS.wrh$_active_session_history h, v$event_name n

where n.event_id = h.event_id

and SAMPLE_TIME between to_date(20111005081000,’YYYYMMDDHH24MISS’) and to_date(20111005082500,’YYYYMMDDHH24MISS’)

order by 1

INSTANCE_NUMBER NAME                                     SQL_ID          COUNT(*)

————— —————————————- ————- ———-

1 read by other session                    3pq701ucawgkb        652

1 latch: cache buffers chains              3pq701ucawgkb        568

1 latch: cache buffers chains              1gxa5pcuq91sn        408

1 read by other session                                         130

1 read by other session                    4tku3gphdg45q        125

1 read by other session                    4vhk9dxxkbc2p         66

1 read by other session                    640mwfa0dnacd         44

1 latch: cache buffers chains              g5yhu2n9juap6         35

OK..Clearly  3pq701ucawgkb  and 1gxa5pcuq91sn..

Wait these are not new queries.hmm..execution plan seems not changed ( Checked sql_hash and plan_hash combination)

3 ) OK..Next thing is to check concurrency.I know these queries, these are kind of reporting queries.Why there is concurrency issue for this. Ok..let me check how many sessions are executing these queries at a time during that time span

select distinct sql_id,session_id

from

SYS.wrh$_active_session_history h, v$event_name n

where n.event_id = h.event_id

and SAMPLE_TIME between to_date(${begin_time},’YYYYMMDDHH24MISS’) and to_date(${end_time},’YYYYMMDDHH24MISS’)

There you go!! 90 concurrent sessions(Not Ash Sample count   ..multiple occurences from 60 sessions during that period) executing those queries

Knowing this application for quiet some time , this should not happen

Rang the application team and found that their parameter got accidentally changed in their application parameter file and hence it’s  triggering as many as 90 sessions

Thank you ASH and i am glad i found out this problem and fixed it proactively.

I know there would be utter chaos if this  system goes down because  of  High resource utilization..

-Pramod Garre  for ORASTAR

October 7, 2011 Posted by | Uncategorized | 1 Comment

ORACLE 11GR2 RDBMS list of known issues/Bugs

What is one of the basic activities of ORACLE DBA   –  ORACLE Upgrades

What is one of the basic things to do during upgrades  – Make sure that performance of the system is not degraded , don’t run into new issues (if not improve performance and use new features :))

keeping this in mind i added “find  list of known issues and bugs”  as an important exercise before going to new version so that i need not to face these documented issues  and spend sleepless nights post upgrade :)

After all in many cases we raise a case and apply patch why not apply it well before :)

That being said , following are the ORACLE 11gR2 RDBMS list of known issues and patches.Feel free to comment

Oracle_11g_Known_Issues

Best

- ORASTAR

September 29, 2011 Posted by | Uncategorized | Leave a comment

Follow

Get every new post delivered to your Inbox.