Friday, December 7, 2007

Solaris 10/Oracle: Fixing ORA-27102: out of memory Error

Symptom:

As part of a database tuning effort you increase the SGA/PGA sizes, and Oracle greets you with an ORA-27102: out of memory error, even though the system has enough free memory to serve Oracle's needs.
SQL> startup
ORA-27102: out of memory
SVR4 Error: 22: Invalid argument

Diagnosis
$ oerr ORA 27102
27102, 00000, "out of memory"
// *Cause: Out of memory
// *Action: Consult the trace file for details

Not so helpful. Let's look at the alert log for some clues.
% tail -2 alert.log
WARNING: EINVAL creating segment of size 0x000000028a006000
fix shm parameters in /etc/system or equivalent

Oracle is trying to create a shared memory segment of roughly 10.9 GB (0x28a006000 bytes; the exact size depends on the SGA/PGA settings), but the operating system (Solaris in this example) responded with an invalid argument (EINVAL) error. There is also a little hint about fixing the shm parameters in /etc/system.

Prior to Solaris 10, the shmsys:shminfo_shmmax parameter had to be set in /etc/system to the maximum shared memory segment size that could be created. 8 MB is the default value on Solaris 9 and prior versions, whereas one quarter of physical memory is the default limit on Solaris 10 and later. On a Solaris 10 (or later) system, it can be verified as shown below:
% prtconf | grep Mem
Memory size: 32760 Megabytes

% id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
project.max-shm-memory
privileged 7.84GB - deny -
system 16.0EB max deny -

Now it is clear that the system is enforcing the default limit of roughly one quarter of physical memory (7.84 GB in this scenario), whereas the application (Oracle) is trying to create a shared memory segment of almost 11 GB, larger than that limit. Hence the failure.

So, the solution is to configure the system with a value large enough for the shared segment being created, so Oracle succeeds in starting up the database instance.

On Solaris 9 and prior releases, it can be done by adding the following line to /etc/system, followed by a reboot for the system to pick up the new value.

set shmsys:shminfo_shmmax = 0x000000028a006000

However, the shminfo_shmmax parameter was made obsolete with the release of Solaris 10, and Sun doesn't recommend setting it in /etc/system even though it still works as expected.

On Solaris 10 and later, this value can be changed dynamically on a per-project basis with the resource control facilities. This is how it is done:
% prctl -n project.max-shm-memory -r -v 10G -i project 3

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
project.max-shm-memory
privileged 10.0GB - deny -
system 16.0EB max deny -

Note that changes made with the prctl command on a running system are temporary and will be lost when the system is rebooted. To make the change permanent, create a project with the projadd command as shown below:
% projadd -p 102  -c 'eBS benchmark' -U oracle -G dba  -K 'project.max-shm-memory=(privileged,10G,deny)' OASB
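If a suitable project already exists (the "default" project seen earlier, say), the same resource control can be written permanently to /etc/project with projmod instead of creating a new project; something along these lines should do it (-s substitutes any existing value for the attribute):

% projmod -s -K 'project.max-shm-memory=(privileged,10G,deny)' default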

Finally, confirm that the project was created with the projects -l or cat /etc/project commands.
% projects -l
...
...
OASB
projid : 102
comment: "eBS benchmark"
users : oracle
groups : dba
attribs: project.max-shm-memory=(privileged,10737418240,deny)

% cat /etc/project
...
...
OASB:102:eBS benchmark:oracle:dba:project.max-shm-memory=(privileged,10737418240,deny)

With these changes in place, Oracle starts the database up normally.
SQL> startup
ORACLE instance started.

Total System Global Area 1.0905E+10 bytes
Fixed Size 1316080 bytes
Variable Size 4429966096 bytes
Database Buffers 6442450944 bytes
Redo Buffers 31457280 bytes
Database mounted.
Database opened.

Wednesday, November 28, 2007

Faking ASM Disks and Configuration on Windows

Hard Disk Faking Methods
ASM takes units of physical storage and wraps them up into a logical volume which is then presented to databases as a single chunk of available storage. To create an ASM storage array on a laptop or desktop PC, therefore, you need to be able to present Oracle with a number of 'chunks' of physical storage. One way to do that would be to attach half a dozen hard disks to your laptop, of course... but that would make for an, er... interesting laptop! A slightly more practical way of doing it would be to re-partition your existing hard disk and to carve out half a dozen fresh, unformatted logical partitions. That's certainly do-able, but it's very intrusive and disruptive: chances are, you'd rather not mangle your existing PC quite so dramatically.

There is, of course, always the option of installing VMware and creating half a dozen virtual hard drives for a brand new virtual PC. That is the sort of approach I use a lot myself for my Laptop RAC articles, after all, so it is definitely do-able -and even more so now that VMware Server is available for free. Unfortunately, installing Windows into a virtual PC is supposed to require an additional full license from Microsoft (meaning: pay for a fresh copy), and in any case virtual PCs run slower than their physical counterparts. Slow and expensive are not ideal qualities for this sort of learning experience!

So, in this article, I am going to fake hard disks using much the same sort of technique as I used in my Laptop ASM on Linux article: create a set of 'solid' files and get Oracle to recognise them as usable storage, as though they were hard disks. In Linux, you make this approach work by using operating system loopback devices, and the operating system itself is thus fooled into thinking it has new hard disks at its disposal. There is no equivalent approach for Windows, however. Instead, we have to use an init.ora parameter that fools the Oracle instance into thinking a handful of files represent physical storage. This approach will certainly work, but it relies on a hidden initialisation parameter, and is therefore completely, 100% an absolute no-no in a production setting!

It's easier to do in practice than to describe in theory, so let's begin!

Creating the Fake Hard Disks
As part of the standard 10g Release 2 installation, Oracle ships an executable called asmtool, and it can be used to create 'solid' files -solid in the sense that they are filled with zeros, as opposed to being full of empty space. These are the sorts of files ASM demands if it is to treat them as hard disks. The utility is found in the ORACLE_HOME\bin directory, which is generally made part of your PATH during a conventional Oracle installation. You should therefore be able to invoke the tool from anywhere, and hence the following commands should work:

md c:\asmdisks
asmtool -create c:\asmdisks\disk1 1000
asmtool -create c:\asmdisks\disk2 1000
asmtool -create c:\asmdisks\disk3 1000
asmtool -create c:\asmdisks\disk4 1000
asmtool -create c:\asmdisks\disk5 1000

That lot will create five files, each 1000MB (roughly 1GB) in size, and store them in a C:\ASMDISKS directory (which could have been called anything, of course).

And that's all there is to it. You can now go about creating an ASM instance which will treat these five files as though they were five 1GB hard drives.

Creating the ASM Instance
Traditional Logical Volumes have traditional Logical Volume Managers. ASM has a 'logical volume manager', too -but it happens to be in the form of an Oracle instance, complete with Shared Pool, Large Pool and a veritable flotilla of background processes. All up, an ASM instance will be somewhere between 64MB and 100MB in size, which is not too excessive, all things considered.

Before you can create an ASM instance on 10g Release 2, though, you must first run a script which instantiates the Oracle Cluster Registry and its associated processes & services. That's done by issuing the command:

C:\oracle\product\10.2.0\db_1\bin\localconfig add

You might wonder why you have to muck about with clustering technology on a single desktop PC, but that's just the way things are done: even single-instance, single-node, non-clustered Oracle uses a cluster registry to record the existence of ASM instances, and regular (database) instances are hard-coded to check it to find out whether any ASM instances exist that are capable of providing storage. Anyway, if you don't run that script, any attempt to start an ASM instance will fail with the error 'Cannot contact Cluster Manager', so run it now!

Now, with that done, it would be nice if we could use 10g Release 2's dedicated ASM Instance creation tool... but because we're going to have to use an unsupported hidden initialisation parameter to get the ASM instance to use our fake hard disks as though they were the real thing, we can't. The Database Configuration Assistant's new option to create ASM instances will not allow any non-standard parameters to be specified, which means, for us, it is crippled and of little use. Not so in a true production environment, of course, but for the purposes of this article, it's a non-starter.

Instead, we'll do everything at the command line. The first thing we're going to do is to create a new init.ora for the future ASM instance:

cd \oracle\product\10.2.0\db_1\database
notepad init+ASM.ora

In the new document that last command creates, enter the following initialisation parameters and values:

INSTANCE_TYPE=ASM
DB_UNIQUE_NAME=+ASM
LARGE_POOL_SIZE=8M
ASM_DISKSTRING='c:\asmdisks\*'
_ASM_ALLOW_ONLY_RAW_DISKS=FALSE


Save the modified file, and check that it hasn't accidentally been given a .txt extension (one of Notepad's favourite tricks for the unwary!)

As you can see, an ASM Instance's init.ora is pretty thin stuff! It has to be identified as an ASM instance (INSTANCE_TYPE=ASM, instead of RDBMS), because that way it won't keep expecting to open a database. It has to be given a unique name, and it must have a large pool (which is optional for regular instances). The ASM_DISKSTRING parameter tells the instance where to find its physical storage units, and in this case is simply the directory where our solid files are stored.

All of that is standard stuff (the specific value for ASM_DISKSTRING will vary in a production environment, of course). The really unusual parameter here is the last one: it starts with an underscore, which makes it a hidden parameter and thus completely unsupported. Setting it to FALSE switches off the usual restriction that only raw disks can be used, and so lets files which merely look like raw disks be used instead: it is the magic ingredient that makes this simulation work. You would never use it in a production setting, though.


This being Windows, one further thing needs taking care of before our ASM instance can run: instances on Windows need a Windows Service to be created which will provide them with their memory and threads. That always has been, and still is, created by running the oradim utility like so:

oradim -NEW -ASMSID +ASM -STARTMODE auto

The only change between this and the standard ORADIM syntax used to create 'regular' instances is that it uses a -ASMSID switch instead of a -SID one. Otherwise, it works identically, and should mean that your ASM instance is now running.

You can test whether that is indeed the case by trying to connect to it, of course:

C:\>set ORACLE_SID=+ASM
C:\>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Tue May 30 07:45:13 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> show user
USER is "SYS"


Notice I have to set the right ORACLE_SID first, but otherwise you connect to ASM instances exactly as you would to a normal instance -though you have to do it as SYS, because ASM instances can't consult a data dictionary to check up on ordinary users and their passwords. Anyway: we know this instance is running because otherwise, we'd have gotten a message that said, 'Connected to an idle instance'. But if you want incontrovertible proof:

SQL> select status from v$instance;

STATUS
------------
STARTED

Which goes to prove, incidentally, that a lot of the V$ views you are used to with standard instances work perfectly well in ASM instances, too.

Creating the ASM Diskgroup
Once we have an ASM instance running, we can check whether it knows about our fake hard disks, because if it doesn't know about them now, it won't be able to turn them into usable storage later:

set pagesize 3000
set linesize 132
column path format a55
select path, mount_status from v$asm_disk;

PATH MOUNT_S
------------------------------------------------------- -------
C:\ASMDISKS\DISK1 CLOSED
C:\ASMDISKS\DISK2 CLOSED
C:\ASMDISKS\DISK5 CLOSED
C:\ASMDISKS\DISK4 CLOSED
C:\ASMDISKS\DISK3 CLOSED

(Tip: if this query initially returns no rows, check that your init+ASM.ora doesn't have 'curly quotes' around the ASM_DISKSTRING value: they need to be plain old 'straight quotes', but cutting and pasting from various browsers on Windows can change that).

You'll notice all five 'disks' have been detected according to this special ASM V$ view, but are currently 'closed' -because I haven't turned them into a disk group yet. That requires that I issue this command:

create diskgroup DGROUP1 normal redundancy disk
'c:\asmdisks\disk1',
'c:\asmdisks\disk2',
'c:\asmdisks\disk3',
'c:\asmdisks\disk4',
'c:\asmdisks\disk5';

Once that command completes, the earlier query now returns this result:

select path, mount_status from v$asm_disk;

PATH MOUNT_S
------------------------------------------------------- -------
C:\ASMDISKS\DISK1 CACHED
C:\ASMDISKS\DISK2 CACHED
C:\ASMDISKS\DISK3 CACHED
C:\ASMDISKS\DISK4 CACHED
C:\ASMDISKS\DISK5 CACHED

This time, a status of 'CACHED' means the disks have been 'wrapped' into a single disk group offering roughly 5GB of raw space (normal redundancy means about half of that is usable) and are now available for databases to make use of.
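If you want to see the new disk group's raw and free space for yourself, a quick query against V$ASM_DISKGROUP in the ASM instance is all it takes; the exact figures you get back will depend on your file sizes and redundancy choice:

select name, type, total_mb, free_mb from v$asm_diskgroup;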

All that remains to do is to make sure that the ASM instance automatically "mounts" this new diskgroup every time it starts up. That is done by adding a new parameter, ASM_DISKGROUPS, to the initialisation file, and you can achieve that most simply by cutting-and-pasting these commands:

create spfile from pfile;
startup force;
alter system set asm_diskgroups=DGROUP1 scope=spfile;
startup force;

After the second of those forced startups, you should see this:

ASM instance started

Total System Global Area 79691776 bytes
Fixed Size 1247396 bytes
Variable Size 53278556 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted

...with the last line there indicating success in auto-mounting the disk group.

Migrating a Database to ASM
There is no requirement for a database to use ASM storage exclusively: it's perfectly legitimate for a database to have some of its datafiles on 'conventional' storage and some inside an ASM array, in other words. But, practically, you won't want to run in such a transitional, half-and-half state for long. ASM is supposed to make storage administration easier, not complicate things! Ultimately, therefore, you will probably want to convert a database that is using conventional storage into one that uses purely ASM -and there's only one tool which can do that (because it's the one tool that knows how to read from and write to both conventional file systems and ASM): Oracle's Recovery Manager (RMAN).

The basic principle of conversion is easy enough: you get RMAN to backup your existing database 'into' the ASM array, and then you tell RMAN to make the just-performed backup the actual production database with the new 10g command switch database to copy. This principle applies equally well if you want to move a database from one ASM storage array to another, or indeed from an ASM array back to a conventional file system... it's always the same idea: backup to your desired destination, whatever that may be, and then "switch" to the backup.

There are just two flaws with this inspirationally simple idea: RMAN never backs up tempfiles or online redo logs, so those require a bit of post-conversion manual tinkering to get right!

Apart from that, though, it's really very simple -but it's best if I work my way through the process to prove the point, rather than just claiming it to be so!


Preparing the Database
Before you begin backing up the database, you need to set a couple of its initialisation parameters to new values, as follows:

alter system set db_create_file_dest='+DGROUP1' scope=spfile;
alter system set db_recovery_file_dest='+DGROUP1' scope=spfile;
alter system set db_recovery_file_dest_size=3000M scope=spfile;
alter system reset control_files scope=spfile sid='*';

The first two commands are needed so that RMAN knows by default where to write its backups and where new online redo logs and datafiles should be created; the third simply sizes the new recovery area. The fourth command is required so that future control files aren't created in their current location: once CONTROL_FILES is reset, the DB_CREATE_FILE_DEST parameter determines where control files get created -and, in my case, that means 'inside the ASM diskgroup', which is what we're after, of course. (I'm ignoring the possible use of DB_CREATE_ONLINE_LOG_DEST_n, which takes precedence over DB_CREATE_FILE_DEST, if set).

Remember that these are settings that are applied to the real, conventional instance -so don't forget to set ORACLE_SID=orcl before trying to connect with SQL*Plus (assuming your database name is 'orcl', of course!)

Once those four commands have been issued, the next two are simple enough:

shutdown immediate
startup nomount

We want a nice, clean shutdown so that we don't have to perform recovery on an inconsistent backup of a badly shutdown database, and we need an instance for RMAN to connect to, but can't go any further than that, because we've unset the details of where to find the database's control files.

Use RMAN to Bulk Migrate
You're now ready to fire up RMAN and backup the database "into" the ASM array. The commands are really pretty standard RMAN stuff:

C:\>rman target /

restore controlfile from 'C:\oracle\product\10.2.0\oradata\orcl\control01.ctl';

alter database mount;

backup as copy database format '+DGROUP1';

switch database to copy;

alter database open;

exit;


The first command causes RMAN to copy the specified source controlfile into the +DGROUP1 disk group. It will actually copy it twice, because both the DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST parameters have been set. Once the copies are safely inside the ASM array, the next command causes the instance to mount them: the operative controlfiles are now ASM-based, not file system ones.

The next command is new in 10g: take an image copy of each of the datafiles of the database in turn. The copies are made into the ASM diskgroup, because that's what the FORMAT clause says to do.

The real magic then happens with the next command: 'switch database to copy' means 'that backup you just took should now be regarded as the operative database, not just a copy'... and, of course, that means all my datafiles are now inside the ASM array. I can therefore open the database, and quit RMAN.

Sorting out the Extra Bits
RMAN gets the bulk of the database conversion right without too much pain, I think you'll agree. But as I mentioned earlier, it doesn't know how to cope with temporary tablespaces nor with the online redo logs, as these queries executed on the newly-opened database nicely illustrate:

SQL> select name from v$controlfile;

NAME
-----------------------------------------------------
+DGROUP1/orcl/controlfile/backup.256.591786977
+DGROUP1/orcl/controlfile/backup.257.591786979

SQL> select name from v$datafile;

NAME
-----------------------------------------------------
+DGROUP1/orcl/datafile/system.258.591787015
+DGROUP1/orcl/datafile/undotbs1.261.591787141
+DGROUP1/orcl/datafile/sysaux.259.591787091
+DGROUP1/orcl/datafile/users.262.591787149
+DGROUP1/orcl/datafile/example.260.591787125

SQL> select member from v$logfile;

MEMBER
-----------------------------------------------------
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO03.LOG
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG

SQL> select name from v$tempfile;

NAME
-----------------------------------------------------
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\TEMP01.DBF

So, we see the scale of the problem.

Well, the easiest issue to tackle first is the temporary tablespace one:

alter tablespace TEMP add tempfile;

alter database tempfile 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\TEMP01.DBF' drop;

It might surprise you that you can drop temporary files from a temporary tablespace, but you've been able to do that since version 8i, so it's not a new feature! In any case, the effect of those two commands is seen easily enough:

SQL> select name from v$tempfile;

NAME
-----------------------------------------
+DGROUP1/orcl/tempfile/temp.265.591787645

Which leaves just the online redo logs to sort out. There's no easy way of migrating them: the best you can do is to create new groups inside the ASM array and drop the old ones that are outside of it:

alter database add logfile size 50m;
alter database add logfile size 50m;
alter database add logfile size 50m;

SQL> select group#, status from v$log order by group#;

GROUP# STATUS
---------- ----------------
1 INACTIVE
2 CURRENT
3 INACTIVE
4 UNUSED
5 UNUSED
6 UNUSED

You can see the original three logs, and the new three. I can't drop my original group 2, though, because it's marked as CURRENT. But I can at least start by dropping groups 1 and 3:

alter database drop logfile group 1;
alter database drop logfile group 3;
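The log switches mentioned below aren't shown explicitly; forcing them is just a matter of the usual command, repeated as necessary:

SQL> alter system switch logfile;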

In theory, a log switch or two should then cause Group 2 to become inactive:

SQL> select group#, status from v$log order by group#;

GROUP# STATUS
---------- ----------------
2 ACTIVE
4 CURRENT
5 UNUSED
6 UNUSED

One switch won't do it, because a log becomes ACTIVE before it becomes INACTIVE, and only INACTIVE logs can be dropped. Maybe one more switch?

SQL> select group#, status from v$log order by group#;

GROUP# STATUS
---------- ----------------
2 ACTIVE
4 ACTIVE
5 CURRENT
6 UNUSED

But no: now two logs are active, and so it goes on. In fact, for this database which is not in archivelog mode, I find that this is required:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area 167772160 bytes
Fixed Size 1247900 bytes
Variable Size 79693156 bytes
Database Buffers 83886080 bytes
Redo Buffers 2945024 bytes
Database mounted.
Database opened.

SQL> select group#, status from v$log order by group#;

GROUP# STATUS
---------- ----------------
2 INACTIVE
4 INACTIVE
5 CURRENT
6 UNUSED

SQL> alter database drop logfile group 2;

Database altered.

SQL> select member from v$logfile;

MEMBER
---------------------------------------------------------
+DGROUP1/orcl/onlinelog/group_4.266.591787973
+DGROUP1/orcl/onlinelog/group_4.267.591787993
+DGROUP1/orcl/onlinelog/group_5.268.591788059
+DGROUP1/orcl/onlinelog/group_5.269.591788069
+DGROUP1/orcl/onlinelog/group_6.270.591788083
+DGROUP1/orcl/onlinelog/group_6.271.591788093

6 rows selected.

It's a bit long-winded, but an instance bounce clears the spurious 'ACTIVE' status, allowing me to finally drop the last of the original log files. As the final query shows, I now have three groups, multiplexed twice (because I set both DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST earlier), safely inside my ASM array.

Finally, now that every part of my database is using the ASM array, I should remember to delete all the bits that were stored on my C: drive originally.

Migration Variations
I'll conclude this small example of database migration to ASM by pointing out its obvious deficiencies: it requires a lot of downtime, and (at least momentarily) double the amount of disk space currently occupied by your database.

However, you can minimise the downtime and the disk space requirements by migrating one tablespace at a time. Where I had backup as copy database, you could instead issue the RMAN command backup as copy tablespace BLAH, followed by switch tablespace BLAH to copy instead of my earlier switch database to copy. Instead of the entire database being in the MOUNT state for the duration, only the tablespace being copied needs to be taken offline before the copy is taken.
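As a rough, untested sketch only (USERS is just an example tablespace name), the tablespace-at-a-time variant would look something like this at the RMAN prompt:

sql 'alter tablespace USERS offline';
backup as copy tablespace USERS format '+DGROUP1';
switch tablespace USERS to copy;
sql 'alter tablespace USERS online';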

Ultimately, however, you cannot migrate to ASM without some downtime: not only is each tablespace out of commission whilst it's being migrated, but there must come a time when you want to migrate SYSTEM, and you can't take SYSTEM offline -except by getting into the MOUNT state.

I ought to mention, too, that I have glossed over two 10g new features which might cause problems: RMAN doesn't back up the block change tracking file, nor the flashback logs (assuming either is enabled, which it might not be, of course). Therefore, there is no direct way of moving an existing change tracking file or the flashback logs to an ASM array. Instead, you have to disable block change tracking and then re-enable it once DB_CREATE_FILE_DEST points at the ASM disk group and the rest of the database has migrated. Similarly, flashback logs are simply destroyed and re-created by disabling and then re-enabling flashback database: they get re-created wherever the DB_RECOVERY_FILE_DEST parameter points, of course.

Finally, don't forget that there may be a multitude of other initialisation parameters which are currently set to traditional file systems but which might now need modifying: archiving destinations, for example. You might also want your spfile inside the ASM array, in which case don't forget the create pfile from spfile and create spfile='+DGROUP1' from pfile commands. Don't forget, too, that you'll need to leave behind (in ORACLE_HOME\database) a one-line init.ora which points to the new location of the spfile with the entry SPFILE='+DGROUP1/DB_NAME/SPFILE/spfileDB_NAME.ora'.

Administering ASM with Enterprise Manager
For the last part of this article, I thought I'd show you how to get Enterprise Manager's Database Control to be aware of the ASM instance, and how to begin monitoring and managing the ASM instance with it. It's not a comprehensive guide to Enterprise Manager: that would take a lifetime to write. But hopefully, it will be enough to get you started on a course of self-motivated experimentation!

The Nature of the Problem
You launch Enterprise Manager's Database Control in 10g by pointing your web browser at http://localhost:1158/em. Ideally, you replace "localhost" there with the actual name of your PC. You'll be prompted to log on to the database, and although it's not particularly good practice in a production environment, I suggest for the purposes of this article you log on as SYS. The first time you do so, you'll have to agree to Oracle's licensing conditions (which, in a nutshell, mean 'Anything really useful contained within costs extra money'!)

At this point, you should see the Enterprise Manager/Database Control home page.

Sunday, November 25, 2007

New Parameters in 11g

1.0 Introduction

One of the first things I do whenever a new version of Oracle comes out is to check what new initialisation parameters we have been given and what old ones have been taken away. My theory is that inspecting parameters in this way is a pretty good guide to new and obsolete functionality – which is the thing we are actually worried about, of course.

There are 289 parameters listed in the 11g Enterprise Edition's V$PARAMETER view, as compared to 10g Release 2's 258. The difference is made up of 34 new parameters and 3 obsoleted ones. Having laboriously computed the differences myself, I then discovered I could just have read all about it here, but that's life!

In this article, I'm going to run through the differences and give a hint of what the changes mean to core database functionality. Because there are so many of them, I'll knock off about 7 per article: that should keep you busy for the week!

ASM_PREFERRED_READ_FAILURE_GROUPS

Obviously only of relevance if you're running ASM. Failure groups have been around since ASM was invented: they let you specify which set of disks should be used to provide redundancy to which other disks in a diskgroup. This new parameter is intended for clustered ASM instances, and specifically for 'extended clusters' -where nodes in a cluster are separated by kilometres of dark fibre. The failure group holding the redundant copy of your data might be the one physically closest to your node -and the new parameter lets your instance read from that nearby mirrored copy rather than the primary copy stored on a remote node (which would take longer to read from... and which is what you were forced to do in 10g).
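Setting it is simple enough: the value is a comma-separated list of diskgroup.failure_group pairs, set in each node's ASM instance. As a sketch only (the disk group and failure group names here are made up):

SQL> alter system set asm_preferred_read_failure_groups = 'DATA.SITEA' sid='+ASM1';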

For further details, see: this part of the official documentation

CLIENT_RESULT_CACHE_LAG & CLIENT_RESULT_CACHE_SIZE

The Client Result Cache is a new feature that allows a client to remember the results of queries that it issues repetitively. Second and subsequent submissions of the same query can fetch results from local memory instead of having to waltz back to the database and re-execute the query. These parameters govern how that client cache behaves.

A quick introduction to the entire concept of the client cache and how you can write queries to make use of it (basically, stick a /*+ result_cache */ hint in your code!) is available here.

The LAG parameter is measured in milliseconds, and defaults to 5000 (hence, 5 seconds!) After that time, repeats of earlier queries will go back to the database to get updates. Within that time, results will come from the client cached copy of the earlier results.

The SIZE parameter is measured in bytes and defaults to zero, meaning the client caching feature is switched off by default, because no client memory is available for caching. Set it to a non-zero value, however, and the feature is enabled. The parameter is a one-size-fits-all affair: all client connections can use up to this amount of local memory. You can't have some connections able to use more than others.
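Just to illustrate the hint mentioned above, a cacheable query is nothing more exotic than this (the SALES table is entirely made up for the example):

SQL> select /*+ result_cache */ count(*) from sales;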

COMMIT_LOGGING

This new parameter governs how LGWR behaves. In versions prior to 10g, whenever a user said 'commit', LGWR had to flush their redo to the redo logs immediately. This will still happen if the new parameter is set to a value of IMMEDIATE. But you can also set the parameter to a value of BATCH -and at that point, LGWR will wait to collate a chunky batch of redo before actually writing it to disk.

The usual sources are a bit thin on detail! However, the short version would appear to be that the new parameter takes 10g's COMMIT_WRITE setting, which could be specified in 8 different ways, and turns it into a simple 'on/off' affair.

By batching up redo before flushing it to disk, you can reduce waits associated with redo generation and thus improve performance and throughput. On the other hand, it means a committed transaction is not 100% guaranteed to be recoverable any more, because there's a gap between when you commit and when your transactional data is finally written to physical disk. Good for high-volume OLTP sites, probably (because if you lose a transaction because of the delayed writing, it's likely to be small enough to be repeated without too much drama. Meanwhile, the huge quantity of transactions will benefit by not forever driving LGWR into the ground).

COMMIT_WAIT

Related to the above. This parameter can accept values of NOWAIT, WAIT and FORCE_WAIT.

By setting COMMIT_LOGGING to IMMEDIATE and COMMIT_WAIT to WAIT, you appear to achieve the same thing as you could have done in 10g by setting COMMIT_WRITE to 'IMMEDIATE,WAIT' -but you do it with a parameter that doesn't require quotation marks and commas to set, which makes it rather easier to set in code, for example.
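For what it's worth, both parameters can also be set at session level, and the same behaviour can be requested for an individual transaction with the COMMIT WRITE clause that arrived in 10g; a quick sketch:

SQL> alter session set commit_logging = batch;
SQL> alter session set commit_wait = nowait;
SQL> commit write batch nowait;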

CONTROL_MANAGEMENT_PACK_ACCESS

Standard Edition users may rejoice at this point. This parameter can take the values NONE, DIAGNOSTIC or DIAGNOSTIC+TUNING. Set it to NONE and the Automatic Workload Repository, ADDM and all the other Enterprise Edition-only advanced tuning features introduced in 10g will no longer be accessible in Enterprise Manager (the links still work, but you get sent to a nag-screen ("You haven't bought the license but we think you should"!) instead of seeing any real, meaningful content).

Contrary to some reports doing the rounds on the Internet, however, the parameter does NOT switch off MMON, the background process responsible for waking up every hour and populating the AWR tables you aren't allowed to query:

SQL> show parameter control_management 
 NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 control_management_pack_access       string      DIAGNOSTIC+TUNING  
 
SQL> ! 
oracle@plutonium$ ps -ef | grep mmon 
oracle    6546     1  0 09:46 ?        00:00:01 ora_mmon_orcl 
oracle    7285  7245  0 09:51 pts/1    00:00:00 grep mmon 
 
oracle@plutonium$ exit 
exit   
 
SQL> alter system set control_management_pack_access='NONE'; 
System altered. 
 
SQL> ! 
oracle@plutonium$ ps -ef | grep mmon 
oracle    6546     1  0 09:46 ?        00:00:02 ora_mmon_orcl  
oracle    7356  7336  0 09:51 pts/1    00:00:00 grep mmon 
 
oracle@plutonium$ exit 
exit 
 
SQL> startup force
ORACLE instance started.
Total System Global Area  422670336 bytes 
Fixed Size                  1300352 bytes  
Variable Size             293603456 bytes 
Database Buffers          121634816 bytes 
Redo Buffers                6131712 bytes 
Database mounted. 
Database opened. 
 
SQL> ! 
oracle@plutonium$ ps -ef | grep mmon  
oracle    7391     1  2 09:51 ?        00:00:01 ora_mmon_orcl 
oracle    7543  7523  0 09:52 pts/1    00:00:00 grep mmon 

Note how, in this example, MMON is on; I change the parameter value, and MMON is still on; I bounce my instance, and MMON stays on regardless!

In short, the effect of the parameter is simply to switch Enterprise Manager into a 'license-safe' configuration, and that's it. It does nothing to stop your server resources being consumed to no purpose.

The only other slight fly in the ointment is that the default value for this parameter is DIAGNOSTIC+TUNING... which means that a whole slew of expensive features you're not supposed to touch are still switched on by default... which in turn means that you're still in some jeopardy of inadvertently breaching the terms of your Oracle license and rendering yourself liable to any upgrade fees Oracle may feel inclined to demand during a license audit!



DB_LOST_WRITE_PROTECT

A 'lost write' is what happens when Oracle flushes a data block to disk and the file system reports the write has taken place... but in fact it's lying through its teeth and the write never actually completed. The trouble starts when Oracle next reads the block: if it thinks the block is 'fresh' and contains the data it would have contained had the write really happened, Oracle will rely on its contents as being up-to-date. Use those contents to update something else and you've just corrupted that something else.

The parameter takes values of TYPICAL, FULL or NONE. NONE is the default, meaning no lost-write protection takes place. TYPICAL means the database protects read-write tablespaces. FULL means it protects read-only tablespaces as well as read-write ones.

'Protection' in this context means recording, in the log buffer, the fact that a block is read and the SCN of the block being read. If you subsequently ever do recovery, therefore, you will be able to compare the SCN the block should be at (as recorded in the redo) and the SCN it is actually at (as recorded in the block itself). If the block turns out to be older than it ought to be, you know you've got a lost write on your hands. More to the point, Oracle will know and warn you with a new ORA-00752 error message.

There's no fix for a lost-write problem, apart from restoring the database from a backup and performing recovery. But at least you will know what the problem is with a new degree of certainty. Of course, this certainty comes at a cost: flooding your log buffer with details of every block being read. Expect to clobber your database's performance as a result by around 10-20%. The feature is especially intended for Data Guard environments where no data loss or corruption can be tolerated and a performance hit is a price worth paying. It doesn't have to be used in a Data Guard setting, though... but I'm not sure I'd chuck that much performance out of the door for it!
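If you decide the trade-off is worth it, enabling the feature is a one-liner; writing it to the spfile and picking it up at the next restart is the cautious way to do it:

SQL> alter system set db_lost_write_protect = typical scope=spfile;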

The official documentation has more to say on the topic, of course.

DB_SECUREFILE

Large Objects (BLOBs and CLOBs) can now be encrypted and compressed within the database, provided they are stored in an ASSM tablespace (so ASSM stops being quite so evil in 11g!). LOBs which have the ability to be compressed and encrypted are called 'securefiles', and the DB_SECUREFILE parameter controls whether LOBs are created as securefiles automatically or not. LOBs which are not created as 'securefiles' are now said to be 'basicfiles' -which is what you've been using in versions 8.0 to 10g.

The parameter takes values NEVER, PERMITTED, ALWAYS and IGNORE, and the default value is PERMITTED. Never means you can never create securefile LOBs; Permitted means you can create securefile LOBs but you'll have to specify the fact manually at the time you create them; Always means all LOBs will automatically be created as secure, even if you forget to ask for it. Ignore is a bit like NEVER: practically, you won't be able to create securefile LOBs. But whereas an attempt to do so when NEVER is set will cause an error, an attempt to do so when IGNORE is set will just be silently ignored.

Manually requesting a securefile LOB looks like this:

SQL> create table t(
2 col1 blob,
3 col2 varchar2(10))
4 lob(col1) store as securefile;
create table t(
*
ERROR at line 1:
ORA-43853: SECUREFILE lobs cannot be used in non-ASSM tablespace "SYSTEM"

You'll note that ASSM is a requirement for this new feature (and that will teach me, trying to create a user table in SYSTEM tablespace!)

SQL> create table t(
2 col1 blob,
3 col2 varchar2(10))
4 tablespace users
5 lob (col1) store as securefile (compress);

Table created.

SQL> select * from t;
no rows selected

This time, the table creation works fine -and you'll notice another new feature in 11g: you can select from a table containing BLOBS in 11g without the select statement blowing up!

DB_ULTRA_SAFE

This parameter is an admission by Oracle that incrementally adding data protection to their database over several versions eventually ends up with a pile of parameters that it's difficult to get your head around!

As things stand, we have DB_BLOCK_CHECKING (which goes back to at least 8.0 days, I think); DB_BLOCK_CHECKSUM (8i); and now, in 11g, we have DB_LOST_WRITE_PROTECT. All three parameters are designed to detect corrupt or otherwise iffy blocks of data. Setting three parameters is a pain; not being quite sure of the interactions between them is potentially a major problem.

Enter DB_ULTRA_SAFE. It takes values of OFF, DATA_ONLY or DATA_AND_INDEX, and OFF is the default. If set to DATA_ONLY, you've actually just set DB_BLOCK_CHECKING to MEDIUM, DB_BLOCK_CHECKSUM to FULL and DB_LOST_WRITE_PROTECT to TYPICAL. Setting the new parameter to DATA_AND_INDEX means you've done almost the same, except that DB_BLOCK_CHECKING is now set to FULL rather than MEDIUM.

In other words, you set one parameter to effectively set three others under the hood. It's a bit like setting FAST_START_MTTR_TARGET and thereby causing values for LOG_CHECKPOINT_INTERVAL and FAST_START_IO_TARGET to be computed for you. Less hassle, in a word.
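As a minimal sketch, then, turning the whole lot on looks like this; it's a static parameter, so it goes into the spfile and only takes effect once the instance has been bounced (after which show parameter db_block will confirm the derived settings):

SQL> alter system set db_ultra_safe = data_and_index scope=spfile;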

I'll just mention, though, that personally I find it a real hassle to remember that ULTRA_SAFE does have an underscore whereas SECUREFILE doesn't! Can we have some consistency in word-split policy, please, Oracle?!

Additionally, I find the new 11g fashion for having parameters with very wordy arguments annoying. ULTRASAFE could have had values of OFF, PART and FULL (or 0, 1 and 2)... but instead, we are saddled with 'DATA_AND_INDEX', which is a pain to type. 10g did it better with things like 'BASIC', 'TYPICAL' and 'FULL' for STATISTICS_LEVEL, for example. Clearly, new staff have been hired in the meantime... but it's not an improvement!

DDL_LOCK_TIMEOUT

OK, rant over: This one I like! It's measured in seconds, takes values from 0 to 1,000,000 and says how long a DDL statement should wait to acquire the necessary exclusive table locks before blowing itself up. In earlier versions where the behaviour was not configurable, a DDL statement would immediately blow up if it couldn't acquire the necessary locks. This remains the default behaviour in 11g, because the default value for the parameter is 0 seconds.

This, for example, is from 10g Release 2:

In session 1:

SYS@orcl> create table t(col1 char(4));
Table created.

SYS@orcl> insert into t(col1) values ('AAAA');
1 row created.

Note there is no commit at the end of that insert, so the transaction still holds its lock on the table. Now in session 2:

SYS@orcl> drop table t;
drop table t
           *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified

Now, in 11g, without configuring anything, we see the same sort of thing. In session 1:

SQL> create table t(col1 char(4));
Table created.

SQL> insert into t(col1) values ('AAAA');
1 row created.

...and in the second session:

SQL> drop table t;
drop table t
           *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

The error message gets a little longer, but otherwise the behaviour remains unchanged: instant blow-up. However, the second session can now do this:

alter session set ddl_lock_timeout=50;

...and the drop table command will now merely appear to hang for up to 50 seconds. If a commit is issued in session 1, thereby releasing the table lock, session 2 will automatically spring back into life and complete successfully. Although in some respects it merely extends the ability we've had for a while to specify a WAIT time when acquiring locks for DML, it's nice to have the same functionality for DDL, too.

DIAGNOSTIC_DEST

Another nice one! In previous versions, we have had to set BACKGROUND_DUMP_DEST to be the location for process trace files and the alert log; USER_DUMP_DEST for user-requested trace files; CORE_DUMP_DEST and so on and on. With 11g, you now set this one new parameter to some path on your file system and the myriad trace files, alert logs and other diagnostic output files are then created within subdirectories automatically created within that main path.

USER_DUMP_DEST and the rest still exist, incidentally: presumably there are applications out there that expect to be able to set them. But the value set for the old parameters is silently ignored: DIAGNOSTIC_DEST takes precedence. For example:

SQL> show parameter user_dump

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest                       string      /oracle/diag/rdbms/orcl/orcl/t
                                                 race 

This is the default setting for my USER_DUMP_DEST parameter ...and it happens to be pointing at a location within the DIAGNOSTIC_DEST (which I haven't set myself, but it defaults to $ORACLE_BASE, which in my case is simply /oracle). So the _DUMP_DEST parameters 'pick up' a setting which is effectively that of the new DIAGNOSTIC_DEST parameter. Let's over-ride that behaviour:

SQL> alter system set user_dump_dest='/home/oracle';
System altered.

SQL> show parameter user_dump

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest                       string      /home/oracle

SQL> alter database backup controlfile to trace;
Database altered. 

So I've forced USER_DUMP_DEST to have a user-specified value, and the new setting has been accepted and is visible. The request to produce a tracefile backup of the control file would, in all previous versions of Oracle, have caused a file to be written to the new USER_DUMP_DEST directory -in my case, the /home/oracle directory. But not any more:

SQL> !
[oracle@plutonium ~]$ ls
database
[oracle@plutonium ~]$ pwd
/home/oracle

Definitely sitting in /home/oracle as I am, I cannot find any evidence of any text file output from Oracle at all!

[oracle@plutonium ~]$ cd /oracle/diag/rdbms/orcl/orcl/trace
[oracle@plutonium trace]$ ls
alert_orcl.log  orcl_ora_3470.trc  orcl_ora_3470.trm

But if I travel to the DIAGNOSTIC_DEST, lo! the trace file appears (and you'll notice that the alert log is also there, meaning that BACKGROUND_DUMP_DEST must pick up the new DIAGNOSTIC_DEST value, too).

This might sound trivial, but it isn't: the entire diagnostic capabilities of the database have now been wrapped up into a file-based repository, the location of which is given by this new parameter. It's grandly called the Automatic Diagnostic Repository or ADR for short. What's more, an entire management infrastructure has been put in place that lets you manage this repository: for example, there's an automatic (but configurable) retention policy, so old trace files will get automatically deleted (by the MMON background process); there's an entire command-line tool called adrci to allow you to view the repository, change its retention policy and so on; and there's a new view called V$DIAG_INFO which lets you see the repository's key locations from within the database itself. It's one of the rare 'extremely nice to have' features in 11g (in my view!), and it's most welcome.
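A quick way to see where everything now lives, for example, is to query that new view from SQL*Plus:

SQL> select name, value from v$diag_info;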

Just one minor word of warning: the new ADR is available regardless of your edition. Standard Edition users can make use of it without a problem, for example. But Standard Edition users might want to turn off MMON, because they are not licensed to use the Automatic Workload Repository that MMON spends most of its time populating. But switching off MMON in 11g would now mean loss of Automatic Diagnostic Repository functionality... choose your poison, basically!

More information and documentation available from the usual place, of course.

ENABLE_DDL_LOGGING

Weird. I do a search for this in Google... nothing. I do a search for this on the 11g documentation website... nothing! So there's not a lot to go on!!

Fortunately, the parameter's name is fairly self-explanatory: DDL statements (create table, drop table etc) will now be recorded in the Alert Log (with the exact time when they occurred) if the parameter is set to TRUE. They won't be if it's set to FALSE, which is both the default value and the behaviour in previous versions.
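Switching it on looks like it should be as simple as this (it can equally be written to the spfile, of course):

SQL> alter system set enable_ddl_logging = true;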

Interesting to note, though, that whilst setting the parameter to true causes both the time and the exact syntax of the DDL statement to be recorded in the alert log, not a whisper is to be found about which user issued the command! Seems a bit of an omission to me, but there you go: it's better than a kick in the teeth when it comes to trying to work out the precise time at which to halt an incomplete recovery.



GLOBAL_TXN_PROCESSES

This one's a bit of a snooze, I'm afraid. It's only of use in a RAC, it controls the number of GTXn background processes each instance in a RAC is allowed to run, it defaults to a value of 1, and you don't muck around with it unless you're doing distributed transactions. Its default setting means that distributed transactions no longer spill across multiple instances in a RAC without being able to share locks and other resources. In earlier versions, they could (and did) and as a result could end up 'loosely coupled', with odd results that looked anything but transactionally consistent.

It's definitely advanced exotica, so read the official story here for further details.

JAVA_JIT_ENABLED

Java has been in the Oracle database since 8i days, and by default it has always been run interpreted: the byte code is stepped through at run-time rather than compiled into native machine code. As a result, Java code run in the database has acquired a bit of a reputation for running slowly (and hence should be whipped out and thrown on an Application Server pronto!). This parameter might just change that perception, because it enables 'just-in-time' compilation of Java byte code into native machine code, together with subsequent management of that code (expiration, invalidation and so on): Java code should therefore run much faster than before because it's not having to be forever interpreted.

The parameter defaults to TRUE for Linux, though the official documentation points out that the default value for the parameter is operating-system dependent.
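If your platform defaults it to FALSE and you fancy experimenting, switching it on should be no more than this (check the platform-specific documentation first, naturally):

SQL> alter system set java_jit_enabled = true;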

The fun-and-games aspect of this arises from the fact that the just-in-time compilation is managed by an MMON slave: switch MMON off because, as a Standard Edition user, you have no use for the Automatic Workload Repository, and you've just clobbered your ability to do JIT Java compilation!

LDAP_DIRECTORY_SYSAUTH

You have been able to use LDAP directories such as Oracle Internet Directory to locate Oracle services for a long time: I documented it here a while back. You can also use an LDAP directory to authenticate ordinary users, and you've been able to do that for a while (it's nice: it saves users having different usernames and passwords for the domain and the database, for example). But you haven't been able to use an LDAP directory to authenticate privileged users -those that connect 'as sysdba' or 'as sysoper'. I wasn't actually aware of that fact until now, so you learn something every day... and now that I've just learnt it, this parameter means I can un-learn it, because if it's set to YES, then privileged users can be authenticated by an LDAP directory.
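Since it is a static parameter, switching it on means writing it to the spfile and bouncing the instance; something along these lines, with the directory-side configuration being a separate exercise entirely:

SQL> alter system set ldap_directory_sysauth = yes scope=spfile;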

MEMORY_MAX_TARGET and MEMORY_TARGET

Fun and games. The rules about what happens when you set these parameters and the older ones they're designed to replace are legion! I'm not going to document them all here!

Suffice it to say that in 9i and 10g, you were able to automate the allocation of memory by setting PGA_AGGREGATE_TARGET and (in 10g) SGA_TARGET. The SGA_TARGET parameter governed the size of the shared memory -basically, 'the instance'- and PGA_AGGREGATE_TARGET governed the size of the private, working memory areas (sort areas and so on) allocated to each user's server process.

The obvious question was 'why have two parameters', and the new answer in 11g is, 'fair enough: have one instead'. MEMORY_TARGET on its own now does the job of both older parameters and thereby automates the total memory management of an Oracle server in one step.

Targets are only targets, however, and Oracle can, if it feels it needs to, allocate more memory than the target: the MEMORY_MAX_TARGET puts a hard ceiling on the amount of memory that can be consumed, however.

Now for just some of those 'interaction rules' I mentioned earlier:

  • If you don't set MEMORY_MAX_TARGET explicitly, it gets set to be the same as MEMORY_TARGET.
  • If you set both SGA_TARGET and MEMORY_TARGET, SGA_TARGET acts as a minimum below which the SGA cannot shrink.
  • Likewise, setting both PGA_AGGREGATE_TARGET and MEMORY_TARGET means the PGA_AGGREGATE_TARGET acts as a minimum amount of PGA memory that is available.
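With those rules in mind, a minimal sketch of moving an instance over to the new model might look like this, assuming you are running from an spfile (the sizes here are entirely arbitrary, and MEMORY_MAX_TARGET is static, hence the restart):

SQL> alter system set memory_max_target = 1200M scope=spfile;
SQL> alter system set memory_target = 1000M scope=spfile;
SQL> startup force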

I suppose this is a good thing: but I'm currently working with a database that has a 100% miss rate on the library cache because its code wouldn't know a bind variable if it came up and hit it on the head with a big mallet, and using 10g's automatic memory management facilities on that database resulted in ever-larger allocations of memory to the shared pool (to absolutely no effect because it is the code that's bad, not the amount of memory) and ever-smaller amounts of buffer cache (where the memory could have actually been doing useful work). So whilst it's lovely to have automatic memory management if you're a mainstream database, if you are in any way 'odd', it's not yet subtle enough to cope... and being able to configure it with one parameter instead of two doesn't help that situation any. Just call me a grump, I suppose... but I daresay this is a much more convenient way of doing things for many.

Anyway, if you are thinking the feature can work for you, there's more information in the usual place, of course.

OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES

This one takes the prize for being the parameter with the longest name (in case you're wondering, it's 3 characters longer than anything else... and I might add that nearly all the longest-named parameters were released in 11g. Someone at Oracle Corporation needs to take concision pills, I think!)

Anyway, SQL Plans were a brilliant feature of 10g -though you had to be running Enterprise Edition and have paid for the extra diagnostic and tuning packs to be able to use it. The problem is a simple one: you fire off a piece of SQL to the database and you expect it to turn it into a perfect execution plan in a matter of a few tenths of seconds?! The fact that it usually does exactly that in so little time is nothing short of a miracle, if you ask me... but there are bound to be 'iffy' bits of SQL that confuse the hell out of it and as a result it ends up getting its plans horribly wrong. In such case, if only you could give the optimiser advanced warning! Feed it the query half an hour before you need to run it, for example, and let it think about it really hard! Chances are, with lots of time to think it over, it will come up with a better plan: and that's precisely what creating a SQL Plan in 10g is all about.

Using the SQL Tuning Advisor in 10g, you could get the optimiser to think about a SQL statement really hard. It would maybe get a chance to check if the table statistics were accurate; if not, it would re-calculate them. It would part-run the query to test whether it really was running as fast as it expected it to; if not, it would re-jig its plans and try again. It would check if histograms existed, were accurate, needed to be recalculated... and if it thought fresher histograms would help, it would calculate them. And taking all that into account, it would produce a bundle of hints called a SQL Plan, and future executions of that same SQL statement would then be able to take advantage of all that preparatory 'research'.

Well, this new feature in 11g means you don't have to run the SQL Tuning Advisor to create a SQL Plan. If the parameter is set to TRUE (it's FALSE by default), then every SQL statement thrown at the database gets a plan created for it automatically. When the same statement is subsequently re-run, it can take advantage of the plan created by the first run. As the statement gets repeated over and over, the optimiser keeps an eye on how long it runs, and updates the plan if it runs faster or slower, trying to keep the plan up-to-date as the structure of the table and the nature of its data changes over time.
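For the record, switching the capture on and having a look at what gets collected is straightforward enough; this is just a sketch of the idea:

SQL> alter system set optimizer_capture_sql_plan_baselines = true;
SQL> select sql_handle, plan_name, enabled, accepted from dba_sql_plan_baselines;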

It's a good feature, if you ask me. The bad news is, of course, it's using the same functionality you would have to buy the Enterprise Edition plus packs to make use of in 10g... and you accordingly still need the Enterprise Edition 11g. As far as I can tell from the 'Family of Products' document Oracle produces to explain licensable options, however, you don't need the Diagnostics and Tuning packs to make use of 'SQL Plan Management' (which I assume covers this bit of functionality). Don't take my word for it, of course: talk to your Oracle Sales Representative for the official version.




Saturday, November 24, 2007

ORACLE ERROR MESSAGES

Error Messages

1.1. ORA-00001: unique constraint (string.string) violated
1.2. ORA-00018: maximum number of sessions exceeded
1.3. ORA-00020: maximum number of processes (string) exceeded
1.4. ORA-00026: missing or invalid session ID
1.5. ORA-00036: maximum number of recursive SQL levels (string) exceeded
1.6. ORA-00054: resource busy and acquire with NOWAIT specified
1.7. ORA-00058: DB_BLOCK_SIZE must be string to mount this database (not string)
1.8. ORA-00059: maximum number of DB_FILES exceeded
1.9. ORA-00100: no data found
1.10. ORA-00106: cannot startup/shutdown database when connected to a dispatcher
1.11. ORA-00204 / ORA-00205 / ORA-00206 / ORA-00207 : error in reading (block string, # blocks string) of control file
1.12. ORA-00210: cannot open the specified control file
1.13. ORA-00221: error on write to control file
1.14. ORA-00227: corrupt block detected in control file.
1.15. ORA-00230: operation disallowed: snapshot control file enqueue unavailable
1.16. ORA-00265: instance recovery required, cannot set ARCHIVELOG mode
1.17. ORA-00272: error writing archive log string
1.18. ORA-00371: not enough shared pool memory, should be atleast string bytes
1.19. ORA-00381: cannot use both new and old parameters for buffer cache size specification
1.20. ORA-0470 LGWR process terminated with error
1.21. ORA-00483: During shutdown a process abnormally terminated
1.22. ORA-00724: ALTER DATABASE CONVERT command has been de-supported
1.23. ORA-600[KELTNFY-LDMINIT]
1.24. ORA-601: cleanup lock conflict
1.25. ORA-1031 Insufficient privileges
1.26. OERR: ORA 1092 Oracle instance terminated
1.27. ORA-01034 Oracle not available
1.28. ORA-01124: cannot recover data file 1 - file is in use or recovery
1.29. ORA-12547 tns lost contact
1.30. ORA-02085: database link string connects to string
1.31. ORA-01501: CREATE DATABASE failed
1.32. ORA-01502: index 'string.string' or partition of such index is in unusable
1.33. ORA-01506: missing or illegal database name
1.34. ORA-01507: database not mounted
1.35. ORA-01523: cannot rename data file to 'string' - file already part of database
1.36. ORA-27102: out of memory
1.37. ORA-29285: file write error
1.38. ORA-39700: database must be opened with UPGRADE option

1.1. ORA-00001: unique constraint (string.string) violated


Cause: An UPDATE or INSERT statement attempted to insert a duplicate key.
Action: Either remove the unique restriction or do not insert the key.
To determine if duplicate rows exist in the table:

Set Up Data:-
SQL> create table dupe_row( id number(2) );

Table created.

declare
  i integer := 1;
begin
  while ( i <= 20 ) loop
    insert into dupe_row values ( i );
    i := i + 1;
  end loop;
end;
/
commit;

Run the PL/SQL script a couple of times to generate your data. Verify that there is enough data to work with:

SELECT COUNT(1) FROM dupe_row;

-- insert one more row
insert into dupe_row values ( 99 );
commit;

METHOD 1
To find the duplicated key values:

SELECT id, COUNT(1)
FROM dupe_row
GROUP BY id
HAVING COUNT(1) > 1;

To delete the duplicates, keeping one row per key:

DELETE FROM dupe_row
WHERE rowid NOT IN (
  SELECT MIN(rowid)
  FROM dupe_row
  GROUP BY id );

METHOD 2
Delete From dupe_row A
Where A.Rowid > ANY (Select B.Rowid From dupe_row B Where B.id = A.id);
--20 rows deleted

METHOD 3
delete from dupe_row where rowid in ( select rowid from dupe_row
group by rowid, id
minus
select min(rowid) from dupe_row group by id );

--20 rows deleted.

METHOD 4
DELETE dupe_row
WHERE rowid IN
( SELECT LEAD(rowid) OVER
( PARTITION BY id
ORDER BY id )
FROM dupe_row );
--20 rows deleted.

Please note that id is the key here; it could be a single column or a composite key.

1.2. ORA-00018: maximum number of sessions exceeded



If you get this error, increase the SESSIONS initialization parameter. SESSIONS is a static parameter, so set it in the spfile (or init.ora) and restart the instance:
ALTER SYSTEM SET SESSIONS=300 SCOPE=SPFILE;

Sessions are different from processes; the default value of SESSIONS is derived from PROCESSES, so you may need to raise PROCESSES as well.
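To see how close the instance is to its limits before raising them, a quick check against v$resource_limit helps:

SQL> SELECT resource_name, current_utilization, max_utilization, limit_value
     FROM   v$resource_limit
     WHERE  resource_name IN ('sessions', 'processes');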

1.3. ORA-00020: maximum number of processes (string) exceeded



Cause: All process state objects are in use.
Action: Increase the value of the PROCESSES initialization parameter.

This happens when more sessions try to connect than the PROCESSES parameter allows.
One way to mitigate it is to create a profile for the users and set an idle time
limit on it, so that abandoned connections get cleaned up.

This could also be caused by a runaway java connection pool.
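A minimal sketch of checking and raising the limit (PROCESSES is static, so the change needs a restart; 300 is just an example value):

SQL> show parameter processes
SQL> SELECT COUNT(*) FROM v$process;
SQL> ALTER SYSTEM SET PROCESSES=300 SCOPE=SPFILE;
SQL> shutdown immediate
SQL> startup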


1.4. ORA-00026: missing or invalid session ID



Cause: Missing or invalid session ID string for ALTER SYSTEM KILL SESSION.
Action: Retry with a valid session ID.

Ensure that you are using the right session id.

SELECT sid, serial#, username FROM v$session;
ALTER SYSTEM KILL SESSION 'sid,serial#';

In a RAC environment make sure you are on the right instance and use the correct
SID and SERIAL#, or you could end up killing another user's session.

1.5. ORA-00036: maximum number of recursive SQL levels (string) exceeded


Cause: An attempt was made to go more than the specified number of recursive
SQL levels.
Action: Remove the recursive SQL, possibly a recursive trigger.

This can be caused if a routine calls itself recursively.
PSEUDO CODE:-
function fn() begin fn end; #-- This will cause ORA-00036.
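A minimal runnable sketch of the same idea, assuming a scratch schema where a throwaway function FN can be created; recursing through SQL (rather than pure PL/SQL) is what drives up the recursive SQL level:

create or replace function fn return number is
  n number;
begin
  select fn into n from dual;  -- each call opens another recursive SQL level
  return n;
end;
/
select fn from dual;           -- fails with ORA-00036 once the limit is exceeded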

1.6. ORA-00054: resource busy and acquire with NOWAIT specified



Cause: Resource interested is busy.
Action: Retry if necessary.

SELECT * FROM V$LOCKED_OBJECT; -- To determine which user is holding the lock
on the object.
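To see who is holding the lock and on which object, join V$LOCKED_OBJECT to the usual dictionary views:

SQL> SELECT s.sid, s.serial#, s.username, o.owner, o.object_name, l.locked_mode
     FROM   v$locked_object l, dba_objects o, v$session s
     WHERE  l.object_id  = o.object_id
     AND    l.session_id = s.sid;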


1.7. ORA-00058: DB_BLOCK_SIZE must be string to mount this database (not string)



Cause: DB_BLOCK_SIZE initialization parameter is wrong for the database
being mounted. It does not match the value used to create the database.
Action: Fix the value of the DB_BLOCK_SIZE parameter or mount a
database that matches the value.

1.8. ORA-00059: maximum number of DB_FILES exceeded



Cause: The value of the DB_FILES initialization parameter was exceeded.
Action: Increase the value of the DB_FILES parameter and warm start.

SQL> show parameter db_files
NAME TYPE VALUE
------------------------------------ -----------
db_files integer 200
SQL> alter system set db_files=250 scope=spfile;

System altered.
Please note that this is different from MAXDATAFILES, which is a control file
property; in order to increase MAXDATAFILES we have to recreate the control
file.


1.9. ORA-00100: no data found



Cause: An application made reference to unknown or inaccessible data.
Action: Handle this condition within the application or make
appropriate modifications to the application code. NOTE: If the
application uses Oracle-mode SQL instead of ANSI-mode SQL, ORA-01403
will be generated instead of ORA-00100.

This is how ORA-100 (NO_DATA_FOUND) is handled in a PL/SQL block:
DECLARE
  ln_no number(2) := 0;
BEGIN
  SELECT 1 INTO ln_no FROM dual WHERE 1 = 2;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    null; -- do something
END;
/

1.10. ORA-00106: cannot startup/shutdown database when connected to a dispatcher



Cause: An attempt was made to startup/shutdown database when connected to
a shared server via a dispatcher.
Action: Re-connect as user INTERNAL without going through the
dispatcher. For most cases, this can be done by connect to INTERNAL
without specifying a network connect string.
You have to be connected to the server through a dedicated connection.

SELECT server FROM v$session WHERE sid = 120;

This should show DEDICATED.

1.11. ORA-00204 / ORA-00205 / ORA-00206 / ORA-00207 : error in reading (block string, # blocks string) of control file



Cause: A disk I/O failure was detected on reading the control file.
Action: Check if the disk is online, if it is not, bring it online and
try a warm start again. If it is online, then you need to recover the
disk.
Ensure that the disk is available; this error indicates that Oracle was unable to
read the control file, which may point to media corruption.

Check the alert log and log an SR with Oracle.


1.12. ORA-00210: cannot open the specified control file



Cause: Cannot open the control file.
Action: Check to make sure the control file exists and is not locked
by some other program.

As the error indicates check if the control file exists.

1.13. ORA-00221: error on write to control file



Cause: An error occurred when writing to one or more of the control files.
Action: See accompanying messages.

Check if the control file is available. Check alert log.

1.14. ORA-00227: corrupt block detected in control file.



Cause: A block header corruption or checksum error was detected on reading
the control file.
Action: Use the CREATE CONTROLFILE or RECOVER DATABASE USING BACKUP
CONTROLFILE command.

1.15. ORA-00230: operation disallowed: snapshot control file enqueue unavailable



Cause: The attempted operation cannot be executed at this time because
another process currently holds the snapshot control file enqueue.
Action: Retry the operation after the concurrent operation that is
holding the snapshot control file enqueue terminates.

As the error indicates, another process is holding the enqueue on the snapshot
control file. Determine which process is holding the CF enqueue:

SELECT * FROM V$ENQUEUE_LOCK;

1.16. ORA-00265: instance recovery required, cannot set ARCHIVELOG mode



Cause: The database either crashed or was shutdown with the ABORT option.
Media recovery cannot be enabled because the online logs may not be
sufficient to recover the current datafiles.
Action: Open the database and then enter the SHUTDOWN command with the
NORMAL or IMMEDIATE option.

If you want to switch the database to ARCHIVELOG mode, you first have to do a
clean SHUTDOWN IMMEDIATE or SHUTDOWN NORMAL (not SHUTDOWN ABORT).
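The usual sequence then looks like this:

SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
SQL> alter database open;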

1.17. ORA-00272: error writing archive log string



Cause: An I/O error occurred while archiving a redo log file.
Action: Check that the output device is still available and correct
any device errors that may have occurred. Also, make certain that
sufficient space for archiving is available on the output device.

Check the device and the free disk space on the archive log destination directory.
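A quick way to check where the archiver is writing and whether any destination is in error:

SQL> archive log list
SQL> show parameter log_archive_dest
SQL> SELECT dest_name, status, error FROM v$archive_dest WHERE status <> 'INACTIVE';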

1.18. ORA-00371: not enough shared pool memory, should be atleast string bytes



Cause: Init.ora parameter shared_pool_size is too small
Action: Increase the parameter value

Increase the size of the shared pool.
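A minimal sketch, assuming an spfile is in use and there is enough SGA headroom for the increase (512M is just an example value):

SQL> show parameter shared_pool_size
SQL> alter system set shared_pool_size = 512M scope=both;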

1.19. ORA-00381: cannot use both new and old parameters for buffer cache size specification



Cause: User specified one or more of { db_cache_size ,
db_recycle_cache_size, db_keep_cache_size, db_nk_cache_size (where n is
one of 2,4,8,16,32), db_cache_advice } AND one or more of {
db_block_buffers, buffer_pool_keep , buffer_pool_recycle }. This is
illegal.
Action: Use EITHER the old (pre-Oracle_8.2) parameters OR the new
ones. Don't specify both. If old size parameters are specified in the
parameter file, you may want to replace them with new parameters since
the new parameters can be modified dynamically and allow you to
configure additional caches for additional block sizes. Cache advisory
can only be enabled with the new cache parameters.

db_cache_size and db_block_buffers are mutually exclusive; specify only one of
them, never both.

This error is common when you are using the /3GB switch on Windows.

1.20. ORA-0470 LGWR process terminated with error



Normally ORA-470 signals that the LGWR process terminated due to an error, and
PMON then terminates the instance with ORA-470.
A failure in one background process causes failure in all of them, and ultimately
the instance terminates.
Look for errors in the alert.log.
Examine the trace files of all the background processes (DBWR, LGWR, PMON, SMON
and so on) to determine the cause of the problem. Look for the trace files in
the bdump (background_dump_dest) directory.
If no trace files were written, the error could be at the OS level; check the
NFILE kernel parameter on Unix, and ask your sysadmin to look at OS-specific
errors.

1.21. ORA-00483: During shutdown a process abnormally terminated



Cause: One of the background processes did not exit normally at or near
the time of shutdown.
Action: Use shutdown abort.

Check the alert log for more information.

1.22. ORA-00724: ALTER DATABASE CONVERT command has been de-supported



Cause: ALTER DATABASE CONVERT command has been de-supported since Oracle
10i.
Action: No action required.

1.23.1. ORA-00823: Specified value of sga_target greater than sga_max_size



Cause: The specified value of sga_target is greater than sga_max_size.
Action: Increase sga_max_size to match up with sga_target or decrease
sga_target to match up with sga_maxsize.

1.23.2. ORA-00825: cannot set db_block_buffers if sga_target set



Cause: sga_target set with db_block_buffers set.
Action: Do not set sga_target or use new cache parameters, and do not
use db_block_buffers, which is an old cache parameter.

Ensure that you are using db_cache_size instead of db_block_buffers.

1.23. ORA-600[KELTNFY-LDMINIT]



This was an error that took a lot of googling with no results.

The final solution was to make appropriate entries in the /etc/hosts file.
My current /etc/hosts file looks like this:

127.0.0.1 localhost
127.0.0.1 localhost.localdomain

1.24. ORA-601: cleanup lock conflict



Look in the alert log; this could be an Oracle bug.

1.25. ORA-1031 Insufficient privileges



cd /u02/app/oracle/product/10.2.0/db_1/bin
or
cd $ORACLE_HOME/bin
ls -ltr
If the files do not belong to the oracle user and the dba group, change the
ownership as follows (note that su - changes your working directory, so cd back
to the bin directory before running chown):
su -
cd /u02/app/oracle/product/10.2.0/db_1/bin
chown -R oracle:dba ./*
ls -l
Now the listing should look like this:
-rwxr-x--x 1 oracle dba 1125429 Jun 3 17:21 hsodbc
-rwxr-xr-x 1 oracle dba 437 Jun 3 17:21 commonenv
-rwx------ 1 oracle dba 2931 Jun 3 17:21 onsctl
-rwxr-x--- 1 oracle dba 676324 Jun 3 17:21 ldapbind
-rwxr-xr-x 1 oracle dba 1722 Jun 3 17:21 emca
-rwxr-xr-x 1 oracle dba 5401 Jun 3 17:21 dbca

sqlplus /nolog
sql>connect sys/passwd as sysdba
sql>startup;

1.26. OERR: ORA 1092 Oracle instance terminated



Look for error messages in the alert.log.

sqlplus: error while loading shared libraries: libsqlplus.so: cannot open
shared object file: No such file or directory

Ensure that all directories are readable by the user:
chmod -R 755 $ORACLE_HOME/
Also check that the environment variable LD_LIBRARY_PATH includes
$ORACLE_HOME/lib:
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib

1.27. ORA-01034 Oracle not available



"Oracle Not Available" is a general indication that your database is not
running. Please follow the below steps and try to connect again:
WINDOWS
1. C:\> net start OracleService<SID>
2. C:\> set ORACLE_SID=<SID>
3. C:\> set ORACLE_HOME=<oracle_home_path>
4. C:\> sqlplus "sys/password as sysdba"

UNIX / Linux
1. export ORACLE_SID=<SID>
2. sqlplus "/ as sysdba"
3. startup
4. Note any error messages in the alert.log

If error still persists, check the contents of Alert log file
(background_dump_dest parameter in your INIT.ORA file). It shall give a clear
message about the problem you are facing.

1.28. ORA-01124: cannot recover data file 1 - file is in use or recovery



Please check if the database is already running.

1.29. ORA-12547 tns lost contact



This is a broad error. Ensure that all OS dependencies are
met before you go any further in troubleshooting it.

Linux
Ensure that libaio is the correct version.

HP Tru64
Ensure that all dependencies are met, in particular the kernel parameters.


1.30. ORA-02085: database link string connects to string



http://forums.oracle.com/forums/search.jspa?objID=f61&q=ORA-02085+
http://www.dbasupport.com/oracle/ora9i/naming02.shtml
http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:633023740029

1.31. ORA-01501: CREATE DATABASE failed



Cause: An error occurred during create database
Action: See accompanying errors.
Check the alert.log

1.32. ORA-01502: index 'string.string' or partition of such index is in unusable state



Cause: An attempt has been made to access an index or index partition that
has been marked unusable by a direct load or by a DDL operation
Action: DROP the specified index, or REBUILD the specified index, or
REBUILD the unusable index partition

Rebuild the partition.
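A minimal sketch of the rebuild, using hypothetical index and partition names:

SQL> ALTER INDEX scott.emp_idx REBUILD;                        -- non-partitioned index
SQL> ALTER INDEX scott.sales_idx REBUILD PARTITION p_2007_q4;  -- a single unusable partition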

1.33. ORA-01506: missing or illegal database name



Cause: No db_name INIT.ORA parameter was specified.
Action: The database name must be given in the db_name INIT.ORA
parameter.

Check the db_name setting in the initSID.ora file or the spfile.

1.34. ORA-01507: database not mounted



Cause: A command was attempted that requires the database to be mounted.
Action: If you are using the ALTER DATABASE statement via the SQLDBA
startup command, specify the MOUNT option to startup; else if you are
directly doing an ALTER DATABASE DISMOUNT, do nothing; else specify
the MOUNT option to ALTER DATABASE. If you are doing a backup or copy,
you must first mount the desired database. If you are doing a
FLASHBACK DATABASE, you must first mount the desired database.

SQL>ALTER DATABASE MOUNT;

1.35. ORA-01523: cannot rename data file to 'string' - file already part of database



Cause: During ALTER DATABASE RENAME or ALTER TABLESPACE RENAME, the new
name of a file is already present in the control file.
Action: Rename the file to a name not already being used as part of
the database.

Check if the file name is already present.
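To check whether the new name is already known to the database:

SQL> SELECT name FROM v$datafile;
SQL> SELECT name FROM v$tempfile;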

1.36. ORA-27102: out of memory



Cause: Out of memory
Action: Consult the trace file for details
On Solaris 10, see the earlier post on this blog about fixing ORA-27102: out of memory.

1.37. ORA-29285: file write error



Cause: Failed to write to, flush, or close a file.
Action: Verify that the file exists, that it is accessible, and that it is
open in write or append mode.

You may need to supply a max_linesize value larger than the default in the FOPEN
call:
UTL_FILE.FOPEN (
location IN VARCHAR2,
filename IN VARCHAR2,
open_mode IN VARCHAR2,
max_linesize IN BINARY_INTEGER)
RETURN file_type;
max_linesize = Maximum number of characters per line, including the newline
character, for this file (minimum value 1, maximum value 32767).
The default is 1024 bytes.
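A minimal sketch of opening the file with an explicit max_linesize; the directory object MY_DIR and the file name are hypothetical:

DECLARE
  f utl_file.file_type;
BEGIN
  -- MY_DIR is a hypothetical directory object the database user can write to
  f := utl_file.fopen('MY_DIR', 'report.txt', 'w', 32767);
  utl_file.put_line(f, rpad('x', 5000, 'x'));  -- longer than the 1024-byte default line size
  utl_file.fclose(f);
END;
/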

1.38. ORA-39700: database must be opened with UPGRADE option


Cause: A normal database open was attempted, but the database has not been
upgraded to the current server version.
Action: Use the UPGRADE option when opening the database to run catupgrd.sql
(for database upgrade), or to run catalog.sql and catproc.sql (after initial
database creation).

The error usually means that there is a mismatch between the datafile version
and the Oracle binary version.

One example:
Database / Datafiles => 10.1.0.1
Oracle Binary => 10.1.0.2

In that case you will get this error, telling you to upgrade the datafiles /
database.

Having said that, there are a couple of things we can do:
1. If you feel adventurous, you can start the database using the upgrade option:
SQL> startup upgrade;

http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10763/upgrade.htm#sthref136

2. Otherwise, figure out the datafile version, move the datafiles back under the
Oracle version in which they were created, and open the database there.

Please take this with a grain of salt and use your discretion.

STATSPACK SURVIVAL GUIDE

Installing and Configuring STATSPACK

Create PERFSTAT Tablespace

The STATSPACK utility requires a dedicated tablespace to hold all of its objects and data. For uniformity, it is suggested that the tablespace be called PERFSTAT, the same name as the schema owner for the STATSPACK tables. It is important to closely watch the STATSPACK data to ensure that the stats$sql_summary table is not taking an inordinate amount of space.

SQL> CREATE TABLESPACE perfstat
DATAFILE '/u01/oracle/db/AKI1_perfstat.dbf' SIZE 1000M REUSE
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 512K
SEGMENT SPACE MANAGEMENT AUTO
PERMANENT
ONLINE;

Run the Create Scripts

Now that the tablespace exists, we can begin the installation process of the STATSPACK software. Note that you must have performed the following before attempting to install STATSPACK.

  • Run catdbsyn.sql as SYS

  • Run dbmspool.sql as SYS

$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus "/ as sysdba"
SQL> start spcreate.sql

Choose the PERFSTAT user's password
-----------------------------------
Not specifying a password will result in the installation FAILING

Enter value for perfstat_password: perfstat

Choose the Default tablespace for the PERFSTAT user
---------------------------------------------------
Below is the list of online tablespaces in this database which can
store user data. Specifying the SYSTEM tablespace for the user's
default tablespace will result in the installation FAILING, as
using SYSTEM for performance data is not supported.

Choose the PERFSTAT user's default tablespace. This is the tablespace
in which the STATSPACK tables and indexes will be created.

TABLESPACE_NAME CONTENTS STATSPACK DEFAULT TABLESPACE
------------------------------ --------- ----------------------------
PERFSTAT PERMANENT
SYSAUX PERMANENT *
USERS PERMANENT

Pressing <return> will result in STATSPACK's recommended default
tablespace (identified by *) being used.

Enter value for default_tablespace: PERFSTAT

Choose the Temporary tablespace for the PERFSTAT user
-----------------------------------------------------
Below is the list of online tablespaces in this database which can
store temporary data (e.g. for sort workareas). Specifying the SYSTEM
tablespace for the user's temporary tablespace will result in the
installation FAILING, as using SYSTEM for workareas is not supported.

Choose the PERFSTAT user's Temporary tablespace.

TABLESPACE_NAME CONTENTS DB DEFAULT TEMP TABLESPACE
------------------------------ --------- --------------------------
TEMP TEMPORARY *

Pressing <return> will result in the database's default Temporary
tablespace (identified by *) being used.

Enter value for temporary_tablespace: TEMP

.....
.....
Creating Package STATSPACK...

Package created.

No errors.
Creating Package Body STATSPACK...

Package body created.

No errors.

NOTE:
SPCPKG complete. Please check spcpkg.lis for any errors.

Check the Logfiles: spcpkg.lis, spctab.lis, spcusr.lis

Adjusting the STATSPACK Collection Level

STATSPACK has two types of collection options, level and threshold. The level parameter controls the type of data collected from Oracle, while the threshold parameter acts as a filter for the collection of SQL statements into the stats$sql_summary table.

SQL> SELECT * FROM stats$level_description ORDER BY snap_level;

Level 0 This level captures general statistics, including rollback segment, row cache, SGA, system events, background events, session events, system statistics, wait statistics, lock statistics, and Latch information.
Level 5 This level includes capturing high resource usage SQL Statements, along with all data captured by lower levels.
Level 6 This level includes capturing SQL plan and SQL plan usage information for high resource usage SQL Statements, along with all data captured by lower levels.
Level 7 This level captures segment level statistics, including logical and physical reads, row lock, itl and buffer busy waits, along with all data captured by lower levels.
Level 10 This level includes capturing Child Latch statistics, along with all data captured by lower levels.

You can change the default level of a snapshot with the statspack.snap function. Passing i_modify_parameter => 'true' changes the level permanently for all future snapshots.

SQL> exec statspack.snap(i_snap_level => 6, i_modify_parameter => 'true');
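If you only want to change the default level without taking a snapshot at the same time, the STATSPACK package also provides a procedure for that:

SQL> exec statspack.modify_statspack_parameter(i_snap_level => 6);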

Create, View and Delete Snapshots

sqlplus perfstat/perfstat
SQL> exec statspack.snap;
SQL> select name,snap_id,to_char(snap_time,'DD.MM.YYYY:HH24:MI:SS')
"Date/Time" from stats$snapshot,v$database;


NAME SNAP_ID Date/Time
--------- ---------- -------------------
AKI1 4 14.11.2004:10:56:01
AKI1 1 13.11.2004:08:48:47
AKI1 2 13.11.2004:09:00:01
AKI1 3 13.11.2004:09:01:48

SQL> @?/rdbms/admin/sppurge;
Enter the Lower and Upper Snapshot ID

Create the Report

sqlplus perfstat/perfstat
SQL> @?/rdbms/admin/spreport.sql

Statspack at a Glance

What if you have this long STATSPACK report and you want to figure out if everything is running smoothly? Here, we will review what we look for in the report, section by section. We will use an actual STATSPACK report from our own Oracle 10g system.

Statspack Report Header

STATSPACK report for

DB Name DB Id Instance Inst Num Release RAC Host
------------ ----------- ------------ -------- ----------- --- ----------------
AKI1 2006521736 AKI1 1 10.1.0.2.0 NO akira

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 5 14-Nov-04 11:18:00 15 14.3
End Snap: 6 14-Nov-04 11:33:00 15 10.2
Elapsed: 15.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 24M Std Block Size: 4K
Shared Pool Size: 764M Log Buffer: 1,000K

Note that this section may appear slightly different depending on your version of Oracle. For example, the Curs/Sess column, which shows the number of open cursors per session, is new with Oracle9i (an 8i Statspack report would not show this data).

Here, the item we are most interested in is the elapsed time. We want that to be large enough to be meaningful, but small enough to be relevant (15 to 30 minutes is OK). If we use longer times, we begin to lose the needle in the haystack.

Statspack Load Profile

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 425,649.84 16,600,343.64
Logical reads: 1,679.69 65,508.00
Block changes: 2,546.17 99,300.45
Physical reads: 77.81 3,034.55
Physical writes: 78.35 3,055.64
User calls: 0.24 9.55
Parses: 2.90 113.00
Hard parses: 0.16 6.27
Sorts: 0.76 29.82
Logons: 0.01 0.36
Executes: 4.55 177.64
Transactions: 0.03


% Blocks changed per Read: 151.59 Recursive Call %: 99.56
Rollback per transaction %: 0.00 Rows per Sort: 65.61

Here, we are interested in a variety of things, but if we are looking at a "health check", three items are important:

  • The Hard parses (we want very few of them)
  • Executes (how many statements we are executing per second / transaction)
  • Transactions (how many transactions per second we process).

This gives an overall view of the load on the server. In this case, we are looking at a very good hard parse number and a fairly light system load (1 - 4 transactions per second is low).

Statspack Instance Efficiency Percentage

Next, we move onto the Instance Efficiency Percentages section, which includes perhaps the only ratios we look at in any detail:

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 99.99
Buffer Hit %: 95.39 In-memory Sort %: 100.00
Library Hit %: 99.42 Soft Parse %: 94.45
Execute to Parse %: 36.39 Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 59.15 % Non-Parse CPU: 99.31

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 10.28 10.45
% SQL with executions>1: 70.10 71.08
% Memory for SQL w/exec>1: 44.52 44.70

The three most important ones are Library Hit, Soft Parse % and Execute to Parse. All of these have to do with how well the shared pool is being utilized. Time after time, we find this to be the area of greatest payback, where we can achieve some real gains in performance.

Here, in this report, we are quite pleased with the Library Hit and the Soft Parse % values. If the library Hit ratio was low, it could be indicative of a shared pool that is too small, or just as likely, that the system did not make correct use of bind variables in the application. It would be an indicator to look at issues such as those.

OLTP System

The Soft Parse % value is one of the most important (if not the only important) ratio in the database. For a typical OLTP system, it should be as near to 100% as possible. You quite simply do not hard parse after the database has been up for a while in your typical transactional / general-purpose database. The way you achieve that is with bind variables. In a regular system like this, we are doing many executions per second, and hard parsing is something to be avoided.
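As an illustration of why bind variables drive the Soft Parse % up, compare literal SQL with bind-variable SQL (EMP/EMPNO are just the classic demo table and column, not part of the report above):

-- Literal SQL: every distinct value is a new statement, so each one hard parses
select ename from emp where empno = 7369;
select ename from emp where empno = 7499;

-- Bind variable SQL: one shared cursor; repeated executions soft parse (or better)
variable n number
exec :n := 7369
select ename from emp where empno = :n;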

Data Warehouse

In a data warehouse, we would like to generally see the Soft Parse ratio lower. We don't necessarily want to use bind variables in a data warehouse. This is because they typically use materialized views, histograms, and other things that are easily thwarted by bind variables. In a data warehouse, we may have many seconds between executions, so hard parsing is not evil; in fact, it is good in those environments.

The moral of this is ...

... to look at these ratios and look at how the system operates. Then, using that knowledge, determine if the ratio is okay given the conditions. If we just said that the execute-to-parse ratio for your system should be 95% or better, that would be unachievable in many web-based systems. If you have a routine that will be executed many times to generate a page, you should definitely parse once per page and execute it over and over, closing the cursor if necessary before your connection is returned to the connection pool.

Statspack Top 5 Timed Events

Moving on, we get to the Top 5 Timed Events section (in Oracle9i Release 2 and later) or Top 5 Wait Events (in Oracle9i Release 1 and earlier).

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Call Time
-------------------------------------------- ------------ ----------- ---------
CPU time 122 91.65
db file sequential read 1,571 2 1.61
db file scattered read 1,174 2 1.59
log file sequential read 342 2 1.39
control file parallel write 450 2 1.39
-------------------------------------------------------------
Wait Events DB/Inst: AKI1/AKI1 Snaps: 5-6

-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

This section is among the most important and relevant sections in the Statspack report. Here is where you find out what events (typically wait events) are consuming the most time. In Oracle9i Release 2, this section is renamed and includes a new event: CPU time.

  • CPU time is not really a wait event (hence, the new name), but rather the sum of the CPU used by this session, or the amount of CPU time used during the snapshot window. In a heavily loaded system, if the CPU time event is the biggest event, that could point to some CPU-intensive processing (for example, forcing the use of an index when a full scan should have been used), which could be the cause of the bottleneck.
  • Db file sequential read - This wait event is generated by single-block reads, typically index lookups or row fetches by ROWID. A large number of these waits can point to unselective indexes or poor join orders; see the more detailed discussion of this event in the "Resolving Your Wait Events" section below.
  • Db file scattered read - Next is the db file scattered read wait value. That generally happens during a full scan of a table. You can use the Statspack report to help identify the query in question and fix it.

SQL ordered by Gets

Here you will find the SQL statements that consume the most buffer gets (logical I/O), which are usually also the biggest CPU consumers:

SQL ordered by Gets DB/Inst: AKI1/AKI1 Snaps: 5-6
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> End Buffer Gets Threshold: 10000 Total Buffer Gets: 720,588
-> Captured SQL accounts for 3.1% of Total Buffer Gets
-> SQL reported below exceeded 1.0% of Total Buffer Gets

CPU Elapsd Old
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
16,926 1 16,926.0 2.3 2.36 3.46 1279400914
Module: SQL*Plus
create table test as select * from all_objects

Tablespace IO Stats

Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
TAB 1,643 4 1.0 19.2 16,811 39 0 0.0
UNDO 166 0 0.5 1.0 5,948 14 0 0.0
SYSTEM 813 2 2.5 1.6 167 0 0 0.0
STATSPACK 146 0 0.3 1.1 277 1 0 0.0
SYSAUX 18 0 0.0 1.0 29 0 0 0.0
IDX 18 0 0.0 1.0 18 0 0 0.0
USER 18 0 0.0 1.0 18 0 0 0.0
-------------------------------------------------------------

Rollback Segment Stats

->A high value for "Pct Waits" suggests more rollback segments may be required
->RBS stats may not be accurate between begin and end snaps when using Auto Undo
managment, as RBS may be dynamically created and dropped as needed

Trans Table Pct Undo Bytes
RBS No Gets Waits Written Wraps Shrinks Extends
------ -------------- ------- --------------- -------- -------- --------
0 8.0 0.00 0 0 0 0
1 3,923.0 0.00 14,812,586 15 0 14
2 5,092.0 0.00 19,408,996 19 0 19
3 295.0 0.00 586,760 1 0 0
4 1,312.0 0.00 4,986,920 5 0 5
5 9.0 0.00 0 0 0 0
6 9.0 0.00 0 0 0 0
7 9.0 0.00 0 0 0 0
8 9.0 0.00 0 0 0 0
9 9.0 0.00 0 0 0 0
10 9.0 0.00 0 0 0 0
-------------------------------------------------------------

Rollback Segment Storage

->Optimal Size should be larger than Avg Active

RBS No Segment Size Avg Active Optimal Size Maximum Size
------ --------------- --------------- --------------- ---------------
0 364,544 0 364,544
1 17,952,768 8,343,482 17,952,768
2 25,292,800 11,854,857 25,292,800
3 4,321,280 617,292 6,418,432
4 8,515,584 1,566,623 8,515,584
5 126,976 0 126,976
6 126,976 0 126,976
7 126,976 0 126,976
8 126,976 0 126,976
9 126,976 0 126,976
10 126,976 0 126,976
-------------------------------------------------------------

Generate Execution Plan for given SQL statement

If you have identified one or more problematic SQL statements, you may want to check their execution plans. Note the "Old Hash Value" from the report above (1279400914), then run the script to generate the execution plan.

sqlplus perfstat/perfstat
SQL> @?/rdbms/admin/sprepsql.sql
Enter the Hash Value, in this example: 1279400914

SQL Text
~~~~~~~~
create table test as select * from all_objects

Known Optimizer Plan(s) for this Old Hash Value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows all known Optimizer Plans for this database instance, and the Snap Id's
they were first found in the shared pool. A Plan Hash Value will appear
multiple times if the cost has changed
-> ordered by Snap Id

First First Plan
Snap Id Snap Time Hash Value Cost
--------- --------------- ------------ ----------
6 14 Nov 04 11:26 1386862634 52

Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end
snapshots specified. The values for Rows, Bytes and Cost shown below are those
which existed at the time the first-ever snapshot captured this plan - these
values often change over time, and so may not be indicative of current values
-> Rows indicates Cardinality, PHV is Plan Hash Value
-> ordered by Plan Hash Value

--------------------------------------------------------------------------------
| Operation | PHV/Object Name | Rows | Bytes| Cost |
--------------------------------------------------------------------------------
|CREATE TABLE STATEMENT |----- 1386862634 ----| | | 52 |
|LOAD AS SELECT | | | | |
| VIEW | | 1K| 216K| 44 |
| FILTER | | | | |
| HASH JOIN | | 1K| 151K| 38 |
| TABLE ACCESS FULL |USER$ | 29 | 464 | 2 |
| TABLE ACCESS FULL |OBJ$ | 3K| 249K| 35 |
| TABLE ACCESS BY INDEX ROWID |IND$ | 1 | 7 | 2 |
| INDEX UNIQUE SCAN |I_IND1 | 1 | | 1 |
| NESTED LOOPS | | 5 | 115 | 16 |
| INDEX RANGE SCAN |I_OBJAUTH1 | 1 | 10 | 2 |
| FIXED TABLE FULL |X$KZSRO | 5 | 65 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| VIEW | | 1 | 13 | 2 |
| FAST DUAL | | 1 | | 2 |
--------------------------------------------------------------------------------

Resolving Your Wait Events

The following are 10 of the most common causes for wait events, along with explanations and potential solutions:

1. DB File Scattered Read

This generally indicates waits related to full table scans. As full table scans are pulled into memory, they rarely fall into contiguous buffers but instead are scattered throughout the buffer cache. A large number here indicates that your table may have missing or suppressed indexes. Although it may be more efficient in your situation to perform a full table scan than an index scan, check to ensure that full table scans are necessary when you see these waits. Try to cache small tables to avoid reading them in over and over again, since a full table scan is put at the cold end of the LRU (Least Recently Used) list.

2. DB File Sequential Read

This event generally indicates a single block read (an index read, for example). A large number of waits here could indicate poor joining orders of tables, or unselective indexing. It is normal for this number to be large for a high-transaction, well-tuned system, but it can indicate problems in some circumstances. You should correlate this wait statistic with other known issues within the Statspack report, such as inefficient SQL. Check to ensure that index scans are necessary, and check join orders for multiple table joins. The DB_CACHE_SIZE will also be a determining factor in how often these waits show up. Problematic hash-area joins should show up in the PGA memory, but they're also memory hogs that could cause high wait numbers for sequential reads. They can also show up as direct path read/write waits.

3. Free Buffer

This indicates your system is waiting for a buffer in memory, because none is currently available. Waits in this category may indicate that you need to increase the DB_BUFFER_CACHE, if all your SQL is tuned. Free buffer waits could also indicate that unselective SQL is causing data to flood the buffer cache with index blocks, leaving none for this particular statement that is waiting for the system to process. This normally indicates that there is a substantial amount of DML (insert/update/delete) being done and that the Database Writer (DBWR) is not writing quickly enough; the buffer cache could be full of multiple versions of the same buffer, causing great inefficiency. To address this, you may want to consider accelerating incremental checkpointing, using more DBWR processes, or increasing the number of physical disks.
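A sketch of the instance-parameter side of that advice; the values are examples only:

SQL> alter system set db_writer_processes = 4 scope=spfile;  -- static: more DBWR processes after a restart
SQL> alter system set fast_start_mttr_target = 300;          -- dynamic: drives more aggressive incremental checkpointing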

4. Buffer Busy

This is a wait for a buffer that is being used in an unshareable way or is being read into the buffer cache. Buffer busy waits should not be greater than 1 percent. Check the Buffer Wait Statistics section (or V$WAITSTAT) to find out if the wait is on a segment header. If this is the case, increase the freelist groups or increase the pctused to pctfree gap. If the wait is on an undo header, you can address this by adding rollback segments; if it's on an undo block, you need to reduce the data density on the table driving this consistent read or increase the DB_CACHE_SIZE. If the wait is on a data block, you can move data to another block to avoid this hot block, increase the freelists on the table, or use Locally Managed Tablespaces (LMTs). If it's on an index block, you should rebuild the index, partition the index, or use a reverse key index. To prevent buffer busy waits related to data blocks, you can also use a smaller block size: fewer records fall within a single block in this case, so it's not as "hot." When a DML (insert/update/ delete) occurs, Oracle Database writes information into the block, including all users who are "interested" in the state of the block (Interested Transaction List, ITL). To decrease waits in this area, you can increase the initrans, which will create the space in the block to allow multiple ITL slots. You can also increase the pctfree on the table where this block exists (this writes the ITL information up to the number specified by maxtrans, when there are not enough slots built with the initrans that is specified).
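To see which class of block the waits are on, as suggested above, query V$WAITSTAT:

SQL> SELECT class, count, time FROM v$waitstat ORDER BY count DESC;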

5. Latch Free

Latches are low-level queuing mechanisms (they're accurately referred to as mutual exclusion mechanisms) used to protect shared memory structures in the system global area (SGA). Latches are like locks on memory that are very quickly obtained and released. Latches are used to prevent concurrent access to a shared memory structure. If the latch is not available, a latch free miss is recorded. Most latch problems are related to the failure to use bind variables (library cache latch), redo generation issues (redo allocation latch), buffer cache contention issues (cache buffers LRU chain), and hot blocks in the buffer cache (cache buffers chain). There are also latch waits related to bugs; check MetaLink for bug reports if you suspect this is the case. When latch miss ratios are greater than 0.5 percent, you should investigate the issue.

6. Enqueue

An enqueue is a lock that protects a shared resource. Locks protect shared resources, such as data in a record, to prevent two people from updating the same data at the same time. An enqueue includes a queuing mechanism, which is FIFO (first in, first out). Note that Oracle's latching mechanism is not FIFO. Enqueue waits usually point to the ST enqueue, the HW enqueue, the TX4 enqueue, and the TM enqueue. The ST enqueue is used for space management and allocation for dictionary-managed tablespaces. Use LMTs, or try to preallocate extents or at least make the next extent larger for problematic dictionary-managed tablespaces. HW enqueues are used with the high-water mark of a segment; manually allocating the extents can circumvent this wait. TX4s are the most common enqueue waits. TX4 enqueue waits are usually the result of one of three issues. The first issue is duplicates in a unique index; you need to commit/rollback to free the enqueue. The second is multiple updates to the same bitmap index fragment. Since a single bitmap fragment may contain multiple rowids, you need to issue a commit or rollback to free the enqueue when multiple users are trying to update the same fragment. The third and most likely issue is when multiple users are updating the same block. If there are no free ITL slots, a block-level lock could occur. You can easily avoid this scenario by increasing the initrans and/or maxtrans to allow multiple ITL slots and/or by increasing the pctfree on the table. Finally, TM enqueues occur during DML to prevent DDL to the affected object. If you have foreign keys, be sure to index them to avoid this general locking issue.
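A sketch of the remedies mentioned above, using hypothetical object names; note that raising INITRANS/PCTFREE on an existing table only affects newly formatted blocks:

SQL> ALTER TABLE app.orders PCTFREE 20 INITRANS 4;
SQL> CREATE INDEX app.orders_cust_fk_i ON app.orders (customer_id);  -- index the foreign key to avoid TM lock contention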

7. Log Buffer Space

This wait occurs because you are writing the log buffer faster than LGWR can write it to the redo logs, or because log switches are too slow. To address this problem, increase the size of the log files, or increase the size of the log buffer, or get faster disks to write to. You might even consider using solid-state disks, for their high speed.

8. Log File Switch

All commit requests are waiting for "logfile switch (archiving needed)" or "logfile switch (checkpoint incomplete)". Ensure that the archive disk is not full or slow. DBWR may be too slow because of I/O. You may need to add more or larger redo logs, and you may potentially need to add database writers if the DBWR is the problem.

9. Log File Sync

When a user commits or rolls back data, the LGWR flushes the session's redo from the log buffer to the redo logs. The log file sync process must wait for this to successfully complete. To reduce wait events here, try to commit more records (try to commit a batch of 50 instead of one at a time, for example). Put redo logs on a faster disk, or alternate redo logs on different physical disks, to reduce the archiving effect on LGWR. Don't use RAID 5, since it is very slow for applications that write a lot; potentially consider using file system direct I/O or raw devices, which are very fast at writing information.

10. Idle Event.

There are several idle wait events listed after the output; you can ignore them. Idle events are generally listed at the bottom of each section and include such things as SQL*Net message to/from client and other background-related timings. Idle events are listed in the stats$idle_event table.

Remove STATSPACK from the Database

Once you are finished with STATSPACK, you can remove its schema objects and the PERFSTAT tablespace.

sqlplus "/ as sysdba"
SQL> @?/rdbms/admin/spdrop.sql
SQL> DROP TABLESPACE perfstat INCLUDING CONTENTS AND DATAFILES;