11g: Multiple failed login attempt can block New Application connections

In Oracle database 11g, a new feature/enhancement has potential of bringing down the application due to multiple sessions hitting database with wrong password.

As per Bug 7715339 – Logon failures causes “row cache lock” waits – Allow disable of logon delay [ID 7715339.8]

In 11g there is an intentional delay between allowing failed logon attempts to retry. After 3 successive failures a sleep delay is introduced starting
at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).

Let’s simulate this using a test case.I am using a 11.2.0.3 database.Open two or more database server  connections

Connection 1

create user amit identified by amit;
grant connect to amit;
alter user amit profile MONITORING_PROFILE;

MONITORING_PROFILE has failed_login_attempts set to unlimited.

Create two scripts

$cat 1.sh
sqlplus -s amit/test123 <<EOF
EOF
$cat lock_db.sh
#!/bin/bash
for i in {1..1000}
do
nohup sh 1.sh &
done

I have set loop to 1000 which you can increase depending on your process limit. This is test database with process parameter set to 150. Execute script

sh lock_db.sh 

Now if you go to Connection 2, you will see following issues

a) Database will not allow fresh connections for short time as you will get ORA-00020: maximum number of processes (150) exceeded errors.
After that database connections will be possible
b) You cannot connect to AMIT user for considerable time even with correct password. All sessions will be waiting on library cache lock

 select event,count(*) from V$session group by event order by 2;

EVENT								   COUNT(*)
---------------------------------------------------------------- ----------
Streams AQ: waiting for time management or cleanup tasks		  1
VKTM Logical Idle Wait							  1
Space Manager: slave idle wait						  1
SQL*Net message from client						  1
ges remote message							  1
ASM background timer							  1
GCR sleep								  1
smon timer								  1
pmon timer								  1
Streams AQ: qmn coordinator idle wait					  1
asynch descriptor resize						  1
Streams AQ: qmn slave idle wait 					  1
PING									  1
class slave wait							  2
gcs remote message							  2
DIAG idle wait								  2
wait for unread message on broadcast channel				  2
rdbms ipc message							 18
library cache lock							101

select
 distinct
   ses.ksusenum sid, ses.ksuseser serial#, ses.ksuudlna username,KSUSEMNM module,
   ob.kglnaown obj_owner, ob.kglnaobj obj_name
   ,lk.kgllkcnt lck_cnt, lk.kgllkmod lock_mode, lk.kgllkreq lock_req
   , w.state, w.event, w.wait_Time, w.seconds_in_Wait
 from
  x$kgllk lk,  x$kglob ob,x$ksuse ses
  , v$session_wait w
where lk.kgllkhdl in
(select kgllkhdl from x$kgllk where kgllkreq >0 )
and ob.kglhdadr = lk.kgllkhdl
and lk.kgllkuse = ses.addr
and w.sid = ses.indx
order by seconds_in_wait desc
/

      SID    SERIAL# USERNAME	 MODULE 			     OBJ_OWNER	     OBJ_NAME		LCK_CNT  LOCK_MODE   LOCK_REQ STATE		  EVENT 			  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ----------------------------------- --------------- --------------- ---------- ---------- ---------- ------------------- ------------------------------ ---------- ---------------
       153	   77		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	36	  101		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	16	  137		testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
       143	  203		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	26	  261		 testbox01		     84 		      1 	 3	    0 WAITED KNOWN TIME   library cache lock			119		  3

You will notice that seconds_in_wait value will increase . Also ‘alter system kill session’  takes long time to kill session (it seems as if hung). As per bug ,any operation involving row cache lock for that user will not be allowed. I confirmed by trying to reset profile to Default and session was stuck.

Till all these sessions are cleared from database, you will not be allowed to login. This can severy impact applications as fresh connection would not be possible. Pre-existing sessions will work
If you repeat same experiment in 10g, you will not face similar issue.

To disable this behavior i.e no sleep delay set following event

event=”28401 trace name context forever, level 1″

You can set it dynamically too using

alter system set events ‘28401 trace name context forever, level 1’;

To go back to original behavior

alter system set events ‘28401 trace name context off’;

Bug 7715339 mentions that if FAILED_LOGIN_ATTEMPTS is set to some value, then further attempts to log in will then correctly fail immediately with no delay

I tested this and found that this is not entirely true as my script was able to again spawn 101 sessions which were waiting on library cache lock again. Fresh connection with correct password was again stuck. DBA_USERS view was showing that account is locked. Though here the session cleared in quick time as compared to test case involving failed_login_attempts to unlimited.