Thursday, January 15, 2009

Oracle FGA - An Introduction

Hi
I will briefly discuss the Fine Grained Auditing (FGA) feature of Oracle 10g.
My next posts will be a 2 part discussion around FGA.
------------------------------------------------------------------------------------------------
Simple explanation on Oracle Fine Grained Auditing

What is Oracle FGA?

The traditional approach to Oracle database auditing lets you keep track of the actions users perform on objects, e.g. who selected data and from which table, but you do not know what they actually selected. This is MACRO LEVEL auditing. That is not the case with statements such as UPDATE, DELETE and INSERT: you can keep track of the changes these statements make to the database. Because a SELECT does not modify data, it neither fires a trigger nor gets logged into the archived redo logs, so you cannot retrieve it with LogMiner.
Oracle 9i introduced a new feature, Fine Grained Auditing. With this feature you can track individual SELECT statements along with the EXACT STATEMENT issued by the user. In Oracle 10g this facility was extended to cover DML statements as well.

Let's look at the following points and example:


1. Fine Grained Auditing extends Oracle's standard auditing capabilities by allowing you to audit actions based on user-defined predicates.
2. It is independent of the AUDIT_TRAIL parameter setting, and all audit records are stored in the FGA_LOG$ table rather than the AUD$ table.
3. You need to define policies to enable the auditing. Policies can be defined, dropped, enabled and disabled using the DBMS_FGA package:
3.1 ADD_POLICY
3.2 DROP_POLICY
3.3 ENABLE_POLICY
3.4 DISABLE_POLICY
4. Example
-- Apply the policy to the SAL column of the EMP table.
SQL> CONN sys/password AS SYSDBA
-- you need to log in as SYSDBA (this could prove to be a limitation in your
-- requirements)
At the SQL prompt, type the following PL/SQL code:
BEGIN
  DBMS_FGA.add_policy(
    object_schema   => 'AUD_TEST',
    object_name     => 'EMP',
    policy_name     => 'SAL_AUD',
    audit_condition => NULL,  -- equivalent to TRUE
    audit_column    => 'SAL',
    statement_types => 'SELECT,INSERT,UPDATE,DELETE');
END;
/

-- Test the auditing.
SQL>CONN aud_test/password -- here you can provide your normal username and password

SQL>SELECT * FROM emp WHERE empno = 9999;

SQL>INSERT INTO emp (empno, ename, sal) VALUES (9999, 'Riya', 10000);

SQL>UPDATE emp SET sal = 100000 WHERE empno = 9999;

SQL>DELETE emp WHERE empno = 9999;

SQL>ROLLBACK;

-- Check the audit trail.

SQL>CONN sys/password AS SYSDBA
-- you need to log in as SYSDBA (this could prove to be a limitation in your
-- requirements)
SQL>SELECT sql_text FROM dba_fga_audit_trail;
SQL_TEXT
--------------------------------------
SELECT * FROM emp WHERE empno = 9999

INSERT INTO emp (empno, ename, sal) VALUES (9999, 'Riya', 10000)

UPDATE emp SET sal = 100000 WHERE empno = 9999

DELETE emp WHERE empno = 9999

4 rows selected.
Thus, as you can see, the DBA_FGA_AUDIT_TRAIL view gives you the complete statement issued by the user to interact with the database.
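If you later want to suspend or remove the auditing, the same DBMS_FGA procedures listed above are used. A minimal sketch, run as SYSDBA against the same policy created above:

BEGIN
  -- Temporarily stop auditing under this policy
  DBMS_FGA.disable_policy(
    object_schema => 'AUD_TEST',
    object_name   => 'EMP',
    policy_name   => 'SAL_AUD');

  -- Or remove the policy altogether
  DBMS_FGA.drop_policy(
    object_schema => 'AUD_TEST',
    object_name   => 'EMP',
    policy_name   => 'SAL_AUD');
END;
/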
------------------------------------------------------------------------------------------------
Columns of DBA_FGA_AUDIT_TRAIL

Columns –

SESSION_ID
Session id of the query
TIMESTAMP
Time of the query
DB_USER
Database username who executes the query
OS_USER
OS username who executes the query
USERHOST
Name of the host machine from which the client connected to the database
CLIENT_ID
Client identifier in each Oracle session
EXT_NAME
External name
OBJECT_SCHEMA
Owner of the table or view
OBJECT_NAME
Name of the table or view
POLICY_NAME
Name of Fine Grained Auditing Policy
SCN
SCN of the query
SQL_TEXT
SQL text of the query
SQL_BIND
Bind variable data of the query
COMMENT$TEXT
Comments
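For routine monitoring, a simple query over these columns is usually enough. For example (a minimal sketch; it requires access to the DBA views):

SELECT db_user, os_user, timestamp,
       object_schema, object_name, policy_name, sql_text
FROM dba_fga_audit_trail
ORDER BY timestamp;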

Himanshu Dabir
Oracle Certified Professional - DBA

Tuesday, December 2, 2008

Simple tips to TUNE your SQL Statement

As we all know, every RDBMS works on SQL. The Oracle database also uses SQL to perform DDL, DML, DCL and TCL operations. There are various ways to write a SQL statement that achieve the same result, and each of them has its own COST of getting the result from the database. Here are a few simple tips to tune your SQL statements.
How does Oracle process a SQL statement?
Before we go ahead with tuning, let's take a look at how Oracle processes a SQL statement. There are two steps involved in processing a SQL statement:
1. Parsing
2. Execution
In the first step, Oracle breaks the submitted SQL statement into its component parts, determines the type of statement (DDL, DML, DCL) and then performs a series of checks on the statement.
Let's quickly take a look at the steps involved in parsing a SQL statement:
1. Syntax Check
Oracle will check whether the statement submitted is syntactically correct.
2. Semantic Analysis
If the syntax is correct, this step is executed. Here Oracle determines that the table(s) and column(s) referred to in the SQL statement exist, and that the user who submitted the statement has sufficient rights to perform the activity. Oracle also performs an AMBIGUITY check: if a column with the same name exists in two tables and both tables are referenced in the FROM clause of the statement, Oracle returns an error such as "column ambiguously defined".
3. Decide: has the statement already been parsed by another session?
4. If the statement has already been parsed, perform a SOFT PARSE.
5. If the statement has not been parsed, perform a HARD PARSE.
Let's understand what SOFT PARSE and HARD PARSE mean.
After completing the first two steps above, Oracle checks whether the current SQL statement has already been executed by another session. If it has, the PARSE operation skips the OPTIMIZATION and ROW SOURCE GENERATION steps. When these steps are skipped, the parse is called a SOFT PARSE, which saves a considerable amount of CPU while running queries.
If the submitted statement has never been processed, the PARSE operation has to perform all of the steps before executing the statement. This type of parsing is known as a HARD PARSE.
It is important that developers write queries that take advantage of SOFT PARSE.
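One quick way to see how much hard parsing an instance is doing is to compare the parse statistics in V$SYSSTAT (a minimal sketch; you need SELECT access to the dynamic performance views):

SELECT name, value
FROM v$sysstat
WHERE name IN ('parse count (total)', 'parse count (hard)');

A hard parse count that grows almost as fast as the total parse count usually means statements are not being shared.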
---------------------------------------------------------------------------
Identical SQL
If two SQL statements are to be shared within Oracle, they MUST BE EXACTLY THE SAME. If the two statements are an EXACT MATCH of each other, Oracle avoids the parsing step for the subsequent statement.

1. SELECT ENAME FROM EMP WHERE EMPID = 7729;
2. SELECT ENAME FROM emp WHERE EMPID = 7729;
3. SELECT ENAME FROM EMP WHERE EMPID=7729;
4. SELECT ENAME
FROM EMP
WHERE EMPID = 7729;
All of the above statements are treated as different statements, and each will be parsed before executing.
---------------------------------------------------------------------------
Writing Standard SQL
Simple rules and standards will allow more memory sharing within Oracle Database.
1. Use a single case for all SQL verbs
2. Start all SQL verbs on a new line
3. Align the verbs Left or Right within the initial SQL verb
4. Separate all words with a SINGLE space
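For example, a statement written according to these rules might look like this (illustrative only):

SELECT ename, sal
  FROM emp
 WHERE deptno = 20
 ORDER BY ename;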
---------------------------------------------------------------------------
Bind variables
Use bind variables. The values of bind variables do not need to be the same for two statements to be considered identical. Bind variables are not substituted until a statement has been successfully parsed.
Sharable SQL
1. SELECT * FROM dept
WHERE deptno = :BND1; Bind value: 7729
2. SELECT * FROM dept
WHERE deptno = :BND1; Bind value: 7345
Non-sharable SQL
1. SELECT * FROM dept
WHERE deptno = 7729;
2. SELECT * FROM dept
WHERE deptno = 7345;
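For example, in SQL*Plus the sharable form above can be executed with a bind variable (a minimal sketch):

VARIABLE bnd1 NUMBER
EXEC :bnd1 := 7729
SELECT * FROM dept WHERE deptno = :bnd1;

EXEC :bnd1 := 7345
SELECT * FROM dept WHERE deptno = :bnd1;  -- same text, so the parsed cursor is reused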
---------------------------------------------------------------------------
Table Aliases
Use table aliases and a standard approach to table aliases. If two otherwise identical SQL statements differ only because the same table has two different aliases, the SQL is different and will not be shared.
Prefix all column names with their table aliases when more than one table is involved in a query. This reduces parse time and prevents future syntax errors if someone adds a column to one of the tables with the same name as a column in another table.
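For example (illustrative; every column is prefixed with its table alias):

SELECT E.ename, D.dname
FROM emp E, dept D
WHERE E.deptno = D.deptno;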
---------------------------------------------------------------------------
WHERE Clause
Look out for WHERE clauses that cannot use indexes at all. Even if an index is defined on a column referenced in the WHERE clause, Oracle will ignore the index when a function or operation is applied to that column. You can often rewrite the WHERE clause so that it uses the index and still returns the same result.
Examples:
1.
SELECT acc_nm, amt
FROM bank_account
WHERE SUBSTR (acc_nm,1,7) = 'SAVINGS';

There is an index on the ACC_NM column, but because we apply SUBSTR to that column, Oracle will ignore the index. Try using the LIKE operator here to get the same result:

SELECT acc_nm, amt
FROM bank_account
WHERE acc_nm LIKE 'SAVINGS%';
---------------------------------------------------------------------------
HAVING Clause
The HAVING clause filters rows AFTER all the rows have been fetched by the query. Avoid using a HAVING clause in a SELECT statement wherever possible; use HAVING only when you need to filter on summary (aggregate) values.
Example:
SELECT deptno, AVG(sal)
FROM emp
GROUP BY deptno
HAVING deptno != 20;
This statement can be written as
SELECT deptno, AVG(sal)
FROM emp
WHERE deptno != 20
GROUP BY deptno;
---------------------------------------------------------------------------
DECODE
Consider using DECODE to avoid having to scan the same rows repetitively or join the same table repetitively. DECODE is not necessarily faster as it depends on your data and the complexity of the resulting query. Also, using DECODE requires you to change your code when new values are allowed in the field.
Example:

SELECT COUNT (RU.user_id), RU.regionid, R.reg_nm
FROM reg_user RU, region R
WHERE RU.regionid IN (1000,1001,1002,1005)
AND RU.regionid = R.regionid
GROUP BY RU.regionid, R.reg_nm;

This statement can be rewritten to eliminate the join to the REGION table:

SELECT COUNT (user_id),
DECODE(regionid, 1000, 'USA', 1001, 'CANADA', 1002, 'UNITED KINGDOM', 1005, 'INDIA') reg_nm
FROM reg_user
WHERE regionid IN (1000,1001,1002,1005)
GROUP BY regionid;
---------------------------------------------------------------------------
UNION ALL
Determine whether a UNION ALL will suffice in place of a UNION.
How does UNION work?
UNION forces all rows returned by each portion of the UNION to be sorted and merged, and duplicates to be filtered out, before the first row is returned.
How does UNION ALL work?
UNION ALL simply returns all rows, including duplicates, and does not have to perform any sort, merge or filter. If you are not concerned about duplicates, or your tables do not contain duplicates, then UNION ALL is a much more efficient way to fetch the data.
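For example (illustrative; both statements list department numbers, but only the first sorts, merges and removes duplicates):

SELECT deptno FROM emp
UNION
SELECT deptno FROM dept;

SELECT deptno FROM emp
UNION ALL
SELECT deptno FROM dept;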
---------------------------------------------------------------------------
DISTINCT v/s EXISTS
The DISTINCT qualifier is often used in the SELECT list of queries that retrieve data from the owner side of a 1-M (one-to-many) relationship. DISTINCT forces Oracle to fetch all rows satisfying the table join and then sort and filter out duplicate values. EXISTS is a faster alternative, because once the subquery has been satisfied, the Oracle optimizer realizes there is no need to proceed further and the next candidate row can be fetched.
Example: Find the departments (number and name) that have at least one employee working in them.
SELECT DISTINCT deptno, dname
FROM dept D, emp E
WHERE D.deptno = E.deptno;
Consider the below query to get the same result:
SELECT deptno, dname
FROM dept D
WHERE EXISTS (
SELECT 'X'
FROM emp E
WHERE E.deptno = D.deptno);
Also, try to avoid combining DISTINCT with GROUP BY in a SELECT statement.
---------------------------------------------------------------------------
Views
Lastly, and just as importantly, take a look at your views as well. Views are SELECT statements, so do not forget to tune them either. The SELECT statement inside a view can be tuned in the same way as any other SELECT statement.

Monday, December 1, 2008

GOOD IDEAS about Database Backups

I received this article by email from Database Specialists (www.dbspecialists.com). It was written by Iggy Fernandez, a member of the Database Specialists team, as his response to a question about best practices for backups.
-------------------------------------------------------------

The following practices apply to all database backups, not just Oracle backups:
1. Establish a service level agreement (SLA) for backups and recovery. Clearly document the maximum amount of data loss that is permitted, backup retention policies, and how much time is allowed for database recovery.
2. Document the backup methodology and have it formally reviewed. Publish the documentation to an audience that includes the database owner.
3. Periodically test the recoverability of backups. Typically, this is done in a testing environment. A recovery test establishes how long it will take to recover the database.
4. Be prepared to recover from user error, for example when a user inadvertently updates or deletes the wrong data.
5. Have a contingency plan that covers damage to the operating system or hardware; be prepared to move the database to another server.
6. Ensure the safety of backups. The traditional method is to send backup tapes offsite. Newer methods involve backups to a backup appliance over the network. A combination of near-line (disk) and far-line (tape or network) backups can be used. Near-line backups reduce the time needed for recovery, while far-line backups increase the safety of backups. The tape library should not be a single point of failure; ensure that data can be recovered even if the tape library is damaged.
7. Retain multiple backups in case one set of backups is damaged.
8. Backup scripts should incorporate error checking and an adequate amount of logging. The logs should be retained for an appropriate time. Notification of backup failures should be sent to the database administrators. Backup failures should be formally investigated.
9. Reports of backup successes and failures, the amount of data backed up, and the time it took to perform backups should be generated and sent to an audience that includes the database administrators and the database owner.
10. Changes to any aspect of backup procedures, whether temporary or permanent, should be performed under formal change control procedures.

------------------------------------------------------------------------------------------------
The following practices apply specifically to Oracle database backups:
1.
Use Recovery Manager (RMAN) for backups. The advantages of RMAN are so numerous and valuable that it is hard to justify not using it. For example, RMAN checks data blocks for corruption while it is making a backup, single-block recovery is possible with RMAN, and so on. Backups performed using third-party software such as Network Appliance Snapshot can be integrated with RMAN.
2. Prevent unlogged operations in databases that use ARCHIVELOG mode; use the ALTER DATABASE FORCE LOGGING command to do this.
3. Back up all aspects of the database, including the archived redo logs (for databases that use ARCHIVELOG mode), the control file, and the parameter file (spfile); see the sketch after this list.
4. Create logical backups (exports) to supplement physical backups. This creates a certain amount of protection from logical damage, such as data entry errors. Use a setting such as FLASHBACK_TIME to ensure the consistency of the exported data.
5. Leverage Oracle's flashback features by increasing the value of UNDO_RETENTION from the default value of 15 minutes to a more appropriate value such as 24 hours. This allows recovery from logical damage without having to resort to physical backups.
6. Databases that run in ARCHIVELOG mode should set ARCHIVE_LAG_TARGET to an appropriate value, such as 900 seconds (15 minutes), to control the maximum data loss.
7. Incorporate the use of techniques that check for data corruption. These include initialization parameters, such as DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT, and commands that scan the database, such as VALIDATE DATABASE. Oracle Database 11g introduced a single parameter called DB_ULTRA_SAFE that controls the values of DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT.
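As a minimal sketch of points 1 to 3 above (run the first command in SQL*Plus as SYSDBA and the remaining commands in an RMAN session; a real backup script would add error checking, logging and retention settings):

SQL> ALTER DATABASE FORCE LOGGING;

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
RMAN> BACKUP CURRENT CONTROLFILE;
RMAN> BACKUP SPFILE;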

Tuesday, November 11, 2008

Denormalization

Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software. A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance.
A normalized design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow. There are two strategies for dealing with this. The preferred method is to keep the logical design normalized, but allow the DBMS to store additional redundant information on disk to optimize query response. In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views (MS SQL) or materialized views (Oracle). A view represents information in a format convenient for querying, and the index ensures that queries against the view are optimized.

The more usual approach is to denormalize the logical data design. With care this can achieve a similar improvement in query response, but at a cost—it is now the database designer's responsibility to ensure that the denormalized database does not become inconsistent. This is done by creating rules in the database called constraints, that specify how the redundant copies of information must be kept synchronized. It is the increase in logical complexity of the database design and the added complexity of the additional constraints that make this approach hazardous. Moreover, constraints introduce a trade-off, speeding up reads (SELECT in SQL) while slowing down writes (INSERT, UPDATE, and DELETE). This means a denormalized database under heavy write load may actually offer worse performance than its functionally equivalent normalized counterpart.
A denormalized data model is not the same as a data model that has not been normalized, and denormalization should only take place after a satisfactory level of normalization has taken place and that any required constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the relations are in third normal form and any relations with join and multi-valued dependencies are handled appropriately.


Examples of denormalization techniques include:
1. Materialized views, which may implement the following:
- storing the count of the "many" objects in a one-to-many relationship as an attribute of the "one" relation
- adding attributes to a relation from another relation with which it will be joined
2. Star schemas, which are also known as fact-dimension models and have been extended to snowflake schemas
3. Pre-built summarization or OLAP cubes
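As an illustration of the first technique, the following sketch (using hypothetical CUSTOMER and ORDERS tables) stores the count of the "many" side as a redundant attribute of the "one" side and leaves it to the DBMS to keep that copy consistent when the materialized view is refreshed:

CREATE MATERIALIZED VIEW customer_order_counts
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT c.customer_id,
       c.customer_name,
       COUNT(o.order_id) AS order_count  -- redundant, derived data
FROM customer c, orders o
WHERE c.customer_id = o.customer_id (+)
GROUP BY c.customer_id, c.customer_name;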

Database Normalization

Database normalization

Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, less highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while more normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).

Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second normal form (2NF) as well; but the reverse is not necessarily the case.

Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.


Contents

1 Problems addressed by normalization
2 Background to normalization: definitions
3 History
4 Normal forms
a. First normal form
b. Second normal form
c. Third normal form
d. Boyce-Codd normal form
e. Fourth normal form
f. Fifth normal form
g. Domain/key normal form
h. Sixth normal form
5 Denormalization
a. Non-first normal form (NF² or N1NF)


Problems addressed by normalization

(Figure captions from the original article: an update anomaly, where Employee 519 is shown as having different addresses on different records; an insertion anomaly, where the new faculty member's details cannot be recorded until he is assigned to teach at least one course; and a deletion anomaly, where all information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.)

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:
The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully, that is, if the employee's address is updated on some records but not others, then the table is left in an inconsistent state: it provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.

Ideally, a relational database table should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is called normalization. Removal of redundancies from the tables leads to several tables, with referential integrity restrictions between them.


Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization:
Definitions:

Functional dependency:

Attribute B has a functional dependency on attribute A (i.e. A → B) if, for each value of attribute A, there is exactly one value of attribute B. If the value of A repeats across tuples, then the corresponding value of B repeats as well. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address, and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain.

For example, an Employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow Employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID. Another way to look at the above is by reviewing basic mathematical functions:
Let F(x) be a mathematical function of one independent variable. The independent variable is analogous to attribute A. The dependent variable (or, in the terms above, the dependent attribute), which gives rise to the term functional dependency, is the value of F(A); A is the independent attribute. As we know, a mathematical function can have only one output. Notationally, it is common to express this relationship as B = F(A).


There are also functions of more than one independent variable; these are commonly referred to as multivariable functions. This idea represents an attribute being functionally dependent on a combination of attributes. Hence, F(x,y,z) contains three independent variables, or independent attributes, and one dependent attribute, namely F(x,y,z). In multivariable functions, there can still be only one output, or one dependent variable or attribute.

Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows: see the Multivalued Dependency article for a rigorous definition.

Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee Id, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.


Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.


First normal form

A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.
Note that the restriction on nullable columns as a 1NF requirement, as espoused by Christopher J. Date et al., is controversial. This particular requirement for 1NF is a direct contradiction of Dr. Codd's vision of the relational database, in which he stated that "null values" must be supported in a fully relational DBMS in order to represent "missing information and inapplicable information in a systematic way, independent of data type." By redefining 1NF to exclude nullable columns, no level of normalization can ever be achieved unless all nullable columns are completely eliminated from the entire database. This is in line with Date's and Darwen's vision of the perfect relational database, but can introduce additional complexities in SQL databases to the point of impracticality.


One requirement of a relation is that every tuple contains exactly one value for each attribute. This is sometimes expressed as "no repeating groups".
While that statement itself is axiomatic, experts disagree about what qualifies as a "repeating group", in particular whether a value may be a relation value; thus the precise definition of 1NF is the subject of some controversy. Notwithstanding, this theoretical uncertainty applies to relations, not tables. Table manifestations are intrinsically free of variable repeating groups because they are structurally constrained to the same number of columns in all rows.
Put at its simplest: when applying 1NF to a database, every record must be the same length. This means that each record has the same number of fields, and none of them contains a null value.


Second normal form

The criteria for second normal form (2NF) are:
The table must be in 1NF.
None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill, and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes, namely Employee ID, the table is not in 2NF.
In simple terms, a table is 2NF if it is in 1NF and all fields are dependent on the whole of the primary key, or a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation. Note that if none of a 1NF table's candidate keys are composite – i.e. every candidate key consists of just one attribute – then we can say immediately that the table is in 2NF. All columns must be a fact about the entire key, and not a subset of the key.
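A sketch in SQL of how the "Employees' Skills" example above could be decomposed into 2NF (hypothetical table and column names):

-- Not in 2NF: employee_name depends on employee_id alone,
-- a proper subset of the candidate key (employee_id, skill).
CREATE TABLE employees_skills (
  employee_id   NUMBER,
  employee_name VARCHAR2(100),
  skill         VARCHAR2(100),
  CONSTRAINT employees_skills_pk PRIMARY KEY (employee_id, skill)
);

-- 2NF decomposition: the partially dependent attribute moves to its own table.
CREATE TABLE employees (
  employee_id   NUMBER PRIMARY KEY,
  employee_name VARCHAR2(100)
);

CREATE TABLE employee_skills (
  employee_id NUMBER REFERENCES employees (employee_id),
  skill       VARCHAR2(100),
  CONSTRAINT employee_skills_pk PRIMARY KEY (employee_id, skill)
);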

Third normal form

The criteria for third normal form (3NF) are:
The table must be in 2NF. Every non-key attribute must be non-transitively dependent on the primary key. All attributes must rely only on the primary key. So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the Phone number relies on the Company. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Phone Number column in the first table would be removed.
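A sketch of that decomposition in SQL (hypothetical table and column names):

-- 3NF decomposition: the transitively dependent phone number
-- moves into its own COMPANY table.
CREATE TABLE company (
  company_name  VARCHAR2(100) PRIMARY KEY,
  company_phone VARCHAR2(30)
);

CREATE TABLE student (
  student_id   NUMBER PRIMARY KEY,
  student_name VARCHAR2(100),
  company_name VARCHAR2(100) REFERENCES company (company_name)
);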

Fourth normal form

A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a superset thereof.
For example, if an entity can have two phone number values and two email address values, these independent multi-valued facts should not be stored in the same table.

Fifth normal form

The criteria for fifth normal form (5NF and also PJ/NF) are:
The table must be in 4NF. There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.
Domain/key normal form

Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth normal form

According to the definition by Christopher J. Date and others, who extended database theory to take account of temporal and other interval data, a table is in sixth normal form (6NF) if and only if it satisfies no non-trivial (in the formal sense) join dependencies at all, meaning that the fifth normal form is also satisfied. When referring to "join" in this context it should be noted that Date et al. additionally use generalized definitions of relational operators that also take account of interval data (e.g. from-date to-date) by conceptually breaking them down ("unpacking" them) into atomic units (e.g. individual days), with defined rules for joining interval data, for instance.

Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g. for historical data) but the three values may vary independently of each other and at different rates. We may, for instance, wish to trace the history of changes to Status.

In a different meaning, sixth normal form may also be used by some to refer to Domain/key normal form (DKNF).

Boyce-Codd normal form

A table is in Boyce-Codd normal form (BCNF) if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey—that is, X is either a candidate key or a superset thereof.



Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has never been proven that this denormalization itself provides any increase in performance, or if the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is crucial.


Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form describes database designs that do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension is a (non-optimal) way of implementing hierarchies in relations. Some academics have dubbed this practitioner-developed method "First Ab-normal Form"; Codd defined a relational database as using relations, so any table not in 1NF could not be considered relational.

(Source: Wikipedia)

Welcome to my BLOG...

This blog will host various documents and references for

1. Database Administration (Oracle)

2. Data Modeling (Erwin)

3. Data Warehousing



You are all welcome to share your thoughts, inputs, and concerns on this blog. I will try to publish your inputs and answer your queries here.



Himanshu Dabir

Oracle Certified Professional - DBA

Certified for IBM Information Management Software

Data Modeler