Sunday, December 6, 2009

Performance tuning with Hibernate – Understanding fetching strategies


Sky is the limit when it comes to improve the performance of a hibernate based application. But before jumping to using second level cache, terrcotta etc., there are a few things we need to look at first. Below are a few lessons learned from experience →

Choose the right fetching strategy:

It is very important to understand how hibernate issues SQLs to the DB for different fetching strategies. If hibernate issues too many queries to the database, it can cause significant performance degradation. Let's take an example of department and employee relationship.

We have one to many relationship between dept and employee. Below is how we can translate this association in hibernate →

@Entity

public class Dept implements java.io.Serilizable{


….

….

@OneToMany

private Collection employees;

public Collection getEmployees{

return employees;

}

public void setEmployees(Collection emps{

this.employees = emps;

}


}


Now, lets say we want to get the name of the emplyees in each dept. We will do something like →


Collection depts = (Collection) entityManager.createQuery(“select dept from Dept dept”).getResultSet();

for(Dept dept:depts){

for(Employee emp: dept.getEmployees(){

//process employee

}

}

Looks simple and straightforward. But, if we have N depts, the above example will issue N+1 SQL statements to the database.

One SQL to feth all the departments:

select * from department;

N SQLs to fetch emplyees for N departments.

select * from emplyee where deptid=

….

select * from emplyee where deptid=


Now, that's a lot of calls to the DB. This is what is typically known as N+1 selection problem. There are two ways avoid this.

  • Fetch data in batches

    You define a batch size in the relationship mapping as

    @OneToMany

    @batch-size=10

    The above will fetch employees in batches of 10 departments as →

    select *from employees deptid in (1st dept-id, ….,10th dept-id)

  • Use subselect

@OneToMany

    @subselect

  • This will issue only two SQL statements. It will use the original query (that was used to fetch the dept )as the sub query to get all employees.

    select *from employees deptid in (select id from department)


For ManyToOne and OneToOne mapping, the default fetching strategy is ESGER. Change it to LAZY if you do not want the associated object to be loaded with the main object as hibernate will issue an additional SQL to fetch the related object. If you need the related object, user EAGER-JOIN strategy so that hibernate issues one outer-join statement to retrieve two objects as opposed to two different queries. Remember to do due diligence before using EAGE-FETCH though, if hibernate fetches too many unnecessary data there is a performance overhead due to network traffic, too many objects in memory and the time hibernate takes to translate these data to objects.

The other thing to remember is to always use the same instance of entity manager so that objects can be retrieved from the cache. Hibernate cache is maintained with the entitymanager istance. If you use different instances of entityManager, the same object will be fetched from DB repeatedly.

Bottom line is always turn on the show-sql property to “true” in the development environment and monitor the SQL statements issued by hibernate. I can assure you from my experience that you will get great performance benefit just by optimizing the SQL calls.