Last week, I was exploring how to avoid N+1 selects problem when loading associations or collections for an entity in hibernate. I went down a lot of interesting paths and came up with so many fascinating findings.
How hibernate loads associations and collections??
In hibernate, when fetching an entity that has associations or collections, these can either be loaded along with the entity or the loading can be deferred. This is decided by fetch plan or fetch type and can either be EAGER or LAZY. With fetch type set to EAGER, the association will be loaded along with the entity. When LAZY, an un-initialized proxy for the collection is returned and it will be loaded only when it is accessed.
For my testing, I set the fetch type to EAGER. This can be done either by adding “lazy=false” in the mapping file for the collection or by adding @OneToMany(fetch = FetchType.EAGER) (it can also be @CollectionOfElements, @ManyToOne, etc) on the field corresponding to the collection in the entity class. Since I like hibernate annotations, I took the second approach.
public class Person { private Long id; // primary key private String name; private String address; @OneToMany(fetch = FetchType.EAGER) @Fetch(FetchMode.SELECT) private Set bankAccounts; // Association // rest of the code }
public class Account { private Long id; // primary key private Long accountNumber; private String bankName; private Long personId; // foreign key to Person table // rest of the code }
- JOIN – Uses LEFT OUTER JOIN to fetch the collections
select this_.id as id12_0_, this_.name as name12_0_, this_.address as address12_0_, account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Person this_ left outer join Account account17_ on this_.id = account17_.personId where this_.id = ?
With this fetch mode, you will end up with duplicates.
- SELECT – First fetches the ids and then issues selects for all the ids
select account17_.id as personId17_0_ from Account account17_ where account17_.personId = ? select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = ? . . .
- SUBSELECT – Uses a subquery to fetch the collections. (Not used much)
I set my fetch mode as SELECT.
N+1 selects problem:
As I mentioned before, when the fetch mode is set to SELECT, hibernate first fetches the ids from the child table and then issues select queries for all the ids one by one. The problem with this approach is that if there are N ids, then there will be N + 1 selects (1 to fetch the ids and N for fetching the objects from the child table) necessary to fetch the collection. This is called N + 1 select problem.
In the previous example, if there are 5 Accounts (with ids = 51,52,53,54,55) for a Person (id = 10),
// SELECT to fetch the Account ids select account17_.id as personId17_0_ from Account account17_ where account17_.personId = 5; // SELECT for each Account id select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = 51; select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = 52; select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = 53; select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = 54; select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id = 55;
The total number of queries is 6 for loading the bank accounts. This is clearly inefficient because of so many round trips to the database.
Batch fetching in hibernate:
Hibernate can fetch the collections in batch. This is driven by a global property called “default_batch_fetch_size” or by annotation @BatchSize. The default value is 1 which explains why hibernate issues a select query for each id. Let’s set the batch size to 3.
@OneToMany(fetch = FetchType.EAGER) @Fetch(FetchMode.SELECT) @BatchSize(size = 3) // Batch fetch size private Set bankAccounts;
Now, if we load the same Person entity,
// SELECT to fetch the Account ids select account17_.id as personId17_0_ from Account account17_ where account17_.personId = 5; // SELECT with IN clause select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id IN (51, 52, 53); select account17_.personId as personId17_0_, account17_.id as id17_0_, account17_.accountNumber as accountNumber17_0_, account17_.bankName as bankName17_0_ from Account account17_ where account17_.id IN (54, 55);
Now, hibernate issues select queries with IN clause with the number of parameters determined by the batch size. In this case, the number of queries got reduced to 3!!!! We can increase the batch size to 5 to load the collection in 2 queries. To generalize, the number of queries reduced from N + 1 to N/M + 1 where M is the batch size. I started to feel like I was in cloud nine after doing this optimization because it reduced our database traffic to a greater extent 😃
We must tune the batch size by trying out various values and see what fits our needs. If it is very low, we might not get much improvement. We must be very careful when choosing a higher value. Some databases like MySQL does not allow more than 1000 values in IN clause. We will end up with SQL Exceptions or even Out Of Memory in some cases.
Mischievous default_batch_fetch_size:
In applications with large number of entities that have associations/collections, it is not wise to add the @BatchSize annotation in each and every entity. It is better to set the “hibernate.default_batch_fetch_size” in configuration file to take benefit.
In my application, there was more than 100 different entity classes but only 20 were the most commonly used. After witnessing the improvement by adding the annotation on a single entity, I have decided to set the global property to a value of 10. You can also have the annotation if you want to override the global batch fetch size for a particular entity. When I started the application, it failed with out of memory error (OOM). The heap size of the application is 512 MB.
Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415) at java.lang.StringBuffer.append(StringBuffer.java:237) at org.hibernate.sql.Delete.toStatementString(Delete.java:58) at org.hibernate.persister.collection.BasicCollectionPersister.generateDeleteString(BasicCollectionPersister.java:92) at org.hibernate.persister.collection.AbstractCollectionPersister.(AbstractCollectionPersister.java:491) at org.hibernate.persister.collection.BasicCollectionPersister.(BasicCollectionPersister.java:74) at org.hibernate.persister.PersisterFactory.createCollectionPersister(PersisterFactory.java:104) at org.hibernate.impl.SessionFactoryImpl.(SessionFactoryImpl.java:307) at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1872)
Before diving into what caused the OOM, I tried increasing the heap size to 1 GB. Voila!! The server started. But the consumed heap memory is around 800 MB which is clearly a bad sign. Before setting this property, the utilized heap would be around 350 MB. The size increased to more than double the amount without default_batch_fetch_size.
Sherlocking:
I tried starting the application again with 512 MB. I took many thread dumps this time to find out what is causing the out of memory. All of them point to the same line org.hibernate.loader.entity.BatchingEntityLoader.createBatchingEntityLoader(BatchingEntityLoader.java:128)
"main" prio=10 tid=0x00007f5388009800 nid=0x193be waiting on condition [0x00007f538ffd7000] java.lang.Thread.State: RUNNABLE at java.util.Arrays.copyOfRange(Arrays.java:2694) at java.lang.String.(String.java:203) at java.lang.StringBuilder.toString(StringBuilder.java:405) at org.hibernate.persister.entity.AbstractEntityPersister.selectFragment(AbstractEntityPersister.java:1016) at org.hibernate.loader.AbstractEntityJoinWalker.initStatementString(AbstractEntityJoinWalker.java:124) at org.hibernate.loader.AbstractEntityJoinWalker.initStatementString(AbstractEntityJoinWalker.java:109) at org.hibernate.loader.AbstractEntityJoinWalker.initAll(AbstractEntityJoinWalker.java:91) at org.hibernate.loader.entity.EntityJoinWalker.(EntityJoinWalker.java:75) at org.hibernate.loader.entity.EntityLoader.(EntityLoader.java:109) at org.hibernate.loader.entity.EntityLoader.(EntityLoader.java:71) at org.hibernate.loader.entity.BatchingEntityLoader.createBatchingEntityLoader(BatchingEntityLoader.java:128) at org.hibernate.persister.entity.AbstractEntityPersister.createEntityLoader(AbstractEntityPersister.java:1914) at org.hibernate.persister.entity.AbstractEntityPersister.createEntityLoader(AbstractEntityPersister.java:1937) at org.hibernate.persister.entity.AbstractEntityPersister.createLoaders(AbstractEntityPersister.java:3246) at org.hibernate.persister.entity.AbstractEntityPersister.postInstantiate(AbstractEntityPersister.java:3192) at org.hibernate.persister.entity.SingleTableEntityPersister.postInstantiate(SingleTableEntityPersister.java:728) at org.hibernate.impl.SessionFactoryImpl.(SessionFactoryImpl.java:348) at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1872)
Also, I took live histograms of objects in the memory using jmap.
num #instances #bytes class name ---------------------------------------------- 1: 727727 235191336 [C 2: 1949160 48144472 [Ljava.lang.String; 3: 171172 21921824 4: 171172 21581600 5: 727337 17456088 java.lang.String 6: 13770 16400304 7: 331148 10596736 java.util.HashMap$Entry 8: 13770 10483088 9: 96859 9298464 org.hibernate.loader.entity.EntityLoader 10: 125190 8366608 [[Ljava.lang.String; 11: 140946 6765408 java.util.HashMap 12: 8927 6644608 13: 145303 5812120 java.util.LinkedHashMap$Entry 14: 52938 5595736 [Ljava.util.HashMap$Entry; 15: 47148 4732600 [B 16: 110339 4413560 org.hibernate.loader.DefaultEntityAliases 17: 41846 3347680 java.lang.reflect.Method 18: 44019 3169368 java.lang.reflect.Field 19: 97394 3116608 org.hibernate.LockOptions 20: 99327 2412328 [Lorg.hibernate.type.EntityType; 21: 99327 2412328 [Lorg.hibernate.LockMode; 22: 99326 2412304 [Lorg.hibernate.persister.entity.Loadable; 23: 99326 2412304 [Lorg.hibernate.loader.EntityAliases; 24: 41633 2331448 java.util.LinkedHashMap . . . . 60: 9685 309920 org.hibernate.loader.entity.BatchingEntityLoader
There were so many instances of String and hibernate classes like EntityLoader, BatchingEntityLoader, etc.
Without batching (default_batch_fetch_size = 1), I checked the memory histogram and there were very few instances of EntityLoader and interestingly no instance of BatchingEntityLoader could be found even though I had the @BatchSize annotation on an entity (I guess that the annotations behave differently from the global property). This clearly indicates that setting the hibernate.default_batch_fetch_size to a value more than 1 has some memory effects.
Batch fetching algorithm in hibernate:
After googling and reading a lot of posts in StackOverFlow (SO), I finally found the reason for this issue in this SO post. It is attributed to the way batch fetching algorithm is implemented in hibernate.
From the post,
Suppose you want to fetch a collection of 119 elements and your batch size is 20. This could be fetched in batches as 20, 20, 20, 20, 20, 19. This requires two different prepared statements – one with 20 ‘?’ in the IN clause and the other with 19 ‘?’
eg.
SELECT * FROM Table2 WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?); // 20 ‘?’ IN clause
SELECT * FROM Table2 WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?); // 19 ‘?’ IN clause
The IN clause can be dynamically constructed at the time of the query based on the batch size and the number of elements to be fetched. But this can become overhead if the query is hot. So, to avoid the construction of query string at run time, during startup, hibernate reads the mapping metadata and creates 11 batch loaders internally. Each loader knows how many proxies(?) it can initialize: 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1. The goal is to minimize the memory consumption for loader creation and to create enough loaders that every possible batch fetch can be produced. Another goal is to minimize the number of SQL SELECTs, obviously. Hence, for fetching 119 elements, the applied batch loaders are five times 20, one time 10, and one time 9, automatically selected by Hibernate.
So, if the batch fetch size is 1, it is not considered as batching and hibernate creates only EntityLoaders (This explains why I did not find even a single instance of BatchingEntityLoader) . Since I gave a batch size of 10, hibernate creates a BatchingEntityLoader which comprises of many EntityLoaders with different batch sizes like 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 (x10 EntityLoaders compared to the previous case). The worst part is that a separate BatchingEntityLoader is created for each entity and for each LockMode (Don’t even try to find the count by multiplication!!!!!)
Hibernate batch size vs Heap memory usage:
I was experimenting with different batch sizes to see how they affect the heap usage.
Batch fetch size | Heap memory usage(MB) | EntityLoader instances | EntityLoader memory usage(MB) | String memory usage(MB) | char[] memory usage(MB) |
---|---|---|---|---|---|
1 | 363 | 13864 | 1.3 | 13 | 81.5 |
2 | 404 | 27724 | 2.6 | 14.5 | 109 |
3 | 446 | 41584 | 4 | 16.1 | 137 |
4 | 493 | 55444 | 5.3 | 17.7 | 166 |
5 | 532 | 69304 | 6.6 | 19.2 | 194 |
7 | 621 | 97024 | 9.3 | 22.4 | 253 |
10 | 753 | 138604 | 13.3 | 27.1 | 342 |
15 | 797 | 152464 | 14.6 | 28.6 | 373 |
20 | 796 | 152464 | 14.6 | 28.6 | 374 |
30 | 836 | 166324 | 15.9 | 30.2 | 408 |
50 | 889 | 180184 | 17.3 | 31.7 | 446 |
There is significant increase in heap memory usage and number of EntityLoader instances when the batch size is increased from 1 to 10 but not much from 10 to 50. The increase seems to be mainly attributed to the huge number of String and char[] instances. This suggests that the hibernate optimizes runtime query construction by precomputing the query string for various batch sizes and keeping them in memory.
I found a thread in hibernate mailing list about this issue but no fix. http://hibernate-dev.jboss.narkive.com/GHa1VLeR/memory-consumption#post31
Upto the batch size of 3, I was able to start the application without OOM.
What is new in hibernate 4.2:
I was using hibernate 3.6.10.Final for my testing. In hibernate 4.2, the batch fetching logic has been changed. (https://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch20.html#performance-fetching-batch). There is an additional property called BatchFetchStyle (hibernate.batch_fetch_style) that determines how the EntityLoaders are instantiated and used. BatchStyle can be LEGACY, PADDED and DYNAMIC.
- LEGACY follows the old approach (multiple EntityLoaders with different batch size)
- PADDED is kind of similar to LEGACY but hibernate will use the next-bigger batch size and pads the extra identifier placeholders i.e using the same value multiple times.
- DYNAMIC constructs the IN clause with required number of proxies at query time i.e memory efficient.
I wanted to test the new batching strategies. But there seems to be a lot of code to change before upgrading from hibernate 3.6 to 4.2. I could not do mainly because of method signature modifications in 4.2. Also I found it difficult from Spring perspective. So I did not proceed after making few changes.
Even though the new version of hibernate provided new ways for batch fetching associations/collections, there is still a tradeoff between memory and runtime efficiency. Another approach would be create loaders for the entities when they are accessed for the first time (LAZY) and keep them in memory so that they can be used for the subsequent invocations. This still could result in OOM but only if we are going to load all the entity classes during the life time of the application. In my case, it would definitely solve the issue.