Not long ago I was working on improving the performance of a Spring Boot application. A Spring Boot app that uses Spring Data JPA. Since we had a whole bunch of database writes happening it was necessary that we batch these database writes in order to reduce the communication overhead to the database server and thereby improve performance.
In this post I’m going to describe what I learnt and use an example to show how batching works under the hood when using Spring Data JPA.
Approaches to data access in Spring
There are multiple approaches to implementing a JDBC based persistence layer for a Spring application. JDBC template is the classic Spring JDBC approach and the most popular. This is the most “low level” approach and offers maximum flexibility. Spring Data JPA on the other hand provides an abstraction over the data access layer using JPA(Jakarta Persistence API) and an underlying ORM (e.g. Hibernate). Though not as flexible as the former approach, it does reduce programmer effort and the amount of boilerplate needed to implement a data access layer. These are advantages that help when building a rapid prototype where the data model is likely to change often.
In the rest of this article we will use an example to undertand how batching can be implemented when using Spring Data JPA and also see how it behaves in different scenarios. We’re going to be using Kotlin for this example.
Create entity and repository
Define a simple Employee class for storing Employee objects and annotate it as a JPA entity. We are using our own identifier here as opposed to a generated one. This decision is of some significance for our batching experiment which will be apparent later
Next, create a simple repository for this entity by extending Spring JPA’s CrudRepository.
We can now autowire this repository class in our test to verify that the
save() works as expected. I used the embedded H2 database to be able to test this quickly.
To perform multiple inserts in a single call to the database, the
CrudRepository provides the
saveAll() function. A little digging through the docs will reveal that we also need to set a batch size for this to work.
spring.jpa.properties.hibernate.jdbc.batch_size = 10
To check if batching works, add an additional test case that inserts and updates a list of Employees in the database. When we run this, it seems to work without issues. But we still don’t have any indication of what it really does under the hood.
Add the following properties to your spring application. This will show some verbose information to help us determine if batching is working as expected.
spring.jpa.show-sql = true
spring.jpa.properties.hibernate.generate_statistics = true
We’re also going to use a data-source-proxy to print detailed logs of every query that gets executed.
Behavior of batch insert operations
Two inferences can be made from the detailed logs
- The inserts got batched
- For every INSERT statement, there is also an additional SELECT statement being executed
I found clues to this behavior in Spring’s default entity state detection strategies.
By default, Spring Data JDBC inspects the identifier property of the given entity. If the identifier property is
null, then the entity is assumed to be new. Otherwise, it is assumed to not be new.
In our code since our id field is set to non-null value before persistence, Spring concludes that we are trying to persist an existing object! And hence tries to read the existing value using a SELECT. To work with our manually assigned identifier we will have to go with the second listed approach. i.e. Using
A common pattern for working with manually assigned identifiers is to
- Use a common base class which has a transient flag defaulting to indicate a new instance.
- Use JPA lifecycle callbacks “@PrePersist”and “@PostLoad” to flip that flag on persistence operations. The flag must get flipped to indicate an existing entity after a repository call to
With the updated entity class we no longer see the unnecessary SELECT queries.
Batching UPDATE queries
Update queries do get batched, but like before we can see an addtional SELECT executed for each update query.
Behaviour of Spring Data JPA batching when using custom HQL queries
Add a custom query that will let us update the Employee entity WITHOUT the select that gets included when we use
saveAll(). However this executes as individual statements and not as a batch.
Spring Data JPA makes it simple to add JDBC to a Spring application and reduces boiler plate for implementing commonly used JDBC features. But the higher level of abstraction can make it less flexible compared to JDBC Template when impementing un-common scenarios. For use cases where performance is an important requirement, its necessary to understand how it works under the hood.