I feel a bit sad when looking back on what I have just experienced, but I guess I should be happy, since I have learned a lot. In this post I will share my mistakes and insights into GAE transactions and entity groups.
Together with Andreas I'm developing a little sample that consists of three interacting applications: the Customer, Supplier and Profile apps. User stories for the initial sprint:
- As a customer I want to specify a request for consultants so that I can allocate resources to my project.
- As a salesman (supplier) I want to be notified when a customer enters a request for consultants so that I can quickly create an offer for that request.
- As a salesman I want to offer consultants to a customer so that I can sell our services.
- As a customer I want to see up-to-date information in the profiles so that I know that it is not obsolete.
I was developing the entry form for the inquiry in the customer app. I saved the form data in an Inquiry object and sent the request to the supplier app using RestTemplate. No problems so far.
We are using the new REST features in Spring 3.0 and have done some adjustments to Sculptor to make it generate JPA code that is compliant with GAE datastore.
Since one inquiry should be sent to many suppliers, it didn't feel very scalable to send them all within the form entry request. Therefore I moved the sending into a separate job, which would be invoked by the cron service (later, better, by a task queue). This is not only more scalable, it is also more fault tolerant, since supplier apps may not be available all the time. By separating it we can easily retry later.
I also created a Supplier entity. In the sendToSuppliers job I ran into the first problem:
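To illustrate the idea of decoupling the sending from the form request, here is a minimal sketch in plain Java. It is not the code from our sample: the `PendingSends` class, the supplier identifiers and the `send` callback (assumed to return true on success) are all hypothetical, and a real job would keep the pending deliveries in the datastore rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

/** Hypothetical in-memory list of deliveries that still have to be made. */
class PendingSends {
    private final Queue<String> pending = new ArrayDeque<>();

    /** Called from the form request: only records the work, sends nothing. */
    void enqueue(String supplierId) {
        pending.add(supplierId);
    }

    /**
     * One job run (e.g. triggered by cron): tries each pending delivery once.
     * Deliveries that fail stay queued so that a later run can retry them.
     * Returns the number of deliveries still pending after this run.
     */
    int runOnce(Predicate<String> send) {
        int attempts = pending.size();
        for (int i = 0; i < attempts; i++) {
            String supplierId = pending.poll();
            if (!send.test(supplierId)) {
                pending.add(supplierId); // supplier unavailable, retry next run
            }
        }
        return pending.size();
    }
}
```

If one supplier app is down during a run, only that delivery remains queued; the next run picks it up again without the customer's form request ever having waited for it.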
IllegalArgumentException: can't operate on multiple entity groups in a single transaction
Since I had two entities, Inquiry and Supplier, and I was using both in the transaction, I assumed that it was not allowed to query the Suppliers and update the Inquiries in the same transaction. I based that on the GAE documentation:
All datastore operations in a transaction must operate on entities in the same entity group. This includes querying for entities by ancestor, retrieving entities by key, updating entities, and deleting entities.
That assumption was a fatal mistake that got me on the wrong track. I started to split the retrieval of Suppliers and the update of Inquiries into separate transactions.
I learned from the documentation that it was possible to disable transactions, but that it was a temporary workaround.
After removing all code except the update of the Inquiries I realized that the Inquiry instances themselves belonged to separate entity groups. I was looping over all Inquiries that had not been sent to suppliers, i.e. I was updating several instances. Of course they belong to separate entity groups, otherwise it would not scale when the number of objects increases.
Then I redesigned the sending job so that it would only send and update one Inquiry instance. The job will have to be run many times to send all Inquiries.
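The difference between the two designs can be sketched in plain Java. This is only a toy model under my own assumptions: `ToyTransaction` is a hypothetical stand-in that mimics the one-entity-group rule, not the GAE or JPA API, and it does not roll anything back. Each `Inquiry` is treated as the root of its own entity group.

```java
import java.util.List;

/** Toy stand-in for a datastore transaction; NOT the real GAE API. */
class ToyTransaction {
    private String entityGroup; // the first group touched pins the transaction

    void update(String group) {
        if (entityGroup == null) {
            entityGroup = group;
        } else if (!entityGroup.equals(group)) {
            throw new IllegalArgumentException(
                "can't operate on multiple entity groups in a single transaction");
        }
    }
}

/** Hypothetical entity; every instance is the root of its own entity group. */
class Inquiry {
    final String id;
    boolean sent;

    Inquiry(String id) {
        this.id = id;
    }
}

class SendJob {
    /** First attempt: update all unsent inquiries in one transaction. */
    static void sendAllInOneTransaction(List<Inquiry> inquiries) {
        ToyTransaction tx = new ToyTransaction();
        for (Inquiry inquiry : inquiries) {
            if (!inquiry.sent) {
                tx.update(inquiry.id); // throws on the second unsent inquiry
                inquiry.sent = true;
            }
        }
    }

    /**
     * Redesign: each run handles at most one unsent inquiry in its own
     * transaction. Returns true if an inquiry was processed.
     */
    static boolean sendNext(List<Inquiry> inquiries) {
        for (Inquiry inquiry : inquiries) {
            if (!inquiry.sent) {
                ToyTransaction tx = new ToyTransaction();
                tx.update(inquiry.id);
                inquiry.sent = true;
                return true;
            }
        }
        return false;
    }
}
```

Calling `sendAllInOneTransaction` with two unsent inquiries fails with the exception above, while calling `sendNext` repeatedly until it returns false processes them one transaction at a time.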
On the way I learned some more things about GAE datastore:
* A transaction is necessary for some operations, such as flush; otherwise: "This operation requires a transaction yet it is not active"
* Queries also require a transaction; otherwise, when iterating over the result: "Object Manager has been closed"
* Modifying the same entity several times: "can't update the same entity twice in a transaction or operation"
In the end I think the defaults for transactions in Sculptor are alright. Normally we define the transaction boundary at the service layer. This is OK in many cases when using GAE too, but one has to design the operations so that they only update one instance (entity group).
There is probably a need for more fine-grained transaction control at the repository level, for example starting a new transaction for some repository operations. I think we should implement this with @Transactional annotations. Is it possible to mix txAdvice (defaults) with @Transactional (deviations from the default)?