When it comes to ORMs in PHP, Doctrine is one of the most common libraries and for many developers the first choice for their projects. A Symfony Framework standard installation already has Doctrine as a dependency by default and it can be easily added to other frameworks like Laravel or Zend. While working with a simple CRUD, a developer doesn't even have to think about which database is actually behind the ORM, but when queries get more complex, they have to realize that the database can't be ignored anymore. Not only the query performance can be a bottleneck now, but also performance issues that can come with Doctrine ORM, have to be dealt with.
Reading the documentation shows some nice insights about what can be done and what shouldn't be done with Doctrine ORM and with a little bit of SQL knowledge, Doctrine's own query language DQL can be mastered without much effort. Especially Partial Objects seem to be a special case that needs a lot of reflection by the developer before it can be considered. The Partial Object documentation is just a small part of Doctrine and pretty short in its description, but it has many controversial aspects, that may discourage not to use it at all.
Blog
, that represents a table containing information of blog posts like title and content. To get all information of all blog posts, we use this simple DQL:<?php
$results = $em->createQuery("SELECT b FROM Blog b");
$results[0]->id; // 1
$results[0]->title; // "My blog post title"
$results[0]->content; // Lorem Ipsum and much more because this is the text of the blog post...
If we just want to create a list with the title and creation date of all blog posts, we don't need the content or other values from the blog table.<?php
$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");
The key-word PARTIAL is the main difference compared to a regular DQL. The only columns we need are id, title and creationDate so the resulting Blog
objects only offer those three values. Because Doctrine is handling entities, we always need the identifier id for a Partial DQL, otherwise a semantical error happens. Other properties from the Blog
objects than the selected ones return null
.<?php
$results[0]->id; // 1
$results[0]->title; // "My blog post title"
$results[0]->content; // null
EntitiyManager
. When you try to get an already fetched entity by its id, there will be no request to the database and the existing object from the Unit Of Work will be returned. That's pretty neat to save a request to the database, but when we use a Partial DQL, like the one before, and try to fetch the same entity by identifier with find()
, we'll still get the existing Partial Object.<?php
$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");
$results[0]->content; // null
$blog = $em->find(Blog::class, 1);
$blog->content; // null
$blog->content
still returns null
because of the Partial query we used at first. If you don't know this behaviour and read the code example, you would came to the wrong conclusion, that find()
will return a fully hydrated object. In case you need all values for an entity, you need to refresh it instead.<?php
$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");
$results[0]->content; // null
// ...
$em->refresh($results[0]); // request to the database
$results[0]->content; // Lorem Ipsum and much more because this is the text of the blog post...
If you know that there is a Partial Object, you need to use refresh()
on the entity, but this is exactly the real problem: How do you know if an entity is only partially populated? Entity objects of the same type have all the same API and you won't know if it is a Partial Object without backtracking the whole code that was covered by a request. The accessors of the object are always available despite its state.$results[0]->content
, Doctrine won't track that change and doesn't update the content on flush()
.<?php
// Partial object
$blog->content = 'new value';
$em->flush();
$blog->content; // new value
$em->refresh($blog);
$blog->content; // Lorem Ipsum and much more because this is the text of the blog post...
Summarized: Partial Objects can change the behaviour of your Doctrine application and it can make it difficult to find a possible bug in your project, that is actually just a feature of Doctrine. So why should anybody use this feature in their code?SELECT b0_.id AS id_0, b0_.title AS title_1, b0_.creation_date AS creation_date_2 FROM blog b0_;
It's pretty simple as it is, but depending on the amount of data rows, even a query like this can be slow despite the fact that there are just three columns in use. If we create an index with title and creationDate (in that order) for our blog table, MySQL choose an execution plan that is an advantage for tables with a huge amount of rows.+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| 1 | SIMPLE | b0_ | index | NULL | list_idx | 775 | NULL | 4 | Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
The SQL query uses an index only scan, that doesn't need to access the table and therefore may run much faster. When we filter by title with a "title = ?" where-clause, the index only can still be used, so a suitable index combined with a Partial DQL can make your database queries more efficient. The primary key id is implicitly used by the index list_idx because the chosen engine InnoDB can do a clustered index with it.SELECT b0_.id AS id_0, b0_.title AS title_1, b0_.creation_date AS creation_date_2 FROM blog b0_ WHERE b0_.title = 'Foo';
+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
| 1 | SIMPLE | b0_ | ref | list_idx | list_idx | 767 | const | 1 | Using where; Using index |
+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)
But what if we replace MySQL with another database like PostgreSQL and test the same DQL? Running the same DQL results in the same SQL we experienced with MySQL and we get the same result set, but a difference can be seen when we compare the execution plan.QUERY PLAN
---------------------------------------------------------------------------
Index Scan using list_idx on blog b0_ (cost=0.14..8.16 rows=1 width=528)
Index Cond: ((title)::text = 'Foo'::text)
(2 rows)
It is still a fast index usage of course, but with PostgreSQL it is no index only anymore, even when the same index was created with the same entity mapping like we used above. I'm nitpicking here with this example but my point is, that switching the database behind Doctrine can have different results on the same DQL query.