Doctrine Partial Objects

Doctrine Partial Objects - A closer look

2/8/17

When it comes to ORMs in PHP, Doctrine is one of the most common libraries and for many developers the first choice for their projects. A Symfony Framework standard installation already has Doctrine as a dependency by default and it can be easily added to other frameworks like Laravel or Zend. While working with a simple CRUD, a developer doesn't even have to think about which database is actually behind the ORM, but when queries get more complex, they have to realize that the database can't be ignored anymore. Not only the query performance can be a bottleneck now, but also performance issues that can come with Doctrine ORM, have to be dealt with.

Reading the documentation shows some nice insights about what can be done and what shouldn't be done with Doctrine ORM and with a little bit of SQL knowledge, Doctrine's own query language DQL can be mastered without much effort. Especially Partial Objects seem to be a special case that needs a lot of reflection by the developer before it can be considered. The Partial Object documentation is just a small part of Doctrine and pretty short in its description, but it has many controversial aspects, that may discourage not to use it at all.

What is a Partial Object?

By default, loading an entity always results in a fully populated object with all its values from the referring table row, that is represented by it. You don't have to know how the columns/properties are mapped when you fetch a single row by its primary key, because the entity manager offers you a simple API for getting the data you want in form of value objects. Then there is DQL, the query language Doctrine offers for fetching data from the database into objects, but compared to SQL, it seems like you can only get all values from the selected rows into entity objects.

The following examples use a class named Blog, that represents a table containing information of blog posts like title and content. To get all information of all blog posts, we use this simple DQL:

<?php

$results = $em->createQuery("SELECT b FROM Blog b");

$results[0]->id; // 1
$results[0]->title; // "My blog post title"
$results[0]->content; // Lorem Ipsum and much more because this is the text of the blog post...

If we just want to create a list with the title and creation date of all blog posts, we don't need the content or other values from the blog table.

A Partial Object offers you to fetch only data of the needed columns. The same class/table is still in use and the hydration only populates the properties which are involved in the query. The DQL for getting Partial Object entities is pretty simple and is slightly different to a common SELECT DQL:

<?php

$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");

The key-word PARTIAL is the main difference compared to a regular DQL. The only columns we need are id, title and creationDate so the resulting Blog objects only offer those three values. Because Doctrine is handling entities, we always need the identifier id for a Partial DQL, otherwise a semantical error happens. Other properties from the Blog objects than the selected ones return null.

<?php

$results[0]->id; // 1
$results[0]->title; // "My blog post title"
$results[0]->content; // null

Where is the catch?

As you should know when you use Doctrine, after an entity was fetched from the database and hydrated into an object, it will be kept in the Unit Of Work, that is used inside an EntitiyManager. When you try to get an already fetched entity by its id, there will be no request to the database and the existing object from the Unit Of Work will be returned. That's pretty neat to save a request to the database, but when we use a Partial DQL, like the one before, and try to fetch the same entity by identifier with find(), we'll still get the existing Partial Object.

<?php

$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");
$results[0]->content; // null

$blog = $em->find(Blog::class, 1);
$blog->content; // null

$blog->content still returns null because of the Partial query we used at first. If you don't know this behaviour and read the code example, you would came to the wrong conclusion, that find() will return a fully hydrated object. In case you need all values for an entity, you need to refresh it instead.

<?php

$results = $em->createQuery("SELECT PARTIAL b.{id, title, creationDate} FROM Blog b");
$results[0]->content; // null

// ...

$em->refresh($results[0]); // request to the database
$results[0]->content; // Lorem Ipsum and much more because this is the text of the blog post...

If you know that there is a Partial Object, you need to use refresh() on the entity, but this is exactly the real problem: How do you know if an entity is only partially populated? Entity objects of the same type have all the same API and you won't know if it is a Partial Object without backtracking the whole code that was covered by a request. The accessors of the object are always available despite its state.

In addition to this disadvantage for read procedures, there is also one for entity updates, that are supposed to write changes into the database. All properties, that aren't part of the Partial DQL, won't be affected by changes of state. If we try to set a new value to $results[0]->content, Doctrine won't track that change and doesn't update the content on flush().

<?php

// Partial object
$blog->content = 'new value';
$em->flush();
$blog->content; // new value

$em->refresh($blog);
$blog->content;  // Lorem Ipsum and much more because this is the text of the blog post...

Summarized: Partial Objects can change the behaviour of your Doctrine application and it can make it difficult to find a possible bug in your project, that is actually just a feature of Doctrine. So why should anybody use this feature in their code?

Think about the database

Some developers prefer to fetch every column of a table just for the case they'll need it in the future, but as long as you don't need it, you shouldn't get all columns. Don't forget that Doctrine is using a database behind all that code and that there will always be SQL involved, reading and filtering table-rows which consumes memory, IO and can have an impact on query speed depending on the used indices. But this is so well hidden behind Doctrine, that you have to take a look into the behaviour of Doctrine. Furthermore, you have to take a look at how the resulting SQL query looks like that is going to be used for your request to the database.

Let's take a look how the previous Partial DQL looks like as SQL:

SELECT b0_.id AS id_0, b0_.title AS title_1, b0_.creation_date AS creation_date_2 FROM blog b0_;

It's pretty simple as it is, but depending on the amount of data rows, even a query like this can be slow despite the fact that there are just three columns in use. If we create an index with title and creationDate (in that order) for our blog table, MySQL choose an execution plan that is an advantage for tables with a huge amount of rows.

+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | b0_   | index | NULL          | list_idx | 775     | NULL |    4 | Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+

The SQL query uses an index only scan, that doesn't need to access the table and therefore may run much faster. When we filter by title with a "title = ?" where-clause, the index only can still be used, so a suitable index combined with a Partial DQL can make your database queries more efficient. The primary key id is implicitly used by the index list_idx because the chosen engine InnoDB can do a clustered index with it.

SELECT b0_.id AS id_0, b0_.title AS title_1, b0_.creation_date AS creation_date_2 FROM blog b0_ WHERE b0_.title = 'Foo';

+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | b0_   | ref  | list_idx      | list_idx | 767     | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+----------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

But what if we replace MySQL with another database like PostgreSQL and test the same DQL? Running the same DQL results in the same SQL we experienced with MySQL and we get the same result set, but a difference can be seen when we compare the execution plan.

QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Scan using list_idx on blog b0_  (cost=0.14..8.16 rows=1 width=528)
   Index Cond: ((title)::text = 'Foo'::text)
(2 rows)

It is still a fast index usage of course, but with PostgreSQL it is no index only anymore, even when the same index was created with the same entity mapping like we used above. I'm nitpicking here with this example but my point is, that switching the database behind Doctrine can have different results on the same DQL query.
Just because MySQL seems to use indexes more efficiently for our blog list doesn't mean that the queries are faster than the PostgreSQL ones. Important is, that the configuration and entity mapping should fit to the database you're currently using. Nevertheless, the Partial DQL are in both scenarios more efficient than reading all columns of the selected row.

It's about what you need

You should code for the project state you have, not for the state it is supposed to be. Projects can change during development so you won't be able to see every possible bottleneck in advance that can occur. While you should keep an eye on performance, premature optimizations may result into YAGNI problems that can bloat your project with meaningless or unnecessary code. If you don't get any benefits from Partial Objects for your use case, then there is no reason to use it.

I would like to thank Markus Winand for helping me with some facts about indices and to the Doctrine team for making my time as a developer easier, harder and more interesting.

About Claudio Zizza
I'm working as a software developer for more than 15 years and used many coding languages and frameworks during that time, like PHP (of course), ASP, .NET, Javascript, Symfony, Laminas and more. I also contribute to the open source communities and am a part of the PHP Usergroup in Karlsruhe, Germany.
Follow me on Twitter (@SenseException)