Showing posts with label entity framework. Show all posts

Saturday, 12 March 2016

Composite Repository Pattern

These days the usage of Object-Relational Mappers (ORMs) in almost any data access application is highly prevalent.  Whilst popular choices like Entity Framework provide many powerful features and reduce the amount of data access code that has to be written, they can sometimes bring problems when an optimal solution is required.

One of the pitfalls I find with Entity Framework is that when I run a LINQ query to get an entity using a primary key, and then subsequently run that exact same query again, the context will produce two identical SQL statements and run the queries against my DBMS separately.

//query 1
var order1 = context.Orders.Single(order => order.OrderId == 1);

//query 2
var order2 = context.Orders.Single(order => order.OrderId == 1);


When the first query executes, the order entity is cached internally within the framework.  When the second query executes, another SQL statement is issued and the order entity currently in the cache is updated.  The references order1 and order2 point to the same object in memory.  One might argue that the second query shouldn't execute a SQL statement at all, since we retrieved the same object previously and it's sitting in the internal cache - in other words, the Orders repository has no caching mechanism that can return the cached object by primary key instead of executing the same query again.  This problem is specific to the retrieval of ONE entity from the repository using the primary key.  Once the lambda expression used to query the Orders set changes, I can understand that executing a SQL statement is necessary.

A second pitfall I find frustrating is that Entity Framework isn't thread-safe, which means if I run the above two queries concurrently on two different threads then an exception will be thrown by the context.  Therefore, if I want a multi-threaded application which runs LINQ queries concurrently against my data source, I have to scope each context to its own thread - probably making the context short-lived for each call.
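
This per-thread scoping can be sketched generically - UnitOfWork below is a stand-in for a DbContext, not Entity Framework code, and all names are illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

//UnitOfWork stands in for an EF DbContext in this sketch
public class UnitOfWork : IDisposable
{
    static int _instancesCreated;
    public static int InstancesCreated { get { return _instancesCreated; } }

    public UnitOfWork() { Interlocked.Increment(ref _instancesCreated); }

    //placeholder for a query a real context would run
    public int GetOrderId(int key) { return key; }

    public void Dispose() { }
}

public static class ThreadScopedContexts
{
    public static void Run(int operations)
    {
        var tasks = new Task[operations];
        for (int i = 0; i < operations; i++)
        {
            int key = i;
            tasks[i] = Task.Run(() =>
            {
                //one short-lived context per call, never shared across threads
                using (var context = new UnitOfWork())
                {
                    context.GetOrderId(key);
                }
            });
        }
        Task.WaitAll(tasks);
    }
}
```

Each concurrent operation constructs, uses and disposes its own context, so no instance is ever touched by more than one thread.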

Whilst many developers don't see these as big issues, they are the reason there have been times when I needed to use my own repository pattern for an aggregate root within my domain model.

For a simplified aggregate root of Order:

    public class Order
    {
        public int OrderId { get; set; }
        public DateTime OrderDate { get; set; }
    }
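
The IRepository<TEntity, TKey> interface isn't shown in full in this post; a minimal sketch consistent with how it's used below, plus a trivial in-memory implementation, would be (both are illustrative assumptions rather than the original code):

```csharp
using System.Collections.Generic;

//minimal sketch of the repository interface assumed throughout this post;
//the real interface may declare further members
public interface IRepository<TEntity, TKey>
{
    TEntity Get(TKey key);
    void Save(TKey key, TEntity entity);
}

//a trivial in-memory implementation, handy for unit testing
//consumers of the interface without a database
public class InMemoryRepository<TEntity, TKey> : IRepository<TEntity, TKey>
    where TEntity : class
{
    readonly Dictionary<TKey, TEntity> _store = new Dictionary<TKey, TEntity>();

    public TEntity Get(TKey key)
    {
        TEntity entity;
        return _store.TryGetValue(key, out entity) ? entity : null;
    }

    public void Save(TKey key, TEntity entity)
    {
        _store[key] = entity;
    }
}
```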

My repository implementation would be as follows:

    public class OrderSqlRepository : IRepository<Order, int>
    {
        readonly string _connStr;

        public OrderSqlRepository(string connStr)
        {
            _connStr = connStr;
        }

        public Order Get(int key)
        {
            using (SqlConnection conn = new SqlConnection(_connStr))
            {
                conn.Open();
                using (SqlCommand cmd = new SqlCommand("usp_GetOrder", conn))
                {
                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@OrderId", key);
                    using (SqlDataReader reader = cmd.ExecuteReader())
                    {
                        if (reader.Read())
                        {
                            Order order = new Order()
                            {
                                OrderId = Convert.ToInt32(reader["OrderId"]),
                                OrderDate = Convert.ToDateTime(reader["OrderDate"])
                            };
                            return order;
                        }
                    }
                }
            }

            return null;
        }

        public void Save(int key, Order entity)
        {
            using (SqlConnection conn = new SqlConnection(_connStr))
            {
                conn.Open();
                using (SqlCommand cmd = new SqlCommand("usp_SaveOrder", conn))
                {
                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@OrderId", key);
                    cmd.Parameters.AddWithValue("@OrderDate", entity.OrderDate);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

Whilst the above repository implementation doesn't actually solve any of my problems on its own, I then wrap it inside a cache repository which exposes the same interface through use of a composite pattern - ultimately allowing me to cache entities within my own internal caching mechanism.

    public class CacheRepository<TEntity, TKey> : IRepository<TEntity, TKey>
        where TEntity : class
    {
        readonly IRepository<TEntity, TKey> _repository;
        readonly ConcurrentDictionary<string, object> _lockers = new ConcurrentDictionary<string, object>();
        readonly MemoryCache _cache;
        readonly CacheItemPolicy _cachePolicy;

        public CacheRepository(IRepository<TEntity, TKey> repository)
            : this(repository, new TimeSpan(0, 5, 0))
        {
        }

        public CacheRepository(IRepository<TEntity, TKey> repository
            , TimeSpan slidingEntityExpiration)
        {
            if (repository == null)
                throw new ArgumentNullException("repository");

            _repository = repository;
            _cache = MemoryCache.Default;

            _cachePolicy = new CacheItemPolicy()
            {
                SlidingExpiration = slidingEntityExpiration,
                RemovedCallback = entry =>
                {
                    /*
                     * when an object is removed from the cache it
                     * would be sensible to keep the locker collection
                     * in sync so that we don't hold unnecessary lock
                     * objects for entities which no longer exist
                     */
                    object removedEntity;
                    _lockers.TryRemove(entry.CacheItem.Key, out removedEntity);
                }
            };
        }

        void SetCachedEntity(string key, TEntity entity)
        {
            /*
             * if we're updating an entity in the cache
             * we want to remove the original first
             */
            _cache.Remove(key);
            if (entity != null)
                _cache.Add(key, entity, _cachePolicy);
        }

        TEntity GetCachedEntity(string key)
        {
            return (TEntity)_cache.Get(key);
        }

        object GetLocker(string key)
        {
            //use the factory overload so a new locker is only allocated on first use
            return _lockers.GetOrAdd(key, k => new object());
        }

        public virtual TEntity Get(TKey key)
        {
            if (key == null)
                throw new ArgumentNullException("key");

            string keystr = key.ToString();

            TEntity cachedEntity;
            lock (GetLocker(keystr))
            {
                //try and retrieve the entity from the local cache
                cachedEntity = GetCachedEntity(keystr);
                if (cachedEntity == null)
                {
                    //object doesn't exist - therefore try and load it from the underlying repository    
                    cachedEntity = _repository.Get(key);
                    if (cachedEntity != null)
                    {
                        //entity loaded successfully therefore set it in the local cache
                        SetCachedEntity(keystr, cachedEntity);
                    }
                }
            }

            return cachedEntity;
        }

        public void Save(TKey key, TEntity entity)
        {
            if (key == null)
                throw new ArgumentNullException("key");

            string keystr = key.ToString();

            lock (GetLocker(keystr))
            {
                /*
                 * try and persist the entity to the underlying
                 * repository prior to updating the local cache
                 */
                _repository.Save(key, entity);

                //save was successful therefore update the local cache
                SetCachedEntity(keystr, entity);
            }
        }
    }
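
To see the composite behaviour in isolation, here is a self-contained sketch in which a counting stub stands in for OrderSqlRepository and a plain Dictionary stands in for MemoryCache - the classes here (including the pared-down Order) are illustrative only:

```csharp
using System.Collections.Generic;

//pared-down Order, repeated so the sketch compiles on its own
public class Order
{
    public int OrderId { get; set; }
}

//counting stub standing in for OrderSqlRepository - records how
//many times the underlying data source is actually hit
public class CountingOrderSource
{
    public int GetCalls;

    public Order Get(int key)
    {
        GetCalls++;
        return new Order { OrderId = key };
    }
}

//simplified cache decorator: the same composite shape as CacheRepository,
//with a Dictionary standing in for MemoryCache (no expiry, no locking)
public class SimpleCacheDecorator
{
    readonly CountingOrderSource _inner;
    readonly Dictionary<int, Order> _cache = new Dictionary<int, Order>();

    public SimpleCacheDecorator(CountingOrderSource inner)
    {
        _inner = inner;
    }

    public Order Get(int key)
    {
        Order order;
        if (!_cache.TryGetValue(key, out order))
        {
            //cache miss - fall through to the underlying source
            order = _inner.Get(key);
            if (order != null)
                _cache[key] = order;
        }
        return order;
    }
}
```

Two consecutive Get calls for the same key hit the underlying source only once, and both callers receive a reference to the same cached instance.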

Whilst both repository implementations above are thread-safe, we may run into a problem in the CacheRepository where we may end up modifying the same entity on two separate threads.  Take the following ordered thread execution as an example:

Thread 1: var order = repository.Get(1);
Thread 2: var order = repository.Get(1);
Thread 1: order.CustomerId = 1;
Thread 2: order.OrderItems.Add(new OrderItem());
Thread 1: repository.Save(1, order);

As you can see from the sequencing of the two threads above, an OrderItem is added on thread #2 to the same Order instance that both threads reference.  Execution then returns to thread #1, where the order is saved.  Thread #1 isn't expecting a new OrderItem to be included as part of the Order it is modifying - the only change which should be saved is the change to CustomerId.  This problem occurs because both threads retrieve the same instance of Order #1 from the CacheRepository.

To resolve this issue we can override the Get method in a derived class (CloneCacheRepository) to clone the entity before it is returned.  This means any changes made between entity retrieval and persistence are visible only to the thread making the modifications - no other thread can see them.  When the entity is saved, the CacheRepository updates its internal cache with the modified entity, so no override of Save was required.

    public class CloneCacheRepository<TEntity, TKey> : CacheRepository<TEntity, TKey>
        where TEntity : class, IClone<TEntity>
    {
        public CloneCacheRepository(IRepository<TEntity, TKey> repository)
            : base(repository, new TimeSpan(0, 5, 0))
        {
        }

        public CloneCacheRepository(IRepository<TEntity, TKey> repository
            , TimeSpan slidingEntityExpiration)
            : base(repository, slidingEntityExpiration)
        {
        }

        public override TEntity Get(TKey key)
        {
            var entity = base.Get(key);
            return entity != null ? entity.DeepClone() : entity;
        }
    }

With the above implementation of the CloneCacheRepository class we now clone the entity which is returned - we implement IClone on the aggregate root entity to allow cloning.  The clone should be a deep clone covering all related entities which could potentially be modified; otherwise a shallow clone is sufficient.
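
The IClone interface isn't shown above; a minimal sketch, with an expanded Order performing a deep clone, might look like this (the CustomerId and OrderItems members are illustrative, not part of the Order class shown earlier):

```csharp
using System.Collections.Generic;

//minimal sketch of the cloning interface used by CloneCacheRepository
public interface IClone<T>
{
    T DeepClone();
}

//illustrative related entity
public class OrderItem
{
    public int ProductId { get; set; }
}

public class Order : IClone<Order>
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public List<OrderItem> OrderItems { get; private set; }

    public Order()
    {
        OrderItems = new List<OrderItem>();
    }

    public Order DeepClone()
    {
        //copy value-typed properties, then clone each related entity
        //so callers can't mutate the cached object graph
        var clone = new Order { OrderId = OrderId, CustomerId = CustomerId };
        foreach (var item in OrderItems)
            clone.OrderItems.Add(new OrderItem { ProductId = item.ProductId });
        return clone;
    }
}
```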

When the same thread sequence occurs, the only change which is persisted is the one to the CustomerId property.  Subsequently, if thread #2 tries to persist its changes to the repository after thread #1 has completed, an exception should be thrown if concurrency checking is in place on the database server.

There are a few notable elements to outline in the above CacheRepository class.  We are able to add functionality to the repository without having to change an existing concrete implementation; this is achieved by implementing a new class which derives from the same IRepository interface and injecting another repository during instantiation.  This is beneficial when adhering to SOLID design principles because it follows the open-closed principle: we add functionality by creating a new implementation which wraps an existing repository, rather than modifying an existing class, which would break the principle.

Although switching to this repository pattern forfeits our ability to use LINQ, it does allow us to use the repository in a multi-threaded environment and also minimises the number of underlying calls to the database, as the cache is consulted before the underlying data access repository.  I have used this repository pattern in microservices where speed is essential.

The full source code of the Composite Repository Pattern can be found at GitHub.

Saturday, 10 January 2015

EF Mapping with Discriminator Column and Concrete Tables for Derived Types

A problem I recently ran into with Entity Framework 6 was that I wanted to use two features at the same time, specifically Table-Per-Hierarchy mapping and Table Splitting. However, using both of them together caused a conflict which the framework was unable to handle.

The following database schema was used for mapping to my domain model, where I wanted the Person.PersonType column as a discriminator for my derived types:

[Schema diagram: dbo.Person (PersonId, Name, PersonType) with satellite tables dbo.Customer (PersonId, CustomerNumber) and dbo.Operator (PersonId, OperatorNumber), joined on PersonId]
My domain model was as follows (PersonType property has been omitted as discriminators cannot be mapped to properties):

class Person
{
    public int PersonId { get; set; }
    public string Name { get; set; }
}

class Customer : Person
{
    public string CustomerNumber { get; set; }
}

class Operator : Person
{
    public string OperatorNumber { get; set; }
}

The model mapping I had in place was as follows:
  modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.Person");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
        map.ToTable("dbo.Customer");
    });

This caused the following SQL exception to be thrown:

EntityCommandExecutionException
An error occurred while executing the command definition. See the inner exception for details
Invalid column name 'PersonType'.Invalid column name 'PersonType'.

The SQL generated by Entity Framework was as follows:

SELECT 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X' WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1], 
    [Extent1].[PersonId] AS [PersonId], 
    [Extent1].[Name] AS [Name], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[CustomerNumber] END AS [C2], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project2].[OperatorNumber] END AS [C3]
    FROM   [dbo].[Person] AS [Extent1]
    LEFT OUTER JOIN  (SELECT 
        [Extent2].[PersonId] AS [PersonId], 
        [Extent2].[CustomerNumber] AS [CustomerNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Customer] AS [Extent2]
        WHERE [Extent2].[PersonType] = N'C' ) AS [Project1] ON [Extent1].[PersonId] = [Project1].[PersonId]
    LEFT OUTER JOIN  (SELECT 
        [Extent3].[PersonId] AS [PersonId], 
        [Extent3].[OperatorNumber] AS [OperatorNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Operator] AS [Extent3]
        WHERE [Extent3].[PersonType] = N'O' ) AS [Project2] ON [Extent1].[PersonId] = [Project2].[PersonId]

As you can see from the generated SQL, Entity Framework is trying to use the discriminator column on our satellite tables (dbo.Customer and dbo.Operator) rather than on our base table (dbo.Person).  As yet I haven't found a way to change how the Entity Framework mapping treats the discriminator column so that both TPH mapping and Table Splitting are achievable.  To get around this problem I had to create a view on top of all of the tables involved in the hierarchy mapping:

create view dbo.vw_PersonExtended
as

    select
        p.Name, p.PersonId, p.PersonType, c.CustomerNumber, o.OperatorNumber
    from
        dbo.Person p
        left join dbo.Customer c on c.PersonId=p.PersonId
        left join dbo.Operator o on o.PersonId=p.PersonId

This view is then mapped to the base class type Person, with the derived class table mappings removed:
   modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.vw_PersonExtended");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
    });

The derived types Customer and Operator are no longer mapped to their individual concrete tables; instead their values come from the mapping in the view.

Additionally, we have to create either an INSTEAD OF INSERT trigger on top of this view or a stored procedure so that our DB context can persist new entities.
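
For completeness, the trigger alternative might look roughly like this - a sketch which only handles single-row inserts of customers, assuming the schema above:

```sql
create trigger dbo.tr_PersonExtended_Insert
on dbo.vw_PersonExtended
instead of insert
as
begin

    set nocount on
    declare @id int

    -- insert the base row with the discriminator value
    insert into dbo.Person (Name, PersonType)
    select Name, PersonType from inserted

    set @id = scope_identity()

    -- insert the satellite row for customers
    -- (note: this sketch only supports single-row inserts)
    insert into dbo.Customer (PersonId, CustomerNumber)
    select @id, CustomerNumber from inserted where PersonType = 'C'

end
```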

The stored procedure would be:

create procedure dbo.usp_InsertCustomer
    @Name varchar(50),
    @CustomerNumber varchar(50)
as
begin

        set nocount on
        declare @id int

        insert into dbo.Person (Name, PersonType)
        values (@Name, 'C')
        set @id = scope_identity()

        insert into dbo.Customer (PersonId, CustomerNumber)
        values (@id, @CustomerNumber)

        select @id as PersonId

end

And mapped as follows:

modelBuilder
        .Entity<Customer>()
        .MapToStoredProcedures(map => map.Insert(i => i.HasName("usp_InsertCustomer")));

The obvious drawback to this approach is that you have to create views and stored procedures on the database side when you want to implement the discriminator hierarchy pattern in Entity Framework, as well as do the plumbing work in the database context mapping. This shouldn't impede the performance of the implementation, however; it just makes the implementation more time consuming.