Saturday, 12 March 2016

Composite Repository Pattern

Object-Relational Mappers (ORMs) are now prevalent in almost any data access application.  Whilst popular choices like Entity Framework provide many powerful features and reduce the amount of data access code we have to write, they can bring problems when an optimal solution is required.

One of the pitfalls I find with Entity Framework is that when I run a LINQ query to get an entity by its primary key and then subsequently run that exact same query again, the context produces two identical SQL statements and runs each query against my DBMS separately.

//query 1
var order1 = context.Orders.Single(order => order.OrderId == 1);

//query 2
var order2 = context.Orders.Single(order => order.OrderId == 1);


Upon the first query executing, the order entity is cached internally within the framework.  Upon the second query executing, another SQL statement is executed and the current order entity in the cache is updated.  The references order1 and order2 point to the same object in memory.  One might argue that the second query shouldn't execute a SQL statement at all, since we just retrieved that object and it is sitting in the internal cache - in other words the Orders repository has no caching mechanism by which it can return the cached object by primary key instead of executing the same query again.  This problem is very specific to the retrieval of ONE entity from the repository using the primary key.  When the lambda expression for querying the Orders set changes I can understand that executing a SQL statement is necessary.

A second pitfall I find frustrating is that Entity Framework isn't thread-safe, which means that if I run the above two queries concurrently on two different threads an exception will be thrown by the context.  Therefore if I wanted a multi-threaded application which ran LINQ queries concurrently against my data source I would have to scope the contexts to their own threads - probably making the context short-lived for each call.
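
A common mitigation is to scope a short-lived context to each call; a minimal sketch, assuming a hypothetical ShopContext derived from DbContext:

    Order LoadOrder(int orderId)
    {
        //each call creates and disposes its own context so no
        //DbContext instance is ever shared between threads
        using (var context = new ShopContext())
        {
            return context.Orders.Single(order => order.OrderId == orderId);
        }
    }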

Whilst many developers don't see these as big issues, they are precisely why there have been times when I needed to hand-roll the repository pattern for an aggregate root within my domain model.

For a simplified aggregate root of Order:

    public class Order
    {
        public int OrderId { get; set; }
        public DateTime OrderDate { get; set; }
    }
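
The IRepository<TEntity, TKey> contract implemented throughout isn't shown above; a minimal sketch of the assumed shape, containing just the two members used here:

    public interface IRepository<TEntity, TKey>
    {
        //retrieve an entity by its primary key, or null when not found
        TEntity Get(TKey key);

        //persist the entity against its primary key
        void Save(TKey key, TEntity entity);
    }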

My repository implementation would be as follows:

    public class OrderSqlRepository : IRepository<Order, int>
    {
        string _connStr;

        public OrderSqlRepository(string connStr)
        {
            _connStr = connStr;
        }

        public Order Get(int key)
        {
            using (SqlConnection conn = new SqlConnection(_connStr))
            {
                conn.Open();
                using (SqlCommand cmd = new SqlCommand("usp_GetOrder", conn))
                {
                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@OrderId", key);
                    using (SqlDataReader reader = cmd.ExecuteReader())
                    {
                        if (reader.Read())
                        {
                            Order order = new Order()
                            {
                                OrderId = Convert.ToInt32(reader["OrderId"]),
                                OrderDate = Convert.ToDateTime(reader["OrderDate"])
                            };
                            return order;
                        }
                    }
                }
            }

            return null;
        }

        public void Save(int key, Order entity)
        {
            using (SqlConnection conn = new SqlConnection(_connStr))
            {
                conn.Open();
                using (SqlCommand cmd = new SqlCommand("usp_SaveOrder", conn))
                {
                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@OrderId", key);
                    cmd.Parameters.AddWithValue("@OrderDate", entity.OrderDate);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

Whilst the above repository implementation doesn't actually solve any of my problems on its own, I then wrap it inside a cache repository which implements the same interface - a composite pattern - effectively allowing me to cache entities within my own internal caching mechanism.

    public class CacheRepository<TEntity, TKey> : IRepository<TEntity, TKey>
        where TEntity : class
    {
        readonly IRepository<TEntity, TKey> _repository;
        readonly ConcurrentDictionary<string, object> _lockers = new ConcurrentDictionary<string, object>();
        readonly MemoryCache _cache;
        readonly CacheItemPolicy _cachePolicy;

        public CacheRepository(IRepository<TEntity, TKey> repository)
            : this(repository, new TimeSpan(0, 5, 0))
        {
        }

        public CacheRepository(IRepository<TEntity, TKey> repository
            , TimeSpan slidingEntityExpiration)
        {
            if (repository == null)
                throw new ArgumentNullException("repository");

            _repository = repository;
            _cache = MemoryCache.Default;

            _cachePolicy = new CacheItemPolicy()
            {
                SlidingExpiration = slidingEntityExpiration,
                RemovedCallback = entry =>
                {
                    /*
                     * when an object is removed from the cache it
                     * would be sensible to keep the locker collection
                     * in sync so that we don't hold unnecessary lock
                     * objects for entities which no longer exist
                     */
                    object removedEntity;
                    _lockers.TryRemove(entry.CacheItem.Key, out removedEntity);
                }
            };
        }

        void SetCachedEntity(string key, TEntity entity)
        {
            /*
             * if we're updating an entity in the cache
             * we want to remove the original first
             */
            _cache.Remove(key);
            if (entity != null)
                _cache.Add(key, entity, _cachePolicy);
        }

        TEntity GetCachedEntity(string key)
        {
            return (TEntity)_cache.Get(key);
        }

        object GetLocker(string key)
        {
            return _lockers.GetOrAdd(key, new object());
        }

        public virtual TEntity Get(TKey key)
        {
            if (key == null)
                throw new ArgumentNullException("key");

            string keystr = key.ToString();

            TEntity cachedEntity;
            lock (GetLocker(keystr))
            {
                //try and retrieve the entity from the local cache
                cachedEntity = GetCachedEntity(keystr);
                if (cachedEntity == null)
                {
                    //object doesn't exist - therefore try and load it from the underlying repository    
                    cachedEntity = _repository.Get(key);
                    if (cachedEntity != null)
                    {
                        //entity loaded successfully therefore set it in the local cache
                        SetCachedEntity(keystr, cachedEntity);
                    }
                }
            }

            return cachedEntity;
        }

        public void Save(TKey key, TEntity entity)
        {
            if (key == null)
                throw new ArgumentNullException("key");

            string keystr = key.ToString();

            lock (GetLocker(keystr))
            {
                /*
                 * try and persist the entity to the underlying
                 * repository prior to updating the local cache
                 */
                _repository.Save(key, entity);

                //save was successful therefore update the local cache
                SetCachedEntity(keystr, entity);
            }
        }
    }

Whilst both repository implementations above are thread-safe, we may run into a problem in the CacheRepository where we may end up modifying the same entity on two separate threads.  Take the following ordered thread execution as an example:

1. Thread 1: var order = repository.Get(1);
2. Thread 2: var order = repository.Get(1);
3. Thread 1: order.CustomerId = 1;
4. Thread 2: order.OrderItems.Add(new OrderItem());
5. Thread 1: repository.Save(1, order);

As you can see from the sequencing of the two threads above, an OrderItem is added on thread #2 to the same Order instance referenced by both threads.  Execution then reverts to thread #1, where the order is saved.  Thread #1 isn't expecting a new OrderItem to be included as part of the Order it is modifying - the only change which should be saved is the change to CustomerId.  This problem occurs because both threads retrieve the same instance of Order #1 from the CacheRepository.

To resolve this issue we can override the Get method in a derived class (CloneCacheRepository) to clone the entity which is returned.  This means any changes made between entity retrieval and persistence are only visible on the thread making the modifications - no other thread can see them.  Upon the entity being saved, the CacheRepository updates its internal cache with the modified entity - no override was required for this.

    public class CloneCacheRepository<TEntity, TKey> : CacheRepository<TEntity, TKey>
        where TEntity : class, IClone<TEntity>
    {
        public CloneCacheRepository(IRepository<TEntity, TKey> repository)
            : base(repository, new TimeSpan(0, 5, 0))
        {
        }

        public CloneCacheRepository(IRepository<TEntity, TKey> repository
            , TimeSpan slidingEntityExpiration)
            : base(repository, slidingEntityExpiration)
        {
        }

        public override TEntity Get(TKey key)
        {
            var entity = base.Get(key);
            return entity != null ? entity.DeepClone() : entity;
        }
    }

With the above implementation of the CloneCacheRepository class we now clone the entity which is returned - we implement IClone on the aggregate root entity to allow cloning.  This should be a deep clone covering all related entities which could potentially be modified; otherwise a shallow clone would be sufficient.
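
A minimal sketch of the assumed IClone<TEntity> contract, with the earlier Order extended to satisfy it:

    public interface IClone<T>
    {
        //return a copy which shares no references with the original
        T DeepClone();
    }

    public class Order : IClone<Order>
    {
        public int OrderId { get; set; }
        public DateTime OrderDate { get; set; }

        public Order DeepClone()
        {
            //only scalar members here, so a member-wise copy suffices;
            //any related collections would each need cloning as well
            return new Order { OrderId = OrderId, OrderDate = OrderDate };
        }
    }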

When the same thread sequence occurs, the only change which is persisted is the one to the CustomerId property.  Subsequently, if thread #2 tries to persist its changes to the repository after thread #1 has completed, an exception should be thrown if concurrency checking is in place on the database server.

There are a few notable elements to outline in the above CacheRepository class.  We are able to add functionality to the repository without having to change an existing concrete implementation; this is achieved by implementing a new class which derives from the same IRepository interface and injecting another repository during instantiation.  This is beneficial when adhering to SOLID design principles because it follows the open-closed principle: we add functionality by creating a new implementation which wraps an existing repository, rather than modifying an existing class and breaking the principle.

Although switching to this repository pattern forfeits the ability to use LINQ, it does allow us to use the repository in a multi-threaded environment and also minimises the number of underlying calls to the database, as the cache is consulted before the underlying data access repository.  I have used this repository pattern in micro-services where speed is essential.
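
Wiring the composite together is then just a matter of nesting the implementations - a sketch with a hypothetical connection string, assuming Order implements IClone<Order>:

    IRepository<Order, int> repository = new CloneCacheRepository<Order, int>(
        new OrderSqlRepository("Server=.;Database=Shop;Integrated Security=true"),
        TimeSpan.FromMinutes(5));

    var first = repository.Get(1);   //hits SQL Server and caches the entity
    var second = repository.Get(1);  //served from the cache as a fresh clone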

The full source code of the Composite Repository Pattern can be found at GitHub.

Tuesday, 9 February 2016

Retry Pattern for failed Invocations

There are numerous times when invoking a delegate may fail due to an unexpected exception, in which case we would like to automatically retry the invocation up to a certain number of attempts.  This can easily be achieved using the Retry Pattern as shown below:

public static bool TryInvoke<TException, T>(Func<T> del, out T value, out IList<Exception> exceptions, int attempts = 3, IInterval interval = null)
 where TException : Exception
{
 if (del == null) throw new ArgumentNullException("del");
 if (attempts <= 0) throw new ArgumentOutOfRangeException("attempts");

 exceptions = new List<Exception>();

 for (int i = 0; i < attempts; i++)
 {
  try
  {
   value = del();
   return true;
  }
  catch (TException ex)
  {
   exceptions.Add(ex);
   if (interval != null)
    interval.Sleep();
  }
 }

 value = default(T);
 return false;
}
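
The IInterval parameter is assumed to expose only a Sleep method controlling the delay between attempts; a minimal sketch with a fixed delay:

public interface IInterval
{
 void Sleep();
}

public class FixedInterval : IInterval
{
 readonly TimeSpan _delay;

 public FixedInterval(TimeSpan delay)
 {
  _delay = delay;
 }

 public void Sleep()
 {
  //block the current thread for the configured delay between retries
  Thread.Sleep(_delay);
 }
}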

The above generic implementation of the Retry Pattern also follows the structure of the Try-Parse Pattern: if the invocation of the delegate fails on all of its attempts then the "out" parameter for the return value of the delegate is set to its default value and the method returns false.  Conversely, if the delegate succeeds then the return value of the delegate is assigned to the "out" parameter and the method returns true - this means we can invoke delegates as follows:

int value;
Func<int> calculateValue = null;
if (Invoker.TryInvoke<int>(calculateValue, out value))
{
 //print value
 Console.WriteLine("Value: {0}", value);
}
else
{
 //show error message to user
 Console.WriteLine("Error calculating value after X attempts");
}

It is also possible to gracefully retry on specific types of exceptions and let others propagate.  For example, I may expect the occasional failed attempt whilst downloading data from an external source, so I may want to retry ONLY when a DownloadException is thrown; any other exception should surface as an unhandled exception:

string data;
Func<string> getData = null;
IList<Exception> exceptions;
try
{
 if (Invoker.TryInvoke<DownloadException, string>(getData, out data, out exceptions))
 {
  //succeeded
 }
 else
 {
  //gracefully handle exceptions
 }
}
catch (Exception ex)
{
 //unexpected exception
}

We may also want to run this logic asynchronously, for example when a user clicks a button on a GUI and requests to download some data from an external source:

Func<string> downloadData = () =>
{
 return new Downloader().Download();
};
var task = await Invoker.TryInvokeAsync<string>(downloadData);
if (task.Success)
{
 //invocation passed
 Console.WriteLine(task.Value);
}

Note how the syntax of the TryInvokeAsync method has changed slightly in that it no longer contains an out parameter for the assigned value - async methods are unable to support out parameters.
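
A sketch of how such a TryInvokeAsync wrapper might look, assuming a simple result type carrying the Success flag and Value used above:

public class InvokeResult<T>
{
 public bool Success { get; internal set; }
 public T Value { get; internal set; }
}

public static Task<InvokeResult<T>> TryInvokeAsync<T>(Func<T> del, int attempts = 3)
{
 //run the synchronous retry loop on a thread-pool thread and surface
 //the outcome through an awaitable result object instead of out parameters
 return Task.Run(() =>
 {
  T value;
  IList<Exception> exceptions;
  bool success = TryInvoke<Exception, T>(del, out value, out exceptions, attempts);
  return new InvokeResult<T> { Success = success, Value = value };
 });
}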

The complete source code of the Retry Invoker can be found on GitHub.

Wednesday, 12 August 2015

Enumerable Timer with Stopwatch using an Extension Method

There have been numerous times when I’ve wanted to run a deferred query using either Linq-to-SQL (IQueryable<T>) or Linq-to-Objects (IEnumerable<T>) and capture the execution time for analysis.  Doing this for Linq-to-Objects is relatively easy using an extension method:

 public static IEnumerable<T> WithTimer<T>(this IEnumerable<T> source, Stopwatch watch)
 {
#if TRACE
  watch.Reset();
  watch.Start();
  foreach (var item in source)
   yield return item;
  watch.Stop();
#else
  return source;
#endif
 }

We simply pass in a Stopwatch which is started and stopped pre- and post-enumeration respectively.  This allows us to get the total elapsed time outside of the enumeration as follows:

 Stopwatch watch = new Stopwatch();
 Enumerable.Range(0, 1000000).Where(n => n % 2 == 0).WithTimer(watch).ToList();
 Console.WriteLine(watch.ElapsedMilliseconds);

However, doing this for Linq-to-SQL is marginally more difficult, as yield return isn't supported for IQueryable<T> since it isn't an iterator interface type.  We therefore have to create our own queryable wrapper (TimerQueryable<T>) around the IQueryable source and then wrap the enumerator returned by GetEnumerator in a TimerEnumerator<T> to control starting and stopping the timer.

    public class TimerQueryable<T> : IQueryable<T>
    {
        readonly IQueryable<T> _queryable;
        readonly IQueryProvider _provider;
        readonly Stopwatch _watch;

        public TimerQueryable(IQueryable<T> queryable, Stopwatch watch)
        {
            _watch = watch;
            _queryable = queryable;
            _provider = queryable.Provider;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new TimerEnumerator<T>(this._provider.CreateQuery<T>(Expression).GetEnumerator(), _watch);
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        public Type ElementType
        {
            get { return _queryable.ElementType; }
        }

        public System.Linq.Expressions.Expression Expression
        {
            get { return _queryable.Expression; }
        }

        public IQueryProvider Provider
        {
            get { return _queryable.Provider; }
        }
    }

    public class TimerEnumerator<T> : IEnumerator<T>
    {
        IEnumerator<T> _enumerator;
        Stopwatch _watch;

        public TimerEnumerator(IEnumerator<T> enumerator, Stopwatch watch)
        {
            _enumerator = enumerator;
            _watch = watch;
        }

        public T Current
        {
            get { return _enumerator.Current; }
        }

        public void Dispose()
        {
            _watch.Stop();
            _enumerator.Dispose();
        }

        object System.Collections.IEnumerator.Current
        {
            get { return this.Current; }
        }

        public bool MoveNext()
        {
            if (!_watch.IsRunning)
            {
                _watch.Reset();
                _watch.Start();
            }
            return this._enumerator.MoveNext();
        }

        public void Reset()
        {
            this._enumerator.Reset();
        }
    }

The Stopwatch in the TimerEnumerator<T> class is started upon enumeration beginning in MoveNext and stopped upon the enumerator being disposed of in Dispose.  The Stopwatch itself isn't disposed of as we'd like to get the total elapsed time outside of the enumerator.  As IEnumerator<T> derives from IDisposable it is fair to assume the enumerator will be disposed of correctly provided the developer adheres to the IDisposable pattern.
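
To keep the call site symmetrical with the IEnumerable<T> version, the wrapper can be exposed through a matching extension method; a minimal sketch using the types above:

 public static IQueryable<T> WithTimer<T>(this IQueryable<T> source, Stopwatch watch)
 {
#if TRACE
  //wrap the source so the underlying enumeration is timed by TimerEnumerator<T>
  return new TimerQueryable<T>(source, watch);
#else
  return source;
#endif
 }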

As there is only one instance of the Stopwatch created and passed into multiple TimerEnumerator<T> instances, it is worth mentioning that this isn't inherently thread-safe - but then again, do you know of an IQueryable or IEnumerable source which is?  Trying to execute two queries at the same time against an Entity Framework context on different threads would most certainly cause exceptions to be thrown.

Additionally, as we don't want this overhead in production code I have used the TRACE constant so that the timers will only work with the TRACE constant defined in the project - it doesn't make sense to run these timers when efficiency is paramount.

The full source code of the Enumerable Timer can be found on GitHub.

Wednesday, 10 June 2015

Return Await vs Returning Task

I'm a true believer in keeping code as simple as possible by minimising the amount I have to write whilst not compromising functionality or readability.  I ran into a problem the other day when using the async/await pattern along with the Task Parallel Library (TPL).  Observe the following code snippet:

Task<int> CalculateValue()
{
    using (var calc = new Calculator())
    {
        return calc.Calculate();
    }
}

First thing to mention is that I like to conform to standards - if my class directly uses unmanaged resources, or indirectly uses them through another managed class, then I implement the IDisposable pattern to ensure there are no resource leaks throughout the lifetime of the application.
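
For context, the Calculator used in these snippets is assumed to look roughly like this - a disposable class whose Calculate hands the work off to another thread:

public class Calculator : IDisposable
{
    public Task<int> Calculate()
    {
        //stand-in for a long-running calculation performed on the thread pool
        return Task.Run(() => 42);
    }

    public void Dispose()
    {
        //release whatever resources the calculator holds
    }
}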

So back to the problem at hand - the Calculate method on the Calculator class returns a Task<int> as this is a long-running calculation done on another thread.  All is good except for the crucial detail that upon the Calculate method being called, execution returns from the CalculateValue method and the Dispose method on the Calculator is called long before the Calculate task completes.  The reason is simply that I didn't decorate the calc.Calculate call with the await keyword and the CalculateValue method with the async keyword.

To resolve this problem I should change to the following:

async Task<int> CalculateValue()
{
    using (var calc = new Calculator())
    {
        return await calc.Calculate();
    }
}

So once the execution of the Task returned by the Calculate method is complete the Dispose method on the Calculator is subsequently called.

The reason I didn't implement it like this in the first place is that I don't like to add complexity where I believe it isn't required.  Adding the await keyword to the calc.Calculate call implies there is code to run after the call completes - in my opinion there was nothing further to do apart from correctly disposing of the calculator, which I easily missed.  Implementing the call this way would have been perfectly fine (no async and await keywords necessary):

readonly Calculator _calc = new Calculator();

Task<int> CalculateValue()
{
    return _calc.Calculate();
}

As I inject most of my dependencies using a DI framework it has been a rare circumstance where I've run into this problem.

Saturday, 10 January 2015

EF Mapping with Discriminator Column and Concrete Tables for Derived Types

A problem I recently ran into with Entity Framework 6 was where I wanted to use two features at the same time: Table-Per-Hierarchy mapping and Table Splitting.  Using both of them at the same time caused a conflict which the framework was unable to handle.

The following database schema was used for mapping to my domain model, where I wanted the Person.PersonType column as a discriminator for my derived types.

(Schema diagram omitted: dbo.Person carries the PersonType discriminator, with dbo.Customer and dbo.Operator as satellite tables keyed on PersonId.)

My domain model was as follows (PersonType property has been omitted as discriminators cannot be mapped to properties):

class Person
{
 public int PersonId { get; set; }
 public string Name { get; set; }
}

class Customer : Person
{
 public string CustomerNumber { get; set; }
}

class Operator : Person
{
 public string OperatorNumber { get; set; }
}

The model mapping I had in place was as follows:
  modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.Person");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
        map.ToTable("dbo.Customer");
    });

Which caused the following SQL exception to be thrown:

EntityCommandExecutionException
An error occurred while executing the command definition. See the inner exception for details
Invalid column name 'PersonType'.
Invalid column name 'PersonType'.

The SQL which was generated by the entity framework was as follows:

SELECT 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X' WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1], 
    [Extent1].[PersonId] AS [PersonId], 
    [Extent1].[Name] AS [Name], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[CustomerNumber] END AS [C2], 
    CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project2].[OperatorNumber] END AS [C3]
    FROM   [dbo].[Person] AS [Extent1]
    LEFT OUTER JOIN  (SELECT 
        [Extent2].[PersonId] AS [PersonId], 
        [Extent2].[CustomerNumber] AS [CustomerNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Customer] AS [Extent2]
        WHERE [Extent2].[PersonType] = N'C' ) AS [Project1] ON [Extent1].[PersonId] = [Project1].[PersonId]
    LEFT OUTER JOIN  (SELECT 
        [Extent3].[PersonId] AS [PersonId], 
        [Extent3].[OperatorNumber] AS [OperatorNumber], 
        cast(1 as bit) AS [C1]
        FROM [dbo].[Operator] AS [Extent3]
        WHERE [Extent3].[PersonType] = N'O' ) AS [Project2] ON [Extent1].[PersonId] = [Project2].[PersonId]

As you can see from the generated SQL, it is trying to use the discriminator column on our satellite tables (dbo.Operator and dbo.Customer) rather than on our base table (dbo.Person).  As yet I haven't found a way to change how the Entity Framework mapping treats the discriminator column so as to achieve both TPH mapping and Table Splitting.  To get around this problem I had to create a view on top of all of the tables involved in the hierarchy mapping:

create view dbo.vw_PersonExtended
as

    select
        p.Name, p.PersonId, p.PersonType, c.CustomerNumber, o.OperatorNumber
    from
        dbo.Person p
        left join dbo.Customer c on c.PersonId=p.PersonId
        left join dbo.Operator o on o.PersonId=p.PersonId

And then map this view to the base class type Person, removing the derived class table mappings as follows:
   modelBuilder
    .Entity<Person>()
    .HasKey(n => n.PersonId)
    .Map(map =>
    {
        map.Properties(p => new { p.Name });
        map.ToTable("dbo.vw_PersonExtended");
    })
    .Map<Customer>(map =>
    {
        map.Requires("PersonType").HasValue("C");
        map.Properties(p => new { p.CustomerNumber });
    });

The derived types Customer and Operator are no longer mapped to their individual concrete tables; instead their values come from the mapping in the view.

Additionally, we have to create either an INSTEAD OF INSERT trigger on top of this view or a stored procedure to allow our DB context to persist new entities.

The stored procedure would be:

create procedure dbo.usp_InsertCustomer
    @Name varchar(50),
    @CustomerNumber varchar(50)
as
begin

        set nocount on
        declare @id int

        insert into dbo.Person (Name, PersonType)
        values (@Name, 'C')
        set @id = scope_identity()

        insert into dbo.Customer (PersonId, CustomerNumber)
        values (@id, @CustomerNumber)

        select @id as PersonId

end

And mapped as follows:

modelBuilder
        .Entity<Customer>()
        .MapToStoredProcedures(map => map.Insert(i => i.HasName("usp_InsertCustomer")));

The obvious drawback to this approach is that you have to create views and stored procedures on the database side when you want to implement the discriminator hierarchy pattern in Entity Framework, as well as the plumbing work in the database context mapping.  This shouldn't impede the performance of the implementation; it just makes the implementation more time consuming.

Wednesday, 17 December 2014

SQL Scalar Value Function Inefficiencies

Are scalar value functions (SVF) really the best place to centralise the logic in your DB?

Although I'm against it, many developers place business logic of varying complexity into the database through the use of inline case statements - this could be as simple as a calculation of a shipping cost, for example.  As the database grows and the number of SQL queries depending on this calculation increases, it becomes good practice to centralise the code for reuse rather than duplicating the logic across numerous queries.

One of the options for centralising this logic is a scalar value function, called inline in the SQL statement, either in the SELECT list or as a predicate in the WHERE clause.  Using one inside a query actually impedes performance, as the SVF call is executed in a different context to the main query - once per row - adding a cost the optimiser cannot eliminate.

If we take a simple Products data table with 100k records as an example (this simply generates some random numbers for the properties of a product):

set nocount on

create table dbo.temp_Products
(
 [ProductID] int,
 [Type] tinyint,
 [Price] float,
 [Weight] float 
)

declare @rowcnt int = 0
while (@rowcnt <= 100000)
begin
 insert into dbo.temp_Products
 select @rowcnt, 1+rand()*4, 1+rand()*100, 1+rand()*10

 set @rowcnt = @rowcnt + 1
end

And a scalar value function which does a simple logical calculation for calculating a shipping cost:

create function dbo.usvf_CalculateShipping
(
    @PricePerKG float = 1,
    @Type tinyint,
    @WeightInKG float = 1
)
returns float
as
begin

    return  /*get the appropriate factor to apply*/
            case
                when @Type = 1 then 0.1
                when @Type = 2 then 0.2
                when @Type = 3 then 0.35
                when @Type = 4 then 0.43
            end * @PricePerKG * @WeightInKG

end
go

When we call the SVF inside a query the resulting performance is impaired relative to calling the logic inline. To compare the differences between the two take the following two SQL statements:

select ProductID, case
                when [Type] = 1 then 0.1
                when [Type] = 2 then 0.2
                when [Type] = 3 then 0.35
                when [Type] = 4 then 0.43
            end *Price*[Weight] from temp_Products

select ProductID, dbo.usvf_CalculateShipping(Price, [Type], [Weight]) from temp_Products

The first query, which runs the logic inline, produces a result in a quicker time than the second query.  The results are as follows:

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 212 ms.

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 842 ms,  elapsed time = 938 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

To get around this the solution is to use an inline table-valued function (TVF) in which the logic resides; this logic is then injected into the plan of the calling query, effectively placing the logic inline.

create function dbo.utvf_CalculateShipping
(
    @PricePerKG float = 1,
    @Type tinyint,
    @WeightInKG float = 1
)
returns table as return 
select case
         when @Type = 1 then 0.1
         when @Type = 2 then 0.2
         when @Type = 3 then 0.35
         when @Type = 4 then 0.43
       end * @PricePerKG * @WeightInKG as Shipping
go

If I run the following two queries you’ll find the performance of both is almost identical:

select ProductID, case
                when [Type] = 1 then 0.1
                when [Type] = 2 then 0.2
                when [Type] = 3 then 0.35
                when [Type] = 4 then 0.43
            end *Price*[Weight] from temp_Products
select ProductID, tt.Shipping from temp_Products t cross apply dbo.utvf_CalculateShipping(Price, Type, Weight) tt



(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 310 ms.

(100001 row(s) affected)
Table 'temp_Products'. Scan count 1, logical reads 390, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 285 ms.

Monday, 13 October 2014

Thread-Safe Singleton Construction

The Singleton design pattern is a common technique for ensuring only one instance of a class exists at runtime:

class NonThreadSafeSingleton
{
 static NonThreadSafeSingleton instance;

 public static NonThreadSafeSingleton Instance
 {
  get
  {
   if (instance == null)
    instance = new NonThreadSafeSingleton();
   return instance;
  }
 }
}

The above implementation of the Singleton pattern may work, but it doesn't make it safe to use between multiple threads.  If multiple threads were to access the above Singleton implementation we may experience the following sequence as executed by the runtime:

1. Thread 1: if (instance == null)
2. Thread 2: if (instance == null)
3. Thread 1: instance = new NonThreadSafeSingleton();
4. Thread 2: instance = new NonThreadSafeSingleton();
5. Thread 1: return instance;
6. Thread 2: return instance;

As you can see from the order of execution, the NonThreadSafeSingleton class is instantiated twice due to a race condition, when in reality our desired execution is as follows:

1. Thread 1: if (instance == null)
2. Thread 1: instance = new NonThreadSafeSingleton();
3. Thread 1: return instance;
4. Thread 2: return instance;

If we're going to make this thread-safe one approach in .NET 2.0 would be to use an exclusive lock:

class ThreadSafeSingletonWithLock
{
 static ThreadSafeSingletonWithLock instance;
 static object locker = new Object();

 public static ThreadSafeSingletonWithLock Instance
 {
  get
  {
   lock (locker)
   {
    if (instance == null)
     instance = new ThreadSafeSingletonWithLock();
   }
   return instance;
  }
 }
}

But looking at the sequence of execution we now run into the problem of the lock always being acquired, even when the instance is already constructed (executions 1 and 5 listed below):

1. Thread 1: lock (locker)
2. Thread 1: if (instance == null)
3. Thread 1: instance = new ThreadSafeSingletonWithLock();
4. Thread 1: return instance;
5. Thread 2: lock (locker)
6. Thread 2: return instance;

This causes an additional overhead which isn't desired.  It is actually possible to achieve the same requirement of synchronising the object construction whilst minimising the amount of locking required:

class ThreadSafeSingletonWithDoubleCheckLock
{
 static ThreadSafeSingletonWithDoubleCheckLock instance;
 static object locker = new Object();

 public static ThreadSafeSingletonWithDoubleCheckLock Instance
 {
  get
  {
   if (instance == null)
   {
    lock (locker)
    {
     if (instance == null)
      instance = new ThreadSafeSingletonWithDoubleCheckLock();
    }
   }
   return instance;
  }
 }
}

It is still possible for two threads to contend for the lock during object construction, although if construction is quick the likelihood of this occurring is minimised.  Any instance retrievals after instantiation are done outside of the lock, providing an improvement at runtime.  Although this implementation of the thread-safe Singleton pattern is better than the prior one, it feels a bit messy having to check whether the instance is null twice - an inexperienced developer may think this is unnecessary, running the risk of the inner check being removed.

An even better implementation is provided in the .NET 4.0 BCL with Lazy<T>.  This allows a single instance of a class to be instantiated upon request in a thread-safe way.  An example of its usage is shown below:

class ThreadSafeLazySingleton
{
 readonly static Lazy<ThreadSafeLazySingleton> instance = new Lazy<ThreadSafeLazySingleton>(() => new ThreadSafeLazySingleton());

 private ThreadSafeLazySingleton(){}

 public static ThreadSafeLazySingleton Instance
 {
  get { return instance.Value; }
 }
}

The Lazy<T> overloaded constructors provide performance options during construction.  For example, if the instance isn't going to be shared between multiple threads then the Lazy instance can be defined as non-thread-safe to ensure the highest possible performance.
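
The behaviour is selected through the LazyThreadSafetyMode overload; a short illustration (Expensive is a placeholder type):

//full double-check locking - the default, and what the singleton above relies on
var safeInstance = new Lazy<Expensive>(() => new Expensive(), LazyThreadSafetyMode.ExecutionAndPublication);

//no synchronisation at all - fastest, but only safe on a single thread
var unsafeInstance = new Lazy<Expensive>(() => new Expensive(), LazyThreadSafetyMode.None);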

By using the Lazy class we achieve the following:
  1. Encapsulation of the locking mechanism: as the synchronisation of instantiation is internal to the Lazy<T> class, any future optimisations made by the .NET team should not affect usage of this class unless they intend to implement a breaking change;
  2. The singleton instance can be used safely in a multi-threaded environment;
  3. The double-check locking is gracefully encapsulated inside the class without the developer having to implement it themselves.
The private constructors in the earlier examples have been omitted for brevity.