The way you handle issues can make the difference between a good and a bad experience with EntityFramework. Like everything else, EF has strengths and flaws…

Before moving on, we need to answer an important question: what is the EntityFramework (EF)?

EF is an ORM (Object-Relational Mapping) framework created with ADO.NET data access in mind. ORMs create a virtual object database that acts as a bridge to the real one, making database operations easier by mapping database objects onto the structures of object-oriented languages.

Nine years have passed since the first release of EntityFramework. Although the fundamentals are the same, it has evolved, gaining features and, above all, becoming a lot more stable (before .NET Framework 3.5, EF wasn’t the most reliable ORM to use, I admit).

However, .NET programmers are still pretty much divided over the qualities and the flaws of EF.

I hear a lot:

  • “EntityFramework?! I refuse to work with that trash!”
  • “EntityFramework?! That can’t be used to process everything that has more than one thousand records. It bottlenecks everything!”
  • “I’ve used it before, but it was very heavy and slow, so I didn’t like it and never tried to use it again!”

And I also hear a lot:

  • “I only use EntityFramework; it’s the best thing created for DB access I’ve ever seen!”
  • “I love EF, I don’t use or need anything else to do operations/queries to the DB!”

I believe most .NET programmers fall into one of these two extreme “Love-Hate” categories.

Who’s right?

Well, if you’re an EF lover who uses it directly for everything, I’m sorry to disappoint you, but I don’t think you’re right.

And if you’re a ‘hater’ thinking “Haha, I told you so, EF is no good”, sorry to disappoint you too, but you aren’t right either.

The answer should be obvious by now: there is no right or wrong here, only choices and different uses. There are tasks where EF is wonderful, and others where it’s not. That’s not much different from any other ORM (Hibernate, for instance).

Is EF able to deal with many records?

The answer is yes – if you know how to use it, and when to use it. If you’re new to EF and not sure how it works internally, do some research before concluding that “EF is slow and only gives me problems”.

Let’s get to that.

Suppose we have the following tables from the ‘good old’ (and maybe not so used nowadays) ASP.NET Membership:

Let’s assume we want to get all the users who haven’t logged in for a month and, afterward, we want to filter further by joining another table to get those users’ Username and E-mail, restricted to a specific mail server (“@SomeEmailServer.com”).

Take a look at the example below:

EFTest.EFTestEntities dataCtx = new EFTestEntities();

List<aspnet_Membership> usersNotLoggedInForAMonth = dataCtx.aspnet_Membership
                .Where(u => u.LastLoginDate.CompareTo(DateTime.Now.AddMonths(-1)) < 0).ToList();

var usersNLData = (from memb in usersNotLoggedInForAMonth
                               join user in dataCtx.aspnet_Users on memb.UserId equals user.UserId
                               where memb.Email.TrimEnd().EndsWith("@SomeEmailServer.com")
                               select new { user.UserName, memb.Email }
                               );

foreach (var user in usersNLData)
{
      SendEmail(user.UserName, user.Email, string.Format("Hey {0}, we noticed that you were away for a while! Please come back!", user.UserName));
}

What’s wrong with it?

List<aspnet_Membership> usersNotLoggedInForAMonth = dataCtx.aspnet_Membership
                .Where(u => u.LastLoginDate.CompareTo(DateTime.Now.AddMonths(-1)) < 0).ToList();

EF queries are translated into SQL and executed on the SQL Server side. This makes data retrieval very efficient because, apart from some overhead (not as heavy on resources as many people think), it is for most cases pretty much the same as writing your own queries with low-level SqlCommands. Great, isn’t it?

The most common issue is usually on the programmer’s side: triggering SQL execution mid-query, before the query has been narrowed down, so that an extensive result set (far too many records) comes back without the programmer even thinking about it. In development tests this may go unnoticed (there are usually far fewer records than in Production), but the application will get slower as time elapses and the record count increases.

Looking at the example above, imagine aspnet_Membership had 1,000,000 records for users not logged in for a month, while only 100 of those had the email server you are looking for (“@SomeEmailServer.com”). You would be retrieving 999,900 useless records, each carrying columns you don’t need (since you only want two fields, UserName and Email), slowing the query (and your application) down by orders of magnitude – and I’m not even mentioning the impact on the DB server and the network. Therefore, you need to be very careful with the way you treat your data and queries with EF.

 

How can we avoid slowness?

Simple! Keep your data queries as IQueryable the whole time, and make sure you don’t use any function that triggers query execution before the query is finalized to return only the data you need.

IQueryable<aspnet_Membership> usersNotLoggedInForAMonth = dataCtx.aspnet_Membership
                .Where(u => u.LastLoginDate.CompareTo(DateTime.Now.AddMonths(-1))<0);

Be careful with ToList, ToArray, and even with a simple foreach. All of these (and several others), if used before the time is right, can spell disaster.
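Putting it all together, a corrected version of the first example could look like the sketch below (it assumes the same EFTestEntities context, aspnet_* entities, and SendEmail helper from the code above). The filter, join, and projection are all composed on the IQueryable, so a single SQL query fires only when the foreach starts, returning just the two columns we need:

```csharp
using (var dataCtx = new EFTest.EFTestEntities())
{
    // Still IQueryable: nothing has been sent to the database yet.
    var usersNLData = from memb in dataCtx.aspnet_Membership
                      join user in dataCtx.aspnet_Users on memb.UserId equals user.UserId
                      where memb.LastLoginDate.CompareTo(DateTime.Now.AddMonths(-1)) < 0
                         && memb.Email.TrimEnd().EndsWith("@SomeEmailServer.com")
                      select new { user.UserName, memb.Email };

    // The single combined SQL query executes here, returning only
    // the ~100 matching rows with just UserName and Email.
    foreach (var user in usersNLData)
    {
        SendEmail(user.UserName, user.Email,
            string.Format("Hey {0}, we noticed that you were away for a while! Please come back!", user.UserName));
    }
}
```

The only difference from the original is *when* the data is materialized: by removing the early ToList, the e-mail filter and the projection become part of the SQL sent to the server.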

Now that we’ve covered one of the most common EF LINQ-to-Entities mistakes, let’s check another one.

EF makes simple, direct and quick DB queries easy, as we’ve seen. The problem comes when we overdo it without thinking much about it.

Let’s look at the quite typical example below.

Our goal is to get all the “Annes” from the Users table and lock out all of them that have the “Admin” role.

EFTest.EFTestEntities dataCtx = new EFTestEntities();

var users = dataCtx.aspnet_Users.Where(u => u.UserName.StartsWith("Anne "));

foreach (var user in users)
{
      var userRoles = (from UserRoles in dataCtx.vw_aspnet_UsersInRoles
                                 join Roles in dataCtx.aspnet_Roles on UserRoles.RoleId equals Roles.RoleId
                                 where UserRoles.UserId == user.UserId
                                 select Roles.RoleName
                                 );

      if (userRoles.Contains("Admin"))
      {
                    var userMemb = dataCtx.aspnet_Membership.Where(m => m.UserId == user.UserId).First();
                    userMemb.IsLockedOut = true;
                    dataCtx.SaveChanges();
      }
}

What’s wrong with this one?

We have two EF queries and a SaveChanges (here, an update) operation to the DB, all inside a foreach. If we have 1,000 “Annes”, this loop will run 1,000 times. That usually isn’t relevant in terms of local processing time at runtime. However, calls to outside services and DB operations always carry substantial latencies and response times, and here we make 3 DB operations in each iteration, which turns out to be 3,000 operations.

If access to the DB has a 50ms average latency, this would mean 150,000ms (or 2.5 minutes), which is extreme for such a simple task – and I’m not even considering the time SQL Server spends executing the queries and returning the results.

 

So, what would be the solution?

Queries should stay outside loops, and the same applies to SaveChanges. The fewer total operations a task needs, the better and faster it usually is, as in the example below (only 1 query and 1 SaveChanges operation are sent to the DB):

EFTest.EFTestEntities dataCtx = new EFTestEntities();

var usersMemb = (from Users in dataCtx.aspnet_Users
                         join UserMemb in dataCtx.aspnet_Membership on Users.UserId equals UserMemb.UserId
                         join UserRoles in dataCtx.vw_aspnet_UsersInRoles on Users.UserId equals UserRoles.UserId
                         join Roles in dataCtx.aspnet_Roles on UserRoles.RoleId equals Roles.RoleId
                         where Users.UserName.StartsWith("Anne ") && Roles.RoleName == "Admin"
                         select UserMemb);

foreach (var user in usersMemb)
{
      user.IsLockedOut = true;
}

dataCtx.SaveChanges();

However, EF converts this SaveChanges into several individual update statements, so it’s still not optimal. For a massive number of records, I would recommend creating a Stored Procedure and calling it (you can use EF for that too), like this:

using (EFTest.EFTestEntities dataCtx = new EFTestEntities())
{
    dataCtx.BlockAdmins("Anne");
}

There are other ways to improve EF efficiency, such as disabling change tracking, as in the example below (or using AsNoTracking(), when it should apply only to a specific query and not to the whole DbContext):

using (EFTest.EFTestEntities dataCtx = new EFTestEntities())
{
    dataCtx.Configuration.AutoDetectChangesEnabled = false;
}
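For the per-query variant, AsNoTracking() can be applied directly to a single query. A sketch against the same example entities (change tracking is skipped only for the rows returned by this query, which saves memory and CPU when you only need to read data):

```csharp
using (var dataCtx = new EFTest.EFTestEntities())
{
    // These entities are not tracked by the context, so reading
    // them is cheaper; any modifications to them would NOT be
    // picked up by a later SaveChanges.
    var users = dataCtx.aspnet_Users
        .AsNoTracking()
        .Where(u => u.UserName.StartsWith("Anne "))
        .ToList();
}
```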

But this is just an improvement, not a definitive solution for very large row counts.

Conclusion

EF should be considered a great ORM framework: it allows faster development and easier, quicker operations to the DB, as long as you are careful and know how it works, so you can avoid certain mistakes and performance problems.

Therefore, EF can be very useful and might help you in many tasks, but it may not be the best option for big operations on the DB, which you should leave to the DB server itself.

Make sure you use Stored Procedures for more extensive operations, and SqlBulkCopy (in the System.Data.SqlClient namespace) when you need to do massive inserts from code. There are also some EF extensions based on SqlBulkCopy that may help you improve performance while still using EF.
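As an illustration, a minimal SqlBulkCopy sketch might look like the following. Note that connectionString and the target table dbo.ImportedUsers are placeholders, not objects from this article; the column names must match the destination table:

```csharp
using System.Data;
using System.Data.SqlClient;

// Build the rows to insert in memory first.
var table = new DataTable();
table.Columns.Add("UserName", typeof(string));
table.Columns.Add("Email", typeof(string));
table.Rows.Add("Anne Smith", "anne@SomeEmailServer.com");
// ...add the rest of your rows here...

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.ImportedUsers";
    bulkCopy.BatchSize = 5000;       // send rows to the server in batches
    bulkCopy.WriteToServer(table);   // one bulk operation instead of N INSERTs
}
```

For large volumes this is dramatically faster than issuing one INSERT (or one EF SaveChanges) per row, because the rows travel in a single streamed bulk operation.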

 

Written by Dinis Ferreira, Developer at Cleverti