Syed Humza Shah

Logo

Engineering leader. Likes coffee. Loves to travel. Runs on a combination of optimism and pragmatism.

Read more about him here.

A Problem to Brush Up Your ActiveRecord & SQL Skills

Some people like using Arel. I don’t. I stick with ActiveRecord + some SQL when I’m querying the database from Rails.

We were having this discussion at the office today and I reminisced about an interesting problem somebody had shared on StackOverflow recently. I thought I’d share it here. It can be a fun interview question, if you’re into torturing the candidate.

I’ll lay out the problem as clearly as I can:

Conditions:

Solution:

First, I jumped at the following solution:

# ruby code

report_ids = [1, 2]

users = User.
          joins(:reports).
          where(reports: {id: report_ids}).
          group("users.id")

But that’s wrong. Suppose user1 could have report 1, 2, and 3; he will fulfill conditions in the code above but fail our requirements which are: user should have either only one or only two reports (with our specific report IDs).

Once I realized this, my next solution was this:

# ruby code

ignore_user_ids = User.
                    joins(:reports).
                    where("reports.id NOT IN (?)", report_ids).
                    pluck("DISTINCT users.id")

users = User.
          joins(:reports).
          where("users.id NOT IN (?)", ignore_user_ids).
          group("users.id")

This was acceptable since the question didn’t specify any query-conditions.

But it bothered me. I wanted ONE query! WITHOUT subqueries! I wanted blood!

I exaggerate. But I did think about it some more and found a one-query solution that involved writing SQL. Here:

# ruby code

ris = report_ids.join(',')
user_columns = User.column_names.join(',')

users = \
  User.
  joins("INNER JOIN reports AS r1 ON (
    (r1.user_id = users.id) AND
    (r1.id IN (#{ris}))
  )"). # inner join our required reports
  joins("LEFT OUTER JOIN reports AS r2 ON (
    (r2.user_id = users.id) AND
    (r2.id NOT IN (#{ris}))
  )"). # outer join all reports we don't require
  select(
    # select user columns and
    # count the number of reports NOT required
    "#{user_columns}, COUNT(r2.id) AS r2c"
  ).
  where(
    # specify the condition on the reports you DON'T want
    "r2c = 0"
  ). # or having("r2c = 0").
  group("users.id")

Now is it wise to actually use that solution in production code? Absolutely not! You don’t want later developers to look at that monstrosity and spend more time than necessary trying to understand what’s going on.

BUT exercises such as these can keep your SQL skills sharp - which is an area where many Rails developers (including myself) have a lot of room for improvement.

Incidentally, if there’s a better way to accomplish what I said, PLEASE let me know! I’m always on the lookout for doing things better.

Link to said StackOverflow post.