A Developer with a Pencil

It Is Time to Stop Using Acts_as_taggable

Have you ever added tagging to your Rails application? Then you have probably used either the acts_as_taggable gem or its younger brother, the acts_as_taggable_on gem.

These two gems are great, but they have some drawbacks that were unavoidable at the time they were created. Both of them rely on an RDBMS schema that generally looks like this:

Tagging, RDBMS style

acts_as_taggable, in all of its generations, was based on this schema:

  • a Tags table that holds the information about a specific tag (basically, only the tag name)
  • a Taggings table that holds polymorphic association references to the tagged instance (taggable) and the tagger instance (tagger).

So whenever you want the tag list for a taggable instance, or all the tags a tagger has made, you have to JOIN those 2 tables together. Always.
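To make the cost of that lookup concrete, here is a minimal in-memory sketch of the two tables. The column names follow the description above, the rows are made-up sample data, and `tag_names_for` is a hypothetical helper rather than anything the gems provide. It mimics what the database has to do: filter the taggings, then match them against the tags.

```ruby
# Mimics: SELECT tags.name FROM tags
#         JOIN taggings ON taggings.tag_id = tags.id
#         WHERE taggings.taggable_type = ? AND taggings.taggable_id = ?
def tag_names_for(tags, taggings, taggable_type, taggable_id)
  # Step 1: find the tagging rows that point at the requested record.
  tag_ids = taggings
            .select { |t| t[:taggable_type] == taggable_type && t[:taggable_id] == taggable_id }
            .map { |t| t[:tag_id] }

  # Step 2: resolve those tag ids to names through the second table.
  tags.select { |tag| tag_ids.include?(tag[:id]) }.map { |tag| tag[:name] }
end

# Made-up sample rows standing in for the Tags and Taggings tables.
tags = [
  { id: 1, name: "dog" },
  { id: 2, name: "cat" }
]
taggings = [
  { tag_id: 1, taggable_type: "Photo", taggable_id: 9, tagger_type: "User", tagger_id: 10 },
  { tag_id: 2, taggable_type: "Photo", taggable_id: 9, tagger_type: "User", tagger_id: 11 },
  { tag_id: 1, taggable_type: "Photo", taggable_id: 7, tagger_type: "User", tagger_id: 10 }
]

tag_names_for(tags, taggings, "Photo", 9) # => ["dog", "cat"]
```

Even this toy version needs two passes over two collections to answer the simplest question, "which tags does this photo have?".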

Now, joining isn’t inherently bad (it is there for a reason), but in certain circumstances it is one of several serious issues arising from this schema.

1. JOINing tables from different servers

What happens when you have 10M tags and 40M taggings? Your MySQL / Postgres / you-name-it database needs an extended setup with more than one database server instance, and once you split the data you may end up having to JOIN between 2 database servers.

Yes, it is possible. MySQL supports the Federated storage engine, which lets you query tables that live on another server; MSSQL has the very similar linked-servers feature, and some other databases have equivalents. The problem is that this feature is far from easy to set up or maintain, so if you have a lot of tags or taggings and want to add sharding to the party, you are in a jam.

2. Indexing polymorphic association columns

Although these gems provide the necessary indices as part of the migration generator template, a polymorphic association in Rails is composed of a string (taggable_type) and an integer (taggable_id). That makes the index’s selectivity rather low, meaning the index contains too many similar, grouped entries.
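As a rough illustration of what "low selectivity" means, the sketch below computes the ratio of distinct values to total rows for a handful of made-up taggings. The `selectivity` helper and the sample rows are assumptions for illustration only; real numbers depend entirely on your data.

```ruby
# Selectivity = distinct values / total rows.
# The closer to 1.0, the more each index entry narrows the search.
def selectivity(values)
  return 0.0 if values.empty?

  values.uniq.size.to_f / values.size
end

# Made-up sample: each row is the [taggable_type, taggable_id] pair of one tagging.
# Photo 1 carries three tags, so its pair appears three times.
rows = [
  ["Photo", 1], ["Photo", 1], ["Photo", 1],
  ["Photo", 2],
  ["Post", 1], ["Post", 1]
]

type_only   = selectivity(rows.map(&:first)) # 2 distinct types / 6 rows
type_and_id = selectivity(rows)              # 3 distinct pairs / 6 rows
```

The type column alone distinguishes almost nothing, and even the combined pair repeats once per tag on the same record.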

3. Autocomplete

Back to the example of 10M tags in the table. Providing autocomplete for a table of this size is horrific. You’ll have to use some kind of full-text engine like Solr or ElasticSearch to provide matching tags in real time.

4. Uniqueness of tag name

How do you know whether to create a newly provided tag or to add a tagging to an existing one? You first have to find out whether the tag already exists. 10M tags? Good luck. Again, a full-text search engine provides a decent solution to this problem.

The solution: Redis

I love Redis. When it fits, it sits. If you use Redis’s superpowers when you actually need them, it is an awesome tool. Redis provides several value types, each with its own superpower aimed at a specific problem; the Set is the one we chose.

Storing tags in Redis for easy access

Redis Sets are basically unordered collections of unique members. For the following example we will use User as the tagger class and Photo as the tagged class. Noticed that there are no Tags or Taggings tables? We don’t need them anymore.

When the User with ID 10 tags the Photo with ID 9 with the tag “Dog”, we simply create a bunch of Redis sets that allow easy access to any slice of data we might need:

Storing tags in redis
# Add specific tagging of photo by a user
$redis.sadd "user:10:photo:9:tags", "dog"

# Add to photo specific tag set
$redis.sadd "photo:9:tags", "dog"

# Add a list of tagged photos to a tag set
$redis.sadd "tag:dog:photos", 9

# a list of photos tagged by a specific user
$redis.sadd "user:10:tagged_photos", 9

# Increase the usage counter for the "dog" tag
$redis.incr "tagged_by:dog"
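The writes above can be wrapped in a single method so that every tagging updates all the keys together. This is only a sketch: `tag_photo` is a hypothetical name, the lowercase normalization is an assumption (so that “Dog” and "dog" end up in the same keys), and `redis` is any client object that responds to `sadd` and `incr`, such as a connection from the redis gem.

```ruby
# `redis` is assumed to respond to sadd(key, member) and incr(key),
# for example a client created with the redis gem.
def tag_photo(redis, user_id:, photo_id:, tag:)
  # Assumption: tags are stored lowercased and trimmed.
  name = tag.to_s.strip.downcase

  redis.sadd("user:#{user_id}:photo:#{photo_id}:tags", name) # this user's tags on this photo
  redis.sadd("photo:#{photo_id}:tags", name)                 # every tag on this photo
  redis.sadd("tag:#{name}:photos", photo_id)                 # every photo carrying this tag
  redis.sadd("user:#{user_id}:tagged_photos", photo_id)      # every photo this user has tagged
  redis.incr("tagged_by:#{name}")                            # usage counter for the tag

  name
end
```

Calling `tag_photo($redis, user_id: 10, photo_id: 9, tag: "Dog")` would perform the same five writes as the listing above. Note that the counter goes up on every call, even when the same user repeats the same tag; whether that is what you want depends on what the counter is meant to measure.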

Now we can have simple accessors to this information, for example:

photo.rb
class Photo

  # Get tags
  def tags
    $redis.smembers "photo:#{self.id}:tags"
  end
end

or for the Tag class:

tag.rb
class Tag
  def tagged_photo_ids
    $redis.smembers "tag:#{self.name}:photos"
  end
end
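The same pattern covers the other keys written earlier. The helpers below are a sketch with hypothetical method names; they take the client as an argument and assume it responds to `smembers` and `get`. Redis hands members and counters back as strings, so IDs and counts are converted explicitly.

```ruby
# All tags on a photo, across every tagger.
def tags_for_photo(redis, photo_id)
  redis.smembers("photo:#{photo_id}:tags")
end

# Tags that one specific user put on one specific photo.
def tags_by_user_on_photo(redis, user_id, photo_id)
  redis.smembers("user:#{user_id}:photo:#{photo_id}:tags")
end

# IDs of every photo a user has tagged (members come back as strings).
def photo_ids_tagged_by(redis, user_id)
  redis.smembers("user:#{user_id}:tagged_photos").map(&:to_i)
end

# Usage counter for a tag; GET returns a string, or nil if the key is missing.
def tag_usage_count(redis, tag)
  redis.get("tagged_by:#{tag}").to_i
end
```

Each read is a single key lookup, with no second structure to consult.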

Generally, this is just an outline with a single rule: denormalize your data. Instead of doing complicated join queries, use simple namespaced key-value access to your data.

OK, no joins. What about autocomplete?

Autocomplete is a PITA, but with Redis we can maintain sets keyed by tag prefix, each holding the tags that start with that prefix. For example, the tag “liverpool” is broken into smaller pieces:

autocomplete.rb
$redis.sadd "tags:start_with:liv", "liverpool"
$redis.sadd "tags:start_with:live", "liverpool"
$redis.sadd "tags:start_with:liver", "liverpool"
$redis.sadd "tags:start_with:liverp", "liverpool"
$redis.sadd "tags:start_with:liverpo", "liverpool"
$redis.sadd "tags:start_with:liverpoo", "liverpool"
$redis.sadd "tags:start_with:liverpool", "liverpool"

This breakdown will allow us to easily access the list of tags (3 letters and up):

tag.rb
class Tag
  # ...
  def self.tags_starting_with(tag_starts_with = "")
    $redis.smembers "tags:start_with:#{tag_starts_with}"
  end
end

Tag.tags_starting_with("liver") # => ["liver", "liverpool", "liverani",...]
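The prefix keys do not have to be written out by hand. Here is a minimal sketch that generates them, assuming a 3-letter minimum and lowercase tags; `tag_prefixes` and `index_tag_for_autocomplete` are hypothetical names, and `redis` is any client that responds to `sadd`.

```ruby
# Every prefix of the tag from min_length letters up to the full name.
def tag_prefixes(tag, min_length: 3)
  name = tag.to_s.strip.downcase
  return [] if name.length < min_length

  (min_length..name.length).map { |len| name[0, len] }
end

# Adds the tag to one set per prefix, using the same key layout as the listing above.
def index_tag_for_autocomplete(redis, tag, min_length: 3)
  name = tag.to_s.strip.downcase
  tag_prefixes(name, min_length: min_length).each do |prefix|
    redis.sadd("tags:start_with:#{prefix}", name)
  end
end

tag_prefixes("liverpool")
# => ["liv", "live", "liver", "liverp", "liverpo", "liverpoo", "liverpool"]
```

The trade-off is memory: with a 3-letter minimum, a tag of n letters is stored in n - 2 sets, so long tags cost more than short ones.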

Intersections!

Redis can compute the intersection of 2 sets (SINTER), meaning the elements common to both, as well as their difference (SDIFF), meaning the elements that appear in one set but not in the other.

For example, if we would like to know which photos are tagged with both “dog” and “cat”, we intersect those 2 sets.

intersection
$redis.sinter "tag:cat:photos", "tag:dog:photos" # => ["12", "93", "94", ...]
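Both operations accept any number of keys, so the idea generalizes beyond two tags. The helpers below are a sketch with hypothetical names; `redis` is assumed to respond to `sinter` and `sdiff` the way the redis gem's client does, returning members as strings.

```ruby
# Photos that carry every one of the given tags (SINTER over all tag sets).
def photo_ids_with_all_tags(redis, *tags)
  return [] if tags.empty?

  keys = tags.map { |tag| "tag:#{tag}:photos" }
  redis.sinter(*keys).map(&:to_i)
end

# Photos that carry `tag` but not `excluded_tag` (SDIFF of the two sets).
def photo_ids_with_tag_but_not(redis, tag, excluded_tag)
  redis.sdiff("tag:#{tag}:photos", "tag:#{excluded_tag}:photos").map(&:to_i)
end
```

For instance, `photo_ids_with_all_tags($redis, "cat", "dog")` asks the same question as the one-liner above, and `photo_ids_with_tag_but_not($redis, "cat", "dog")` would list the photos with cats and no dogs.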

Conclusion

Again, this is just an outline. There are many improvements to be made, and we at ShinobiDevs are working on releasing a gem that does just that; ideas are welcome. Redis is a powerful tool, and there is probably no need to store tagging data in an RDBMS structure when a better fit, maybe one like the structure suggested above, is available.
