These two gems are great, but they have some drawbacks that were unavoidable at the time they were created. Both rely on an RDBMS schema that generally looks like this:
Tagging, RDBMS style
ActsAsTaggable, in all of its generations, was based on this schema:
A Tags table that held the information about a specific tag (basically, only the tag name)
A Taggings table that held polymorphic association references to the tagged instance (taggable) and the tagger instance (tagger)
So basically, whenever you wanted to get the tag list for a taggable instance, or to see all the tags a tagger had made, you had to JOIN those two tables together. Always.
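To make the shape of that JOIN concrete, here is a minimal sketch that models the two tables as plain Ruby arrays of hashes. The taggable_type and taggable_id columns come from the polymorphic association; the tag_id, tagger_type and tagger_id column names, and the tag_list helper, are assumptions made for illustration.

```ruby
# Rows of the two tables, modelled as plain hashes.
# Column names other than taggable_type / taggable_id are assumed.
tags = [
  { id: 1, name: "dog" },
  { id: 2, name: "cat" }
]

taggings = [
  { tag_id: 1, taggable_type: "Photo", taggable_id: 9, tagger_type: "User", tagger_id: 10 },
  { tag_id: 2, taggable_type: "Photo", taggable_id: 9, tagger_type: "User", tagger_id: 11 },
  { tag_id: 1, taggable_type: "Photo", taggable_id: 7, tagger_type: "User", tagger_id: 10 }
]

# Roughly what the database does for:
#   SELECT tags.name FROM tags
#   JOIN taggings ON taggings.tag_id = tags.id
#   WHERE taggings.taggable_type = ? AND taggings.taggable_id = ?
def tag_list(tags, taggings, type, id)
  tag_ids = taggings
            .select { |row| row[:taggable_type] == type && row[:taggable_id] == id }
            .map { |row| row[:tag_id] }
  tags.select { |tag| tag_ids.include?(tag[:id]) }.map { |tag| tag[:name] }
end

tag_list(tags, taggings, "Photo", 9) # => ["dog", "cat"]
```

Even for a single photo, both tables have to be consulted to get from an ID to a tag name.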
Now, joining isn’t really bad - it is there for a reason - but in certain circumstances it is one of several serious issues that arise from this schema.
1. JOINing tables from different servers
What happens when you have 10M tags and 40M taggings? Your MySQL / Postgres / you-name-it database needs an extended setup with more than one database server instance, and once you split the data across them you may end up having to JOIN between two database servers.
Yes, it is possible. MySQL has the FEDERATED storage engine, which lets you access tables that live on a remote MySQL server, MSSQL has the very similar linked servers feature, and some other databases offer an equivalent. The problem is that this is far from easy to set up or maintain, so if you have a lot of tags or taggings and you want to add some sharding to the party, you are in a jam.
2. Indexing polymorphic association columns
Although these gems provide the necessary indices as part of the migration generator template, a polymorphic association in Rails is composed of a string (taggable_type) and an integer (taggable_id). The string column holds only a handful of distinct class names, which makes the index’s selectivity rather low - meaning there are too many entries in the index that share the same leading value.
3. Autocomplete
Back to the example of 10M tags in the table. Providing an autocomplete engine for a table of this size is horrific. You’ll have to use some kind of full-text search engine like Solr or ElasticSearch to provide matching tags in real time.
4. Uniqueness of tag names
How do you know whether to create a newly provided tag or to add a tagging to an existing one? You first have to check whether the tag already exists. 10M tags? Good luck. Again, a full-text search engine will provide a decent solution to this problem.
The solution: Redis
I love Redis. When it fits, it sits. If you use Redis’ superpowers when you actually need them, it is an awesome tool. Redis provides several value types, each with its own superpower aimed at a specific problem - SET being the one we chose.
Storing tags in Redis for easy access
Redis Sets are basically unordered collections of unique members. For the following example we will use User as the tagger class and Photo as the tagged class. Notice that there are no Tagging classes? We don’t need them anymore.
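The uniqueness guarantee is what makes the "does this tag already exist?" question go away. As a small sketch, a Hash of Ruby Sets stands in for Redis here so the snippet runs without a server; the "tags" key name is an assumption, and each line notes the Redis command it mirrors.

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }

# Set#add? returns nil when the member is already present, much like
# SADD reports how many members were actually added.
first_add  = !store["tags"].add?("dog").nil?  # SADD tags dog  (1 member added)
second_add = !store["tags"].add?("dog").nil?  # SADD tags dog  (0 members added)

first_add          # => true
second_add         # => false
store["tags"].size # => 1
```

Adding the same tag twice is harmless, so there is no need to look it up before writing.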
When the User with ID 10 tags the Photo with ID 9 with the tag “Dog”, we simply add to a bunch of Redis sets that allow easy access to any slice of data we might need:
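A minimal sketch of what those writes might look like. A Hash of Ruby Sets stands in for Redis so it runs without a server, and the key naming scheme, the tag! helper and the choice to downcase tag names are all assumptions made for illustration, not a fixed convention.

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }

# Hypothetical helper: one tagging fans out into several sets,
# one per question we may want to ask later.
def tag!(store, user_id:, photo_id:, tag:)
  tag = tag.downcase                                   # assumption: tags are case-insensitive
  store["tags"] << tag                                 # SADD tags dog
  store["photos:#{photo_id}:tags"] << tag              # SADD photos:9:tags dog
  store["users:#{user_id}:tags"] << tag                # SADD users:10:tags dog
  store["tags:#{tag}:photos"] << photo_id              # SADD tags:dog:photos 9
  store["tags:#{tag}:users"] << user_id                # SADD tags:dog:users 10
  store["users:#{user_id}:tagged:photos"] << photo_id  # SADD users:10:tagged:photos 9
end

tag!(store, user_id: 10, photo_id: 9, tag: "Dog")
```

Each set answers exactly one question (which tags does this photo have, which photos carry this tag, and so on), which is why one tagging turns into several writes.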
Now we can have simple accessors to this information, for example:
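A sketch of such accessors on the tagged side, under the same assumptions as before: a Hash of Ruby Sets stands in for Redis, the key names are illustrative, and this plain-Ruby Photo class with its method names is hypothetical (in a Rails app it would be the model).

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }
store["photos:9:tags"].merge(%w[dog park])  # SADD photos:9:tags dog park

# Hypothetical plain-Ruby class standing in for the Photo model.
class Photo
  attr_reader :id

  def initialize(id, store)
    @id = id
    @store = store
  end

  # SMEMBERS photos:<id>:tags
  def tags
    @store["photos:#{id}:tags"].to_a
  end

  # SISMEMBER photos:<id>:tags <tag>
  def tagged_with?(tag)
    @store["photos:#{id}:tags"].include?(tag)
  end
end

photo = Photo.new(9, store)
photo.tags                 # the photo's tags, in no guaranteed order
photo.tagged_with?("dog")  # => true
```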
or for the other side of the tagging:
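For the tagger side, a sketch along the same lines. Again a Hash of Ruby Sets stands in for Redis, and the key names, the plain-Ruby User class and its method names are assumptions made for illustration.

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }
store["users:10:tags"] << "dog"       # SADD users:10:tags dog
store["users:10:tagged:photos"] << 9  # SADD users:10:tagged:photos 9

# Hypothetical plain-Ruby class standing in for the User model.
class User
  attr_reader :id

  def initialize(id, store)
    @id = id
    @store = store
  end

  # SMEMBERS users:<id>:tags
  def owned_tags
    @store["users:#{id}:tags"].to_a
  end

  # SMEMBERS users:<id>:tagged:photos
  def tagged_photo_ids
    @store["users:#{id}:tagged:photos"].to_a
  end
end

user = User.new(10, store)
user.owned_tags        # => ["dog"]
user.tagged_photo_ids  # => [9]
```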
Generally, this is just an outline with a single rule: denormalize your data. Instead of doing complicated join queries, use simple namespaced key-value access to your data.
OK, no joins. What about autocomplete?
Autocomplete is a PITA, but with Redis we can maintain the prefixes of each tag as keys that point to lists of tags. For example, the tag “liverpool” will be broken into smaller pieces:
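A minimal sketch of that breakdown, assuming a minimum prefix length of three letters. The prefixes and index_tag helpers and the "autocomplete:" key prefix are hypothetical names, and a Hash of Ruby Sets stands in for Redis so the snippet runs without a server.

```ruby
require "set"

MIN_PREFIX_LENGTH = 3  # assumption: index prefixes of 3 letters and up

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }

# Every leading slice of the tag, from the minimum length to the full word.
def prefixes(tag)
  (MIN_PREFIX_LENGTH..tag.length).map { |len| tag[0, len] }
end

# File the full tag under each of its prefixes.
def index_tag(store, tag)
  prefixes(tag).each do |prefix|
    store["autocomplete:#{prefix}"] << tag  # SADD autocomplete:<prefix> <tag>
  end
end

prefixes("liverpool")
# => ["liv", "live", "liver", "liverp", "liverpo", "liverpoo", "liverpool"]

index_tag(store, "liverpool")
```

The trade-off is space for speed: each tag is stored once per prefix, so long tags cost several set entries.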
This breakdown will allow us to easily access the list of tags (3 letters and up):
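The lookup side can be sketched as a single set read. The assumptions match the breakdown described above: a three-letter minimum, a hypothetical "autocomplete:" key prefix, hypothetical index_tag and complete helpers, and a Hash of Ruby Sets standing in for Redis.

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }

# File the full tag under each prefix of 3 letters and up.
def index_tag(store, tag)
  (3..tag.length).each do |len|
    store["autocomplete:#{tag[0, len]}"] << tag  # SADD autocomplete:<prefix> <tag>
  end
end

# One key read per keystroke: no scan over the full tag list.
def complete(store, typed)
  return [] if typed.length < 3

  # SMEMBERS autocomplete:<typed>; sorted only to make the output stable
  store.fetch("autocomplete:#{typed.downcase}", Set.new).sort
end

%w[liverpool liver live london].each { |tag| index_tag(store, tag) }

complete(store, "liv")    # => ["live", "liver", "liverpool"]
complete(store, "liver")  # => ["liver", "liverpool"]
complete(store, "li")     # => []
```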
Redis can also compare two sets: an intersection (SINTER) gives you the members that appear in both, and a difference (SDIFF) gives you the members that appear in one but not the other.
For example, if we would like to know which photos are tagged with both “dog” and “cat”, we intersect those two sets.
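A small sketch of both operations, with a Hash of Ruby Sets standing in for Redis and made-up photo IDs. The "tags:<name>:photos" key names are an assumption; Ruby's Set#& and Set#- play the roles of SINTER and SDIFF.

```ruby
require "set"

# Stand-in for Redis: every key maps to a Set, created on first access.
store = Hash.new { |hash, key| hash[key] = Set.new }
store["tags:dog:photos"].merge([9, 12, 15])   # SADD tags:dog:photos 9 12 15
store["tags:cat:photos"].merge([12, 15, 20])  # SADD tags:cat:photos 12 15 20

# SINTER tags:dog:photos tags:cat:photos
dog_and_cat = store["tags:dog:photos"] & store["tags:cat:photos"]

# SDIFF tags:dog:photos tags:cat:photos
dog_without_cat = store["tags:dog:photos"] - store["tags:cat:photos"]

dog_and_cat.sort      # => [12, 15]
dog_without_cat.sort  # => [9]
```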
Again, this is just an outline. There are many improvements to be made, but we at ShinobiDevs are working on releasing a gem that does just that - ideas are welcome. Redis is a powerful tool; there is probably no need to store tagging data in an RDBMS structure when a better one, maybe just like the one suggested above, can be found.