We recently dropped MongoDB as our primary NoSQL database. We’re switching to either Riak or Cassandra. (We already use HBase.)

Don’t get me wrong, MongoDB is great. But as every NoSQL’er would tell you, it depends on your use case. Here are our reasons.

  1. Lousy write performance: MongoDB is very slow at writes, especially once you start sharding. Replicas easily get out of sync. The storage implementation enforces a write lock, and that’s just the way it is under serious write load. At the beginning we didn’t anticipate the write rate we’re experiencing right now.
  2. Queryability: At first we did not know how we would query our data, which was the reason we chose MongoDB. But it turns out that without an index you cannot query at all once your records reach the hundreds of millions.
  3. Lack of compression: This is ridiculous. MongoDB does not have compression. All your key names take up space in storage. Consider a JSON document with 100 numeric fields whose names are around 40 characters long. That means MongoDB is storing 40 x 100 = 4,000 characters (about 4 KB) of key names plus 100 integers for every record. Long textual content doesn’t get compressed either. I’m sure our data would take a third of the space in any other storage.
  4. Memory monster: MongoDB claims all the memory on the host it’s running on. Plain as daylight. There is nothing you can do about it.
  5. Lack of tools: I know this will cause some controversy, but there are no good clients. Even the command line does not let you be productive.
  6. Undependable tools: You try to export a large collection and all of a sudden you get an error. Only an error. No decent explanation, no pointer for resuming even manually. Also, call me if you manage to export a collection by filtering on date values.
  7. Huge storage requirements for operations: The theory might sound great, but in reality you face the ugly side of the beast. There are times when you need to restructure your data, dump and repair, etc. With MongoDB you need twice the space your data takes; more specifically, free space equal to the size of your collection. We had a 4 TB collection and it took weeks to repair after a power outage. Replicas? Well, they were just out of sync.
  8. Operations: Ask my sysadmin about this; he didn’t sleep for weeks. Be careful about that.
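
To put rough numbers on the key-name overhead from point 3, here is a minimal back-of-the-envelope sketch. The function name `key_name_overhead` is made up for illustration, and the 4-byte integer size is an assumption (it also ignores the small per-field bookkeeping any document format adds), so treat the result as an estimate rather than an exact on-disk figure.

```python
def key_name_overhead(num_fields, key_len, value_bytes=4):
    """Estimate per-document bytes spent on key names versus values.

    Assumption: every value is a 4-byte integer and per-field
    bookkeeping (type markers, terminators) is ignored.
    """
    key_bytes = num_fields * key_len          # bytes used by field names
    payload_bytes = num_fields * value_bytes  # bytes used by the actual data
    total = key_bytes + payload_bytes
    return {
        "key_bytes": key_bytes,
        "value_bytes": payload_bytes,
        "key_share": key_bytes / total,       # fraction of the record that is names
    }


# The example from point 3: 100 numeric fields, names of about 40 characters.
estimate = key_name_overhead(num_fields=100, key_len=40)
print(estimate["key_bytes"], estimate["value_bytes"], round(estimate["key_share"], 2))
# prints: 4000 400 0.91
```

Under these assumptions roughly nine tenths of each such record is field names, which is why shortening key names (or a store that compresses or factors them out) makes such a difference.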

Everything I’ve written here is valid for our use case. To be specific: 8 TB of data with around 750 million records, bursty data I/O and a 1/1000 read/write ratio. Simply put, if you’ll have a high number of writes and reads on the same data, think twice before using MongoDB.