Monday, April 28, 2014

Competitors testing your product: a testament to your product's strength

While investigating a file upload issue in MixPanel, I accidentally saw an email from a user who is a developer at the #1 player in the consumer cloud storage market and a direct competitor of ours. As usual I was curious, so I ran a canary query for emails ending with the domain of the #1 player in the enterprise cloud storage market, and wow, I did find that they had registered several trial accounts.

As usual, the product team wants to disable the accounts, but how can you stop anyone from creating a trial? They can register with any fake email account and test the product, so it's a cat-and-mouse game.

But the good thing is that this is a testament to our product's strength.

It keeps reminding me of Google inventing Buzz to copy Twitter, or Google+ to copy Facebook. Even Facebook tried to copy WhatsApp before acquiring it.

Thursday, April 24, 2014

High Scalability: Denormalizing data for billions of files to scale snapshot times

We store metadata about billions of files in MySQL shards, and each shard has Folder, File and Version tables. The schema looked like this.

Now, each file can have n versions, and some customers want infinite versions. The problem with scaling the snapshot query is that the file system snapshot requires information about the latest version only. To get the latest version you need to join the Folder, File and Version tables and discard the older rows. I had described the consistent scaling challenge and the improvements we had made to snapshot times for our cloud file system in
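To make the problem concrete, here is a minimal sketch of the normalized read path, using Python's built-in sqlite3 as a stand-in for MySQL. The table and column names (folder, file, version, version_no, size) are assumptions, since the real schema is not reproduced here, and the correlated MAX subquery is just one common way to discard older versions, not necessarily the production query.

```python
import sqlite3

# Hypothetical, simplified schema: the real shard schema is not shown in the post.
SCHEMA = """
CREATE TABLE folder (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE file (
    id INTEGER PRIMARY KEY,
    folder_id INTEGER NOT NULL REFERENCES folder(id),
    name TEXT NOT NULL
);
CREATE TABLE version (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES file(id),
    version_no INTEGER NOT NULL,
    size INTEGER NOT NULL
);
"""

# Normalized snapshot: join all three tables, then keep only the row whose
# version_no is the maximum for its file (older versions are discarded).
NORMALIZED_SNAPSHOT = """
SELECT fo.name, f.name, v.version_no, v.size
FROM folder fo
JOIN file f ON f.folder_id = fo.id
JOIN version v ON v.file_id = f.id
WHERE v.version_no = (
    SELECT MAX(v2.version_no) FROM version v2 WHERE v2.file_id = f.id
)
ORDER BY fo.name, f.name
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO folder VALUES (1, 'docs')")
    conn.executemany("INSERT INTO file VALUES (?, 1, ?)", [(1, "a.txt"), (2, "b.txt")])
    # a.txt has three versions, b.txt has one
    conn.executemany(
        "INSERT INTO version (file_id, version_no, size) VALUES (?, ?, ?)",
        [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 5)],
    )
    return conn


def normalized_snapshot(conn):
    return conn.execute(NORMALIZED_SNAPSHOT).fetchall()


if __name__ == "__main__":
    print(normalized_snapshot(build_demo_db()))
```

Every snapshot has to scan the version rows of every file just to throw most of them away, which is why the cost grows with the number of versions rather than the number of files.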

The latest improvement we made is to denormalize the information about the latest version entry onto the File table itself.

Now, to generate a snapshot we just need to join the Folder and File tables, and the improvements are huge. We use Box Anemometer to log slow queries across all databases; in its report, the second query is the normalized query and the third query is the denormalized query. Across all the databases we are tracking, the time has dropped roughly 3x (486 sec to 159 sec), and the number of rows examined and rows sent has been cut in half.
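A minimal sketch of the denormalized shape, under the same assumptions (SQLite instead of MySQL, hypothetical column names latest_version_no and latest_size on file). The write path still appends the history row to version, and in the same transaction copies the newest values onto file, so the snapshot becomes a two-table join that never touches version.

```python
import sqlite3

# Hypothetical schema: latest_version_no / latest_size are an assumed shape of
# the "latest entry" denormalization, not the production column names.
SCHEMA = """
CREATE TABLE folder (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE file (
    id INTEGER PRIMARY KEY,
    folder_id INTEGER NOT NULL REFERENCES folder(id),
    name TEXT NOT NULL,
    latest_version_no INTEGER,
    latest_size INTEGER
);
CREATE TABLE version (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES file(id),
    version_no INTEGER NOT NULL,
    size INTEGER NOT NULL
);
"""

# Two-table join: the version table is not read at all.
DENORMALIZED_SNAPSHOT = """
SELECT fo.name, f.name, f.latest_version_no, f.latest_size
FROM folder fo
JOIN file f ON f.folder_id = fo.id
ORDER BY fo.name, f.name
"""


def add_version(conn, file_id, version_no, size):
    """Record the history row and refresh the denormalized copy on file."""
    with conn:  # one transaction, so the two writes cannot diverge
        conn.execute(
            "INSERT INTO version (file_id, version_no, size) VALUES (?, ?, ?)",
            (file_id, version_no, size),
        )
        # Only overwrite if this version is newer than what file already holds.
        conn.execute(
            "UPDATE file SET latest_version_no = ?, latest_size = ? "
            "WHERE id = ? AND (latest_version_no IS NULL OR latest_version_no < ?)",
            (version_no, size, file_id, version_no),
        )


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO folder VALUES (1, 'docs')")
    conn.executemany(
        "INSERT INTO file (id, folder_id, name) VALUES (?, 1, ?)",
        [(1, "a.txt"), (2, "b.txt")],
    )
    conn.commit()
    for file_id, version_no, size in [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 5)]:
        add_version(conn, file_id, version_no, size)
    return conn


def denormalized_snapshot(conn):
    return conn.execute(DENORMALIZED_SNAPSHOT).fetchall()
```

The price is an extra write on every upload and a copy of the latest version's data on file, which is the trade-off that makes the snapshot read cheap.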

Here is the normalized query graph

Here is the denormalized query graph

Now, denormalizing billions of rows has unique challenges, and you need to do it without impacting customers. We spread billions of rows across 28 masters, and there are 28 slaves for these masters.

To do performance testing, we took the biggest database, with customers having 27M rows, imported it into a test environment and migrated it.

Snapshot time for 27M versions before denormalization = 1.2 hours
Snapshot time for 27M versions from denormalized tables = 6.5 minutes, constant with or without caching
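As a quick back-of-the-envelope check, these are the speedups implied by the numbers in this post (the Anemometer totals across all databases, and the 27M-version test database):

```latex
\frac{486\ \text{s}}{159\ \text{s}} \approx 3.06
\qquad\qquad
\frac{1.2\ \text{h}}{6.5\ \text{min}} = \frac{72\ \text{min}}{6.5\ \text{min}} \approx 11.1
```

The largest database gains far more than the fleet-wide average, which is consistent with the normalized query's cost growing with the number of versions per file.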

Of course, as we are doubling the data, we needed to optimize the database tables after denormalization; otherwise we were running into a row chaining problem.

For the production go-live, as usual we started with feature flags: we added 2 flags, latest_entry_migrated and latest_entry_active, as fields on the customer model. Then we wrote code that, based on the latest_entry_active flag, would execute either the normalized or the denormalized query. Once the code was live in all services, we began migrating a few workgroups to test for any bugs. We migrated one data centre every weekend, and within a month we had all databases upgraded with denormalized rows; snapshot queries have even disappeared from the slow query logs of many databases.
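The flag-driven switch can be sketched roughly as follows. Only the two flag names, latest_entry_migrated and latest_entry_active, come from the description above; the Customer class, the placeholder query strings and the backfill callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder strings standing in for the two real snapshot queries (hypothetical).
NORMALIZED_QUERY = "snapshot via folder + file + version join"
DENORMALIZED_QUERY = "snapshot via folder + file join"


@dataclass
class Customer:
    workgroup_id: int
    latest_entry_migrated: bool = False  # denormalized rows have been backfilled
    latest_entry_active: bool = False    # reads should use the denormalized query


def snapshot_query(customer: Customer) -> str:
    """Pick the snapshot query for this customer based on latest_entry_active."""
    return DENORMALIZED_QUERY if customer.latest_entry_active else NORMALIZED_QUERY


def migrate(customer: Customer, backfill: Callable[[int], None]) -> None:
    """Backfill once, then flip reads; if backfill raises, reads stay on the old path."""
    if not customer.latest_entry_migrated:
        backfill(customer.workgroup_id)
        customer.latest_entry_migrated = True
    customer.latest_entry_active = True
```

Keeping the two flags separate means reads can be sent back to the normalized query by clearing latest_entry_active alone, without redoing the backfill.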

One side effect of this denormalization is that it opens up a gateway for us to implement infinite versions, because now we can sub-shard versions into different tables and even different databases.
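Because the snapshot no longer reads the version table, version rows could in principle live anywhere. A hypothetical modulo-based router, with assumed counts of databases and sub-tables, might look like this. It keeps all versions of one file together, so a single file's version history stays a single-table lookup.

```python
from typing import Tuple

# Hypothetical layout: both counts are assumptions for illustration.
VERSION_DATABASES = 2
SUBTABLES_PER_DATABASE = 4


def version_location(file_id: int,
                     databases: int = VERSION_DATABASES,
                     subtables: int = SUBTABLES_PER_DATABASE) -> Tuple[str, str]:
    """Map a file to the (database, table) that holds all of its versions."""
    bucket = file_id % (databases * subtables)
    db_index, table_index = divmod(bucket, subtables)
    return f"versions_db_{db_index}", f"version_{table_index:02d}"
```

Plain modulo routing means that changing the bucket count later moves data around; a lookup table or consistent hashing are the usual alternatives if that matters.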

Monday, April 21, 2014

iPhone vs Android: the switch-to-a-new-phone experience

I signed up for a 2-year contract and got new phones today for my wife and me. My wife upgraded her Samsung Infuse to a Samsung Galaxy S4, and I upgraded my iPhone 3GS to an iPhone 5. Of course, the first thing you want is to have no change in your life and just upgrade the hardware.

Well, that is exactly what I got from the iPhone. Apple really nailed this. All I did was go to iCloud and do a backup, which was of course incremental, so it took 2-3 minutes. Then I set up my new iPhone from the existing iCloud backup. All I needed was to provide my Apple ID 2-3 times during the restarts/restore, and within 15 minutes I was up and running and ditched my old phone for good.

Also, AT&T activation was a piece of cake; I have never seen anyone port numbers between services within 5 minutes. The best part is that I didn't need to interact with any human at all.

Now I needed to repeat it for Android, and it was a mess. God knows what was wrong, but I didn't find a similar backup/restore on Android. There are all these 3rd-party apps that will do the backup and restore, but I don't trust any of them. So I compromised and told my wife that I would import her contacts, and the rest, which is mostly photos, we could just do via USB cable. And importing contacts is a mess. It seems either she or I had turned on syncing new contacts to Gmail contacts when they were added, or maybe I imported them from the SIM card 2 years back, but now Android wants me to save them one by one, which is a mess as she had 700+ contacts (many of them duplicates, as her contact list was a mix of Facebook and Gmail contacts, and most of them had no phone numbers).

Anyway, long story short, I had just 1 hour as I was tired, so I synced my Gmail phone contacts onto her phone by adding my Gmail account to the Galaxy S4 in addition to hers. We share 80% of our contacts, so that would at least get her going for tomorrow, and the rest she or I can enter manually if needed.

But I have to say, just because of this feature I may never go back to Android.

Thursday, April 17, 2014

GPL for seeds

I heard an interesting story about how some university seed breeders borrowed the idea of open source from the computer industry and applied it to seeds.

Plant Breeders Release First 'Open Source Seeds'

In the US and around the world, farming is moving more towards monocultures, and we need more diversity. I'm not sure how well this open source seeds initiative will work, but I hope it becomes as successful as open source software. The good thing is that it works like the GPL: if you use any of these seeds to produce a new seed, you can't patent it and you have to open source that too. What a brilliant idea :). As per Wikipedia:

David A. Wheeler argues that the copyleft provided by the GPL was crucial to the success of Linux-based systems, giving the programmers who contributed to the kernel the assurance that their work would benefit the whole world and remain free, rather than being exploited by software companies that would not have to give anything back to the community.[11]


I hope this open source seeds project achieves the same success as Linux.

Wednesday, April 16, 2014

HBO show about Silicon Valley

Saw the first episode of the series. It seems interesting, but it felt like only a small percentage of it is really true.

Spending more time on Quora/Techcrunch/VentureBeat than Netflix

It seems Netflix content has become boring. I became a cord cutter as soon as my son was born; for the first 2 years I didn't have TV, and for local news I had an indoor antenna.

Image copied from the Amazon site

Nowadays it seems the only person watching Netflix in the house is my son. My wife mostly watches Indian soap operas posted on YouTube.

As for me, for the past 3-4 months the time I spend on Netflix in the evenings has dropped drastically, and I am spending more time on Quora/TechCrunch/VentureBeat. It seems Netflix content has become boring, or I have seen most of it. I am a big fan of Chinese war dramas, and lately a new one arrives only once in a blue moon.

I got the same impression from other friends that Netflix content has started to fade, and even Hulu and Redbox have the same issue. Only time will tell whether we are entering an era where people are moving away from TV content, or whether I am just an anomaly.