Friday, November 29, 2013

Black Friday or Black Thursday

I remember when I came to the US 8 years ago, people used to wake up early to get in line for the 6:00 AM or 8:00 AM openings on the Friday after Thanksgiving. Then 2 years ago retail stores started opening around midnight, last year some stores opened at 10:00 PM, and this year stores opened at 6:00 PM on Thanksgiving Day itself.

I am not an Indian and haven't started celebrating Thanksgiving yet, but even to me this sounds like BS: Black Friday is becoming Black Thursday.

Monday, November 25, 2013

Perception of downtime

5 years ago, when I joined the startup, if a node went down you got some time to analyze it and bring it back up at your own pace. Nowadays the startup has grown to millions of users. Today I was called in because a node was inaccessible, and when I asked for 10 minutes the ops guy was like "huh, this is an emergency". I was like, it's a big stack trace, so give me 10 minutes to analyze it.

Downtime perception changes when you grow :).

Friday, November 15, 2013

Why people think I get better projects than they do

I got this feedback from two of my colleagues at the company. One of them joined, and left within a year, because he said the three initial engineers at the startup got all the best projects and he had joined late. I call BS on this: a principal engineer who joined after him delivered a game-changing product at the company and earned the respect of his peers.

This week another colleague told me that I get the complex and better projects. Hmm... I was like, WTH.

I do crappy projects too, but maybe people only see the better ones and not the crappy ones that get delivered side by side.

Also, in a startup it all depends on who takes the ball and gets it rolling. We recently uncovered some security issues, and since I was curious, I took the initiative, got the ball rolling, and got them fixed across all the products I could.

Similarly, everyone had known for a long time that converting from BDB or LDAP to MySQL needed to be done, but no one owned it, so I took the initiative and got the ball rolling. Once it's rolling, it's sometimes difficult even for top management to change gears or turn direction. And once top management knows you can handle bigger projects, you get approval to do more and more of them.

So I think the idea is to have some junior resources on the team: you do the design and hand the projects over to them, backed by a sound architecture review and code review. Delegate what you know you can already do yourself and take on the projects that are hard for you; that is the key to doing better projects at a startup.

Doing side projects keeps you motivated and makes you smarter at your real job

I am already in my 5th year at the company, and it's the usual 4-year itch. Things have not become boring at the startup, but for the past year I have become a sweeper. It seems we found nirvana in MySQL, and I am converting all the products to store their data in sharded MySQL to solve the scalability issues in those components. A cloud file system is a field that is yet to be cracked. We now scale to billions of rows and can keep adding MySQL servers to scale further, but there are still interesting problems, like how to scale a single customer to 100M+ files. We can currently handle 10M+ files for live data and 25M for long-lived archival data. But day to day it's boring, as there is nothing new to learn.
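One way to read "keep adding MySQL and scale" is directory-based sharding: a lookup table pins each customer to one MySQL shard, and adding capacity means registering a new shard that new customers get placed on. A minimal sketch under that assumption; the ShardDirectory class, the shard URLs and the least-loaded placement rule are hypothetical, not the company's actual scheme. It also shows why a single 100M+ file customer stays hard: all of that customer's rows still land on one shard.

```python
from dataclasses import dataclass, field


@dataclass
class ShardDirectory:
    """Hypothetical customer -> MySQL shard lookup table (directory-based sharding)."""

    # shard URL -> number of customers currently placed on it
    load: dict[str, int] = field(default_factory=dict)
    # customer id -> shard URL
    placement: dict[str, str] = field(default_factory=dict)

    def add_shard(self, url: str) -> None:
        # Adding capacity is just registering a new, empty shard;
        # existing customers are not moved.
        self.load.setdefault(url, 0)

    def assign(self, customer_id: str) -> str:
        # Existing customers keep their shard, so lookups stay stable.
        if customer_id in self.placement:
            return self.placement[customer_id]
        if not self.load:
            raise RuntimeError("no shards registered")
        # New customers go to the least-loaded shard (ties broken by URL order).
        url = min(self.load, key=lambda u: (self.load[u], u))
        self.placement[customer_id] = url
        self.load[url] += 1
        return url

    def shard_for(self, customer_id: str) -> str:
        # Every query for a customer goes to exactly one shard, which is why
        # one very large customer is still bounded by a single server.
        return self.placement[customer_id]
```

For example, after registering two shards and assigning two customers, a third shard added later picks up the next new customer while the first two stay where they are.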

So how do you motivate yourself when you work from home and your work is becoming boring? Well, do a side project: open source, or something you want to do but can't introduce at your company. I started reading about Hadoop, but I was only reading, not doing.

Then a friend came up to me and said he wanted to move the build system for his side project to EC2, and asked if I could help. I was like, why not. I did it free of charge, 1 hour every night for 2-3 months, and in the process I learnt a lot about:

  1. Jenkins, and how to spin up new slaves and launch remote builds
  2. EC2
  3. how to install CentOS on EC2 and use yum
  4. Nginx
  5. Monit
So doing a side project keeps you on your toes and in turn makes you smarter, because now I can see how to make things better at my current company too. In our case the startup was already doing better on these components, but hey, now when I am debugging a Monit or Nginx issue I can figure things out on my own rather than depending on someone else. So in turn it makes me faster and smarter at my day job. Plus, I now know a little bit about EC2 and CentOS.

Thursday, November 7, 2013

Installing Ubuntu on a Dell Latitude 6430 in UEFI mode

I got this Dell laptop from my employer and it had Windows 7 on it. I installed Ubuntu side by side with Windows, but it wouldn't show the Ubuntu option during boot. I tried installing multiple times, and ultimately I decided to wipe out Windows and install only Ubuntu. I did that, and suddenly I got the message "Invalid partition table". I was like, WTF happened?

I tried booting many times with the same result. Then I thought I would press F12 and boot from the boot menu, and luckily I saw a weird "ubuntu" option under UEFI, so I gave it a chance and it booted up. For a while I had to press F12 every day to boot Ubuntu, and after getting tired of this I remembered I had seen UEFI somewhere in the BIOS settings. I rebooted, pressed F2, and saw that under Boot Sequence there were Legacy and UEFI options. I selected UEFI and chose ubuntu as the boot option. That's it: no more "Invalid partition table" on boot.


LDAP loses to MySQL when it comes to HA

Before I joined, my employer was already using LDAP to store user and customer data. LDAP was picked because it matched Active Directory.

It seems we can no longer scale on LDAP. The main reasons for us to move away from LDAP are:

1) Index creation requires restarting LDAP. WTF. This is a big no-no for any decent-sized company, because it makes LDAP a single point of failure.
2) Schema changes require an LDAP restart.
3) Not a very big community, unlike what we have for MySQL and other relational databases.
4) Scaling the developer team is tough; for most people LDAP is an alien technology.
5) Loading 5K users into LDAP took 2 hours (about 1.4 seconds per user), because once an LDAP server holds something like 5-10M users, insert performance just sucks.

We recently migrated some customers from LDAP to MySQL, and moving to MySQL solved all 5 points above. Even insert performance rocks in MySQL.

Dumping the LDAP schema

I recently ran into an issue where I checked in an updated schema but the change wasn't reflected. Ops kept saying it was updated, but the app code wouldn't show it.

To prove it either way, I looked for a way to dump the LDAP schema and finally found:


ldapsearch -LLL -x -h "xxx.xxx.202.131" -b cn=Subschema -s base '(objectClass=subschema)' +

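Make sure you don't miss the + sign at the end; I didn't add it and spent 15 extra minutes debugging. The + asks the server for operational attributes, and the subschema entry's attributeTypes and objectClasses are operational, so without it they are not returned.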
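Once the dump works, proving whether an updated schema actually reached the server comes down to checking the attributeTypes values for the name you added. A minimal sketch, assuming the ldapsearch output above has been captured as a string; the helper names and the sample attribute customerQuota are hypothetical. LDIF folds long lines by starting continuation lines with a single space (RFC 2849), so the text is unfolded before matching. Base64-encoded values are not handled.

```python
import re

# Matches NAME 'x' or NAME ( 'x' 'y' ) inside a schema definition.
NAME_RE = re.compile(r"NAME\s+(?:'([^']*)'|\(([^)]*)\))")


def unfold_ldif(text: str) -> list[str]:
    """Join LDIF continuation lines (lines starting with one space, RFC 2849)."""
    lines: list[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def schema_names(ldif: str, attr: str = "attributeTypes") -> set[str]:
    """Collect the NAMEs defined in one subschema attribute.

    Only plain "attr: value" lines are handled; base64 values ("attr:: ...")
    are skipped in this sketch.
    """
    prefix = attr.lower() + ": "
    names: set[str] = set()
    for line in unfold_ldif(ldif):
        if not line.lower().startswith(prefix):
            continue
        match = NAME_RE.search(line)
        if match is None:
            continue
        if match.group(1) is not None:
            names.add(match.group(1))
        else:
            # Several aliases, e.g. NAME ( 'cn' 'commonName' )
            names.update(re.findall(r"'([^']*)'", match.group(2)))
    return names


def has_attribute(ldif: str, name: str) -> bool:
    # Attribute type names are case-insensitive in LDAP.
    return name.lower() in {n.lower() for n in schema_names(ldif)}
```

If has_attribute(dump, "yourNewAttr") is False against the live server, the schema change has not been loaded there, regardless of what the checked-in file says.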

Tuesday, November 5, 2013

Dell Latitude 6430u drifting mouse pointer

Today I received a brand new laptop, shipped to my home by my company. I opened it, booted it, and immediately noticed something strange: I couldn't use the touchpad, because no matter what I did the pointer would just drift randomly to one corner of the screen. I tried various things, and I was surprised to see that the pointer drifted even when I touched the sides of the laptop that were not part of the touchpad area. I thought maybe it was a first-boot glitch, so I somehow finished the initial Windows setup using the keyboard. But after a restart, same issue.

I plugged in an external mouse and it worked fine. I tried the joystick pointer and it drifted too.

After trying various options, I thought I had gotten a lemon or something had gone weird in shipping. Then I read on a forum about pressing Fn + F6 to disable the joystick. I tried that and it worked like a charm.

WTF, this is the second issue I have seen with Dell; not sure what their QA team tested before certifying the laptop. Anyway, I use an external mouse 90% of the time, so I am OK.

I will wipe Windows clean and install Ubuntu anyway. Hoping I won't run into any more surprises.

Dell Latitude 6430u boot order issue

Got a new Dell laptop from the company, and the first thing I saw was that it came with an external USB-pluggable DVD drive. I inserted an Ubuntu CD and rebooted the computer. I pressed F2 to change the boot order to DVD, then USB, then HDD. After restarting, it wouldn't boot from the Ubuntu CD. I tried various options but then gave up.


Then I suddenly remembered F12, so I rebooted, pressed F12 during boot, and when I chose DVD from that menu it worked fine.

God knows what the issue was; it seems either I am missing something very minor or the Dell BIOS is missing something major.

MySQL's weird trailing-space handling in WHERE queries

I was surprised to see that MySQL can be this dumb.

A VARCHAR column preserves trailing spaces, but all three of the queries below will match a row whose path is '/Shared/test':

SELECT * FROM folders WHERE path = '/Shared/test';
SELECT * FROM folders WHERE path = '/Shared/test ';
SELECT * FROM folders WHERE path = '/Shared/test      ';

As per http://bugs.mysql.com/bug.php?id=64772, it's a feature, not a bug. WTF.
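The "feature" is PAD SPACE comparison semantics: when nonbinary strings are compared with =, the shorter one is treated as if padded with spaces, which amounts to ignoring trailing spaces. (MySQL 8.0 later added NO PAD collations such as utf8mb4_0900_ai_ci, where trailing spaces do count; and the MySQL manual shows BINARY 'a' = 'a ' returning 0, so a binary comparison is another way to make them count, at the cost of also becoming case-sensitive.) Below is a toy Python model of the two rules; it ignores case-folding and every other collation detail, and the function names are illustrative, not MySQL's.

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Model of a PAD SPACE '=' comparison: padding the shorter string with
    spaces is the same as ignoring trailing spaces. Only the space character
    counts as padding; tabs and newlines are still significant."""
    return a.rstrip(" ") == b.rstrip(" ")


def no_pad_equal(a: str, b: str) -> bool:
    """Model of a NO PAD comparison: trailing spaces are significant."""
    return a == b


stored = "/Shared/test"
queries = ["/Shared/test", "/Shared/test ", "/Shared/test      "]

if __name__ == "__main__":
    for q in queries:
        # PAD SPACE matches all three queries; NO PAD matches only the first.
        print(repr(q), pad_space_equal(stored, q), no_pad_equal(stored, q))
```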

Sunday, November 3, 2013

Scaling pains: changing LDAP to MySQL was like changing the engine of a NASCAR car during a race

When you have a big customer base, you get lots of interesting problems, like:
  1. Customers trying to store 400K files in one folder
  2. Customers trying to dump 100+ TB of files in one account
  3. Customers with more than half of their data in Trash
  4. Customers wanting to create more than 64K users in an account
  5. Customers with 25K+ users trying to do search/load/sort on users listing
Our startup used LDAP to store user and customer metadata, and MySQL to store file/folder metadata.

Lately we have been hit by items #4 and #5, and that is causing scaling issues, because no matter what we do, LDAP write performance sucks once we go beyond a few million users. Earlier, LDAP used to scale because we had 30-40 LDAP servers, but to reduce ops management issues we consolidated them to 4 per DC. That worked initially, but lately, with #4 and #5, it's not scaling well. LDAP is alien to most programmers; only 2-3 people on the team know a little bit about it, so most of the time it remains an orphan, and being involved in every prod issue regarding LDAP is not fun.

So we started a project to replace LDAP with MySQL, but the problem is that LDAP had been in the company since day one, so replacing it isn't that easy. We use SOA, and there are many moving parts that use LDAP; converting from LDAP to MySQL in one shot would have required upgrading all services in all data centers at the same time, which sounds both risky and a hard sell to the ops/management team. So how do you change the engine of a running race car?

Well, that's where interface-based programming and feature flags come into the picture.

We have all our LDAP-related code in one layer, implementing a DirectoryService interface.

To move to MySQL, we introduced two fields on each customer, ldap_url and mysql_url, and prepopulated ldap_url for all customers. We then created a RoutingDirectoryService that, for each request, determines whether the customer is on LDAP or MySQL and routes the call to LdapDirectoryService or SqlDirectoryService accordingly.
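A minimal Python sketch of that routing layer, under simplifying assumptions. Only DirectoryService, LdapDirectoryService, SqlDirectoryService, RoutingDirectoryService, ldap_url and mysql_url come from the post; the get_user method, the Customer record, the stub backends and the lookup callable are hypothetical stand-ins for the real services. The rule is: a customer with a mysql_url goes to SQL, everyone else stays on LDAP. That same rule is what lets the code ship in "sleeper mode", since until some customer gets a mysql_url, every call still goes to LDAP.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Customer:
    customer_id: str
    ldap_url: Optional[str] = None   # prepopulated for every customer
    mysql_url: Optional[str] = None  # set only once the customer is migrated


class DirectoryService(ABC):
    """Hypothetical, trimmed-down version of the directory interface."""

    @abstractmethod
    def get_user(self, customer_id: str, user_id: str) -> dict: ...


class LdapDirectoryService(DirectoryService):
    def get_user(self, customer_id: str, user_id: str) -> dict:
        # Stub: the real service would search the customer's LDAP server.
        return {"customer": customer_id, "user": user_id, "backend": "ldap"}


class SqlDirectoryService(DirectoryService):
    def get_user(self, customer_id: str, user_id: str) -> dict:
        # Stub: the real service would query the customer's MySQL shard.
        return {"customer": customer_id, "user": user_id, "backend": "mysql"}


class RoutingDirectoryService(DirectoryService):
    """Per-request router: picks the backend from the customer's own fields."""

    def __init__(self, ldap: DirectoryService, sql: DirectoryService,
                 lookup: Callable[[str], Customer]) -> None:
        self._ldap = ldap
        self._sql = sql
        self._lookup = lookup  # hypothetical: fetches the current customer record

    def get_user(self, customer_id: str, user_id: str) -> dict:
        # Read the customer on every call so a migration takes effect
        # without redeploying; mysql_url acts as a per-customer feature flag.
        customer = self._lookup(customer_id)
        backend = self._sql if customer.mysql_url else self._ldap
        return backend.get_user(customer_id, user_id)
```

In this sketch, migrating a customer means setting its mysql_url once the data is copied, and clearing the field sends it back to LDAP; callers only ever see DirectoryService, so removing the router later is a one-line wiring change.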

We had to follow the same pattern in all the other services, and some of them were written in Python, so we had to duplicate the effort.

Luckily, we were able to achieve the objective with minimal disruption to the release process: we released the code to prod in sleeper mode, one service at a time. The code stayed dormant because no customers were moved to MySQL until all services had been upgraded to use the routing logic.

Last night we migrated 100 customers to MySQL, and so far we have seen only one issue. We will continue migrating more customers over the month, and eventually, when LDAP is gone, we will remove RoutingDirectoryService.

If we had done this 4 years ago, it would have been easy because we had fewer customers, but now we have so many that we need to go the feature-flag way. Customers also rely on us 24x7, so we can't take the risk of changing the engines of all the cars in a running race at the same time. Instead we are doing one car at a time :) and putting each one back on the race track.