Skip to main content


Showing posts from April, 2012

From NOSQL to MySql (BDB to mysql) - Part3

We finally migrated 14 nodes this weekend. As there were millions of files to be migrated even after running 5 parallel thread on each node it took close to 3-4 hours per node. We were running 3 nodes at a time otherwise the db load was shooting up high. As we cant afford to have a downtime on weekdays, the migration has to happen on weekend nights. On Friday night it we were up till 6:00 AM and on Saturday night we were up till 4:00 AM. We wanted to do more nodes but the no of files per host was going overboard and I wanted to be conservative to start with, if the mysql servers can handle more then later we would consolidate shards or move them to this host. New mysql hosts are going to be provisioned next week so hoping to migrate more nodes this week.  Monday was a calm day after long time as we chose all nodes that were spiking to be migrated first.

Db loads in all DCs are low and even app nodes are low, Surprisingly slave dbs are getting pounded more than master db after last wee…

Mysql Sharding at our company- Part1

Sharding is a double edged sword, on one hand it allows you to scale your applications with increased growth in customers and on other hand it very hard for junior developers to grasp the concept. I try to abstract and encapsulate as much complexity as I can in the framework.

Typically people either do
1. Functional Sharding/Vertical sharding: For huge datasets this will only buy you time. When I joined the company everything was stored in Berkely db or Lucene or in some files on filesystems. I tried moving all the things to Mysql in one project but it became a monster so I started moving pieces of applications out into mysql and moving them to production. This also boosted confidence of team in mysql as more feature went live on Mysql and they saw that Mysql was reliable and scalable. I started with smaller pieces that didn't had >10-20M rows but needs mysql for its ACID properties. Anticipating the growth we decided to create one schema per functionality and avoided joins bet…

From NOSQL to MySql (BDB to mysql) - Part2

So we are now in third week of Mysql deployment and slowly we are moving nodes from BDB to Mysql  Results so far have been promising.
1) We had ran into some query optimisation issues with Mysql that I blogged here
2) Just ran into a 4 byte unicode issue with Mysql where utf8 with mysql wont support 4 bytes characters and apparently you have to use utf8mb4 for it, how stupid of Mysql.  Will write a separate blog post for it.

But as far as performance is concerned its rocking so far. On 20th we did the deployment and as soon as the node went live with Mysql you can see immediate drop in load avg consistently for this week. This is because most of the processing that was done by in memory db on app node is now transferred to Mysql.  But in second graph you would see that Mysql is sitting ducks and its no where close to app node load avg. Now this is with 3 nodes pounding, we would add 10+ nodes to start with on ea…

Quartz delete/unschedule a job at runtime

I had a requirement to remove a quartz job from a tomcat without restarting it. So best solution is to write a jsp to do this. I had earlier written a Quartz Admin jsp blog so I went and added a Delete button to that jsp and it appears all I need to do is call
    if("Delete".equals(btnInterrupt)) {
         scheduler.deleteJob(jobNameToRun, groupNameToRun);
    Job <%=jobNameToRun%> Deleted.


There is a unschedule  api also but that will only unschedule the job's next execution and I wanted to remove the job.  When tomcat is restarted the job will come back.

Adding Jitter to cache layer

Thundering herd is an issue common to webapp that rely on heavy caching where if lots of items expire at the same time due to a server restart or temporal event, then suddenly lots of calls will go to database at same time. This can even bring down the database in extreme cases. I wont go into much detail but the app need to do two things solve this issue.

1) Add consistent hashing to cache layer : This way when a memcache server is added/removed from the pool, entire cache is not invalidated.  We use memcahe from both python and Java layer and I still have to find a consistent caching solution that is portable across both languages. hash_ring and spymemcached both use different points for server so need to read/test more.
2) Add a jitter to cache or randomise the expiry time: We expire long term cache  records every 8 hours after that key was added and short term cache expiry is 2 hours. As our customers usually comes to work in morning and access the cloud file server it can happen …

shell file $10 parameter

Normally you don't have a shell file that calls a java program with 10 parameters but I had a migration script had 10 arguments and I kept getting first parameter in the 10th argument as shell was seeing $10 and it was passing first param concatenated with 0. The solution is to enclose all parameters > 9 with curly brackets.

so use something like

$JAVA_HOME/bin/java -verbose:gc -Duser.timezone=GMT -Dfile.encoding=UTF8 -server -Xms512m -Xmx2048m $1 $2 $3 $4 $5 $6 $7 $8 $9 ${10} ${11} ${12}

instead of

$JAVA_HOME/bin/java -verbose:gc -Duser.timezone=GMT -Dfile.encoding=UTF8 -server -Xms512m -Xmx2048m $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12

mysql subquery multiple columns

if you need a subquery to return multiple columns for comparison purpose then its allowed so you can do something like

select x from table1 where (a,b)= (select a,b from table2)
but if you want something like

select (select a,b from table1 where  table1.x=table2.x) fields, table2.y, table2.z from table2 join table3 on where table2.xx=10
then mysql wont allow you to select 2 columns in the first subquery.

To avoid this you can turn the query into a join query and write a wrapper query on top like

select a,b,y,z from (table1 join table2 on table1.x =table2.x) join table3 on where table2.xx=10
but I ran into an issue on weekend where table1 had 1.4M rows and table 2 had 40K rows and table3 had 3K rows.  Now joining 1.4M to 44K rows was not coming out of Mysql database even in 10 minutes so this approach was out of question. The final result set size was 44K.

So finally the best solution I could find on weekend to fix this to concatenate columns and sp…

java Spring Shard datasource with Mysql/Oracle

If you are implementing database sharding and using Spring JDBC then you are out of luck to using declarative transactions and find a Datasource with Spring that would handle sharding. I had to implement my own Datasource manager and own annotations to use declarative kind of transactions to hide complexities from average developers.  Its very important to abstract out cross cutting concerns as sharding and transactions so that any junior developers wont be confused and start copying code left and right without understanding the impact of their changes globally. 

So the idea is that
1) You would implement a ShardDataSourceManager that would be basically pool of connection pools and you would lookup a datasource by shard id.
2)You would define your own Transactional annotations and annotate methods with it
3) You need to write an interceptor at dao layer that would read annotations on method and some context info. From the context info you would lookup shard id and lookup datasource a…

Annotating SQL queries

I have been using oracle and mysql top 10 queries view to proactively diagnose slow queries and optimize them for next release. But sometimes the same query will have different plans depending on data set its executed on. So if you have a sharded database then a query like select * from files where file_path=? and customer_id=? would run fine locally and for most shards but not for some shards.

To diagnose live production issues or issues that occurred in past it would be good to know some query context and one trick to do this is to annotate your queries with context information in comments.

    public static String annotateSql(String contextInfo, String sql) {
        return "  /*" + contextInfo + "*/" + sql;

now your top queries report over last day would have top queries along with context info. This context info can be customerId or shard id.

From NOSQL to MySql (BDB to mysql) - Part1

Before I joined the company it was evolving from a search and share company to a cloud file server and we were using lucene as a datastore for metadata but it was always lagging in updates as  you cant update data realtime in lucene. Then we were going to NFS for metadata but it was not scaling for metadata calls as NFS protocol was never meant to scale to web2.0 volume. So someone picked up Berkeley DB as a backend. The only reason to not pick any other NoSQL products was that none of them had multi row transactions capabilities and relational db was not picked as it would take time to port code to relational db. I had recently joined at that time and then company became a hit and we scaled from 3-4 nodes in one data centre to 60-100 nodes across multiple datacentre. We are still going through the scaling phase and believe me it is not fun scaling Berkeley db.  Berkeley db works fine if you can hold your dataset in memory but as soon as you start hitting dbs of size 10G or more it be…

Jersey file upload ContentDisposition getSize

if you are using Jersey for file upload and using "@FormDataParam("file") InputStream file" to receive the file and  "@FormDataParam("file") FormDataContentDisposition fileDetail" to receive filemetadata then you will run into and issue where fileDetails.getSize() is coming as -1.  This will only get populated if the "size" header is passed in  multipart request. For me FF was not sending when I was uploading file.

The only way I could make it work was to add a TeeInputStream and calculate the length.  I wrote a class

    private static class OutputStreamSizeCalculator extends OutputStream {
        private long length;

        public void write(int b) throws IOException {

        public long getLength() {
            return length;

and then wrapped my InputStream around it
        OutputStreamSizeCalculator out = new OutputStreamSizeCalculator();
        TeeInputStream tee = new …