Hello there...
This one took a little longer than expected, but here I'm with another post.
This time is about a collaboration with some of the guys in my lab, that between ideas, data and some work we have created a neat Web Interface to visualize protein-protein Interaction. For now this is Tuberculosis (TB) data, but the idea can be apply to any other protein interaction dataset. This was one in about a week, so don't expect anything more than a prototype.
So, two weeks ago in our weekly seminar, one of the other students of CBIO was presenting something about the interaction of proteins between TB and human. This is an approach used to try to understand how the mycobacteria affects the normal behaviour of a human, or something like that :-P
Anyways, she was presenting this graphs representing the interactions and some questions appear about the surrounding proteins, and how those interact and if there is any correlation between the unconnected parts of the graph. Obviously I was not understanding much of it, but I though that an interactive visualisation of that data may help this clever people to answer their questions.
The graphics that my friend was showing were generated by cytoscape, a very well known tool for this kind of data. I don't know much details of this tool, I kind of remember playing with it a while ago, so to be honest I cannot talk much about its strengths or weaknesses. The only think was the in that moment it was not available, and therefore the valid biological questions were not answered in the seminar room.
Part of the discussion went about the available visualisation tools for this data and how useful (or not) they are. I thought that a web visualisation of this should be, not just possible but simple enough for me to implement, obviously hoping to find a generic graph library. And once displaying the data, it would be possible to play with the representation using web techniques.
After the talk a very enthusiastic friend came to me and started talking about the same idea, plus he actually knew a couple of libraries to displays networks: protovis and Jit. I also did my homework and found out that protovis have become d3 and found another option as well called arbor.
Before seen those examples my worry in this was an algorithm to organize the layout of the nodes in the network, to avoid overlapping and to see the network in a nice way. But if you check those examples the are using a very cool algorithm called force, which pretty much simulates repulsion forces among nodes and tension forces using the links and it does it on the flight, creating a very cool effect.
While I was busy playing with different libraries my friend got some of the data and create a dummy example using protovis. One of the ways to input data in protovis is to have a json file with a particular structure. So, what my friend did was to create a python script to convert a subset of the data into json, and use that to create the graph. Simple and nice as I like it! Below is an image of a part of the generated graph, and here is the link for you to see the example.
But here is our first real challenge, that was a subset of 2000 proteins, and the performance of that is very poor as you can see. Clearly we were asking too much to the SVG engine of the browser. The big problem is that the original file have over 300000 interactions, we were not using 1% of the data and we were reaching the limits of the browser.
In Web development there is a pattern that I have followed for a while, try to make it happen in the client, but if not possible, simply do it in the server. OK, thats not exactly a massive piece of wisdom, specially when in web development there are virtually those two options: server or client. However I know of many developers who insist in doing stuff in the server that now is totally possible in the client, were the latter gaves lots of potential advantages, specially in the usability point of view.
So what I did was to create a Solr server with the protein data, this will open the possibility of doing interesting queries with very short response time, and really easy to implement. Here I make a parenthesis, If you are a developer and you dont know what is Solr, make yourself a favor and go to learn about it, is totally worth the effort.
Back in track, for this proof of concept I am using a very simple dataset of just over 66000 interactions, each of those been the 2 protein interacting and a score of how reliable is this information. in conclusion, for now is just possible to query for protein or score, but the potential of Solr is there and is just a matter of putting more info, such as annotation of each protein.
There is a nice javascript library to deal with Solr queries called AJAX Solr, I have used before in other projects, so I was familiar with it, besides is easy to use and it has a well organize code structure. Its concept is basically to have widgets that react to the moment a Solr response is catched. Moreover, those widgets are also able to execute new queries.
So what I did was to create a widget for AjaxSolr, that uses d3 to generate the graph based in a Solr response, for example, if I query for all the interactions were the protein with UniProt Id Q10387 is presented, I got 45 interactions, so the widget creates nodes for each protein and links for each interaction and just let d3 to do its job!
The rest was just to adapt some of the widgets that the AJAX Solr tutorial has, like the autocomplete or the list of current queries. Here the main change is that the tutorial work joining the queries as a disjunction, I mean using AND operators: query1 AND query2 AND query3. but in our case we require to use conjunctions (OR). and that changes a little of the logic there, but is not big deal.
So here is How it looks, and if you want to play with it you can go to this URL: http://biosual.cbio.uct.ac.za/interactions/. We have many ideas to implement here, and for that we need to put more data in the server, and also the way of drawing the graph can get improved, however we think this is a very nice start point.
This one took a little longer than expected, but here I'm with another post.
This time is about a collaboration with some of the guys in my lab, that between ideas, data and some work we have created a neat Web Interface to visualize protein-protein Interaction. For now this is Tuberculosis (TB) data, but the idea can be apply to any other protein interaction dataset. This was one in about a week, so don't expect anything more than a prototype.
So, two weeks ago in our weekly seminar, one of the other students of CBIO was presenting something about the interaction of proteins between TB and human. This is an approach used to try to understand how the mycobacteria affects the normal behaviour of a human, or something like that :-P
Anyways, she was presenting this graphs representing the interactions and some questions appear about the surrounding proteins, and how those interact and if there is any correlation between the unconnected parts of the graph. Obviously I was not understanding much of it, but I though that an interactive visualisation of that data may help this clever people to answer their questions.
The graphics that my friend was showing were generated by cytoscape, a very well known tool for this kind of data. I don't know much details of this tool, I kind of remember playing with it a while ago, so to be honest I cannot talk much about its strengths or weaknesses. The only think was the in that moment it was not available, and therefore the valid biological questions were not answered in the seminar room.
Part of the discussion went about the available visualisation tools for this data and how useful (or not) they are. I thought that a web visualisation of this should be, not just possible but simple enough for me to implement, obviously hoping to find a generic graph library. And once displaying the data, it would be possible to play with the representation using web techniques.
After the talk a very enthusiastic friend came to me and started talking about the same idea, plus he actually knew a couple of libraries to displays networks: protovis and Jit. I also did my homework and found out that protovis have become d3 and found another option as well called arbor.
Before seen those examples my worry in this was an algorithm to organize the layout of the nodes in the network, to avoid overlapping and to see the network in a nice way. But if you check those examples the are using a very cool algorithm called force, which pretty much simulates repulsion forces among nodes and tension forces using the links and it does it on the flight, creating a very cool effect.
While I was busy playing with different libraries my friend got some of the data and create a dummy example using protovis. One of the ways to input data in protovis is to have a json file with a particular structure. So, what my friend did was to create a python script to convert a subset of the data into json, and use that to create the graph. Simple and nice as I like it! Below is an image of a part of the generated graph, and here is the link for you to see the example.
15 - Network of 2000 human protein interacting. |
In Web development there is a pattern that I have followed for a while, try to make it happen in the client, but if not possible, simply do it in the server. OK, thats not exactly a massive piece of wisdom, specially when in web development there are virtually those two options: server or client. However I know of many developers who insist in doing stuff in the server that now is totally possible in the client, were the latter gaves lots of potential advantages, specially in the usability point of view.
So what I did was to create a Solr server with the protein data, this will open the possibility of doing interesting queries with very short response time, and really easy to implement. Here I make a parenthesis, If you are a developer and you dont know what is Solr, make yourself a favor and go to learn about it, is totally worth the effort.
Back in track, for this proof of concept I am using a very simple dataset of just over 66000 interactions, each of those been the 2 protein interacting and a score of how reliable is this information. in conclusion, for now is just possible to query for protein or score, but the potential of Solr is there and is just a matter of putting more info, such as annotation of each protein.
There is a nice javascript library to deal with Solr queries called AJAX Solr, I have used before in other projects, so I was familiar with it, besides is easy to use and it has a well organize code structure. Its concept is basically to have widgets that react to the moment a Solr response is catched. Moreover, those widgets are also able to execute new queries.
So what I did was to create a widget for AjaxSolr, that uses d3 to generate the graph based in a Solr response, for example, if I query for all the interactions were the protein with UniProt Id Q10387 is presented, I got 45 interactions, so the widget creates nodes for each protein and links for each interaction and just let d3 to do its job!
The rest was just to adapt some of the widgets that the AJAX Solr tutorial has, like the autocomplete or the list of current queries. Here the main change is that the tutorial work joining the queries as a disjunction, I mean using AND operators: query1 AND query2 AND query3. but in our case we require to use conjunctions (OR). and that changes a little of the logic there, but is not big deal.
So here is How it looks, and if you want to play with it you can go to this URL: http://biosual.cbio.uct.ac.za/interactions/. We have many ideas to implement here, and for that we need to put more data in the server, and also the way of drawing the graph can get improved, however we think this is a very nice start point.
16- TB protein protein interactions viewer |
And you might noticed I didn't put any code in this blog, oh well if you want some code the good news is that because is a collaborative project with my friends at the lab we set up a google code repository, so you can go and get all the code of this, and everytime any of us make some improvement is gonna be there for you: http://code.google.com/p/biological-networks/.
Chao gente!!!
PS: After having develop the prototype a friend told me about a HTML5 library of cytoscape, I guess i didn't did my state of the art homework as good as I though, so I am going to have to check this and see if is worth to use it instead of d3.
Chao gente!!!
PS: After having develop the prototype a friend told me about a HTML5 library of cytoscape, I guess i didn't did my state of the art homework as good as I though, so I am going to have to check this and see if is worth to use it instead of d3.