BIOsual: April 2012

Tuesday, 24 April 2012

Sketches (Part II)

Hello There,

Sooner than expected, I'm here again writing a new post. To be honest, this is material that I had last week, but the previous post was getting too long and taking me a lot of time, so I split it and here is the second part. So, continuing the idea, here I'm gonna show some sketches I made to start having a better understanding of the project.

In this post I will focus on the configuration of a domain in the Multi-Domain Configuration (MDC) mode (remember the 3 parts the project is going to be divided into). Here I'm assuming that a lot of components have previously been registered in the BIOsual repository; this is important to me because it is helping me to define the requirements for the repository mode.

So, once a domain is in the dashboard it needs to be configured, and to open the configuration panel a click on the domain is all I'm asking for. If it is a predefined domain, most of the configuration is done; if it is a new domain, the information has to be filled in from scratch.

The way I imagine the domain configuration now includes 5 different sections, which in the sketches I have organised as tabs: Reference, Tracks, Entry Points, Components and Filters. Obviously at this stage everything can change, but this is the route I'm tracing, and as long as I reach the goal I'm willing to make changes along the way.

Domain - Reference

Every domain has to have a reference source of information, for instance, the whole genome of a species organised by chromosome for a DNA domain, or all the available sequences for the Protein domain.

This information can be acquired in different ways.

4 - Setting up the reference of a domain.

So in the sketch I'm showing 3 options to load the reference:

DAS source: this can be done by selecting one of the more than 190 DAS reference sources or by directly inputting the URL of the DAS source.
Sequence Files (web): Some NGS files like BAM allow random access through HTTP requests. This makes it possible to use a remote file as a reference without having to load the whole file.
The full genome can be divided into several of those files, so the system should allow loading more than one and provide a way to identify them. That's why I added the ID field.
Sequence Files (local): Another option might be to have local files as the reference, for the case where the files are on the user's machine. This might be tricky because of the limited access JavaScript has to the client machine.
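
To make the remote-file option a bit more concrete, here is a minimal sketch of how a partial read could be requested with the standard HTTP Range header. The function names and the shape of the reference entry (id, url) are assumptions made up for illustration, and working out which bytes to ask for (normally done through the file's index) is left out.

```javascript
// Hypothetical helper: build the value of the standard HTTP "Range" header
// for an inclusive byte interval.
function byteRangeHeader(firstByte, lastByte) {
  if (!Number.isInteger(firstByte) || !Number.isInteger(lastByte)) {
    throw new TypeError("Byte positions must be integers");
  }
  if (firstByte < 0 || lastByte < firstByte) {
    throw new RangeError("Invalid byte interval");
  }
  return `bytes=${firstByte}-${lastByte}`;
}

// Hypothetical reference entry: the ID field from the sketch plus the file URL.
// Returns a plain description of the request, so it can be handed to whatever
// HTTP client the application ends up using.
function buildRangeRequest(reference, firstByte, lastByte) {
  return {
    id: reference.id,
    url: reference.url,
    headers: { Range: byteRangeHeader(firstByte, lastByte) },
  };
}
```

The server has to support range requests for this to work, so a real module would need to check the response before trusting the bytes it gets back.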

BTW, does anybody know if there is anything similar to BAM that just includes the sequence, like a FASTA file that can be accessed at a particular position?

Other ways of loading a reference could be databases, web services, or many other ways that I cannot even think of, and that's why this is based on modules that have to be registered in the BIOsual repository.

Domain - Tracks

And here, guys, you will have to forgive me: this is the most confusing of my sketches so far, and that's just a sign that I'm not sure about how to deal with this info.

Tracks are the classical way of displaying biological information in the context of a reference (e.g. Ensembl, Dasty3, UCSC genome browser, etc.). Basically, a track consists of drawing a box in a position that aligns directly with the reference, so you can see where an annotation is located in a chromosome, protein or any other sequence.

5 - Setting up the visible tracks of a domain.

So each domain can have tens of tracks, coming from different locations in different formats to be displayed in different ways. Here is another place to modularise.

I will need adaptors for reading formats. I have to reach a consensus on which information is relevant here and be able to represent it in the same way no matter how the source is defined.

The system also needs options to visualise this info. The boxes idea explained above might be a generic one, but, for instance, genes are composed of a series of exons and introns, and the common track representation is boxes for the exons and lines for the introns. Other annotations have a score, and a histogram representation might be more adequate.

So sketch number 5 shows that, in the tab to configure the tracks, a list of the already configured tracks is displayed. Info in the list can be edited, tracks can be removed from the list, and the tracks should be draggable to organise their order.

An option to add a new track is presented, and if clicked a form should be injected into the config panel. This form depends on the type of source: because the way of accessing the info changes, the form has to be defined as part of the module. So, for example, if the type of source is DAS, the interaction should be similar to the one for the reference.

The visualisation method used for a particular track is also chosen in this form, and it might contain advanced settings for it, such as an associated CSS style file.
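
A minimal sketch of the adaptor idea, assuming a common feature shape of { id, start, end, score }. Both input formats below (a tab-separated one and a JSON one with name/from/to fields) are invented for the example and are not real bioinformatics formats.

```javascript
// Hypothetical adaptors: each one turns raw text in some source format into
// the same feature shape { id, start, end, score }.
const adaptors = new Map();

// Made-up tab-separated format: id <TAB> start <TAB> end [<TAB> score]
adaptors.set("tab", (text) =>
  text
    .trim()
    .split("\n")
    .map((line) => {
      const [id, start, end, score] = line.split("\t");
      return {
        id,
        start: Number(start),
        end: Number(end),
        score: score === undefined ? null : Number(score),
      };
    })
);

// Made-up JSON format: an array of { name, from, to, score }
adaptors.set("json", (text) =>
  JSON.parse(text).map((f) => ({
    id: f.name,
    start: f.from,
    end: f.to,
    score: f.score === undefined ? null : f.score,
  }))
);

// The rest of the system only calls this and never sees the source format.
function readFeatures(format, text) {
  const adaptor = adaptors.get(format);
  if (!adaptor) throw new Error(`No adaptor registered for format "${format}"`);
  return adaptor(text);
}
```

With this shape, supporting a new format means adding one entry to the map, and the drawing code stays untouched.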
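
As an illustration of the exons-as-boxes and introns-as-lines representation, here is a small sketch that derives the list of shapes to draw from a gene's exons. The function name and the inclusive { start, end } coordinates are assumptions for the example; the actual drawing is left out.

```javascript
// Hypothetical helper: given the exons of a gene as { start, end } with
// inclusive coordinates, return the shapes to draw in left-to-right order:
// a "box" for every exon and a "line" for every gap (intron) between exons.
function geneGlyphs(exons) {
  const sorted = [...exons].sort((a, b) => a.start - b.start);
  const glyphs = [];
  let lastEnd = null; // right-most position covered so far
  for (const exon of sorted) {
    // Only draw a line when there is a real gap before this exon.
    if (lastEnd !== null && exon.start > lastEnd + 1) {
      glyphs.push({ shape: "line", start: lastEnd + 1, end: exon.start - 1 });
    }
    glyphs.push({ shape: "box", start: exon.start, end: exon.end });
    lastEnd = lastEnd === null ? exon.end : Math.max(lastEnd, exon.end);
  }
  return glyphs;
}
```

A histogram renderer for scored annotations could consume the same features and simply ignore the shape idea, which is one reason to keep the visualisation method as a separate module.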

Domain - Entry Points

The way of accessing the information varies from domain to domain. For example, in genomes it is normal to go to a particular location indicated by the chromosome and the range of bases to visualise, but in proteins there are no chromosomes, and the usual way to access them is by accession number; the location is not that important because those sequences are not really long.
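
A small sketch of how the two kinds of entry point could be told apart in code. The "chromosome:start-end" text syntax, the domain type names and the function names are all assumptions for the example, not something the sketches define.

```javascript
// Hypothetical parser for a genomic location written as "chr7:1,000-2,000"
// or "7:1000-2000". Returns null when the text does not look like a location.
function parseGenomicLocation(text) {
  const match = /^\s*(?:chr)?([0-9A-Za-z_]+):(\d[\d,]*)-(\d[\d,]*)\s*$/.exec(text);
  if (!match) return null;
  const start = Number(match[2].replace(/,/g, ""));
  const end = Number(match[3].replace(/,/g, ""));
  if (start > end) return null;
  return { chromosome: match[1], start, end };
}

// Hypothetical dispatcher: a genome domain is entered by location, any other
// domain (for example proteins) by accession, where the whole sequence is shown.
function parseEntryPoint(domainType, text) {
  if (domainType === "genome") return parseGenomicLocation(text);
  const accession = text.trim();
  return accession ? { accession } : null;
}
```

For example, parseEntryPoint("genome", "chr7:1,000-2,000") gives chromosome "7" with start 1000 and end 2000, while a protein domain would just keep the trimmed accession string.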

6 - Setting up Entry Points for a Domain

And then, here again, the idea is to modularise: visual components for selecting a region have to be registered, and it is the task of the designer of an MDC to select the appropriate Entry Point selector.
The image shows that for DNA a karyotype selector is set, to allow seeing the high-level structure of the whole genome.

But more than one Entry Point selector can be desired in a domain. In the image I also consider a location selector and a search in track. The first one can be as simple as text boxes to capture the coordinates, or a slider to select a region, or any other interface that allows the selection of a region in a genome.

The search in track is the way I can think of for a generic component that offers the functionality of directly accessing a gene, for example.

Domain - Components

Sorry, it's getting long again; there are just 2 more to go. The next tab to configure a domain is for including other components that are not displayed as tracks, mainly because they are not associated with a location or because their content cannot be represented in a one-dimensional graphic interface. For instance, the 3D structure of a protein or its non-positional annotations.

7 - Other Visual Components

These components also have to be registered in the BIOsual repository, and each of them has to have its own configuration values to be displayed in this tab.

Domain - Filters

And finally, an optional feature that I think can be really useful: a post-query filter based on tracks, where the designer can define a rule to accept or reject any of the annotations in the tracks.

8 - Filtering the collected data

For now I think that the rule should be defined by indicating the track it is going to affect, whether it is inclusive or exclusive, the parameter to filter on, an operator and a value. So, in the image, a rule is defined to exclude all the features in Track 1 with a score higher than or equal to 0.8.

As an optional feature, this might not be part of the first prototypes, but I think it is something cool to have in mind.

I was getting quite technical for some moments, so it might be a boring post. I had fun drawing these sketches, but the more I write, the more I realise what kind of project I'm embarking on. But I suppose it's part of the fun. So if you made it to this part of the post and have any comment/idea/question, please let me know.
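
The rule described above could be sketched as a small data object plus a function that applies it. The field names (track, mode, parameter, operator, value) and the set of operators are assumptions for the example.

```javascript
// Hypothetical operator table for filter rules.
const OPERATORS = new Map([
  [">", (a, b) => a > b],
  [">=", (a, b) => a >= b],
  ["<", (a, b) => a < b],
  ["<=", (a, b) => a <= b],
  ["==", (a, b) => a === b],
  ["!=", (a, b) => a !== b],
]);

// Apply one rule to a flat list of features, each tagged with its track name.
// Features from other tracks pass through untouched. In "include" mode only
// the matching features of the affected track are kept; in "exclude" mode
// the matching ones are dropped.
function applyRule(features, rule) {
  const compare = OPERATORS.get(rule.operator);
  if (!compare) throw new Error(`Unknown operator "${rule.operator}"`);
  return features.filter((feature) => {
    if (feature.track !== rule.track) return true;
    const matches = compare(feature[rule.parameter], rule.value);
    return rule.mode === "include" ? matches : !matches;
  });
}

// The rule from the sketch: exclude features in Track 1 with score >= 0.8.
const exampleRule = {
  track: "Track 1",
  mode: "exclude",
  parameter: "score",
  operator: ">=",
  value: 0.8,
};
```

Because a rule is just data, it could be stored with the rest of the domain configuration and applied after the annotations have been collected.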

Un fuerte abrazo,

Gustavo.

Friday, 20 April 2012

Sketching (Part I)

Hey there people!

So I'm still quite excited about this project, and I hope this attitude lasts for long, but I know myself and that's usually not the case. So, trying to use this enthusiasm in the best way, I decided to start by creating some sketches of how I imagine the app has to be, and hopefully this will help me to define a good set of requirements. All this with the idea that whenever I'm not so inspired I can just take a requirement and work on it, and use that to find the enthusiasm again, or at least to progress even in the not so cool times.

Let's stop the chit-chat and let me show you which ideas I have had this week.

Firstly, I think the key for a system like this (at least one on the keyring) is to modularize. From the input methods to the visual components. And when I talk about modularizing, I'm thinking of defining a plug-in architecture that supports the control of the different components, so that, for instance, if a new format for dealing with annotations is found, an input component can be created and then used by the rest of the system without interfering with other parts.

To achieve this, I think it is necessary to divide the system into 3 parts (you know, our old friend "divide & conquer"):
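
A minimal sketch of what such a plug-in registry could look like. The class name, the method names and the component types used below are assumptions for illustration, not a defined BIOsual API.

```javascript
// Hypothetical registry: components are stored per type ("input", "filter",
// "visualisation", ...) and looked up by name, so adding a new component
// never requires touching the existing ones.
class ComponentRepository {
  constructor() {
    this.byType = new Map(); // type -> Map(name -> component)
  }

  register(type, name, component) {
    if (!this.byType.has(type)) this.byType.set(type, new Map());
    const components = this.byType.get(type);
    if (components.has(name)) {
      throw new Error(`A "${type}" component named "${name}" is already registered`);
    }
    components.set(name, component);
  }

  get(type, name) {
    const components = this.byType.get(type);
    const component = components ? components.get(name) : undefined;
    if (!component) throw new Error(`No "${type}" component named "${name}"`);
    return component;
  }

  // Names of all registered components of one type, e.g. to fill a drop-down.
  list(type) {
    const components = this.byType.get(type);
    return components ? [...components.keys()] : [];
  }
}
```

For example, a new annotation format would be added with repository.register("input", "my-format", reader), and the editor would discover it through repository.list("input").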

A repository for components of different types (input, filters, visualisation, etc.).
A Multi-Domain Configuration (MDC) editor, which allows defining which domains are going to take part in a visualisation, how to relate those domains, which annotations will be displayed, etc.
And the visualisation itself, where the user, for example, chooses a location in a chromosome, the related annotations are displayed, and the other domains also get updated to the corresponding region.

And for now my ideas are centred on the MDC editor (and yeah, I'm gonna start using that acronym a lot). So here I'm going to explain how I imagine this side of the system, and for that I am including some hand-drawn sketches that I made and scanned. So let's start by apologising for my awful handwriting, but I'm blaming the QWERTY keyboard for it (no, really, I had awful handwriting before being tied to a computer).

The approach I'm using here is not formal at all, and I'm continuously moving between web design characteristics, core requirements, look-and-feel ideas, algorithms and implementations. I just hope that by writing about it I can start organising my chaos.

Anyway, here are some of those sketches with a bit of explanation of each.

MDC Menu

The first one is to show the high-level options that the MDC editor should have. I initially thought I was going to call this menu the dashboard, but then I decided it is better to call it MDC, and the dashboard is the area where the action will happen (next images).

1 - MDC Menu. High level options of the MDC Editor.

Classic File menu options are here, not very exciting. New will clean the dashboard and will take the normal preventive actions to avoid losing changes. For Open and Save I will have to create a file format that includes all the configuration details of an MDC. I've been an XML fan for years, but all this being JavaScript makes me think that this format has to be JSON.

Notice that those options don't include any server processing. I'm thinking here of creating the file in JS so the user can save it locally, and to load their MDC they use the Open option. Having MDCs saved in the cloud is also an option, but that will require users and authentication, and I'm going to focus on an application that does not require any kind of authentication.

The Templates sub-menu contains a set of previously created MDCs that are almost ready to use, for example, a 2-domain MDC: Human DNA from Ensembl and Proteins from UniProt, mapped through an alignment DAS server. So in this case a user can have all this ready, including some default annotation tracks, like genes and OMIM for the human domain, and InterPro and PFAM for the proteins. Then the user can start adding their own tracks there, or a completely new domain.

The final option is to Run whatever is on the dashboard. Obviously this will require some validations verifying that all the domains are correctly configured, but if everything is OK this is the starting point for the visualisation part of the system. I'm not sure if this is the best way of connecting the parts of the system, or which validations to do, but those are problems for the future Gustavo, not me.
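
A minimal sketch of what Save and Open could do with a JSON file, assuming an MDC is an object with a list of domains and a list of relationships. The version field and every property name here are assumptions, since the file format is still to be designed.

```javascript
// Hypothetical format version, stored in the file so that older files can be
// recognised later if the format changes.
const FORMAT_VERSION = 1;

// Save: turn the in-memory MDC into the text of the file.
function serializeMdc(mdc) {
  return JSON.stringify(
    { version: FORMAT_VERSION, domains: mdc.domains, relationships: mdc.relationships },
    null,
    2
  );
}

// Open: parse the text of a file back into an MDC, rejecting files that do
// not look like this format.
function parseMdc(text) {
  const data = JSON.parse(text);
  if (data.version !== FORMAT_VERSION) {
    throw new Error(`Unsupported MDC file version: ${data.version}`);
  }
  if (!Array.isArray(data.domains)) {
    throw new Error("MDC file has no list of domains");
  }
  return {
    domains: data.domains,
    relationships: Array.isArray(data.relationships) ? data.relationships : [],
  };
}
```

Since both functions only deal with text, they work the same whether the file ends up on the user's disk or, one day, somewhere else.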
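
Just to give the validation step a shape, here is a sketch that collects problems instead of stopping at the first one. Which checks are really needed is still open, so the three used here (a name, a reference and at least one entry point per domain) are placeholders, and the property names are assumptions.

```javascript
// Hypothetical pre-Run check: returns a list of human-readable problems.
// An empty list means the MDC can be handed to the visualisation part.
function validateMdc(mdc) {
  const problems = [];
  if (mdc.domains.length === 0) {
    problems.push("The dashboard has no domains.");
  }
  mdc.domains.forEach((domain, index) => {
    const label = domain.name || `Domain #${index + 1}`;
    if (!domain.name) problems.push(`${label}: no name defined.`);
    if (!domain.reference) problems.push(`${label}: no reference configured.`);
    if (!Array.isArray(domain.entryPoints) || domain.entryPoints.length === 0) {
      problems.push(`${label}: no entry point selector chosen.`);
    }
  });
  return problems;
}
```

Returning all the problems at once lets the editor show them next to the corresponding domains rather than making the user fix them one Run at a time.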

Adding a Domain

And now on to the second sketch, where the user is defining which domains are going to be used in the current MDC.

2 - Adding a domain into the dashboard.

So from another menu, or from a button in the dashboard (I'm thinking of something like the Android ICS screen icons), a set of available domains is displayed, a similar idea to the Templates sub-menu. Domains are circles, representing a scope. The user can then drag a domain and drop it into the dashboard, and it will be automatically aligned to the left with any other domain that has been added.

If the user doesn't want to use any of the templates, they can drag the New Domain, which is a completely empty template that has to be fully defined, from its name to the style used at the visualisation stage.

So the circle idea is inspired by Google+; hope that won't create a patent issue here. ;-)

And I already found some jQuery plugins that can create a similar effect, have a look at these demos:

Relations between Domains

Once the user has more than one domain in the dashboard, a relationship can be created between them, and basically my idea is that the relationship is created by a drag and drop gesture:

3 - Creating a relationship by drag&drop domains

I hope the sketch is self-explanatory, but basically what I want is that if a domain is dragged and dropped onto another domain, a relationship between them is created. Logically there is a lot more to define, and therefore a configuration panel for the relationship has to be opened once a relationship is created.

Tons of things to define here, like: what if another configuration panel is already open? How to make sure that changes are not lost? Should I allow creating relationships while a configuration panel is open? And we haven't even touched the topic of how to define a relationship. More issues for the future Gustavo, that dude is gonna have lots of problems.

Some examples of drawing arrows on the web:

But it really depends on what my general strategy for drawing is going to be. If you have an opinion about how I should do the visualisation, please let me know. I'm considering things like Raphael, or working with SVG, or Canvas, or just plain alignment of DIV elements... again, that shows that I have a long way to go, but I have to start somewhere.
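
Leaving the actual drag & drop events aside, the data side of that gesture could look like the sketch below. It assumes that domains are identified by an id, that a relationship has no direction, and that a new relationship starts unconfigured; all of these are guesses for the example.

```javascript
// Hypothetical handler for "domain A was dropped on domain B".
// Returns the relationship whose configuration panel should be opened,
// or null when the drop does not create anything.
function relateDomains(mdc, droppedId, targetId) {
  if (droppedId === targetId) return null; // dropped back on itself

  // Assumption: relationships are undirected, so A-B and B-A are the same.
  const existing = mdc.relationships.find(
    (r) =>
      (r.source === droppedId && r.target === targetId) ||
      (r.source === targetId && r.target === droppedId)
  );
  if (existing) return existing;

  const relationship = { source: droppedId, target: targetId, configured: false };
  mdc.relationships.push(relationship);
  return relationship;
}
```

Returning the existing relationship on a repeated drop means the same gesture can be used both to create a relationship and to reopen its configuration panel.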

OK, this post is getting long and I don't want to bore my crowd of readers (about 2 friends and my girlfriend). So I will try to post soon to show you more sketches and hopefully a first version of the requirements document.

Hasta la próxima!

Friday, 13 April 2012

BIOsual Idea!

So, I've been playing around trying to define my PhD project. My interests are in visualisation and integration of biological information. I've done some other small projects in this field, but I haven't been able to find a project that fully interests me and in which I can use what I have been working on for the last couple of years. Until now...

In the last meeting with my supervisor we were talking about different projects that our lab is involved in, and a common problem in those was the difficulty of moving information between different domains, which means, for example, that the data obtained in a microarray experiment is hard to put in context with genomic and proteomic information at the same time. I mean, it is possible, but it always requires extra work.

Another big issue is that, thanks to high-throughput technology, the amount of generated data is massive!! I'm talking about files in the order of tens of gigabytes. And that's the kind of data we have to deal with. So what's my idea? Simply a fully configurable tool to visualize biological information from multiple domains. Pretty cool, eh?

Well, if you are in the bioinformatics field, that sounds like one of its holy grails. So yeah, that's what I'm gonna do. Oh well, that's pretty ambitious, but there is a lot of work done on it and I think a big part of this mission is putting pieces together. For instance, a good friend is working on a library of JavaScript components for biology, so I plan to use a lot of those. Another friend developed a protein annotation viewer based on DAS called Dasty, and it has a really nice architecture based on plugins; I know this because I helped in the design :-). Other friends have worked on visualising genomic information, especially focused on personal genomics (myKaryoView). I also did some collaboration there.

And that's why I think I can come up with something useful and cool, but knowing myself I have to keep the motivation high, and that's where this blog becomes important. I will try to post at least once every 2 weeks reporting the progress that I have made, if any, and in that way force myself to do something to be able to write something. We'll see how it goes.

For now I have started writing the requirements and drawing some sketches of my ideal interface; I will show those in my next post. So, for now, wish me luck and please give me ideas about the project.
Just to finish, I want to say that although genomic information may be the central part of my project, the motto of this project will be BIOsual: Not just another genome viewer.