{anchor:obs}

h1. *BioMate*

Technology that suits the skills and needs of lab-bench biologists.

{html}<STYLE>.pagetree2 a{font-size:16px; }</STYLE>{html}
{pagetree2:BioMate}

_Click on a page above for a specific section._

h2. *User Analysis*

h3. *Observations and Interviews*

h5. *1. V.*

V. is a postdoc working in a biology lab at MIT that requires a lot of data-processing support. V. herself does not do any computational work, although she often solicits the help of computational biologists in analyzing her data.

V. is relatively satisfied with her interactions with the computer scientists, although she mentions that efficient communication is important because the computational biologist she works with needs to understand exactly what biological questions she is interested in. Where possible, V. uses workflow tools such as Galaxy (a software platform for performing basic computational biology tasks, such as finding the intersection of two sets of regions). She mentions that it is also sometimes difficult to wait until a computational biologist is free to do the work for her, as the available computer scientists are often oversubscribed. When asked if she often needs the computational biologist to perform similar tasks repeatedly, or to perform permutations of the same tasks, she said yes and agreed that it would be nice if she could perform the various permutations of a task herself rather than soliciting the help of a computational biologist each time.

When asked if there were other problems she encounters in her day-to-day life, she said that it is difficult to keep track of the various permutations of an experimental protocol that exist, and wished for a centralized way of knowing what people have tried in the past and what worked. For instance, she is currently working on a protocol for chromatin fractionation and is trying to gather as much information about it as she can. When we asked if we could see how she went about gathering information about a protocol, she revealed a large stack of papers and spent considerable time navigating it to find her desired paper, which was the experimental protocol published by the Young Lab. She then pulled up another set of papers which had the protocol that she had obtained from talking to the Young Lab directly. The latter protocol had several handwritten annotations and was messy to read, but also contained substantially more detail than the official published protocol. Photographs of the two papers side by side are included below:

!p1.PNG|thumbnail, vspace=10, border=2!
V. also mentioned that even within the lab, variations of a protocol (e.g., for chromatin immunoprecipitation) exist for different antibodies and conditions, and the only way to gather all the necessary information is to talk to people, which is inefficient. Other members of the lab who were present seconded this opinion. A quick look at the lab notebooks revealed that biologists copy protocols into their lab notebooks, even though the protocols are available on the lab’s online wiki, in order to make annotations like the one shown below (“JW” is another member of the lab):

 !Untitled.png|thumbnail,vspace=10, border=2!
Furthermore, in optimizing an experimental procedure, biologists also keep track of their previous attempts and their mistakes, so that they know what worked and what did not. Very often biologists will commit an error in a step and will proceed with the protocol anyway, as it is a learning experience in how sensitive individual steps in the protocol are. An example of such an annotation is shown below:

!p3_2.png|thumbnail,vspace=10, border=2!

As the years progress, however, lab notebooks can become bulky, and the task of flipping through them to identify the desired iteration of an experiment can be tedious. Some biologists use electronic lab notebooks, but they are still not optimized for tracking the various iterations of an experiment.

h6. *Lessons learned:*

* Lab-bench biologists like V. are comfortable using GUI workflows to perform computational tasks (e.g., the Galaxy web server).
* If there is no available GUI, lab bench biologists like V. are dependent on a computational biologist for performing the analysis. However, they must work closely with the programmer to ensure that there is full understanding of the question under investigation. Communication is paramount.
* Biologists like V. do not personally run scripts that a programmer has written, even if the programmer has a readily useable script that could perform the desired analysis.

* Novel experimental protocols are not set in stone; the level of detail that is published in a journal is much less than the level of detail that one could obtain from talking to the biologists who developed the protocol.
* Experimental protocols also vary depending on the conditions and reagents used (e.g., chromatin immunoprecipitation differs between antibodies).
* The “little details” are often critical to understanding how to get your experiment to work. It is important to gather as much information as possible directly from the people who have done the experiment in the past.
* Negative results are important too; if an experiment fails, it is valuable to remember why it failed.
* There is currently no efficient way to share the ‘little details’ about an experimental procedure between biologists.



h5. *2. S.*

S. is a graduate student in the Biology department at MIT who needs to process biological data for her research. She occasionally needs to run large computational jobs on the lab’s cluster. She often notes down the required commands and parameters for a program in Evernote (a free note-taking application) so that she can copy and paste them later. However, she often forgets where she has put her notes, and she finds it difficult to locate the input files and scripts on the server when she needs to repeat an analysis. When she can’t locate her notes about the required commands, she looks at the command-line help, but she does not find it very helpful: it contains a lot of text, and she is only interested in how to run the program. She feels that it would be easier for her to do data analysis if she could just fill in some boxes with the parameters and select the command to run. She would also like to see examples, caveats, and a history of her previous selections.

An example screenshot of the command-line help written by another programmer in the lab is shown below (S. acknowledges that the information provided is sufficient to figure out how to run the script, but says it is not easy to understand):

 !screenshotHelp.PNG|thumbnail, vspace=10, border=2!

h6. *Lessons learned*

* Is uncomfortable using commands and parameters to run a script, and often forgets them
* Finds it difficult to locate the scripts to run and the input files on the server's file system
* Feels comfortable filling in forms and selecting commands
* Looks up the command-line help to see how to run a script, but finds it hard to understand
* Is interested only in the format of the commands and the meanings of the options and parameters
* Would like to see some examples and possible caveats
* Wants to see the history of commands she has run in the past


h5. *3. X.*


X. is currently primarily a computational biologist, but has a background in lab-bench biology. He has performed experiments before and is familiar with the needs of wet-lab biologists; he rarely performs experiments at present, but will do so again in the future.


X.’s work involves writing scripts which analyze biological data. However, it is often the case that X.’s code must be run by lab-bench biologists who want to vary certain parameters within the script to fit their needs. These biologists often have little, if any, knowledge of scripting and command-line utilities. To mitigate this issue, X. often finds that he must explain the use of his scripts to the biologists, which can often be difficult given the knowledge gap. More frequently, though, it is the case that X. or other computational biologists within a lab are called on to run the scripts themselves with parameters modified to fit a particular biologist’s request. This is time-consuming for all involved and distracts X. from performing other work, such as writing new scripts for different tasks.

X. said that he would like to have some mechanism to present the functionality of a script he has written to lab-bench biologists in a way where he would not need to be consulted, especially at a time when he may have forgotten how to use that particular script. X. also said that right now each set of varied parameters for any particular script must be analyzed using another, separate script which invokes his original script. These jobs which all use slightly varied parameters are tedious to schedule through the command line. To this end, X. said that he could greatly benefit from an interface to schedule batch jobs where he could specify sets of parameters to vary with each different job.
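
The batch scheduling that X. describes amounts to expanding sets of parameter values into one command per combination. A minimal sketch of that expansion follows; the script name {{analyze.py}}, the parameter names, and the {{--name value}} option style are hypothetical placeholders, not X.'s actual scripts.

```python
import itertools
import shlex


def build_batch_commands(script, param_grid):
    """Return one command (as an argument list) per parameter combination."""
    names = sorted(param_grid)  # fixed order so the output is reproducible
    commands = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        args = [script]
        for name, value in zip(names, values):
            args += [f"--{name}", str(value)]
        commands.append(args)
    return commands


if __name__ == "__main__":
    # Hypothetical script and parameters, for illustration only.
    grid = {"window": [100, 200], "threshold": [0.01, 0.05]}
    for cmd in build_batch_commands("analyze.py", grid):
        print(shlex.join(cmd))
```

An interface built on this idea would let the user enter the value sets in a form and hand the resulting commands to whatever scheduler the lab's cluster uses.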

Finally, X. also agreed with V.’s assessment of the difficulty of consolidating information about experimental protocols. X. said that the current method of combining handwritten notes is grossly inefficient and error-prone. Like V., X. would like to have an interface which provides an easy way to take shareable e-notes about particular experimental protocols, with step-by-step instructions and comments from different team members about their experiences implementing that protocol.

h6. *Lessons Learned*

* Spends too much time supporting biologists who wish to use the scripts he has already developed instead of spending time writing new scripts
* Not only is it difficult to explain the use of his scripts to biologists, but he himself is often called on to run the scripts for the biologists
* Writing scripts to run batch jobs that each use slight variants of a parameter is tedious; X. would appreciate a user-friendly interface to schedule these batch jobs
* A way to share notes on experimental protocols is sorely needed

h5. *4. J.*

J. is a postdoc in a biology lab at MIT. We spoke with him about issues he has with finding a usable protocol for experiments in the lab, as well as issues with communicating with his collaborators in computational fields.

When trying to find a protocol for a particular experiment, he finds that published papers do not usually include all the details required for running a particular experiment under certain conditions. For example, _Nature Protocols_ has some standardized protocols available, but with none of the details that biologists typically require to actually run the experiment under specific conditions. He said that he often has to spend a lot of time asking advice from many different people in his lab or other labs who have experience with the particular experiment or conditions he is investigating. He thinks it would be helpful if there were one centralized version of each important protocol for the whole lab to avoid this process, but he noted that most labs would not want to share this protocol with other labs except for their collaborators until they published a paper about it.

J. also spoke about issues he has with collaborating with computer scientists or computational biologists. One common scenario is that he wants to perform some analysis on his data, so he needs the collaborator to write some computer code to run the analysis. Often it is more efficient for the collaborator to give the completed code to J. than for J. to make many requests for different datasets and different parameters, but then J. has to figure out how to run the code on the command line. Although the collaborator might give good instructions and documentation, J. is not used to using a Unix terminal and finds it intimidating and confusing.

We were interested in J.’s difficulty with using a command-line interface, so we asked him to run some code for us, such as Cufflinks, a tool commonly used for RNA transcript assembly. The first issue he ran into was that he couldn’t remember his password to ssh to the server, since he rarely needs to perform this task. Once he finally connected, he said that he didn’t have time to show us a realistic example, since before running any code (such as Cufflinks), he would need to spend a lot of time reading about the program. Instead, we asked him to perform some simple tasks, such as finding the number of lines in a file using the command line. He had the following difficulties:
* He didn’t know off the top of his head how to find the number of lines in a file, so he googled for “unix line numbers” and found some entries about the command “nl”
* Once he realized that nl did not do what he wanted, he searched more carefully and found the command “wc -l”, but misread the letter l as the digit 1.
* Eventually he tried just running “wc”, which outputs three unlabeled numbers. He guessed that the smallest of the three numbers was the number of lines, but was not sure.
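
For reference, the three numbers that wc prints for a file are, in order, the line count, the word count, and the byte count. A labeled version of the same information can be sketched as below; the function name {{count_file}} is an illustrative assumption, and the word count is only an approximation of wc's own word-splitting rules.

```python
def count_file(path):
    """Return the line, word, and byte counts of a file, labeled by name."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "lines": data.count(b"\n"),  # like wc, count newline characters
        "words": len(data.split()),  # whitespace-separated tokens
        "bytes": len(data),
    }
```

Labeling each number removes the guesswork that J. faced when reading the raw output.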

h6. *Lessons learned*

* Feels that published protocols do not contain all the necessary details
* Would like a way to easily share protocols and related notes within the lab as well as with collaborators
* Finds common tasks on a Unix terminal very difficult due to a lack of learnability of commands and command-line options
* Typically resorts to outside sources such as Google for documentation since it is easier to find usage examples
* Confusions such as “-l” with “-1” can occur, so script authors should avoid option names that are easy to misread.


h3. *User Classes*

Based on our interviews, we identified the following two user classes:

h5. Wet-Lab Biologists

* Most of their day-to-day work involves performing experiments in a wet lab
* They often need computational support for analyzing their experimental results
* Generally not comfortable with using a command-line interface
* Most of their detailed notes are kept in a lab notebook, not on the computer

h5. Computational Biologists

* They are typically computer scientists by training
* Typically have a basic knowledge of biology, but often need help from biologists to interpret biological significance of results
* Their work often involves writing scripts for analyzing biological data, but they do not have the resources to take the time to create user-friendly programs


h3. *Needs and Goals*

h5. *1) A more efficient way for biologists to find the optimal experimental procedure*

*Need:* A better way for biologists to communicate their hard-earned knowledge with each other - both what worked, and what didn’t work.

Biologists have to navigate a large amount of information and go through tedious trial and error to identify which variant of an experimental protocol is appropriate for their situation. At present, there is a communication barrier: many of the “little details” of a procedure are absent from the “official” protocol published in a journal, and interviewing other biologists one-on-one is time-consuming. Sharing this knowledge more effectively should reduce trial and error, as people could avoid approaches that have already been found to fail.

*Goal:* To improve knowledge sharing between biologists about these poorly-understood experimental protocols.


In achieving this goal, it would also be important to keep track of _why_ a particular variation of a protocol exists, because the conditions for one experiment may differ in important ways from the conditions of another (e.g., ChIP protocols vary based on the antibody used).

h5. *2) An easier way for biologists to analyze their data*

*Need:* Biologists need to analyze their data in order to determine the success of their experiments and publish results.

Biologists frequently collaborate with computer scientists to analyze their experimental results. Often, the computer scientists will write a command-line tool which they can easily run with data provided by the biologists to perform the analysis. However, biologists often want to run the analysis on many different data sets with many different parameters, so they either have to make many requests of the computer scientists, or run the script themselves.

*Goal:* Have an easy way for biologists to run computational tools that their collaborators create.


Biologists are generally not comfortable with a command line interface and find it daunting to run code provided by their collaborators. They would like to have an easily accessible history of commands they have run in the past, as well as a tool which would prompt them to enter the necessary options for the script and explain the meaning of each option. They would also like to see useful examples and possible caveats for each command.
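
One way the command history could be kept is as a plain log with one record per run, searchable by keyword. The sketch below assumes a JSON-lines file; the function names, the record fields, and the file format are illustrative choices, not a settled design.

```python
import json
import time


def record_command(history_path, command, note=""):
    """Append one executed command, with a timestamp and optional note."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "command": command,
        "note": note,
    }
    with open(history_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def search_history(history_path, keyword):
    """Return past entries whose command or note mentions the keyword."""
    matches = []
    with open(history_path, encoding="utf-8") as handle:
        for line in handle:
            entry = json.loads(line)
            text = (entry["command"] + " " + entry["note"]).lower()
            if keyword.lower() in text:
                matches.append(entry)
    return matches
```

Storing a free-text note with each command gives the biologist a place to record caveats next to the run they apply to.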

h5. *3) An easier way for computational biologists to make their scripts user friendly*

*Need:* Computational biologists need to present their work to biologists in a way that biologists can use without substantial help.


Computational biologists create the scripts which are used to analyze biological data sets. Since biologists typically lack familiarity with programming and command-line utilities, computational biologists often need to spend a lot of time explaining the use of their scripts to biologists. Furthermore, computational biologists often spend a lot of time writing helper scripts to run different analyses. This is time-consuming and is work that the computational biologist should generally not need to do.

Both lab-bench and computational biologists alike would also benefit from maintaining information about the script such as notes on what certain inputs mean or special caveats when using the script. This kind of information is helpful if someone wants to come back to the script later on and edit it for a slightly different purpose.

*Goal:* Create a user-friendly interface for each of their scripts which biologists can use to analyze data themselves, with minimal help.

Pursuant to the needs outlined above, computational biologists would benefit from a programmer-facing interface that allows them to define the function of a script in a way that is easy to understand and maintainable (for example, if the script needs to be edited later). The programmer would also like to present information to the biologist (or to his or her later self) about the parameters of the script, as well as any caveats to know when using it.
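
As a rough illustration of such a programmer-facing definition, a script could be described declaratively by its parameters, their meanings, and its caveats, from which both a help text and a runnable command are generated. Everything named below ({{ScriptSpec}}, {{Parameter}}, and the {{--name value}} option style) is a hypothetical sketch under those assumptions, not an existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    name: str
    help: str
    default: str = ""  # empty string means the user must supply a value


@dataclass
class ScriptSpec:
    path: str
    summary: str
    parameters: list = field(default_factory=list)
    caveats: list = field(default_factory=list)

    def describe(self):
        """Render a short plain-language help text for a biologist."""
        lines = [f"{self.path}: {self.summary}"]
        for p in self.parameters:
            default = f" (default: {p.default})" if p.default else ""
            lines.append(f"  {p.name}: {p.help}{default}")
        lines += [f"  Caveat: {c}" for c in self.caveats]
        return "\n".join(lines)

    def command(self, values):
        """Build the argument list from form values, falling back to defaults."""
        args = [self.path]
        for p in self.parameters:
            value = values.get(p.name, p.default)
            if value == "":
                raise ValueError(f"missing value for {p.name}")
            args += [f"--{p.name}", str(value)]
        return args
```

Because the description lives next to the script, it also serves as the notes the programmer would want when returning to the script later.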