BioMate

Lab-bench biologists find it difficult to use many existing tools for their data analysis. These tools are generally command-line computer programs written by computational biologists. Computational biologists do not have time to create user-friendly interfaces for their programs, and often find themselves spending a lot of time helping lab-bench biologists run their programs. This creates a burden for all involved: lab-bench biologists cannot move forward with their data analysis, and computational biologists cannot move forward with their research.


User Analysis

Observations and Interviews

1. V.

V. is a postdoc working in a biology lab at MIT that requires a lot of data processing support. V. herself does not do any computational work, although she often solicits the help of computational biologists in analyzing her data.

V. is relatively satisfied with her interactions with the computational biologists, although she notes that efficient communication is important because the computational biologist she works with needs to understand exactly which biological questions she is interested in. Where possible, V. uses workflow tools such as Galaxy (software for performing basic computational biology tasks, such as finding the intersection of two sets of regions). She also finds it difficult to wait until a computational biologist is free to do the work for her, as the available computational biologists are often oversubscribed. When asked whether she often needs the computational biologist to perform similar tasks repeatedly, or to perform permutations of the same task, she said yes and agreed that it would be nice if she could run the various permutations herself rather than soliciting the help of a computational biologist each time.

When asked if there were other problems she encounters in her day-to-day life, she said that it is difficult to keep track of the various permutations of an experimental protocol that exist, and wished for a centralized way of knowing what people have tried in the past and what worked. For instance, she is currently working on a protocol for chromatin fractionation and is trying to gather as much information about it as she can. When we asked if we could see how she went about gathering information about a protocol, she revealed a large stack of papers and spent considerable time navigating it to find her desired paper, which was the experimental protocol published by the Young Lab. She then pulled up another set of papers which had the protocol that she had obtained from talking to the Young Lab directly. The latter protocol had several handwritten annotations and was messy to read, but also contained substantially more detail than the official published protocol. Photographs of the two papers side by side are included below:


V. also mentioned that even within the lab, variations of a protocol (e.g., for chromatin immunoprecipitation) exist for different antibodies and conditions, and the only way to gather all the necessary information is to talk to people individually, which is inefficient. Other lab members who were present seconded this opinion. A quick look at the lab notebooks revealed that lab-bench biologists copy protocols into their notebooks, even though the protocols are available on the lab’s online wiki, in order to make annotations like the one shown below (“JW” is another member of the lab):


Furthermore, in optimizing an experimental procedure, lab-bench biologists also keep track of their previous attempts and their mistakes, so that they know what worked and what did not. Very often lab-bench biologists will commit an error in a step and will proceed with the protocol anyway, as it is a learning experience in how sensitive individual steps in the protocol are. An example of such an annotation is shown below:

As the years progress, however, lab notebooks can become bulky, and the task of flipping through them to identify the desired iteration of an experiment can be tedious. Some lab-bench biologists use electronic lab notebooks, but they are still not optimized for tracking the various iterations of an experiment.

Lessons learned:
  • Lab-bench biologists like V. are comfortable using GUI workflows to perform computational tasks (e.g., the Galaxy web server).
  • If there is no available GUI, lab-bench biologists like V. are dependent on a computational biologist for performing the analysis. However, they must work closely with the programmer to ensure that there is full understanding of the question under investigation. Communication is paramount.
  • Lab-bench biologists like V. do not personally run scripts that a programmer has written, even if the programmer has a readily useable script that could perform the desired analysis.
  • Novel experimental protocols are not set in stone; the level of detail that is published in a journal is much less than the level of detail that one could obtain from talking to the lab-bench biologists who developed the protocol.
  • Experimental protocols also vary depending on the conditions and reagents used (e.g., chromatin immunoprecipitation differs between antibodies).
  • The “little details” are often critical to understanding how to get your experiment to work. It is important to gather as much information as possible directly from the people who have done the experiment in the past.
  • Negative results are important too; if an experiment fails, it is valuable to remember why it failed.
  • There is currently no efficient way to share the ‘little details’ about an experimental procedure between lab-bench biologists.
2. S.

S. is a graduate student in the Biology department at MIT who needs to do biological data processing for her research. Occasionally she needs to run large computational jobs on the lab’s cluster. She often writes down the commands and parameters for a program she needs to run in Evernote (a free note-taking application) so that she can copy and paste them later. However, she often forgets where she has put her notes, and she finds it difficult to locate the input files and scripts on the server when she needs to repeat an analysis. When she cannot locate her notes on the required commands, she looks at the program’s command-line help, but she does not find it very helpful: it contains a lot of text, and she is only interested in how to run the program. She feels that data analysis would be easier if she could just fill in some boxes with the parameters and select the command to run. She would also like to see examples, caveats, and a history of her previous selections.

An example screenshot of the command-line help written by another programmer in the lab is shown below (S. acknowledges that the information provided is sufficient to figure out how to run the script, but it is not easy to understand):

Lessons learned
  • Uncomfortable in using the commands and parameters to run a script, often forgets them
  • Often forgets the location of her notes on the scripts she has run
  • Finds it difficult to locate the scripts to run and the input files on the server's file system
  • Feels comfortable in filling in forms and selecting commands
  • Looks up the command line help file to see how to run a script, but finds it hard to understand
  • Interested only in the format of the commands and the meanings of the parameters
  • Would like to see some examples and possible caveats
  • Wants to see the history of commands she has run in the past
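
The "fill in some boxes" idea that S. describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design from the interviews: the classes `Param` and `ScriptForm`, the script name `count_reads.py`, and its flags are all hypothetical. Each parameter carries a short explanation and an example (recognition instead of recall), and every command that is built is appended to a history list.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class Param:
    """One box on the form (all names here are hypothetical)."""
    flag: str      # command-line flag, e.g. "--input"
    help: str      # plain-language meaning shown next to the box
    example: str   # sample value shown as a hint


@dataclass
class ScriptForm:
    """A script plus the parameters a user may fill in."""
    script: str
    params: list[Param]
    history: list[str] = field(default_factory=list)

    def build_command(self, values: dict[str, str]) -> str:
        """Turn filled-in boxes into a shell-safe command and record it."""
        known = {p.flag for p in self.params}
        unknown = set(values) - known
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        parts = [self.script]
        for p in self.params:              # keep the declared order
            if p.flag in values:
                parts += [p.flag, values[p.flag]]
        command = shlex.join(parts)        # quotes values containing spaces
        self.history.append(command)
        return command


form = ScriptForm(
    "count_reads.py",                      # hypothetical script
    [
        Param("--input", "file of aligned reads", "sample.bam"),
        Param("--min-quality", "discard reads below this score", "30"),
    ],
)
command = form.build_command({"--input": "my data.bam", "--min-quality": "30"})
print(command)
print(form.history)
```

Quoting the values with `shlex.join` means that a file name containing a space still produces a command that can be pasted into a terminal unchanged.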
3. X.

X. is currently primarily a computational biologist, but has a background in lab-bench biology. He has performed experiments before and is familiar with the needs of lab-bench biologists; he rarely performs experiments at present, but expects to do so again in the future.

X.’s work involves writing scripts which analyze biological data. However, it is often the case that X.’s code must be run by lab-bench biologists who want to vary certain parameters within the script to fit their needs. These lab-bench biologists often have little, if any, knowledge of scripting and command-line utilities. To mitigate this issue, X. often finds that he must explain the use of his scripts to the lab-bench biologists, which can often be difficult given the knowledge gap. More frequently, though, it is the case that X. or other computational biologists within a lab are called on to run the scripts themselves with parameters modified to fit a particular lab-bench biologist’s request. This is time-consuming for all involved and distracts X. from performing other work, such as writing new scripts for different tasks.

X. said that he would like to have some mechanism to present the functionality of a script he has written to lab-bench biologists in a way where he would not need to be consulted, especially at a time when he may have forgotten how to use that particular script. X. also said that right now each set of varied parameters for any particular script must be analyzed using another, separate script which invokes his original script. These jobs which all use slightly varied parameters are tedious to schedule through the command line. To this end, X. said that he could greatly benefit from an interface to schedule batch jobs where he could specify sets of parameters to vary with each different job.
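
As a rough illustration of the batch-job problem X. describes, the sketch below expands a set of parameter values into one command line per combination. It is a minimal example under assumptions: the function `sweep_commands`, the script name `call_peaks.py`, and its flags are hypothetical, and the commands are only printed, since how they would be submitted depends on the lab's cluster scheduler.

```python
import itertools
import shlex


def sweep_commands(script, grid):
    """Return one command line per combination of parameter values.

    `grid` maps each flag to the list of values to try for it.
    """
    flags = list(grid)
    commands = []
    for combo in itertools.product(*(grid[flag] for flag in flags)):
        args = [script]
        for flag, value in zip(flags, combo):
            args += [flag, str(value)]
        commands.append(shlex.join(args))
    return commands


jobs = sweep_commands(
    "call_peaks.py",                                   # hypothetical script
    {"--window": [100, 200], "--pvalue": [0.01, 0.05]},
)
for job in jobs:
    print(job)
```

Two values for each of two parameters yield four jobs; an interface built on the same idea would let the user pick the values in a form instead of writing a wrapper script for each sweep.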

Finally, X. agreed with V.’s assessment of the difficulty of consolidating information about experimental protocols. X. said that the current method of combining handwritten notes is grossly inefficient and error-prone. Like V., X. would like an interface that provides an easy way to take shareable electronic notes about particular experimental protocols, with step-by-step instructions and comments from different team members about their experiences implementing each protocol.

Lessons Learned
  • Spends too much time supporting lab-bench biologists who wish to use the scripts he has already developed instead of spending time writing new scripts
  • Not only is it difficult to explain the use of his scripts to lab-bench biologists, but he himself is often called on to run the scripts for the lab-bench biologists
  • Writing scripts to run batch jobs that each use slight variants of a parameter is tedious; X. would appreciate a user-friendly interface to schedule these batch jobs
  • A way to share notes on experimental protocols is sorely needed
4. J.

 
J. is a postdoc in a biology lab at MIT. We spoke with him about issues he has with finding a usable protocol for experiments in the lab, as well as issues with communicating with his collaborators in computational fields.

When trying to find a protocol for a particular experiment, he finds that published papers do not usually include all the details required for running a particular experiment under certain conditions. For example, Nature Protocols has some standardized protocols available, but with none of the details that lab-bench biologists typically require to actually run the experiment under specific conditions. He said that he often has to spend a lot of time asking advice from many different people in his lab or other labs who have experience with the particular experiment or conditions he is investigating. He thinks it would be helpful if there were one centralized version of each important protocol for the whole lab to avoid this process, but he noted that most labs would not want to share this protocol with other labs except for their collaborators until they published a paper about it.

J. also spoke about issues he has with collaborating with computational biologists. One common scenario is that he wants to perform some analysis on his data, so he needs the collaborator to write some computer code to run the analysis. Often it is more efficient for the collaborator to give the completed code to J. than for J. to make many requests for different datasets and different parameters, but then J. has to figure out how to run the code on the command line. Although the collaborator might give good instructions and documentation, J. is not used to using a Unix terminal and finds it intimidating and confusing.

We were interested in J.’s difficulty with using a command-line interface, so we asked him to run some code for us, such as Cufflinks, a tool commonly used for RNA transcript assembly. The first issue he ran into was that he couldn’t remember his password to ssh to the server, since he rarely needs to perform this task. Once he finally connected, he said that he didn’t have time to show us a realistic example, since before running any code (such as Cufflinks) he would need to spend a lot of time reading about the program. Instead, we asked him to perform some simple tasks, such as finding the number of lines in a file using the command line. He had the following difficulties:

  • He didn’t know off the top of his head how to find the number of lines in a file, so he googled for “unix line numbers” and found some entries about the command “nl”
  • Once he realized that nl did not do what he wanted, he searched more carefully and found the command “wc -l”, but misread the l as a 1.
  • Eventually he tried just running “wc”, which outputs three unlabeled numbers. He guessed that the smallest of the three numbers was the number of lines, but was not sure.
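
For reference, `wc -l` (a lowercase L) prints only the line count, and plain `wc` prints the line, word, and byte counts in that order, without labels. The short Python sketch below shows the same three counts with labels attached, which is the kind of output J. was missing. The function name `count_file` and the file name `reads.txt` are hypothetical, and the word count is a simple whitespace split that may differ from `wc` in unusual locales.

```python
import os
import tempfile


def count_file(path):
    """Return the line, word, and byte counts of a file, with labels."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "lines": data.count(b"\n"),   # like wc, count newline characters
        "words": len(data.split()),   # whitespace-separated tokens
        "bytes": len(data),
    }


# Demonstrate on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.txt")   # hypothetical file name
    with open(path, "wb") as handle:
        handle.write(b"chr1 100 200\nchr2 300 400\n")
    counts = count_file(path)

for label, value in counts.items():
    print(f"{label}: {value}")
```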
Lessons learned
  • Feels that published protocols do not contain all the necessary details
  • Would like a way to easily share protocols and related notes within the lab as well as with collaborators
  • Finds common tasks on a Unix terminal very difficult due to a lack of learnability of commands and command-line options
  • Typically resorts to outside sources such as Google for documentation since it is easier to find usage examples
  • Flags such as “-l” are easily misread as “-1”, so programmers should avoid this kind of ambiguity where they can.

User Classes

Based on our interviews, we identified the following two user classes:

Lab-bench Biologists
  • Most of their day-to-day work involves performing experiments in a wet lab
  • They often need computational support for analyzing their experimental results
  • Generally not comfortable with using a command-line interface
  • Most of their detailed notes are kept in a lab notebook, not on the computer
Computational Biologists
  • They are typically computer scientists by training
  • Typically have a basic knowledge of biology, but often need help from lab-bench biologists to interpret biological significance of results
  • Their work often involves writing scripts for analyzing biological data, but they do not have the resources to take the time to create user-friendly programs

Needs and Goals

Lab-bench Biologists

Need: Lab-bench biologists need to run scripts written by computational biologists and remember the commands and parameters.
Goal: An easier way to run the scripts without having to remember the commands and parameters (turning the recall task into a recognition task).

Need: Lab-bench biologists need to locate the script they wish to run, and need to identify which version of a script to run if several versions exist.
Goal: A better way to find the appropriate script to run, and a confusion-free way to distinguish which particular version of a script to run.

Need: Lab-bench biologists need to work closely and communicate results efficiently with computational biologists to ensure the computational biologists fully understand the biological questions under investigation.
Goal: A way for lab-bench biologists to share the results of running a script with the computational biologists who created it, along with comments, clarifications or further requests.

Need: Lab-bench biologists need the option of making personalized notes on what certain inputs mean and on any caveats to running the script. Their notes currently often get lost because they are not directly associated with the scripts they describe.
Goal: A way for lab-bench biologists to make notes about the scripts they are running, and a way for them to find these notes easily.

Need: Lab-bench biologists need a way to keep track of previous commands they have submitted.
Goal: Some form of command history associated with prior uses of a particular script.

Computational Biologists

Need: Computational biologists need to help lab-bench biologists analyze their data without spending too much time running scripts themselves or explaining how to use their scripts.
Goal: Create a user-friendly interface for each of their scripts which lab-bench biologists can use to analyze data themselves, with minimal help.

Need: Computational biologists need a way to share scripts and updates to scripts.
Goal: Create a tool which will allow computational biologists to easily share scripts and any subsequent updates so that lab-bench biologists can easily find the right version to use.

Comments

  • Comment on wordiness: We acknowledge that our interview section is long, but this is only because we had a lot of content to cover. For convenience, we have provided a summary “Lessons learned” section at the end of every interview.
  • Comment on ‘stretch’: Only one member of the team has formerly worked in close collaboration with lab-bench biologists as a computational biologist. Furthermore, we have framed the majority of our needs and goals from the perspective of lab-bench biologists. Thus, we feel that this problem is an appropriate ‘stretch’ for our team.

1 Comment

  1. Unknown User (jks@mit.edu)

    Overall: Good job with your GR1 write-up, reflects your presentation (also well done). Somewhat worried about stretch - even if only one of you was a computational biologist, that personal experience changes the group's starting knowledge of the problem. However, your interviews seem to highlight a number of concrete tasks from the lab-bench biologist's perspective (communicating requirements to comp. biologists, command line recall, personalized notes). If you focus on these tasks and the design of a highly usable UI for lab-bench biologists, this would constitute as enough stretch. Going forward, prioritize this higher - try to learn more about the lab-bench biologist's workflow: where does their data come from and live, is it script-ready or do they have to pre-process it, do they face problems when trying to communicate requirements or updates to comp. biologists? Build for these kinds of tasks.

    • Problem Statement: Great problem statement. 
    • User Analysis: Good role-based division of user classes. 
    • Needs/Goals Analysis: You've misinterpreted what a goal is. The goal is a user's objective, a high-level task they want or have to complete pertaining to the problem. The goals in GR1 are not goals for you as designers, or goals for the system to achieve. That said, your needs analysis gets at what we meant by needs/goals, and your 'goals' are fairly system-agnostic, so it's all good.
    • Interviews/Observation: Great interviews with clear high-level themes across both lab-bench and computational biologists