absee 1.0

absee has undergone a lot of structural changes.

First, it only retrieves the information you want instead of returning all traces and called bases.
Second, it’s a class now, so you can hold onto multiple sets of sequencing data at once.
Third, it now has quality scores.

% irb
>> require 'absee'
=> true
>> my_variable = ABSee.new()
=> #<ABSee:0x000001008599d0>
>> my_variable.read("/Users/Jenny/Desktop/my_sequence.ab1")
=> nil
>> my_variable.get_calledSequence()

Class Methods

  • read(file_location)
    • returns nil
  • get_traceA()
    • returns an array with the trace data for adenine
  • get_traceG()
    • returns an array with the trace data for guanine
  • get_traceC()
    • returns an array with the trace data for cytosine
  • get_traceT()
    • returns an array with the trace data for thymine
  • get_calledSequence()
    • returns an array with the Basecalled sequence
  • get_qualityScores()
    • returns an array with the Basecalled quality scores
  • get_peakIndexes()
    • returns an array with indexes of the called sequence in the trace
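
As a rough sketch of how these accessors might be combined, the snippet below masks low-quality base calls. It assumes that get_calledSequence() returns an array of single-character strings and that get_qualityScores() returns an array of integers of the same length. The helper name mask_low_quality and the threshold of 20 are hypothetical, chosen only for illustration, and the sample arrays stand in for real data.

```ruby
# Hypothetical helper: replace base calls whose quality score is below
# a threshold with "N". It works on plain arrays, so it does not need absee.
def mask_low_quality(called, quals, threshold = 20)
  called.zip(quals).map { |base, q| q.nil? || q < threshold ? "N" : base }
end

# Stand-in data. With absee these would (by assumption) come from:
#   seq = ABSee.new()
#   seq.read("/path/to/my_sequence.ab1")
#   called = seq.get_calledSequence()
#   quals  = seq.get_qualityScores()
called = %w[G A T T A C A]
quals  = [45, 12, 50, 38, 9, 41, 60]

puts mask_low_quality(called, quals).join
```

Keeping the helper independent of the reader makes it easy to try on hand-written arrays before pointing it at a real .ab1 file.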

previous articles

easy as absee
absee Module

GATTACA Enzyme – Thought Experiment

Introduction

The name of GATTACA, a film about a genetically perfect dystopia, is built from the letters of the four nucleotides in DNA. Many hypothesize that the name was inspired by GATATC, the DNA recognition sequence of the restriction enzyme EcoRV. Since GATATC differs from GATTACA by quite a few base pairs, I became curious to see whether any restriction enzyme could use GATTACA as part of its recognition sequence.

Methods

A short search in NEB’s Enzyme Finder uncovered BsaBI.

BsaBI has the following recognition sequence:

G A T N N / N N A T C
C T A N N / N N T A G

It nicely envelops GATTACA with its non-specific bases:

G A T T A / C A A T C
C T A A T / G T T A G

For example, the following 60bp DNA strand, when digested by BsaBI, would yield a 24bp strand and a 36bp strand, both viewable on high-density gels.

5′- TCGTACTTCGGCTCTACCAGATTA/CAATGGCCATTGTAATCTGGTAGAGCCGAAGTACGA -3′

5′- TCGTACTTCGGCTCTACCAGATTA -3′

5′- CAATGGCCATTGTAATCTGGTAGAGCCGAAGTACGA -3′
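
The cut position can also be checked with a few lines of Ruby. This is a minimal sketch, not a full restriction-digest tool: the helper digest_bsabi is a hypothetical name, it looks at one strand only, and the short test strand is an illustrative sequence built around the GATTACAATC site shown earlier rather than real data.

```ruby
# BsaBI recognizes GATNNNNATC and cuts in the middle (GATNN/NNATC).
# N stands for any of A, C, G or T.
BSABI_SITE = /GAT[ACGT]{4}ATC/
CUT_OFFSET = 5 # number of bases between the start of the site and the cut

# Hypothetical helper: return the fragments of a single strand after digestion.
def digest_bsabi(seq)
  fragments = []
  fragment_start = 0
  search_from = 0
  while (m = BSABI_SITE.match(seq, search_from))
    cut = m.begin(0) + CUT_OFFSET
    fragments << seq[fragment_start...cut]
    fragment_start = cut
    search_from = m.begin(0) + 1 # keep scanning, allowing overlapping sites
  end
  fragments << seq[fragment_start..-1]
end

# Illustrative strand: GATTACAATC with a few flanking bases on each side.
strand = "CCAGATTACAATCGGA"
p digest_bsabi(strand)
```

A strand without a matching site comes back as a single, uncut fragment.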

absee Module

[updated again as a Ruby class]

absee has an update!

Version 0.1.0.0 now encapsulates the methods in a Ruby Module instead of exposing them as global functions.

Example usage:

% irb
>> require 'absee'
=> true
>> Absee.readAB("/Users/Jenny/Desktop/my_sequence.ab1")

It still returns six arrays (the trace values for ACGT, called sequence, and peak indexes).

More information can be found on my previous post.

Thanks go to Dan Cahoon for forking my absee on GitHub.

3D-Printed Enzyme – Proof of Concept

Introduction

I had theorized in my previous post that it was possible to easily 3D print an enzyme structure given a Protein Data Bank (PDB) file. Now, a month and many iterations later, I was finally able to print the restriction enzyme Fok1 [PDB ID:1fok], bound to a strand of DNA. Although I had not anticipated the problems of 3D-printing such a complex structure when I first started this expedition, I now have a very robust and simple way of printing these structures.

3D printed Fok1 in hand

Result

3D-printed Fok1

The printed model, Fok1 [PDB ID:1fok], is in Sculpteo’s white plastic, and is approximately 4.8cm x 4.0cm x 4.3cm. It has a slightly coarse and grainy texture, but still retains tiny details like the arrowheads on the model. One factor that surprised me is the model’s flexibility. Whereas my previous printed models were rigid, this printed protein strand has spring-like behavior in certain areas and can withstand some serious stretching.

Method Summary

My goal was to keep the PDB-to-3D-model methodology as simple as possible. That is, I wanted minimal manual adjustment of vertices in the PDB render. On top of that, I wanted to print a ribbon-style model, which is more challenging to print than a mesh-based model because of its thin components.

I started by rendering Fok1 [PDB ID:1fok] in Chimera. Chimera can conveniently export the model in STL format, perfect for 3D printing. Before exporting, I thickened the model to meet the minimum thickness requirements. Finally, I uploaded the STL to Sculpteo to print.

Sculpteo Render

Fok1 model on Sculpteo

Method Details

1. Import the PDB file into Chimera, either by fetching it from the PDB or by opening a custom PDB file.

2. Select the chains containing DNA or substrate and hide the atom and bond models. The atom and bond models typically are too thin to be printed, without more finessing.

3. Select the remaining ribbon model and adjust the ribbon model attributes in Tools>Depiction>Ribbon Style Editor. The ribbon model, as is, is too thin to be printed. All attributes need to be thickened. The following are the settings I used for my Fok1 model.

Ribbon Style Editor Settings

4. Export the scene to an STL and then upload the model to Sculpteo for printing. Sculpteo offers a beautiful, almost-real-time printability check of the model, for fast design feedback turnaround times.

Caveats

The major problem I encountered was having structures too thin to be printed. I initially tried to print with Shapeways. However, the first ~10 iterations didn’t even pass their manual checking stage. Printing a wiry protein structure is definitely pushing the boundaries of 3D printing capabilities. I eventually decided to switch to Sculpteo because of their significantly faster turnaround times. Sculpteo was awesome and definitely delivered.

3D-Printed Enzyme – A Hypothesis

[updated: 3D-Printed Enzyme - Proof of Concept]

Introduction

While animating a short for the 2011 MIT iGEM team, I came up with the idea to 3D print enzymes from the vast number of structure-characterized proteins in the RCSB Protein Data Bank (PDB). There is lots of slick software out there to render PDB files into gorgeous 3D models. Exporting those models to a 3D-printing-compatible format is only a few clicks away.

Methods

The simplest approach is to use UCSF Chimera to render a protein from the PDB. Chimera can export the protein into an STL file, which can be uploaded to Shapeways or other 3D printing vendors to print.

While Chimera renders ribbon diagrams very beautifully, it lacks more sophisticated mesh-based renderings and user customization. Molecular Maya can be a good alternative. It harnesses all the customization power of Maya, while easily importing PDB files. To go the Molecular Maya route, proteins can be exported into OBJ files to upload to Shapeways. Currently, Molecular Maya does not render ribbon diagrams or secondary structure.

EcoRV

EcoRV rendered with mMaya [PDB ID: 1RVA]

Gallery

 

Graphviz + SBOLv1.0

Update:

Graphviz now has SBOLv 1.0 support!

I implemented the Synthetic Biology Open Language Visual (SBOLv) symbols for Graphviz to easily generate SBOLv-compliant diagrams.

see my original post on using Graphviz to draw genetic circuits

Gene Expression Symbols:

digraph G {
rankdir=LR;

promoter -> operator [arrowhead=none];
operator -> cds [arrowhead=none];
cds -> utr [arrowhead=none];
utr -> terminator [arrowhead=none];

promoter [shape=promoter labelloc="b"];
operator [shape=square width=0.2 label=""];
cds [shape=cds];
utr [shape=utr labelloc="b"];
terminator [shape=terminator labelloc="b"];

}

digraph G {
rankdir=LR;

insulator -> ribosite [arrowhead=none];
ribosite -> rnastab [arrowhead=none];
rnastab -> proteasesite [arrowhead=none];
proteasesite -> proteinstab [arrowhead=none];

insulator [shape=insulator label=""];
ribosite [shape=ribosite label="ribonuclease site" labelloc="b"];
rnastab [shape=rnastab label="rna stability" labelloc="b"];
proteasesite [shape=proteasesite label="protease site" labelloc="b"];
proteinstab [shape=proteinstab label="protein stability" labelloc="b"];

}

DNA Construction Symbols:

digraph G {
rankdir=LR;

origin -> primersite [arrowhead=none];
primersite -> restrictionsite [arrowhead=none];
restrictionsite -> noverhang [arrowhead=none];
noverhang -> assembly [arrowhead=none];

origin [shape=circle width=0.2 label=""];
primersite [shape=primersite label="primer site" labelloc="b"];
restrictionsite [shape=restrictionsite label="restriction site" labelloc="b"];
noverhang [shape=noverhang label=""];
assembly [shape=assembly label=""];

}

digraph G {
rankdir = LR;

fivepoverhang -> signature [arrowhead=none];
signature -> threepoverhang [arrowhead=none];

fivepoverhang [shape=fivepoverhang label=""];
signature [shape=signature];
threepoverhang [shape=threepoverhang label=""];

}

Download:

Download from the official Graphviz site

All the development snapshots newer than 2.29.20120924 should have SBOLv compliant symbols, as well as the non-SBOLv circuit symbols (lpromoter, rarrow, etc.).

As before, I host an in-source build for Linux systems and a development build for Mac OS 10.6.8.

Linux in-source build (built from Graphviz v2.29, lacks non-SBOLv circuit symbols):
Download

Mac OS (Snow Leopard/10.6.8):
Download

Example Diagrams:


digraph a {
rankdir=LR;

subgraph cluster0 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
a -> b [arrowhead=none];
b -> c [arrowhead=none];
}

a [shape=promoter label=""];
b [shape=cds label="rtTA"];
c [shape=terminator label=""];

Dox -> rtTA;
b -> rtTA;

subgraph cluster1 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
d -> e [arrowhead=none];
e -> f [arrowhead=none];
}

d [shape=promoter label=""];
e [shape=cds label="Alfa"];
f [shape=terminator label=""];

rtTA -> d;
e -> Alfa;

subgraph cluster2 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
g -> h [arrowhead=none];
h -> i [arrowhead=none];
}

g [shape=promoter label=""];
h [shape=cds label="Alfa"];
i [shape=terminator label=""];

Alfa -> g;
h -> Alfa;

}

digraph G {
rankdir=LR;

subgraph cluster0 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
a -> b [arrowhead=none];
b -> c [arrowhead=none];
c -> d [arrowhead=none];
}

a [shape=promoter label=""];
b [shape=cds label="rtTA"];
c [shape=cds label="LacI"];
d [shape=terminator label=""];

Dox -> rtTA;
b -> rtTA;

subgraph cluster1 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
e -> f [arrowhead=none];
f -> g [arrowhead=none];
g -> h [arrowhead=none];
}

e [shape=promoter label=""];
f [shape=cds label="Charlie"];
g [shape=cds label="Alfa"];
h [shape=terminator label=""];

rtTA -> e;

c -> LacI;
IPTG -> LacI [arrowhead=tee];

subgraph cluster2 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
i -> j [arrowhead=none];
j -> k [arrowhead=none];
k -> l [arrowhead=none];
}

i [shape=promoter label=""];
j [shape=cds label="India"];
k [shape=cds label="Bravo"];
l [shape=terminator label=""];

LacI -> i [arrowhead=tee];

f -> Charlie;
g -> Alfa;
j -> India;
k -> Bravo;

subgraph cluster3 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
m -> n [arrowhead=none];
n -> o [arrowhead=none];
o -> p [arrowhead=none];
}

m [shape=promoter label=""];
n [shape=cds label="Charlie"];
o [shape=cds label="Alfa"];
p [shape=terminator label=""];

n -> Charlie;
o -> Alfa;
Alfa -> m;

subgraph cluster4 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
q -> r [arrowhead=none];
r -> s [arrowhead=none];
s -> t [arrowhead=none];
t -> u [arrowhead=none];
}

q [shape=promoter label=""];
r [shape=cds label="India"];
s [shape=cds label="Bravo"];
t [shape=cds label="EYFP"];
u [shape=terminator label=""];

r -> India;
s -> Bravo;
t -> EYFP;

subgraph cluster5 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
v -> w [arrowhead=none];
w -> x [arrowhead=none];
}

v [shape=promoter label=""];
w [shape=cds label="Hotel"];
x [shape=terminator label=""];

Charlie -> v [arrowhead=tee];
w -> Hotel;

subgraph cluster6 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
y -> z [arrowhead=none];
z -> aa [arrowhead=none];
}

y [shape=promoter label=""];
z [shape=cds label="Foxtrot"];
aa [shape=terminator label=""];

Hotel -> y [arrowhead=tee];
India -> y [arrowhead=tee];

z -> Foxtrot;

subgraph cluster7 {
color=gray;
style=filled;
node [style=filled fillcolor=white];
bb -> cc [arrowhead=none];
cc -> dd [arrowhead=none];
dd -> ee [arrowhead=none];
}

bb [shape=promoter label=""];
cc [shape=cds label="EBFP2"];
dd [shape=cds label="Foxtrot"];
ee [shape=terminator label=""];

Foxtrot -> bb;
dd -> Foxtrot;

cc -> EBFP2;

}

Caveats:

Color and labelling don’t work as nicely as they do for the non-SBOLv symbols.

Nodes whose shapes include a double-stranded DNA line, such as the promoter, must have labelloc set to “b”. This avoids the intersection of the label and the double-stranded DNA line, since labels are automatically placed in the center of the nodes.


Hef1a [shape=promoter];


Hef1a [shape=promoter labelloc="b"];

Also, for the aforementioned shapes, fillcolor has to be specified instead of color. Setting color colors the outline of the node shape, potentially causing inconsistent coloring of the double-stranded DNA line between connected nodes.


Hef1a [shape=promoter labelloc="b" style=filled colorscheme=greens3 color=3];


Hef1a [shape=promoter labelloc="b" style=filled colorscheme=greens3 fillcolor=3];
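
Both caveats can be folded into a small generator so they are not forgotten. The following is a sketch under stated assumptions: sbol_node is a hypothetical helper, and the list of shapes that need labelloc="b" is simply collected from the examples in this post, not taken from the Graphviz documentation.

```ruby
# Shapes that are given labelloc="b" in the examples above
# (an assumption of this sketch, not an official list).
DNA_LINE_SHAPES = %w[promoter utr terminator ribosite rnastab proteasesite
                     proteinstab primersite restrictionsite].freeze

# Hypothetical helper: build one DOT node statement.
def sbol_node(name, shape, fill: nil, colorscheme: nil)
  attrs = ["shape=#{shape}"]
  attrs << 'labelloc="b"' if DNA_LINE_SHAPES.include?(shape)
  if fill
    attrs << "style=filled"
    attrs << "colorscheme=#{colorscheme}" if colorscheme
    attrs << "fillcolor=#{fill}" # fillcolor rather than color
  end
  "#{name} [#{attrs.join(' ')}];"
end

puts sbol_node("Hef1a", "promoter", fill: 3, colorscheme: "greens3")
puts sbol_node("rtTA", "cds")
```

The first call reproduces the corrected Hef1a line above; the second shows that shapes outside the list are left without labelloc.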

Acknowledgements:

Thanks to Reshma Shetty and Jake Beal for the inspiration.

Graphviz

[updated: Graphviz + SBOLv1.0]

1. Introduction

Graphviz is a powerful open-source tool that visualizes graphs and networks.

By implementing a few meaningful shapes, I tapped its diagramming ability to draw genetic circuits. Genetic circuit diagrams can now be specified by a simple text file. Graphviz’s visualization algorithms handle all placements and alignments. Only the node names and edges need to be specified. Below is an example of a genetic circuit generated by Graphviz.

I was inspired to create this tool after tediously piecing together cookie-cutter shapes for the 2011 iGEM competition. For our iGEM project, a lot of sub-circuits were recycled, without a good way of recycling their corresponding diagrams. With my new node shapes in Graphviz, specifying circuit modules is as easy as a few short lines of text. With a little scripting, Graphviz lends itself to rapid genetic circuit diagram generation, and becomes far more efficient and powerful than copy-and-paste methods or GUI-based tools.
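
To give a flavor of what “a little scripting” might look like, here is a minimal Ruby sketch that turns an ordered list of parts into DOT text. The helper circuit_dot and its argument layout are hypothetical; only the shape names (rpromoter, rectangle) and the DOT syntax come from the examples in this post.

```ruby
# Hypothetical helper: turn an ordered list of [name, shape] parts into a
# DOT digraph in which neighbouring parts are joined by arrowless edges.
def circuit_dot(parts, graph_name = "g")
  lines = ["digraph #{graph_name} {", "rankdir=LR;", ""]
  parts.each_cons(2) do |(left, _left_shape), (right, _right_shape)|
    lines << "#{left} -> #{right} [arrowhead=none];"
  end
  parts.each { |name, shape| lines << "#{name} [shape=#{shape}];" }
  lines << "" << "}"
  lines.join("\n")
end

parts = [["UAS", "rpromoter"], ["NCAD", "rectangle"]]
puts circuit_dot(parts)
```

Saving the output to a file such as circuit.dot and running `dot -Tpng circuit.dot -o circuit.png` should then render the diagram, provided the installed Graphviz includes the custom shapes.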

An update of the project is available with SBOLv-compliant symbols.

2. Download

The custom genetic circuit shapes are lpromoter, larrow, rpromoter, and rarrow, each corresponding to the left and right promoter and left and right arrow shapes.

They have been incorporated into the official Graphviz node shapes.

Download Graphviz from the official site

As a backup, I also host two builds. The first is an in-source build, ready to use on Linux. It’s the stable release of Graphviz 2.28 with the added shapes. The second is a development build of Graphviz 2.29 installable on Mac OS. I’ve tested it on Mac OS 10.6.8, and it works.

Linux in-source build (modified Graphviz 2.28):
Download

MacOS/Snow Leopard build (Graphviz 2.29.20120828):
Download

3. Gallery

The following images are generated with the dot tool of Graphviz. Below the images are the contents of the text files used to generate those images.


digraph g {
rankdir=LR;

UAS -> NCAD [arrowhead=none];
UAS [shape=rpromoter];
NCAD [shape=rectangle];

}

digraph g {
rankdir=LR;

a -> b [arrowhead=none];
b -> c [arrowhead=none];
c -> d [arrowhead=none];
a [shape=rpromoter label="UAS"];
b [shape=rectangle label="LacI"];
c [shape=rpromoter label="UAS"];
d [shape=rectangle label="Reporter"];

}

digraph G {
rankdir=LR;

node [shape=rpromoter colorscheme=rdbu5 color=1 style=filled fontcolor=3]; Hef1a; TRE;
node [shape=rarrow colorscheme=rdbu5 color=5 style=filled fontcolor=3]; rtTA3; DeltamCherry;
product [shape=oval style=filled colorscheme=rdbu5 color=2 label=""];
node [shape=oval style=filled colorscheme=rdbu5 color=4 fontcolor=5];
combination [label="rtTA3 + Doxycycline"];
rtTA3protein [label="rtTA3"];

subgraph cluster_0 {
color=white;
Hef1a -> rtTA3 [arrowhead=none];
rtTA3 -> TRE [arrowhead=none];
TRE -> DeltamCherry [arrowhead=none];
}

rtTA3 -> rtTA3protein;
rtTA3protein -> combination;
Doxycycline -> combination;
combination -> TRE;
DeltamCherry -> product;

Hef1a [shape=rpromoter colorscheme=rdbu5 color=1 fontcolor=3 style=filled];
rtTA3 [shape=rarrow colorscheme=rdbu5 color=5 fontcolor=3 style=filled];
TRE [shape=rpromoter colorscheme=rdbu5 color=1 fontcolor=3 style=filled];
DeltamCherry [shape=rarrow colorscheme=rdbu5 color=5 fontcolor=3 style=filled label="Delta-mCherry"];

Doxycycline [style=filled colorscheme=rdbu5 color=4 fontcolor=5];
rtTA3protein [style=filled colorscheme=rdbu5 color=4 label="rtTA3" fontcolor=5];
combination [style=filled colorscheme=rdbu5 color=4 label="rtTA3 + Doxycycline" fontcolor=5];
product [style=filled colorscheme=rdbu5 color=2 label=""];
}

digraph g {
rankdir=LR;

node [shape=rpromoter colorscheme=rdbu5 color=1 style=filled fontcolor=3]; Hef1a; TRE; UAS; Hef1aLacOid;
Hef1aLacOid [label="Hef1a-LacOid"];
node [shape=rarrow colorscheme=rdbu5 color=5 style=filled fontcolor=3]; Gal4VP16; LacI; rtTA3; DeltamCherry;
Gal4VP16 [label="Gal4-VP16"];
product [shape=oval style=filled colorscheme=rdbu5 color=2 label=""];
repression [shape=oval label="LacI repression" fontcolor=black style=dotted];
node [shape=oval style=filled colorscheme=rdbu5 color=4 fontcolor=5];
combination [label="rtTA3 + Doxycycline"];
LacIprotein [label="LacI"];
rtTA3protein [label="rtTA3"];
Gal4VP16protein [label="Gal4-VP16"];

subgraph cluster_0 {
colorscheme=rdbu5;
color=3;
node [colorscheme=rdbu5 fontcolor=3];
Hef1a -> Gal4VP16 [arrowhead=none];
Gal4VP16 -> UAS [arrowhead=none];
UAS -> LacI [arrowhead=none];
LacI -> Hef1aLacOid [arrowhead=none];
Hef1aLacOid -> rtTA3 [arrowhead=none];
rtTA3 -> TRE [arrowhead=none];
TRE -> DeltamCherry [arrowhead=none]
}

Gal4VP16 -> Gal4VP16protein;
Gal4VP16protein -> UAS;
LacI -> LacIprotein;
LacIprotein -> repression;
repression -> Hef1aLacOid [arrowhead=tee];
IPTG -> repression [arrowhead=tee];
rtTA3 -> rtTA3protein;
rtTA3protein -> combination;
combination -> TRE;
Doxycycline -> combination;
DeltamCherry -> product;

}

4. Acknowledgment

Special thanks to my friends Robert McIntyre and Dylan Holmes for helping me to compile Graphviz.

easy as absee

1. Introduction

absee is a friendly ABIF reader in Ruby.

Three years ago, I desperately needed to analyze the trace values from DNA sequencing chromatograms (in the form of ABIF files). To my frustration, none of the available ABIF readers exported raw data. Even today, while lots of software is able to visualize ABIF files, very few programs allow for scripted inputs and custom manipulation of outputs. I wanted an ABIF reader that simply extracts the data and can be easily incorporated into other projects. Hence, I created absee.

absee is a Ruby gem. It has no GUI, no fluff. It simply reads the ABIF files and returns the values in six arrays: one array for each of the trace data for A, C, G, and T at discrete intervals, the called sequence, and an array of peak indexes corresponding to the called sequence.

% irb
>> require 'absee'
=> true
>> readAB("my_sequence.ab1")

With a simple Ruby script, it can be used to rapidly read and process many ABIF files and pipe the data on for further downstream processing. absee is a very nifty tool, one that I wish I had three years ago. The above code works for versions earlier than 0.1.0.0.

[update: new version as a Ruby Module]

2. Background

ABIF is a binary file format, usually with an .ab1 extension. It contains a trace value for each of A, C, G, and T at every sampled point over an interval. Most ABIF viewing software will interpolate between those points to display smooth, sinusoidal-looking curves.

ABIF files also contain estimated bases and peak indexes. DNA sequencing extracts a sequence from the trace data with a base-calling algorithm. The base-calling algorithm estimates where each peak in the trace data lies and determines a called base for that peak. If peaks from more than one trace overlap and their values are sufficiently close, the algorithm may use N to denote uncertainty about the base for that peak, and lower the quality score. The sequence of called bases is the estimated DNA sequence corresponding to a chromatogram.
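
To make the idea concrete, here is a deliberately simplified toy base caller in Ruby. It is not the algorithm used by sequencing software, and it skips peak detection entirely: it assumes the peak indexes are already known, picks the base with the highest trace value at each peak, and falls back to N when the runner-up is close. The name toy_base_call, the 0.8 ratio, and the sample traces are all made up for illustration.

```ruby
# Toy base caller: at each peak index, pick the base whose trace is highest.
# If the second-highest trace is within `ratio` of the winner, call "N".
def toy_base_call(traces, peak_indexes, ratio = 0.8)
  peak_indexes.map do |i|
    # Rank the four bases by their trace value at sample i, highest first.
    ranked = traces.map { |base, values| [values[i], base] }.sort.reverse
    (top_value, top_base), (second_value, _) = ranked
    second_value >= top_value * ratio ? "N" : top_base
  end
end

# Made-up trace values, one array per base, five samples each.
traces = {
  "A" => [5, 90, 10, 4, 60],
  "C" => [3, 8, 12, 80, 55],
  "G" => [70, 6, 9, 5, 7],
  "T" => [4, 5, 85, 6, 3]
}

p toy_base_call(traces, [0, 1, 2, 3, 4])
```

At the last sample the A and C traces are nearly equal, so the toy caller reports N there, mirroring the overlapping-peak case described above.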

3.  Details

Converting the ABIF binary files to readable values was no small feat. Even with the file format specification in hand, I still needed a little guidance. I found an open-source ABIF viewer years ago (now no longer available) and translated absee from its ABIF reader.

The primary method to call is readAB. It opens the ABIF file, checking the filetype and version. Major ABIF versions greater than 1 are not supported, due to possible different encodings. If the check fails, readAB will return six empty arrays.

readAB(filename)

  • parameters:
      filename: a string containing the filename (including the path and extensions)
  • returns:
      six arrays, which are trace data for A, C, G, T, called sequence, and peak indexes
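
As an illustration of how the six arrays relate to one another, the sketch below looks up the height of each called base’s own trace at its peak index. It rests on a few assumptions that should be checked against the documentation: that the return value can be destructured into six variables in the order listed above, and that the called sequence is an array of single-character strings. The helper peak_heights and the sample arrays are hypothetical stand-ins so the snippet runs without a real .ab1 file.

```ruby
# Hypothetical helper: for each called base, report the height of that
# base's own trace at the corresponding peak index.
def peak_heights(traces, called, peak_indexes)
  called.zip(peak_indexes).map do |base, i|
    trace = traces[base]
    [base, i, trace ? trace[i] : nil] # nil for N or other ambiguity codes
  end
end

# Stand-in data. With absee earlier than 0.1.0.0 the arrays would, by
# assumption, come from something like:
#   a, c, g, t, called, peaks = readAB("my_sequence.ab1")
a = [2, 40, 3, 1, 2, 1]
c = [1, 2, 2, 1, 35, 2]
g = [30, 3, 1, 2, 1, 1]
t = [1, 1, 2, 50, 3, 2]
called = %w[A T C]
peaks  = [1, 3, 4]

traces = { "A" => a, "C" => c, "G" => g, "T" => t }
peak_heights(traces, called, peaks).each do |base, index, height|
  puts "#{base} at sample #{index}: #{height}"
end
```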

There’s more documentation in absee’s yardoc / RDoc, as well as in the source code on GitHub.

4. Source Code

The source code for version 0.0.2.3 can be found at the absee GitHub repository.