<h2 id="introduction">Introduction</h2>
<p>The <strong><em>NCBISequenceReader</em></strong> class, part of the biojava-core project,
retrieves data from the <a href="http://www.ncbi.nlm.nih.gov/">NCBI</a> website
using <a href="http://eutils.ncbi.nlm.nih.gov/">eutils</a> via an HTTP GET
request. The SOAP interface was not pursued, on the advice of other
contributors to this project.</p>
<h2 id="design-overview">Design Overview</h2>
<p>The NCBISequenceReader class implements the ProxySequenceReader
interface, which in turn extends the SequenceReader and Sequence
interfaces.</p>
<p><img src="Ncbisequencereader.png" alt="" title="Ncbisequencereader.png" /></p>
<p>The NCBIHelper class performs most of the heavy lifting: building the
URL, connecting to the NCBI database, and parsing the returned
data.</p>
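<p>As a rough illustration of the URL-building and parsing jobs described
above, the sketch below assembles an eutils request URL and extracts the
residues from a FASTA-style response. The class name
<code>EutilsSketch</code>, the <code>efetch.fcgi</code> endpoint with its
query parameters, and the assumption that the response is FASTA text are
illustrative choices, not details taken from the NCBIHelper source.</p>
<pre><code class="language-java">import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EutilsSketch {
    // Assumed endpoint; the actual NCBIHelper may use a different URL or parameters.
    static final String EFETCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Build the GET URL for one nucleotide ID, escaping the ID for the query string.
    static String buildUrl(String id) {
        try {
            String query = String.join("&",
                    "db=nucleotide",
                    "rettype=fasta",
                    "retmode=text",
                    "id=" + URLEncoder.encode(id, "UTF-8"));
            return EFETCH + "?" + query;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Skip the '>' header line and join the remaining lines into one residue string.
    static String parseFasta(String body) {
        StringBuilder residues = new StringBuilder();
        for (String line : body.split("\\r?\\n")) {
            if (!line.startsWith(">")) {
                residues.append(line.trim());
            }
        }
        return residues.toString();
    }

    public static void main(String[] args) {
        // EXAMPLE_ID is a placeholder, not a real accession.
        System.out.println(buildUrl("EXAMPLE_ID"));
        System.out.println(parseFasta(">EXAMPLE_ID hypothetical record\nACGT\nTTGA\n"));
    }
}
</code></pre>
<p>Keeping URL construction and parsing as separate, network-free methods
would also let both be unit tested without contacting NCBI.</p>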
<h3 id="ncbisequencereader">NCBISequenceReader</h3>
<p>This class stores only the information needed to
fulfil the interface contract and to behave in line with the other classes
that implement the ProxySequenceReader interface. Connecting to
the NCBI resource and interpreting its response are delegated
to the NCBIHelper class, which is a dependency of this class.</p>
<p>The constructor takes a single string argument: the ID of the
nucleotide sequence to be fetched.</p>
<p>All sequence data is loaded lazily, at the behest of the
caller, with some consideration given to the minimum length a
retrieved sequence can be.</p>
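<p>The lazy-loading behaviour can be sketched as follows. This is a minimal
illustration, not the actual implementation: the class
<code>LazySequenceSketch</code> and the <code>Loader</code> callback are
hypothetical stand-ins for NCBISequenceReader and the fetch performed by
NCBIHelper.</p>
<pre><code class="language-java">class LazySequenceSketch {
    // Hypothetical callback standing in for the remote fetch done by NCBIHelper.
    interface Loader {
        String load(String id);
    }

    private final String id;
    private final Loader loader;
    private String cached; // stays null until the first request

    LazySequenceSketch(String id, Loader loader) {
        // Constructing the reader stores the ID only; nothing is fetched here.
        this.id = id;
        this.loader = loader;
    }

    // Fetch on first use only; synchronized so concurrent callers trigger one load.
    synchronized String getSequenceAsString() {
        if (cached == null) {
            cached = loader.load(id);
        }
        return cached;
    }

    public static void main(String[] args) {
        final int[] loads = {0};
        LazySequenceSketch reader = new LazySequenceSketch("EXAMPLE_ID", requestedId -> {
            loads[0]++;
            return "ACGT"; // simulated response
        });
        reader.getSequenceAsString();
        reader.getSequenceAsString();
        System.out.println("loads performed: " + loads[0]);
    }
}
</code></pre>
<p>With this arrangement, constructing a reader for an ID that is never
queried costs no network round trip.</p>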
<h3 id="ncbihelper">NCBIHelper</h3>
<p>Ideally, the NCBIHelper class would be a more generic class that
could read from any configured URL resource, but since I am only aware of
the one source at present, a hard-coded solution has been provided for
the time being.</p>
<p>Since the NCBIHelper connects to a remote site using the
HttpConnection Java classes, several exceptions can be
thrown. There are also potential failures even when a
connection succeeds (a 404 or 403 response, an invalid sequence, and so on). These
are currently handled internally by the class, with some information logged,
but the caller is given no useful indication of what has
happened. Is the problem fatal? Is it a timeout?</p>
<p>In order to return something meaningful in the case of an error, a new
exception class, SequenceException, has been added to the code I have
developed so far. In the case of a fatal problem, the underlying exception
is caught, wrapped in a SequenceException and re-thrown
so that the caller can catch it and, if necessary, inspect it.</p>
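<p>A minimal sketch of this catch, wrap and re-throw pattern is shown below.
The shape of <code>SequenceException</code> here is assumed, and the
<code>isTimeout()</code> helper, the <code>WrappingSketch</code> class and the
<code>Fetch</code> callback are hypothetical names added only for
illustration.</p>
<pre><code class="language-java">import java.io.IOException;
import java.net.SocketTimeoutException;

// Assumed shape of SequenceException; the real class may differ.
class SequenceException extends Exception {
    SequenceException(String message, Throwable cause) {
        super(message, cause);
    }

    // Hypothetical helper: answers "is it a timeout?" by inspecting the wrapped cause.
    boolean isTimeout() {
        return getCause() instanceof SocketTimeoutException;
    }
}

class WrappingSketch {
    // Hypothetical stand-in for a network fetch that may fail.
    interface Fetch {
        String run() throws IOException;
    }

    // Catch the low-level failure, wrap it, and re-throw it to the caller.
    static String fetchOrWrap(String id, Fetch fetch) throws SequenceException {
        try {
            return fetch.run();
        } catch (IOException e) {
            throw new SequenceException("Could not load sequence " + id, e);
        }
    }

    public static void main(String[] args) {
        try {
            fetchOrWrap("EXAMPLE_ID", () -> {
                throw new SocketTimeoutException("simulated timeout");
            });
        } catch (SequenceException e) {
            // The caller sees one exception type and can still inspect the cause.
            System.out.println(e.getMessage() + " (timeout: " + e.isTimeout() + ")");
        }
    }
}
</code></pre>
<p>Preserving the original exception as the cause keeps the full stack
trace available for logging while giving callers a single type to
handle.</p>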
<h2 id="design-issues">Design Issues</h2>
<p>The following issues have come up during the implementation and need to
be resolved.</p>
<h3 id="exceptions">Exceptions</h3>
<p>The class inherits the methods <strong>getSequence()</strong> and
<strong>getSequenceAsString(Integer start, Integer end, Strand strand)</strong> from
<strong><em>AbstractSequence</em></strong>. Since these methods need to establish a
connection to the NCBI website, there will be times when they fail: the
connection may fail, an invalid nucleotide identifier may have been passed to
the constructor, or any of a host of other issues may arise.</p>
<p>Ideally, the methods should throw a <strong><em>SequenceException</em></strong> which will
be a generic exception, wrapping any exceptions generated by the
implementing classes.</p>
<h3 id="logging">Logging</h3>
<p>Using log4j would be very useful.</p>
<h2 id="testing">Testing</h2>
<p>The table below shows a summary of the unit tests created for the
NCBISequenceReader class.</p>
<table>
<thead>
<tr>
<th>Test Name (method)</th>
<th>Description</th>
<th>Seed data</th>
<th>Requires network (Y/N)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cell 1</td>
<td>Cell 2</td>
<td>Cell 3</td>
<td>Cell 4</td>
</tr>
<tr>
<td>Cell A</td>
<td>Cell B</td>
<td>Cell C</td>
<td> </td>
</tr>
</tbody>
</table>