biojava.github.io/_site/wikis/BioJava:CookBook:BioSQL:SetupPostGre.html at third · biojava/biojava.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
<h2 id="installing-and-using-biosql">Installing and using BioSQL</h2>

<p>by <a href="User:David" title="wikilink">David Huen</a>, Last modified: 18th June 2003.</p>

<p>This document describes how to install and use Biosql.
<a href="http://www.biojava.org/download/biosql/">BioSQL</a> is a part of the
<a href="http://obda.open-bio.org/">OBDA</a> standard and was developed as a common
sequence database schema for the different language projects within the
<a href="http://www.open-bio.org/">Open Bioinformatics Foundation</a>.</p>

<p>While BioSQL is fairly vendor-neutral in its design, this tutorial is
based on the case that I know best, that is, the installation of BioSQL
on an x86 machine running RedHat 7.2. Installing Postgresql</p>

<p>If not already installed, PostgreSQL can be installed from RPMs with:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>rpm -ivh postgresql-7.2.1-5.i386.rpm \
postgresql-libs-7.2.1-5.i386.rpm \
postgresql-server-7.2.1-5.i386.rpm
</code></pre>
</div>

<p>Root privileges will almost certainly be required (if not your machine
is seriously insecure!!!). You will also need a JDBC to permit Java to
connect to your PostgreSQL database and that can be installed with
postgresql-jdbc-7.1.3-2.i386.rpm. However, I would recommend downloading
the latest from here. You will end up with a jar file containing the
JDBC implementation which you will need to place in your CLASSPATH.</p>

<p>The installs will place a control script within /etc/init.d named
postgresql. When this script runs for the first time, it will create a
database cluster and initialise it. This cluster is the set of files
used by the database for storage purposes.</p>

<p>On RH7.2 the default location for the cluster in at /var/lib/pgsql/.
This is a bit of a disadvantage as /var is usually a pretty small
partition. It is possible at this stage to symlink /var/lib/pgsql to a
directory within another partition altogether to circumvent this
problem. I would suggest doing this immediately.</p>

<p>At this stage, you will need to create the database you intend using and
a user to use it. I would suggest NOT using the superuser named postgres
for anything other than occasional essential administration.</p>

<p>At this point, I will digress briefly into PostgreSQL authentication as
choices you make will affect what you can do. PostgreSQL has a variety
of routes to achieve this. The default at installation permits
connection only from local users and permits access to a database ONLY
by a user of the same username. This may be quite adequate for
experimentation but not so convenient if you want to set up a BioSQL
database for several local users or possibly even remote users.</p>

<p>PostgresQL has other mechanisms which are described in their
<a href="http://www.postgresql.org/idocs/index.php">documentation</a>.
Authentication is specifically described
<a href="http://www.postgresql.org/idocs/index.php?client-authentication.html">here</a>.
You might consider password authentication but do use md5 encryption
with this option, especially if you intend to authenticate remote users.
In the Redhat 7.2 installation, the file you will need to edit to set
these options is /var/lib/pgsql/data/pg_hba.conf. The location of this
file varies with other distributions.</p>

<p>As initially installed in RH7.2, PostgreSQL will require root privileges
to set up further. The postgres superuser cannot be logged into but you
can invoke the necessary commands from root to execute:</p>

<p><code class="highlighter-rouge">$ su postgres -c 'createdb </code><insert db="" name="" here="">`'`</insert></p>

<p>and a user created with:</p>

<p><code class="highlighter-rouge">$ su postgres -c 'createuser </code><insert user="" name="" here="">`'`</insert></p>

<p>For the purposes of this tutorial, I will not change the default
authentication so the database name should be chosen to correspond to
your user name. The user name used in this exercise is gadfly and this
will be reflected in the choice of database name and user name. One
additional change that will be necessary is to enable TCP/IP connections
as the Unix domain socket restriction of the default installation is
incompatible with the PostgreSQL JDBC implementation.</p>

<p>To do so, you need to add the “-i” flag to the startup script. Edit
/etc/init.d/postgresql and change the line:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster start  &gt; /dev/null 2&gt;&amp;1" &lt; /dev/null
</code></pre>
</div>

<p>to:</p>

<div class="highlighter-rouge"><pre class="highlight"><code>su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -o "-i" -D $PGDATA -p /usr/bin/postmaster start  &gt; /dev/null 2&gt;&amp;1" &lt; /dev/null
</code></pre>
</div>

<p>The /var/lib/pgsql/data/pg_hba.conf file will also need to be edited to
permit access via TCP/IP. This can be achieved by uncommenting:-</p>

<div class="highlighter-rouge"><pre class="highlight"><code>#host       all         127.0.0.1     255.255.255.255    trust
</code></pre>
</div>

<p>Both these operations require root access: seek advice as to the best
option given your local security circumstances.</p>

<p>One additional change is that postgresql in RH7.3 does not come with the
pgsql language enabled. As BioSQL uses that for acceleration, you will
need to enable it. This can be done within root with:-</p>

<div class="highlighter-rouge"><pre class="highlight"><code>su postgres -c 'createlang plpgsql template1'
</code></pre>
</div>

<h3 id="installing-biosql">Installing BioSQL</h3>

<p>The PostgreSQL server must be running to complete the BioSQL
installation. You can check that it is with:</p>

<p><code class="highlighter-rouge">$ /etc/rc.d/postgresql status</code></p>

<p>and doing:</p>

<p><code class="highlighter-rouge">$ /etc/rc.d/postgresql start</code></p>

<p>if it is not running. You may require root privileges for this. You
should have PostgreSQL started up during system startup with the SysV
init system that comes with most Unixen.</p>

<p>You will need three scripts that serve to initialise the new database
with the BioSQL schema and load accelerators for this schema. These
are:-</p>

<p><code class="highlighter-rouge">biosql-accelerators-pg.sql</code><br />
<code class="highlighter-rouge">biosqldb-assembly-pg.sql</code><br />
<code class="highlighter-rouge">biosqldb-pg.sql</code></p>

<p>They may be obtained from
<a href="http://www.biojava.org/download/biosql/">here</a>.</p>

<p>We now need to load the schema into the database we have created. We do
so as follows (user entries in bold):</p>

<div class="highlighter-rouge"><pre class="highlight"><code>$ psql gadfly
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

gadfly=&gt; \i biosqldb-pg.sql
CREATE
psql:biosqldb-pg.sql:13: NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'biodatabase_pkey' for table 'biodatabase'
CREATE
&lt;rest of output snipped&gt;
INSERT 16862 1
psql:biosqldb-pg.sql:304: NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'cache_corba_support_pkey' for table 'cache_corba_support'
CREATE
gadfly=&gt; \i biosqldb-assembly-pg.sql
&lt;rest of output snipped&gt;
gadfly=&gt; \i biosql-accelerators-pg.sql
&lt;rest of output snipped&gt;
gadfly=&gt; \q

$
</code></pre>
</div>

<p>Let’s walk through the session above. psql is the name of the PostgreSQL
interactive shell. We invoke it to connect to the PostgreSQL server and
accept commands for a database named gadfly that we had created earlier.
psql starts and displays its user prompt. All psql commands begin with a
backslash (\). The \i instructs psql to take input from a file. I
instruct psql to take input from the biosqldb-pg.sql,
biosqldb-assembly-pg.sql and biosql-accelerators-pg.sql successively.
psql reads the SQL statements within each of the files and proceeds to
construct the BioSQL database schema, printing out a summary of its
actions as it proceeds. Finally, I quit the psql interactive shell with
\q. At this point you have a BioSQL schema installed and ready to
run!!!</p>

<p>Do remember that if you do not explicitly load the JDBC drivers in your
code, you should set a Java environment variable to tell it what to look
for like so:-</p>

<p><code class="highlighter-rouge">java -Djdbc.drivers=org.postgresql.Driver </code><whatever your="" java="" code="" is="">` `</whatever></p>

<p><em>NOTE: If you are using the 1.3 version of Biojava with the Singapore
schema, do not install biosqldb-assembly-pg.sql or
biosql-accelerators-pg.sql as described above. All you will need is the
the new
<a href="http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/sql/?cvsroot=biosql">biosqldb-pg.sql</a>.
There appear to be performance issues in some cases when the other stuff
is installed also. This note will be updated eventually to reflect this
advice.</em></p>