Entries : Category [ PostgreSQL ]
PostgreSQL and other databases stuff

31 January
2008

PostgreSQL saved my life (once more...)

or how things could be so easy...

Yes, once more, PostgreSQL saved my life. Let me show you how.

Imagine you have a database, there you have a table and inside that table you have a date field. You have populated the database and now you have more than 1000 records inside that table, something like:

FMJJ_test=# select id, apodo, nombre, fecha_nacimiento from beyle_autor where fecha_nacimiento ilike '%01/01%' order by fecha_nacimiento;
 id  | apodo |                         nombre                         | fecha_nacimiento
-----+-------+--------------------------------------------------------+------------------
 259 |       | XXXXXXXXXXXXXXXXXX                                     | 01/01/1602
 296 |       | XXXXXXXXXXXXXXXXXX                                     | 01/01/1807
 375 |       | XXXXXXXXXXXXXXXXXX                                     | 01/01/1838
 355 |       | XXXXXXXXXXXXXXXXXX                                     | 01/01/1849
 409 |       | XXXXXXXXXXXXXXXXXX                                     | 01/01/1849

Now imagine that, for some reason, that fecha_nacimiento field has to change into a varchar field and that all those date entries should keep only their year value (without month and day).

Probably the first thing you thought when reading the text above was:

"hey, that's pretty easy, just write a simple python/perl/php/whatever script that connects to the database, gets the records from the table, parses them, then puts them back into the table"

Well, you were right, and it would take only a small amount of time to write that script (provided you have the language, the PostgreSQL connector for that language and the knowledge to write it quickly), but there is an even quicker way to do it, using PostgreSQL functions directly!
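For comparison, the parsing step such a script would perform is simple enough on its own. A minimal Python sketch (using a sample value from the table above; the real script would of course also need a database connector):

```python
from datetime import datetime

# Parse a DD/MM/YYYY varchar value and keep only the year,
# which is what the script would do for each matching record.
raw = "01/01/1849"  # sample value from the table above
year = str(datetime.strptime(raw, "%d/%m/%Y").year)
print(year)  # 1849
```

The fetch/update plumbing around this is exactly the boilerplate the single SQL query below avoids.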

PostgreSQL has some nice built-in functions that allow us to do some formatting as well as play with date and datetime values.

So, let's do it, first use psql to connect to the database:

snowball:~ Wu$ psql test
Welcome to psql 8.1.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=#

NOTE: I'm using PostgreSQL 8.1.x in this example, but it will work with any other 8.x version too.

Now, we can use EXTRACT combined with to_date to do something like:

test=# update beyle_autor set fecha_nacimiento = extract(YEAR FROM to_date(fecha_nacimiento, 'DD MM YYYY'))  where fecha_nacimiento ilike '%01/01%';

There, we are telling PostgreSQL to update the beyle_autor table, setting fecha_nacimiento to the YEAR part of a date value. We need to use to_date because the value we get from the database is already a varchar (aka character varying) and we need to convert it to a date value before passing it to EXTRACT.

With that single query, all the records in the table that match the WHERE clause are truncated and saved as only the year value (in this example I needed to keep intact the complete dates that do not begin with 01/01, the first of January). Easy, isn't it?
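The same keep-or-collapse logic can be mirrored in Python as a quick sanity check. Note this is a simplification: like the ilike '%01/01%' filter, it matches the pattern anywhere in the value, not only at the beginning:

```python
def collapse_placeholder_date(fecha: str) -> str:
    """Collapse DD/MM/YYYY values containing 01/01 to just the year,
    leaving the other (complete) dates untouched."""
    if "01/01" in fecha:
        return fecha.split("/")[-1]
    return fecha

print(collapse_placeholder_date("01/01/1849"))  # 1849
print(collapse_placeholder_date("23/06/1912"))  # 23/06/1912
```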

Let's check that it has worked:

FMJJ_test=# select id, apodo, nombre, fecha_nacimiento from beyle_autor order by fecha_nacimiento;
 id  | apodo |                         nombre                         | fecha_nacimiento
-----+-------+--------------------------------------------------------+------------------
 259 |       | XXXXXXXXXXXXXXXXXX                                     | 1602
 296 |       | XXXXXXXXXXXXXXXXXX                                     | 1807
 375 |       | XXXXXXXXXXXXXXXXXX                                     | 1838
 355 |       | XXXXXXXXXXXXXXXXXX                                     | 1849
 409 |       | XXXXXXXXXXXXXXXXXX                                     | 1849

You could also play a little bit with the functions separately. For example, you can get a date value from a varchar field:

test=# select to_date(fecha_nacimiento, 'DD MM YYYY') from beyle_autor where fecha_nacimiento ilike '%01/01%';
  to_date
------------
 1951-01-01
 1956-01-01
 1907-01-01
 1958-01-01
 1927-01-01
 1965-01-01

or you can get the YEAR value in a single query:

test=# select extract(YEAR FROM to_date(fecha_nacimiento, 'DD MM YYYY')) from beyle_autor where fecha_nacimiento ilike '%01/01%';
 date_part
-----------
      1951
      1956
      1907
      1958
      1927
      1965
      1923

Of course, this is only one example of the wonderful things you can do with PostgreSQL's built-in functions. Sometimes a little digging can save you endless hours of not-so-useful scripting...

Posted by wu at 19:23 | Comments (0) | Trackbacks (0)
09 February
2009

PostgreSQL database migration, the SQL_ASCII to UTF8 problem

or how to get the proper encoding to fit your new database encoding...

Another reminder to myself, as I've faced this situation too many times (not so much lately, as I've switched definitively to UTF-8 on my database servers, but I still have some old ones that fit this post perfectly).

In order to migrate a database from one PostgreSQL server to another (even if there is a version mismatch, for example from the 7.x branch to the 8.x branch, or between 8.1 and 8.3), all you need is pg_dump. With pg_dump you can create an SQL dump of a whole database, which can later be imported into another database.

For example:

pg_dump openbsd_es > openbsd_es.sql

This way I've created a dump of a database called openbsd_es (used in the past for the now-defunct effort to translate the OpenBSD documentation into Spanish). The dump was called openbsd_es.sql and I could import it on another PostgreSQL server, into a database with whatever name I like.

pg_dump has a number of options that will help you create your database dump. For example, you can create a dump containing full INSERT statements using pg_dump this way:

pg_dump -D openbsd_es > openbsd_es.sql

(pretty useful if you need to load the sql file into MySQL, SQLite or any other database system)

Then, you only have to use psql to load the sql file back into another database:

psql openbsd_es_clon < openbsd_es.sql

At this point it is possible that you'll see an error on screen, something like:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf36e2020
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

AHA! The fscked UTF-8 error!!!

This is caused by a mismatch between your database encodings, most likely because the database from which you got the SQL dump was encoded as SQL_ASCII while the new one is encoded as UTF8.

You can get some information using psql with the -l option. On the old server:

psql -l

[ stripped ]

openbsd_es      | wu     | SQL_ASCII

[ stripped ]

On the new one:

psql -l

[ stripped ]

openbsd_es_clon | wu     | UTF8

[ stripped ]

Of course you can have databases with different encodings on the same PostgreSQL server, but it is not recommended (if your server's data directory was created with UTF8 as its default encoding, you should stick to that for all of your databases).

So, back to our sql file (openbsd_es.sql). We created it from a SQL_ASCII encoded database, which results in something like:

[Fenris] ~> file openbsd_es.sql
openbsd_es.sql: ISO-8859 English text, with very long lines
[Fenris] ~>

As our new database is UTF8 encoded, it would be nice if we could quickly change the encoding of the sql file, so we could import it without all those errors.

Here is where recode fits into this post. Recode is a small tool from the GNU project that lets you change the encoding of a given file on the fly. For example, to change our sql file into UTF-8, we could call recode like this:

cat openbsd_es.sql | recode iso-8859-1..u8 > openbsd_es_utf8.sql

And the new openbsd_es_utf8.sql file will be UTF-8 encoded:

[Fenris] ~> file openbsd_es_utf8.sql
openbsd_es_utf8.sql: UTF-8 Unicode English text, with very long lines
[Fenris] ~>

Just one more thing to do before importing the sql file into the new database: open the new UTF-8 encoded sql file with your favourite editor and change the line:

SET client_encoding = 'SQL_ASCII';

to:

SET client_encoding = 'UTF8';

as this setting tells psql which encoding to use before executing the SQL statements within the file.

Now, you can import the sql file as usual with (hopefully) no errors this time:

psql openbsd_es_clon < openbsd_es_utf8.sql
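If recode is not at hand, both steps (the re-encoding and the client_encoding fix) can be sketched in Python. Here the dump is a small in-memory sample; with a real dump you would read openbsd_es.sql and write openbsd_es_utf8.sql instead:

```python
# Sample ISO-8859-1 dump bytes; '\xf3' is a latin-1 'ó'.
dump = (b"SET client_encoding = 'SQL_ASCII';\n"
        b"INSERT INTO t VALUES ('traducci\xf3n');\n")

text = dump.decode("iso-8859-1")
# Fix the client_encoding line so psql reads the file as UTF-8.
text = text.replace("SET client_encoding = 'SQL_ASCII';",
                    "SET client_encoding = 'UTF8';")
utf8_dump = text.encode("utf-8")

print(utf8_dump.decode("utf-8"))
```

Decoding as iso-8859-1 never fails (every byte is valid), which is exactly why it is a safe intermediate step for SQL_ASCII dumps that are really latin-1 text.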

(Interesting: you can install recode from ports on both FreeBSD and OpenBSD, /usr/ports/converters/recode in both BSD flavours. On Linux it seems to be installed by default in some distributions; install it from packages if not.)

Posted by wu at 23:45 | Comments (6) | Trackbacks (0)
06 March
2009

Accessing PostgreSQL object information using SQL

... translating the usual psql shortcuts (\dt, \d table_name, etc)

Yesterday, one of my colleagues at work asked me an interesting question:

The answer was pretty easy:

That \d thing is a shortcut you can use, while in psql, to get information about any object located inside the db (for those of you more familiar with MySQL, it is like the describe table_name command). Let me show you an example of \d table_name:

pastesite=# \d paste_source
                                    Table "public.paste_source"
   Column    |           Type           |                         Modifiers
-------------+--------------------------+-----------------------------------------------------------
 id          | integer                  | not null default nextval('paste_source_id_seq'::regclass)
 lang_id     | integer                  | not null
 src         | text                     | not null
 name        | character varying(100)   | not null
 author      | character varying(150)   | not null
 pub_date    | timestamp with time zone | not null
 description | text                     | not null
Indexes:
    "paste_source_pkey" PRIMARY KEY, btree (id)
    "paste_source_lang_id" btree (lang_id)
Foreign-key constraints:
    "paste_source_lang_id_fkey" FOREIGN KEY (lang_id) REFERENCES paste_lang(id) DEFERRABLE INITIALLY DEFERRED

pastesite=#

But I was wrongly assuming that he needed to know such information for himself. What he actually needed was to get the number of fields a view has from within some python code (i.e. not using psql).

We searched a little bit and finally found out how to do it.

When you run psql, there is an option, -E, which means (from the psql man page):

-E

--echo-hidden
       Echo the actual queries generated by \d and other backslash com-
       mands.  You  can  use  this to study psql's internal operations.
       This is equivalent to  setting  the  variable  ECHO_HIDDEN  from
       within psql.

So, calling psql with the -E option means we will be able to see the SQL queries PostgreSQL uses behind the scenes to get such information; in the previous example, something like:

pastesite=# \d paste_source
********* QUERY **********
SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname ~ '^(paste_source)$'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
**************************

********* QUERY **********
SELECT relhasindex, relkind, relchecks, reltriggers, relhasrules,
relhasoids , reltablespace
FROM pg_catalog.pg_class WHERE oid = '168553'
**************************

********* QUERY **********
SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull, a.attnum
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '168553' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
**************************

********* QUERY **********
SELECT t.tgname, pg_catalog.pg_get_triggerdef(t.oid)
FROM pg_catalog.pg_trigger t
WHERE t.tgrelid = '168553' AND (not tgisconstraint  OR NOT EXISTS  (SELECT 1 FROM pg_catalog.pg_depend d    JOIN pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid = c.oid)    WHERE d.classid = t.tableoid AND d.objid = t.oid AND d.deptype = 'i' AND c.contype = 'f'))   ORDER BY 1
**************************

********* QUERY **********
SELECT conname,
  pg_catalog.pg_get_constraintdef(oid, true) as condef
FROM pg_catalog.pg_constraint r
WHERE r.conrelid = '168553' AND r.contype = 'f' ORDER BY 1
**************************

********* QUERY **********
SELECT c.relname FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i WHERE c.oid=i.inhparent AND i.inhrelid = '168553' ORDER BY inhseqno ASC
**************************

                                    Table "public.paste_source"
   Column    |           Type           |                         Modifiers
-------------+--------------------------+-----------------------------------------------------------
 id          | integer                  | not null default nextval('paste_source_id_seq'::regclass)
 lang_id     | integer                  | not null
 src         | text                     | not null
 name        | character varying(100)   | not null
 author      | character varying(150)   | not null
 pub_date    | timestamp with time zone | not null
 description | text                     | not null
Indexes:
    "paste_source_pkey" PRIMARY KEY, btree (id)
    "paste_source_lang_id" btree (lang_id)
Foreign-key constraints:
    "paste_source_lang_id_fkey" FOREIGN KEY (lang_id) REFERENCES paste_lang(id) DEFERRABLE INITIALLY DEFERRED

pastesite=#

It seems pretty complex, but with a little digging you can get to something like:

pastesite=# select attname, attnum from pg_attribute, pg_class where attrelid = pg_class.oid and relname ='paste_source';
   attname   | attnum
-------------+--------
 tableoid    |     -7
 cmax        |     -6
 xmax        |     -5
 cmin        |     -4
 xmin        |     -3
 ctid        |     -1
 id          |      1
 lang_id     |      2
 src         |      3
 name        |      4
 author      |      5
 pub_date    |      6
 description |      7
(13 rows)

pastesite=#

Which is pretty much what my colleague needed. He could parse the results of that query to gather the information he needed, or work a little bit more to get a more refined SQL query.
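For instance, since system columns always carry a negative attnum, the field count falls out directly, either by adding `and attnum > 0` to the query itself or by filtering the results in code. A small Python sketch, with the rows copied from the output above:

```python
# (attname, attnum) pairs as returned by the pg_attribute query above.
rows = [("tableoid", -7), ("cmax", -6), ("xmax", -5), ("cmin", -4),
        ("xmin", -3), ("ctid", -1), ("id", 1), ("lang_id", 2),
        ("src", 3), ("name", 4), ("author", 5), ("pub_date", 6),
        ("description", 7)]

# Keep only user-defined columns: attnum > 0.
user_columns = [name for name, attnum in rows if attnum > 0]
print(len(user_columns))  # 7
```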

NOTE: My colleague needed information about a view; I'm using a table as the example for this post, but the procedure is pretty similar for a view, a sequence or any other type of object available in our db.

Posted by wu at 09:31 | Comments (0) | Trackbacks (0)