Wednesday, December 29, 2010

Bitmapset for PL/pgSQL

PostgreSQL has a nice functions for operations over bitmapset. A bitmapset is set of positive integers. It should be smaller than array, and, what is important for me. it support very fast test if some value is in set or not. PostgreSQL internally use this functions, but there are not possibility to use it from SQL (PL/pgSQL). I wrote a wrapper for this functionality and moved it to PST collection - http://pgfoundry.org/frs/download.php/2909/pstcoll-10-12-29.tgz. so there is some preview:
pavel=# select pg_column_size(pst.bitmapset '{1,2,3,4}');
 pg_column_size 
----------------
              8
(1 row)

pavel=# select pg_column_size(array[1,2,3,4]);
 pg_column_size 
----------------
             40
(1 row)

pavel=# select pst.add_members('{}', 1, 2, 4,8);
 add_members 
-------------
 {1,2,4,8}
(1 row)

pavel=# select pst.is_member(pst.add_members('{}', 1, 2, 4,8), 8);
 is_member 
-----------
 t
(1 row)

pavel=# select pst.bitmapset_union('{1,2,3}','{6,2,9}');
 bitmapset_union 
-----------------
 {1,2,3,6,9}
(1 row)
Without bitmapsets we have to use a arrays. But bitmaps are better for storing some flags - it's more adequate tool.
pavel=# select count(*) from omega;
  count  
---------
 1010000
(1 row)

pavel=# select bitmapset_collect(a) from omega;
    bitmapset_collect     
--------------------------
 {0,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 558.667 ms

pavel=# select array_agg(distinct a) from omega;
        array_agg         
--------------------------
 {0,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 3859.567 ms
Using a bitmapset is about 7x faster.
pavel=# select del_members('{2,3,4,5,1}',2,3);
 del_members 
-------------
 {1,4,5}
(1 row)

pavel=# select bitmapset_is_subset('{1,2,3}','{1,3}');
 bitmapset_is_subset 
---------------------
 f
(1 row)

pavel=# select bitmapset_overlap('{1,2,3}','{1,3}');
 bitmapset_overlap 
-------------------
 t
(1 row)

pavel=# select bitmapset_difference('{1,2,3}','{1,3}');
 bitmapset_difference 
----------------------
 {2}
(1 row)
Probably this simple wrapper can be enhanced - some GiST or GIN index can be nice. The bitmapset is limited only on integers. There is not simple way to use it together with enums for example.

Friday, December 10, 2010

Iteration over record in PL/pgSQL

Hello

A iteration over record in PL/pgSQL is old well known problem. When we try to write a general triggers in PL/pgSQL we can find a break, because we are not able to iterate over record. There was a some workarounds, but these solutions are simply slow and complex. So I wrote a set of functions that can helps.

First functions is record_expand. This function is similar to unnest function, but related object is record:

 
postgres=# select * from pst.record_expand(row('10',null,'Ahoj', current_date)); 
 name |   value    |   typ    
------+------------+--------- 
 f1   | 10         | unknown 
 f2   |            | unknown 
 f3   | Ahoj       | unknown 
 f4   | 2010-12-10 | date 
(4 rows) 
Now isn't problem to write general trigger for detecting a changed colums:
CREATE TABLE foo(a int, b int, c text, d int); 
 
CREATE OR REPLACE FUNCTIO update_trg_func() 
RETURNS trigger as $$ 
DECLARE r record; 
BEGIN 
  FOR r IN SELECT n.name, o.value as oldval, n.value as newval 
              FROM pst.record_expand(new) n, 
                   pst.record_expand(old) o 
             WHERE n.name = o.name 
               AND n.value IS DISTINCT FROM o.value 
  LOOP  
    RAISE NOTICE '% % %', r.name, r.oldval, r.newval; 
END LOOP; 
RETURN NULL; 
END; 
$$ LANGUAGE plpgsql; 
 
CREATE TRIGGER update_trg 
  AFTER UPDATE ON foo 
  FOR EACH ROW EXECUTE PROCEDURE update_trg_func(); 
 
postgres=# UPDATE foo SET c = 'Pavel', a = 40; 
NOTICE:  a 30 40 
NOTICE:  c omega Pavel 
UPDATE 1 
Next function allows update of any dynamic record. This function is record_set_fields: It's a variadic functions - you can enter a fields and values to change:
postgres=# SELECT pst.record_set_fields(row(0,0,'',0)::foo, 'd', 100, 'c', 'Hello');
  record_set_fields
-------------------
  (0,0,Hello,100)
It can be used for dynamic initialization of wide tables - for some OLAP purposes:
  
 CREATE TABLE t(a0 int, a1 int, a2 int, a3 int, a4 int, a5 int); 
 
-- set all null fields ax on value -1000 
 
CREATE OR REPLACE FUNCTION insert_trg_func()
RETURNS trigger as $$
DECLARE name text;
BEGIN
   FOR name IN SELECT x.name 
                  FROM pst.record_expand(new) x 
                 WHERE x.value IS NULL AND x.name LIKE 'a%'
   LOOP  
     new = pst.record_set_fields(new, name, -1000);
   END LOOP; 
   RETURN new;
END; 
$$ LANGUAGE plpgsql;  

CREATE TRIGGER insert_trg 
   BEFORE INSERT ON t 
   FOR EACH ROW EXECUTE PROCEDURE insert_trg_func(); 
 
postgres=# INSERT INTO t(a3,a5) VALUES(100,100);
INSERT 0 1
postgres=# SELECT * FROM t;
  a0   |  a1   |  a2   | a3  |  a4   | a5  
-------+-------+-------+-----+-------+-----
 -1000 | -1000 | -1000 | 100 | -1000 | 100
(1 row)
Last function is record_get_field. Its returns a value of entered field.
CREATE OR REPLACE FUNCTION insert_trg_func() 
RETURNS TRIGGER AS $$ 
DECLARE name text; 
BEGIN
  FOR i IN 0..5 LOOP
    IF pst.record_get_field(new, 'a'||i) IS NULL THEN
      new := pst.record_set_fields(new, 'a'||i, -1000); 
    END IF; 
  END LOOP; r
  RETURN new; 
END; 
$$ LANGUAGE plpgsql; 
These package is available from pgfoundry http://pgfoundry.org/frs/shownotes.php?release_id=1749.