quinta-feira, 10 de novembro de 2011

NoSQL: onde, como e por quê? Cassandra e MongoDB



Segue material da apresentação que fiz no dia 10/11 no Seminário de Gerenciamento de Dados em Software Livre no SERPRO.


Visão geral sobre bancos de dados NoSQL e detalhes técnicos dos modelos de dados e arquiteturas das implementações Apache Cassandra e MongoDB.


quarta-feira, 19 de outubro de 2011

A Linux trick on Oracle setup

Have you ever tried to install Oracle Database on GNU/Linux and faced the insufficient memory issue below?

Oracle Database 10g Express Edition requires 1024 MB of swap space. This system has ??? MB of swap space.

Well, I've just been caught by that when attempting to set it up on a virtual machine.

# dpkg -i oracle-xe-universal_10.2.0.1-1.1_i386.deb
(Reading database ... 125364 files and directories currently installed.)
Unpacking oracle-xe-universal (from oracle-xe-universal_10.2.0.1-1.1_i386.deb) ...
This system does not meet the minimum requirements for swap space.  Based on 
the amount of physical memory available on the system, Oracle Database 10g 
Express Edition requires 1024 MB of swap space. This system has 1019 MB 
of swap space.  Configure more swap space on the system and retry the installation.
dpkg: error processing oracle-xe-universal_10.2.0.1-1.1_i386.deb (--install):
 subprocess new pre-installation script returned error exit status 1
Errors were encountered while processing:
 oracle-xe-universal_10.2.0.1-1.1_i386.deb

Fortunately a true Operating System can handle it seamlessly! Here are the steps I followed.

Setup is claiming about just 5 megabytes... Well, let's give him it. But no more and no less! Let's create a zeroed dummy file with dd:

# dd if=/dev/zero of=5m.swp bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 0.0132488 s, 396 MB/s

Here you go, an exactly 5 MB size zeroed file:

# ls -la 5m.swp 
-rw-r--r-- 1 root root 5242880 2011-10-19 10:39 5m.swp

Let's check out the available memory in the system using free:

# free
             total       used       free     shared    buffers     cached
Mem:       1026080     921612     104468          0      27036     674936
-/+ buffers/cache:     219640     806440
Swap:      1046524       2552    1043972

Alright, you need it, we'll give you:

# swapon 5m.swp

Are you satisfied now?

# free
             total       used       free     shared    buffers     cached
Mem:       1026080     921736     104344          0      27044     674936
-/+ buffers/cache:     219756     806324
Swap:      1051640       2552    1049088

Let's retry running Oracle installer:

# dpkg -i oracle-xe-universal_10.2.0.1-1.1_i386.deb
(Reading database ... 125364 files and directories currently installed.)
Unpacking oracle-xe-universal (from oracle-xe-universal_10.2.0.1-1.1_i386.deb) ...

That's it! Although that ridiculous additional 5 MB swap won't be automatically available for the next reboot, we still can tune Oracle to allocate less memory than standard settings.

quinta-feira, 13 de outubro de 2011

Générer des fichiers checksum pour Maven

Dans un répertoire Maven v2 les fichiers JAR ont l'intégrité contrôlée par des fichiers contenant des sommes MD5 et SHA1. En général ces fichiers auxiliaires sont appelés comme le nom du fichier JAR accompagnés des extensions ".md5" ou ".sha1". Par exemple, pour la librairie "jcommon-0.9.6.jar" il doit exister "jcommon-0.9.6.jar.md5" et "jcommon-0.9.6.jar.sha1".

Ces ".md5" et ".sha1" ne sont que des fichiers texte avec une chaîne string correspondant à la somme de contrôle calculée à partir de l'archive. Cette somme représente une signature unique pour chaque fichier et peut garantir qu'il ne soit pas craqué ou corrompu.

Pour générer ces fichiers au moment de la livraison d'une librairie Java, ils existent des plugins Maven pour les créer automatiquement. Néanmoins, on peut avoir le cas où des librairies JAR n'aient pas ses fichiers de contrôle (par exemple, si on a créé cette partie du répo manuellement). Cette façon, lors d'une résolution de dépendance au Maven, on va avoir des notifications telles qu'au-dessous :

[WARNING] *** CHECKSUM FAILED - Checksum failed on download

Pour illustrer ce soucis, voici un exemple d'un arbre partiel au Maven :

|-- jfree-former
|   |-- jcommon
|   |   `-- 0.9.6
|   |       |-- jcommon-0.9.6.jar
|   |       `-- jcommon-0.9.6.pom
|   `-- jfreechart
|       `-- 0.9.21
|           |-- jfreechart-0.9.21.jar
|           `-- jfreechart-0.9.21.pom

Il faut d'abord créer un script Shell appelée checksum.sh avec le contenu ci-dessous :

#!/bin/bash

if [ $# -ne 2 ]
then
	echo "Usage : checksum [md5|sha1] <file-name>"
	echo "Sample: checksum sha1 /tmp/dir/myfile.jar"
	exit 1
fi

format=$1
file="$2"

if [ ! -f $file ]
then
	echo "File not found: $file"
	exit 2
fi

if [ "$format" == "md5" -o "$format" == "sha1" ]
then
	${format}sum "$file" | cut -d' ' -f1 | tr -d "\n" > "$file.$format"
else
	echo "Please choose a format: md5 or sha1"
	exit 2
fi

echo "Created checksum file $file.$format"

Ensuite, on doit créer un autre script nommé generate-checksums.sh contenant ces lignes :

#!/bin/bash

EXTENSIONS="jar pom"
ALGORITHMS="sha1 md5"

PROGDIR=`dirname $0`
export PATH="$PATH:$PROGDIR"

for ext in $EXTENSIONS
do
	for alg in $ALGORITHMS
	do
		find -type f -name "*.$ext" -exec checksum.sh $alg {} \;
	done
done

N'oubliez pas de donner des permissions d'exécution à ses fichiers .sh, en roulant le commande chmod -x *.sh.

Maintenant, il faut seulement aller au répertoire désiré et ensuite exécuter le script generate-checksums.sh. Voyez :

$ cd /var/maven/repo/

$ /home/user/scripts/generate-checksums.sh
Created checksum file ./jfree-former/jcommon/0.9.6/jcommon-0.9.6.jar.sha1
Created checksum file ./jfree-former/jfreechart/0.9.21/jfreechart-0.9.21.jar.sha1
Created checksum file ./jfree-former/jcommon/0.9.6/jcommon-0.9.6.jar.md5
Created checksum file ./jfree-former/jfreechart/0.9.21/jfreechart-0.9.21.jar.md5
Created checksum file ./jfree-former/jcommon/0.9.6/jcommon-0.9.6.pom.sha1
Created checksum file ./jfree-former/jfreechart/0.9.21/jfreechart-0.9.21.pom.sha1
Created checksum file ./jfree-former/jcommon/0.9.6/jcommon-0.9.6.pom.md5
Created checksum file ./jfree-former/jfreechart/0.9.21/jfreechart-0.9.21.pom.md5

Et voici le résultat pour le cas d'exemple :

|-- jfree-former
|   |-- jcommon
|   |   `-- 0.9.6
|   |       |-- jcommon-0.9.6.jar
|   |       |-- jcommon-0.9.6.jar.md5
|   |       |-- jcommon-0.9.6.jar.sha1
|   |       |-- jcommon-0.9.6.pom
|   |       |-- jcommon-0.9.6.pom.md5
|   |       `-- jcommon-0.9.6.pom.sha1
|   `-- jfreechart
|       `-- 0.9.21
|           |-- jfreechart-0.9.21.jar
|           |-- jfreechart-0.9.21.jar.md5
|           |-- jfreechart-0.9.21.jar.sha1
|           |-- jfreechart-0.9.21.pom
|           |-- jfreechart-0.9.21.pom.md5
|           `-- jfreechart-0.9.21.pom.sha1

Voilà ! Désormais votre répertoire Maven contient des informations de vérification pour les fichiers et la résolution de dépendances ne lancera plus le message "CHECKSUM FAILED".

Si vous voulez, le code source complet peut être obtenu dans cette adresse : https://gitorious.org/shell-scripts/checksum

terça-feira, 4 de outubro de 2011

Mantendo atualizado o índice FTS no PostgreSQL



Dando sequência ao post anterior "Full Text Search em português no PostgreSQL", veremos como manter atualizado o índice de busca textual, particularmente no caso de os dados da tabela sofrerem muitas atualizações (i.e., inclusões e modificações).


Só para relembrar, o diagrama a seguir ilustra o funcionamento do mecanismo de Full Text Search no PostgreSQL, em que fazem parte processos como parser, normalizador e indexador:


Para exemplificar o problema apresentado no início desse post, considere que a tabela MUNICIPIOS esteja devidamente populada, tendo a coluna de suporte para a busca textual criada e o índice do tipo GIN (ou GiST) associado. Sendo assim, podemos fazer consultas SQL desse tipo:

SELECT nome, uf FROM municipios
WHERE busca @@ plainto_tsquery(simples('agua lindoia'));

Veja o resultado:

curso=# SELECT nome, uf FROM municipios
curso-# WHERE busca @@ plainto_tsquery(simples('agua lindoia'))
       nome       | uf 
------------------+----
 Águas de Lindóia | SP
(1 registro)

O que acontece agora se incluirmos um novo registro na tabela? Veja:

INSERT INTO municipios (codigo, nome, uf)
VALUES (100, 'Águas de Lindóia do Sul', 'RS');

Ao executarmos a busca anterior, o município recém-incluído não aparecerá... Isso porque a coluna de suporte à busca não foi atualizada. Veja o conteúdo dela:

curso=# SELECT nome, uf, busca FROM municipios WHERE codigo = 100;
          nome           | uf | busca 
-------------------------+----+-------
 Águas de Lindóia do Sul | RS | 
(1 registro)

Para manter essa coluna atualizada automaticamente em operações de INSERT ou UPDATE, precisamos de um disparador (trigger). No PostgreSQL criamos uma função em linguagem procedural (geralmente em PL/pgSQL) e em seguida a associamos a um disparador.

Sendo assim, crie a função de trigger com o código abaixo:

CREATE FUNCTION municipios_trigger()
RETURNS trigger AS $$
begin
  new.busca := to_tsvector(simples(new.nome));
  return new;
end
$$ LANGUAGE plpgsql;

Ao ser chamada, ela fará com que o conteúdo da coluna "busca" seja preenchido com o texto gerado da normalização da coluna "nome".

Em seguida, crie o disparador de atualização:

CREATE TRIGGER municipios_tsupdate
BEFORE INSERT OR UPDATE ON municipios
FOR EACH ROW EXECUTE PROCEDURE municipios_trigger();

Ou seja, antes de cada INSERT ou UPDATE na tabela, a função municipios_trigger() será invocada para preencher automaticamente a coluna "busca".

Agora experimente modificar aquele registro inserido acima:

UPDATE municipios SET uf = uf WHERE codigo = 100;

Observe como ficou o conteúdo daquela linha:

curso=# SELECT nome, uf, busca FROM municipios WHERE codigo = 100;
          nome           | uf |           busca           
-------------------------+----+---------------------------
 Águas de Lindóia do Sul | RS | 'agu':1 'lindo':3 'sul':5
(1 registro)

Finalmente efetue a busca textual que havia falhado:

curso=# SELECT nome, uf FROM municipios
curso-# WHERE busca @@ plainto_tsquery(simples('agua lindoia'));
          nome           | uf 
-------------------------+----
 Águas de Lindóia do Sul | RS
 Águas de Lindóia        | SP
(2 registros)

Tente agora fazer uma modificação no nome desse município:

UPDATE municipios SET nome = 'Águas Quentes do Sul' WHERE codigo = 100;

E então refaça a busca considerando o novo nome:

curso=# SELECT codigo, nome, uf FROM municipios
WHERE busca @@ plainto_tsquery(simples('agua quente'));
 codigo |         nome         | uf 
--------+----------------------+----
    100 | Águas Quentes do Sul | RS
(1 registro)


Índice sempre up-to-date! Muito fácil, né? :D

Referências


[1] PostgreSQL 8.3 Documentation - Full Text Search

quinta-feira, 8 de setembro de 2011

Transferring photos to Nintendo 3DS

 

Yesterday my daughter was playing with her Nintendo 3DS console and then asked me to put in there some pictures I had in the computer.

I thought that should be fairly easy, as the system is provided with a regular SD card and I already copied there some MP3 files which worked successfully.

Well, after connecting the SD card into my laptop, I just started browsing its folders and quickly found some named with the pattern "199NIN03" inside DCIM. They were there: files with JPG and MPO extensions. The latter is actually a twice-JPG, the so-called "3D picture" taken with the console's dual camera.

My first attempt was to create a new directory with the name I'd like and start copying JPG files in there. Unmounted the drive in Linux, put it on the console, start it and go to its photo browser application. Nothing!

Yes, I had previously read the f*cking manual [RTFM], but I've found nothing about transferring pictures from a computer to Nintendo 3DS. Indeed the way back is possible (i.e., to copy files from the console to a PC).

What do we do in these cases? We start gooooogleing! \o/

There were not so much entries, but this post was bulls-eye: http://techforums.nintendo.com/message/33675

"I got my N3DS the day after its release, and I've been enjoying it except there's nothing telling me how to put pictures from a PC to the SD card. I looked around and only found a guide on putting music on my SD card. Help appreciated."

Okey-dokey! That guy is sharing my sufferings. Look what the "expert" user answered:

"Sorry, but there isn't an easy way to transfer pictures from a PC to the 3DS, as the pictures would need to be in the exact same format, size and have all the information on them as if the 3DS had actually taken the picture. And it is the same as with the DSi and XL, so really the only easy work around would be to view the picture on a computer monitor and take its picture with the 3DS. ;)"

OMG, I didn't believe what I've read. To take pictures of the computer screen with Nintendo 3DS? It just can't be true, what a noob's advice!

Now that sounded like challenge for me! :D

My next step was to analyze the files taken with 3DS, as they were perfectly visible in its browsing tool. That's what I realized from the files:

1. they are named with the pattern "HNI_9999.JPG" and "HNI_9999.MPO" (when taken "in 3D")
2. their resolution was 640x480
3. their average size was 50kB
4. they had several fields in JPEG's header: camera make and model, and date/time taken
5. file command gave this output: HNI_0018.JPG: JPEG image data, EXIF standard 2.2, baseline, precision 0, 4360x480

Then I started hacking around with these f*cking procedures: plug SD card into PC, analyze and copy files, remove SD, plug it into Nintendo 3DS, and... nothing appeared! First I tried renaming folders and filenames according to the patterns the console was expecting. Second I tried to rescale the pictures to 640x480, by using ImageMagick (see my other post):

$ convert -scale 640x480 -quality 85 source/HNI_0018.JPG destin/HNI_0018.JPG

Yet nothing! I wondered the issue was with the JPEG's header. Then I asked Debian's APT for some magical tool designed to handle those cr*ppy header fields... Among several options, I chose jhead. Indeed I found jhead wonderful, so that I could easily manipulate a JPEG's EXIF header from the Linux shell.

When issuing jhead onto the original 3DS file, I had this:

$ jhead HNI_0018.JPG
File name : HNI_0018.JPG
File size : 49610 bytes
File date : 2011:09:08 20:20:09
Camera make : Nintendo
Camera model : Nintendo 3DS
Date/Time : 2011:09:05 14:31:20
Resolution : 640 x 480

So I found an interesting option on jhead: it can copy a JPEG's file header into another file! That's what I did:

$ jhead -te HNI_0018.JPG HNI_0030.JPG

Well, now at least my desired JPEG file had the proper and expected header, right? F*cking cycle again and... nothing! The console was still not recognizing my picture files copied from a PC.

I thought it could be an issue of duplicated timestamps, so I tried to change the file's timestamp (through touch command) and the date/time in JPEG's header (via jhead). No way!

That expert guy in the forum was definitely wrong: even with the proper naming and file formats, the console still does not accept pictures from the outside world! That lock-in stinks like some other company's behavior. Then I turned Nintendo 3DS over looking for some fruit logo on its back! :P

Deep breath. I realized that 3DS was able to save pictures from the Internet browser that comes with the system. Hmmm, that should be a hint! When observing JPEG files taken from Internet, I discovered that they didn't bear those blessed EXIF headers... :P (Well, at least I learned about jhead tool.)

What if I put my own picture files in the Internet and try to access the respective HTML page from 3DS? That's what I did: raised my apache2 service on Linux and pointed the browser in 3DS to my local URL address. After displaying the image, I was able to save it locally in the console! OK, that should be a solution.

Indeed the JPEG was successfully persisted in the proper folder with its preferred name. However, the original file format and EXIM header were kept (I checked it using Linux diff command).

The task of choosing which pictures to import into 3DS I gave to my daughter, but I still needed to make it easier for her. So, in order to display thumbnails rather than Apache's default file listing, I created a small Shell Script:

$ for a in *.JPG *.jpg; do echo "<a href='$a'><img src='$a' border=5 height='25%'/></a><br/>"; done > fotos.html

This was to be run inside each folder containing candidate pictures to be saved into 3DS. In Apache HTTPD's /var/www/ directory I created symbolic links to those directories.

After checking the availability of my local pages from 3DS, I finally returned this gadget to my daughter. Then I taught her how to browse the pages and save the pictures she wanted. She was quite happy for that besides the extended effort.

Wooh, what a lotta work! Life could be easier, isn't it Nintendo?

sábado, 13 de agosto de 2011

Introdução ao MongoDB




Aplicações modernas provaram que bancos de dados do tipo NoSQL são inevitáveis para o sucesso e continuidade de empresas altamente dependentes da Internet. Vide exemplos como Google, Yahoo, Amazon, Twitter e Facebook.


Todavia, existem inúmeras soluções disponíveis e nenhum padrão sobre como manipular, trafegar ou consultar as informações contidas nos bancos NoSQL. Mesmo a classificação (ou melhor dizer, a taxonomia) dessa zoologia de novos bancos ainda está (perdoem-me o trocadilho!) nebulosa... Quem sabe futuramente tenhamos um ANSI-NoSQL...



O fato é que algumas dessas tecnologias provaram ser apenas estufas para o meio acadêmico, enquanto que outras chegaram a evoluir a ponto de serem aceitas por empresas que apostam no pioneirismo. Uma dessas tecnologias de sucesso foi o MongoDB.


View more presentations from hjort.

Nesta apresentação são introduzidos conceitos como as Grandes Rupturas (IMS x RDBMS x NoSQL), o que é o MongoDB, o Modelo de Dados Orientado a Documentos, JSON e BSON, tipos de dados no MongoDB, operações (Insert, Update, Delete), Modificadores Atômicos, Linguagem de Consulta, Indexação, Agregação e Map/Reduce, Capped Collections, GridFS, Server-Side Scripting, Replicação (Master/Slave e Replica Sets), Arquitetura com Sharding, Auto-Sharding + Replicação e outras tecnologias e detalhes envolvidos no banco de dados MongoDB.


segunda-feira, 8 de agosto de 2011

Easily creating thumbnails on Linux




Have you ever needed to scale down a bunch of picture files to a smaller resolution?

If you used a GUI-based program such as gimp or Photoshop, and spent a reasonable time on that task, you'll surely appreciate this post. The same applies if you're not a slave on mice and thus prefer to open command-line terminals rather than double-clicks.




On Linux there is ImageMagick, a software suite that lets you create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats including GIF, JPEG, PNG, TIFF, etc. We can use it to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bezier curves.

The most important feature to consider here is that ImageMagick includes a number of command-line utilities for manipulating images. That is, text-mode, no GUI needed for editing your images.



We say that ImageMagick can modify or create images automagically and dynamically!

For instance, to rescale (to 800 by 600 pixels) and lower the quality (by 70%) of a JPEG file, the following instruction can be used:

$ convert -scale 800x600 -quality 70 before.jpg after.jpg



Thus, suppose you have a directory hierarchy named source filled with your pictures and you need to create a similar structure but with thumbnails instead. The resulting directory will be called destin.

First of all, open a terminal. :D Then, change to source directory and create the new destin (in this example they're parallel):

$ cd source

$ mkdir ../destin


Second, create subdirectories on destin following the existing source structure:

$ find -type d -exec mkdir -p "../destin/{}" \;



If you wish, issue a tree or simple find command on destin just to check its contents.

At last, run find command along with convert in order to create thumbnails in batch mode:

$ find -type f -exec convert -scale 1024x768 -quality 85 "{}" "../destin/{}" \;


That's it, Power to the Shell! And a Happy 20th Anniversary, Linux!