Chém gió

Chủ Nhật, 1 tháng 12, 2013

Update USD exchange rate every 30s

import urllib2
import threading
import time
from bs4 import BeautifulSoup
def scrap():
    response = urllib2.urlopen('http://www.vietcombank.com.vn/exchangerates/')
    html = response.read()
    soup = BeautifulSoup(html)
    for i in soup.findAll('td',{"class":"code"}):
        if i.contents[0] == 'USD':
            print "Sell "+ i.find_all_next()[1].text
            print "Buy "+ i.find_all_next()[3].text
#t = threading.Timer(5, scrap)
while True:
        scrap()
        time.sleep(30)


Thứ Sáu, 29 tháng 11, 2013

Regex in hive

in your databases have many fields but one filed have data something like this: a1955ce0b318391b contain only a-z0-9 but some errors occurs while collect and store to database somehow that field contains null values or some dummy charracter how to filter which entry errors use this regex:
> select idvisitor from log where day = '2013-11-29' and idvisitor not rlike '^[a-z0-9]{16}$';
what does it means in hive?
rlike mean: regular expression like to use regex in hive query.
^ : from start of line
[a-z0-9] only contain a-z or 0-9 without order.
{16} contains exact 16 characters.
$ end of line.
It save my day.
:)

Thứ Ba, 26 tháng 11, 2013

Hive authorization

scenario: in cluster run hadoop mapreduce and hive. one system run hadoop (ref here) and client run hive (ref here) to run jobs on hadoop.
you have many users access to run hadoop and also have many database tables blah blah.
How to control authorization of each hive' user.?
ok we start.

From picture above we can see two places we can apply security strategy : RDBMS store metadata and HDFS store real Data of table. In this section I will apply security at RDBMS.
In our mysql databases we already created metastore database (ref to previous tut ) now from log in to mysql using root privileges create user hive with select only to metastore database:
mysql>CREATE USER 'hivetest'@'localhost' IDENTIFIED BY 'mypass';
mysql>use metastore;
mysql>grant select on * to user hivetest;
now we use 'hivetest' and 'mypass' in hive-site.xml at client: 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/metastore</value>
      <description>the URL of the MySQL database</description>
 </property>

 <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
 </property>


 <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
 </property>

 <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hivetest</value>
 </property>

 <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>mypass</value>
 </property> 
</configuration>

log in to hive as previous tut create table :
hive> create table tbl_1(a int);
switch to client: 
hive> select * from tbl_1;// can't select error comes.
back to super user:
hive> grant select on table tbl_1 to user hivetest;
switch to client: now you can select tbl_1 as normal. and this user can only do something granted from super user.
how to setup: 
from your system create new user beside your current user(super user ):hivetest (no need to same user name with mysql user above.) we can log in to these users by ctrl + alt + F1 ; ctrl+alt+F2 for each user. 
This is called hive authorization at metastore level. :)
DOne.
see ya.





Thứ Hai, 25 tháng 11, 2013

Hive in cooperate with hadoop.

Download Hive-0.12.0 from apache (at the time of this article can't access to apache :( )
extract it.
Some configurations:
1. hive-env.sh (hive-0.12.0/conf) add this line to point hive to hadoop (ref here for configuring hadoop)
HADOOP_HOME=/home/thientin/packages_container/hadoop-2.2.0 
from here we can run hive as the top application of hadoop. But metadata will be store in derby.log.
Below steps to store metadata in mysql (for security reason whatever.)
Install mysql(of course) create user in mysql pass accordingly to that user.(intend for hive using.), create database for hive:
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /home/hive-0.12.0/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;

mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'localhost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'localhost';
mysql> FLUSH PRIVILEGES;
mysql> quit;

2. create file hive-site.xml (if not exists)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/metastore</value>
      <description>the URL of the MySQL database</description>
 </property>

 <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
 </property>


 <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
 </property>

 <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
 </property>

 <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>mypassword</value>
 </property>


</configuration>
3. Copy mysql-connector-java-5.1.27.jar to /hive-0.12.0/lib/
4. Start hadoop services.
5. Test hive : 
> cd /home/hive-0.12.0/bin
> ./hive
screen like this:
try some query: 
>show databases;
>use default; show tables;create table tbl_1(key int, value string);
Done!

 

Thứ Năm, 21 tháng 11, 2013

Using regular expression format your text

find pattern in text (this kind of regular expression can be use with text editor like notepad++ or sublime):
to find any pattern you want replace text with pattern  :     ^.*pattern.*$
to find any blank line ^.\n
meaning of each character maybe in next post.
or can use sed command in linux box for search pattern:
sed '/^.*object.*$/d' ./filename.txt : this command will print all line NOT include "object" (without quote) of filename.txt in the terminal. 
or say: sed '/^.*object.*$/d' ./filename.txt >> newformatedtext.txt 
:)