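Once Hadoop is installed, you can install Oozie and run your own workflows:
1. Download the Oozie tarball, oozie-4.0.0-cdh5.0.0.tar.gz.
2. Download ext-2.2.zip.
3. Download Tomcat and unpack it.
4. Download Maven. (The Oozie package used here is already compiled; a source release must first be built with Maven before it can be installed.)
5. Unpack Oozie into the installation directory and set the environment variables as follows: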
export MAVEN_HOME=/export/servers/apache-maven-3.0.5
export TOMCAT_HOME=/export/servers/apache-tomcat-6.0.26
export OOZIE_HOME=/export/servers/oozie-4.0.0-cdh5.0.0
export OOZIE_CONFIG=/export/servers/oozie-4.0.0-cdh5.0.0/conf
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$OOZIE_HOME/bin:$TOMCAT_HOME/bin:$PATH
Apply the environment variables:
source /etc/profile
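Before moving on, it can help to confirm that the variables are actually set in the current shell. The function below is a minimal sketch and not part of the Oozie distribution: check_oozie_env is a hypothetical helper name, and the variable list is taken from the exports above plus JAVA_HOME and HADOOP_HOME, which the PATH line assumes are already defined.

```shell
# Print the name of every required variable that is unset or empty.
# Returns non-zero if at least one is missing.
# (check_oozie_env is a hypothetical helper, not an Oozie command.)
check_oozie_env() {
    missing=0
    for name in JAVA_HOME HADOOP_HOME MAVEN_HOME TOMCAT_HOME OOZIE_HOME OOZIE_CONFIG; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return $missing
}
```

Running check_oozie_env after source /etc/profile prints nothing when every variable is set, and otherwise lists the missing names one per line.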
6. Edit the Oozie configuration files. The conf directory is structured as follows:
The action-conf directory contains a single file, hive.xml. Its contents after editing:
<configuration>
    <!-- An example of setting default properties for Hive action. This could be useful with Hadoop versions that have deprecated HADOOP_HOME that Hive still relies on.
    <property>
        <name>hadoop.bin.path</name>
        <value>/export/servers/hadoop-2.2.0/bin/hadoop</value>
    </property>
    <property>
        <name>hadoop.config.dir</name>
        <value>/export/servers/hadoop-2.2.0/etc/hadoop</value>
    </property>
    -->
</configuration>
hadoop-conf/core-site.xml:
<configuration>
    <property>
        <name>mapreduce.jobtracker.kerberos.principal</name>
        <value>mapred/_HOST@LOCALREALM</value>
    </property>
    <property>
        <name>yarn.resourcemanager.principal</name>
        <value>yarn/_HOST@LOCALREALM</value>
    </property>
    <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>hdfs/_HOST@LOCALREALM</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
The hadoop-config.xml file has the same contents as hadoop-conf/core-site.xml and needs no changes.
oozie-default.xml: there are two main changes in this file.
1.
<property>
    <name>oozie.services</name>
    <value>
        org.apache.oozie.service.SchedulerService,
        org.apache.oozie.service.InstrumentationService,
        org.apache.oozie.service.CallableQueueService,
        org.apache.oozie.service.UUIDService,
        org.apache.oozie.service.ELService,
        org.apache.oozie.service.AuthorizationService,
        org.apache.oozie.service.UserGroupInformationService,
        org.apache.oozie.service.HadoopAccessorService,
        org.apache.oozie.service.URIHandlerService,
        org.apache.oozie.service.MemoryLocksService,
        org.apache.oozie.service.DagXLogInfoService,
        org.apache.oozie.service.SchemaService,
        org.apache.oozie.service.LiteWorkflowAppService,
        org.apache.oozie.service.JPAService,
        org.apache.oozie.service.StoreService,
        org.apache.oozie.service.CoordinatorStoreService,
        org.apache.oozie.service.SLAStoreService,
        org.apache.oozie.service.DBLiteWorkflowStoreService,
        org.apache.oozie.service.CallbackService,
        org.apache.oozie.service.ActionService,
        org.apache.oozie.service.ShareLibService,
        org.apache.oozie.service.ActionCheckerService,
        org.apache.oozie.service.RecoveryService,
        org.apache.oozie.service.PurgeService,
        org.apache.oozie.service.CoordinatorEngineService,
        org.apache.oozie.service.BundleEngineService,
        org.apache.oozie.service.DagEngineService,
        org.apache.oozie.service.CoordMaterializeTriggerService,
        org.apache.oozie.service.StatusTransitService,
        org.apache.oozie.service.PauseTransitService,
        org.apache.oozie.service.GroupsService,
        org.apache.oozie.service.ProxyUserService,
        org.apache.oozie.service.XLogStreamingService,
        org.apache.oozie.service.JobsConcurrencyService
    </value>
    <description>
        All services to be created and managed by Oozie Services singleton. Class names must be separated by commas.
    </description>
</property>
In this property, move org.apache.oozie.service.JobsConcurrencyService to the first line of the value, like this:
<property>
    <name>oozie.services</name>
    <value>
        org.apache.oozie.service.JobsConcurrencyService,
        org.apache.oozie.service.SchedulerService,
        ...
    </value>
</property>
2. Remove the following property. (Removing it is optional; decide based on your own use case.)
<property>
    <name>oozie.service.coord.check.maximum.frequency</name>
    <value>true</value>
    <description>
        When true, Oozie will reject any coordinators with a frequency faster than 5 minutes. It is not recommended to disable this check or submit coordinators with frequencies faster than 5 minutes: doing so can cause unintended behavior and additional system stress.
    </description>
</property>
oozie-site.xml: the main changes are the following.
1.
<property>
    <name>oozie.service.ActionService.executor.ext.classes</name>
    <value>
        org.apache.oozie.action.email.EmailActionExecutor,
        org.apache.oozie.action.hadoop.HiveActionExecutor,
        org.apache.oozie.action.hadoop.ShellActionExecutor,
        org.apache.oozie.action.hadoop.SqoopActionExecutor,
        org.apache.oozie.action.hadoop.DistcpActionExecutor
    </value>
</property>
Change this section as follows, adding two properties in front of it:
<property>
    <name>oozie.subworkflow.classpath.inheritance</name>
    <value>true</value>
</property>
<property>
    <name>oozie.servlet.CallbackServlet.max.data.len</name>
    <value>1048576</value>
</property>
<property>
    <name>oozie.service.ActionService.executor.ext.classes</name>
    <value>
        org.apache.oozie.action.email.EmailActionExecutor,
        org.apache.oozie.action.hadoop.HiveActionExecutor,
        org.apache.oozie.action.hadoop.ShellActionExecutor,
        org.apache.oozie.action.hadoop.SqoopActionExecutor,
        org.apache.oozie.action.hadoop.DistcpActionExecutor
    </value>
</property>
2.
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://192.168.157.92:3358/oozie4</value>
    <description>JDBC URL.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
    <description>DB user name.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>123456</value>
    <description>
        DB user password. IMPORTANT: if password is empty leave a 1 space string, the service trims the value, if empty Configuration assumes it is NULL.
    </description>
</property>
What these properties do: Oozie ships with a default Derby database that stores its workflow and job state. To use your own MySQL database instead, configure it as in the example above.
3.
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/export/servers/hadoop-2.2.0/etc/hadoop</value>
    <description>
        Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is used when there is no exact match for an authority. The HADOOP_CONF_DIR contains the relevant Hadoop *-site.xml files. If the path is relative it is looked up within the Oozie configuration directory; though the path can be absolute (i.e. to point to Hadoop client conf/ directories in the local filesystem).
    </description>
</property>
This property sets the Hadoop configuration directory.
4.
<!-- Proxyuser Configuration -->
<property>
    <name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
    <value>*</value>
    <description>
        List of hosts the '#USER#' user is allowed to perform 'doAs' operations.
        The '#USER#' must be replaced with the username of the user who is allowed to perform 'doAs' operations.
        The value can be the '*' wildcard or a list of hostnames.
        For multiple users copy this property and replace the user name in the property name.
    </description>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
    <value>*</value>
    <description>
        List of groups the '#USER#' user is allowed to impersonate users from to perform 'doAs' operations.
        The '#USER#' must be replaced with the username of the user who is allowed to perform 'doAs' operations.
        The value can be the '*' wildcard or a list of groups.
        For multiple users copy this property and replace the user name in the property name.
    </description>
</property>
Uncomment these two properties.
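The property descriptions say that #USER# has to be replaced with a real user name. A minimal sketch of doing that substitution with sed follows; fill_proxyuser is a hypothetical helper name, and root is only an assumed user, chosen because the later HDFS paths in this walkthrough live under /user/root.

```shell
# Replace the #USER# placeholder with the user name given as $1.
# Reads XML on stdin and writes the result to stdout.
# (fill_proxyuser is a hypothetical helper, not an Oozie tool.)
fill_proxyuser() {
    sed "s/#USER#/$1/g"
}

# Example:
#   echo '<name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>' | fill_proxyuser root
```

Because it works as a filter, the function can be tried on a copy of the two properties first and the result pasted into oozie-site.xml once it looks right.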
Because MySQL is used, the MySQL driver jar, mysql-connector-java-5.1.20.jar, must be copied into both the lib and libtools directories of Oozie.
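The copy can be scripted so that neither directory is forgotten. This is a sketch only: install_connector is a hypothetical helper name, and the location of the downloaded jar is whatever path you pass in.

```shell
# Copy the MySQL connector jar into the lib and libtools directories of Oozie.
# $1 = path to the jar, $2 = Oozie home directory.
# Fails if either target directory does not exist, which usually means
# the Oozie home path is wrong.
install_connector() {
    for dir in lib libtools; do
        [ -d "$2/$dir" ] || return 1
        cp "$1" "$2/$dir/" || return 1
    done
}
```

With the environment from the earlier steps, a call would look like: install_connector /path/to/mysql-connector-java-5.1.20.jar "$OOZIE_HOME"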
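That completes the Oozie configuration changes. Next, some preparation before starting Oozie:
1. Log in to MySQL and create the database specified in oozie-site.xml. (The name must match the JDBC URL: the example URL above ends in oozie4, while the commands below use oozie.)
create database oozie; (creates a database named oozie)
grant all privileges on oozie.* to 'root'@'localhost' identified by '123456'; (grants root, with password 123456, full access to the oozie database from localhost)
grant all privileges on oozie.* to 'root'@'%' identified by '123456'; (grants the same access from any host)
FLUSH PRIVILEGES;
2. In the $OOZIE_HOME/bin directory, run the following command to generate the script that creates the database tables: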
sh ooziedb.sh create -sqlfile oozie.sql
3. Run the database script to create the tables:
sh oozie-setup.sh db create -run -sqlfile oozie.sql
The database setup is now complete.
4. Build the oozie.war package:
Run the following command, again from the bin directory:
sh addtowar.sh -inputwar $OOZIE_HOME/oozie.war -outputwar $OOZIE_HOME/oozie-server/webapps/oozie.war -hadoop 2.2.0 $HADOOP_HOME -extjs ext-2.2.zip
5. The generated war may not include mysql-connector-java-5.1.20.jar. If it does not, add the jar to the war as well; otherwise Oozie will report an error when it starts.
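One way to add the jar is with the JDK's jar tool, which can update an existing archive in place. The sketch below assumes that jar is on PATH and that the web application keeps its libraries under WEB-INF/lib (the standard war layout); add_jar_to_war is a hypothetical helper name.

```shell
# Add a jar to WEB-INF/lib inside an existing war file.
# $1 = war file to update, $2 = jar to add.
# (add_jar_to_war is a hypothetical helper; it relies on the JDK jar tool.)
add_jar_to_war() {
    # Stage the jar under the path it should have inside the war.
    stage=$(mktemp -d) || return 1
    mkdir -p "$stage/WEB-INF/lib" &&
        cp "$2" "$stage/WEB-INF/lib/" &&
        jar uf "$1" -C "$stage" "WEB-INF/lib/$(basename "$2")"
    status=$?
    rm -rf "$stage"
    return "$status"
}
```

For the war built in the previous step the call would be: add_jar_to_war "$OOZIE_HOME/oozie-server/webapps/oozie.war" /path/to/mysql-connector-java-5.1.20.jar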
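6. In the $OOZIE_HOME/bin directory, run the following. (This uploads the Oozie sharelib to HDFS; the sharelib provides the jars that actions such as Hive and Sqoop need at run time.)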
sh oozie-setup.sh sharelib create -fs hdfs://hadoop-master:8020 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.2.0-yarn.tar.gz
(On Hadoop 2 with multiple HDFS clusters, where hdfs://cluster1 is the defaultFS name from core-site.xml:)
sh oozie-setup.sh sharelib create -fs hdfs://cluster1 -locallib $OOZIE_HOME/oozie-sharelib-4.0.0-cdh5.0.0-beta-2-yarn.tar.gz
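To confirm that the sharelib was uploaded, list it on HDFS. This sketch assumes the default sharelib location, /user/<user>/share/lib, where <user> is the account that ran oozie-setup.sh; list_sharelib is a hypothetical helper name, and a custom oozie.service.WorkflowAppService.system.libpath setting would change the path.

```shell
# List the sharelib directory on HDFS.
# $1 = user that installed the sharelib (defaults to the current user).
# Assumes the default location /user/<user>/share/lib.
list_sharelib() {
    hadoop fs -ls "/user/${1:-$(id -un)}/share/lib"
}
```

A listing with one subdirectory per action type suggests the upload worked.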
6. Start Oozie:
To run Oozie in the foreground:
sh oozied.sh run
To run Oozie in the background:
sh oozied.sh start
Once it has started, open http://hadoop-master:11000/oozie to check the web console.
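The server can also be checked from the command line with the admin sub-command of the Oozie client. A small sketch, assuming the oozie client is on PATH (it lives in $OOZIE_HOME/bin) and that the server URL is the one used in this walkthrough; oozie_status is a hypothetical wrapper name.

```shell
# Ask the Oozie server for its status.
# $1 = server URL (defaults to the host used in this walkthrough).
oozie_status() {
    oozie admin -oozie "${1:-http://hadoop-master:11000/oozie}" -status
}
```

A server that is up and accepting jobs reports System mode: NORMAL.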
7. Set up an Oozie job to run:
The basic directory layout for running an Oozie workflow is as follows:
8. Upload the directory above to HDFS, for example to /user/root/oozie/workflow/oozieTest.
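The upload itself can be done with the hadoop fs command. The following is a sketch under the assumption that the local workflow directory is named oozieTest and sits in the current directory; upload_workflow is a hypothetical helper name.

```shell
# Upload a local workflow directory into a parent directory on HDFS.
# $1 = local directory, $2 = HDFS parent directory.
upload_workflow() {
    # Create the parent directory if needed, then copy the local directory into it.
    hadoop fs -mkdir -p "$2" && hadoop fs -put "$1" "$2/"
}
```

Calling upload_workflow oozieTest /user/root/oozie/workflow places the files under /user/root/oozie/workflow/oozieTest. Note that -put does not overwrite existing files, so an earlier copy has to be removed before uploading again.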
Run the oozie command as shown here. You can put the command in a .sh file and simply run that file with sh next time:
run_oozie.sh:
oozie job -oozie http://hadoop-master:11000/oozie -config $1 -D nameNode=hdfs://hadoop-master:8020 -D jobTracker=hadoop-master:8032 -D queueName=root -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run
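(Note: on Hadoop 2 with a single HDFS cluster the command is much the same as above. With multiple HDFS clusters it differs: hdfs://cluster1 is the defaultFS name from core-site.xml and takes no port number, and the jobTracker port is 8032. Write the command in the following form:)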
oozie job -oozie http://hadoop-kf105.jd.com:11000/oozie -config $1 -D nameNode=hdfs://cluster1 -D jobTracker=hadoop-kf100.jd.com:8032 -D frequency=60 -D nolockTime=0 -D start=2013-11-22T10:00Z -D end=2014-08-30T00:00Z -run
To run a workflow: sh run_oozie.sh oozieTest/job.properties
kill_oozie.sh:
oozie job -oozie http://hadoop-master:11000/oozie -kill $1
To kill a workflow: sh kill_oozie.sh jobId
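Alongside the run and kill scripts, a status check is often useful. The sketch below uses the -info option of oozie job; oozie_info is a hypothetical wrapper name, and the server URL is the same assumed host as in the scripts above.

```shell
# Show the status and action list of a job.
# $1 = job ID, as printed by the -run command.
oozie_info() {
    oozie job -oozie http://hadoop-master:11000/oozie -info "$1"
}
```

Saved as a script in the same style as kill_oozie.sh, the body would be the single oozie job line with $1 as the job ID.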
That is all on Oozie for now; I will add more as new material comes up.