Integrating Flume with CDH: Steps and Troubleshooting

Reposted   2018-05-12   Views: 608


1. Determine which host Flume is installed on


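2. Confirm that Flume on that host works properly

In the directory of your choice, create a file named bigdata_page_to_hive.conf

Its contents can be the example from the official guide: https://flume.apache.org/FlumeUserGuide.html

Start the agent:

flume-ng agent --conf conf --conf-file bigdata_page_to_hive.conf --name a1 -Dflume.root.logger=INFO,console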

3. Write data from Flume into Hive

3.1: Verify that Hive itself works

3.2: Create the table

create table t_pages(

date string,

user_id string,

session_id string,

page_id string,

action_time string,

search_keyword string,

click_category_id string,

click_product_id string,

order_category_ids string,

order_product_ids string,

pay_category_ids string,

pay_product_ids string,

city_id string

)ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

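3.3 Use Hive as the Flume sink

The Hive sink depends on Hive's metastore service, so first check whether that service is running

a1.sinks.k1.hive.metastore = thrift://master:9083

You can use telnet to check whether the port is reachable [although checking through the CDH web UI is preferable]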
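
Where telnet is not installed, the same reachability check can be sketched in a few lines of Python. This is only an illustration: the helper name is_port_open is made up for this example, and the host and port are taken from the metastore URI above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

Calling is_port_open("master", 9083) mirrors the telnet test. A successful connection only shows that something is listening on the port, not that the metastore is healthy, which is one reason the CDH web UI remains the better check.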

# bigdata_page_to_hive.conf: A single-node Flume configuration with a Hive sink


# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1


# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444


# Describe the sink

a1.sinks.k1.type = hive

a1.sinks.k1.hive.metastore = thrift://master:9083

a1.sinks.k1.hive.database = default

a1.sinks.k1.hive.table = t_pages

a1.sinks.k1.useLocalTimeStamp = false

a1.sinks.k1.round = true

a1.sinks.k1.roundValue = 10

a1.sinks.k1.roundUnit = minute

a1.sinks.k1.serializer = DELIMITED

a1.sinks.k1.serializer.delimiter = "\t"

a1.sinks.k1.serializer.serdeSeparator = '\t'

a1.sinks.k1.serializer.fieldnames = date,user_id,session_id,page_id,action_time,search_keyword,click_category_id,click_product_id,order_category_ids,order_product_ids,pay_category_ids,pay_product_ids,city_id

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100


# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
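
To exercise this configuration, test events have to arrive at the netcat source as newline-terminated lines whose tab-separated fields follow the order given in serializer.fieldnames. The following Python sketch shows one way to build and send such lines. The function names format_event and send_events are invented for this example; the host and port come from the source settings above.

```python
import socket

# Column order must match a1.sinks.k1.serializer.fieldnames
FIELDS = [
    "date", "user_id", "session_id", "page_id", "action_time",
    "search_keyword", "click_category_id", "click_product_id",
    "order_category_ids", "order_product_ids", "pay_category_ids",
    "pay_product_ids", "city_id",
]


def format_event(record: dict) -> str:
    """Join the record's values in FIELDS order with tabs; missing keys become empty strings."""
    values = [str(record.get(name, "")) for name in FIELDS]
    for value in values:
        # a tab or newline inside a value would shift columns or split the event
        if "\t" in value or "\n" in value:
            raise ValueError("field values must not contain tabs or newlines")
    return "\t".join(values)


def send_events(lines, host="localhost", port=44444):
    """Send each line, newline-terminated, to the netcat source."""
    with socket.create_connection((host, port), timeout=5) as conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
```

For a handful of test rows, send_events([format_event(r) for r in rows]) is enough. The sketch does not read the acknowledgements the netcat source sends back, so it is only suitable for small manual tests.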

Start the agent:

nohup flume-ng agent --conf conf --conf-file bigdata_page_to_hive.conf --name a1 &

An exception is thrown:

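java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter

Possible causes:

1. The dependency was never added

2. Maven may not have downloaded it completely

3. A jar conflict

Missing dependency: Flume lacks a required jar

1. Use the exception message to work out which jar is missing

A web search points to the missing jar:

https://zhidao.baidu.com/question/923836961800918739.html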

find / -name 'hive-hcatalog-core*'

After filtering out symlinks, comparing versions, and some guesswork, this jar was picked as the first candidate:

/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-hcatalog-core-1.1.0-cdh5.11.1.jar
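
Rather than guessing from file names, the jars can be inspected directly, since a jar is a zip archive. The sketch below lists the jars under a directory that actually contain the class named in the exception. The helper name find_jars_with_class is made up for this example, and symlinked jars are skipped to match the filtering described above.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(root: str, class_name: str) -> list:
    """Return sorted paths of .jar files under root that contain class_name.

    class_name may use dots or slashes, for example
    org/apache/hive/hcatalog/streaming/RecordWriter
    """
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in Path(root).rglob("*.jar"):
        if jar.is_symlink():
            # skip links so each real jar is reported once
            continue
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # skip corrupt or unreadable archives
            continue
    return sorted(matches)
```

Pointing it at the parcel's jars directory with the class name from the stack trace gives the candidate jars without relying on a web search.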

2. If the jar you found is the right one, where should it go?

The log output printed when flume-ng starts shows the directory on Flume's classpath:

/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/flume-ng/lib/*

3. Fix the problem

cp /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-hcatalog-streaming-1.1.0-cdh5.11.1.jar /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/flume-ng/lib/

A symbolic link works as well (run from the Flume lib directory above):

ln -s /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-hcatalog-streaming-1.1.0-cdh5.11.1.jar hive-hcatalog-streaming-1.1.0-cdh5.11.1.jar

Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException

Fix:

ln -s /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-metastore-1.1.0-cdh5.11.1.jar hive-metastore-1.1.0-cdh5.11.1.jar

Exception: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.session.SessionState

Fix:

ln -s /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-exec-1.1.0-cdh5.11.1.jar hive-exec-1.1.0-cdh5.11.1.jar

Exception: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliSessionState

ln -s /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hive-cli-1.1.0-cdh5.11.1.jar /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/flume-ng/lib/hive-cli-1.1.0-cdh5.11.1.jar

Exception: org.apache.commons.cli.MissingOptionException: Missing required option: n

The --name (-n) option was omitted when launching flume-ng

Exception: java.lang.ClassNotFoundException: com.facebook.fb303.FacebookService$Iface

ln -s /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/libfb303-0.9.3.jar /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/flume-ng/lib/libfb303-0.9.3.jar
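
The jar fixes so far all follow the same pattern: link a jar from the parcel's jars directory into Flume's lib directory. A small Python sketch can do this in one pass instead of one ln -s per exception. The function name link_jars is invented for this example, and the list of jar names has to come from your own exceptions.

```python
import os
from pathlib import Path


def link_jars(jars_dir: str, lib_dir: str, jar_names: list) -> list:
    """Symlink each named jar from jars_dir into lib_dir; return the names newly linked."""
    linked = []
    # use an absolute source path so the link stays valid from any working directory
    source_root = Path(os.path.abspath(jars_dir))
    for name in jar_names:
        source = source_root / name
        target = Path(lib_dir) / name
        if not source.is_file():
            raise FileNotFoundError(f"jar not found: {source}")
        if target.exists() or target.is_symlink():
            # leave existing files and links untouched
            continue
        os.symlink(source, target)
        linked.append(name)
    return linked
```

With the paths used in this article, jars_dir would be /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars, lib_dir would be /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/flume-ng/lib, and jar_names would hold the five jars linked above. Running it twice is harmless because existing links are skipped.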

Exception: Cannot stream to table that has not been bucketed : {metaStoreUri='thrift://master:9083', database='default', table='t_pages', partitionVals=[] }

The Hive sink requires the target table to be a bucketed table

create table t_pages(

date string,

user_id string,

session_id string,

page_id string,

action_time string,

search_keyword string,

click_category_id string,

click_product_id string,

order_category_ids string,

order_product_ids string,

pay_category_ids string,

pay_product_ids string,

city_id string

)

CLUSTERED BY (city_id) INTO 20 BUCKETS

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Exception: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.AcidOutputFormat

OrcOutputFormat is the only class that implements AcidOutputFormat, so the Hive table must be stored as ORC

create table t_pages(

date string,

user_id string,

session_id string,

page_id string,

action_time string,

search_keyword string,

click_category_id string,

click_product_id string,

order_category_ids string,

order_product_ids string,

pay_category_ids string,

pay_product_ids string,

city_id string

)

CLUSTERED BY (city_id) INTO 20 BUCKETS

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

STORED AS ORC;

Test: check in Hive whether the data has arrived

4. Change the source

capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count

5. Preferably switch the channel to file-backed storage
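
As a rough illustration of that last step, the Python sketch below renders the property lines for a Flume file channel, which would replace the memory channel lines in the agent configuration. The function name file_channel_properties is made up for this example, and any directory paths passed in are placeholders to adapt to your hosts. The property names type, checkpointDir and dataDirs are those of Flume's file channel.

```python
def file_channel_properties(agent: str, channel: str,
                            checkpoint_dir: str, data_dirs: list) -> str:
    """Render the configuration lines for a file-backed Flume channel."""
    prefix = f"{agent}.channels.{channel}"
    lines = [
        f"{prefix}.type = file",
        # directory where the channel keeps its checkpoint
        f"{prefix}.checkpointDir = {checkpoint_dir}",
        # comma-separated list of directories that hold the event data
        f"{prefix}.dataDirs = {','.join(data_dirs)}",
    ]
    return "\n".join(lines)
```

For the agent in this article the arguments would be "a1" and "c1". Unlike the memory channel, a file channel keeps buffered events on disk, so they survive an agent restart.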



Reposted from: https://blog.csdn.net/hexinghua0126/article/details/80293537
