# Start the MySQL service
$ systemctl start mysqld
$ systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since 一 2023-11-06 16:08:45 CST; 12min ago
...
# Stop the MySQL service before modifying its configuration
$ systemctl stop mysqld
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
- SELINUX=enforcing
+ SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
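Editing /etc/selinux/config only takes effect after a reboot. If you also want SELinux out of the way for the currently running system, the standard getenforce/setenforce tools can be used; a minimal sketch:

```bash
# Check the current SELinux mode, then switch the running system to permissive.
# The SELINUX=disabled setting above makes the change permanent after the next reboot.
getenforce
setenforce 0
getenforce
```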
$ /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
Enter SCM password:
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/java/jdk1.8.0_181-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!
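scm_prepare_database.sh expects the scm database and user to already exist in MySQL. A sketch of the usual preparation, run before the script above (the password `scm_password` is a placeholder, not a value from this setup):

```bash
# Sketch: create the scm database and user that scm_prepare_database.sh will connect as.
# 'scm_password' is a placeholder; substitute your own password.
mysql -u root -p <<'SQL'
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER 'scm'@'%' IDENTIFIED BY 'scm_password';
GRANT ALL PRIVILEGES ON scm.* TO 'scm'@'%';
FLUSH PRIVILEGES;
SQL
```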
Start Cloudera Manager
Start the Cloudera Manager Server. When "Started Jetty server." appears in the log, the server has started successfully:
systemctl start cloudera-scm-server && tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
...
2020-05-28 03:25:09,778 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
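Besides watching the log, you can confirm that the admin console is actually listening. A small sketch, assuming the default Cloudera Manager web UI port of 7180:

```bash
# Sketch: verify the Cloudera Manager Server is running and its web UI (default port 7180) is reachable.
systemctl status cloudera-scm-server
ss -tlnp | grep 7180                          # the Jetty listener should show up once startup finishes
curl -sI http://localhost:7180 | head -n 1    # expect an HTTP status line once the UI is ready
```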
# Commonly used benchmarks: TestDFSIO, mrbench, nnbench
$ yarn jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.1-tests.jar
...
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  gsleep: A sleep job whose mappers create 1MB buffer for every record.
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timeline service performance.
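The same tests jar is how mrbench and nnbench are launched. A sketch of typical invocations; the parameter values below are arbitrary examples rather than tuned settings:

```bash
# Path to the jobclient tests jar shown above
JOBCLIENT_JAR=/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.1-tests.jar

# mrbench: run a batch of small MR jobs to measure job-scheduling overhead
yarn jar "$JOBCLIENT_JAR" mrbench -numRuns 10

# nnbench: stress the NameNode with many small create/write operations
yarn jar "$JOBCLIENT_JAR" nnbench -operation create_write \
    -maps 10 -reduces 5 -numberOfFiles 100 -baseDir /benchmarks/NNBench
```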
# Commonly used examples: randomwriter, sort, terasort
$ yarn jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-examples-3.0.0-cdh6.1.1.jar
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
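A terasort run, for example, is the three-step sequence teragen → terasort → teravalidate. A sketch using the examples jar above; teragen takes the number of 100-byte rows, so 10,000,000 rows is roughly 1 GB of input, and the HDFS paths are arbitrary:

```bash
EXAMPLES_JAR=/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-examples-3.0.0-cdh6.1.1.jar

# Generate about 1 GB of input (10,000,000 rows of 100 bytes each)
yarn jar "$EXAMPLES_JAR" teragen 10000000 /benchmarks/tera/input
# Sort it
yarn jar "$EXAMPLES_JAR" terasort /benchmarks/tera/input /benchmarks/tera/output
# Verify that the output is globally sorted
yarn jar "$EXAMPLES_JAR" teravalidate /benchmarks/tera/output /benchmarks/tera/validate
```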
TestDFSIO
Test HDFS write performance:
# Write ten 128 MB files to the HDFS cluster
$ yarn jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.1-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
...
20/07/14 10:58:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 10:58:12 INFO fs.TestDFSIO:             Date & time: Tue Jul 14 10:58:12 CST 2020
20/07/14 10:58:12 INFO fs.TestDFSIO:         Number of files: 10
20/07/14 10:58:12 INFO fs.TestDFSIO:  Total MBytes processed: 1280
20/07/14 10:58:12 INFO fs.TestDFSIO:       Throughput mb/sec: 38.51
20/07/14 10:58:12 INFO fs.TestDFSIO:  Average IO rate mb/sec: 40.69
20/07/14 10:58:12 INFO fs.TestDFSIO:   IO rate std deviation: 11.09
20/07/14 10:58:12 INFO fs.TestDFSIO:      Test exec time sec: 47.86
20/07/14 10:58:12 INFO fs.TestDFSIO:
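The matching read test and cleanup; a sketch. The -read run reuses the files created by -write under TestDFSIO's default benchmark directory on HDFS, and -clean removes them afterwards:

```bash
JOBCLIENT_JAR=/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.1.1-tests.jar

# Read back the ten 128 MB files written by the -write run above
yarn jar "$JOBCLIENT_JAR" TestDFSIO -read -nrFiles 10 -fileSize 128MB
# Remove the benchmark data from HDFS when finished
yarn jar "$JOBCLIENT_JAR" TestDFSIO -clean
```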
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/14 01:14:40 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Spark context Web UI available at http://cdh2:4040
Spark context available as 'sc' (master = yarn, app id = application_1647191315281_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.1
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
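With spark-shell coming up cleanly, a quick non-interactive smoke test is to submit the bundled SparkPi example to YARN. A sketch; the examples jar path below is only a guess at the CDH parcel layout, so adjust it to wherever spark-examples actually lives on your nodes:

```bash
# Sketch: run the SparkPi example on YARN as a smoke test.
# The jar path is an assumption about the parcel layout; locate the real jar with `find` if needed.
spark-submit \
    --master yarn \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples*.jar 100
```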
Error: insufficient memory
When setting the cluster up again in 2023, HDFS permission checking had not been enabled, and a different error came up:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/11/07 01:13:15 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (1024), overhead (384 MB), and PySpark memory (0 MB) is above the max threshold (1129 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
...
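The error means the default executor memory (1024 MB) plus the 384 MB overhead exceeds YARN's per-container cap of 1129 MB. Two ways out: raise yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb in the YARN configuration, or ask Spark for smaller containers. A sketch of the second option; the 512m values are illustrative, chosen so that memory plus overhead stays under the cap:

```bash
# Sketch: request smaller executor and AM containers so each fits under the 1129 MB threshold
# (512 MB + ~384 MB overhead ≈ 896 MB per container).
spark-shell \
    --conf spark.executor.memory=512m \
    --conf spark.yarn.am.memory=512m
```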
-- Map-stage compression (not enabled in this test)
-- Enable compression of Hive's intermediate MR files
set hive.exec.compress.intermediate=true;
-- Enable compression of map output in MR
set mapreduce.map.output.compress=true;
-- Set the compression codec for the MR map stage
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Reduce-stage compression
-- Enable compression of the final MR output files
set mapreduce.output.fileoutputformat.compress=true;
-- Set the compression granularity for SequenceFile output; BLOCK is recommended (other options: NONE, RECORD)
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
-- Set the compression codec for the final MR output
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Enable compression of Hive's final query result files
set hive.exec.compress.output=true;
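With these settings in effect in a Hive session, any query that writes files produces Snappy-compressed output. A sketch of trying it from the shell; `test_table` and the output directory are placeholders, not objects from this setup:

```bash
# Sketch: run a query with Snappy output compression enabled, then inspect the result files.
# 'test_table' and '/tmp/snappy_test' are placeholders.
hive -e "
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.exec.compress.output=true;
insert overwrite directory '/tmp/snappy_test' select * from test_table;
"
# Compressed text output files should carry a .snappy extension
hdfs dfs -ls /tmp/snappy_test
```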
-- Enable compression of the final MR output files
set mapreduce.output.fileoutputformat.compress=true;
-- Set the compression granularity for SequenceFile output; BLOCK is recommended (other options: NONE, RECORD)
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
-- Set the compression codec for the final MR output
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
-- Enable compression of Hive's final query result files
set hive.exec.compress.output=true;
-- Enable LZO compression
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
set hive.exec.compress.output=true;
-- Create the video table
create table quickvideo_video_lzo(
    videoId string,
    uploader string,
    age int,
    category array<string>,
    length int,
    views int,
    rate float,
    rating int,
    comment int,
    relatedId array<string>)
row format delimited fields terminated by "\t"
collection items terminated by "&";
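Once the table exists, the LZO-compressed data files can be loaded and then indexed so that downstream jobs can split them. A sketch under two assumptions: the hadoop-lzo jar comes from the GPLEXTRAS parcel, and the .lzo files sit in a staging directory on HDFS; both paths below are placeholders for your environment:

```bash
# Sketch: load .lzo data files into the table, then build LZO indexes so the files become splittable.
# '/tmp/video' (staging dir) and the warehouse path are placeholders.
hive -e "load data inpath '/tmp/video' into table quickvideo_video_lzo;"

# DistributedLzoIndexer ships with hadoop-lzo; the jar location depends on how it was installed
hadoop jar /opt/cloudera/parcels/GPLEXTRAS/jars/hadoop-lzo-*.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /user/hive/warehouse/quickvideo_video_lzo
```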