Thursday 13 August 2015

HadoopTroubleShooting

1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Answer:
The job needs to open many files for processing. The system default limit on open files is 1024 (you can see it with ulimit -a), which is enough for normal use but too low for this kind of program.
Fix:
Modify two files.
       /etc/security/limits.conf
vi /etc/security/limits.conf
Add:
* soft nofile 102400
* hard nofile 409600

$cd /etc/pam.d/
$sudo vi login
Add: session required /lib/security/pam_limits.so

A more accurate answer to the first problem:
This error comes from the shuffle stage that precedes the reduce phase: the number of failed attempts to fetch completed map output has exceeded the upper limit, which defaults to 5. There can be many causes, such as abnormal network connections, connection timeouts, poor bandwidth, blocked ports, and so on. This error usually means the network inside the cluster is in poor shape.

2: Too many fetch-failures
Answer:
The problem is that communication between nodes is not working properly.
1) Check /etc/hosts
   Each IP must map to one server name.
   The file must contain the IPs and names of all servers.
2) Check .ssh/authorized_keys
   It must contain the public keys of all servers (including the node itself).

3: Map finishes quickly, but reduce is very slow and repeatedly falls back to reduce=0%
Answer:
Apply the fixes from problem 2, and then
modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000

4: Datanodes can be started but cannot be accessed, and the error never goes away
When formatting a new distributed filesystem, you need to delete the directory configured as dfs.name.dir on the NameNode (the path on the local filesystem where the NameNode stores the persistent namespace and transaction log), and at the same time delete the directories configured as dfs.data.dir on the DataNodes (the paths on the local filesystem where the DataNodes store block data). For example, if the NameNode is configured with /home/hadoop/NameData, remove it, and remove /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that each time Hadoop formats a new distributed filesystem, it generates a new namespace version (see the VERSION file in the /home/hadoop/NameData/current directory, which records the version information). When re-formatting a new distributed filesystem it is best to delete the NameData directory, and you must delete the dfs.data.dir directories on all DataNodes, so that the version information recorded on the namenode matches that on the datanodes.
Note: deleting is a dangerous operation. Do not delete anything you cannot confirm, and back up files before deleting them!

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This usually happens because a node is down or unreachable.

6: java.lang.OutOfMemoryError: Java heap space
This exception clearly means the JVM has run out of memory; fix it by increasing the JVM memory size on all of the datanodes.
Java -Xms1024m -Xmx4096m
As a general rule the JVM's maximum heap should be about half of the total physical memory. Our machines have 8 GB, so we set 4096m, and even that may not be the optimal value.
The following material was contributed by a member of a Hadoop technology exchange group:
How to add nodes to a Hadoop cluster
To add a node:
1. Set up the environment on the new slave, including SSH, the JDK, and copies of the config, lib, and bin directories, etc.
2. Add the new datanode's host name to /etc/hosts on the namenode and the other datanodes.
3. Add the new datanode's IP to conf/slaves on the master.
4. Restart the cluster and check that the new datanode appears in the cluster.
5. Run bin/start-balancer.sh; this can be very time-consuming.
Remarks:
1. If you do not rebalance, the cluster will store new data on the new node, which reduces the efficiency of MR jobs.
2. You can also add the parameter -threshold 5 when calling bin/start-balancer.sh. The threshold is the balancing threshold, default 10%; a lower value balances the nodes more evenly but takes longer.
3. The balancer can also run while MR jobs are running; the default dfs.balance.bandwidthPerSec is low, 1 MB/s. When no MR job is running, raising this value speeds up rebalancing.

Other remarks:
1. Make sure the firewall on the slave is turned off.
2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and add the IPs of the master and the other slaves to /etc/hosts on the new slave.
Number of mappers and reducers
URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps/ 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes awhile, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces <<heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).
My own understanding:
The number of mappers depends on the input files and on how they are split into filesplits: the upper bound of a split is dfs.block.size, the lower bound can be set via mapred.min.split.size, and the final number is decided by the InputFormat.

Good advice:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.
<property>
   <name>mapred.tasktracker.reduce.tasks.maximum</name>
   <value>2</value>
   <description>The maximum number of reduce tasks that will be run
   simultaneously by a task tracker.
   </description>
</property>
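
As a rough sketch of how the task counts discussed above can be set through the old JobConf API (the node count and reduce-slot count here are hypothetical figures you would substitute yourself, not values Hadoop reports to the job):

import org.apache.hadoop.mapred.JobConf;

public class TaskCountSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TaskCountSketch.class);

        // Hypothetical cluster figures -- substitute your own.
        int nodes = 10;                // number of tasktracker nodes
        int reduceSlotsPerNode = 2;    // mapred.tasktracker.reduce.tasks.maximum

        // Only a hint to the InputFormat; the split logic may still create more maps.
        conf.setNumMapTasks(100);

        // 0.95 * nodes * slots lets all reduces start in a single wave.
        conf.setNumReduceTasks((int) (0.95 * nodes * reduceSlotsPerNode));
    }
}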

Adding a new hard disk to a single node
1. On the node with the new disk, modify dfs.data.dir so that the new and old directories are both listed, separated by a comma.
2. Restart DFS.

Synchronization of Hadoop code
hadoop-env.sh
# host:path where hadoop code should be rsync'd from.   Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

Merging small HDFS files with one command
hadoop fs -getmerge <src> <dest>

How to restart a reduce job
Recovery of jobs when the JobTracker restarts has been introduced. This facility is off by default.
The relevant config parameters are "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".
Not verified yet.

IO write operation problems
0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
       at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
       at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
       at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
       at java.lang.Thread.run(Thread.java:619)

It seems there are many reasons that it can timeout, the example given in
HADOOP-3831 is a slow reading client.

Solution: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.
My understanding is that this issue should be fixed in Hadoop 0.19.1 so that
we should leave the standard timeout. However until then this can help
resolve issues like the one you're seeing.

How to decommission HDFS nodes
The dfsadmin help text in the current version does not explain this clearly (a bug has been filed); the correct procedure is:
1. Set dfs.hosts to the current slaves file, giving the full path to the file. Note that the entries in the list must be host names, as returned by uname -n.
2. Put the names of the nodes to be decommissioned in another file, e.g. slaves.ex, and point the dfs.hosts.exclude parameter at it, again with the full path.
3. Run the command bin/hadoop dfsadmin -refreshNodes.
4. In the web interface, or with bin/hadoop dfsadmin -report, you can see the decommissioning node's state as "Decommission in progress" until the data that needs re-replication has been copied.
5. When it has finished, remove the decommissioned node from the slaves file (the file dfs.hosts points to).

Incidentally, the other three uses of the -refreshNodes command:
2. Allow a node back into the list (add its host name to dfs.hosts).
3. Remove a node directly, without re-replicating its data (remove its host name from dfs.hosts).
4. The inverse of decommissioning: remove entries from the exclude file while they are still in dfs.hosts, which stops the ongoing decommission and turns nodes in "Decommission in progress" back to Normal (called "in service" in the web interface).

Hadoop learning notes
1. Solving Hadoop OutOfMemoryError:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx800M -server</value>
</property>
Set the right JVM size in your hadoop-site.xml; you will have to copy this
to all mapred nodes and restart the cluster.
Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M
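
The same option can also be set from the job driver; a minimal sketch (the class name here is made up):

import org.apache.hadoop.mapred.JobConf;

public class ChildHeapSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ChildHeapSketch.class);
        // JVM options passed to each child (map/reduce) task process
        conf.set("mapred.child.java.opts", "-Xmx800M -server");
    }
}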

2. Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
When I used nutch 1.0 I got this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
Here is a good tip:
Remove conf/log4j.properties so that you can see the detailed error report.
In my case it was out of memory.
The solution was to run the main class org.apache.nutch.crawl.Crawl with the JVM parameters -Xms64m -Xmx512m.
You may not have exactly this problem, but looking at the detailed error report is a good way to find the right solution.

Using the distributed cache
It behaves like a global variable, but because the data is large it is not put in the config file; the distributed cache is used instead.
How to use it (see the definitive guide, P240):
1. From the command line: pass the lookup file with -files (it can be a local file or an HDFS file (hdfs://xxx?)), or use -archives for JARs, ZIPs, tar files, etc.
% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
   -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
2. In the program:
public void configure(JobConf conf) {
   metadata = new NcdcStationMetadata();
   try {
       metadata.initialize(new File("stations-fixed-width.txt"));
   } catch (IOException e) {
       throw new RuntimeException(e);
   }
}
Another, indirect way to use it (which does not seem to work in hadoop-0.19.0):
call addCacheFile() or addCacheArchive() to add files,
and getLocalCacheFiles() or getLocalCacheArchives() to retrieve them. A minimal sketch of this follows.
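
A minimal sketch of the indirect method, using the old mapred API; the HDFS path is made up for illustration:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSketch.class);
        // Register a file that is already in HDFS with the distributed cache (hypothetical path).
        DistributedCache.addCacheFile(new URI("/user/hadoop/stations-fixed-width.txt"), conf);
    }

    // Inside a mapper or reducer, the local copies can be found like this:
    static void readCache(JobConf conf) throws Exception {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        for (Path p : cached) {
            System.out.println("local cache copy: " + p);
        }
    }
}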

Hadoop job web interface
There are web-based interfaces to both the JobTracker (MapReduce master) and the NameNode (HDFS master) which display status pages about the state of the entire system. By default these are served on the JobTracker's and the NameNode's web ports (50030 and 50070 respectively).

Hadoop monitoring
Use Nagios for alerting and Ganglia for monitoring charts.

status of 255 error
Error type:
java.io.IOException: Task process exit with nonzero status of 255.
       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

The cause of the error:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values. By default, both are 24 hours. These might be the reason for the failure, though I'm not sure.

split size
FileInputFormat input splits: (see the definitive guide P190)
mapred.min.split.size: default=1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.
dfs.block.size: default=64 MB; on our system it is set to 128 MB.
If you set the minimum split size > block size, each split will span more than one block (presumably data then has to be fetched from other nodes and combined block by block, increasing the number of blocks involved).
If you set the maximum split size < block size, blocks will be split further.

split size = max(minimumSize, min(maximumSize, blockSize));
When minimumSize < blockSize < maximumSize, the split size is therefore blockSize.
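
A quick worked example of the formula above (the numbers are just the defaults discussed here):

public class SplitSizeExample {
    public static void main(String[] args) {
        long minimumSize = 1L;                     // mapred.min.split.size default
        long maximumSize = Long.MAX_VALUE;         // mapred.max.split.size default
        long blockSize   = 128L * 1024 * 1024;     // dfs.block.size of 128 MB

        // split size = max(minimumSize, min(maximumSize, blockSize))
        long splitSize = Math.max(minimumSize, Math.min(maximumSize, blockSize));
        System.out.println("split size = " + splitSize);   // 134217728, i.e. the block size
    }
}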

sort by value
Hadoop does not provide sort-by-value directly, because it would degrade MapReduce performance.
But it can be achieved with a combination of techniques; for the concrete implementation see the definitive guide, P250.
The basic idea:
1. Combine the key and the value into a new composite key.
2. Override the partitioner so that partitioning still uses only the old key:
conf.setPartitionerClass(FirstPartitioner.class);
3. Define a custom key comparator that sorts by the old key first and then by the old value:
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. Override the grouping comparator so that grouping also uses only the old key:
conf.setOutputValueGroupingComparator(GroupComparator.class);
A minimal sketch of the three classes follows.
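
A minimal sketch of the three classes, assuming the composite key is simply a Text of the form "oldKey\toldValue"; this is an illustrative implementation under that assumption, not the book's code:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SecondarySortSketch {

    // Extract the old key from the composite "oldKey\toldValue" form.
    static String oldKey(String composite) {
        int tab = composite.indexOf('\t');
        return tab < 0 ? composite : composite.substring(0, tab);
    }

    // 2. Partition on the old key only, so all values of one key go to one reducer.
    public static class FirstPartitioner implements Partitioner<Text, Text> {
        public void configure(JobConf job) {}
        public int getPartition(Text key, Text value, int numPartitions) {
            return (oldKey(key.toString()).hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // 3. Sort composite keys: old key first, then old value (plain lexicographic order
    //    works here because the tab separator sorts below printable characters).
    public static class KeyComparator extends WritableComparator {
        protected KeyComparator() { super(Text.class, true); }
        public int compare(WritableComparable a, WritableComparable b) {
            return a.toString().compareTo(b.toString());
        }
    }

    // 4. Group on the old key only, so one reduce() call sees all values of that key.
    public static class GroupComparator extends WritableComparator {
        protected GroupComparator() { super(Text.class, true); }
        public int compare(WritableComparable a, WritableComparable b) {
            return oldKey(a.toString()).compareTo(oldKey(b.toString()));
        }
    }
}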

Handling small input files
Using a large number of small files as input reduces Hadoop's efficiency.
There are three ways to deal with small files:
1. Pack the series of small files into a SequenceFile, which speeds up MapReduce.
See WholeFileInputFormat and SmallFilesToSequenceFileConverter, the definitive guide, P194.
2. Use CombineFileInputFormat, which builds on FileInputFormat (not implemented here).
3. Use Hadoop archives (similar to packaging) to reduce the NameNode memory consumed by small-file metadata. (This method turned out not to be practical and is not recommended.)
Method:
Archive the files under /my/files and its subdirectories into files.har, placed in the /my directory:
bin/hadoop archive -archiveName files.har /my/files /my

To list the files in the archive:
bin/hadoop fs -lsr har://my/files.har

skip bad records
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE); // widest acceptable skip range around a bad record
SkipBadRecords.setAttemptsToStartSkipping(conf, 0); // enter skipping mode from the first task attempt
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/")); // where skipped records are written
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

For skipping failed tasks, try: mapred.max.map.failures.percent

Restarting a single datanode
If a datanode had a problem and, after it is fixed, needs to rejoin the cluster without restarting the whole cluster, do the following:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

reduce exceeds 100%
"Reduce Task Progress shows > 100% when the total size of map outputs (for a single reducer) is high."
Cause:
During the merge step of the reduce phase the progress calculation goes wrong, so the status can exceed 100%, and the following error appears when the status page computes its statistics: java.lang.ArrayIndexOutOfBoundsException: 3
       at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)
       at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
       at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
       at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
       at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
       at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
       at org.mortbay.http.HttpServer.service(HttpServer.java:954)

JIRA address:

counters
There are three kinds of counters:
1. Built-in counters: Map input bytes, Map output records, ...
2. Enum counters
How to use:
   enum Temperature {
MISSING,
MALFORMED
   }

reporter.incrCounter(Temperature.MISSING, 1)
The results show:
09/04/20 06:33:36 INFO mapred.JobClient: Air Temperature Recor
09/04/20 06:33:36 INFO mapred.JobClient:     Malformed=3
09/04/20 06:33:36 INFO mapred.JobClient:     Missing=66136856
3. Dynamic counters:
How to use:
reporter.incrCounter("TemperatureQuality", parser.getQuality(),1);

The results show:
09/04/20 06:33:36 INFO mapred.JobClient: TemperatureQuality
09/04/20 06:33:36 INFO mapred.JobClient:     2=1246032
09/04/20 06:33:36 INFO mapred.JobClient:     1=973422173
09/04/20 06:33:36 INFO mapred.JobClient:     0=1
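
A minimal sketch of where these calls sit in a mapper (old mapred API; the record handling and the dynamic counter group are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CounterMapperSketch extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {

    enum Temperature { MISSING, MALFORMED }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, NullWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        if (line.isEmpty()) {
            // enum counter: a condition named at compile time
            reporter.incrCounter(Temperature.MISSING, 1);
            return;
        }
        // dynamic counter: group and counter name chosen at run time
        reporter.incrCounter("LineLength", line.length() < 80 ? "short" : "long", 1);
        output.collect(value, NullWritable.get());
    }
}
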
7: Namenode in safe mode 
Solution
bin/hadoop dfsadmin -safemode leave

8:java.net.NoRouteToHostException: No route to host
Solution:
sudo /etc/init.d/iptables stop

9: After the namenode changes, Hive still points to the old namenode address
This is because when you create a table, Hive stores the table's location (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore. So when I bring up a new cluster, the master has a new IP, but Hive's metastore still points to the locations within the old
cluster. I could modify the metastore to update it with the new IP every time I bring up a cluster, but the easier and simpler solution was to just use an elastic IP for the master.
So: replace every occurrence of the old namenode address in the metastore with the current namenode address.


10: Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).
Solution:
Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port set by dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number that tells you how many DataNodes you have, to see a list of the DataNodes in your cluster.
If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).
If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
11: Your DataNodes won't start, and you see something like this in logs/*datanode*:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Reason:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.
Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
12: You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.
Reason:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
   -mapper   $HOME/proj/hadoop/multifetch.py       \
   -reducer $HOME/proj/hadoop/reducer.py          \
   -input urls/*                               \
   -output   titles
13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"
Reason: org.jpox.fixedDatastore is set to true in hive-default.xml.
starting namenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
localhost: Exception in thread "main" java.lang.NullPointerException
localhost:    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
localhost:    at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
localhost:    at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:120)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:124)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.<init>(SecondaryNameNode.java:108)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)
14: 09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001
bad datanode[2] nodes == null
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input"
- Aborting...
> put: Bad connect ack with firstBadLink 192.168.1.16:50010


Solution:
I resolved the issue. What I did:

1) '/etc/init.d/iptables stop' --> stopped the firewall
2) Set SELINUX=disabled in '/etc/selinux/config' --> disabled SELinux
It worked for me after these two changes.
Working around jline.ConsoleReader.readLine not working on Windows
In the main() function of CliDriver.java there is a reader.readLine statement used to read standard input, but on Windows this statement always returns null (reader is an instance of jline.ConsoleReader), which makes debugging from Eclipse on Windows inconvenient.
We can replace it with java.util.Scanner. The original:
while ((line=reader.readLine(curPrompt+"> ")) != null)
becomes:
Scanner sc = new Scanner(System.in);
while (sc.hasNextLine() && (line = sc.nextLine()) != null)
(Scanner.nextLine() never returns null at end of input, so hasNextLine() is added to the loop condition.)
Recompile and release, and SQL statements can then be read normally from standard input.

Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows
1. In the Hive configuration file, "hive.metastore.local" is set to false; change it to true, because this is a standalone setup.
2. The HIVE_HOME environment variable is not set, or is set incorrectly.
3. "does not have a scheme" most likely means that "hive-default.xml" cannot be found when debugging Hive from Eclipse.
1. Chinese character problem
Chinese text fetched from a URL is printed by Hadoop as garbled characters. We used to think Hadoop simply did not support Chinese; after reading the source code, we found that Hadoop just does not support writing Chinese output in GBK.
The code below is from TextOutputFormat.class. Hadoop's default output classes inherit from FileOutputFormat, which has two subclasses: one writes a binary stream, and the other, TextOutputFormat, writes text.
public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
   protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
private static final String utf8 = "UTF-8"; // the output encoding is hard-coded to UTF-8 here
private static final byte[] newline;
static {
   try {
       newline = "\n".getBytes(utf8);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + utf8 + " encoding");
   }
}

public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
   this.out = out;
   try {
       this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + utf8 + " encoding");
   }
}

private void writeObject(Object o) throws IOException {
   if (o instanceof Text) {
       Text to = (Text) o;
       out.write(to.getBytes(), 0, to.getLength()); // this line also needs to change for GBK output
   } else {
       out.write(o.toString().getBytes(utf8));
   }
}

}
As you can see, Hadoop's default output hard-codes UTF-8. So if you want the Chinese decoded correctly, set the Linux client's character set to UTF-8 and you will see the Chinese, because Hadoop writes its Chinese output in UTF-8.
But most databases define their fields in GBK. What if we want Hadoop to output GBK so that the Chinese is compatible with the database?
We can define a new class:
public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
   protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
// just use GBK here instead of UTF-8
private static final String gbk = "gbk";
private static final byte[] newline;
static {
   try {
       newline = "\n".getBytes(gbk);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + gbk + " encoding");
   }
}

public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
   this.out = out;
   try {
       this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + gbk + " encoding");
   }
}

private void writeObject(Object o) throws IOException {
   // encode everything, Text or not, as GBK (TextOutputFormat wrote Text bytes out as UTF-8 here)
   out.write(o.toString().getBytes(gbk));
}

}
Then, in the MapReduce code, add conf1.setOutputFormat(GbkOutputFormat.class)
and the output will be in GBK, with Chinese displayed correctly.
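
A minimal driver sketch showing where that call goes, assuming GbkOutputFormat has been completed along the lines of TextOutputFormat (the excerpt above omits getRecordWriter); the paths are made up:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class GbkJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf1 = new JobConf(GbkJobDriver.class);
        conf1.setJobName("gbk-output");
        // with the default identity mapper/reducer the keys are byte offsets and the values are lines
        conf1.setOutputKeyClass(LongWritable.class);
        conf1.setOutputValueClass(Text.class);
        // use the GBK output format defined above instead of the default TextOutputFormat
        conf1.setOutputFormat(GbkOutputFormat.class);
        FileInputFormat.addInputPath(conf1, new Path("in"));     // hypothetical input path
        FileOutputFormat.setOutputPath(conf1, new Path("out"));  // hypothetical output path
        JobClient.runJob(conf1);
    }
}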

2. A normally running MapReduce job throws this error:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
Investigation showed that the cause was the Linux machines having too many open files. The ulimit -n command shows that the Linux default limit on open files is 1024; modify /etc/security/limits.conf and add a line such as: hadoop soft nofile 65535

Then run the program again (ideally after changing all the datanodes), and the problem is solved.

3. After running for a while, Hadoop cannot be stopped with stop-all.sh; it reports:
no tasktracker to stop, no datanode to stop
The cause is that when Hadoop stops, it relies on the recorded pid files of the mapred and DFS processes. By default these pid files are stored in /tmp, and Linux periodically (usually every month or every 7 days) deletes files in that directory. Once hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the namenode can no longer find those two processes on the datanodes.
Setting export HADOOP_PID_DIR in the configuration file (hadoop-env.sh) solves the problem.


The problem:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
Reason:
Every time hadoop namenode -format is run, a new namespaceID is generated on the NameNode, but the DataNode still keeps the previous namespaceID under its hadoop.tmp.dir directory. Because the namespaceIDs do not match, the DataNode cannot start. So simply remove the hadoop.tmp.dir directory before each hadoop namenode -format and it will start successfully. Note that what must be deleted is the local directory that hadoop.tmp.dir points to, not an HDFS directory.
Problem: Storage directory not exist
2010-02-09 21:37:53,203 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory D:\hadoop\run\dfs_name_dir does not exist.
2010-02-09 21:37:53,203 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:\hadoop\run\dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.
Solution: the storage directory D:\hadoop\run\dfs_name_dir does not exist, so just create the directory manually.
Problem: NameNode is not formatted 
Solution: HDFS has not been formatted yet; just run hadoop namenode -format and then start it.
Running jps reports the following exception:
Exception in thread "main" java.lang.NullPointerException
       at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
       at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
       at sun.tools.jps.Jps.main(Jps.java:45)
The reason:
The system's /tmp directory has been deleted. Recreate the /tmp directory.
Hive's "Unable to create log directory /tmp/..." error may also have the same cause.