Solved: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try

Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[172.27.10.191:50010,DS-51d68378-35de-4c70-b27e-7a98a53919cc,DISK], DatanodeInfoWithStorage[172.27.10.223:50010,DS-f172c682-713d-4a8f-b8af-69198ddc6756,DISK]], original=[DatanodeInfoWithStorage[172.27.10.191:50010,DS-51d68378-35de-4c70-b27e-7a98a53919cc,DISK], DatanodeInfoWithStorage[172.27.10.223:50010,DS-f172c682-713d-4a8f-b8af-69198ddc6756,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via ‘dfs.client.block.write.replace-datanode-on-failure.policy’ in its configuration.

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454)

[2017-11-10 07:08:03,097] ERROR Task hdfs-sink-prqt-stndln-rpro-fct-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:455)

java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid partition for default.revpro-perf-oracle-jdbc-stdln-raw-KFK_RPRO_RI_WF_SUMM_GV: partition=0

at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:226)

at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:103)

at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:435)

at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:251)

at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:180)

at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)

at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)

at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid partition for default.revpro-perf-oracle-jdbc-stdln-raw-KFK_RPRO_RI_WF_SUMM_GV: partition=0

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:220)

… 12 more

Caused by: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid partition for default.revpro-perf-oracle-jdbc-stdln-raw-KFK_RPRO_RI_WF_SUMM_GV: partition=0

at io.confluent.connect.hdfs.hive.HiveMetaStore.addPartition(HiveMetaStore.java:107)

at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:662)

at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:659)

… 4 more

Caused by: InvalidObjectException(message:default.revpro-perf-oracle-jdbc-stdln-raw-KFK_RPRO_RI_WF_SUMM_GV table not found)

at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51619)

at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51596)

at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result.read(ThriftHiveMetastore.java:51519)

at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)

at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1667)

at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1651)

at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:606)

at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:600)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)

at com.sun.proxy.$Proxy51.appendPartition(Unknown Source)

at io.confluent.connect.hdfs.hive.HiveMetaStore$1.call(HiveMetaStore.java:97)

at io.confluent.connect.hdfs.hive.HiveMetaStore$1.call(HiveMetaStore.java:91)

at io.confluent.connect.hdfs.hive.HiveMetaStore.doAction(HiveMetaStore.java:87)

at io.confluent.connect.hdfs.hive.HiveMetaStore.addPartition(HiveMetaStore.java:103)

… 6 more

Root cause:

This error typically occurs on clusters with just three or fewer DataNodes, which can hit append failures under heavy load. The configuration parameters below fix it.

Solution:

Add the following configuration to the hdfs-site.xml file on the client side (not on the NameNode or DataNodes):

 

<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
<description>
If there is a datanode/network failure in the write pipeline,
DFSClient will try to remove the failed datanode from the pipeline
and then continue writing with the remaining datanodes. As a result,
the number of datanodes in the pipeline is decreased. The feature is
to add new datanodes to the pipeline.

This is a site-wide property to enable/disable the feature.

When the cluster size is extremely small, e.g. 3 nodes or less, cluster
administrators may want to set the policy to NEVER in the default
configuration file or disable this feature. Otherwise, users may
experience an unusually high rate of pipeline failures since it is
impossible to find new datanodes for replacement.

See also dfs.client.block.write.replace-datanode-on-failure.policy
</description>
</property>

<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>DEFAULT</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.

ALWAYS: always add a new datanode when an existing datanode is removed.

NEVER: never add a new datanode.

DEFAULT:
Let r be the replication number.
Let n be the number of existing datanodes.
Add a new datanode only if r is greater than or equal to 3 and either
(1) floor(r/2) is greater than or equal to n; or
(2) r is greater than n and the block is hflushed/appended.
</description>
</property>

<property>
<name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
<value>false</value>
<description>
This property is used only if the value of
dfs.client.block.write.replace-datanode-on-failure.enable is true.

Best effort means that the client will try to replace a failed datanode
in write pipeline (provided that the policy is satisfied), however, it
continues the write operation in case that the datanode replacement also
fails.

Suppose the datanode replacement fails.
false: An exception should be thrown so that the write will fail.
true : The write should be resumed with the remaining datanodes.

Note that setting this property to true allows writing to a pipeline
with a smaller number of datanodes. As a result, it increases the
probability of data loss.
</description>
</property>
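
To make the DEFAULT policy concrete, here is a small illustrative sketch (in Java, not the actual HDFS client code) of the rule quoted in the property description above, where r is the replication factor and n is the number of datanodes still in the pipeline:

// Illustrative only: mirrors the DEFAULT policy text above, not the HDFS source.
static boolean shouldAddDatanode(int r, int n, boolean hflushedOrAppended) {
    if (r < 3) {
        return false;                     // the rule only applies when replication >= 3
    }
    if (r / 2 >= n) {                     // (1) floor(r/2) >= n
        return true;
    }
    return r > n && hflushedOrAppended;   // (2) r > n and the block is hflushed/appended
}

For example, with replication r = 3 and n = 2 datanodes left after a failure during an append, condition (2) holds, so the client asks for a replacement datanode; on a 3-node cluster there is often no spare node to offer, which is exactly the exception shown at the top of this post.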

The property descriptions above explain each of these settings in more detail.
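
If your HDFS client builds its own Hadoop Configuration instead of picking the values up from hdfs-site.xml, the same settings can be applied programmatically. This is a minimal, generic sketch (a plain Java client; it is not specific to the Kafka Connect HDFS sink shown in the log above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsClientConfigExample {
    public static FileSystem openFileSystem() throws IOException {
        // Mirror the client-side hdfs-site.xml entries shown above.
        Configuration conf = new Configuration();
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", false);
        return FileSystem.get(conf); // writes through this FileSystem use the settings above
    }
}

On a very small cluster you may instead set the policy to NEVER (or best-effort to true), as the property descriptions explain, trading some durability for fewer pipeline failures.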


Amazon Aurora MySQL Command Line

Connecting to a Database on a DB Instance Running the MySQL Database Engine

Once Amazon RDS provisions your DB instance, you can use any standard SQL client application to connect to a database on the DB instance. In this example, you connect to a database on a MySQL DB instance using MySQL monitor commands. One GUI-based application you can use to connect is MySQL Workbench; for more information, go to the Download MySQL Workbench page. For more information on using MySQL, go to the MySQL documentation.

 

To connect to a database on a DB instance using MySQL monitor

  1. Find the endpoint (DNS name) and port number for your DB Instance.
    1. Open the RDS console and then choose Instances to display a list of your DB instances.
    2. Choose the MySQL DB instance and choose See details from Instance actions to display the details for the DB instance.
    3. Scroll to the Connect section and copy the endpoint. Also, note the port number. You need both the endpoint and the port number to connect to the DB instance.
  2. Type the following command at a command prompt on a client computer to connect to a database on a MySQL DB instance using the MySQL monitor. Substitute the DNS name for your DB instance for <endpoint>, the master user name you used for <mymasteruser>, and the master password you used for <password>.
    spradhan.macosx$ mysql -h <endpoint> -P 3306 -u <mymasteruser> -p

    You should see output similar to the following.

    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 350
    Server version: 5.6.27-log MySQL Community Server (GPL)
    
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    
    mysql>
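
If you would rather connect programmatically than through the MySQL monitor, the following is a minimal JDBC sketch. The endpoint, database name, user, and password are placeholders to substitute, and it assumes the MySQL Connector/J driver is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RdsMySqlConnectExample {
    public static void main(String[] args) throws Exception {
        // Substitute your RDS endpoint, database, master user name, and password.
        String url = "jdbc:mysql://<endpoint>:3306/<databasename>";
        try (Connection conn = DriverManager.getConnection(url, "<mymasteruser>", "<password>");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT VERSION()")) {
            while (rs.next()) {
                System.out.println("Connected, server version: " + rs.getString(1));
            }
        }
    }
}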

 

Basic MySQL command-line operations

To login (from unix shell) use -h only if needed.
# [mysql dir]/bin/mysql -h hostname -u username -ppassword
To login (from windows)
[mysql dir]/bin/mysql.exe -h hostname -u username -ppassword

Create a database.
mysql> create database [databasename];

List all databases on the server.
mysql> show databases;

Switch to a database.
mysql> use [db name];

To see all the tables in the db.
mysql> show tables;

To see table’s field formats.
mysql> describe [table name];

To delete a db.
mysql> drop database [database name];

To delete a table.
mysql> drop table [table name];

Show all data from a table.
mysql> SELECT * FROM [table name];

To return columns and column information.
mysql> show columns from [table name];

Show particular rows with the given value.
mysql> SELECT * FROM [table name] WHERE [field name] = "value";

Show all records containing the name "Something" AND the phone number '0123456789'.
mysql> SELECT * FROM [table name] WHERE name = "Something" AND phone_number = '0123456789';

Show all records not containing the name "Something" AND the phone number '0123456789', ordered by the phone_number field.
mysql> SELECT * FROM [table name] WHERE name != "Something" AND phone_number = '0123456789' order by phone_number;

Show all records starting with the letters 'Something' AND the phone number '0123456789'.
mysql> SELECT * FROM [table name] WHERE name like "Something%" AND phone_number = '0123456789';

Show all records starting with the letters 'Something' AND the phone number '0123456789', limited to records 1 through 5.
mysql> SELECT * FROM [table name] WHERE name like "Something%" AND phone_number = '0123456789' limit 1,5;

Use a regular expression to find records. Use "REGEXP BINARY" to force case-sensitivity. This finds any record beginning with a.
mysql> SELECT * FROM [table name] WHERE rec RLIKE "^a";

Show unique records.
mysql> SELECT DISTINCT [column name] FROM [table name];

Show selected records sorted in ascending (asc) or descending (desc) order.
mysql> SELECT [col1],[col2] FROM [table name] ORDER BY [col2] DESC;

Return number of rows.
mysql> SELECT COUNT(*) FROM [table name];

Sum a column.
mysql> SELECT SUM([column name]) FROM [table name];

Creating a new user. Login as root. Switch to the MySQL db. Make the user. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> INSERT INTO user (Host,User,Password) VALUES('%','username',PASSWORD('password'));
mysql> flush privileges;

Change a user's password from the unix shell.
# [mysql dir]/bin/mysqladmin -u username -h hostname -ppassword password 'new-password'

Change a user's password from the MySQL prompt. Login as root. Set the password. Update privileges.
# mysql -u root -p
mysql> SET PASSWORD FOR 'user'@'hostname' = PASSWORD('password');
mysql> flush privileges;

Recover a MySQL root password :

  1. Stop the MySQL server process.
  2. Start again with no grant tables.
  3. Login to MySQL as root. Set new password.
  4. Exit MySQL and restart MySQL server.
    # /etc/init.d/mysql stop
    # mysqld_safe --skip-grant-tables &
    # mysql -u root
    mysql> use mysql;
    mysql> update user set password=PASSWORD("newpassword") where User='root';
    mysql> flush privileges;
    mysql> quit
    # /etc/init.d/mysql stop
    # /etc/init.d/mysql start

Set a root password if there is no root password.
# mysqladmin -u root password newpassword

Update a root password.
# mysqladmin -u root -p'oldpassword' password 'newpassword'

Allow the user "Someone" to connect to the server from localhost using the password "passwd". Login as root. Switch to the MySQL db. Give privs. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> grant usage on *.* to Someone@localhost identified by 'passwd';
mysql> flush privileges;

Give a user privileges for a db. Login as root. Switch to the MySQL db. Grant privs. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> INSERT INTO db (Host,Db,User,Select_priv,Insert_priv,Update_priv,Delete_priv,Create_priv,Drop_priv)
VALUES ('%','databasename','username','Y','Y','Y','Y','Y','N');
mysql> flush privileges;

or

mysql> grant all privileges on databasename.* to username@localhost;
mysql> flush privileges;

To update info already in a table.
mysql> UPDATE [table name] SET Select_priv = 'Y',Insert_priv = 'Y',Update_priv = 'Y' where [field name] = 'user';

Delete a row(s) from a table.
mysql> DELETE from [table name] where [field name] = 'fieldvalue';

Update database permissions/privileges.
mysql> flush privileges;

Delete a column.
mysql> alter table [table name] drop column [column name];

Add a new column to db.
mysql> alter table [table name] add column [new column name] varchar (20);

Change column name.
mysql> alter table [table name] change [old column name] [new column name] varchar (50);

Make a unique column so you get no dupes.
mysql> alter table [table name] add unique ([column name]);

Change the size of a column.
mysql> alter table [table name] modify [column name] VARCHAR(3);

Delete a unique index from a table.
mysql> alter table [table name] drop index [column name];

Load a CSV file into a table.
mysql> LOAD DATA INFILE '/tmp/filename.csv' replace INTO TABLE [table name] FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (field1,field2,field3);

Dump all databases for backup. The backup file is SQL commands to recreate all databases.
# mysqldump -u username -ppassword --opt > /tmp/alldatabases.sql

Dump one database for backup.
# mysqldump -u username -ppassword --databases databasename > /tmp/databasename.sql

Dump a table from a database.
# mysqldump -u username -ppassword databasename tablename > /tmp/databasename.tablename.sql

Restore database (or database table) from backup.
# mysql -u username -ppassword databasename < /tmp/databasename.sql

Create Table Example 1.
mysql> CREATE TABLE employee (name VARCHAR(20));

Create Table Example 2.
mysql> create table person (personid int(50) not null auto_increment primary key,firstname varchar(35),middlename varchar(50),lastname varchar(50) default 'UNKNOWN');

Create and insert into a Hive table example

Create table :

hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));

OK
Time taken: 1.084 seconds

List tables :

hive> show tables;
OK
students
values__tmp__table__1
Time taken: 0.023 seconds, Fetched: 2 row(s)

Describe table :

hive> describe students;

OK

name                 varchar(64)

age                 int

gpa                 decimal(3,2)

Time taken: 0.052 seconds, Fetched: 3 row(s)

Insert sample data to the above created table :

hive> INSERT INTO TABLE students VALUES ('ABC', 35, 1.28), ('DEF', 32, 2.32), ('KLM', 37, 3.22);
Query ID = root_20180206225557_c24602db-d9bf-480c-ac98-ece3fe4381e8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1517954394617_0001)


Show data from the above table

hive> select * from students;

OK

ABC 35 1.28

DEF 32 2.32

KLM 37 3.22

Time taken: 0.212 seconds, Fetched: 3 row(s)
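
The same table can also be queried outside the Hive CLI. Below is a minimal sketch using the Hive JDBC driver; the HiveServer2 host and port are placeholders, the user is arbitrary, and hive-jdbc must be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveStudentsQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the driver (older driver versions need this)
        String url = "jdbc:hive2://<hiveserver2-host>:10000/default"; // placeholder host
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name, age, gpa FROM students")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getInt(2) + " " + rs.getBigDecimal(3));
            }
        }
    }
}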

Create and insert into an HBase table example

Login into master node :

[ec2-user@ip-123-45-67-89 ~]$ sudo hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.1, rUnknown, Fri Sep 22 21:28:57 UTC 2017

hbase(main):001:0> list

Create table :

Command syntax

create '<table_name>', '<column_family>'

Example

hbase(main):004:0> create 'employee_hbase', 'cf1'

Insert data into above table :

hbase(main):008:0> put 'employee_hbase', 'r1', 'cf1:empid', '111'
hbase(main):008:0> put 'employee_hbase', 'r1', 'cf1:name', 'sudhir'
hbase(main):008:0> put 'employee_hbase', 'r1', 'cf1:dept', 'RnD'

Show data from the table :

hbase(main):015:0> get 'employee_hbase', 'r1'
COLUMN CELL
cf1:dept timestamp=1517963999636, value=RnD
cf1:empid timestamp=1517963967215, value=111
cf1:name timestamp=1517963994900, value=sudhir
1 row(s) in 0.0080 seconds

Count rows :

hbase(main):025:0> count 'employee_hbase', { INTERVAL => 1 }
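
The same put/get can be done from a Java client as well. This is a minimal sketch with the HBase client API; it assumes hbase-site.xml (ZooKeeper quorum and so on) is on the classpath, reuses the employee_hbase table created above, and writes a hypothetical row r2:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EmployeeHbaseClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("employee_hbase"))) {
            // Insert one cell into column family cf1 for a new row key r2.
            Put put = new Put(Bytes.toBytes("r2"));
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("empid"), Bytes.toBytes("112"));
            table.put(put);

            // Read the row back, same as `get 'employee_hbase', 'r2'` in the shell.
            Result result = table.get(new Get(Bytes.toBytes("r2")));
            System.out.println("empid = " + Bytes.toString(
                    result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("empid"))));
        }
    }
}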

Where is emrfs-site.xml?

The emrfs-site.xml file is created when EMRFS is enabled while creating the EMR cluster in AWS. You can manage the related configuration by logging into the master node and editing the file at the following location:

[ec2-user@ip-123-45-67-89 ~]$ ls -ltr /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
-rw-r--r-- 1 root root 609 Feb 6 21:59 /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
[ec2-user@ip-123-45-67-89 ~]$

 

[ec2-user@ip-123-45-67-89 ~]$ cat /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
<name>fs.s3.consistent.retryPeriodSeconds</name>
<value>10</value>
</property>

<property>
<name>fs.s3.consistent.retryCount</name>
<value>5</value>
</property>

<property>
<name>fs.s3.consistent</name>
<value>true</value>
</property>

<property>
<name>fs.s3.consistent.metadata.tableName</name>
<value>EmrFSMetadataTest</value>
</property>

<property>
<name>fs.s3.maxConnections</name>
<value>10000</value>
</property>

</configuration>