ERROR when writing a file to an S3 bucket from an EMRFS-enabled Spark cluster

ERROR :

18/03/02 01:42:17 INFO RetryInvocationHandler: Exception while invoking ConsistencyCheckerS3FileSystem.mkdirs over null. Retrying after sleeping for 10000ms. com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'bucket/folder/_temporary' present in the metadata but not s3 at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:506)

 

Root cause :

The consistency error usually appears when the EMRFS metadata in DynamoDB falls out of sync with S3, for example due to:

  • Manual deletion of files or directories from the S3 console, which bypasses EMRFS.
  • Retry logic in Spark and Hadoop re-running a step whose metadata was already written.
  • A file-creation request to S3 failing after its entry was already recorded in DynamoDB.
  • Hadoop restarting the process and finding the entry already present in DynamoDB, at which point it throws the consistency error.

Solution :

Re-run your Spark job after cleaning up the EMRFS metadata in DynamoDB.

Follow the steps below to clean up and restore the intended directory in the S3 bucket.

 

First delete the metadata for all objects in the path. Note that emrfs delete locates records by hash, so it may also remove entries you did not intend to touch; the import and sync steps that follow rebuild the metadata from what is actually present in S3.

Delete all the metadata

emrfs delete s3://<bucket>/path

Then import metadata for the objects that are physically present in S3 into DynamoDB.

emrfs import s3://<bucket>/path 

Sync the metadata with the data in S3.

emrfs sync s3://<bucket>/path 

After all the operations, verify that the objects are present in both S3 and the metadata.

emrfs diff s3://<bucket>/path 
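Putting the four steps together, a minimal cleanup script might look like the following sketch (the bucket and path below are placeholders; substitute your own and run on the EMR master node):

#!/bin/bash
# Hypothetical wrapper around the emrfs steps above.
BUCKET_PATH="s3://<bucket>/path"

emrfs delete "$BUCKET_PATH"   # drop the stale metadata from DynamoDB
emrfs import "$BUCKET_PATH"   # re-register the objects physically present in S3
emrfs sync "$BUCKET_PATH"     # reconcile metadata with the S3 contents
emrfs diff "$BUCKET_PATH"     # verify: should report no differences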

Amazon Aurora MySQL Command Line

Connecting to a Database on a DB Instance Running the MySQL Database Engine

Once Amazon RDS provisions your DB instance, you can use any standard SQL client application to connect to a database on the DB instance. In this example, you connect to a database on a MySQL DB instance using MySQL monitor commands. One GUI-based application you can use to connect is MySQL Workbench; for more information, go to the Download MySQL Workbench page. For more information on using MySQL, go to the MySQL documentation.

 

To connect to a database on a DB instance using MySQL monitor

  1. Find the endpoint (DNS name) and port number for your DB Instance.
    1. Open the RDS console and then choose Instances to display a list of your DB instances.
    2. Choose the MySQL DB instance and choose See details from Instance actions to display the details for the DB instance.
    3. Scroll to the Connect section and copy the endpoint. Also, note the port number. You need both the endpoint and the port number to connect to the DB instance.
  2. Type the following command at a command prompt on a client computer to connect to a database on a MySQL DB instance using the MySQL monitor. Substitute the DNS name for your DB instance for <endpoint>, the master user name you used for <mymasteruser>, and the master password you used for <password>.
    spradhan.macosx$ mysql -h <endpoint> -P 3306 -u <mymasteruser> -p

    You should see output similar to the following.

    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 350
    Server version: 5.6.27-log MySQL Community Server (GPL)
    
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    
    mysql>
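If you prefer the command line over the console, the endpoint and port from step 1 can also be retrieved with the AWS CLI (the instance identifier below is a placeholder):

$ aws rds describe-db-instances --db-instance-identifier <mydbinstance> --query 'DBInstances[0].Endpoint.[Address,Port]' --output text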

 

Basic MySQL Command-Line Operations

To log in (from a Unix shell); use -h only if needed.
# [mysql dir]/bin/mysql -h hostname -u username -ppassword
To log in (from Windows):
[mysql dir]/bin/mysql.exe -h hostname -u username -ppassword

Create a database.
mysql> create database [databasename];

List all databases on the server.
mysql> show databases;

Switch to a database.
mysql> use [db name];

To see all the tables in the db.
mysql> show tables;

To see table’s field formats.
mysql> describe [table name];

To delete a db.
mysql> drop database [database name];

To delete a table.
mysql> drop table [table name];

Show all data from a table.
mysql> SELECT * FROM [table name];

To return columns and column information.
mysql> show columns from [table name];

Show particular rows with the given value.
mysql> SELECT * FROM [table name] WHERE [field name] = "value";

Show all records containing the name “Something” AND the phone number ‘0123456789’.
mysql> SELECT * FROM [table name] WHERE name = "Something" AND phone_number = '0123456789';

Show all records not containing the name “Something” AND the phone number ‘0123456789’ order by the phone_number field.
mysql> SELECT * FROM [table name] WHERE name != "Something" AND phone_number = '0123456789' order by phone_number;

Show all records starting with the letters ‘Something‘ AND the phone number ‘0123456789’.
mysql> SELECT * FROM [table name] WHERE name like "Something%" AND phone_number = '0123456789';

Show all records starting with the letters 'Something' AND the phone number '0123456789', limited to 5 records starting at offset 1 (i.e., rows 2 through 6).
mysql> SELECT * FROM [table name] WHERE name like "Something%" AND phone_number = '0123456789' limit 1,5;

Use a regular expression to find records. Use “REGEXP BINARY” to force case-sensitivity. This finds any record beginning with a.
mysql> SELECT * FROM [table name] WHERE rec RLIKE "^a";

Show unique records.
mysql> SELECT DISTINCT [column name] FROM [table name];

Show selected records sorted in ascending (asc) or descending (desc) order.
mysql> SELECT [col1],[col2] FROM [table name] ORDER BY [col2] DESC;

Return number of rows.
mysql> SELECT COUNT(*) FROM [table name];

Sum a column.
mysql> SELECT SUM([column name]) FROM [table name];

Creating a new user. Login as root. Switch to the MySQL db. Make the user. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> INSERT INTO user (Host,User,Password) VALUES('%','username',PASSWORD('password'));
mysql> flush privileges;
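Note: inserting directly into the mysql.user table as above only works on older servers; MySQL 5.7 and later dropped the Password column, and CREATE USER is the supported route. A non-interactive sketch, with placeholder user name and password:

# mysql -u root -p -e "CREATE USER 'username'@'%' IDENTIFIED BY 'password'; FLUSH PRIVILEGES;"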

Change a user's password from the Unix shell.
# [mysql dir]/bin/mysqladmin -u username -h hostname -p password 'new-password'

Change a user's password from the MySQL prompt. Login as root. Set the password. Update privileges.
# mysql -u root -p
mysql> SET PASSWORD FOR 'user'@'hostname' = PASSWORD('password');
mysql> flush privileges;

Recover a MySQL root password :

  1. Stop the MySQL server process.
  2. Start again with no grant tables.
  3. Login to MySQL as root. Set new password.
  4. Exit MySQL and restart MySQL server.
    # /etc/init.d/mysql stop
    # mysqld_safe --skip-grant-tables
    # mysql -u root
    mysql> use mysql;
    mysql> update user set password=PASSWORD("newpassword") where User='root';
    mysql> flush privileges;
    mysql> quit
    # /etc/init.d/mysql stop
    # /etc/init.d/mysql start

Set a root password if there is no root password.
# mysqladmin -u root password newpassword

Update a root password.
# mysqladmin -u root -poldpassword password newpassword

Allow the user “Someone” to connect to the server from localhost using the password “passwd”. Login as root. Switch to the MySQL db. Give privs. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> grant usage on *.* to Someone@localhost identified by 'passwd';
mysql> flush privileges;

Give user privileges for a db. Login as root. Switch to the MySQL db. Grant privs. Update privs.
# mysql -u root -p
mysql> use mysql;
mysql> INSERT INTO user (Host,Db,User,Select_priv,Insert_priv,Update_priv,Delete_priv,Create_priv,Drop_priv)
VALUES ('%','databasename','username','Y','Y','Y','Y','Y','N');
mysql> flush privileges;

or

mysql> grant all privileges on databasename.* to username@localhost;
mysql> flush privileges;

To update info already in a table.
mysql> UPDATE [table name] SET Select_priv = 'Y',Insert_priv = 'Y',Update_priv = 'Y' where [field name] = 'user';

Delete a row(s) from a table.
mysql> DELETE from [table name] where [field name] = 'fieldvalue';

Update database permissions/privileges.
mysql> flush privileges;

Delete a column.
mysql> alter table [table name] drop column [column name];

Add a new column to a table.
mysql> alter table [table name] add column [new column name] varchar(20);

Change column name.
mysql> alter table [table name] change [old column name] [new column name] varchar(50);

Make a unique column so you get no dupes.
mysql> alter table [table name] add unique ([column name]);

Make a column bigger (here, widening to 255 characters).
mysql> alter table [table name] modify [column name] VARCHAR(255);

Remove a unique index from a table.
mysql> alter table [table name] drop index [column name];

Load a CSV file into a table.
mysql> LOAD DATA INFILE '/tmp/filename.csv' replace INTO TABLE [table name] FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (field1,field2,field3);
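If the CSV file lives on the client machine rather than the database server, LOAD DATA LOCAL INFILE reads it over the connection instead; a sketch, assuming local_infile is enabled on both client and server:

mysql> LOAD DATA LOCAL INFILE '/tmp/filename.csv' INTO TABLE [table name] FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (field1,field2,field3);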

Dump all databases for backup. The backup file contains SQL commands to recreate all databases.
# mysqldump -u username -ppassword --opt > /tmp/alldatabases.sql

Dump one database for backup.
# mysqldump -u username -ppassword --databases databasename > /tmp/databasename.sql

Dump a table from a database.
# mysqldump -u username -ppassword databasename tablename > /tmp/databasename.tablename.sql

Restore database (or database table) from backup.
# mysql -u username -ppassword databasename < /tmp/databasename.sql
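For unattended backups, the same dump can be date-stamped and scheduled from cron; a minimal sketch (paths and credentials are placeholders):

# mysqldump -u username -ppassword --opt --all-databases > /tmp/alldatabases-$(date +%F).sql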

Create Table Example 1.
mysql> CREATE TABLE employee (name VARCHAR(20));

Create Table Example 2.
mysql> create table person (personid int(50) not null auto_increment primary key, firstname varchar(35), middlename varchar(50), lastname varchar(50) default 'UNKNOWN');

Create and Insert into a Hive Table Example

Create table :

hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));

OK
Time taken: 1.084 seconds

List tables :

hive> show tables;
OK
students
values__tmp__table__1
Time taken: 0.023 seconds, Fetched: 2 row(s)

Describe table :

hive> describe students;
OK
name                 varchar(64)
age                  int
gpa                  decimal(3,2)
Time taken: 0.052 seconds, Fetched: 3 row(s)

Insert sample data to the above created table :

hive> INSERT INTO TABLE students VALUES ('ABC', 35, 1.28), ('DEF', 32, 2.32), ('KLM', 37, 3.22);
Query ID = root_20180206225557_c24602db-d9bf-480c-ac98-ece3fe4381e8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1517954394617_0001)


Show data from the above table :

hive> select * from students;
OK
ABC 35 1.28
DEF 32 2.32
KLM 37 3.22
Time taken: 0.212 seconds, Fetched: 3 row(s)
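The same statements can also be run non-interactively, which is handy for scripting (the script file name below is a placeholder):

$ hive -e "select * from students;"
$ hive -f /path/to/script.hql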

Solved: Hive work directory creation issue

 

Exception:

smartechie:~ sudhir.pradhan$ hive

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/2.3.1/libexec/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.3.1/libexec/lib/hive-common-2.3.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:254)
    at org.apache.hadoop.fs.Path.<init>(Path.java:212)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:251)
    ... 12 more

 

Solution : Replace the unresolved ${system:...} variables with absolute scratch-directory paths in hive-site.xml :

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${user.name}_resources</value>
</property>

<property>
  <name>hive.scratch.dir.permission</name>
  <value>733</value>
</property>
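These properties go in hive-site.xml under the Hive configuration directory. On the Homebrew layout shown in the stack trace above that would be roughly the following (the path is inferred from the trace; adjust to your HIVE_HOME):

$ vi /usr/local/Cellar/hive/2.3.1/libexec/conf/hive-site.xml   # add the four properties above
$ hive                                                         # relaunch; the scratch dirs now resolve to absolute paths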

Unzip Multiple Files from Linux Command Line

Problem :

[hadoop@spradhan]$ unzip *.zip

Archive: a.csv.zip
caution: filename not matched: b.csv.zip
caution: filename not matched: c.csv.zip
caution: filename not matched: d.csv.zip
caution: filename not matched: e.csv.zip

Solution :

[hadoop@spradhan]$ unzip '*.zip'

To run it in the background:

[hadoop@spradhan]$ nohup unzip '*.zip' &
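An equivalent alternative is to hand each archive to unzip one at a time via find, which sidesteps shell globbing entirely:

[hadoop@spradhan]$ find . -name '*.zip' -exec unzip {} \;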

Copy file or folder from amazon S3 to EC2

  1. Install the AWS CLI on the EC2 instance (if not already installed):
    $ sudo yum install aws-cli

  2. Configure the AWS CLI:
    $ aws configure
    AWS Access Key ID [None]: <your_access_key>
    AWS Secret Access Key [None]: <your_secret_key>
    Default region name [None]:
    Default output format [None]:

  3. Run the sync command on the EC2 instance:
    $ aws s3 sync s3://<path_to_file> <ec2_local_path>
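For a single file, aws s3 cp works the same way; for example (bucket and paths below are placeholders):

$ aws s3 cp s3://<bucket>/path/file.csv /home/ec2-user/data/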

Hive : Unable to start metastore issue

Exception :

smartechie:confluent-3.3.0 sudhir.pradhan$ hive

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/2.1.0/libexec/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.0/libexec/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578) at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:234) at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226) at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366) at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310) at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545) ... 9 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1627) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3317) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3356) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336) at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236) at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221) ... 14 more
Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1625) ... 23 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided.
Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:466) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:279) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1625) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3317) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3356) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336) at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236) at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221) at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366) at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310) at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545) at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:234) at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 31 more) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:513) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:279) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70) ... 28 more

 

Solution : Remove all lock files, and you should be good to go.

smartechie:confluent-3.3.0 sudhir.pradhan$ ls metastore_db/*.lck
metastore_db/db.lck
metastore_db/dbex.lck

smartechie:confluent-3.3.0 sudhir.pradhan$ rm metastore_db/*.lck
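As a precaution (not part of the original fix), make sure no other Hive or Derby process is still holding the embedded metastore before deleting the lock files; a quick hypothetical check:

$ ps aux | grep -i '[h]ive'   # stop any stale Hive/Derby process before removing the locks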