Hello, I am trying to populate the Hive clustered ORC table below:
CREATE TABLE llos(id string,x string,y string,z string,rg string,buketkey string, cat int,scat int,usr string,org string,act int,ctm int,c1 string,c2 string,c3 string,d1 int,d2 int,doc binary) partitioned by (cdt int,catpartkey string,usrpartkey string) CLUSTERED BY (buketkey)
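A minimal sketch of one common way to load such a table. It assumes the rows already sit in a staging table with the same columns; the name `llos_staging` and the helper `build_insert_script` are hypothetical. The helper builds a dynamic-partition `INSERT OVERWRITE` in which the three partition columns come last in the `SELECT`, in the same order as in the `PARTITION` clause, as Hive requires. The `hive.enforce.bucketing` setting only applies to Hive 1.x. Later releases always enforce bucketing and may reject the unknown property, so the sketch leaves it out by default.

```python
# Hypothetical helper: builds a HiveQL script that loads the bucketed,
# partitioned ORC table `llos` from an assumed staging table `llos_staging`.
DATA_COLUMNS = [
    "id", "x", "y", "z", "rg", "buketkey", "cat", "scat", "usr", "org",
    "act", "ctm", "c1", "c2", "c3", "d1", "d2", "doc",
]
PARTITION_COLUMNS = ["cdt", "catpartkey", "usrpartkey"]


def build_insert_script(target="llos", source="llos_staging",
                        include_enforce_bucketing=False):
    """Return HiveQL that fills `target` using dynamic partitioning."""
    settings = [
        # Let partition values come from the SELECT instead of literals.
        "SET hive.exec.dynamic.partition=true;",
        # nonstrict: no static partition value is required.
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
    ]
    if include_enforce_bucketing:
        # Assumption: only meaningful on Hive 1.x.
        settings.append("SET hive.enforce.bucketing=true;")
    # Dynamic partition columns must be the last columns of the SELECT,
    # in the same order as in the PARTITION clause.
    select_cols = ", ".join(DATA_COLUMNS + PARTITION_COLUMNS)
    insert = (
        f"INSERT OVERWRITE TABLE {target} "
        f"PARTITION ({', '.join(PARTITION_COLUMNS)}) "
        f"SELECT {select_cols} FROM {source};"
    )
    return "\n".join(settings + [insert])


if __name__ == "__main__":
    print(build_insert_script())
```

The printed script can then be run in Beeline or the Hive CLI; Hive writes each partition's rows into the buckets defined by `CLUSTERED BY (buketkey)`.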
When I run a Spark job on YARN in cluster mode, the driver and the workers (executors) all run inside YARN. However, a process still remains on the VM where I ran spark-submit, and it keeps consuming resources. Can this be avoided? When small clients launch many jobs, these lingering processes become a significant resource burden.
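One setting that may address this is `spark.yarn.submit.waitAppCompletion`: in YARN cluster mode, setting it to `false` makes the spark-submit client exit once the application has been submitted, instead of staying alive to report the application's status. The sketch below assumes `spark-submit` is on the `PATH`; the helper names, application path, and main class are hypothetical.

```python
import shlex
import subprocess


def build_submit_cmd(app_resource, main_class=None, app_args=(),
                     wait_for_completion=False):
    """Build a spark-submit argv list for YARN cluster mode (hypothetical helper)."""
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if main_class:
        # Only JVM applications need --class; Python apps omit it.
        cmd += ["--class", main_class]
    # false: the client exits right after YARN accepts the application
    # instead of polling until the job finishes.
    flag = str(wait_for_completion).lower()
    cmd += ["--conf", f"spark.yarn.submit.waitAppCompletion={flag}"]
    cmd.append(app_resource)
    cmd.extend(app_args)
    return cmd


def submit(app_resource, main_class=None, app_args=()):
    """Run spark-submit; with the flag off it returns shortly after submission."""
    return subprocess.run(build_submit_cmd(app_resource, main_class, app_args),
                          check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_submit_cmd("my-app.jar", "com.example.Main")))
```

Because the client no longer reports the final status, the job would have to be tracked separately, for example with `yarn application -status <applicationId>`.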