
Apache Pig - nested block explanation

Question asked by Lokesh on Jul 27, 2017
Latest reply on Jul 28, 2017 by Murshid Chalaev

Hi All,

Below is a script from Pig Latin Reference Manual 2.

I am having trouble understanding one small thing and would appreciate some help. I copied the script as-is from the web page.

 

    A = LOAD 'data' AS (url:chararray,outline:chararray);

    DUMP A;
    (www.ccc.com,www.hjk.com)
    (www.ddd.com,www.xyz.org)
    (www.aaa.com,www.cvn.org)
    (www.www.com,www.kpt.net)
    (www.www.com,www.xyz.org)
    (www.ddd.com,www.xyz.org)

    B = GROUP A BY url;

    DUMP B;
    (www.aaa.com,{(www.aaa.com,www.cvn.org)})
    (www.ccc.com,{(www.ccc.com,www.hjk.com)})
    (www.ddd.com,{(www.ddd.com,www.xyz.org),(www.ddd.com,www.xyz.org)})
    (www.www.com,{(www.www.com,www.kpt.net),(www.www.com,www.xyz.org)})

    X = foreach B {
            FA = FILTER A BY outlink == 'www.xyz.org'; -- I believe this is a typo and should be outline.
            PA = FA.outlink; -- Same applies; this must be FA.outline.
            DA = DISTINCT PA;
            GENERATE GROUP, COUNT(DA);
    }

    DUMP X;
    (www.ddd.com,1L)
    (www.www.com,1L)

FA = FILTER A BY outlink == 'www.xyz.org'; This FILTER operation in the nested block is where I am confused.
Does this FILTER operation run over each tuple of the inner bag, once for each record in relation B?
I am finding it difficult to understand how the data flows between the operations.
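
One way to picture the data flow is the minimal sketch below, written in plain Python rather than Pig. It assumes that the nested block runs once per record of B and that, inside the block, A refers only to that record's inner bag. The function names (group_by_url, nested_foreach) and the tuple layout are hypothetical, chosen for illustration only; the rows are the ones shown in the DUMP A output.

```python
# Plain-Python model of the nested FOREACH block; this is not Pig code.
# Rows of relation A as (url, outlink) tuples, copied from the DUMP A output.
A = [
    ("www.ccc.com", "www.hjk.com"),
    ("www.ddd.com", "www.xyz.org"),
    ("www.aaa.com", "www.cvn.org"),
    ("www.www.com", "www.kpt.net"),
    ("www.www.com", "www.xyz.org"),
    ("www.ddd.com", "www.xyz.org"),
]


def group_by_url(rows):
    """Model of B = GROUP A BY url: map each url to its inner bag of tuples."""
    bags = {}
    for row in rows:
        bags.setdefault(row[0], []).append(row)
    return bags


def nested_foreach(bags, target="www.xyz.org"):
    """Model of the nested block: evaluated once per group, on that group's bag only."""
    result = []
    for url in sorted(bags):
        inner_bag = bags[url]                          # inside the block, A is this bag
        fa = [t for t in inner_bag if t[1] == target]  # FA = FILTER A BY outlink == '...'
        pa = [t[1] for t in fa]                        # PA = FA.outlink
        da = set(pa)                                   # DA = DISTINCT PA
        result.append((url, len(da)))                  # GENERATE group, COUNT(DA)
    return result


B = group_by_url(A)
X = nested_foreach(B)
for record in X:
    print(record)
```

In this model every group produces one output row, so groups with no matching tuple get a count of 0. The manual's listing shows only the two rows with a count of 1, so how Pig itself treats the empty groups is an assumption of the sketch and is not confirmed by the script above.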

Thanks,
Lokesh
