Subject: MR sharded Scans giving poor performance..
From: Vidhyashankar Venkataraman (
Date: Jul 26, 2010 2:43:12 pm

I am trying to assess the performance of Scans on a 100 TB database on 180
nodes running HBase 0.20.5.

I run a sharded scan on a fully compacted table: each Map task runs a scan on
a specific key range, and speculative execution is turned off so that no task
is duplicated.

Setup: 1 MB block size, block cache enabled, a maximum of 2 tasks per node.
Each row is 30 KB in size: 1 big column family with just one field. The region
lease timeout is set to an hour. I don't get any socket timeout exceptions, so
I have not changed the write socket timeout.

I ran experiments on the following cases:

1. The client-level cache is set to 1 row (the default: I got the number using
getCaching): The MR tasks take around 13 hours to finish on average, which
gives around 13.17 MBps per node. The worst case is 34 hours (to finish the
entire job).
2. Client cache set to 20 rows: this is much worse than the previous case: we
get a very low rate of around 1 MBps per node.
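
For reference, the per-node rate in case 1 can be roughly cross-checked from the numbers above. This is only a sketch: it assumes the 100 TB is spread evenly over the 180 nodes and that every node finishes in the 13-hour average. The class and method names are made up for illustration. The result depends on whether a TB is taken as 10^12 or 2^40 bytes, so it approximates the 13.17 MBps figure rather than reproducing it exactly.

```java
public class ScanThroughput {
    // Average per-node read rate in MB/s, with 1 MB = 10^6 bytes.
    // Assumes data is evenly distributed and all nodes run for the same time.
    static double mbPerSecPerNode(double totalBytes, int nodes, double hours) {
        double seconds = hours * 3600.0;
        return totalBytes / nodes / seconds / 1e6;
    }

    public static void main(String[] args) {
        double decimalBytes = 100e12;                // 100 TB as 100 * 10^12 bytes
        double binaryBytes = 100 * Math.pow(2, 40);  // 100 TB as 100 * 2^40 bytes

        System.out.printf("decimal TB: %.2f MB/s per node%n",
                mbPerSecPerNode(decimalBytes, 180, 13));
        System.out.printf("binary TB:  %.2f MB/s per node%n",
                mbPerSecPerNode(binaryBytes, 180, 13));
    }
}
```

Both unit conventions land in the 12 to 13 MB/s range, which is consistent with the average reported in case 1.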

Question: Should I set the client cache so that the block size is a multiple
of the amount of data cached per fetch? Or should the cache size be set to a
much lower value?
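
To make the question concrete, here is the arithmetic behind it as a small sketch. It assumes 1 MB = 1024 KB and that every row is exactly 30 KB. The helper names are hypothetical and it does not call any HBase API.

```java
public class ScanBatchMath {
    // Whole rows that fit in one block, rounded down.
    static long rowsPerBlock(long blockSizeKb, long rowSizeKb) {
        return blockSizeKb / rowSizeKb;
    }

    // Approximate payload fetched per scanner round trip for a given
    // client cache setting (rows per fetch).
    static long kbPerFetch(int cachingRows, long rowSizeKb) {
        return cachingRows * rowSizeKb;
    }

    public static void main(String[] args) {
        long blockKb = 1024; // 1 MB block size, assuming 1 MB = 1024 KB
        long rowKb = 30;     // row size from the setup described above

        System.out.println("rows per block: " + rowsPerBlock(blockKb, rowKb));
        for (int caching : new int[] {1, 20}) {
            System.out.println("cache=" + caching + " rows -> "
                    + kbPerFetch(caching, rowKb) + " KB per fetch");
        }
    }
}
```

With these inputs, one block holds 34 whole rows, a cache of 1 fetches 30 KB per round trip, and a cache of 20 fetches 600 KB, so neither setting tested above lines up with a block boundary.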

I find that these numbers are much lower than the ones I get when running on
just a few nodes.

Can you guys help me with this problem?

Thank you,
Vidhya