QUESTION

profilerealityhub

How would you design an efficient Hive/Impala table given the following facts?

- The table receives about 100 million rows of tool data every day. The date on which the data is received is stored in a column of the table, along with the tool id.

- Each tool receives about 500 runs per day, each identified by the run id column. Each run id contains approximately 1 MB of data.

- The default block size is 64 MB.

- The table can be searched by date, tool id, and run id, in that order.
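
For reference, here is a rough back-of-the-envelope check of what these figures imply per tool and per day. It is only a sketch: the function names are made up for illustration, and it assumes each run is exactly 1 MB and ignores compression and file format overhead.

```python
import math

# Figures taken from the question (all approximate).
RUNS_PER_TOOL_PER_DAY = 500
MB_PER_RUN = 1
BLOCK_SIZE_MB = 64


def daily_volume_per_tool_mb(runs=RUNS_PER_TOOL_PER_DAY, mb_per_run=MB_PER_RUN):
    """Data volume produced by one tool in one day, in MB."""
    return runs * mb_per_run


def blocks_needed(volume_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks required to hold volume_mb of data."""
    return math.ceil(volume_mb / block_size_mb)


def runs_per_block(block_size_mb=BLOCK_SIZE_MB, mb_per_run=MB_PER_RUN):
    """How many whole runs fit into a single block."""
    return block_size_mb // mb_per_run


if __name__ == "__main__":
    volume = daily_volume_per_tool_mb()
    print("MB per tool per day:", volume)
    print("Blocks per tool per day:", blocks_needed(volume))
    print("Runs per block:", runs_per_block())
```

So one tool's data for one day spans several blocks, while a single run id fills only a small fraction of one block.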

    • 5 years ago