Inside Uber’s 350PB Data Lake: The Distcp Rewrite That 5x’d Performance
TLDR
Replication volume went from 250 TB to 1 PB per day in one quarter. Daily replication jobs jumped from 10,000 to 374,000. Uber’s data lake hit 350 PB, and its copy tool couldn’t keep up. The P100 SLA of 4 hours became a joke.…


