Adoption of data analytics in India is growing faster than anywhere in the world: Ram Venkatesh, Cloudera
The growing pace of digital transformation and cloud adoption has led to the generation of exabytes of structured and unstructured data. US-based cloud and data management company Cloudera provides enterprises with solutions to analyze this vast amount of unstructured data, since that is where new insights for future business growth will come from. In an interview with TechCircle, Ram Venkatesh, chief technology officer (CTO), Cloudera, discussed India’s relevance to the firm, the growing demand for data management, and the role ChatGPT-like models will play in data analytics going forward. Edited excerpts:
How has post-pandemic digital transformation affected the data management business?
Companies have realized that if you are not digital you are not going to be around. That meant the amount of data being collected as part of companies' cloud operations has gone through the roof. Now they need a place to store that data. Post-pandemic, efficiency and cost considerations are making a comeback.
What role does India play for Cloudera? Has the relevance of India grown since the pandemic?
One of the interesting things about Cloudera’s open-source heritage is that a lot of our technology was developed and incubated in Bangalore. We have a significant presence in India in terms of customer support. On the business side, we are fascinated by the massive scale of digitization. We are watching the India Stack, as it could be an interesting model for other countries to explore. The underpinning of these technologies is open source, which gives us the place to participate.
India is extremely important. Digitalisation in India has happened faster than anybody expected. The adoption of data analytics is growing at a much faster pace than in other parts of the world.
Do you see generative AI models playing an important role in data management?
Natural language processing has gone through a renaissance. Data analytics is going to be conversational. Building large language models of all the data in your company is going to become relevant. New ways to analyze data will emerge. If you think about reinforcement learning, which is at the core of ChatGPT, it requires an efficient platform to run and we can provide that.
There is a new engine called Ray, which is very useful for running language models. We enable support for it in our ecosystem. Enterprises will want a ChatGPT model over their data. That is where we are going.
Cloudera announced a ₹500 crore investment in India last year. How is that money being spent?
We have made investments in core open-source projects in India and other parts of the world. We need to hire and grow talent. Our Bangalore center is operational. One-third of our workforce is in India. (Cloudera employs over 2,700 people worldwide.)
How is Cloudera handling the challenge of unstructured data?
Dealing with unstructured data has three dimensions. The first is how you make it simple for people to bring data from a wide range of sources into data layers. Having industry-consistent APIs helps us build ecosystems for people to collect data and bring it under management. Apache plays a big role in technologies that are part of the Cloudera stack. The second dimension is how you extract value from data. The third is to make sure, from a risk, security, governance, and compliance standpoint, that all the belts and suspenders are in place.