Smart AI-based systems can now not only spot crimes but also listen for gunshots and cries for help
Atul Rai, the founder and chief executive of Gurgaon-based Staqu, looks content. He recently bid for a tender floated by the Lucknow Smart City project for audio and video surveillance to improve security in the city, and he is hopeful of bagging the project because Staqu already has a solution, called Jarvis, that is used by the Uttar Pradesh Police and other state police forces.
Staqu's Jarvis is not exactly the sentient system seen in the movie Iron Man. But it does include technologies such as closed-circuit television (CCTV) cameras and artificial intelligence (AI)-based facial recognition. And now it's evolving, according to Rai.
In its new avatar, Jarvis doesn’t just use cameras to watch crimes happen; it also employs microphones to listen to the goings-on within a city. “We have used audio analytics to detect incidents such as prison fights in Uttar Pradesh on a pilot basis. Our target is to implement it in smart cities,” said Rai. The audio analytics tool is also being used by organizations in retail and manufacturing to detect distress sounds.
Staqu is one of the few companies in India that offer AI-based audio analytics tools. These systems can identify sounds such as gunshots, a person’s scream or specific words that indicate distress. Staqu's system uses convolutional neural networks (CNNs) to identify the types of sound in a scene. CNNs are typically used for image and video recognition, but in this case they’re being used to discern patterns in sounds.
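Staqu has not published how Jarvis works internally, so what follows is only a toy illustration of the general idea, not the company's method. A common approach (assumed here) is to turn audio into a spectrogram, a grid of energy per frequency band per time frame, and slide small filters over it the way a CNN slides filters over an image. The names `conv2d_valid`, `onset_score` and the hand-picked `ONSET_KERNEL` are hypothetical; in a real CNN the filters are learned from labelled recordings and stacked in many layers.

```python
from typing import List

# A spectrogram as a grid: rows are frequency bands, columns are time frames.
Grid = List[List[float]]

# Hypothetical hand-picked 2x2 filter: it responds when energy jumps from one
# time frame to the next across adjacent frequency bands (a sudden onset).
# A trained CNN would learn many such filters instead of using a fixed one.
ONSET_KERNEL: Grid = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]


def conv2d_valid(x: Grid, k: Grid) -> Grid:
    """Slide kernel k over x (stride 1, no padding), as a CNN layer does.

    Like deep-learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Grid) -> Grid:
    """Keep positive filter responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in m]


def global_max_pool(m: Grid) -> float:
    """Reduce the feature map to its strongest response anywhere in the clip."""
    return max(max(row) for row in m)


def onset_score(spectrogram: Grid) -> float:
    """Convolution, then ReLU, then global max pooling: one tiny CNN feature."""
    return global_max_pool(relu(conv2d_valid(spectrogram, ONSET_KERNEL)))


# Toy data: 4 frequency bands x 6 time frames of low background noise...
quiet = [[0.1] * 6 for _ in range(4)]
# ...and the same clip with a sudden broadband burst in frame 3.
bang = [[1.0 if t == 3 else 0.1 for t in range(6)] for _ in range(4)]

print(onset_score(quiet))  # 0.0: no sudden jump in energy
print(onset_score(bang) > 1.0)  # True: the burst triggers the filter
```

A production system would feed the pooled responses of many learned filters into a classifier that outputs labels such as "gunshot" or "scream"; this sketch only shows why a technique borrowed from image recognition can pick out a pattern in sound.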
Theoretically, an audio surveillance system can alert the nearest hospital if an accident occurs, or contact the police if a group of people are discussing nefarious activities. “Every camera is capable of sending audio data using a mic. If a crime is being committed out of the field of view of this camera, audio can help in identifying if someone is in distress and needs help,” explained Rai.
According to Rai, there are two parts to audio analysis. The first is identifying a scene from audio, such as a fight, violence or screaming. The second is identifying a person from their voice when they are not facing the camera. This can help identify people with prior criminal records through their voice even after they are out of prison. Staqu provides both solutions. Rai added that the Lucknow Smart City project has expressed interest in an audio and video solution and that demos will be conducted soon, followed by monetary discussions. Jarvis is also language-independent and looks for specific sound symbols that can indicate distress or an accident, said Rai.
According to Rai, Jarvis’ accuracy has been tested against VoxCeleb, one of the largest audio-visual datasets of human speech. He claimed the system is 98.7% accurate. The company is also working on a new natural language processing (NLP)-based feature that will let users ask Jarvis for information, which it will then retrieve by scanning data across all the cameras.
To be sure, the use of audio symbols and voice samples in law enforcement has been gaining traction globally. In Europe, Interpol built a speaker identification solution to identify criminals from voice samples back in 2018, while police forces in the US have reportedly been building databases of criminals’ voice samples as well.
That said, solutions such as these come with significant privacy concerns. Pam Dixon, founder and executive director of the World Privacy Forum, a public interest research group, cautions that “much will depend on how the system is set up, implemented, and used.” Dixon points out that even if these systems are accurate and free of technical bias, the question remains whether recordings will be stored. “Where? And for how long?” she asked. “These kinds of monitoring systems need to be transparent and should clearly say what words and sounds are being listened for. The policies for these systems need to be in place before they are built and used,” she insists.
N.S. Nappinai, a Supreme Court advocate, concurs. “Even assuming it is necessary, can this be handled in some other way or is there an alternative that is less intrusive becomes an issue,” she noted. “India doesn’t have a regulatory framework for CCTV cameras that are already in place in multiple countries. The same rule applies for audio, so stakeholders are aware of what is permissible and what is not,” she added. Nappinai cautioned that the specific sounds these systems listen for may not be the only data that is captured. “We all saw what happened with digital assistants where private conversation was captured and people were listening to them,” she concluded.
According to Dixon, the European Union has many laws in this area. “The US did finally pass some controls for law enforcement use of recordings. They are not perfect laws, but they have made a positive difference,” she said. India, privacy experts note, will have to balance the state's security needs against regulations that prevent the misuse of such systems.