Banking On Data: Tech Tales From The TiH Data Centre
An online testing solution to assess the cognitive impact of Covid-19; an autonomous driving platform with high-performance computing capability; a portable screening tool for early cancer detection; a mobile traffic-challan generation system: these are a few of the applied research prototypes emerging from the data collection efforts of the Technology Innovation Hub (TiH) housed at the International Institute of Information Technology Hyderabad (IIITH).
A major part of the National Mission on Interdisciplinary Cyber-Physical Systems is being implemented through 25 Technology Innovation Hubs (TiHs) established in several top academic and national R&D institutes. One such TiH, at IIITH, serves as the platform for executing the mission’s activities in the area of Big Data. According to Prof. Deva Priyakumar of i-Hub Data, artificial intelligence (AI) is breaking new ground, and machine learning applications are finding opportunities in domains such as healthcare, governance and mobility. However, to work effectively, AI needs large quantities of high-quality data. The mandate of the Data Foundation at IIITH is to address this need. “Our domains of focus include Mobility, Healthcare, Buildings, Systems and India-specific datasets. As a start, we are focussed on Mobility and Healthcare,” he says. To realize the benefits of studying, researching and applying data, these initiatives are being planned in partnership with other institutions and industry.
As the TiH’s primary objective is to act as a custodian of data across various domains, large-scale datasets lie at the heart of the research originating here. The main aim of the Data Foundation is to build, publish and serve datasets geared towards decision-making in data-driven research. To enable this, it will set up the necessary infrastructure and build the required tools and platforms. Dr. Vikram Pudi, head of the Data Foundation, says that this infrastructure is interdisciplinary by definition. “Hence, it requires collaboration between domain experts and computer scientists and carries with it all the challenges and opportunities of interdisciplinary research.” While processes for data collection, curation, annotation and processing are being put in place, he cautions that it is not feasible to design a general system for future requirements that are still unknown. “The process must be flexible, taking into account the fact that requirements will emerge and evolve over time rather than being crisply defined upfront.”
IIITH already has a history of serving as a data bank of sorts. In 2017, the Centre for Visual Information Technology (CVIT) at IIITH partnered with Intel to collect traffic and road-related data. A year later, in 2018, the India Driving Dataset (IDD) was released. “At that time, it was the largest dataset on unstructured driving conditions,” says Dr. Anbumani Subramanian, who is leading the Mobility efforts of the TiH initiative. “The IDD is a collection of annotated datasets with images from different parts of the country. We have over 4,000 users from 30 different countries who are using it,” he says, adding that a unique feature is that it can be used on any kind of laptop, which makes it ideal for researchers without access to large-scale compute. More recently, the institute has also been collecting data and maintaining repositories for Indian pathology, Covid-19 patients and data-driven drug discovery.
While using the existing traffic and road datasets and continuing to add to the repository, the goal is to create applied research solutions to problems of national importance. “Applied research is our key focus in the context of mobility and road safety,” says Dr. Subramanian, adding, “Towards that, we have several research prototypes coming up which could be further taken up by start-ups or anyone else who wants to deploy them.”
Several road-monitoring projects in the pipeline use computer vision and deep learning algorithms. One of the earliest is a solution that automatically counts roadside trees and generates tree density maps. The idea originated from a request by the National Institute of Urban Management for an AI-enabled solution to assist in city planning and urban afforestation. Other mobility-based solutions deal with pothole detection, road surface inspection, detection of helmet and other traffic violations, and enforcement of traffic rules through e-challans. None of these solutions requires a complex or expensive setup. “With rather simple components like regular cameras, sometimes even mobile phone cameras or dash cams, one can collect data, bring it to our centre, use our algorithm and generate reports,” says Dr. Subramanian. For large-scale inspection of roads, such as those under the National Highways Authority of India (NHAI), the team is exploring the use of drones to capture video footage. To further road safety, the Mobility team is also considering how 2- and 3-wheelers could be fitted with low-cost safety devices that warn drivers of distracted driving or other potential dangers.
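The article does not describe how the tree-counting system builds its density maps. As a rough illustration only, suppose an upstream detector and tracker (hypothetical, not shown) have already produced one (x, y) position per tree, in metres, for a mapped stretch of road. A basic density map is then just a count of trees per grid cell. The function name `density_grid` and the cell size are illustrative choices, not the team's method.

```python
import math


def density_grid(points, cell_size, width, height):
    """Count detected trees in each square cell of a width x height area.

    Hypothetical helper: `points` holds one (x, y) position per tree,
    assumed to be de-duplicated upstream so each tree appears only once.
    Returns a list of rows (bands along y), each a list of per-cell counts.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    cols = math.ceil(width / cell_size)
    rows = math.ceil(height / cell_size)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Skip detections that fall outside the mapped area.
        if not (0 <= x < width and 0 <= y < height):
            continue
        grid[int(y // cell_size)][int(x // cell_size)] += 1
    return grid


if __name__ == "__main__":
    trees = [(1, 1), (2, 3), (15, 5), (25, 25)]
    for row in density_grid(trees, cell_size=10, width=30, height=20):
        print(row)
```

In this toy run the point at (25, 25) lies outside the 30 x 20 m area and is ignored, so the grid holds three trees. Rendering these counts as a heat map gives one simple form of density map; dividing each count by the cell area would express it as trees per square metre instead.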
Autonomous car platform
For anyone developing autonomous cars, good news comes in the form of an end-to-end software platform on which driving algorithms can be tested and validated. Describing it as a one-stop solution for all vehicles, assisted and autonomous, Dr. Subramanian says, “The car, equipped with 6 cameras, a radar and several other sensors to capture data, will be ready soon and we are only enabling the platform. It can either be used to benchmark data or to collect specific kinds of data based on requests from researchers or industry, like night-time data or data from rural areas and so on.” With the data and compute provided by the hub, the platform promises to be an accessible resource for building algorithms for assisted and autonomous driving.
Healthcare
The healthcare initiative has four focus areas: public health; cancer; neuro and mental health; and data-driven drug discovery and design. In all four areas, the immediate focus is on collecting data of a reasonable size, developing machine learning algorithms and deploying solutions at the user level, whether at a clinic, hospital or health centre. “It’s a three-stage process where data is very important, and in many cases we see that there is no real actionable data available for developing automated solutions for many of the health conditions,” says Prof. Bapi Raju, who is leading the healthcare initiative.
Explaining that the overall vision of the initiative is to scale healthcare and wellbeing through AI for the entire population, Prof. Bapi draws attention to communicable and non-communicable diseases. He notes that over the last decade or so, the disease burden has shifted from communicable diseases like TB, malaria and diarrhea to non-communicable ones such as diabetes and hypertension, with the latter now claiming the top spots for mortality. “Our aim is to strike a balance between the two kinds of diseases. The focus of the public health vertical is more on disease surveillance and developing tech-enabled solutions to bring healthcare to rural primary healthcare centres (PHCs),” he says. This means giving rural residents opportunities for early screening for lifestyle and other disorders by connecting them with experts at the right time, strengthening the toolkit of ASHA workers and auxiliary nurse midwives through tech-enabled solutions, and much more.
For disease surveillance of TB and Covid-19 in Telangana, a collaborative effort with the CSIR-Institute of Genomics and Integrative Biology (IGIB) is underway, involving the setting up of a microlab led by Dr. Anshu Sarje. The microlab will undertake genome testing and sequencing. According to Prof. Bapi, the first step will be disease surveillance in the state through sample collection from district hospitals and rural regions. “We will do genome sequencing of preprocessed samples but the idea is to develop an end-to-end institute facility eventually. Combined with other facilities useful for working with cell cultures, bio-sensors, micro-fluidics and bio-instrumentation, the proposed bio-research and innovation will be useful for students and faculty who require bio-samples for developing algorithms and more,” he says.
The Grace Cancer Foundation, an NGO, runs mobile cancer screening camps in remote rural areas of Telangana and Andhra Pradesh. The camps use a bus equipped with a mammography unit, X-ray facilities and other equipment for the early detection of three cancers (breast, cervical and oral), staffed by primary healthcare workers who carry out preliminary screening for abnormalities. However, these efforts are hampered by the sporadic availability of highly skilled medical professionals, such as oncologists, who can interpret test results on the spot. “If one of the 3 types of cancers is detected, a decision needs to be made whether to refer the patient to a health facility for further examination such as a biopsy to confirm whether it is cancer or not,” says Prof. Bapi Raju. To overcome this bottleneck, the Foundation approached IIITH for a technological solution to assist the health workers.
In the initial phase of developing the cancer screening solution, data will be collected in the presence of a doctor who can readily diagnose the X-rays and samples taken. The next phase will involve validating the data, and the final phase will see the solution actually deployed. With this portable screening tool, the goal is to scale up the reach of such health and screening camps. Alongside these efforts, the cancer-focus group led by Dr. Vinod P.K. will pursue several digital pathology solutions, particularly for lung cancer, in collaboration with the Nizam’s Institute of Medical Sciences (NIMS) Pathology Centre. The aim is to collect histopathological samples at the centre to build a computer vision-based tool for automated screening of whole slide images.
Neuro and Mental Health
In response to widespread reports of memory, attention and other cognitive problems in patients who have recovered from Covid-19, one of the major efforts in this field studies the cognitive impact of the disease on such patients. Dr. Priyanka Srivastava and Dr. Vishnu Sreekumar of the Cognitive Science lab at IIITH are designing a web-based testing solution in which a 30-40-minute session combines a survey of participants’ demographics and mental health with a few cognitive tests assessing memory, attention and executive functions, all of which are important components of cognitive functioning. As a pilot, data is currently being collected from students, and the solution will be rolled out for them too. The plan is to gradually extend it to the general public and to other patients affected by Covid. “This is a tool we’re building based on suggestions given by clinicians to Dr. Kavita Vemuri. We would like to collect data from 500-800 participants, both healthy individuals and Covid-recovered patients. We want diversity in the data; hence we’re including those who contracted Covid with differing intensities, such as those who required hospitalization vs. those who recovered at home and so on,” says Prof. Bapi.
Another project, in collaboration with the Neurology department of the National Institute of Mental Health and Neurosciences (NIMHANS) and the Healthcare in AI (HAI) initiative at IIITH, has led to the development of a tech solution for automatic sleep staging. “It is known that disturbance in sleep is a pre-indicator of recurring stroke. A sleep lab at NIMHANS studies the stages of sleep and the percentage of time spent by individuals in each stage,” says Prof. Bapi. A deviation in the percentage of time spent in a stage indicates sleep disturbance, but evaluating sleep data is currently tedious and time-consuming, so automatic monitoring would be very helpful. In the same vein, a tool for identifying sleep apnea is also being considered.
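The article describes the NIMHANS workflow only at a high level. As a minimal sketch of the "percentage of time in each stage" measure, assume the common convention of scoring sleep in 30-second epochs labelled W, N1, N2, N3 and REM, with the automatic staging (assigning a label to each epoch) already done upstream. The code computes each stage's share of the recording and flags stages that deviate from a reference profile by more than a tolerance. The function names, the reference profile and the tolerance are hypothetical placeholders for illustration, not clinical norms or the team's actual criteria.

```python
from collections import Counter

# Assumed labelling convention (AASM-style stages, 30-second epochs);
# not taken from the NIMHANS or IIITH pipeline.
STAGES = ("W", "N1", "N2", "N3", "REM")
EPOCH_SECONDS = 30


def stage_percentages(hypnogram):
    """Percentage of scored epochs spent in each stage."""
    if not hypnogram:
        raise ValueError("hypnogram is empty")
    counts = Counter(hypnogram)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage labels: {sorted(unknown)}")
    total = len(hypnogram)
    return {stage: 100.0 * counts[stage] / total for stage in STAGES}


def stage_minutes(hypnogram):
    """Minutes spent in each stage, given fixed-length epochs."""
    counts = Counter(hypnogram)
    return {stage: counts[stage] * EPOCH_SECONDS / 60 for stage in STAGES}


def flag_deviations(observed, reference, tolerance):
    """Stages whose share differs from `reference` by more than `tolerance` percentage points."""
    return [s for s in STAGES if abs(observed[s] - reference[s]) > tolerance]


if __name__ == "__main__":
    # Toy hypnogram of 20 epochs (10 minutes); values are illustrative only.
    night = ["W"] * 2 + ["N1"] * 2 + ["N2"] * 10 + ["N3"] * 4 + ["REM"] * 2
    # Hypothetical reference profile, not a clinical norm.
    reference = {"W": 5.0, "N1": 5.0, "N2": 50.0, "N3": 20.0, "REM": 20.0}
    observed = stage_percentages(night)
    print(observed)
    print(flag_deviations(observed, reference, tolerance=7.5))
```

With these toy numbers only REM is flagged: its 10% share sits 10 points below the hypothetical reference, while the 5-point excesses in W and N1 stay within the tolerance. Measuring shares against all scored epochs, including wake, is one design choice; sleep reports often use total sleep time as the denominator instead, which would change the percentages.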
Data-driven drug discovery
Developing a drug is a complex and expensive process that typically spans more than 12 years. By collecting large datasets, the TiH aims to develop data-driven solutions that shorten this lifecycle.
Building A Community
Over the next 5 years, the hub aims to build and host 20 high-impact datasets along with many smaller ones. Alongside these efforts, plans are underway to build a sizeable research community of students, faculty, industry and startups. “TiH has many deliverables. While tech solutions, research translation and applied research are no doubt important, so is human resource creation,” remarks Prof. Bapi. To encourage the current generation of researchers to engage with data-driven technologies, a slew of initiatives is in the works. These include fellowships for researchers from tier-2 and tier-3 institutes, workshops, hackathons, short-term courses in the form of summer and winter schools, and exchange programs with international academia and industry. “Since the emphasis here is on developing solutions that are market-oriented, we are looking at creating market prototypes through the Entrepreneur-In-Residence (EIR) program offered by the Centre for Innovation and Entrepreneurship (CIE) and seeding new startups,” says Prof. Deva. He adds, “A great beginning has been made in the domains of Healthcare and Mobility. We will soon initiate India-centric projects like expanding the dataset on the Indian brain and building research applications for Smart Buildings.”
Sarita Chebbi is a minimalist runner, practising yogi and baker of all things whole-wheat and sugar-free. Currently relearning her ABCs… the version that goes: A for algorithm, B for Bayesian, C for convolutional (neural network)…