Talend SVP & GM Asia Pacific Stu Garrow discusses Shadow IT, its pitfalls and how to avoid its associated rising operational costs.
By Stu Garrow
We talk about shadow IT whenever an IT system, application or personal productivity tool is used inside an organisation without explicit organisational approval. Shadow IT is not only a security and compliance nightmare; it also creates data sprawl in which each group builds its own data silos.
Shadow IT, also known as Stealth IT or Client IT, refers to information technology (IT) systems built and used within organisations without explicit organisational approval, for example systems specified and deployed by departments other than the IT department.
According to a 2016 Cisco customer survey, 15 to 25 times more services are in use in an organisation without IT involvement. Furthermore, the explosion of cloud services is likely to accelerate this trend.
The more shadow IT develops, the harder it becomes for users to access and protect data. IDC estimates that data professionals spend 81% and waste 24% of their time searching, preparing and protecting data before they can actually take advantage of it.
When data is not a team sport, everyone spends time creating silos and their own version of the truth, which drives up operational costs. Decisions are then influenced by questionable data, which ultimately puts the organisation at risk.
IDC went deeper in its analysis in a data governance webinar, highlighting how frequently business users rely on spreadsheets as a data integration tool. Data silos start here, as copy and paste is the most frequently used way to bring data in.
To avoid shadow IT, it is essential to equip people with modern tools such as Talend Data Preparation so that they do not create uncontrolled copies of data. Data citizens can then process data from source to destination without keeping copies in local storage, in unknown or unprotected folders and systems, or in uncontrolled on-premises or cloud-based storage. Such uncontrolled copies are no longer acceptable with the rise of regulations (Basel II, IFRS, GDPR, CCPA, etc.) that mandate companies to take control of their data assets. Companies that fail to do so run the risk of being non-compliant and exposed to significant regulatory fines.
Data control should take place everywhere: when the data enters the system, along data pipelines, and at data consumption points through apps, APIs or analytics. As more and more data professionals move closer to operations to drive business outcomes “where the action is”, there is a growing risk of data fragmentation and misalignment. There is a need for a central organisation that can enable people with data in a governed way while tracking and tracing data flows through data lineage.
A recent data trust readiness report by Talend reveals that 46% of executives believe their organisation is always in control of its data. That figure falls to 28% among data practitioners. In other words, the problem appears to be under control when seen from the top, but data workers are less confident.
Talend Data Catalog helps organisations create a central, governed catalog of enriched data that can be shared and collaborated on easily. It can automatically discover, profile, organise and document an organisation’s metadata and make it easily searchable. Imagine that you find inconsistent data that has been created and perpetuated in one of your datasets, and you are asked to explain it, identify its origin and correct it. Data lineage will dramatically accelerate your time to resolution by helping you to spot the right problem in the right place. Moreover, if new datasets arrive in your data lake, establishing their data lineage will help you to identify these new sources very quickly.
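To make the idea of lineage concrete, here is a minimal sketch of tracing a suspect dataset back to its sources. It is an illustration only and not Talend Data Catalog's API: the `LINEAGE` mapping, the dataset names and the `upstream_sources` function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage map: each dataset lists the datasets it is derived from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["crm_export", "erp_orders"],
    "crm_export": [],
    "erp_orders": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset feeding `dataset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


# Candidate places to look for the origin of an inconsistency in the dashboard.
print(upstream_sources("sales_dashboard", LINEAGE))
```

Walking the lineage nearest-first mirrors how an investigation usually proceeds: check the immediate inputs before going further upstream.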
But data lineage is not enough: organisations also have to cleanse the data without leaving local files on unsafe data systems. Here again, equipping people with modern tools such as Talend Data Preparation is essential to avoid shadow IT, because it helps them to cleanse data without resorting to local working files such as Excel spreadsheets. Data citizens can then process data from source to destination without keeping copies in local storage, in unknown or unprotected folders and systems, or in uncontrolled on-premises or cloud-based storage.