Which step of the data life cycle is concerned with pulling data together from a variety of sources?

    In this article, we discuss the life cycle phases of data analytics, covering each phase one by one.

    Data Analytics Lifecycle:
    The data analytics lifecycle is designed for Big Data problems and data science projects. The cycle is iterative, reflecting how real projects unfold. To address the distinct requirements of performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved in acquiring, processing, analyzing, and repurposing data.

      Phase 1: Discovery –
    • The data science team learns about and investigates the problem.
    • The team develops context and understanding.
    • The team identifies the data sources needed and available for the project.
    • The team formulates initial hypotheses that can later be tested with data.
      Phase 2: Data Preparation –
    • Steps to explore, preprocess, and condition data prior to modeling and analysis.
    • It requires an analytic sandbox; the team performs extract, load, and transform (ELT) steps to get data into the sandbox.
    • Data preparation tasks are likely to be performed multiple times and not in a predefined order.
    • Several tools commonly used for this phase are Hadoop, Alpine Miner, and OpenRefine, among others.
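    The extract, load, and transform (ELT) pattern described in this phase can be sketched in Python as follows. This is a minimal illustration, assuming an in-memory SQLite database (via the standard sqlite3 module) stands in for the analytic sandbox; the raw_customers and customers tables and their rows are hypothetical. The raw rows are loaded unchanged first, and the conditioning (trimming, casting, dropping incomplete records) happens afterwards inside the sandbox, so the original data stays available when a preparation step has to be rerun.

```python
import sqlite3

# Hypothetical raw records extracted from a source system; field names and
# values are illustrative assumptions only.
RAW_ROWS = [
    ("  alice ", "34", "2023-01-05"),
    ("bob", "", "2023-01-06"),
    ("Carol", "29", "2023-01-07"),
]


def load_raw(conn, rows):
    """Load step: copy the extracted rows into the sandbox unchanged."""
    conn.execute("CREATE TABLE raw_customers (name TEXT, age TEXT, signup TEXT)")
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform step: condition the raw data inside the sandbox with SQL."""
    conn.execute(
        """
        CREATE TABLE customers AS
        SELECT lower(trim(name)) AS name,
               CAST(age AS INTEGER) AS age,
               signup
        FROM raw_customers
        WHERE age <> ''
        """
    )
    return conn.execute(
        "SELECT name, age, signup FROM customers ORDER BY name"
    ).fetchall()


# An in-memory database stands in for the analytic sandbox in this sketch.
sandbox = sqlite3.connect(":memory:")
load_raw(sandbox, RAW_ROWS)
clean = transform(sandbox)
print(clean)  # [('alice', 34, '2023-01-05'), ('carol', 29, '2023-01-07')]
```

    Because raw_customers is never modified, the transform can be dropped and rebuilt as often as the team needs.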
      Phase 3: Model Planning –
    • The team explores the data to learn about relationships between variables and then selects key variables and the most suitable models.
    • Several tools commonly used for this phase are Matlab and STATISTICA.
      Phase 4: Model Building –
    • The team develops datasets for training, testing, and production purposes.
    • The team builds and executes models based on the work done in the model planning phase.
    • The team also considers whether its existing tools will suffice for running the models or whether a more robust environment is needed for executing them.
    • Free or open-source tools: R and PL/R, Octave, WEKA.
    • Commercial tools: Matlab, STATISTICA.
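    Developing separate datasets for training, testing, and production purposes can be sketched with a reproducible shuffle-and-split. The 60/20/20 proportions, the fixed seed, and the split_dataset helper are illustrative assumptions, not features of any tool named in this phase; here the "production" set is simply a held-back slice reserved for production-like checks.

```python
import random


def split_dataset(records, train_pct=60, test_pct=20, seed=42):
    """Shuffle records reproducibly and split them into three disjoint sets.

    Percentages are integers so the set sizes are computed exactly; whatever
    remains after the train and test slices becomes the held-back set.
    """
    if train_pct < 0 or test_pct < 0 or train_pct + test_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    production = shuffled[n_train + n_test:]
    return train, test, production


train, test, production = split_dataset(range(100))
print(len(train), len(test), len(production))  # 60 20 20
```

    Fixing the seed means the same records land in the same set every time the preparation and modeling steps are rerun.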
      Phase 5: Communicate Results –
    • After executing the model, the team needs to compare the outcomes of modeling to the criteria established for success and failure.
    • The team considers how best to articulate findings and outcomes to various team members and stakeholders, taking into account warnings and assumptions.
    • The team should identify key findings, quantify business value, and develop a narrative to summarize and convey findings to stakeholders.
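    Comparing modeling outcomes to the criteria established for success and failure is easier when the criteria are written down before modeling and checked mechanically afterwards. The metric names, thresholds, and the evaluate_against_criteria helper below are hypothetical placeholders; real criteria are set by the team and its stakeholders.

```python
# Hypothetical success criteria agreed on before modeling.
# Each entry maps a metric name to ("min" or "max", threshold).
CRITERIA = {
    "accuracy": ("min", 0.80),            # must be at least 0.80
    "false_positive_rate": ("max", 0.10),  # must be at most 0.10
}


def evaluate_against_criteria(results, criteria=CRITERIA):
    """Return (passed, failures) by checking each metric against its threshold."""
    failures = []
    for metric, (direction, threshold) in criteria.items():
        value = results.get(metric)
        if value is None:
            failures.append(f"{metric}: missing")
        elif direction == "min" and value < threshold:
            failures.append(f"{metric}: {value} < {threshold}")
        elif direction == "max" and value > threshold:
            failures.append(f"{metric}: {value} > {threshold}")
    return not failures, failures


passed, failures = evaluate_against_criteria(
    {"accuracy": 0.84, "false_positive_rate": 0.12}
)
print(passed, failures)  # False ['false_positive_rate: 0.12 > 0.1']
```

    Listing every failed criterion, rather than stopping at the first one, gives the team concrete points to explain when presenting results.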
      Phase 6: Operationalize –
    • The team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening it to the full enterprise of users.
    • This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale, and to make adjustments before full deployment.
    • The team delivers final reports, briefings, and code.
    • Free or open-source tools: Octave, WEKA, SQL, MADlib.

    Data is the pulse of your business. Both structured and unstructured data is being gathered at an alarming rate – to the tune of 1.145 TRILLION MB per day, by 4.66 BILLION internet users.

    But what happens to all that data?

    Data Lifecycle Management (DLM) is the process that follows data from creation to destruction, with each phase controlled by a set of policies customized to your business needs. Your data lifecycle management policies should reflect your compliance regulations, privacy standards, and your degree of data accessibility.

    Not sure how long you are obligated to keep your business data? Nervous about tossing something you may need someday?

    Are you a data hoarder?

    Have you been hiding hard drives in the bottom drawer of your desk ever since your IT department threatened you with data limits?

    Don’t be that guy.

    Strive for a Type A personality: manage the flow of data by making clean and transparent decisions about data ownership, governance, and analysis, and get rid of the piles of junk plaguing your servers. Become comfortable with archiving or deleting data to optimize your resources and maintain the integrity of your data.

    If you have the right Data Lifecycle Management processes in place, you are more prepared to designate data for high availability, extra security, or elimination. The ability to control your data and put it into categories will save you time, money, and IT resources.

    Check out the 6 Data Lifecycle phases that support business data viability:

    1. Data creation

    This one may seem obvious, but take a moment to see where the bulk of your data is generated. You would hope that valuable active data from research and development efforts, customer interactions on your website, data entry, shared/purchased data, financial data, and transactional data constitute the bulk of your data creation phase. But if you see that your employees are saving memes, Instagram screenshots, or YouTube videos on your servers and taking up cloud space, then you need to tighten your data policies and enforcement.

    2. Data maintenance & storage

    The quality and accuracy of your data are as important as its accessibility. Bad data can directly impact revenue, as poor data hygiene is a source of inefficiency. 77% of leaders don’t trust the data available to them to make complex business decisions. Once you have curated the data that will drive business decisions, how do you ensure it remains accurate and accessible? It seems like circular logic because it is. The constant cycling of data generation, analysis, integration, storage, and elimination gives executives the quality data they need to make decisions. But that data maintenance cycle needs governance.

    3. Data usage

    What is the value of your data? How are you synthesizing the results of data analytics? This is the phase where you align value with action. How is your data used and moved around your enterprise? Maybe you incorporate feedback from end-users into product enhancement opportunities? Roles need to be defined around who has access to sensitive data. 

    4. Data publication

    Data publication and sharing can create issues around compliance and security restrictions. While it is important to share your valuable insights and research, you need to control the way your data leaves your enterprise. And the way recipients engage with your publications needs to be tracked and evaluated. 

    5. Data archiving

    Data archiving strategies should be built around the utility and sensitivity of the data stored. Consider privacy, data ownership, legal requirements, and the length of time you need to keep that data. Archiving removes your data from your active environment but keeps it in deep storage should you need it again.

    6. Data destruction

    Free yourself by deleting active and archived data that no longer hold value to your organization. Not only does the data need to be appropriately destroyed, but you must adhere to internal governance policies and legal standards depending on the sensitivity of the data.
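    The archiving and destruction policies above can be expressed as rules that move each record from active use to the archive and finally to destruction. This is a minimal sketch, assuming retention is driven only by a record's category and age; the categories, durations, RetentionRule type, and lifecycle_stage helper are hypothetical, and real values must come from your compliance, legal, and privacy requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical per-category policy: days kept active, then days archived."""
    active_days: int
    archive_days: int


# Illustrative categories and durations only, not legal guidance.
POLICIES = {
    "transactional": RetentionRule(active_days=365, archive_days=6 * 365),
    "marketing": RetentionRule(active_days=90, archive_days=0),
}


def lifecycle_stage(category, created, today):
    """Decide whether a record should stay active, be archived, or be destroyed."""
    rule = POLICIES[category]
    age_days = (today - created).days
    if age_days < rule.active_days:
        return "active"
    if age_days < rule.active_days + rule.archive_days:
        return "archive"
    return "destroy"


today = date(2024, 1, 1)
print(lifecycle_stage("transactional", date(2023, 6, 1), today))  # active
print(lifecycle_stage("transactional", date(2021, 1, 1), today))  # archive
print(lifecycle_stage("marketing", date(2023, 6, 1), today))      # destroy
```

    Keeping the durations in a single policy table, rather than scattered through code, makes it easier to review them against internal governance and legal standards.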

    Right-size your data with TBC

    TBConsulting is a full-service Managed Services Provider with 25 years of experience in data management and storage. We can help you optimize your data lifecycle management strategy with our data center and hybrid cloud solutions. Based in Phoenix, Arizona, TBC has been awarded one of the Top Places to Work in AZ and continues to attract top IT talent to support our contracts and develop forward-focused IT solutions that enable our clients to thrive.

    What are the 5 stages of the data lifecycle?

    Data LifeCycle Management is a process that helps organisations manage the flow of data throughout its lifecycle, from initial creation through to destruction. Its five stages are:
    1. Data creation
    2. Storage
    3. Usage
    4. Archival
    5. Destruction

    What are the steps of the data cycle?

    Data life cycle stages:
    1. Collection. Not all of the data that's generated every day is collected or used.
    2. Processing. Once data has been collected, it must be processed.
    3. Storage. After data has been collected and processed, it must be stored for future use.
    4. Management
    5. Analysis
    6. Visualization
    7. Interpretation

    Which phase involves gathering data from various sources and bringing it into the organization?

    The capture phase involves gathering data from various sources and bringing it into the organization.
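    Capturing data from various sources usually means reading records in different formats and reconciling them under a common key. The sketch below assumes two hypothetical sources, a CRM export in CSV and web analytics in JSON, and uses only Python's standard csv, io, and json modules; the field names, the customer-ID key, and the capture helper are illustrative.

```python
import csv
import io
import json

# Two hypothetical sources with different formats and field names.
CRM_CSV = "customer_id,email\n1,a@example.com\n2,b@example.com\n"
WEB_JSON = '[{"id": 2, "last_visit": "2024-03-01"}, {"id": 3, "last_visit": "2024-03-02"}]'


def capture(crm_csv, web_json):
    """Pull records from both sources into one dict keyed by customer ID."""
    combined = {}
    for row in csv.DictReader(io.StringIO(crm_csv)):
        cid = int(row["customer_id"])  # CSV values are strings; normalize the key type
        combined.setdefault(cid, {})["email"] = row["email"]
    for item in json.loads(web_json):
        combined.setdefault(item["id"], {})["last_visit"] = item["last_visit"]
    return combined


records = capture(CRM_CSV, WEB_JSON)
print(records)
```

    Customers that appear in only one source are kept with partial records, which leaves the decision about incomplete data to the later preparation steps.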

    What are the 6 stages of the data analytics life cycle?

    According to Google, there are six data analysis phases or steps: ask, prepare, process, analyze, share, and act. Following them should result in a framework that makes decision-making and problem-solving a little easier.