InetSoft Webinar: Can a Data Lake Be Based on Hadoop?

This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "10 Biggest Big Data Trends."

And the last question, it's a good one, so even for the data lake, is there a single framework? Can a data lake be based on Hadoop? Or is that not even a good approach?

And I think that the answer here is again that there is no one-size-fits-all, and companies do different things. We see Hadoop being used incredibly often as the data lake, and that is probably the default option, but in that Netflix scenario Larry talked about earlier, Amazon S3 is really the data lake, and that is the whole data store.

Amazon announced a product called Athena, which will enable querying against the data in S3. So now it's opening up the data store for more interactive connectivity, but companies are taking a multi-tiered approach, especially in the cloud, where there is a simple storage bucket like Amazon S3, or Azure Blob or Azure Data Lake with Microsoft. That's even a layer below Hadoop, and so there isn't an easy one-size-fits-all answer.

Larry: And then one thing I would like to add is that the company should choose whatever technology allows them to build and consume at the same time, and this also ties back to the waterfall questions. Build first and consume later: that paradigm has been the norm for years. So regardless of whether it's an analytics platform you are evaluating or a warehouse, if it allows you to look at your data, provision it to users, and answer questions all simultaneously, that's the best platform for you.

Explaining the Waterfall Method in the Context of Building and Supporting Enterprise Dashboards

The Waterfall methodology is a traditional project management approach that follows a linear and sequential process. Developed in the 1970s, it was one of the first formalized methods for software development and has since been applied to various types of projects. The key characteristic of the Waterfall method is its structured approach, where each phase of a project must be completed before moving on to the next. This linear progression is typically divided into distinct phases: requirements gathering, design, implementation, testing, deployment, and maintenance.
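
The rule that each phase must be completed before the next begins can be illustrated with a minimal Python sketch. The `Phase` enum and the `WaterfallProject` class are hypothetical names invented for this illustration; they do not come from any project management tool.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six Waterfall phases, in the order listed above."""
    REQUIREMENTS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    TESTING = 4
    DEPLOYMENT = 5
    MAINTENANCE = 6


class WaterfallProject:
    """Hypothetical tracker that enforces strictly sequential phases."""

    def __init__(self):
        self.completed = []  # phases signed off so far, in order

    @property
    def current(self) -> Optional[Phase]:
        """The phase now in progress, or None once every phase is signed off."""
        if len(self.completed) == len(Phase):
            return None
        return Phase(len(self.completed) + 1)

    def sign_off(self, phase: Phase) -> None:
        """Mark a phase complete; only the current phase may be signed off."""
        current = self.current
        if phase != current:
            expected = current.name if current else "none (project finished)"
            raise ValueError(
                f"cannot sign off {phase.name}: current phase is {expected}"
            )
        self.completed.append(phase)
```

Calling `sign_off(Phase.TESTING)` on a new project raises a `ValueError`, because requirements, design, and implementation have not yet been signed off. That is the gating behavior the methodology prescribes.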

In the context of building and supporting enterprise dashboards, the Waterfall methodology can be an effective framework for managing the complexities and dependencies that often arise in such projects. Enterprise dashboards, which are crucial tools for data visualization and decision-making, require careful planning, design, and execution to ensure they meet the needs of the business and are scalable, reliable, and maintainable.

The Phases of the Waterfall Methodology Applied to Dashboard Projects

  1. Requirements Gathering

    The first phase of the Waterfall methodology is gathering requirements. In the context of enterprise dashboards, this phase is critical because it sets the foundation for the entire project. During this phase, project managers, business analysts, and stakeholders collaborate to define the objectives of the dashboard, identify the key metrics and data sources, and establish the user requirements.

    Effective requirements gathering involves understanding the business goals, the decisions that the dashboard will support, and the specific needs of the end-users. This phase should result in a detailed requirements document that outlines all the necessary features, functionalities, and performance criteria for the dashboard. This document serves as a reference throughout the project, ensuring that all stakeholders have a clear and shared understanding of the project's scope.

  2. System Design

    Once the requirements have been clearly defined, the next phase is system design. In this phase, the technical team, which may include data architects, UI/UX designers, and developers, begins to design the architecture of the dashboard. This involves defining the data models, selecting the appropriate tools and technologies, and designing the user interface.

    The design phase in the Waterfall method is comprehensive and detailed. For enterprise dashboards, this phase might include creating mockups and prototypes of the dashboard, designing the data pipelines and ETL (Extract, Transform, Load) processes, and specifying the security and access control measures. The goal is to create a blueprint that guides the development team during the implementation phase and ensures that all technical requirements are met.
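
As an illustration of the kind of ETL process a design document might specify, here is a minimal sketch using only the Python standard library. The sample data, the `sales` table, and every function name are assumptions made for this example; a real pipeline would read from the organization's own source systems and load into its own warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real project would pull this from a source system.
RAW_CSV = """region,order_date,amount
East,2024-01-05,120.50
West,2024-01-05,80.00
East,2024-01-06,not_a_number
West,2024-01-07,45.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: coerce amounts to float and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed amounts rather than load bad data
        clean.append((row["region"], row["order_date"], amount))
    return clean


def load(conn, records):
    """Load: write cleaned records into the table the dashboard would query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(region TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def sales_by_region(conn):
    """Aggregate query a dashboard tile might run against the loaded table."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = sales_by_region(conn)
```

Keeping extract, transform, and load as separate functions mirrors the design-phase goal of a blueprint: each step can be specified, reviewed, and later tested on its own.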

  3. Implementation

    The implementation phase is where the actual development of the dashboard takes place. This phase involves coding, integrating data sources, developing the user interface, and setting up the infrastructure required to support the dashboard.

    In the Waterfall methodology, the implementation phase follows the design phase without any overlap. This means that the design must be finalized before any coding begins. For enterprise dashboards, this can be both an advantage and a disadvantage. On the one hand, it ensures that the development team has a clear and unchanging set of specifications to work from, which can reduce the risk of scope creep. On the other hand, it can be inflexible, as any changes to the requirements or design later in the process can be costly and time-consuming to implement.

    During implementation, developers also integrate the dashboard with various data sources, such as databases, data warehouses, or external APIs. This integration is crucial for ensuring that the dashboard can provide real-time or near-real-time data to users, enabling them to make informed decisions.

  4. Testing

    After the implementation phase is complete, the project moves into the testing phase. In the context of enterprise dashboards, testing is vital to ensure that the dashboard functions correctly, meets performance standards, and provides accurate data visualizations.

    The Waterfall methodology emphasizes thorough testing, which typically involves multiple levels of testing, including unit testing, integration testing, system testing, and user acceptance testing (UAT). Each level of testing serves a different purpose: unit testing ensures that individual components function correctly, integration testing checks that the various components work together as expected, system testing verifies that the dashboard meets all the specified requirements, and UAT involves end-users testing the dashboard to ensure it meets their needs.

    Testing in the Waterfall method is conducted after the implementation phase is complete, which can be a drawback if significant issues are discovered. Since the Waterfall method does not easily accommodate iterative feedback, fixing major issues during the testing phase can require revisiting earlier phases, leading to delays and increased costs.
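
To make the difference between unit and integration testing concrete, here is a small sketch built on Python's standard `unittest` module. The metric `conversion_rate` and the component `build_tile` are hypothetical stand-ins for dashboard logic, not part of any particular product.

```python
import unittest


def conversion_rate(orders, visits):
    """Hypothetical dashboard metric: orders per visit, as a percentage."""
    if visits == 0:
        return 0.0
    return round(100.0 * orders / visits, 2)


def build_tile(rows):
    """Hypothetical component that combines raw rows into a dashboard tile."""
    orders = sum(r["orders"] for r in rows)
    visits = sum(r["visits"] for r in rows)
    return {"label": "Conversion rate", "value": conversion_rate(orders, visits)}


class UnitTests(unittest.TestCase):
    """Unit level: one function tested in isolation."""

    def test_rate(self):
        self.assertEqual(conversion_rate(5, 200), 2.5)

    def test_zero_visits(self):
        self.assertEqual(conversion_rate(5, 0), 0.0)


class IntegrationTests(unittest.TestCase):
    """Integration level: aggregation and metric working together."""

    def test_tile(self):
        rows = [{"orders": 3, "visits": 100}, {"orders": 2, "visits": 100}]
        self.assertEqual(build_tile(rows)["value"], 2.5)


def run_suite():
    """Run both levels and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(UnitTests),
        loader.loadTestsFromTestCase(IntegrationTests),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

System testing and UAT sit above these levels and usually involve the deployed dashboard and real users, so they are not shown here.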

  5. Deployment

    Once the dashboard has passed all testing phases, it is ready for deployment. In the Waterfall methodology, deployment is a critical phase where the dashboard is moved from the development environment to the production environment, making it accessible to end-users.

    For enterprise dashboards, deployment can involve several steps, such as configuring servers, setting up user accounts and access controls, and migrating data to the production environment. It is also important to conduct final checks to ensure that the dashboard functions as expected in the live environment and that any potential issues, such as performance bottlenecks or security vulnerabilities, are addressed.

    In the Waterfall method, deployment is usually a one-time event: once the dashboard is deployed, it is expected to be fully functional. This can also be a limitation, because any post-deployment issues might require significant effort to resolve, given the method's linear nature.

  6. Maintenance and Support

    The final phase of the Waterfall methodology is maintenance and support. After the dashboard has been deployed, it enters the maintenance phase, where it is monitored for performance, and any issues that arise are addressed. This phase can include activities such as bug fixes, updates to accommodate new data sources or business requirements, and performance tuning.

    For enterprise dashboards, the maintenance phase is particularly important because these tools are often used to support critical business decisions. Ensuring that the dashboard remains accurate, up-to-date, and responsive is essential for maintaining user trust and satisfaction.

    The Waterfall methodology's structured approach to maintenance can be beneficial in this context, as it provides a clear framework for managing updates and addressing issues. However, the method's lack of flexibility can also be a disadvantage, as it may not easily accommodate rapid changes or the need for frequent updates that are common in today's fast-paced business environments.

Advantages of the Waterfall Methodology for Enterprise Dashboard Projects

  1. Clear Structure and Documentation

    One of the primary advantages of the Waterfall methodology is its clear structure and emphasis on documentation. Each phase of the project is well-defined, with specific deliverables that must be completed before moving on to the next phase. This can be particularly beneficial for enterprise dashboard projects, where multiple stakeholders may be involved, and clear communication is essential.

    The detailed documentation produced during the Waterfall process serves as a valuable reference for the project team and can help ensure that the final product meets the original requirements. It also provides a clear record of the project's progress, which can be useful for managing risks and identifying potential issues early on.

  2. Predictability and Planning

    The linear nature of the Waterfall methodology makes it highly predictable. Because each phase must be completed before the next begins, project managers can develop detailed timelines and budgets with a high degree of confidence. This predictability is advantageous for enterprise dashboard projects, where precise planning is often required to align with broader business initiatives or regulatory requirements.

    Additionally, the Waterfall method's emphasis on upfront requirements gathering and design can help prevent scope creep and ensure that the project stays on track. By establishing a clear plan from the outset, the project team can avoid the frequent changes and revisions that can occur in more iterative methodologies.

  3. Quality Control

    The Waterfall methodology's emphasis on thorough testing and documentation can lead to higher quality deliverables. By conducting extensive testing after the implementation phase, the project team can identify and resolve issues before the dashboard is deployed to production. This can help ensure that the final product is robust, reliable, and meets the needs of the business.

Challenges of the Waterfall Methodology for Enterprise Dashboard Projects

  1. Inflexibility

    The most significant challenge of the Waterfall methodology is its inflexibility. Once a phase is completed, it is difficult and costly to go back and make changes. This can be a major drawback for enterprise dashboard projects, where requirements may evolve over time, or new data sources may need to be integrated.

    The rigid structure of the Waterfall method can also make it difficult to accommodate feedback from end-users. If issues are discovered during the testing or deployment phases, addressing them may require significant rework, leading to delays and increased costs.

  2. Risk of Delayed Feedback

    In the Waterfall methodology, feedback is typically gathered late in the project, during the testing or user acceptance testing phases. This can be problematic for enterprise dashboard projects, where early feedback from end-users is critical to ensure that the dashboard meets their needs and provides the right insights.

    Delayed feedback can lead to a situation where significant issues are only discovered late in the project, when they are more difficult and costly to address. This can result in a dashboard that does not fully meet the business's needs or requires extensive revisions after deployment.

  3. Longer Time to Market

    The sequential nature of the Waterfall methodology can lead to longer project timelines compared to more iterative approaches, such as Agile. For enterprise dashboard projects, where timely delivery is often crucial, this longer time to market can be a disadvantage.

    In fast-paced business environments, the ability to quickly develop and deploy dashboards that provide actionable insights is essential. The Waterfall method's emphasis on completing each phase before moving on to the next can slow down the development process, making it harder to respond to changing business needs or capitalize on emerging opportunities.
