We believe in sharing knowledge and in Open Source. That is why our standard training material is available both for download from this page and in our GitHub repository. Enter your email address to receive setup instructions. If you already know how to use GitHub, just go to our repository and start contributing.
Open Data Science
«Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data»
"Data Science" is not just about Statistics and Machine Learning. Many different fields converge in it: it is an interdisciplinary activity that requires knowledge and experience in everything from Data Management to High Performance Computing, and from Interactive Graphics to Distributed Systems.
"Open Data Science" is an inclusive movement that makes Open Source tools for Data Science work together easily as a connected ecosystem. This is critical because it makes it easy for organizations to adopt Open Source solutions.
Why do modern organizations move to Open Data Science?
Costs: Invest in People, not in Licenses
The costs associated with Open Source are those related to developing business processes, training, and application-specific programming. In other words, you spend your money on your organization and your people, not on enriching other companies. By choosing an Open Source solution you cut not only the license costs but also all the costs of license management.
Innovation and Software Maintenance
Proprietary software companies work under severe economic constraints: most of their internal budget goes to managers' salaries, marketing and commercial activities. Open Source software is developed under much more efficient conditions. The people involved usually use the software in their own companies or are paid for the services they provide to the final customer, which is a strong incentive to develop efficient and innovative software. People contribute every single day. As soon as Open Source code is made available, the community starts to use it and gives feedback to the developers, which leads to a very efficient bug-fixing process. By contrast, proprietary software often needs one or more releases before bugs and issues are fixed.
Independence from the manufacturer
Open Source software licenses protect users from events such as the bankruptcy of the developer company or its acquisition by a competitor with different development plans. Even large commercial software companies are subject to such events: it is enough to mention JD Edwards, acquired by PeopleSoft, or MATRIXx, purchased by MathWorks Inc.
Open Source adopters cannot be blackmailed into migrating to new software versions or, worse, forced to change platform.
Adopting Open Data Science in Your Organization
First Step: CO-EXISTENCE
This is a common starting point for moving your company towards Open Data Science. It consists of identifying your brand new projects, whether they are about big data, new use cases or HPC, and developing them with an Open Data Science approach. By starting with a brand new project you don't put at risk what is already in production, and once your team is acquainted with and trained on the new tools, it becomes very natural to move to the next stage: migration.
Second Step: MIGRATION
Now that your team is confident with the ODS tools and libraries, it is straightforward to pick the existing projects that are easiest to migrate and move them first.
Last Step: RE-PLATFORM
The final stage in adopting Open Data Science is re-platforming: you take your legacy Data Analytics and move it completely over to the new technology.
At this point, what you really don't want to do is simply port the code, because if you rewrite it line by line you miss many opportunities. Most proprietary software has constraints built into it. The Open Source approach, on the other hand, gives you unrestricted access to hundreds of specialized scientific libraries and many options for writing elegant and optimized code. You don't have to break up problems in the ways you did before. What you really do when you re-platform is approach the analytics in the way that is ideal for you, because you want to take advantage not just of the new libraries but also of the new data sources and the new computing environments.
The New Environment
Open Data Science: a full stack solution for your whole team
The most important thing is that everybody in the data science team participates, from early exploration to production. Modern roles are much more collaborative than before.
Modern Data Science is about Interactivity, Collaboration and Integration
The New Software Environment
Open Source licensing is different from proprietary software licensing: the BSD license, for example, is very liberal and allows commercial redistribution.
Python is a common language that pulls together all these ODS projects and makes them available to enterprises. Python is trusted by many industry leaders.
Python is not the only language out there, and in ODS there is no one-size-fits-all approach: many different languages are available. What Python is really good at is being a glue language that pulls multiple languages together, so you can leverage not only new libraries but also your legacy code, giving it new life in modern computing environments and applications.
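As a rough illustration of the glue idea, the sketch below uses Python's standard `ctypes` module to call the `cos` function from the system's C math library. It is only a minimal example under assumptions: the helper names `load_c_math_library` and `c_cosine` are hypothetical, and the C library may not be discoverable on every platform, in which case the helper returns `None` instead of failing.

```python
import ctypes
import ctypes.util


def load_c_math_library():
    """Try to load a C library that exports `cos`; return None if none is found."""
    # Assumption: the math library is discoverable as "m" (typical on Linux/macOS).
    candidates = [ctypes.util.find_library("m"), ctypes.util.find_library("c")]
    for name in candidates:
        if not name:
            continue
        try:
            lib = ctypes.CDLL(name)
            lib.cos  # attribute lookup raises AttributeError if the symbol is missing
            return lib
        except (OSError, AttributeError):
            continue
    return None


def c_cosine(x):
    """Compute cos(x) by calling compiled C code, or return None if unavailable."""
    lib = load_c_math_library()
    if lib is None:
        return None
    # Declare the C signature: double cos(double)
    lib.cos.argtypes = [ctypes.c_double]
    lib.cos.restype = ctypes.c_double
    return lib.cos(x)


if __name__ == "__main__":
    value = c_cosine(0.0)
    if value is None:
        print("C math library not found on this platform")
    else:
        print("cos(0.0) computed by C code:", value)
```

The same pattern, declaring argument and return types and then calling the function like a normal Python callable, is one of several ways existing compiled code could be reused from Python without rewriting it.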
- Python + IPython/Jupyter
- Machine learning
- Definitions and Advice
- Prepare the Data
- The scikit-learn interface
- Visualizing the Data
- Dealing with Bias and Variance
- Ensemble Methods
- Ensemble Methods Advanced
- Support vector machines (SVMs)
- Predict Temporal Series
- Forecasting with LSTM
- Theano Basic Concepts
- Explore Neural Network Hyperparameters with Theano and Keras
- Neural Networks with Nervana Neon library
- TensorFlow Basic Concepts
- Explore Neural Network Hyperparameters with TensorFlow
- TensorFlow for beginners
- Keras - Theano Benchmark
- Neon Benchmark
- TensorFlow Benchmark
- Neural Network Benchmark Summary