Programming models
Response needed 1
The paper makes a sound case about the data integration methods of past years and their drawbacks, such as the lack of ‘know-how’ guides, limited tools for the pain points, and difficulty in customization. PyData as an integration tool comes with added benefits and greater ease of use than traditional data integration tools, and it is more cutting edge in the present-day data science field. One of the main reasons the author gives for the success of PyData is ease of interoperability, which I accept, since users need better tools to work efficiently and effectively. PyData also has a more varied set of packages available, and in many cases better packages, than many integration tools. All of this leads to the conclusion that PyData can be a possible tool for data integration.
That said, the author also talks about getting the whole data science community to focus on a single system, i.e. PyData. The paper puts forth the ideas of building systems as PyData software packages, fostering PyDI as a part of the PyData system, extending PyDI to the cloud, etc.
At this point, I think the paper goes off on a tangent by pushing PyData as a monopoly in the field of data science. The paper does not give enough reasons as to why DI systems should be built in PyDI and PyData. It does mention the limitations of past DI systems and their packages, but it has very little to say about the limitations of more recent and contemporary data systems.
Coming to the second question, whether this integration approach helps and facilitates the adoption of the ‘R’ programming model, the paper misses the subtle difference between PyData and R. Though both PyData and R serve the same ends, they have distinct groups of users, and that is exactly the misconception. While programmers are more proficient with Python, data scientists use R. Using PyData to facilitate the adoption of R tries to overlap two similar entities with different usages. In the end both are tools, and replacing one with the other is not a logical step. R is widely used and so is PyData. They can complement each other and make up for each other's shortcomings.
Based on the ideas and concepts presented in the given article, I would say that PyData is one of the most appropriate data integration systems. One of the reasons I would back the use of PyData is that it is open source and has thousands of other interoperable Python packages. Most of these packages can be applied to solve some of the common user problems in Python (Mattmann, 2014). PyData also gives users a chance to exploit different capabilities. Other common data integration methods need several different capabilities to solve a single data integration problem, and it would be hard for a user to incorporate all of those capabilities into one data integration system. With PyData it would be possible to utilize different packages and solve the challenge through an iterative process.
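To make the idea of combining small packages in an iterative process more concrete, here is a minimal sketch of my own, not something taken from the article. It uses only the Python standard library (`csv`, `io`, `difflib`) as a stand-in for PyData packages such as pandas. The two sources, the column names, the helper functions, and the thresholds are all invented for illustration.

```python
import csv
import io
from difflib import SequenceMatcher

# Two hypothetical sources describing overlapping companies (invented data).
SOURCE_A = """id,name
1,Acme Corporation
2,Globex Inc
3,Initech
"""
SOURCE_B = """ref,company
x1,ACME Corp.
x2,Globex Incorporated
x3,Umbrella Group
"""


def load(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(rows_a, rows_b, threshold):
    """Pair each row of A with its most similar row of B, if similar enough."""
    pairs = []
    for a in rows_a:
        best = max(rows_b, key=lambda b: similarity(a["name"], b["company"]))
        if similarity(a["name"], best["company"]) >= threshold:
            pairs.append((a["id"], best["ref"]))
    return pairs


rows_a, rows_b = load(SOURCE_A), load(SOURCE_B)

# Iterate: start strict, inspect the result, then relax the threshold.
for threshold in (0.9, 0.6):
    print(threshold, match(rows_a, rows_b, threshold))
```

Each step (loading, scoring, matching) is a separate small piece, so a user can swap one out or rerun the loop with a different threshold after looking at the output, which is roughly the kind of iteration I have in mind.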
Equally, PyData offers solutions to integration problems that could not be solved using other integration methods. For example, some packages provided by PyData can be applied to solve challenges that arise from end-to-end data integration processes. Some of the existing integration systems can only support the production stage, which makes it hard for users to specify a proper workflow (Mattmann, 2014). In the context of database management, PyData can be applied in the production stage to help reduce problems that arise in the development stage when other data integration systems cannot build a reliable product.