Before we start: This Python tutorial is part of our series of Python package tutorials. You can find other NumPy-related topics too!

Pandas and NumPy work very well together. Converting a pandas DataFrame into a NumPy array is straightforward: the to_numpy() method provided by pandas (available since pandas 0.24) does the trick.

To turn an entire DataFrame into a NumPy array, use the following line of code:

df.to_numpy()

This returns a 2D NumPy array with the same shape as the DataFrame (df); the column names and the index are discarded.
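As a minimal sketch of the whole-DataFrame conversion, the example below builds a small DataFrame and converts it. The sample data and the column names are made up for illustration and are not from any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": [4, 5, 6],
    "column3": [7, 8, 9],
})

# Convert the entire DataFrame; labels (columns and index) are dropped.
arr = df.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (3, 3), same as df.shape
print(arr)
```

Each row of the array corresponds to a row of the DataFrame, so arr[0] holds the first row's values across all three columns.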

To convert just two columns of a DataFrame (here, the ones named column1 and column2) into a NumPy array, select them by name and then call to_numpy():

df[["column1","column2"]].to_numpy()
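The following sketch shows the column selection on made-up data. It also shows a positional alternative with iloc, which may be useful when "the first two columns" is meant literally rather than by name. The DataFrame contents are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": [4.0, 5.0, 6.0],
    "column3": ["a", "b", "c"],
})

# Select two columns by name, then convert.
subset = df[["column1", "column2"]].to_numpy()
print(subset.shape)  # (3, 2)
print(subset.dtype)  # float64: the int and float columns share one dtype

# Select the first two columns by position instead of by name.
by_position = df.iloc[:, :2].to_numpy()
```

Because a NumPy array has a single dtype, the integer column is upcast to float here so that both columns fit in one array.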

To convert just a few cells of a DataFrame into a NumPy array, slice the DataFrame by position with iloc and convert the result using to_numpy():

df.iloc[[0,1,2],[0,1,2]].to_numpy()

The above code converts the cells in the first three rows and first three columns into a 3×3 NumPy array.
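To make the slicing step concrete, here is a small sketch on a made-up 4×4 DataFrame. It uses the same list-based iloc call and, as an assumed equivalent for contiguous ranges, the slice form iloc[:3, :3].

```python
import pandas as pd

# Hypothetical 4x4 sample data; column names are illustrative only.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    columns=["a", "b", "c", "d"],
)

# Explicit row and column positions, as in the example above.
block = df.iloc[[0, 1, 2], [0, 1, 2]].to_numpy()

# Slice notation selects the same cells when the positions are contiguous.
same_block = df.iloc[:3, :3].to_numpy()

print(block)
```

The list form is the more general of the two, since it also accepts non-adjacent positions such as [0, 2].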

Get a version of Python that’s pre-compiled for Data Science

While the open source distribution of Python may be satisfactory for an individual, it doesn’t always meet the support, security, or platform requirements of large organizations.

This is why organizations choose ActivePython for their data science, big data processing and statistical analysis needs.

Pre-bundled with the most important packages data scientists need, ActivePython is pre-compiled so you and your team don’t have to waste time configuring the open source distribution. You can focus on what’s important: spending more time building algorithms and predictive models against your big data sources, and less time on system configuration.

Some Popular Python Packages for Data Science/Big Data/Machine Learning You Get Pre-compiled with ActivePython

  • pandas (data analysis)
  • NumPy (multi-dimensional arrays)
  • SciPy (algorithms to use with numpy)
  • HDF5 (store & manipulate data)
  • Matplotlib (data visualization)
  • Jupyter (research collaboration)
  • PyTables (managing HDF5 datasets)
  • HDFS (C/C++ wrapper for Hadoop)
  • pymongo (MongoDB driver)
  • SQLAlchemy (Python SQL Toolkit)