How to Label Data for Machine Learning in Python

Data labeling in Machine Learning (ML) is the process of assigning labels to subsets of data based on their characteristics. It takes an unlabeled dataset and augments each piece of data with informative labels or tags.

Most commonly, data is annotated with a text label. However, there are many use cases for labeling data with other types of labels. Labels provide context for many kinds of data, from images and audio recordings to X-rays.
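To make the idea of "data plus label" concrete, here is a minimal Python sketch. The LabeledItem class, the file paths, and the label values are all hypothetical and chosen only for illustration; real projects will use whatever schema their tooling expects.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LabeledItem:
    """One piece of data plus the label attached to it (hypothetical schema)."""
    data: Any   # raw content or a reference to it, e.g. text or a file path
    label: Any  # a class name, a bounding box, a transcript, ...


examples = [
    # Text classification: the label is a short text tag.
    LabeledItem(data="The delivery arrived two days late.", label="complaint"),
    # Object detection: the label is a class plus a bounding box in pixels.
    LabeledItem(data="images/street_001.jpg",
                label={"class": "car", "box": (34, 50, 210, 180)}),
    # Speech recognition: the label is the transcript of the recording.
    LabeledItem(data="audio/call_017.wav", label="hello, how can I help you"),
]

for item in examples:
    print(f"{item.data!r} -> {item.label!r}")
```

The same container holds a plain text tag, a structured bounding box, or a transcript, which is why "label" is best thought of as any annotation that gives the raw data context.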

Data Labeling Procedure

While data has traditionally been labeled manually, the process is slow and resource-intensive. Instead, ML models or algorithms can be used to automatically label data by first training them on a subset of data that has been labeled manually. 
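As a rough illustration of training on a manually labeled subset and then labeling the rest automatically, the sketch below uses a tiny nearest-centroid classifier written in plain Python. The feature values, the "cat" and "dog" labels, and the function names are made up for the example; a real project would more likely use an ML library and far more data.

```python
from collections import defaultdict
from math import dist


def train_centroids(labeled):
    """Compute the mean feature vector (centroid) for each label."""
    groups = defaultdict(list)
    for features, label in labeled:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }


def auto_label(centroids, features):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hand-labeled subset as (features, label) pairs; values are illustrative.
labeled_subset = [
    ((1.0, 1.2), "cat"), ((0.8, 1.0), "cat"),
    ((4.0, 3.8), "dog"), ((4.2, 4.1), "dog"),
]
# Items that nobody has labeled yet.
unlabeled = [(0.9, 1.1), (3.9, 4.0)]

centroids = train_centroids(labeled_subset)
predicted = [auto_label(centroids, features) for features in unlabeled]
print(predicted)  # ['cat', 'dog']
```

Only four items were labeled by hand here, yet the model can assign labels to any number of new items, which is where the time savings come from.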


One way to automate data labeling is to use a workflow that tracks how confident the labeling model is in each result and passes low-confidence items to humans for labeling. The new human-generated labels are then fed back to the labeling model so that it can learn from them and improve its ability to automatically label the next set of data.

[Figure: ML data labeling workflow]
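The confidence-based workflow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the one-dimensional toy model, the 0.8 threshold, and the ask_human callback that stands in for a real annotation interface.

```python
def fit(labeled):
    """Return the mean value per label from (value, label) pairs."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(means, value):
    """Return (label, confidence) for the nearest class mean.

    In this two-class toy case the confidence lies between 0.5 and 1.0.
    """
    distances = {label: abs(value - mean) for label, mean in means.items()}
    best = min(distances, key=distances.get)
    total = sum(distances.values())
    confidence = 1.0 if total == 0 else 1.0 - distances[best] / total
    return best, confidence


def label_batch(labeled, batch, ask_human, threshold=0.8):
    """Auto-label confident items and route the rest to a human."""
    means = fit(labeled)
    auto, by_human = [], []
    for value in batch:
        label, confidence = predict(means, value)
        if confidence >= threshold:
            auto.append((value, label))
        else:
            by_human.append((value, ask_human(value)))
    # Human labels join the training data used for the next batch.
    return auto, by_human, labeled + by_human


# Stand-in for a person answering a labeling request.
def ask_human(value):
    return "long" if value > 5 else "short"


training = [(1.0, "short"), (9.0, "long")]
auto, by_human, training = label_batch(training, [2.0, 5.5, 8.0], ask_human)
print(auto)      # [(2.0, 'short'), (8.0, 'long')]
print(by_human)  # [(5.5, 'long')]
```

The borderline value 5.5 gets a confidence of about 0.56, so it goes to the human, and its label becomes part of the training data for the next call to label_batch.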

Over time, the model labels more and more data automatically, and the process accelerates. Even so, data labeling remains a slow and repetitive task, and various tools have been developed to streamline it.

How to Use Label Studio to Automatically Label Data

One automated labeling tool is Label Studio, an open-source Python tool that lets you label various data types, including text, images, audio, video, and time series.

1. To install Label Studio, open a command window or terminal, and enter:

pip install -U label-studio

Alternatively, to run pip through a specific Python interpreter:

python -m pip install -U label-studio
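If you want to confirm from Python that the installation succeeded, one option is to query the installed package metadata. This is only a convenience sketch using the standard library; the helper name installed_version is made up for the example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("label-studio")
if found:
    print(f"label-studio version: {found}")
else:
    print("label-studio is not installed in this environment")
```

Run this with the same interpreter you used for pip, since each Python environment has its own set of installed packages.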

2. To create a labeling project, run the following command:

label-studio init <project_name> 

Once the project has been created, you will receive a message stating:

Label Studio has been successfully initialized. Check project states in .\<project_name>
Start the server: label-studio start .\<project_name>

3. To start the project, run the following command, using the same <project_name> as in step 2:

label-studio start .\<project_name>

or, without the Windows-style path prefix:

label-studio start <project_name>

The project will automatically load in your web browser at

[Screenshot: Label Studio welcome page]
4. Click on the Import button to import your data from various sources.

[Screenshot: import data]

Once the data is imported, you can scroll down the page and preview it.

[Screenshot: data preview]
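If your raw data lives in Python, you can prepare a file for the Import step with a short script. The sketch below assumes Label Studio's JSON task format as I understand it, where each task keeps its fields under a "data" key. The field name "text", the sample sentences, and the tasks.json file name are illustrative choices; the field name should match whatever your labeling configuration expects.

```python
import json
from pathlib import Path


def build_tasks(texts):
    """Wrap each string in a task object with its fields under a "data" key."""
    return [{"data": {"text": text}} for text in texts]


# Sample records to label; replace with your own data.
texts = [
    "The battery lasts all day.",
    "The screen cracked after a week.",
]

tasks = build_tasks(texts)
output_path = Path("tasks.json")
output_path.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
print(f"Wrote {len(tasks)} tasks to {output_path}")
```

The resulting tasks.json can then be selected in the import dialog like any other file.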

5. In the menu, click on Settings to continue:

[Screenshot: Settings]

You can now choose among the many options to finish setup for your specific project.

[Screenshot: labeling configuration]
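Part of the setup is the labeling configuration, which Label Studio expresses as XML-like tags. As a hedged sketch, the script below generates a simple text classification configuration using the View, Text, Choices, and Choice tags as I understand them. The control name "sentiment", the field name "text", and the three choices are assumptions for the example, so check the result against the templates offered in the Settings page.

```python
import xml.etree.ElementTree as ET


def build_choice_config(choices, field="text", control="sentiment"):
    """Build a single-choice text classification config as an XML string."""
    view = ET.Element("View")
    # Displays the task field; "$text" refers to the "text" key of each task.
    ET.SubElement(view, "Text", name=field, value=f"${field}")
    # The group of choices is attached to the Text element through toName.
    group = ET.SubElement(view, "Choices", name=control, toName=field)
    for choice in choices:
        ET.SubElement(group, "Choice", value=choice)
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(view)
    return ET.tostring(view, encoding="unicode")


config_xml = build_choice_config(["Positive", "Negative", "Neutral"])
print(config_xml)
```

Generating the configuration from a list keeps the set of labels in one place, which helps when the same labels are also used by the model that pre-labels the data.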

