Simply giving a computer massive amounts of data and expecting it to learn a task is not enough. The data has to be presented in a way that lets the computer recognize patterns and draw inferences from it. This is usually done by adding relevant metadata to the dataset. Any metadata tag used to mark up elements of the dataset is called an annotation over the input. The term data labeling is often used interchangeably with data annotation to refer to the technique of attaching tags to content available in a range of formats. As such, there is no major difference between data labeling and data annotation, apart from the style and type of tag applied to the content or object of interest.
Both are used to create machine learning training datasets. Which one applies depends on the type of AI model being developed and on how its algorithms are trained. Data annotation is the technique of labeling data so that a machine learning algorithm can interpret the input and learn from it. Data labeling, also called data tagging, means attaching meaning to different types of data in order to train a machine learning model. A label identifies a single entity within a set of data.
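To make the idea of an annotation as a metadata tag more concrete, here is a minimal sketch in Python. It assumes text input and character-offset spans. The `Annotation` class, the `tagged_spans` helper, and the tag names `PERSON` and `LOCATION` are hypothetical and chosen only for illustration; real annotation tools use their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A metadata tag attached to a character span of the input text."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # hypothetical tag name, e.g. "PERSON"


def tagged_spans(text, annotations):
    """Return (span_text, label) pairs for each annotation, in text order."""
    ordered = sorted(annotations, key=lambda a: a.start)
    return [(text[a.start:a.end], a.label) for a in ordered]


# The raw input on its own carries no explicit meaning for a model.
text = "Ada Lovelace worked in London."

# Each annotation marks one entity of interest in that input.
annotations = [
    Annotation(start=23, end=29, label="LOCATION"),
    Annotation(start=0, end=12, label="PERSON"),
]

pairs = tagged_spans(text, annotations)
print(pairs)  # [('Ada Lovelace', 'PERSON'), ('London', 'LOCATION')]
```

Keeping the annotations separate from the raw text, rather than editing the text itself, is one common design choice: the same input can then carry several independent sets of tags.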