Technology of content categorisation based on the analysis of social media and social-psychological data using crowdsourcing platforms
Relevance of the project
Current models for predicting complex user traits (political beliefs, social attitudes, psychological traits) from social media rely on labelled data, which presupposes an initial labelling of the author of a text (or other content) as a carrier or non-carrier of the trait. However, the specific nature of such data can significantly reduce the quality of the resulting social media analysis models. The solution is to develop a technology for reliably labelling the data needed to train algorithms that analyse social media texts.
The project proposes to develop a technology for generating reliable labelled data sets for subsequent use in training text analysis algorithms.
Project tasks:
- Develop technological and methodological principles for generating reliable data sets.
- Formulate the requirements, capabilities and limitations of the measurement tools to be used. This is necessary to assess complex traits and to generate data sets that conform to the developed principles.
- Create an algorithm for assessing data quality. Data is collected through crowdsourcing platforms, and respondents who completed the proposed tasks in bad faith are filtered out.
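
The filtering of bad-faith respondents mentioned in the last task could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual algorithm: it assumes that some tasks are control items with known answers, that a respondent is trusted when their accuracy on those items reaches a threshold, and that the remaining answers are combined by majority vote. The names `filter_workers`, `aggregate_labels`, `gold` and `min_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def filter_workers(responses, gold, min_accuracy=0.8):
    """Return ids of respondents whose accuracy on control items is high enough.

    responses: iterable of (worker_id, item_id, label) tuples.
    gold: dict mapping control item ids to their known correct label (assumed).
    min_accuracy: illustrative threshold, not a value taken from the project.
    Respondents who answered no control items are not trusted.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in responses:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def aggregate_labels(responses, trusted, gold=()):
    """Majority-vote label for each non-control item, using trusted respondents only."""
    votes = defaultdict(Counter)
    for worker, item, label in responses:
        if worker in trusted and item not in gold:
            votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}
```

In this sketch, a respondent who answers the control items incorrectly is dropped before aggregation, so their answers do not influence the final labels. A real pipeline would probably need additional signals (for example response time or inter-respondent agreement), which are outside the scope of this illustration.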