Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Learning company embeddings from annual reports for fine-grained industry characterization

Ito, Tomoki, Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Sakaji, Hiroki and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2020. Learning company embeddings from annual reports for fine-grained industry characterization. Presented at: FinNLP-2020 @ IJCAI-PRICAI 2020: The Second Workshop on Financial Technology and Natural Language Processing, Yokohama, Japan, 11-13 July 2020.

PDF - Accepted Post-Print Version

Abstract

Organizing companies by industry segment (e.g. artificial intelligence, healthcare or fintech) is useful for analyzing stock market performance and for designing theme-based investment funds, among others. Current practice is to manually assign companies to sectors or industries from a small predefined list, which has two key limitations. First, due to the manual effort involved, this strategy is only feasible for relatively mainstream industry segments, and thus cannot easily be used for niche or emerging topics. Second, the use of hard label assignments ignores the fact that different companies will be more or less exposed to a particular segment. To address these limitations, we propose to learn vector representations of companies based on their annual reports. The key challenge is to distill the relevant information from these reports for characterizing their industries, since annual reports also contain a lot of information which is not relevant for our purpose. To this end, we introduce a multi-task learning strategy, which is based on fine-tuning the BERT language model on (i) existing sector labels and (ii) stock market performance. Experiments in both English and Japanese demonstrate the usefulness of this strategy.

Item Type: Conference or Workshop Item (Paper)
Status: In Press
Schools: Computer Science & Informatics
Date of First Compliant Deposit: 24 June 2020
Date of Acceptance: 2 June 2020
Last Modified: 26 Nov 2022 13:58
URI: https://orca.cardiff.ac.uk/id/eprint/132755
