Ito, Tomoki, Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Sakaji, Hiroki and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2020. Learning company embeddings from annual reports for fine-grained industry characterization. Presented at: FinNLP-2020 @ IJCAI-PRICAI 2020: The Second Workshop on Financial Technology and Natural Language Processing, Yokohama, Japan, 11-13 July 2020.
Abstract
Organizing companies by industry segment (e.g. artificial intelligence, healthcare or fintech) is useful for analyzing stock market performance and for designing theme-based investment funds, among others. Current practice is to manually assign companies to sectors or industries from a small predefined list, which has two key limitations. First, due to the manual effort involved, this strategy is only feasible for relatively mainstream industry segments, and can thus not easily be used for niche or emerging topics. Second, the use of hard label assignments ignores the fact that different companies will be more or less exposed to a particular segment. To address these limitations, we propose to learn vector representations of companies based on their annual reports. The key challenge is to distill the relevant information from these reports for characterizing their industries, since annual reports also contain a lot of information which is not relevant for our purpose. To this end, we introduce a multi-task learning strategy, which is based on fine-tuning the BERT language model on (i) existing sector labels and (ii) stock market performance. Experiments in both English and Japanese demonstrate the usefulness of this strategy.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Status: | In Press |
Schools: | Computer Science & Informatics |
Date of First Compliant Deposit: | 24 June 2020 |
Date of Acceptance: | 2 June 2020 |
Last Modified: | 26 Nov 2022 13:58 |
URI: | https://orca.cardiff.ac.uk/id/eprint/132755 |