Ito, Tomoki, Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Sakaji, Hiroki and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881
2021.
Learning company embeddings from annual reports for fine-grained industry characterization.
Presented at: FinNLP-2020,
Kyoto, Japan,
11-13 July 2020.
Published in: Chen, C. -C., Huang, H. -H., Takamura, H. and Chen, H. -H. eds.
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing.
pp. 27-33.
PDF - Published Version. Available under License Creative Commons Attribution.
Abstract
Organizing companies by industry segment (e.g. artificial intelligence, healthcare or fintech) is useful for analyzing stock market performance and for designing theme-based investment funds, among others. Current practice is to manually assign companies to sectors or industries from a small predefined list, which has two key limitations. First, due to the manual effort involved, this strategy is only feasible for relatively mainstream industry segments, and thus cannot easily be used for niche or emerging topics. Second, the use of hard label assignments ignores the fact that different companies will be more or less exposed to a particular segment. To address these limitations, we propose to learn vector representations of companies based on their annual reports. The key challenge is to distill the relevant information from these reports for characterizing their industries, since annual reports also contain a lot of information which is not relevant for our purpose. To this end, we introduce a multi-task learning strategy, which is based on fine-tuning the BERT language model on (i) existing sector labels and (ii) stock market performance. Experiments in both English and Japanese demonstrate the usefulness of this strategy.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| ISBN: | 9781713828273 |
| Date of First Compliant Deposit: | 24 June 2020 |
| Date of Acceptance: | 2 June 2020 |
| Last Modified: | 20 Nov 2025 10:20 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/132755 |