Jorge Teixeira

Porto, Porto, Portugal
3K followers, 500+ connections

About

I have a passionate customer-centric attitude and a solid vision for product innovation…

Experience

  • BrightFactory

    Porto, Portugal

  • -

    Porto, Portugal

  • -

    Porto Area, Portugal

  • -

    Porto, Portugal

  • -

    Porto Area, Portugal

  • -

    Porto Area, Portugal

  • -

    Porto Area, Portugal

  • -

    Porto Area, Portugal

  • -

    Aveiro, Portugal

  • -

    Porto Area, Portugal

  • -

  • -

  • -

    Porto Area, Portugal

Education

Publications

  • POPSTAR at RepLab 2013: Name ambiguity resolution on Twitter

    CLEF2013

    Filtering tweets relevant to a given entity is an important task for online reputation management systems. This contributes to a reliable analysis of opinions and trends regarding a given entity. In this paper we describe our participation at the Filtering Task of RepLab 2013. The goal of the competition is to classify a tweet as relevant or not relevant to a given entity. To address this task we studied a large set of features that can be generated to describe the relationship between an entity and a tweet. We explored different learning algorithms as well as different types of features: text, keyword similarity scores between entity metadata and tweets, the Freebase entity graph and Wikipedia. The test set of the competition comprises more than 90000 tweets about 61 entities in four distinct categories: automotive, banking, universities and music. Results show that our approach is able to achieve a Reliability of 0.72 and a Sensitivity of 0.45 on the test set, corresponding to an F-measure of 0.48 and an Accuracy of 0.908.

  • Tokenizing Micro-Blogging Messages using a Text Classification Approach

    Proceedings of the 4th Workshop on Analytics for Noisy Unstructured Text Data AND'10

    The automatic processing of microblogging messages may be problematic, even in the case of very elementary operations such as tokenization. The problems arise from the use of non-standard language, including media-specific words (e.g. “2day”, “gr8”, “tl;dr”, “loool”), emoticons (e.g. “(ò_ó)”, “(=ˆ-ˆ=)”), non-standard letter casing (e.g. “dr. Fred”) and unusual punctuation (e.g. “.... ..”, “!??!!!?”, “„,”). Additionally, spelling errors are abundant (e.g. “I;m”), and we can frequently find more than one language (with different tokenization requirements) in the same short message. To be efficient in such an environment, manually-developed rule-based tokenizer systems have to deal with many conditions and exceptions, which makes them difficult to build and maintain. We present a text classification approach for tokenizing Twitter messages, which addresses complex cases successfully and which is relatively simple to set up and maintain. For that, we created a corpus consisting of 2500 manually tokenized Twitter messages (a task that is simple for human annotators) and we trained an SVM classifier for separating tokens at certain discontinuity characters. For comparison, we created a baseline rule-based system designed specifically for dealing with typical problematic situations. Results show that we can achieve F-measures of 96% with the classification-based approach, much above the performance obtained by the baseline rule-based tokenizer (85%). Also, subsequent analysis allowed us to identify typical tokenization errors, which we show can be partially solved by adding some additional descriptive examples to the training corpus and re-training the classifier.

  • Complete list of publications available at Google Scholar

    https://scholar.google.com/citations?user=EF9Otn0AAAAJ&hl=en

Projects

  • LeanBigData (FP7)

    -

    LeanBigData will deliver a Big Data platform that is ultra-efficient, improving today’s best effort systems by at least one order of magnitude in efficiency, reducing the amount of resources required to process a set of data or allowing us to process more data with the same amount of resources as today.

  • StreamLine (H2020)

    -

    STREAMLINE will address the competitive advantage needs of European online media businesses (EOMB) by delivering fast reactive analytics suitable for solving a wide array of problems, including customer retention, personalized recommendation, and more broadly targeted services. STREAMLINE will develop cross-sectorial analytics drawing on multi-source data originating from online media consumption, online games, telecommunications services, and multilingual web content. STREAMLINE partners face big and fast data challenges. They serve over 100 million users, offer services that produce billions of events, yielding over 10TB of data daily, and possess over a PB of data at rest. Their business use-cases are representative of EOMB and cannot be handled efficiently and effectively by state-of-the-art technologies, as a consequence of system and human latencies.

  • Máquina do Tempo

    -

    "Máquina do Tempo" (time machine) is a dynamic web tool that allows you to interactively navigate through the last 25 years of Portuguese news until today. Networks of co-occurrences of public personalities in the news are the starting point for this journey. Additional information such as jobs, roles and citations is also available for more than 100 thousand personalities. All information is automatically extracted using Natural Language Processing and Machine Learning techniques.

  • International Conference: New Job Opportunities in Translation and Interpreting - Challenges for University Programmes and Language Services Providers

    -

    Member of the Organizing Committee.

    An analysis and debate regarding advances in linguistic technology and their impact on language service providers. How do machine translation, vast amounts of available data, and a tightly connected society affect the work of professional linguists, translators and other inter-language workers, and how should they prepare for the current transformation?

  • Grande Área

    -

    Global stats on the World Cup. Real-time comparison between teams on the field. Search for teams or players and compare their performance on three different levels: efficiency, discipline and experience.

  • International Conference: Language and the Law - Bridging the Gaps

    -

    Member of the Organizing Committee.

    Language and the Law – Bridging the Gaps is the first International Conference to be jointly sponsored by ALIDI (the newly formed Association for Language and Law for Speakers of Portuguese) and the IAFL (the International Association of Forensic Linguists).

  • Um País Como Nós

    -

    Um País Como Nós ("A Country Like Us") is an interactive tool that relates each of us to the "numbers" in the statistics for our own municipality and for the country.

  • REACTION - Retrieval, Extraction and Aggregation Computing Technology for Integrating and Organizing News

    -

    REACTION (funded by FCT, UT Austin - Portugal Program) is an initiative for developing a computational journalism platform (mostly) for Portuguese.
    The project is developing information extraction, social media mining and information visualisation technologies for assisting journalists in the production of news articles.

    Role: "Web Community Sensing" work-package leader.

  • Twitteuro

    -

    A website that reflects international Twitter activity related to the Euro 2012 competition.
    It shows what teams are buzzing with interest, which players are the most popular, which game generates the most comments, and how people react to the events during the games.

  • CLUP Autumn School

    -

    Creation of a workshop on Forensic Linguistics based on Twitter messages.

  • International Conference: 3rd European Conference of the International Association of Forensic Linguists

    -

    Member of the Organizing Committee.

    3rd European Conference of the International Association of Forensic Linguists on the theme of Forensic Linguistics: Bridging the Gap(s) between Language and the Law

  • Twitómetro

    -

    A website depicting user interest in and opinion on the candidates in the upcoming elections.
    The data came from Twitter posts by Portuguese users.


Honors and awards

  • Best Teacher Award for PGBIA - Business Intelligence and Analytics Postgraduate Programme

    Porto Business School

  • Time Machine: Entity-Centric Search and Visualization of News Archives

    Best Demo Award ECIR 2016

    "We present a dynamic web tool that allows interactive search and visualization of large news archives using an entity-centric approach. Users are able to search entities using keyword phrases expressing news stories or events and the system retrieves the most relevant entities to the user query based on automatically extracted and indexed entity profiles. From the computational journalism perspective, TimeMachine allows users to explore media content through time using automatic identification of entity names, jobs, quotations and relations between entities from co-occurrences networks extracted from the news articles. TimeMachine demo is available at http://maquinadotempo.sapo.pt/."

    Reference: Pedro Saleiro, Jorge Teixeira, Carlos Soares, Eugénio Oliveira, TimeMachine: Entity-Centric Search and Visualization of News Archives, in Advances in Information Retrieval: 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20-23, 2016, pp. 845-848, Springer International Publishing, 2016

  • "A Bootstrapping Approach for Training a NER with Conditional Random Fields"

    Nominee for Best Paper Award

    Jorge Teixeira, Luís Sarmento, Eugénio Oliveira. (2011) “A Bootstrapping Approach for Training a NER with Conditional Random Fields” Progress in Artificial Intelligence (LNAI 7026), 15th Portuguese Conference on Artificial Intelligence, EPIA 2011, Lisbon, Portugal, October 10-13

Languages

  • Portuguese

    Native or bilingual level

  • English

    Advanced level

  • Spanish

    Advanced level

  • Italian

    Basic to intermediate level

Organizations

  • New Job Opportunities in Translation and Interpreting - Challenges for University Programmes and Language Services Providers

    Member of the Organizing Committee

    - Present
  • 3rd European Conference of the International Association of Forensic Linguists

    Member of the Organising Committee

    -
