Using convolutional neural networks to analyse discussion topics over time in an online forum for breast cancer patients

Abstract

Many information extraction applications depend on being able to identify the topics of discussions in online health communities (OHCs), but doing so is challenging because OHC topics are often varied and domain-specific. In this study, we offer a multi-class schema, an annotated dataset, and supervised classifiers based on the convolutional neural network (CNN) and other models for categorizing discussion topics. We apply the CNN classifier to the most well-known online breast cancer community to show topic distributions and topic dynamics over the course of members' engagement. Our experiments show that the CNN outperforms the other classifiers at topic classification and that its output reveals distinct topic patterns and trajectories.



1. Introduction

The use of the Internet in healthcare has changed how patients access and share health-related information, opening up new eHealth perspectives [1]. Traditionally, healthcare professionals have been the primary source of information for individuals with life-threatening diseases. Clinicians, however, frequently concentrate on the clinical impact of the condition and may overlook the impact of the illness on a patient's emotional well-being and everyday life. Support groups, and more recently online health communities (OHCs), can provide an additional source of support for patients.

Particularly popular with patients, public online health communities like the Breast Cancer Forum, the CSN network, and Facebook groups have generated an unprecedented amount of user-generated content that could be an important source for research on OHCs.

However, making sense of the vast volume of content written and read by members of online health communities poses numerous difficulties. Some concern the accuracy of the information and how community members use it to make decisions about daily activities and disease management. Identifying discussion topics is a fundamental content-related task that is crucial to downstream content analysis [10]. Prior studies have argued that topic, together with sentiment, are the two basic building blocks of OHC content. In this study, we specifically examine the prevalence and dynamics of discussion topics in a well-known online breast cancer forum. The task is difficult because the topics discussed in such OHCs can differ both from those in general-purpose communities such as Facebook and from those in other kinds of biomedical text such as clinical notes. Topic classification has long been a major problem in text mining in general. Although there have been a few studies on automated topic classification for online health communities, to the best of our knowledge no study has addressed both multi-label classification and the temporal dynamics of member engagement.

The objectives of this study are to (1) contribute an annotation schema for topic classification, (2) annotate a dataset of sentences and posts in accordance with the coding schema, (3) test different supervised classification tools, such as convolutional neural networks, support vector machines, and labeled latent Dirichlet allocation (LDA), to automate the annotation process, and (4) examine the prevalence and dynamics of different discussion topics. The specific research questions we pose are as follows:

1. Which supervised learning method is most useful for categorizing discussion topics in an online health community?

2. Which topics are discussed most frequently in the breast cancer forum?

3. Do patients with different cancer stages have different topic foci?

4. As individuals stay active in the community for a longer period of time, how does the distribution of topics change?

1.1. Related work

Earlier work noted that in an online breast cancer community, topics pertaining to fundamental tumor classifications or definitions and to diagnosis were discussed most frequently, showing that in the early years, Internet support was primarily a complementary source of information. According to more recent studies, a variety of topics such as relationship and family concerns have become prominent in online peer discussions, although disease-specific topics such as therapy, diagnosis, and interpretation of lab test results are still the most common. Specific discussion topics were also noted. For instance, Meier and colleagues found through content analysis that the most popular topics in 10 cancer mailing lists related to treatment information and how to interact with medical professionals.

2. Methods

Data source and data processing

The IRB office at Columbia University approved this study. We used the publicly accessible discussion board of the breastcancer.org community. The content of the complete discussion board was collected in January 2015. The discussion board consists of different forums, each with threads and posts. The dataset includes a total of 3,283,016 posts from 121,474 threads, written by 58,177 members. We carried out the following pre-processing steps.

Along with the author and the creation date, meta-data about the forum and the thread were also stored for each post. Each post's content underwent pre-processing to exclude all non-textual material.

In addition to the post text, we also collected post signatures, which contain patients' self-reported disease details (see Fig. 1 for an example). These include the members' histories of diagnoses and treatments. Note that not all members provide this information. We use only the cancer stage as a disease variable in our study. In total, we obtained stage data for 7211 members.

Application to the entire community for cross-sectional and longitudinal analysis

We ran the best-performing classifier on the sentences of the complete unannotated dataset. We assigned a topic label to a post when that topic was assigned to more than one-tenth of the sentences in the post. Based on the aggregated post-level topic labels, we can therefore determine (1) which topics are the most common overall in the community and (2) whether members with different cancer stages differ in the topics they discuss.

Since the cancer stage is the profile information that can be obtained most completely from member signatures, we did not analyze any other characteristics in this study. For each analysis, we consider one specific factor: whether the post initiates a discussion or replies to earlier posts. According to prior research, participants initiate conversations to request support, offer support, and provide feedback.
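The post-level labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not the study's code: the function names `post_topics` and `topic_prevalence` are hypothetical, and the sketch assumes each sentence has already received exactly one predicted topic label from the classifier.

```python
from collections import Counter


def post_topics(sentence_labels, threshold=0.1):
    """Return the topics attached to more than `threshold` of a post's sentences.

    `sentence_labels` is a list with one predicted topic per sentence
    (an assumption made for this sketch).
    """
    if not sentence_labels:
        return set()
    counts = Counter(sentence_labels)
    total = len(sentence_labels)
    return {topic for topic, count in counts.items() if count / total > threshold}


def topic_prevalence(posts):
    """Return, for each topic, the fraction of posts that carry that topic.

    `posts` is a list of per-post sentence-label lists.
    """
    tally = Counter()
    for sentence_labels in posts:
        tally.update(post_topics(sentence_labels))
    return {topic: count / len(posts) for topic, count in tally.items()}


# A post with 20 sentences: 15 Treatment, 4 Miscellaneous, 1 Tests.
example_post = ["Treatment"] * 15 + ["Miscellaneous"] * 4 + ["Tests"]
example_labels = post_topics(example_post)
example_prevalence = topic_prevalence([example_post, ["Treatment"]])
```

In this example the single Tests sentence makes up only 5% of the post, so the post is labeled with Treatment and Miscellaneous only. Grouping the same tally by a member attribute such as cancer stage would give the per-stage comparison.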

With topic labels available for each post in the dataset, we also carried out the following longitudinal analysis to take timestamps into account. The main goal was to assess whether community involvement has an effect on discussion topics. To do so, we tracked how the distribution of post topics changed over time relative to each author's registration date. Each data point therefore represents the average frequency of a topic across all posts within a certain time slice (e.g., all posts published by their authors 3 weeks after joining the community). Three measures of time progression (post, day, and week) are used as the x-axis to display both short-term and long-term changes.
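As a rough sketch of this time-slice computation, the following Python function buckets posts by the number of whole weeks between the author's registration and the post, then reports the share of posts in each bucket that carry a given topic. The function name `topic_frequency_by_week` and the tuple layout of the input are assumptions made for illustration, and the sketch assumes no post predates its author's registration.

```python
from collections import defaultdict
from datetime import date


def topic_frequency_by_week(posts, topic):
    """Return {week_since_registration: share of posts carrying `topic`}.

    `posts` is an iterable of (registration_date, post_date, topics) tuples,
    where `topics` is the set of post-level topic labels (hypothetical layout).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for registered, posted, topics in posts:
        week = (posted - registered).days // 7  # whole weeks since joining
        totals[week] += 1
        if topic in topics:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


sample_posts = [
    (date(2014, 1, 1), date(2014, 1, 3), {"Treatment"}),      # week 0
    (date(2014, 1, 1), date(2014, 1, 5), {"Miscellaneous"}),  # week 0
    (date(2014, 2, 1), date(2014, 2, 10), {"Treatment"}),     # week 1
]
treatment_by_week = topic_frequency_by_week(sample_posts, "Treatment")
```

Replacing the week computation with the day difference, or with the rank of the post in its author's posting history, would give the other two time measures.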

3. Results

The manually annotated dataset provides the distribution of manual annotations and sample sentences for each topic. The most frequent topics in our annotated dataset are Treatment and Miscellaneous, whereas the least frequent are Alternative Medicine and Tests. The large number of Miscellaneous sentences is explained by the greetings, blessings, and signatures that typically appear at the beginning and end of posts, all of which are categorized as Miscellaneous in our coding.
