Convolutional neural networks are used to analyse discussion topics over time in an online forum for breast cancer patients
Many information extraction applications depend on being able to identify the
themes of discussions in online health communities (OHCs), but doing so can be
challenging because OHC topics are frequently varied and domain-specific. In
this study, we offer a multi-class schema, an annotated dataset, and supervised
classifiers based on the convolutional neural network (CNN) and other models for
categorizing discussion topics. We then apply the CNN classifier to the most
well-known online breast cancer community to show topic distributions and topic
dynamics over the course of members' engagement. Our experiments indicate that
the CNN performs better than the other classifiers at categorizing topics, and
the analysis identifies various topic patterns and trajectories.
The use of the Internet in healthcare has changed how patients access and share
health-related information, opening up new eHealth perspectives [1].
Traditionally, healthcare professionals have been the primary source of
information for individuals with life-threatening diseases. However, clinicians
frequently concentrate on the clinical impact of the condition and may overlook
the effect of the illness on a patient's emotional well-being and everyday life.
Support groups, and more recently online health communities (OHCs), can provide
an additional source of support for patients.
Public online health communities such as the Breast Cancer Forum, the CSN
network, and Facebook groups are particularly popular with patients and have
generated an unprecedented amount of user-generated content that could be an
important source for research on OHCs.
However, making sense of the vast volume of information written and read by
members of online health communities poses numerous difficulties. Some concern
the accuracy of the information and how community members use it to make
decisions about their everyday activities and disease management. Identifying
discussion topics is a fundamental content-related task that is crucial to
downstream content analysis [10]. Prior studies claimed that topic and sentiment
are the two basic building blocks of OHC content. In this study, we specifically
look into the prevalence and dynamics of discussion topics in a well-known
online breast cancer forum. The task is difficult because the topics discussed
in such OHCs can differ both from those in general-purpose communities like
Facebook and from those in other kinds of biomedical text like clinical notes.
Topic classification has long been a major problem in text mining in general.
Although there have been a few studies on automated topic classification for
online health communities, to the best of our knowledge, none has addressed the
difficulty of multi-label classification or the temporal dynamics of member
engagement.
The objectives of this study are to contribute an annotation schema for topic
classification; to annotate a dataset of sentences and posts in accordance with
the coding schema; to test different supervised classification tools, such as
convolutional neural networks, support vector machines, and labeled latent
Dirichlet allocation, for automating the annotation process; and to examine the
prevalence and dynamics of different discussion topics. The specific research
questions we pose are as follows:
1. Which supervised learning method is most useful for categorizing discussion
topics in an online health community?
2. What topics are discussed the most frequently in the breast cancer forum?
3. Do patients with different cancer stages have varied topic foci?
4. As individuals stay active in the community for a longer period of time, how
does the distribution of topics change?
Early work noted that, in an online breast cancer community, topics pertaining
to basic tumor classifications or definitions and to diagnosis were the most
frequently discussed, showing that in the early years Internet support was
primarily a complementary source of information. According to more recent
studies, a variety of themes such as relationship and family concerns have
become prominent in online peer discussions, although disease-specific topics
like treatment, diagnosis, and interpretation of lab test results are still the
most common. Some studies also identified specific discussion topics. For
instance, Meier and colleagues found through content analysis that the most
popular topics in 10 cancer mailing lists related to treatment information and
how to communicate with medical professionals.
Data source and data processing
The IRB office at Columbia University approved our work. We used the publicly
accessible discussion board of the breastcancer.org community. The content of
the complete discussion board was collected in January 2015. The discussion
board is organized into forums, each containing threads and posts. The dataset
includes a total of 3,283,016 posts from 121,474 threads written by 58,177
members. The following pre-processing steps were carried out.
For each post, we stored the author and the creation date along with metadata
about its forum and thread. The content of each post was pre-processed to remove
all non-textual material.
In addition to the post text, we also gathered post signatures, which contain
patients' self-reported disease details (see Fig. 1 for an example), including
their histories of diagnoses and treatments. Note that this information is not
available for all members. We use only the cancer stage as a disease variable in
our study. We successfully gathered stage data for 7,211 members in total.
Application to the entire community to support cross-sectional and longitudinal
analysis
We ran the best-performing classifier on the sentences of the complete
unannotated dataset. We then assigned a topic label to a post whenever that
topic was assigned to more than one-tenth of the sentences in the post. Based on
the aggregated post-level topic labels, we can determine (1) which topics are
the most common overall in the community and (2) whether topics differ among
members with different cancer stages. Because cancer stage is the profile
attribute most completely available from member signatures, we did not analyze
any other characteristics in this study. In each analysis we also consider one
additional factor: whether the post starts a discussion or replies to earlier
posts. According to prior research, participants start discussions to request
support, offer support, and provide feedback.
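The post-level labeling rule described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the study's
implementation: it assumes the classifier outputs exactly one topic label per
sentence, and the function names post_topics and topic_frequencies are
hypothetical.

```python
from collections import Counter


def post_topics(sentence_labels, threshold=0.1):
    """Return the topics assigned to more than `threshold` of a post's sentences.

    Assumption: `sentence_labels` holds one predicted topic per sentence.
    """
    if not sentence_labels:
        return set()
    counts = Counter(sentence_labels)
    n = len(sentence_labels)
    # Strictly "more than one-tenth", so a topic at exactly 10% is not kept.
    return {topic for topic, c in counts.items() if c / n > threshold}


def topic_frequencies(posts):
    """Fraction of posts that carry each post-level topic label."""
    if not posts:
        return {}
    totals = Counter()
    for sentence_labels in posts:
        totals.update(post_topics(sentence_labels))
    return {topic: c / len(posts) for topic, c in totals.items()}
```

Calling topic_frequencies on all posts gives the overall topic ranking; calling
it separately on the posts of each cancer-stage group, or on thread-starting
posts versus replies, gives the cross-sectional comparisons.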
With topic labels for each post in the dataset, we also carried out the
following longitudinal analysis to take timestamps into account. Its main goal
was to assess whether community involvement has an effect on discussion topics.
To this end, we tracked how the distribution of post topics changed over time
relative to each user's registration date. Each data point therefore represents
the average frequency of a topic across all posts within a certain time slice
(e.g., all posts published by their authors 3 weeks after joining the
community). Three units of time progression (post, day, and week) are used as
the x-axis to display both short-term and long-term changes.
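A minimal sketch of the time-slice computation follows, assuming each post is
available as a (join_date, post_date, topics) tuple where topics is the set of
post-level labels. The tuple layout and the names slice_index and
topic_trajectory are illustrative assumptions, and only the day and week units
are covered here.

```python
from collections import defaultdict
from datetime import date


def slice_index(post_date, join_date, unit="week"):
    """Whole days or whole weeks elapsed between registration and the post."""
    days = (post_date - join_date).days
    if unit == "day":
        return days
    if unit == "week":
        return days // 7
    raise ValueError("unit must be 'day' or 'week'")


def topic_trajectory(posts, topic, unit="week"):
    """Average frequency of `topic` among the posts in each time slice.

    Assumption: `posts` is an iterable of (join_date, post_date, topics) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for join_date, post_date, topics in posts:
        s = slice_index(post_date, join_date, unit)
        totals[s] += 1
        hits[s] += topic in topics  # True counts as 1
    return {s: hits[s] / totals[s] for s in sorted(totals)}
```

The post unit could be handled in the same way by replacing slice_index with the
ordinal position of each post in its author's posting history.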
The manually annotated dataset includes distributions of manual annotations and
sample sentences for the various topics. The most frequent topics in our
annotated dataset are Treatment and Miscellaneous, whereas the least frequent
are Alternative Medicine and Tests. The large number of Miscellaneous sentences
is explained by the greetings, blessings, and signatures that typically appear
at the beginning and end of posts, all of which are categorized as Miscellaneous
in our coding.