<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">ETM</journal-id>
<journal-title-group>
<journal-title>Experimental and Therapeutic Medicine</journal-title>
</journal-title-group>
<issn pub-type="ppub">1792-0981</issn>
<issn pub-type="epub">1792-1015</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/etm.2016.3285</article-id>
<article-id pub-id-type="publisher-id">ETM-0-0-3285</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Study of TCM clinical records based on LSA and LDA SHTDT model</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>LIN</surname><given-names>FAN</given-names></name>
<xref rid="af1-etm-0-0-3285" ref-type="aff"/></contrib>
<contrib contrib-type="author"><name><surname>ZHANG</surname><given-names>ZHIHONG</given-names></name>
<xref rid="af1-etm-0-0-3285" ref-type="aff"/>
<xref rid="c1-etm-0-0-3285" ref-type="corresp"/></contrib>
<contrib contrib-type="author"><name><surname>LIN</surname><given-names>SHU-FU</given-names></name>
<xref rid="af1-etm-0-0-3285" ref-type="aff"/></contrib>
<contrib contrib-type="author"><name><surname>ZENG</surname><given-names>JIA-SONG</given-names></name>
<xref rid="af1-etm-0-0-3285" ref-type="aff"/></contrib>
<contrib contrib-type="author"><name><surname>GAN</surname><given-names>YAN-FANG</given-names></name>
<xref rid="af1-etm-0-0-3285" ref-type="aff"/></contrib>
</contrib-group>
<aff id="af1-etm-0-0-3285">Software School of Xiamen University, Xiamen, Fujian 361009, P.R. China</aff>
<author-notes>
<corresp id="c1-etm-0-0-3285"><italic>Correspondence to</italic>: Zhihong Zhang, Software School of Xiamen University, General office 308B, Xiamen, Fujian 361009, P.R. China, E-mail: <email>zhihong@xmu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>07</month>
<year>2016</year></pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>04</month>
<year>2016</year></pub-date>
<volume>12</volume>
<issue>1</issue>
<fpage>288</fpage>
<lpage>296</lpage>
<history>
<date date-type="received"><day>20</day><month>10</month><year>2015</year></date>
<date date-type="accepted"><day>20</day><month>04</month><year>2016</year></date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; Lin et al.</copyright-statement>
<copyright-year>2016</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs License</ext-link>, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.</license-p></license>
</permissions>
<abstract>
<p>Descriptions of syndromes and symptoms in traditional Chinese medicine are extremely complicated, and diagnosing a patient&#x0027;s syndrome more efficiently is a primary aim of clinical health care workers. In the present study, two models addressing this issue are presented. The first is the latent semantic analysis (LSA)-based semantic classification model, which is employed when the classes and the words used to describe them have been confirmed. The second is the symptom-herb-therapies-diagnosis topic (SHTDT) model, which is employed when the classes have not been confirmed or described. The experimental results showed that this method was successful, and that symptoms can be diagnosed to a certain extent. The results also indicated that the topic features reflected patient characteristics and that the topic structure obtained was clinically significant. When provided with a patient&#x0027;s symptoms, the model can be used to predict the theme, diagnose the disease and indicate appropriate drugs and treatments. However, the SHTDT model predictions were not completely accurate, because this prediction is equivalent to multi-label prediction, whereby the drugs, treatment and diagnosis are considered as labels. In conclusion, the diagnosis, and the drug and treatment administered, depend on human factors.</p>
</abstract>
<kwd-group>
<kwd>latent semantic analysis</kwd>
<kwd>traditional Chinese medicine diagnosis</kwd>
<kwd>latent Dirichlet allocation model</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Treatment based on syndrome differentiation (TBOSD) is the defining feature and essence of traditional Chinese medicine (TCM) (<xref rid="b1-etm-0-0-3285" ref-type="bibr">1</xref>,<xref rid="b2-etm-0-0-3285" ref-type="bibr">2</xref>) and is the principle that should be adhered to when making a diagnosis and administering treatment. Long-term medical practice has shown that TBOSD has its own specificity, superiority and necessity (<xref rid="b1-etm-0-0-3285" ref-type="bibr">1</xref>). Irrespective of the type of disease, TBOSD constitutes a flexible method that can be adapted to the individual patient&#x0027;s specific condition, which greatly enriches the capability of TCM to handle diseases (<xref rid="b2-etm-0-0-3285" ref-type="bibr">2</xref>). Syndrome differentiation in TCM is the product of long-term clinical practice. There are many types of syndrome differentiation, including visceral syndrome differentiation, etiological analysis and syndrome differentiation of the triple energizer (<xref rid="b3-etm-0-0-3285" ref-type="bibr">3</xref>).</p>
<p>Data mining is a method of extracting potentially useful information from a database. This process uses computer programs to search the database automatically and identify patterns or rules (<xref rid="b3-etm-0-0-3285" ref-type="bibr">3</xref>). With the aid of computers, networks can be used to describe associations between individuals, kinships and network connections. Increasingly, investigators use networks in the medical field to study the connectivity of brain function (<xref rid="b4-etm-0-0-3285" ref-type="bibr">4</xref>), the propagation of diseases (<xref rid="b5-etm-0-0-3285" ref-type="bibr">5</xref>), drug efficacy and drug targets (<xref rid="b6-etm-0-0-3285" ref-type="bibr">6</xref>), gene regulatory networks (<xref rid="b7-etm-0-0-3285" ref-type="bibr">7</xref>) and protein interactions (<xref rid="b8-etm-0-0-3285" ref-type="bibr">8</xref>).</p>
<p>The application of quantitative models and data mining is developing rapidly. Decision trees, k-nearest neighbors (KNN) and Bayesian classifiers are classification methods (<xref rid="b9-etm-0-0-3285" ref-type="bibr">9</xref>&#x2013;<xref rid="b12-etm-0-0-3285" ref-type="bibr">12</xref>), each with its own approach, that can be successfully employed in certain situations. However, TCM is a traditional medicine that captures the variations of the disease based on the concept of holism. These traditional approaches do not reveal the meaning of the four diagnostic methods because the associations within the information are complex. The content of TCM is strongly correlated, and is therefore processed more adequately in semantic space (<xref rid="b13-etm-0-0-3285" ref-type="bibr">13</xref>&#x2013;<xref rid="b15-etm-0-0-3285" ref-type="bibr">15</xref>). In the present study, two models are proposed, and the choice between them depends on whether the classes have been confirmed.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Latent semantic analysis (LSA) based semantic classification model</title>
<p>The LSA model is used when the classes and the words that describe them have been confirmed. The LSA-based semantic classification model of syndrome differentiation depends on the feature of TCM whereby each syndrome and organ has its own collection of major clinical manifestations (<xref rid="tI-etm-0-0-3285" ref-type="table">Table I</xref>). This model includes three major steps: i) Decomposition of the matrix of syndromes/organs and clinical manifestations using singular value decomposition (SVD), ii) construction of the semantic space of syndromes/organs and clinical manifestations, and iii) semantic matching of syndromes and organs according to their correlative degrees, sorted in descending order.</p>
<p>If a syndrome has the highest correlative degree with a particular organ, the syndrome is classified into that organ.</p>
</sec>
<sec>
<title>Symptom-herb-therapies-diagnosis topic (SHTDT)</title>
<p>SHTDT is used when the classes and the relevant words have yet to be confirmed. The core idea of the SHTDT model proposed in the present study is the assumption that a patient has multiple combinations of symptoms together with the corresponding TCM drugs, diagnosis and treatment. The first step combines the symptoms and the TCM drugs to extract the symptom themes; the treatment and diagnosis are then considered as descriptions of the symptoms, and their multinomial distributions over the themes are extracted. The SHTDT model captures the drug therapy selected for a specific symptom, the treatment selected for a combination of symptoms and the possible disease from which the patient suffers, and can predict the possible drugs, treatment and diagnosis for the patient.</p>
</sec>
<sec>
<title>LSA model</title>
<sec>
<title>Constructing model</title>
<p>As an algebraic model of information retrieval, LSA was proposed by Susan Dumais and other investigators working at Bell Telephone laboratories in 1988 (<xref rid="b16-etm-0-0-3285" ref-type="bibr">16</xref>&#x2013;<xref rid="b18-etm-0-0-3285" ref-type="bibr">18</xref>). It is a computational theory and method that has been used for knowledge acquisition and representation. After 20 years of development, LSA, whose advantages include strong computability and a reduced need for human involvement, overcomes the shortcomings of the vector space model (VSM) analytical method. In the present study, an LSA-based semantic classification model of syndrome differentiation was used (<xref rid="f1-etm-0-0-3285" ref-type="fig">Fig. 1</xref>).</p>
<p>The latent semantic space constructed by SVD is the core of the model. As the basic semantic meaning of syndromes and organs is described by their clinical manifestation collections, syndromes and organs are labeled with these collections. Classification then takes place in the latent semantic space: the correlative degrees between the space vectors of syndromes and organs are computed and sorted, and the organ with the highest correlative degree is selected as the class to which the syndrome belongs.</p>
<p>This model is relatively easy to extend and can be used for any classification task, provided that the classes and the objects to be classified share the same description collection, such as the 5-class classification (the 5 elements) in the present study.</p>
</sec>
<sec>
<title>SVD</title>
<p>The definition of SVD is as follows:</p>
<p><bold>i)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml1" display="block"><mml:mrow><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:mi>U</mml:mi><mml:mi>&#x03A3;</mml:mi><mml:mrow><mml:msup><mml:mi>V</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g00.jpg"/>
</alternatives>
</disp-formula>
<p>where U is an m&#x00D7;m orthogonal matrix whose column vectors are the left singular vectors of matrix A, V is an n&#x00D7;n orthogonal matrix whose column vectors are the right singular vectors of matrix A, and &#x03A3; is an m&#x00D7;n diagonal matrix whose diagonal elements, &#x03C3;<sub>1</sub>&#x2265;&#x03C3;<sub>2</sub>&#x2265;&#x2026;&#x2265;&#x03C3;<sub><italic>r</italic></sub> [<italic>r</italic> &#x2264; min (m, n)], are the singular values of matrix A. Such a decomposition exists for any matrix. In addition, the rank of matrix A equals the number of non-zero singular values. The Frobenius norm of matrix A is given by:</p>
<p><bold>ii)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml2" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x2016;</mml:mo><mml:mi>A</mml:mi><mml:mo>&#x2016;</mml:mo></mml:mrow><mml:mi>F</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x2016;</mml:mo><mml:mrow><mml:mi>U</mml:mi><mml:mi>&#x03A3;</mml:mi><mml:msup><mml:mi>V</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow><mml:mo>&#x2016;</mml:mo></mml:mrow><mml:mi>F</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x2016;</mml:mo><mml:mi>&#x03A3;</mml:mi><mml:mo>&#x2016;</mml:mo></mml:mrow><mml:mi>F</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:munderover><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g02.jpg"/>
</alternatives>
</disp-formula>
<p>where <italic>r<sub>A</sub></italic> denotes the rank of matrix A; the first <italic>r<sub>A</sub></italic> columns of matrix U form a basis for the column space of matrix A, and the first <italic>r<sub>A</sub></italic> columns of matrix V form a basis for the row space of matrix A. To obtain the rank-<italic>k</italic> approximation of matrix A, <italic>A<sub>k</sub></italic> (<italic>k</italic> &#x2264; <italic>r<sub>A</sub></italic>), all singular values other than the k highest ones are set to zero. As is shown in the theory of SVD by Brain (<xref rid="b19-etm-0-0-3285" ref-type="bibr">19</xref>), <italic>A<sub>k</sub></italic> minimizes the distance to matrix A among all matrices of rank at most k. This is indicated as follows:</p>
<p><bold>iii)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml3" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x2016;</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2016;</mml:mo></mml:mrow><mml:mi>F</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo>min</mml:mo><mml:mrow><mml:mtext mathvariant="italic">rank</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2264;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mo>&#x2016;</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>X</mml:mi></mml:mrow><mml:mo>&#x2016;</mml:mo></mml:mrow><mml:mi>F</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>&#x002B;</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g03.jpg"/>
</alternatives>
</disp-formula>
<p>where <italic>A<sub>k</sub></italic> = <italic>U<sub>k</sub></italic>&#x03A3;<sub><italic>k</italic></sub><italic>V<sub>k</sub><sup>T</sup></italic>; <italic>U<sub>k</sub></italic> is an m&#x00D7;k matrix whose columns are the first k columns of matrix U, <italic>V<sub>k</sub></italic> is an n&#x00D7;k matrix whose columns are the first k columns of matrix V, and &#x03A3;<sub><italic>k</italic></sub> is a k&#x00D7;k diagonal matrix whose diagonal elements are the k highest singular values of matrix A. When k is known, the optimal approximation <italic>A<sub>k</sub></italic> can be identified using SVD (<xref rid="b19-etm-0-0-3285" ref-type="bibr">19</xref>).</p>
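<p>The rank-k truncation and the error identity of equation (iii) can be checked numerically. The following is a minimal sketch that assumes NumPy is available; the function name, the random test matrix and the value of k are hypothetical and are not taken from the present study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def truncated_svd(A, k):
    """Return (U_k, s_k, Vt_k, A_k) for the rank-k approximation of A."""
    # Singular values in s are returned in descending order
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    return U_k, s_k, Vt_k, A_k

rng = np.random.default_rng(0)
A = rng.random((6, 4))   # hypothetical 6 x 4 matrix
k = 2
U_k, s_k, Vt_k, A_k = truncated_svd(A, k)

s_all = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, "fro")          # left-hand side of equation (iii)
expected = np.sqrt(np.sum(s_all[k:] ** 2))    # root of the discarded squared singular values
```
]]></preformat>
<p>In this sketch, err and expected agree up to floating-point rounding, which is the content of equation (iii).</p>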
</sec>
</sec>
<sec>
<title>Matrix generation and weighting function</title>
<sec>
<title>Matrix generation</title>
<p>The major features of each organ are extracted from its samples, and each feature is ranked by the degree of its vertex. The higher the degree of a vertex, the more likely it is to be selected. As shown in <xref rid="tII-etm-0-0-3285" ref-type="table">Table II</xref>, palpitation has a high degree and is likely to be selected. The frequency matrix of syndromes and organs is then constructed based on <xref rid="tI-etm-0-0-3285" ref-type="table">Table I</xref>, as shown in <xref rid="tIII-etm-0-0-3285" ref-type="table">Table III</xref>. Subsequently, the frequency matrix is processed with the weighting function to obtain the final m&#x00D7;n matrix of syndromes and organs <italic>X</italic> = [<italic>X<sub>ij</sub></italic>]:</p>
<p><bold>iv)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml4" display="block"><mml:mrow><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>21</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mrow /></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g04.jpg"/>
</alternatives>
</disp-formula>
<p>where <italic>X<sub>ij</sub></italic> is the weight of clinical manifestation <italic>i</italic> in syndrome/organ <italic>j</italic>; the row vector <italic>X<sub>i</sub></italic> = [<italic>X<sub>i</sub></italic><sub>1</sub>, <italic>X<sub>i</sub></italic><sub>2</sub>, &#x2026;, <italic>X<sub>in</sub></italic>], <italic>i</italic> = 1, 2, &#x2026;, <italic>m</italic>, contains the weights of clinical manifestation <italic>i</italic> in each syndrome/organ and corresponds to one row of matrix <italic>X</italic>; the column vector <italic>X<sub>j</sub></italic> = [<italic>X</italic><sub>1<italic>j</italic></sub>, <italic>X</italic><sub>2<italic>j</italic></sub>, &#x2026;, <italic>X<sub>mj</sub></italic>]<sup><italic>T</italic></sup>, <italic>j</italic> = 1, 2, &#x2026;, <italic>n</italic>, is the syndrome/organ vector corresponding to one column of matrix <italic>X</italic> (<xref rid="b20-etm-0-0-3285" ref-type="bibr">20</xref>,<xref rid="b21-etm-0-0-3285" ref-type="bibr">21</xref>).</p>
</sec>
<sec>
<title>Weighting function</title>
<p>The weight in the traditional vector space is obtained using the term frequency/inverse document frequency (TF/IDF) method, computed statistically from the frequency with which clinical manifestations are marked in syndromes/organs. However, the simple structure of TF/IDF cannot effectively express the importance and distribution of clinical manifestations. It is therefore inappropriate to continue using TF/IDF in LSA-based semantic annotation. Instead, the improved computing method described previously (<xref rid="b22-etm-0-0-3285" ref-type="bibr">22</xref>) was employed to compute the weight, which was divided into a) the global weight of clinical manifestation, and b) the global weight of syndromes/organs.</p>
</sec>
<sec>
<title>Global weight of clinical manifestation</title>
<p>The global weight of clinical manifestation <italic>i</italic>, denoted <italic>T<sub>W</sub></italic>(<italic>i</italic>), is defined as:</p>
<p><bold>v)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml5" display="block"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>W</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2248;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>R</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mo>log</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>R</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g06.jpg"/>
</alternatives>
</disp-formula>
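<p>where |R| is the number of syndromes/organs and &#x03A3;<sub><italic>j</italic>=1</sub><sup>|<italic>R</italic>|</sup> <italic>p</italic>(<italic>i</italic>, <italic>j</italic>) is the total frequency of clinical manifestation <italic>T<sub>i</sub></italic> in the whole syndromes/organs collection. As the entropy term H(R|<italic>T<sub>i</sub></italic>) can be zero, 1 is added so that the weight is a positive number.</p>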
</sec>
<sec>
<title>Global weight of syndromes/organs</title>
<p>The global weight of syndrome/organ <italic>j</italic>, denoted <italic>R<sub>W</sub></italic>(<italic>j</italic>), is defined as:</p>
<p><bold>vi)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml6" display="block"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>W</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2248;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x002B;</mml:mo><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mo>log</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g07.jpg"/>
</alternatives>
</disp-formula>
<p>where |T| is the number of clinical manifestations and &#x03A3;<sub><italic>i</italic>=1</sub><sup>|<italic>T</italic>|</sup> <italic>p</italic>(<italic>i</italic>, <italic>j</italic>) is the total frequency of the clinical manifestations <italic>T<sub>i</sub></italic> in syndrome/organ <italic>R<sub>j</sub></italic>. The &#x2018;&#x002B;1&#x2019; in the formula for <italic>R<sub>W</sub></italic>(<italic>j</italic>) ensures that the weight is a positive number.</p>
</sec>
<sec>
<title>Definition of weighting function</title>
<p>The weighting function combines equations (v) and (vi) and is defined as:</p>
<p><bold>vii)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml7" display="block"><mml:mrow><mml:mi>W</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>W</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>W</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g08.jpg"/>
</alternatives>
</disp-formula>
<p>The advantage of the weighting function is that it considers the TCM organ classification in its entirety: a) Each syndrome/organ is regarded as a point in the space that uses clinical manifestation as the dimension, and b) each clinical manifestation is regarded as a point in the space that uses syndrome/organ as the dimension.</p>
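<p>The two global weights and their product can be illustrated with a short numerical sketch. It assumes NumPy, follows equation (v) as printed (one plus the base-2 entropy of the normalized frequencies) and applies the same form to the columns for the syndrome/organ weight; the latter is an assumption, as the sign printed in equation (vi) differs from that in equation (v). The frequency matrix, the function name and the final step of scaling the raw frequencies by W are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def entropy_plus_one(freq, axis):
    """Return 1 + H, where H is the base-2 entropy of freq normalised along axis."""
    freq = np.asarray(freq, dtype=float)
    totals = freq.sum(axis=axis, keepdims=True)
    # Proportions p(i, j) / sum of p(i, j); all-zero slices stay at zero
    q = np.divide(freq, totals, out=np.zeros_like(freq), where=totals > 0)
    # Use the convention 0 * log2(0) = 0
    log_q = np.log2(q, out=np.zeros_like(q), where=q > 0)
    return 1.0 - (q * log_q).sum(axis=axis)

# Hypothetical frequency matrix: rows are clinical manifestations,
# columns are syndromes/organs
freq = np.array([[2, 0, 0],
                 [1, 1, 0],
                 [1, 1, 1]])

T_W = entropy_plus_one(freq, axis=1)   # one weight per clinical manifestation i
R_W = entropy_plus_one(freq, axis=0)   # one weight per syndrome/organ j (assumed form)
W = np.outer(T_W, R_W)                 # W(i, j) as the product of the two global weights
X = freq * W                           # assumed: weights scale the raw frequencies
```
]]></preformat>
<p>A manifestation that occurs in only one syndrome/organ has zero entropy and receives the minimum weight of 1 under this form, whereas one spread evenly over all syndromes/organs receives the maximum.</p>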
</sec>
<sec>
<title>Correlation calculation</title>
<p>Correlation is calculated using a similarity formula. Commonly used similarity formulas include the inner product, the Pearson correlation coefficient, the Dice coefficient, the Jaccard coefficient and the cosine formula (<xref rid="b22-etm-0-0-3285" ref-type="bibr">22</xref>). As the information of the syndromes and organs was expressed as vectors, the cosine formula was used to calculate the degree of correlation. The syndrome and organ vectors D processed by LSA were divided into a) organ vectors, and b) syndrome vectors:</p>
<p><bold>viii)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml8" display="block"><mml:mrow><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g09.jpg"/>
</alternatives>
</disp-formula>
<p>Therefore, the degree of correlation was calculated in the k-dimensional semantic space between the organ vectors <italic>D<sub>d</sub></italic> and the syndrome vectors <italic>D<sub>q</sub></italic>. The formula used to calculate <italic>C<sub>q</sub></italic> was:</p>
<p><bold>ix)</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml9" display="block"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext mathvariant="italic">sim</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x22C5;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g10.jpg"/>
</alternatives>
</disp-formula>
<p>where <italic>sim</italic> (<italic>D<sub>q</sub></italic>, <italic>D<sub>dj</sub></italic>) is the cosine of the angle between the syndrome vector <italic>q</italic> and the organ vector <italic>d<sub>j</sub></italic>. The larger the value, the greater the degree of correlation.</p>
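<p>The projection of equation (viii) and the cosine matching of equation (ix) can be sketched as follows. This is an illustration only, assuming NumPy. It takes T<sub>k</sub> and S<sub>k</sub> to be the first k left singular vectors and singular values of X, and transposes X so that the dimensions conform; both are assumptions, as is the toy matrix whose first two columns stand for organs and whose last column stands for a syndrome.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def fold_in(X, k):
    """Map each column of X (manifestations x items) to a k-dimensional vector."""
    T, s, _ = np.linalg.svd(X, full_matrices=False)
    T_k = T[:, :k]
    S_k_inv = np.diag(1.0 / s[:k])      # requires the k largest singular values to be non-zero
    return X.T @ T_k @ S_k_inv          # one row per column of X

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(D_q, D_d):
    """Return the index of the organ vector most correlated with D_q, plus all scores."""
    scores = np.array([cosine(D_q, d) for d in D_d])
    return int(np.argmax(scores)), scores

# Hypothetical weighted matrix: 4 clinical manifestations (rows);
# columns 0 and 1 are organs, column 2 is a syndrome to be classified
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 0.2]])

D = fold_in(X, k=2)
D_d, D_q = D[:2], D[2]                  # organ vectors and the syndrome vector
label, scores = classify(D_q, D_d)
```
]]></preformat>
<p>Here the syndrome shares most of its manifestations with the first organ, so the highest cosine value selects that organ as its class.</p>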
</sec>
</sec>
<sec>
<title>SHTDT model</title>
<sec>
<title>Latent Dirichlet allocation (LDA)</title>
<p>Given a document collection, LDA expresses each document as a mixture of themes (topics); each theme is a multinomial distribution over words and is used to capture the relevant information between words (<xref rid="b23-etm-0-0-3285" ref-type="bibr">23</xref>). In LDA, the themes are shared by all the documents and are embodied by the specific vocabulary in the text (<xref rid="b24-etm-0-0-3285" ref-type="bibr">24</xref>). Therefore, an implicit theme may be considered as a probability distribution over the vocabulary, and a single document as a mixture of the implicit themes in specific proportions.</p>
<p>LDA is a probabilistic method for modeling the theme information of text (<xref rid="b25-etm-0-0-3285" ref-type="bibr">25</xref>). As shown in <xref rid="f2-etm-0-0-3285" ref-type="fig">Fig. 2</xref>, it contains three layers: words, topics and documents. (&#x03B1;, &#x03B2;) are the parameters of the document collection layer, which determine the LDA model. In the document collection, &#x03B1; describes the relative strength of the themes and &#x03B2; describes the probability distribution of each implicit theme; &#x03B8; is a document layer parameter, whose components indicate the weight of each implicit theme in the target document. (<italic>z</italic>, <italic>w</italic>) are word layer parameters: <italic>z</italic> indicates the implicit theme to which each word is assigned, and <italic>w</italic> denotes the word vector of the target document.</p>
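<p>The three layers described above correspond to the standard LDA generative process, which can be sketched as follows. The sketch assumes NumPy; the function name, the prior &#x03B1;, the theme-word matrix &#x03B2; and the document length are hypothetical values chosen for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def generate_document(alpha, beta, n_words, rng):
    """Sample one document from the LDA generative process."""
    n_topics, vocab_size = beta.shape
    theta = rng.dirichlet(alpha)                       # document layer: theme weights
    z = rng.choice(n_topics, size=n_words, p=theta)    # word layer: theme of each word
    w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])  # observed words
    return theta, z, w

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])                 # hypothetical prior over 2 themes
beta = np.array([[0.5, 0.5, 0.0, 0.0],       # theme 0 only emits words 0 and 1
                 [0.0, 0.0, 0.5, 0.5]])      # theme 1 only emits words 2 and 3
theta, z, w = generate_document(alpha, beta, n_words=10, rng=rng)
```
]]></preformat>
<p>Because the two hypothetical themes use disjoint vocabularies, each sampled word in w reveals the theme z that generated it; with real data the themes overlap and z must be inferred.</p>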
</sec>
<sec>
<title>SHTDT model</title>
<p>Within the theoretical framework of the theme model and against the background of TCM applications (<xref rid="b24-etm-0-0-3285" ref-type="bibr">24</xref>), an SHTDT model inspired by the hidden structure method was established (<xref rid="f3-etm-0-0-3285" ref-type="fig">Fig. 3</xref>). The four dark circles are the observed variables: <italic>w</italic> denotes the sampled symptoms, m the set of TCM drugs corresponding to a collection of symptoms, <italic>u</italic> the corresponding treatment under the theme, and <italic>r</italic> the corresponding diagnosis set under the theme. The open circles are the hidden variables. The outermost rectangle represents the patients, and the inner rectangles represent, for each patient, the N sampled symptoms with their corresponding themes and drugs, and the sampled treatment methods and diagnoses under the corresponding themes.</p>
</sec>
<sec>
<title>Estimating the SHTDT model parameters using the Gibbs sampling method</title>
<p>Given that the current symptom is the n-th symptom (<italic>w<sub>i</sub>=n</italic>) and given the drug vector m of the current patient, the Gibbs method was used to estimate the probability that the current symptom is assigned to the theme <italic>z<sub>i</sub></italic>=<italic>k</italic>, conditioned on the theme and drug assignments of all symptoms except <italic>w<sub>i</sub></italic>, using the sampling form shown in formula x. At the same time, the Gibbs method was used to estimate the probability that each drug of m (<italic>x<sub>i</sub></italic>=<italic>j</italic>) is assigned to the current symptom under the corresponding theme. Subsequently, given the treatment vector u and the theme assignment <italic>z<sub>i</sub></italic>=<italic>k</italic>, the Gibbs method was used to estimate the probability of each treatment (<italic>u<sub>i</sub></italic>=<italic>l</italic>) of u under the theme <italic>z<sub>i</sub></italic>=<italic>k</italic>, using the form shown in formula xi. Furthermore, given the diagnosis vector r and the theme assignment <italic>z<sub>i</sub></italic>=<italic>k</italic>, the Gibbs method was used to estimate the probability of each diagnosis (<italic>r<sub>i</sub></italic>=<italic>s</italic>) of r under the theme <italic>z<sub>i</sub></italic>=<italic>k</italic>, using the form shown in formula xii.</p>
<p><bold>x</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml10" display="block"><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x0027;</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>V</mml:mi><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g12.jpg"/>
</alternatives>
</disp-formula>
<p><bold>xi</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml11" display="block"><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g13.jpg"/>
</alternatives>
</disp-formula>
<p><bold>xii</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml12" display="block"><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x221D;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03BB;</mml:mi></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g14.jpg"/>
</alternatives>
</disp-formula>
<p>The Gibbs sampling process was iterated to obtain the estimates of &#x03C6;, &#x03B8;, &#x03B7; and &#x03B5;, as follows:</p>
<disp-formula>
<alternatives>
<mml:math id="umml13" display="block"><mml:mrow><mml:msub><mml:mi>&#x03C6;</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x0027;</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>V</mml:mi><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03B7;</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03B5;</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>k</mml:mi><mml:mo>&#x0027;</mml:mo></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:mi>K</mml:mi><mml:mi>&#x03BB;</mml:mi></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g15.jpg"/>
</alternatives>
</disp-formula>
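<p>How such smoothed estimates are computed from count matrices can be sketched in Python as follows. This is an illustration under assumptions: the function names and the toy counts are hypothetical, the &#x03C6; estimate is normalized over the V symptoms as in the first factor of formula x, and the row-wise form follows the normalization over themes used for the other factors.</p>
<preformat preformat-type="code"><![CDATA[
```python
def estimate_phi(c_vk, beta):
    """phi[n][k] = (C[n][k] + beta) / (sum over n' of C[n'][k] + V * beta)."""
    v = len(c_vk)          # number of distinct symptoms (V)
    k = len(c_vk[0])       # number of themes (K)
    col_totals = [sum(c_vk[n][t] for n in range(v)) for t in range(k)]
    return [[(c_vk[n][t] + beta) / (col_totals[t] + v * beta)
             for t in range(k)]
            for n in range(v)]


def estimate_rowwise(counts, prior):
    """out[j][k] = (C[j][k] + prior) / (sum over k' of C[j][k'] + K * prior)."""
    k = len(counts[0])
    return [[(c + prior) / (sum(row) + k * prior) for c in row]
            for row in counts]


# Toy counts (hypothetical): 3 symptoms x 2 themes, and 2 drugs x 2 themes.
symptom_theme_counts = [[2, 0], [1, 1], [0, 3]]
drug_theme_counts = [[3, 1], [0, 4]]
phi = estimate_phi(symptom_theme_counts, beta=0.1)
theta = estimate_rowwise(drug_theme_counts, prior=0.5)
```
]]></preformat>
<p>With these toy counts, each column of phi sums to 1 over the symptoms and each row of theta sums to 1 over the themes. The same row-wise function could be applied to the treatment and diagnosis counts with their own priors.</p>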
</sec>
<sec>
<title>Determining the theme number</title>
<p>Each theme vector was identified from the distribution of the theme (from the &#x03B2; matrix) in the V-dimensional word space. The similarity between theme vectors was measured by the standard cosine similarity, as shown in formula xiii.</p>
<p><bold>xiii</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml14" display="block"><mml:mrow><mml:mtext mathvariant="italic">corre</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mtext mathvariant="italic">corre</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>V</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>V</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>V</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g16.jpg"/>
</alternatives>
</disp-formula>
<p><bold>xiv</bold></p>
<disp-formula>
<alternatives>
<mml:math id="umml15" display="block"><mml:mrow><mml:mtext mathvariant="italic">avgcorre</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:mtext mathvariant="italic">struc</mml:mtext><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>&#x2013;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mrow><mml:mtext mathvariant="italic">corre</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>&#x002A;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x2013;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:math>
<graphic xlink:href="etm-12-01-0288-g17.jpg"/>
</alternatives>
</disp-formula>
<p>where the smaller the value of <italic>corre</italic>(<italic>z<sub>i</sub></italic>, <italic>z<sub>j</sub></italic>), the less correlated the two themes are. The stability of the theme structure was assessed by the average similarity between all pairs of themes, as shown in formula xiv.</p>
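<p>The average pairwise similarity of formula xiv can be sketched in Python as follows. The function names and toy theme vectors are hypothetical. Choosing the theme number with the smallest average similarity is an assumption made for this illustration; the text states only that the average similarity was used to assess the stability of the theme structure.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two theme vectors in word space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("theme vectors must be non-zero")
    return dot / norm


def average_theme_correlation(theme_vectors):
    """Mean cosine similarity over all K*(K-1)/2 unordered theme pairs."""
    pairs = list(combinations(theme_vectors, 2))
    if not pairs:
        raise ValueError("at least two themes are required")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def select_theme_number(candidates):
    """Pick the K whose theme structure has the smallest average similarity.

    `candidates` maps each candidate K to its list of theme vectors.
    The selection rule is an assumption of this sketch.
    """
    return min(candidates, key=lambda k: average_theme_correlation(candidates[k]))


# Toy theme vectors (hypothetical) for two candidate theme numbers.
candidates = {
    2: [[1.0, 0.0], [0.0, 1.0]],
    3: [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]],
}
best_k = select_theme_number(candidates)
```
]]></preformat>
<p>In the toy data the two-theme structure consists of orthogonal vectors, so its average similarity is 0, whereas the three-theme structure contains two nearly parallel themes and therefore scores higher.</p>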
</sec>
<sec>
<title>SHTDT model based on the weight</title>
<p>The SHTDT model was improved by introducing weights: when accumulating the counts for each set, <italic>weight<sub>i</sub></italic> was added instead of 1. The weight was derived from a distribution based on the Gaussian function. In the diagnosis and treatment data, each symptom generally appears only once in each patient sample, so TF=1. Weighting with TF-IDF would therefore increase the weight of low-frequency words and decrease the weight of high-frequency words, which was not a desirable result. In addition, during initialization of the SHTDT model, the variables were originally assigned at random from a wide range of candidates, so the assignment accuracy was low, which affected the subsequent iterative sampling. Therefore, the drug-symptom, treatment-symptom and diagnosis-symptom co-occurrences in the patient records were counted and each combination was ranked. For example, to allocate a drug to a certain symptom, the initial assignment was originally <italic>x<sub>i</sub></italic>=<italic>rand</italic>.<italic>next</italic>(0, the number of drugs corresponding to the patient). With the drug-symptom statistics, the drug that co-occurred most frequently with the symptom among the drugs of the patient was selected instead. If several drugs were equally frequent, one of them was selected at random. Of note, the smaller the selection range, the higher the assignment accuracy. The algorithm of the weight-based SHTDT model is shown in <xref rid="tIV-etm-0-0-3285" ref-type="table">Table IV</xref>.</p>
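<p>The frequency-based initialization described above can be sketched in Python as follows. The record layout, function names and toy data are hypothetical, and the Gaussian weighting is not included because its exact form is not given in the text.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random
from collections import Counter


def build_cooccurrence(records):
    """Count how often each (symptom, drug) pair appears in the same record."""
    counts = Counter()
    for symptoms, drugs in records:
        for symptom in symptoms:
            for drug in drugs:
                counts[(symptom, drug)] += 1
    return counts


def initial_drug(symptom, patient_drugs, counts, rng):
    """Choose the patient's drug that co-occurs most often with the symptom.

    Ties are broken at random, as described in the text.
    """
    best = max(counts[(symptom, drug)] for drug in patient_drugs)
    tied = [drug for drug in patient_drugs if counts[(symptom, drug)] == best]
    return rng.choice(tied)


# Toy records (hypothetical): each record is (symptoms, prescribed drugs).
records = [
    (["cough"], ["drug_a", "drug_b"]),
    (["cough"], ["drug_a"]),
    (["fever"], ["drug_b"]),
]
counts = build_cooccurrence(records)
rng = random.Random(0)
chosen = initial_drug("cough", ["drug_a", "drug_b"], counts, rng)
```
]]></preformat>
<p>Restricting the choice to the drugs actually prescribed to the patient, and then to the most frequent among them, narrows the selection range, which is the property the text associates with higher assignment accuracy.</p>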
</sec>
</sec>
</sec>
</sec>
<sec sec-type="results|discussion">
<title>Results and Discussion</title>
<sec>
<title/>
<sec>
<title>Results of LSA model</title>
<p>The dataset used in the present study was derived from the clinical data of the Zhongshan Hospital of Xiamen University. A total of 588 clinical manifestations were used to semantically describe 251 syndromes and 5 organ cards (<xref rid="b23-etm-0-0-3285" ref-type="bibr">23</xref>). On this dataset, the performance of the non-LSA and LSA classifiers was compared (<xref rid="f4-etm-0-0-3285" ref-type="fig">Fig. 4</xref>). Because the model involves many matrix computations, MATLAB was used to perform the SVD, and the cosine method was used to compute the degree of correlation. Some of the syndrome vectors and organ vectors processed by LSA (k=7) are shown in <xref rid="tV-etm-0-0-3285" ref-type="table">Tables V</xref> and <xref rid="tVI-etm-0-0-3285" ref-type="table">VI</xref>.</p>
<p><xref rid="f4-etm-0-0-3285" ref-type="fig">Fig. 4</xref> shows that the LSA-based semantic classification model of syndrome differentiation performed better than the non-LSA classifier (<xref rid="b24-etm-0-0-3285" ref-type="bibr">24</xref>). The main reason is that LSA maps the high-dimensional VSM to a low-dimensional latent semantic space and, at the same time, removes the &#x2018;noise&#x2019; (irrelevant information). Compared with the traditional vector space, the latent semantic space has fewer dimensions and clearer semantic relationships.</p>
<p>As shown in <xref rid="f5-etm-0-0-3285" ref-type="fig">Fig. 5</xref>, weighted LSA performed better than non-weighted LSA. Thus, feature weighting improves classification performance. It reduces the interference of high-frequency words and stop words (lowering their representativeness) and strengthens the role of representative keywords (improving differentiation), thereby increasing the classification accuracy.</p>
<p>The experimental results showed that LSA can be applied in the TCM field, although additional studies should be conducted to confirm the results. Although the experiment was small in scale, an advantage of LSA was identified, suggesting that this method may be applied successfully in the future.</p>
</sec>
<sec>
<title>Results of SHTDT model</title>
<p>The attributes of symptoms, Chinese medicine, treatment and diagnosis were screened for each case. After supplementing missing data, deleting redundant data, standardizing the symptom terms, unifying the drug names and specifying the representation format, the best theme number was determined. As shown in <xref rid="f6-etm-0-0-3285" ref-type="fig">Fig. 6</xref>, k was identified as 12, after which the weight-based SHTDT model was run. Two typical themes were identified, and for each of them the top 10 symptoms, Chinese medicines, treatments and diagnoses are listed.</p>
<p><xref rid="tVII-etm-0-0-3285" ref-type="table">Table VII</xref> shows that the theme pertains to spleen deficiency syndrome, which refers to lack of temper, weak transport and impaired digestion, with the general manifestations of consuming less, lack of blood, Shenpi fatigue, heart palpitations, shortness of breath, pale complexion, less bloating, pale tongue, white coating and weak pulse. The nourishing Qi Diet is recommended for conditions including food overconsumption, overexertion, lassitude, inadequate endowment, old age and infirmity, chronic diseases and critical diseases.</p>
<p><xref rid="tVIII-etm-0-0-3285" ref-type="table">Table VIII</xref> shows that the theme primarily pertains to constipation symptoms. Patients with constipation often experience headache, fatigue, poor appetite, bloating, indigestion and other symptoms, which may be associated with poor diet, a sedentary lifestyle and personal mental factors (<xref rid="b25-etm-0-0-3285" ref-type="bibr">25</xref>). Therapies include regulating the Qi, laxatives, and regulation of blood lipids for relief (<xref rid="b26-etm-0-0-3285" ref-type="bibr">26</xref>). In addition, attention should be paid to occult symptoms that may be confused with other clinical signs of disease. For example, hyperlipidemia is caused by levels of cholesterol, triglycerides, &#x03B2;-lipoprotein and other lipid components in the blood that are higher than normal, which reflects a series of pathological changes in the body; its main clinical manifestations include dizziness, chest tightness, palpitations, Shenpi fatigue, insomnia, forgetfulness and numbness (27). Similar symptoms are shown in <xref rid="tVII-etm-0-0-3285" ref-type="table">Table VII</xref>. Hyperlipidemia, a type of &#x2018;rich disease&#x2019; with a slow onset, may trigger coronary heart disease, stroke, diabetes, obesity and fatty liver disease. Therefore, patients presenting with the above symptoms should seek medical assistance to prevent disease progression.</p>
<p>In conclusion, it is difficult to comprehend semantic meaning at present, although latent semantic comprehension is practically feasible. The application of LSA changes the meaning of the vectors: they reflect the distributed relationship of clinical manifestations, which reinforces their semantic meaning. Thus, the vectors are based on both lexemic and semantic strata. Performing a correlative analysis in this new semantic space yields better results than using the original feature vectors. Because of the SVD, the LSA-based semantic classification model of syndrome differentiation suppresses the &#x2018;noise&#x2019; and reduces the dimensions of the matrix, while the semantic relationship between organs and syndromes is preserved. In addition, it has high computability and strong operability and solves the issue of matrix sparsity. However, some factors remain to be investigated, such as how to choose k in the SVD and the optimal choice of clinical manifestations. These factors may affect the overall classification performance.</p>
<p>TCM diagnosis involves a variety of subjective factors, but the symptoms and drugs may be considered objective factors. To identify clinical rules from these two objective factors, a statistics-based model of TCM rules was created and the SHTDT model was proposed. The experimental results have been confirmed by Chinese clinical doctors, and the model and the results obtained are of clinical significance. Given the symptoms of a patient, this model can predict the theme, the drugs, the treatments and the diagnosis. The experiments show that the predictions of the SHTDT model were close to the actual results, although they were not completely accurate because the prediction is equivalent to multi-label prediction, i.e., the drugs, treatment and diagnosis are regarded as labels. In addition, for a given patient, the drugs selected, the approach to treatment and the diagnosis essentially depend on human factors.</p>
</sec>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="b1-etm-0-0-3285"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>LW</given-names></name><name><surname>Duan</surname><given-names>CL</given-names></name><name><surname>Xiong</surname><given-names>ZW</given-names></name><name><surname>WU</surname><given-names>H</given-names></name></person-group><article-title>Study on the application of naive Bayesian methods in identifying syndrome in TCM</article-title><source>J Inner Mongolia Univ</source><volume>38</volume><fpage>568</fpage><lpage>571</lpage><year>2007</year></element-citation></ref>
<ref id="b2-etm-0-0-3285"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname><given-names>JF</given-names></name></person-group><article-title>Syndrome Element Differentiation Methodology based on Data Mining Technology (unpublished PhD thesis)</article-title><source>Hunan University of Chinese Medicine</source><year>2007</year></element-citation></ref>
<ref id="b3-etm-0-0-3285"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Witten</surname><given-names>IH</given-names></name><name><surname>Frank</surname><given-names>E</given-names></name></person-group><source>Data Mining: Practical Machine Learning Tools and Techniques</source><volume>3</volume><edition>2nd</edition><publisher-name>China Machine Press</publisher-name><publisher-loc>Beijing</publisher-loc><fpage>105</fpage><lpage>106</lpage><year>2005</year></element-citation></ref>
<ref id="b4-etm-0-0-3285"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>FF</given-names></name><name><surname>Chen</surname><given-names>CH</given-names></name><name><surname>Jiang</surname><given-names>L</given-names></name></person-group><article-title>Brain functional connection research based on complex network</article-title><source>Fuza Xitong Yu Fuzaxing Kexue</source><volume>8</volume><fpage>18</fpage><lpage>23</lpage><year>2011</year></element-citation></ref>
<ref id="b5-etm-0-0-3285"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qin</surname><given-names>XH</given-names></name><name><surname>Guan</surname><given-names>YJ</given-names></name></person-group><article-title>Viruses spread of influenza AH1N1 based on complex networks</article-title><source>Statistics and Information Forum</source><volume>25</volume><fpage>86</fpage><lpage>90</lpage><year>2010</year></element-citation></ref>
<ref id="b6-etm-0-0-3285"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yildirim</surname><given-names>MA</given-names></name><name><surname>Goh</surname><given-names>KI</given-names></name><name><surname>Cusick</surname><given-names>ME</given-names></name><name><surname>Barab&#x00E1;si</surname><given-names>AL</given-names></name><name><surname>Vidal</surname><given-names>M</given-names></name></person-group><article-title>Drug-target network</article-title><source>Nat Biotechnol</source><volume>25</volume><fpage>1119</fpage><lpage>1126</lpage><year>2007</year><pub-id pub-id-type="doi">10.1038/nbt1338</pub-id><pub-id pub-id-type="pmid">17921997</pub-id></element-citation></ref>
<ref id="b7-etm-0-0-3285"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>HJ</given-names></name></person-group><article-title>Application of complex network theory in gene regulatory networks</article-title><source>J Chongqing Univ Sci Technol</source><volume>11</volume><fpage>141</fpage><lpage>144</lpage><year>2009</year></element-citation></ref>
<ref id="b8-etm-0-0-3285"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname><given-names>Z</given-names></name><name><surname>Li</surname><given-names>YZ</given-names></name><name><surname>Xiao</surname><given-names>JM</given-names></name><name><surname>Li</surname><given-names>GB</given-names></name><name><surname>Wen</surname><given-names>ZN</given-names></name><name><surname>Li</surname><given-names>ML</given-names></name></person-group><article-title>Complex network-based random forest algorithm for predicting the impact of amino acid mutation on protein stability</article-title><source>Chem Res Appl</source><volume>23</volume><fpage>554</fpage><lpage>558</lpage><year>2011</year></element-citation></ref>
<ref id="b9-etm-0-0-3285"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Doukas</surname><given-names>C</given-names></name><name><surname>Maglogiannis</surname><given-names>I</given-names></name></person-group><article-title>Enabling human status awareness in assistive environments based on advanced sound and motion data classification</article-title><source>Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments</source><publisher-loc>Athens, Greece</publisher-loc><month>Jul</month><year>2008</year><uri>http://dx.doi.org/10.1145/1389586.1389588</uri></element-citation></ref>
<ref id="b10-etm-0-0-3285"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Doukas</surname><given-names>C</given-names></name><name><surname>Maglogiannis</surname><given-names>I</given-names></name></person-group><article-title>Human Distress Sound Analysis and Characterization using Advanced Classification Techniques. In: Artificial Intelligence: Theories, Models and Applications</article-title><source>5th Hellenic Conference on AI, SETN 2008, Syros Greece, October 2008</source><person-group person-group-type="editor"><name><surname>Darzentas</surname><given-names>J</given-names></name><name><surname>Vouros</surname><given-names>GA</given-names></name><name><surname>Vosinakis</surname><given-names>S</given-names></name><name><surname>Arnellos</surname><given-names>A</given-names></name></person-group><publisher-name>Springer-Verlag GmbH</publisher-name><publisher-loc>Berlin</publisher-loc><fpage>73</fpage><lpage>84</lpage><year>2008</year></element-citation></ref>
<ref id="b11-etm-0-0-3285"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>H</given-names></name><name><surname>Huang</surname><given-names>ST</given-names></name></person-group><article-title>A fuzzy method to learn text classifier from labeled and unlabeled examples</article-title><source>J Harbin Inst Technol</source><volume>11</volume><fpage>98</fpage><lpage>102</lpage><year>2004</year></element-citation></ref>
<ref id="b12-etm-0-0-3285"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname><given-names>G</given-names></name><name><surname>Zhu</surname><given-names>L</given-names></name><name><surname>Yan</surname><given-names>G</given-names></name><name><surname>Chen</surname><given-names>D</given-names></name></person-group><article-title>Kernel Method for Building Fuzzy Classifiers</article-title><source>Proceedings of the Sixth World Congress on Intelligent Control and Automation, 2006</source><volume>6</volume><comment>WCICA</comment><fpage>4307</fpage><lpage>4311</lpage><year>2006</year></element-citation></ref>
<ref id="b13-etm-0-0-3285"><label>13</label><element-citation publication-type="journal"><article-title>World Wide Web Consortium: (W3C)</article-title><source>Semantic Web Activity</source><uri>http://www.w3.org/2001/sw/</uri><comment>Accessed</comment><date-in-citation content-type="access-date"><month>Jun</month><day>04</day><year>2008</year></date-in-citation></element-citation></ref>
<ref id="b14-etm-0-0-3285"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="editor"><name><surname>McGuinness</surname><given-names>DL</given-names></name><name><surname>Harmelen</surname><given-names>FV</given-names></name></person-group><source>OWL Web Ontology Language Overview: W3C Recommendation 10 February 2004</source><uri>http://www.w3.org/TR/owl-features/</uri><comment>Accessed</comment><date-in-citation content-type="access-date"><month>Jun</month><day>04</day><year>2008</year></date-in-citation></element-citation></ref>
<ref id="b15-etm-0-0-3285"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>CH</given-names></name><name><surname>Nan</surname><given-names>LL</given-names></name><name><surname>Ren</surname><given-names>YP</given-names></name></person-group><article-title>Research on the text clustering algorithm based on latent semantic analysis and optimization</article-title><source>Proceedings of the Computer Science and Automation Engineering (CSAE), 2011 IEEE International Conference</source><volume>4</volume><comment>IEEE</comment><fpage>470</fpage><lpage>473</lpage><year>2011</year><pub-id pub-id-type="doi">10.1109/CSAE.2011.5952891</pub-id></element-citation></ref>
<ref id="b16-etm-0-0-3285"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>XG</given-names></name><name><surname>Huang</surname><given-names>GJ</given-names></name><name><surname>Cao</surname><given-names>LH</given-names></name><name><surname>Guo</surname><given-names>HT</given-names></name></person-group><article-title>Web services filtrate technologies based on latent semantic analysis</article-title><source>Comput Eng</source><volume>34</volume><fpage>39</fpage><lpage>41</lpage><year>2008</year></element-citation></ref>
<ref id="b17-etm-0-0-3285"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ishii</surname><given-names>N</given-names></name><name><surname>Murai</surname><given-names>T</given-names></name><name><surname>Yamada</surname><given-names>T</given-names></name><name><surname>Bao</surname><given-names>Y</given-names></name></person-group><article-title>Text classification by combining grouping, LSA and kNN. In: Proceedings of the Computer and Information Science, 2006 and 2006 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse. ICIS-COMSAR 2006</article-title><source>5th IEEE/ACIS International Conference on IEEE</source><fpage>148</fpage><lpage>154</lpage><year>2006</year></element-citation></ref>
<ref id="b18-etm-0-0-3285"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname><given-names>Z</given-names></name><name><surname>Lu</surname><given-names>C</given-names></name></person-group><article-title>A latent semantic analysis based method of getting the category attribute of words</article-title><source>Proceedings of the Electronic Computer Technology, 2009 International Conference on IEEE</source><fpage>141</fpage><lpage>146</lpage><year>2009</year><pub-id pub-id-type="doi">10.1109/ICECT.2009.19</pub-id></element-citation></ref>
<ref id="b19-etm-0-0-3285"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>He</surname><given-names>ZL</given-names></name><name><surname>Wang</surname><given-names>CH</given-names></name></person-group><article-title>Application of matrix singular value decomposition (SVD) in latent semantic information retrieval</article-title><source>Mod Comput</source><volume>6</volume><fpage>21</fpage><lpage>23</lpage><year>2011</year></element-citation></ref>
<ref id="b20-etm-0-0-3285"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname><given-names>M</given-names></name><name><surname>He</surname><given-names>Y</given-names></name><name><surname>Li</surname><given-names>J</given-names></name></person-group><article-title>Fault diagnosis method based on LSA and SVM. In: Proceedings of the Information Engineering and Computer Science, 2009. ICIECS 2009</article-title><source>International Conference on IEEE</source><fpage>1</fpage><lpage>4</lpage><year>2009</year></element-citation></ref>
<ref id="b21-etm-0-0-3285"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>JT</given-names></name><name><surname>Zhang</surname><given-names>QY</given-names></name><name><surname>Yuan</surname><given-names>ZT</given-names></name></person-group><article-title>A Junk Mail Filtering Method Based on LSA and FSVM. In: Proceedings of the Fuzzy Systems and Knowledge Discovery, 2008. FSKD &#x0027;08</article-title><source>Fifth International Conference on</source><volume>3</volume><comment>IEEE</comment><fpage>111</fpage><lpage>115</lpage><year>2008</year></element-citation></ref>
<ref id="b22-etm-0-0-3285"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xuan</surname><given-names>Y</given-names></name><name><surname>Zhu</surname><given-names>Q</given-names></name></person-group><article-title>Research on tag semantic retrieval in social tagging system based on LSA</article-title><source>Libr Inf Serv</source><volume>55</volume><fpage>11</fpage><lpage>14</lpage><year>2011</year></element-citation></ref>
<ref id="b23-etm-0-0-3285"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>HP</given-names></name><name><surname>Yu</surname><given-names>HK</given-names></name><name><surname>Xiong</surname><given-names>DY</given-names></name><name><surname>Liu</surname><given-names>Q</given-names></name></person-group><article-title>HHMM-based Chinese Lexical Analyzer ICTCLAS</article-title><source>Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics</source><fpage>1231</fpage><lpage>1235</lpage><year>2003</year></element-citation></ref>
<ref id="b24-etm-0-0-3285"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chu</surname><given-names>Keming</given-names></name><name><surname>Li</surname><given-names>Fang</given-names></name></person-group><article-title>LDA model-based news topic evolution</article-title><source>Computer Applications and Software</source><volume>4</volume><year>2011</year><pub-id pub-id-type="doi">10.3969/j.issn.1000-386X.2011.04.002</pub-id></element-citation></ref>
<ref id="b25-etm-0-0-3285"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jing</surname><given-names>S</given-names></name><name><surname>Meng</surname><given-names>F</given-names></name><name><surname>Li</surname><given-names>WL</given-names></name></person-group><article-title>The analysis of the themes based on LDA model</article-title><source>Acta Automatica Sinica</source><volume>35</volume><fpage>1586</fpage><lpage>1592</lpage><year>2009</year></element-citation></ref>
<ref id="b26-etm-0-0-3285"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname><given-names>L</given-names></name><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Wei</surname><given-names>B</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Ren</surname><given-names>X</given-names></name><name><surname>Bian</surname><given-names>Y</given-names></name></person-group><article-title>Discovering treatment pattern in Traditional Chinese Medicine clinical cases by exploiting supervised topic model and domain knowledge</article-title><source>J Biomed Inform</source><volume>58</volume><fpage>260</fpage><lpage>267</lpage><year>2015</year><pub-id pub-id-type="doi">10.1016/j.jbi.2015.10.012</pub-id><pub-id pub-id-type="pmid">26524127</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-etm-0-0-3285" position="float">
<label>Figure 1.</label>
<caption><p>The latent semantic analysis-based semantic classification model of syndrome differentiation.</p></caption>
<graphic xlink:href="etm-12-01-0288-g01.jpg"/>
</fig>
<fig id="f2-etm-0-0-3285" position="float">
<label>Figure 2.</label>
<caption><p>Latent Dirichlet allocation model graph.</p></caption>
<graphic xlink:href="etm-12-01-0288-g05.jpg"/>
</fig>
<fig id="f3-etm-0-0-3285" position="float">
<label>Figure 3.</label>
<caption><p>Symptom-herb-therapies-diagnosis topic model graph.</p></caption>
<graphic xlink:href="etm-12-01-0288-g11.jpg"/>
</fig>
<fig id="f4-etm-0-0-3285" position="float">
<label>Figure 4.</label>
<caption><p>Performance with and without latent semantic analysis (LSA).</p></caption>
<graphic xlink:href="etm-12-01-0288-g18.jpg"/>
</fig>
<fig id="f5-etm-0-0-3285" position="float">
<label>Figure 5.</label>
<caption><p>Performance of non-weighted and weighted latent semantic analysis (LSA).</p></caption>
<graphic xlink:href="etm-12-01-0288-g19.jpg"/>
</fig>
<fig id="f6-etm-0-0-3285" position="float">
<label>Figure 6.</label>
<caption><p>Determining the optimal number of themes.</p></caption>
<graphic xlink:href="etm-12-01-0288-g20.jpg"/>
</fig>
<table-wrap id="tI-etm-0-0-3285" position="float">
<label>Table I.</label>
<caption><p>Semantic description form of syndromes and organs.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Syndromes/organs</th>
<th align="center" valign="bottom">Major clinical manifestation</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Xin qi (kui) xu zheng (syndromes)</td>
<td align="left" valign="top">Palpitation, shortness of breath, mental weariness, spontaneous sweating, pale face, pale tongue, weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">Fei qi (kui) xu zheng (syndromes)</td>
<td align="left" valign="top">Cough, shortness of breath, asthma, clear thin phlegm, low voice, spontaneous sweating, anemophobia, pale tongue, weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">Pi qi (kui) xu zheng (syndromes)</td>
<td align="left" valign="top">Consumption of less food, abdominal distension, thin loose stools, mental weariness, physical weariness, pale tongue, weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">Xin qi xu xue hen zheng (syndromes)</td>
<td align="left" valign="top">Palpitation, shortness of breath, chest tightness, cardiodynia, mental weariness, dark purple face, lilac tongue, weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">Shen qi (kui) xu zheng (syndromes)</td>
<td align="left" valign="top">Tinnitus, soreness of waist, attenuated libido, dizziness, unconsciousness, weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">Xin xi lei zheng (organs)</td>
<td align="left" valign="top">Palpitation, hang-ups, chest tightness, dreaminess, insomnia, dizziness, red tongue, thirst, cardiodynia, intermittent pulse, fever, red face, mental weariness, cold chills, thready weak pulse, disorderly speech, unconsciousness, limb cooling, weak pulse, shortness of breath</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tII-etm-0-0-3285" position="float">
<label>Table II.</label>
<caption><p>Semantic description form of syndromes and organs.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Val</th>
<th align="center" valign="bottom">Label</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">188</td>
<td align="left" valign="top">Palpitation</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;48</td>
<td align="left" valign="top">Shortness of breath</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;24</td>
<td align="left" valign="top">Mental weariness</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;24</td>
<td align="left" valign="top">Spontaneous sweating</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;28</td>
<td align="left" valign="top">Pale face</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;64</td>
<td align="left" valign="top">Pale tongue</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;66</td>
<td align="left" valign="top">Weak pulse</td>
</tr>
<tr>
<td align="left" valign="top">148</td>
<td align="left" valign="top">Chest tightness</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;66</td>
<td align="left" valign="top">Cardiodynia</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;26</td>
<td align="left" valign="top">Dark purple face</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;54</td>
<td align="left" valign="top">Lilac tongue</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;26</td>
<td align="left" valign="top">Astringent weak pulse</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tIII-etm-0-0-3285" position="float">
<label>Table III.</label>
<caption><p>Frequency matrix of syndromes and organs.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Clinical manifestation</th>
<th align="center" valign="bottom">Xin qi (kui) xu zheng</th>
<th align="center" valign="bottom">Fei qi (kui) xu zheng</th>
<th align="center" valign="bottom">Pi qi (kui) xu zheng</th>
<th align="center" valign="bottom">Xin qi xu xue hen zheng</th>
<th align="center" valign="bottom">Shen qi (kui) xu zheng</th>
<th align="center" valign="bottom">Xin xi lei zheng</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Cough</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">Palpitation</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">Shortness of breath</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">Pale tongue</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">Spontaneous sweating</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">Chest tightness</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0</td>
<td align="center" valign="top">1</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tIV-etm-0-0-3285" position="float">
<label>Table IV.</label>
<caption><p>Weight-based Gibbs sampling process of the SHTDT model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Characteristics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1) For i=1 to n</td>
</tr>
<tr>
<td align="left" valign="top">2) Randomly assign a topic <italic>z<sub>i</sub></italic> &#x2208; <italic>1 &#x2026; T</italic></td>
</tr>
<tr>
<td align="left" valign="top">3) According to the symptom-drug frequency, select the drug with the greatest probability for the corresponding symptoms: <italic>x<sub>i</sub></italic> &#x2208; |<italic>m<sub>q</sub></italic>| (<italic>m<sub>q</sub></italic> &#x2208; <italic>m<sub>p</sub></italic>)</td>
</tr>
<tr>
<td align="left" valign="top">4) According to the symptom-treatment word frequency, select the treatment with the greatest probability for the corresponding symptoms. //|<italic>t<sub>p</sub></italic>| is the treatment set for patient p, and |<italic>r<sub>p</sub></italic>| is the diagnosis set for patient p</td>
</tr>
<tr>
<td align="left" valign="top">5) According to the symptom-diagnosis word frequency, select the diagnosis with the greatest probability for the corresponding symptoms: <italic>y<sub>i</sub></italic> &#x2208; |<italic>r<sub>q</sub></italic>| (<italic>r<sub>q</sub></italic> &#x2208; <italic>r<sub>p</sub></italic>)</td>
</tr>
<tr>
<td align="left" valign="top">6) Generate the initial distributions of the symptoms, Chinese medicines, treatments and diagnoses (&#x03C6;, &#x03B8;, &#x03B7;, &#x03B5;) according to formula (iv).</td>
</tr>
<tr>
<td align="left" valign="top">7) Repeat</td>
</tr>
<tr>
<td align="left" valign="top">8) For i=1 to n</td>
</tr>
<tr>
<td align="left" valign="top">9) For j=1 to |<italic>m<sub>p</sub></italic>|, where |<italic>m<sub>p</sub></italic>| is the corresponding TCM set for patient p</td>
</tr>
<tr>
<td align="left" valign="top">10) For k=1 to T</td>
</tr>
<tr>
<td align="left" valign="top">11) According to the formula (i), calculate the corresponding probability value, and obtain the theme k and TCM j that meet the condition of arg max<italic><sub>j, k</sub> p (z<sub>i</sub>=k, x<sub>i</sub>=j</italic> | <italic>w<sub>i</sub>=n, z<sub>-i</sub>, x<sub>-i</sub>, weight<sub>i</sub></italic>)</td>
</tr>
<tr>
<td align="left" valign="top">12) Update the symptom and drug distributions &#x03C6; and &#x03B8; according to formula (i)</td>
</tr>
<tr>
<td align="left" valign="top">13) For l=1 to |<italic>t <sub>p</sub></italic>|</td>
</tr>
<tr>
<td align="left" valign="top">14) Calculate the probability value of <italic>l</italic> for each theme j, obtain the treatment that meets the condition of arg max<italic><sub>l</sub> p</italic> (<italic>u<sub>i</sub>=l | w<sub>i</sub>=n, z<sub>i</sub>=k, u<sub>-i</sub>, t<sub>p</sub>, weight<sub>i</sub></italic>), and then update the treatment distribution &#x03B7; according to formula (ii)</td>
</tr>
<tr>
<td align="left" valign="top">15) For s=1 to |<italic>r <sub>p</sub></italic>|</td>
</tr>
<tr>
<td align="left" valign="top">16) Calculate the probability value of <italic>s</italic> for each theme j, obtain the diagnosis that meets the condition of arg max<italic><sub>s</sub> p</italic> (<italic>y<sub>i</sub>=s | w<sub>i</sub>=n, z<sub>i</sub>=k, u<sub>-i</sub>, r<sub>p</sub>, weight<sub>i</sub></italic>), and then update the diagnosis distribution &#x03B5; according to formula (iii)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1-etm-0-0-3285"><p>Repeat the process until the change is small enough to ignore or the number of iterations reaches the limit. SHTDT, symptom-herb-therapies-diagnosis topic; TCM, traditional Chinese medicine.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tV-etm-0-0-3285" position="float">
<label>Table V.</label>
<caption><p>Part of the syndrome vector set.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Part of syndromes vector set</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1.90945757e-002 &#x2212;4.24046812e-002 3.65377378e-003 2.40735542e-002 &#x2212;6.16202858e-003 &#x2212;7.23660460e-003 4.30610050e-002</td>
</tr>
<tr>
<td align="left" valign="top">4.00585125e-002 &#x2212;3.72604253e-002 &#x2212;2.31163110e-002 4.73920401e-002 &#x2212;3.88408266e-002 &#x2212;2.76001384e-002 4.73858843e-002</td>
</tr>
<tr>
<td align="left" valign="top">1.00939593e-002 &#x2212;3.38087295e-002 1.44325399e-002 &#x2212;5.04584184e-002 1.75236553e-002 3.83762814e-002 &#x2212;9.25199848e-002</td>
</tr>
<tr>
<td align="left" valign="top">3.30016986e-002 &#x2212;8.91442827e-002 1.60408602e-002 &#x2212;4.24170056e-003 5.49615913e-003 1.34018652e-002 4.74668365e-002</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tVI-etm-0-0-3285" position="float">
<label>Table VI.</label>
<caption><p>Part of the organ vector set.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Part of organs vector set</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1.52142266e-001 5.43477370e-002 2.68646576e-002 &#x2212;1.55594463e-001 &#x2212;1.89159278e-001 &#x2212;3.98234074e-002 &#x2212;3.36992832e-002</td>
</tr>
<tr>
<td align="left" valign="top">1.84964369e-001 &#x2212;7.19531062e-003 &#x2212;1.15767339e-001 &#x2212;1.66075937e-001 8.40538933e-002 3.99222783e-002 &#x2212;6.79457783e-002</td>
</tr>
<tr>
<td align="left" valign="top">1.85001434e-001 &#x2212;1.14156252e-001 &#x2212;6.39053502e-003 &#x2212;6.37092024e-002 &#x2212;1.12233101e-002 6.45070181e-002 &#x2212;1.45155973e-001</td>
</tr>
<tr>
<td align="left" valign="top">1.71031085e-001 7.05841533e-002 1.79376694e-001 &#x2212;9.93727433e-002 2.75777159e-002 3.29369106e-002 &#x2212;4.84512266e-002</td>
</tr>
<tr>
<td align="left" valign="top">1.69728908e-001 &#x2212;4.58665353e-002 2.18632149e-002 &#x2212;1.04202173e-001 2.83948127e-002 8.66105955e-002 &#x2212;3.05467470e-001</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tVII-etm-0-0-3285" position="float">
<label>Table VII.</label>
<caption><p>Probability distributions of symptoms, Chinese medicines, treatments and diagnoses for theme 3.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Symptoms probability</th>
<th align="center" valign="bottom">TCM probability</th>
<th align="center" valign="bottom">Treatment probability</th>
<th align="center" valign="bottom">Diagnosis probability</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Shortness of breath 0.0564</td>
<td align="left" valign="top">Lobelia 0.6766</td>
<td align="left" valign="top">Help breathing 0.2166</td>
<td align="left" valign="top">Deficiency of lung 0.4020</td>
</tr>
<tr>
<td align="left" valign="top">Pale complexion 0.0432</td>
<td align="left" valign="top">Hyacinth 0.6303</td>
<td align="left" valign="top">Detoxification 0.1715</td>
<td align="left" valign="top">Qi phlegmy heat 0.3987</td>
</tr>
<tr>
<td align="left" valign="top">Epigastric discomfort 0.043</td>
<td align="left" valign="top">Nourish &#x2018;Yin&#x2019; 0.1687</td>
<td align="left" valign="top">Nourish &#x2018;Yin&#x2019; 0.1687</td>
<td align="left" valign="top">Spleen-lost-all-blood 0.3906</td>
</tr>
<tr>
<td align="left" valign="top">Less bloating 0.0268</td>
<td align="left" valign="top">Yuan hu 0.5210</td>
<td align="left" valign="top">Anti-cancer 0.1592</td>
<td align="left" valign="top">Moisture to stay 0.3359</td>
</tr>
<tr>
<td align="left" valign="top">Moderate sleep effect 0.0251</td>
<td align="left" valign="top">Scutellaria barbata 0.5840</td>
<td align="left" valign="top">Invigorating spleen 0.1561</td>
<td align="left" valign="top">Qi and Yin injury 0.3245</td>
</tr>
<tr>
<td align="left" valign="top">Fatigue 0.01988</td>
<td align="left" valign="top">North Adenophora 0.4391</td>
<td align="left" valign="top">Reinforcing stomach 0.1522</td>
<td align="left" valign="top">Spleen Qi deficiency 0.3133</td>
</tr>
<tr>
<td align="left" valign="top">Poor appetite 0.0181</td>
<td align="left" valign="top">Bai ji 0.4079</td>
<td align="left" valign="top">Eliminate bloating 0.1381</td>
<td align="left" valign="top">Blood stagnation 0.2968</td>
</tr>
<tr>
<td align="left" valign="top">Sweating 0.0166</td>
<td align="left" valign="top">Amomum 0.3831</td>
<td align="left" valign="top">Consumer product 0.1368</td>
<td align="left" valign="top">Gas-and-Yin-deficiency 0.2756</td>
</tr>
<tr>
<td align="left" valign="top">Anorexia 0.0165</td>
<td align="left" valign="top">Lily 0.37715</td>
<td align="left" valign="top">Moist lung 0.1332</td>
<td align="left" valign="top">Gas and blood deficiency 0.2683</td>
</tr>
<tr>
<td align="left" valign="top">Emaciation 0.01589</td>
<td align="left" valign="top">Pseudostellaria-heterophylla 0.3668</td>
<td align="left" valign="top">Antiperspirant 0.1271</td>
<td align="left" valign="top">Physically weak and poison accumulation 0.2614</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn2-etm-0-0-3285"><p>TCM, traditional Chinese medicine.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tVIII-etm-0-0-3285" position="float">
<label>Table VIII.</label>
<caption><p>Probability distributions of symptoms, Chinese medicines, treatments and diagnoses for theme 7.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Symptoms probability</th>
<th align="center" valign="bottom">TCM probability</th>
<th align="center" valign="bottom">Treatment probability</th>
<th align="center" valign="bottom">Diagnosis probability</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Poor sleep 0.0198</td>
<td align="left" valign="top">Sanqi powder 0.2998</td>
<td align="left" valign="top">Digestion 0.1061</td>
<td align="left" valign="top">Diarrhea 0.1744</td>
</tr>
<tr>
<td align="left" valign="top">Moss thin white 0.0105</td>
<td align="left" valign="top">Rhubarb 0.2579</td>
<td align="left" valign="top">Nourishing blood 0.1053</td>
<td align="left" valign="top">Food retention abdominal pain 0.1732</td>
</tr>
<tr>
<td align="left" valign="top">Consuming less 0.0059</td>
<td align="left" valign="top">Hawthorn 0.2420</td>
<td align="left" valign="top">Solid off 0.1028</td>
<td align="left" valign="top">Stagnation stomach 0.1706</td>
</tr>
<tr>
<td align="left" valign="top">Poor appetite 0.0058</td>
<td align="left" valign="top">Coke hawthorn 0.2417</td>
<td align="left" valign="top">Warming the kidney 0.1017</td>
<td align="left" valign="top">Kidney deficiency blood stagnation 0.1498</td>
</tr>
<tr>
<td align="left" valign="top">Poor appetite 0.0056</td>
<td align="left" valign="top">Psoralea corylifolia 0.2153</td>
<td align="left" valign="top">Removing stagnation 0.0957</td>
<td align="left" valign="top">Cold blood 0.1459</td>
</tr>
<tr>
<td align="left" valign="top">Constipation 0.0055</td>
<td align="left" valign="top">Pear skin 0.2080</td>
<td align="left" valign="top">Synthesis 0.0954</td>
<td align="left" valign="top">Heart deficiency and timidity 0.1410</td>
</tr>
<tr>
<td align="left" valign="top">Anorexia 0.0051</td>
<td align="left" valign="top">Corydalis 0.2059</td>
<td align="left" valign="top">Consumer product 0.0933</td>
<td align="left" valign="top">Qi stagnation 0.1377</td>
</tr>
<tr>
<td align="left" valign="top">Dizziness 0.0045</td>
<td align="left" valign="top">Curcuma 0.2052</td>
<td align="left" valign="top">Transfer Qi 0.0924</td>
<td align="left" valign="top">Chill condensation 0.1354</td>
</tr>
<tr>
<td align="left" valign="top">Nausea, vomit 0.0043</td>
<td align="left" valign="top">Japonica rice 0.2034</td>
<td align="left" valign="top">Sweet moisturizing 0.0920</td>
<td align="left" valign="top">Colorectal hot and humid 0.1348</td>
</tr>
<tr>
<td align="left" valign="top">Irritability 0.0042</td>
<td align="left" valign="top">Notopterygium 0.2028</td>
<td align="left" valign="top">Resuscitation 0.0912</td>
<td align="left" valign="top">Alpine dysentery 0.1344</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn3-etm-0-0-3285"><p>TCM, traditional Chinese medicine.</p></fn>
</table-wrap-foot>
</table-wrap>
</floats-group>
</article>
