"text": "'\n\twith 302875 stored elements in Compressed Sparse Column format>" "input": "#vectorize the matrix of text in articles\nX = vectorizer.fit_transform(section_corp)\nX", join them\n str_article = (\" \").join(article)\n # make list of article texts\n section_corp.append(str_article)\n # various labels for plotting\n joint_label.append(article + '-' + article)\n section_label.append(article)\n section_headline.append(article)\n", "input": "#prep data by labeling - helpful in plotting\nsection_corp = \nsection_headline = \nsection_label = \njoint_label = \n\n# for article arrays and labeling\nfor article in corpus:\n # some of the articles are split into multiple lists. "input": "#read in json file\ncorpus = json.load(open(\"articles_html1000.json\"))", "text": "CountVectorizer(analyzer=u'word', binary=False, charset=None,\n charset_error=None, decode_error=u'strict',\n dtype=, encoding=u'utf-8', input=u'content',\n lowercase=True, max_df=1.0, max_features=None, min_df=1,\n ngram_range=(1, 1), preprocessor=None, stop_words=None,\n strip_accents=None, token_pattern=u'(?u)\\\\b\\\\w\\\\w+\\\\b',\n tokenizer=None, vocabulary=None)" "input": "#initialize vectorizer\nvectorizer = CountVectorizer(min_df=1)\nvectorizer", "input": "from sklearn.feature_extraction.text import CountVectorizer", "input": "import pandas as pd\nimport numpy as np\nimport scipy.spatial\nimport json\nimport random",