pdftools
pdftools is an R package built specifically for working with PDF files.
pdf_text()
pdf_text() # returns the text of a PDF as a character vector, one element per page.
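> # For example
> library(pdftools)
> a <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # Check the number of pages
> length(a)
[1] 23
> # Look at the first page; formulas do not seem to be extracted properly
> a[1]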
From here, extract whatever information you need from the PDF.
# Another example: I want to extract the gene symbols from page 10
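> library(stringr) # str_split(); also makes the %>% pipe available
> b <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # Check the number of pages
> length(b)
[1] 23
> # Look at the first page; formulas do not seem to be extracted properly
> b[1]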
[1] " SUPPLEMENTAL METHODS\n\nPatients in training dataset\nThe HOVON-65/GMMG-HD4 randomized clinical trial (ISRCTN64455289) consists of newly diagnosed,\ntransplant-eligible patients with multiple myeloma. Patients were randomly assigned to either bortezomib based\ntreatment or vincristine based treatment. Vincristine based treatment: three cycles of induction treatment with\nvincristine 0.4 mg intravenously on days 1-4, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone\n40 mg orally on days 1-4, 9-12, and 17-20; bortezomib based treatment: bortezomib 1.3 mg/m² intravenously on\ndays 1, 4, 8, and 11, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone 40 mg orally on days 1-4,\n9-12, and 17-20. Stem-cells were mobilized by use of cyclophosphamide 1000 mg/m² intravenously on day 1,\ndoxorubicin 15 mg/m² intravenously on days 1-4, dexamethasone 40 mg orally on days 1-4, and granulocyte colony-\nstimulating factor (filgrastim) 10 μg/kg per day subcutaneously, divided in two doses per day, from day 5 until last\nstem cell collection. A minimum of 2.5 × 106 CD34+ cells per transplantation procedure was required. After\ninduction therapy, patients received one (HOVON-65) or two (GMMG-HD4) cycles of high-dose melphalan (200\nmg/m² intravenously) with autologous stem-cell rescue followed by maintenance treatment with thalidomide (50 mg\nper day orally; group assigned to vincristine-based induction treatment) or bortezomib (1.3 mg/m² intravenously\nonce every 2 weeks; group assigned to bortezomib-based induction treatment) for 2 years. Treatment was not\nmasked for physicians and patients (see Figure S1).\nInformed consent to treatment protocols and sample procurement was obtained for all cases included in this study, in\naccordance with the Declaration of Helsinki. 
Use of diagnostic tumour material was approved by the institutional\nreview board of the Erasmus Medical Centre.\n\nPatients in validation datasets\nUAMS-TT2 is a randomized trial in which patients received thalidomide during all treatment phases (UAMS-TT2;\nn=351; GSE2658; NCT00573391).1 UAMS-TT3 is a similar regimen with the addition of bortezomib to the\nthalidomide arm (UAMS-TT3; n=142; E-TABM-1138; NCT00081939).2 The MRC-IX trial (n=247; GSE15695;\nISRCTN68454111) included both transplant-eligible and non-transplant-eligible newly diagnosed patients. For\ntransplant-eligible patients treatment consisted of induction high-dose therapy while non-transplant-eligible patients\nwere treated initially with either thalidomide or melphalan. Maintenance for both age classes was a comparison of\nthalidomide vs. no thalidomide.3, 4 The trial and dataset denoted here as APEX consisted of the three trials APEX,\nSUMMIT and CREST (n=264; GSE9782; registered under M34100-024, M34100-025 and\nNCT00049478/NCT00048230).5-8 The APEX trial included patients with relapsed myeloma who received either\nbortezomib or high-dose dexamethasone, with the possibility to cross-over to receive bortezomib after disease\nprogression.5 In the SUMMIT trial patients received bortezomib. In patients with a suboptimal response, oral\ndexamethasone was added to the regimen.8 The CREST trial included relapsed or refractory patients who received\nbortezomib. Dexamethasone was permitted in patients with progressive or stable disease. 
7\n\nSurvival signature\nThe MAS5 normalized, log2 transformed and mean-variance scaled HOVON-65/GMMG-HD4 dataset was used as a\ntraining set for building a GEP based survival classifier.9, 10 The model was built using a Supervised Principal\nComponent Analysis (SPCA) framework.11 This technique is widely used in biological settings.12-19 The underlying\nassumption is the existence of a high-risk group which can be separated from a standard-risk group on the basis of\nprogression free survival. A Principal Component Analysis (PCA) is a rotation of a n m centered feature space\nX in such a way that the largest variance in the data is projected on the top principal components. 20 This rotation\ncan be described by a m m rotation matrix R pca\n\n X rot XR pca\n\nX rot is rotated in such a way that the first principal component (PC) is the axis that points in the direction\nexhibiting the largest variance. Every subsequent PC is perpendicular to all previous while capturing as much as\npossible of the remaining variance. SPCA is a PCA whereby the feature space has undergone a selection X sel . In\nthis study the initial selection is based on selecting the top probe sets that were ranked by a univariate Cox\nproportional hazard regression. This will result in high variance due to survival so it is likely survival is projected\nonto the top PC's on which a Cox proportional hazard regression is applied. This yields regression coefficients β.\nThe resulting model can be summarized as:\n\n\n 1\n"
> # Extract the text of page 10
> b[10] # a jumbled mess!
[1] " SUPPLEMENTAL TABLES\n\nTable S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient\n(beta)\n Weighting Symbol\nRank Probes GO-term/description1\n coefficient (beta)\n 1 202728_s_at -0.1105 LTBP1 negative regulation of TGFbeta receptor signaling\n 2 239054_at -0.1088 SFMBT1 regulation of transcription\n 3 208942_s_at -0.0997 SEC62 cotranslational protein targeting to membrane\n 4 208747_s_at -0.0874 C1S proteolysis\n 5 202542_s_at 0.0870 AIMP1 negative regulation of endothelial cell proliferation\n 6 214482_at 0.0861 ZBTB25 transcription\n 7 228416_at -0.0778 ACVR2A transmembrane receptor protein serine/threonine kinase signaling\n 8 217728_at 0.0773 S100A6 signal transduction\n 9 215177_s_at -0.0768 ITGA6 cell-substrate junction assembly\n 10 225601_at 0.0750 HMGB3 multicellular organismal development\n 11 207618_s_at 0.0746 BCS1L mitochondrion organization\n 12 231989_s_at 0.0730 LOC100271836 ---\n 13 202884_s_at 0.0714 PPP2R1B control of cell growth and division\n 14 231738_at 0.0686 PCDHB7 calcium-dependent cell-cell adhesion\n 15 238116_at 0.0661 DYNLRB2 microtubule-based movement\n 16 226218_at -0.0644 IL7R regulation of DNA recombination\n 17 202842_s_at -0.0626 DNAJB9 protein folding\n 18 208732_at -0.0618 RAB2A ER to Golgi vesicle-mediated transport\n 19 204379_s_at 0.0594 FGFR3 MAPKKK cascade\n 20 242180_at -0.0585 TSPAN16 cellular activation and adhesion\n 21 216473_x_at -0.0576 DUX4 regulation of transcription, DNA-dependent\n 22 209683_at -0.0561 FAM49A ---\n 23 219550_at 0.0559 ROBO3 axon guidance\n 24 223811_s_at 0.0556 SUN1 / GET4 cytoskeletal anchoring at nuclear membrane\n 25 202813_at 0.0548 TARBP1 regulation of transcription from RNA polymerase II promoter\n 26 212282_at 0.0530 TMEM97 cholesterol homeostasis\n 27 238780_s_at -0.0529 EST/ BX647543 ---\n 28 M97935_MA_at2 0.0525 STAT1 transcription from RNA polymerase II promoter\n 29 221041_s_at -0.0520 SLC17A5 anion transport\n 30 224009_x_at 
-0.0520 DHRS9 androgen metabolic process\n 31 214612_x_at 0.0496 MAGEA6 ---\n 32 208232_x_at -0.0493 --- ---\n 33 238662_at 0.0490 ATPBD4 ---\n 34 206204_at 0.0477 GRB14 signal transduction\n 35 233437_at 0.0446 GABRA4 transport\n 36 200875_s_at 0.0437 NOP56 rRNA processing\n 37 38158_at 0.0423 ESPL1 apoptosis\n 38 217548_at -0.0423 C15orf38 ---\n 39 220351_at 0.0420 CCRL1 chemotaxis\n 40 213002_at -0.0418 MARCKS actin filament crosslinking\n 41 243018_at 0.0407 EST/BE568408 ---\n 42 221755_at 0.0396 EHBP1L1 ---\n 43 208667_s_at -0.0390 ST13 protein folding\n 44 212055_at 0.0384 C18orf10 cytoskeleton\n 45 201292_at -0.0372 TOP2A DNA ligation\n 46 201102_s_at 0.0349 PFKL fructose 6-phosphate metabolic process\n 47 214150_x_at -0.0349 ATP6V0E1 proton transport\n 48 226742_at -0.0345 SAR1B transport\n 49 215181_at -0.0342 CDH22 cell adhesion\n 50 208904_s_at -0.0334 RPS28 rRNA processing\n\n 10\n"
> # Split the page on the line separator "\n"
> b[10] %>% str_split("\n")
[[1]]
 [1] " SUPPLEMENTAL TABLES"
 [2] ""
 [3] "Table S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient"
 [4] "(beta)"
 [5] " Weighting Symbol"
 [6] "Rank Probes GO-term/description1"
 [7] " coefficient (beta)"
 [8] " 1 202728_s_at -0.1105 LTBP1 negative regulation of TGFbeta receptor signaling"
 [9] " 2 239054_at -0.1088 SFMBT1 regulation of transcription"
[10] " 3 208942_s_at -0.0997 SEC62 cotranslational protein targeting to membrane"
[11] " 4 208747_s_at -0.0874 C1S proteolysis"
[12] " 5 202542_s_at 0.0870 AIMP1 negative regulation of endothelial cell proliferation"
[13] " 6 214482_at 0.0861 ZBTB25 transcription"
[14] " 7 228416_at -0.0778 ACVR2A transmembrane receptor protein serine/threonine kinase signaling"
[15] " 8 217728_at 0.0773 S100A6 signal transduction"
[16] " 9 215177_s_at -0.0768 ITGA6 cell-substrate junction assembly"
[17] " 10 225601_at 0.0750 HMGB3 multicellular organismal development"
[18] " 11 207618_s_at 0.0746 BCS1L mitochondrion organization"
[19] " 12 231989_s_at 0.0730 LOC100271836 ---"
[20] " 13 202884_s_at 0.0714 PPP2R1B control of cell growth and division"
[21] " 14 231738_at 0.0686 PCDHB7 calcium-dependent cell-cell adhesion"
[22] " 15 238116_at 0.0661 DYNLRB2 microtubule-based movement"
[23] " 16 226218_at -0.0644 IL7R regulation of DNA recombination"
[24] " 17 202842_s_at -0.0626 DNAJB9 protein folding"
[25] " 18 208732_at -0.0618 RAB2A ER to Golgi vesicle-mediated transport"
[26] " 19 204379_s_at 0.0594 FGFR3 MAPKKK cascade"
[27] " 20 242180_at -0.0585 TSPAN16 cellular activation and adhesion"
[28] " 21 216473_x_at -0.0576 DUX4 regulation of transcription, DNA-dependent"
[29] " 22 209683_at -0.0561 FAM49A ---"
[30] " 23 219550_at 0.0559 ROBO3 axon guidance"
[31] " 24 223811_s_at 0.0556 SUN1 / GET4 cytoskeletal anchoring at nuclear membrane"
[32] " 25 202813_at 0.0548 TARBP1 regulation of transcription from RNA polymerase II promoter"
[33] " 26 212282_at 0.0530 TMEM97 cholesterol homeostasis"
[34] " 27 238780_s_at -0.0529 EST/ BX647543 ---"
[35] " 28 M97935_MA_at2 0.0525 STAT1 transcription from RNA polymerase II promoter"
[36] " 29 221041_s_at -0.0520 SLC17A5 anion transport"
[37] " 30 224009_x_at -0.0520 DHRS9 androgen metabolic process"
[38] " 31 214612_x_at 0.0496 MAGEA6 ---"
[39] " 32 208232_x_at -0.0493 --- ---"
[40] " 33 238662_at 0.0490 ATPBD4 ---"
[41] " 34 206204_at 0.0477 GRB14 signal transduction"
[42] " 35 233437_at 0.0446 GABRA4 transport"
[43] " 36 200875_s_at 0.0437 NOP56 rRNA processing"
[44] " 37 38158_at 0.0423 ESPL1 apoptosis"
[45] " 38 217548_at -0.0423 C15orf38 ---"
[46] " 39 220351_at 0.0420 CCRL1 chemotaxis"
[47] " 40 213002_at -0.0418 MARCKS actin filament crosslinking"
[48] " 41 243018_at 0.0407 EST/BE568408 ---"
[49] " 42 221755_at 0.0396 EHBP1L1 ---"
[50] " 43 208667_s_at -0.0390 ST13 protein folding"
[51] " 44 212055_at 0.0384 C18orf10 cytoskeleton"
[52] " 45 201292_at -0.0372 TOP2A DNA ligation"
[53] " 46 201102_s_at 0.0349 PFKL fructose 6-phosphate metabolic process"
[54] " 47 214150_x_at -0.0349 ATP6V0E1 proton transport"
[55] " 48 226742_at -0.0345 SAR1B transport"
[56] " 49 215181_at -0.0342 CDH22 cell adhesion"
[57] " 50 208904_s_at -0.0334 RPS28 rRNA processing"
[58] ""
[59] " 10"
[60] ""
> # Tidier now; next drop the seven header lines and the trailing blank and page-number lines
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)]
 [1] " 1 202728_s_at -0.1105 LTBP1 negative regulation of TGFbeta receptor signaling"
 [2] " 2 239054_at -0.1088 SFMBT1 regulation of transcription"
 [3] " 3 208942_s_at -0.0997 SEC62 cotranslational protein targeting to membrane"
 [4] " 4 208747_s_at -0.0874 C1S proteolysis"
 [5] " 5 202542_s_at 0.0870 AIMP1 negative regulation of endothelial cell proliferation"
 [6] " 6 214482_at 0.0861 ZBTB25 transcription"
 [7] " 7 228416_at -0.0778 ACVR2A transmembrane receptor protein serine/threonine kinase signaling"
 [8] " 8 217728_at 0.0773 S100A6 signal transduction"
 [9] " 9 215177_s_at -0.0768 ITGA6 cell-substrate junction assembly"
[10] " 10 225601_at 0.0750 HMGB3 multicellular organismal development"
[11] " 11 207618_s_at 0.0746 BCS1L mitochondrion organization"
[12] " 12 231989_s_at 0.0730 LOC100271836 ---"
[13] " 13 202884_s_at 0.0714 PPP2R1B control of cell growth and division"
[14] " 14 231738_at 0.0686 PCDHB7 calcium-dependent cell-cell adhesion"
[15] " 15 238116_at 0.0661 DYNLRB2 microtubule-based movement"
[16] " 16 226218_at -0.0644 IL7R regulation of DNA recombination"
[17] " 17 202842_s_at -0.0626 DNAJB9 protein folding"
[18] " 18 208732_at -0.0618 RAB2A ER to Golgi vesicle-mediated transport"
[19] " 19 204379_s_at 0.0594 FGFR3 MAPKKK cascade"
[20] " 20 242180_at -0.0585 TSPAN16 cellular activation and adhesion"
[21] " 21 216473_x_at -0.0576 DUX4 regulation of transcription, DNA-dependent"
[22] " 22 209683_at -0.0561 FAM49A ---"
[23] " 23 219550_at 0.0559 ROBO3 axon guidance"
[24] " 24 223811_s_at 0.0556 SUN1 / GET4 cytoskeletal anchoring at nuclear membrane"
[25] " 25 202813_at 0.0548 TARBP1 regulation of transcription from RNA polymerase II promoter"
[26] " 26 212282_at 0.0530 TMEM97 cholesterol homeostasis"
[27] " 27 238780_s_at -0.0529 EST/ BX647543 ---"
[28] " 28 M97935_MA_at2 0.0525 STAT1 transcription from RNA polymerase II promoter"
[29] " 29 221041_s_at -0.0520 SLC17A5 anion transport"
[30] " 30 224009_x_at -0.0520 DHRS9 androgen metabolic process"
[31] " 31 214612_x_at 0.0496 MAGEA6 ---"
[32] " 32 208232_x_at -0.0493 --- ---"
[33] " 33 238662_at 0.0490 ATPBD4 ---"
[34] " 34 206204_at 0.0477 GRB14 signal transduction"
[35] " 35 233437_at 0.0446 GABRA4 transport"
[36] " 36 200875_s_at 0.0437 NOP56 rRNA processing"
[37] " 37 38158_at 0.0423 ESPL1 apoptosis"
[38] " 38 217548_at -0.0423 C15orf38 ---"
[39] " 39 220351_at 0.0420 CCRL1 chemotaxis"
[40] " 40 213002_at -0.0418 MARCKS actin filament crosslinking"
[41] " 41 243018_at 0.0407 EST/BE568408 ---"
[42] " 42 221755_at 0.0396 EHBP1L1 ---"
[43] " 43 208667_s_at -0.0390 ST13 protein folding"
[44] " 44 212055_at 0.0384 C18orf10 cytoskeleton"
[45] " 45 201292_at -0.0372 TOP2A DNA ligation"
[46] " 46 201102_s_at 0.0349 PFKL fructose 6-phosphate metabolic process"
[47] " 47 214150_x_at -0.0349 ATP6V0E1 proton transport"
[48] " 48 226742_at -0.0345 SAR1B transport"
[49] " 49 215181_at -0.0342 CDH22 cell adhesion"
[50] " 50 208904_s_at -0.0334 RPS28 rRNA processing"
> # Looks much neater; now split each row on spaces
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ")
[[1]]
 [1] "" "" "1" "" "" ""
 [7] "202728_s_at" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "-0.1105" "" "" "" ""
[25] "" "" "" "" "" "LTBP1"
[31] "" "" "" "" "" ""
[37] "" "" "" "" "negative" "regulation"
[43] "of" "TGFbeta" "receptor" "signaling"

[[2]]
 [1] "" "" "2" "" ""
 [6] "" "239054_at" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "-0.1088" "" "" ""
[26] "" "" "" "" ""
[31] "" "SFMBT1" "" "" ""
[36] "" "" "" "" ""
[41] "" "regulation" "of" "transcription"

[[3]]
 [1] "" "" "3" ""
 [5] "" "" "208942_s_at" ""
 [9] "" "" "" ""
[13] "" "" "" ""
[17] "" "" "" "-0.0997"
[21] "" "" "" ""
[25] "" "" "" ""
[29] "" "SEC62" "" ""
[33] "" "" "" ""
[37] "" "" "" ""
[41] "cotranslational" "protein" "targeting" "to"
[45] "membrane"

[[4]]
 [1] "" "" "4" "" "" ""
 [7] "208747_s_at" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "-0.0874" "" "" "" ""
[25] "" "" "" "" "" "C1S"
[31] "" "" "" "" "" ""
[37] "" "" "" "" "" ""
[43] "proteolysis"

[[5]]
 [1] "" "" "5" "" ""
 [6] "" "202542_s_at" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "0.0870" "" "" "" ""
[26] "" "" "" "" ""
[31] "AIMP1" "" "" "" ""
[36] "" "" "" "" ""
[41] "" "negative" "regulation" "of" "endothelial"
[46] "cell" "proliferation"

[[6]]
 [1] "" "" "6" "" ""
 [6] "" "214482_at" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "" "0.0861" "" ""
[26] "" "" "" "" ""
[31] "" "" "ZBTB25" "" ""
[36] "" "" "" "" ""
[41] "" "" "transcription"

[[7]]
 [1] "" "" "7" ""
 [5] "" "" "228416_at" ""
 [9] "" "" "" ""
[13] "" "" "" ""
[17] "" "" "" ""
[21] "" "-0.0778" "" ""
[25] "" "" "" ""
[29] "" "" "" "ACVR2A"
[33] "" "" "" ""
[37] "" "" "" ""
[41] "" "transmembrane" "receptor" "protein"
[45] "serine/threonine" "kinase" "signaling"

[[8]]
 [1] "" "" "8" "" ""
 [6] "" "217728_at" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "" "0.0773" "" ""
[26] "" "" "" "" ""
[31] "" "" "S100A6" "" ""
[36] "" "" "" "" ""
[41] "" "" "signal" "transduction"

[[9]]
 [1] "" "" "9" "" ""
 [6] "" "215177_s_at" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" "-0.0768"
[21] "" "" "" "" ""
[26] "" "" "" "" "ITGA6"
[31] "" "" "" "" ""
[36] "" "" "" "" ""
[41] "cell-substrate" "junction" "assembly"

[[10]]
 [1] "" "" "10" "" ""
 [6] "225601_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "0.0750" "" "" ""
[26] "" "" "" "" ""
[31] "" "HMGB3" "" "" ""
[36] "" "" "" "" ""
[41] "" "" "multicellular" "organismal" "development"

[[11]]
 [1] "" "" "11" "" ""
 [6] "207618_s_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" "0.0746"
[21] "" "" "" "" ""
[26] "" "" "" "" "BCS1L"
[31] "" "" "" "" ""
[36] "" "" "" "" ""
[41] "mitochondrion" "organization"

[[12]]
 [1] "" "" "12" "" ""
 [6] "231989_s_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" "0.0730"
[21] "" "" "" "" ""
[26] "" "" "" "" "LOC100271836"
[31] "" "" "" "---"

[[13]]
 [1] "" "" "13" "" "" "202884_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "0.0714" "" "" "" ""
[25] "" "" "" "" "" "PPP2R1B"
[31] "" "" "" "" "" ""
[37] "" "" "control" "of" "cell" "growth"
[43] "and" "division"

[[14]]
 [1] "" "" "14" ""
 [5] "" "231738_at" "" ""
 [9] "" "" "" ""
[13] "" "" "" ""
[17] "" "" "" ""
[21] "" "0.0686" "" ""
[25] "" "" "" ""
[29] "" "" "" "PCDHB7"
[33] "" "" "" ""
[37] "" "" "" ""
[41] "" "calcium-dependent" "cell-cell" "adhesion"

[[15]]
 [1] "" "" "15" ""
 [5] "" "238116_at" "" ""
 [9] "" "" "" ""
[13] "" "" "" ""
[17] "" "" "" ""
[21] "" "0.0661" "" ""
[25] "" "" "" ""
[29] "" "" "" "DYNLRB2"
[33] "" "" "" ""
[37] "" "" "" ""
[41] "microtubule-based" "movement"

[[16]]
 [1] "" "" "16" "" ""
 [6] "226218_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "-0.0644" "" "" "" ""
[26] "" "" "" "" ""
[31] "IL7R" "" "" "" ""
[36] "" "" "" "" ""
[41] "" "" "regulation" "of" "DNA"
[46] "recombination"

[[17]]
 [1] "" "" "17" "" "" "202842_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0626" "" "" "" "" ""
[25] "" "" "" "" "DNAJB9" ""
[31] "" "" "" "" "" ""
[37] "" "" "protein" "folding"

[[18]]
 [1] "" "" "18" ""
 [5] "" "208732_at" "" ""
 [9] "" "" "" ""
[13] "" "" "" ""
[17] "" "" "" ""
[21] "-0.0618" "" "" ""
[25] "" "" "" ""
[29] "" "" "RAB2A" ""
[33] "" "" "" ""
[37] "" "" "" ""
[41] "" "ER" "to" "Golgi"
[45] "vesicle-mediated" "transport"

[[19]]
 [1] "" "" "19" "" "" "204379_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "0.0594" "" "" "" ""
[25] "" "" "" "" "" "FGFR3"
[31] "" "" "" "" "" ""
[37] "" "" "" "" "MAPKKK" "cascade"

[[20]]
 [1] "" "" "20" "" "" "242180_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "" "-0.0585" "" "" ""
[25] "" "" "" "" "" ""
[31] "TSPAN16" "" "" "" "" ""
[37] "" "" "" "cellular" "activation" "and"
[43] "adhesion"

[[21]]
 [1] "" "" "21" "" ""
 [6] "216473_x_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "-0.0576" ""
[21] "" "" "" "" ""
[26] "" "" "" "DUX4" ""
[31] "" "" "" "" ""
[36] "" "" "" "" ""
[41] "regulation" "of" "transcription," "DNA-dependent"

[[22]]
 [1] "" "" "22" "" "" "209683_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" "-0.0561"
[22] "" "" "" "" "" "" ""
[29] "" "" "FAM49A" "" "" "" ""
[36] "" "" "" "" "" "---"

[[23]]
 [1] "" "" "23" "" "" "219550_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" ""
[22] "0.0559" "" "" "" "" "" ""
[29] "" "" "" "ROBO3" "" "" ""
[36] "" "" "" "" "" "" ""
[43] "axon" "guidance"

[[24]]
 [1] "" "" "24" "" ""
 [6] "223811_s_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" "0.0556"
[21] "" "" "" "" ""
[26] "" "" "" "" "SUN1"
[31] "/" "GET4" "" "" ""
[36] "" "cytoskeletal" "anchoring" "at" "nuclear"
[41] "membrane"

[[25]]
 [1] "" "" "25" "" ""
 [6] "202813_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "0.0548" "" "" ""
[26] "" "" "" "" ""
[31] "" "TARBP1" "" "" ""
[36] "" "" "" "" ""
[41] "" "regulation" "of" "transcription" "from"
[46] "RNA" "polymerase" "II" "promoter"

[[26]]
 [1] "" "" "26" "" "" "212282_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "" "" "0.0530" "" ""
[25] "" "" "" "" "" ""
[31] "" "TMEM97" "" "" "" ""
[37] "" "" "" "" "" "cholesterol"
[43] "homeostasis"

[[27]]
 [1] "" "" "27" "" "" "238780_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0529" "" "" "" "" ""
[25] "" "" "" "" "EST/" "BX647543"
[31] "" "" "---"

[[28]]
 [1] "" "" "28" "" ""
 [6] "M97935_MA_at2" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "0.0525" "" ""
[21] "" "" "" "" ""
[26] "" "" "STAT1" "" ""
[31] "" "" "" "" ""
[36] "" "" "" "transcription" "from"
[41] "RNA" "polymerase" "II" "promoter"

[[29]]
 [1] "" "" "29" "" "" "221041_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0520" "" "" "" "" ""
[25] "" "" "" "" "SLC17A5" ""
[31] "" "" "" "" "" ""
[37] "" "anion" "transport"

[[30]]
 [1] "" "" "30" "" "" "224009_x_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0520" "" "" "" "" ""
[25] "" "" "" "" "DHRS9" ""
[31] "" "" "" "" "" ""
[37] "" "" "" "androgen" "metabolic" "process"

[[31]]
 [1] "" "" "31" "" "" "214612_x_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "0.0496" "" "" "" ""
[25] "" "" "" "" "" "MAGEA6"
[31] "" "" "" "" "" ""
[37] "" "" "" "---"

[[32]]
 [1] "" "" "32" "" "" "208232_x_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0493" "" "" "" "" ""
[25] "" "" "" "" "---" ""
[31] "" "" "" "" "" ""
[37] "" "" "" "" "" "---"

[[33]]
 [1] "" "" "33" "" "" "238662_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" ""
[22] "0.0490" "" "" "" "" "" ""
[29] "" "" "" "ATPBD4" "" "" ""
[36] "" "" "" "" "" "" "---"

[[34]]
 [1] "" "" "34" "" ""
 [6] "206204_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "0.0477" "" "" ""
[26] "" "" "" "" ""
[31] "" "GRB14" "" "" ""
[36] "" "" "" "" ""
[41] "" "" "signal" "transduction"

[[35]]
 [1] "" "" "35" "" "" "233437_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" ""
[22] "0.0446" "" "" "" "" "" ""
[29] "" "" "" "GABRA4" "" "" ""
[36] "" "" "" "" "" "" "transport"

[[36]]
 [1] "" "" "36" "" "" "200875_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "0.0437" "" "" "" ""
[25] "" "" "" "" "" "NOP56"
[31] "" "" "" "" "" ""
[37] "" "" "" "" "rRNA" "processing"

[[37]]
 [1] "" "" "37" "" "" "38158_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" ""
[22] "" "0.0423" "" "" "" "" ""
[29] "" "" "" "" "ESPL1" "" ""
[36] "" "" "" "" "" "" ""
[43] "" "apoptosis"

[[38]]
 [1] "" "" "38" "" "" "217548_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" "-0.0423"
[22] "" "" "" "" "" "" ""
[29] "" "" "C15orf38" "" "" "" ""
[36] "" "" "" "---"

[[39]]
 [1] "" "" "39" "" "" "220351_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "" "" "0.0420" "" ""
[25] "" "" "" "" "" ""
[31] "" "CCRL1" "" "" "" ""
[37] "" "" "" "" "" ""
[43] "chemotaxis"

[[40]]
 [1] "" "" "40" "" ""
 [6] "213002_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "-0.0418" "" "" "" ""
[26] "" "" "" "" ""
[31] "MARCKS" "" "" "" ""
[36] "" "" "" "" ""
[41] "actin" "filament" "crosslinking"

[[41]]
 [1] "" "" "41" "" ""
 [6] "243018_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "0.0407" "" "" ""
[26] "" "" "" "" ""
[31] "" "EST/BE568408" "" "" ""
[36] "---"

[[42]]
 [1] "" "" "42" "" "" "221755_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" ""
[22] "0.0396" "" "" "" "" "" ""
[29] "" "" "" "EHBP1L1" "" "" ""
[36] "" "" "" "" "" "---"

[[43]]
 [1] "" "" "43" "" "" "208667_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0390" "" "" "" "" ""
[25] "" "" "" "" "ST13" ""
[31] "" "" "" "" "" ""
[37] "" "" "" "" "protein" "folding"

[[44]]
 [1] "" "" "44" "" ""
 [6] "212055_at" "" "" "" ""
[11] "" "" "" "" ""
[16] "" "" "" "" ""
[21] "" "0.0384" "" "" ""
[26] "" "" "" "" ""
[31] "" "C18orf10" "" "" ""
[36] "" "" "" "" "cytoskeleton"

[[45]]
 [1] "" "" "45" "" "" "201292_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" "-0.0372"
[22] "" "" "" "" "" "" ""
[29] "" "" "TOP2A" "" "" "" ""
[36] "" "" "" "" "" "" "DNA"
[43] "ligation"

[[46]]
 [1] "" "" "46" "" "" "201102_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "" "0.0349" "" "" "" ""
[25] "" "" "" "" "" "PFKL"
[31] "" "" "" "" "" ""
[37] "" "" "" "" "" "fructose"
[43] "6-phosphate" "metabolic" "process"

[[47]]
 [1] "" "" "47" "" "" "214150_x_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0349" "" "" "" "" ""
[25] "" "" "" "" "ATP6V0E1" ""
[31] "" "" "" "" "" ""
[37] "proton" "transport"

[[48]]
 [1] "" "" "48" "" "" "226742_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" "-0.0345"
[22] "" "" "" "" "" "" ""
[29] "" "" "SAR1B" "" "" "" ""
[36] "" "" "" "" "" "" "transport"

[[49]]
 [1] "" "" "49" "" "" "215181_at" ""
 [8] "" "" "" "" "" "" ""
[15] "" "" "" "" "" "" "-0.0342"
[22] "" "" "" "" "" "" ""
[29] "" "" "CDH22" "" "" "" ""
[36] "" "" "" "" "" "" "cell"
[43] "adhesion"

[[50]]
 [1] "" "" "50" "" "" "208904_s_at"
 [7] "" "" "" "" "" ""
[13] "" "" "" "" "" ""
[19] "-0.0334" "" "" "" "" ""
[25] "" "" "" "" "RPS28" ""
[31] "" "" "" "" "" ""
[37] "" "" "" "rRNA" "processing"

> # Now just drop the "" entries and pick out the gene symbol; a for loop or lapply() both work, the idea is the same
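> # b2 <- b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ")
> # gene <- character()
> # for (i in 1:50){
> # gene_name <- b2 %>% .[[i]] %>% .[.!=""] %>% .[4] # the gene symbol is the 4th non-empty token here; adjust this for your own data
> # gene <- c(gene, gene_name)
> # }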
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ") %>% lapply(\(x){x[x!=""]%>%.[4]})
[[1]]
[1] "LTBP1"
[[2]]
[1] "SFMBT1"
[[3]]
[1] "SEC62"
[[4]]
[1] "C1S"
[[5]]
[1] "AIMP1"
[[6]]
[1] "ZBTB25"
[[7]]
[1] "ACVR2A"
[[8]]
[1] "S100A6"
[[9]]
[1] "ITGA6"
[[10]]
[1] "HMGB3"
[[11]]
[1] "BCS1L"
[[12]]
[1] "LOC100271836"
[[13]]
[1] "PPP2R1B"
[[14]]
[1] "PCDHB7"
[[15]]
[1] "DYNLRB2"
[[16]]
[1] "IL7R"
[[17]]
[1] "DNAJB9"
[[18]]
[1] "RAB2A"
[[19]]
[1] "FGFR3"
[[20]]
[1] "TSPAN16"
[[21]]
[1] "DUX4"
[[22]]
[1] "FAM49A"
[[23]]
[1] "ROBO3"
[[24]]
[1] "SUN1"
[[25]]
[1] "TARBP1"
[[26]]
[1] "TMEM97"
[[27]]
[1] "EST/"
[[28]]
[1] "STAT1"
[[29]]
[1] "SLC17A5"
[[30]]
[1] "DHRS9"
[[31]]
[1] "MAGEA6"
[[32]]
[1] "---"
[[33]]
[1] "ATPBD4"
[[34]]
[1] "GRB14"
[[35]]
[1] "GABRA4"
[[36]]
[1] "NOP56"
[[37]]
[1] "ESPL1"
[[38]]
[1] "C15orf38"
[[39]]
[1] "CCRL1"
[[40]]
[1] "MARCKS"
[[41]]
[1] "EST/BE568408"
[[42]]
[1] "EHBP1L1"
[[43]]
[1] "ST13"
[[44]]
[1] "C18orf10"
[[45]]
[1] "TOP2A"
[[46]]
[1] "PFKL"
[[47]]
[1] "ATP6V0E1"
[[48]]
[1] "SAR1B"
[[49]]
[1] "CDH22"
[[50]]
[1] "RPS28"
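Taking the fourth non-empty token works for most rows, but it cuts off symbols that contain a space: row 24 comes back as "SUN1" rather than "SUN1 / GET4", and row 27 as "EST/". One possible alternative, sketched here in base R, is to split each row on runs of two or more spaces, on the assumption that the gaps between columns are always wider than the single spaces inside a cell. The three sample rows are hand-typed stand-ins for the real page text (their spacing is made up), and `rows`, `cols` and `gene` are illustrative names.

```r
# Hand-typed stand-ins for three rows of page 10; the exact spacing is illustrative.
rows <- c(
  "  1   202728_s_at      -0.1105      LTBP1           negative regulation of TGFbeta receptor signaling",
  " 24   223811_s_at       0.0556      SUN1 / GET4     cytoskeletal anchoring at nuclear membrane",
  " 27   238780_s_at      -0.0529      EST/ BX647543   ---"
)

# Trim the ends, then split on runs of two or more spaces,
# so single spaces inside a cell are kept.
cols <- strsplit(trimws(rows), " {2,}")

# The gene symbol is the fourth column of this table.
gene <- vapply(cols, function(x) x[4], character(1))
print(gene)
```

With stringr the same idea would be a regular-expression pattern passed to str_split(). Whether the two-space assumption holds depends on how the PDF was laid out, so it is worth checking the result against the table.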
pdf_data()
pdf_data() returns each page of the PDF as a data frame, with one row per text box and its position on the page.
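As a minimal sketch of what that looks like, the code below writes a one-page PDF with R's own pdf() graphics device and reads it back with pdf_data(). The temporary file and the words drawn on it are invented for the illustration, and `tmp` and `words` are just example names.

```r
library(pdftools)

# Build a small throwaway PDF so the example does not depend on any existing file.
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot.new()
text(0.5, 0.5, "LTBP1 SFMBT1")
invisible(dev.off())

# pdf_data() returns a list with one data frame per page.
words <- pdf_data(tmp)
length(words)       # number of pages
names(words[[1]])   # column names, including the position columns and "text"
words[[1]]
```

Because every word comes with its coordinates, this form can be easier to turn into table columns than the plain strings from pdf_text().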
pdf_render_page()
pdf_render_page() renders a page into a raw bitmap array for further processing in R.
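A small sketch of a call, again on a throwaway PDF generated with R's pdf() device so that nothing depends on an existing file. The file and the names `tmp` and `bitmap` are made up for the example; as far as I can tell the array dimensions are channels x width x height, but it is safer to check dim() on your own output.

```r
library(pdftools)

# Throwaway one-page PDF for the demonstration.
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot.new()
text(0.5, 0.5, "page one")
invisible(dev.off())

# Render page 1 at 72 dpi into a raw array.
bitmap <- pdf_render_page(tmp, page = 1, dpi = 72)
dim(bitmap)
```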
pdf_convert()
pdf_convert() does a high quality conversion of PDF page(s) to png, jpeg or tiff format.
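A minimal sketch of converting one page to PNG. The input is a throwaway PDF made with R's pdf() device, and the output path `out_file` is a hypothetical name in the session's temporary directory; PNG output also assumes the underlying poppler build supports that format.

```r
library(pdftools)

# Throwaway one-page PDF for the demonstration.
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot.new()
text(0.5, 0.5, "page one")
invisible(dev.off())

# Convert page 1 to a PNG file at 72 dpi.
out_file <- file.path(tempdir(), "page1.png")
pdf_convert(tmp, format = "png", pages = 1, filenames = out_file,
            dpi = 72, verbose = FALSE)
file.exists(out_file)
```

A higher `dpi` gives a larger, sharper image at the cost of file size.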