Faust는 Windowing을 쉽게 지원할 수 있는 기능을 가지고 있다. Windowing을 통해 지난 10분간 데이터의 분석 혹은 매 5분마다 1시간 간격의 데이터 분석과 같은 내용을 수행할 수 있다. 이번 포스팅에서는 Windowing을 사용하는 방법에 대해 알아보고자 한다.
Faust는 Hopping, Tumbling window를 지원한다.
Tumbling Windows
Tumbling window는 중복되지 않은 데이터에 대해 일정간격으로 분석할때 사용된다.
from dataclasses import asdict, dataclass
from datetime import timedelta
import json
import random
import faust
@dataclass
class ClickEvent(faust.Record):
email: str
timestamp: str
uri: str
number: int
app = faust.App("exercise7", broker="kafka://localhost:9092")
clickevents_topic = app.topic("com.udacity.streams.clickevents", value_type=ClickEvent)
uri_summary_table = app.Table("uri_summary", default=int).tumbling(
timedelta(seconds=10)
) # Tumbling window 설정, 간격:10초
@app.agent(clickevents_topic)
async def clickevent(clickevents):
async for ce in clickevents.group_by(ClickEvent.uri):
uri_summary_table[ce.uri] += ce.number
print(f"{ce.uri}: {uri_summary_table[ce.uri].current()}")
if __name__ == "__main__":
app.main()
위 코드는 'com.udacity.streams.clickevents' topic에서 나온 데이터를 기준으로 10초 간격동안 uri의 group by당 개수를 uri_summary_table에 등록하여 print를 수행한다.
root@2f57ea8fd2c8:/home/workspace# python exercise6.7.solution.py worker
┌ƒaµS† v1.7.4─┬──────────────────────────────────────────┐
│ id │ exercise7 │
│ transport │ [URL('kafka://localhost:9092')] │
│ store │ memory: │
│ web │ http://localhost:6066/ │
│ log │ -stderr- (warn) │
│ pid │ 815 │
│ hostname │ 2f57ea8fd2c8 │
│ platform │ CPython 3.7.3 (Linux x86_64) │
│ drivers │ │
│ transport │ aiokafka=1.0.6 │
│ web │ aiohttp=3.6.2 │
│ datadir │ /home/workspace/exercise7-data │
│ appdir │ /home/workspace/exercise7-data/v1 │
└─────────────┴──────────────────────────────────────────┘
starting➢ ◣
😊
[2019-11-22 01:31:40,569: WARNING]: https://www.smith.com/main/tags/home/: 783
[2019-11-22 01:31:40,625: WARNING]: https://robinson.com/app/search.php: 148
[2019-11-22 01:31:40,627: WARNING]: http://martinez.com/tags/terms/: 267
[2019-11-22 01:31:40,639: WARNING]: https://www.gomez.com/blog/category/main/register.html: 753
[2019-11-22 01:31:40,641: WARNING]: https://sullivan-hamilton.com/tags/category/: 489
[2019-11-22 01:31:40,642: WARNING]: https://eaton-butler.com/blog/wp-content/tags/about.php: 989
[2019-11-22 01:31:40,643: WARNING]: https://weaver.com/post/: 281
[2019-11-22 01:31:40,644: WARNING]: https://marquez.com/faq/: 202
[2019-11-22 01:31:40,646: WARNING]: https://bradley.com/main.htm: 237
[2019-11-22 01:31:40,646: WARNING]: https://rios.com/main/tags/tag/privacy.html: 177
[2019-11-22 01:31:40,647: WARNING]: https://www.robinson-combs.com/: 327
[2019-11-22 01:31:40,648: WARNING]: https://west.com/register.html: 771
[2019-11-22 01:31:40,649: WARNING]: http://reyes.org/: 296
[2019-11-22 01:31:40,653: WARNING]: http://www.mcdaniel.com/categories/wp-content/post.php: 662
[2019-11-22 01:31:40,664: WARNING]: http://oliver-williams.com/author/: 718
[2019-11-22 01:31:40,665: WARNING]: https://www.martinez-haney.com/index.asp: 15
[2019-11-22 01:31:40,667: WARNING]: http://www.alvarez-smith.info/wp-content/terms.php: 517
[2019-11-22 01:31:40,668: WARNING]: https://www.garcia-smith.com/tag/categories/post/: 839
[2019-11-22 01:31:40,670: WARNING]: https://petersen.biz/: 199
[2019-11-22 01:31:40,671: WARNING]: https://guzman-zimmerman.com/posts/posts/search/homepage.jsp: 847
[2019-11-22 01:31:40,672: WARNING]: https://thomas.com/wp-content/blog/post/: 842
[2019-11-22 01:31:40,673: WARNING]: http://clark.net/: 599
[2019-11-22 01:31:40,674: WARNING]: http://www.mcclure.info/: 929
[2019-11-22 01:31:40,675: WARNING]: http://barrett-martin.net/category/search/main/about/: 402
[2019-11-22 01:31:40,676: WARNING]: https://www.farmer-deleon.com/posts/main/terms/: 202
[2019-11-22 01:31:40,677: WARNING]: https://thomas.com/wp-content/blog/post/: 1748
[2019-11-22 01:31:40,678: WARNING]: https://larson.info/faq.htm: 661
[2019-11-22 01:31:40,678: WARNING]: https://kline.biz/main.htm: 7
[2019-11-22 01:31:40,759: WARNING]: https://nichols-bailey.info/tags/blog/author/: 44
[2019-11-22 01:31:40,762: WARNING]: https://www.jones-gomez.biz/homepage/: 39
[2019-11-22 01:31:40,762: WARNING]: https://hood.com/search.html: 807
[2019-11-22 01:31:40,763: WARNING]: http://johnson.com/tag/explore/category/: 414
[2019-11-22 01:31:40,764: WARNING]: https://www.davis.info/register.asp: 677
[2019-11-22 01:31:40,766: WARNING]: http://www.wright.net/register/: 291
[2019-11-22 01:31:40,767: WARNING]: https://www.allen.info/blog/main/main/: 724
[2019-11-22 01:31:40,768: WARNING]: http://wilkins.com/: 242
[2019-11-22 01:31:40,769: WARNING]: http://www.moran-french.org/terms/: 357
[2019-11-22 01:31:40,770: WARNING]: https://bailey.org/: 124
[2019-11-22 01:31:40,771: WARNING]: http://chavez-thompson.info/main/: 190
[2019-11-22 01:31:40,773: WARNING]: http://barrett-martin.net/category/search/main/about/: 1221
[2019-11-22 01:31:40,774: WARNING]: https://www.lopez.net/category/index/: 959
[2019-11-22 01:31:40,775: WARNING]: https://www.price.org/list/categories/blog/terms.htm: 682
[2019-11-22 01:31:40,776: WARNING]: http://www.spears.net/home.html: 424
Hopping Windows
Hopping window는 지난 x시간 동안의 데이터를 y간격을 반복하여 데이터를 분석하는 것이다. y간격은 x시간보다 길 수 없으며 일부 데이터는 2개의 window간에 중복되어 처리된다.
from dataclasses import asdict, dataclass
from datetime import timedelta
import json
import random
import faust
@dataclass
class ClickEvent(faust.Record):
email: str
timestamp: str
uri: str
number: int
app = faust.App("exercise8", broker="kafka://localhost:9092")
clickevents_topic = app.topic("com.udacity.streams.clickevents", value_type=ClickEvent)
uri_summary_table = app.Table("uri_summary", default=int).hopping(
size=timedelta(minutes=1),
step=timedelta(seconds=10)
) # hopping window선언을 위한 size와 step
@app.agent(clickevents_topic)
async def clickevent(clickevents):
async for ce in clickevents.group_by(ClickEvent.uri):
uri_summary_table[ce.uri] += ce.number
print(f"{ce.uri}: {uri_summary_table[ce.uri].current()}")
if __name__ == "__main__":
app.main()
위 코드는 10초 간격으로 1분 동안의 데이터에 대해 분석하는 코드이다.
root@68f789ccd9f1:/home/workspace# python exercise6.8.solution.py worker
┌ƒaµS† v1.7.4─┬──────────────────────────────────────────┐
│ id │ exercise8 │
│ transport │ [URL('kafka://localhost:9092')] │
│ store │ memory: │
│ web │ http://localhost:6066/ │
│ log │ -stderr- (warn) │
│ pid │ 818 │
│ hostname │ 68f789ccd9f1 │
│ platform │ CPython 3.7.3 (Linux x86_64) │
│ drivers │ │
│ transport │ aiokafka=1.0.6 │
│ web │ aiohttp=3.6.2 │
│ datadir │ /home/workspace/exercise8-data │
│ appdir │ /home/workspace/exercise8-data/v1 │
└─────────────┴──────────────────────────────────────────┘
starting➢ ◠
😊
[2019-11-22 01:38:32,252: WARNING]: https://wise.com/: 953
[2019-11-22 01:38:32,317: WARNING]: https://www.coleman-edwards.com/: 214
[2019-11-22 01:38:32,321: WARNING]: http://www.calderon.com/index/: 559
[2019-11-22 01:38:32,323: WARNING]: https://khan-butler.info/app/about.html: 938
[2019-11-22 01:38:32,326: WARNING]: http://jackson.org/category/index/: 739
[2019-11-22 01:38:32,335: WARNING]: http://www.benton.com/explore/privacy.html: 377
[2019-11-22 01:38:32,340: WARNING]: http://herman.biz/search/: 871
[2019-11-22 01:38:32,342: WARNING]: https://burch-holt.com/post/: 335
[2019-11-22 01:38:32,344: WARNING]: https://www.molina-rivera.com/: 198
[2019-11-22 01:38:32,346: WARNING]: http://www.maldonado.info/main.php: 462
[2019-11-22 01:38:32,348: WARNING]: http://www.patterson.com/app/list/register.asp: 542
[2019-11-22 01:38:32,352: WARNING]: https://www.west.info/main/categories/posts/faq.htm: 676
[2019-11-22 01:38:32,355: WARNING]: http://white-miller.com/search/homepage/: 259
[2019-11-22 01:38:32,357: WARNING]: https://smith.com/login/: 498
[2019-11-22 01:38:32,360: WARNING]: http://mays.com/: 715
[2019-11-22 01:38:32,361: WARNING]: https://roberts.com/login.htm: 446
[2019-11-22 01:38:32,363: WARNING]: https://everett-lee.com/author/: 857
[2019-11-22 01:38:32,365: WARNING]: https://cruz.com/wp-content/app/login/: 250
[2019-11-22 01:38:32,368: WARNING]: http://www.lee.com/privacy.htm: 862
[2019-11-22 01:38:32,370: WARNING]: https://wallace.com/post.php: 456
[2019-11-22 01:38:32,372: WARNING]: https://www.atkins.com/search.jsp: 995
[2019-11-22 01:38:32,374: WARNING]: https://www.chan.com/search/main/blog/category/: 659
[2019-11-22 01:38:32,375: WARNING]: https://www.wright.com/category/login/: 461
[2019-11-22 01:38:32,377: WARNING]: https://martinez.biz/home/: 877
[2019-11-22 01:38:32,378: WARNING]: http://www.keith.com/terms/: 873
[2019-11-22 01:38:32,380: WARNING]: http://www.hamilton-lee.net/: 735
[2019-11-22 01:38:32,383: WARNING]: http://rice.org/index/: 946
[2019-11-22 01:38:32,385: WARNING]: http://www.carpenter.com/categories/blog/posts/author/: 497
[2019-11-22 01:38:32,387: WARNING]: https://chang.com/blog/explore/blog/terms/: 861
[2019-11-22 01:38:32,388: WARNING]: https://adams-brown.biz/explore/tag/tag/post.php: 293
[2019-11-22 01:38:32,390: WARNING]: https://ayala.com/: 622
[2019-11-22 01:38:32,392: WARNING]: https://www.wilson-harris.com/category/wp-content/terms.php: 790
[2019-11-22 01:38:32,395: WARNING]: https://erickson-hamilton.com/login.htm: 405
[2019-11-22 01:38:32,396: WARNING]: http://www.herrera-navarro.info/explore/explore/login/: 234
[2019-11-22 01:38:32,400: WARNING]: https://www.miller.com/homepage/: 45
[2019-11-22 01:38:32,402: WARNING]: https://www.morales.com/search.htm: 196
[2019-11-22 01:38:32,404: WARNING]: http://hardin.org/index.htm: 314
[2019-11-22 01:38:32,406: WARNING]: http://www.davenport-ryan.com/: 976
[2019-11-22 01:38:32,407: WARNING]: https://moore.info/main/wp-content/categories/search.html: 231
[2019-11-22 01:38:32,409: WARNING]: http://ross-gutierrez.org/: 569
[2019-11-22 01:38:32,410: WARNING]: https://jordan-oliver.net/tags/search/search/about.html: 895
[2019-11-22 01:38:32,412: WARNING]: http://www.smith.info/search/wp-content/category/privacy.php: 488
[2019-11-22 01:38:32,415: WARNING]: http://www.garcia.com/: 984
[2019-11-22 01:38:32,416: WARNING]: http://ibarra-reese.com/blog/wp-content/posts/homepage/: 964
[2019-11-22 01:38:32,418: WARNING]: http://www.hopkins.com/: 946
[2019-11-22 01:38:32,420: WARNING]: http://www.stephens-barnes.biz/about/: 389
[2019-11-22 01:38:32,421: WARNING]: https://www.molina-rivera.com/: 762
[2019-11-22 01:38:32,422: WARNING]: http://www.reynolds.com/search/app/app/index/: 141
[2019-11-22 01:38:32,425: WARNING]: https://www.dyer.info/explore/search/main/main/: 23
[2019-11-22 01:38:32,426: WARNING]: http://www.white-carpenter.com/explore/main/posts/home/: 627
[2019-11-22 01:38:32,428: WARNING]: https://juarez.com/categories/main/: 796
[2019-11-22 01:38:32,431: WARNING]: https://www.robles.com/app/homepage.html: 22
[2019-11-22 01:38:32,437: WARNING]: https://www.cline-wilson.biz/main.html: 909
[2019-11-22 01:38:32,448: WARNING]: http://www.fuentes-holland.com/blog/category/blog/category/: 811
[2019-11-22 01:38:32,450: WARNING]: http://www.maldonado.info/main.php: 595
[2019-11-22 01:38:32,452: WARNING]: http://bell.com/explore/posts/wp-content/category/: 261
반응형
'빅데이터' 카테고리의 다른 글
pyspark UDF(User Defined Functions) 만들기 방법 및 예제 (0) | 2020.02.13 |
---|---|
AvroFlumeEvent 포멧 java Decoding source (0) | 2020.01.31 |
빅데이터에서 사용하는 포멧 종류 및 설명 (0) | 2019.12.19 |
스트림 프로세싱 with Faust - Table (0) | 2019.11.21 |
스트림 프로세싱 with Faust - Processors, Operations (0) | 2019.11.21 |
스트림 프로세싱 with Faust - kafka consumer/producer (1) | 2019.11.21 |